XDA spent the week pitting Gemma 4 against Qwen 3.5 on the same desktop, and the comments thread proved a quieter point: most readers already had a local-LLM stack running. The category has moved past hobbyist toys. Quantised 7B and 14B models are good enough for code review, summarisation, and routine writing, the new wave of mid-size MoE models has narrowed the gap to closed frontier models, and the apps that wrap llama.cpp have started to look like real software.
We tested 8 of the best apps for running local LLMs on Windows, macOS, and Linux. The benchmark was the boring stuff: how fast they get a model running on a Ryzen laptop, how cleanly they handle GPU offload on an RTX card, whether the chat UI is actually pleasant, and how much they break when the next big model drops. Pricing matters less than usual in this category since most of the strong options are free.
What to look for in a local-LLM app
A handful of criteria separate the tools that survive a week of daily use from the ones that get uninstalled:
- Backend portability. llama.cpp is the de facto runtime. Apps that wrap it cleanly get bug fixes for free. Apps that maintain their own forks lag behind on new model architectures.
- Quantisation support. GGUF is the format that actually ships. If an app cannot load a recent GGUF file, it is a dead end.
- GPU offload. CUDA on Nvidia, ROCm on AMD, Metal on Apple Silicon. The tools differ a lot in how much of each they actually use.
- Chat UI vs API. Some users want a polished chat window. Others want a local OpenAI-compatible endpoint to plug into editor extensions. The strongest apps offer both.
- Model discovery. Hugging Face is the catalogue. Apps that bake search and one-click download save real time.
- Privacy posture. Some of the apps run entirely offline. Others phone home for analytics by default and need a flag to stop.
Quick comparison
| App | Best for | Platforms | Free plan | Standout feature |
|---|---|---|---|---|
| Ollama | One-line installs and a CLI you can script | Windows, macOS, Linux | Yes (open source) | Drop-in OpenAI-compatible API on localhost |
| LM Studio | Polished chat UI with built-in model search | Windows, macOS, Linux | Yes (free for personal use) | Hugging Face integration with quant filtering |
| Jan | Fully open-source chat client that respects offline mode | Windows, macOS, Linux | Yes (open source) | No telemetry and a clean settings story |
| GPT4All | Lightweight chat for laptops without a GPU | Windows, macOS, Linux | Yes (open source) | CPU-first quants tuned for low-RAM machines |
| Msty | Multi-model split view for side-by-side comparisons | Windows, macOS, Linux | Yes (free tier) | Compare two local models in one window |
| Open WebUI | Self-hosted chat front end that runs in a browser | Docker (any OS) | Yes (open source) | Multi-user mode and per-chat model switching |
| Llamafile | Single executable per model, no installer | Windows, macOS, Linux | Yes (open source) | Run a model by double-clicking one file |
| Text Generation WebUI | Power-user playground with sampler tuning and extensions | Windows, macOS, Linux | Yes (open source) | Deepest control over generation parameters |
The 8 best apps for running local LLMs on desktop
1. Ollama — best one-line install for daily use
Ollama is the closest the category has to a default. A single installer drops a CLI and a background service, then ollama run llama3.2 pulls a quantised model and starts chatting. The same daemon exposes an OpenAI-compatible API on localhost:11434, which means every editor extension and notebook that speaks OpenAI works without changes. The model library covers most of the popular families with sensible default quants.
Where it falls short: The first-party UI is minimal. Ollama is a runtime, not a chat app, so you either talk to it from a terminal or pair it with a separate front end. Custom prompts and templates live in a Modelfile, which is powerful but adds a step.
Pricing:
- Free: open-source, no licence fee
- Paid: none
Platforms: Windows, macOS, Linux
Download: ollama.com
Bottom line: Pick Ollama for local LLMs if you want a backend that “just works” and you are happy to bring your own UI.
2. LM Studio — best chat UI with built-in model search
LM Studio is the polished chat client most people land on after they outgrow web demos. The model browser plugs directly into Hugging Face, filters by quant level and architecture, and shows whether a file will actually fit in your VRAM. The chat window supports system prompts, presets, multi-turn editing, and a local server mode that exposes the same OpenAI-compatible endpoint Ollama does.
Where it falls short: The licence allows free personal use but requires a paid plan for business contexts, which is worth knowing before you put it on a company laptop. The app is closed source.
Pricing:
- Free: personal use
- Paid: Work plan for business use
Platforms: Windows, macOS, Linux
Download: lmstudio.ai
Bottom line: Pick LM Studio for local LLMs if you want a single window that handles model discovery, quant selection, chat, and a local API.
3. Jan — best fully open-source chat client
Jan is what happens when a team builds LM Studio’s experience as open source from the ground up. The model store is curated, the chat UI is clean, and the project has a stated policy of running fully offline with no telemetry. The settings panel makes it obvious which switches affect network calls, which is unusual in this category.
Where it falls short: Performance lags LM Studio by a hair on the same hardware, partly because the team prioritises portability over hyper-specific GPU tuning. The mobile and remote-API stories are newer than the desktop chat.
Pricing:
- Free: open-source, no licence fee
- Paid: none
Platforms: Windows, macOS, Linux
Download: jan.ai
Bottom line: Pick Jan for local LLMs if you want LM Studio’s UX without the closed-source licence and without trusting an analytics opt-out toggle.
4. GPT4All — best for low-spec laptops without a GPU
GPT4All has been around since the early local-LLM scene and still does the boring work better than most. The default model list is tuned for CPU inference, the small quants run on machines without a dedicated GPU, and the chat UI now includes local document chat that points at a folder on disk. For users who tried to run a 7B model on an older laptop and bounced off the slowness, the curated small-model selection is the right starting point.
Where it falls short: GPU acceleration is supported but is not where the project’s focus sits. The chat UI is functional rather than beautiful.
Pricing:
- Free: open-source, no licence fee
- Paid: none
Platforms: Windows, macOS, Linux
Download: gpt4all.io
Bottom line: Pick GPT4All for local LLMs if your hardware is modest and you want a chat client that ships with models tuned for it.
5. Msty — best for comparing two models side by side
Msty is a less obvious pick that fills a specific gap: it can talk to two local models at once and show their answers side by side. Combined with hooks for remote APIs, this makes it the easiest way to benchmark a new Qwen release against a Gemma quant on the same prompt without juggling two windows. Knowledge stacks let you attach folders or URLs to a chat for retrieval.
Where it falls short: The free tier covers most personal use, but a few power features sit behind a paid plan. The model search is narrower than LM Studio’s.
Pricing:
- Free: feature-rich personal plan
- Paid: Aurum plan for advanced features
Platforms: Windows, macOS, Linux
Download: msty.app
Bottom line: Pick Msty for local LLMs if you actively compare models and want a chat client that was designed for that workflow.
6. Open WebUI — best browser front end for a shared family or team server
Open WebUI runs as a containerised web app and talks to Ollama (or any OpenAI-compatible backend) over the network. The interface looks like the ChatGPT web app, supports multi-user accounts with role-based access, and handles model switching per conversation. For a household or a small team that wants one local model server everyone can use from any browser, this is the cleanest answer.
Where it falls short: It assumes you already have Ollama (or equivalent) running somewhere. Multi-user features need a bit of setup. It is a browser app, so there is no native desktop polish.
Pricing:
- Free: open-source, no licence fee
- Paid: none
Platforms: Docker, accessible from any modern browser on Windows, macOS, or Linux
Download: openwebui.com
Bottom line: Pick Open WebUI for local LLMs if you want a shared chat front end for a home lab or a small team and you are comfortable running a container.
7. Llamafile — best zero-install option
Llamafile distributes a model and the llama.cpp runtime as a single executable that runs on Windows, macOS, and Linux without any setup. Download one file, double-click it, and a local chat UI opens in a browser. The format relies on a clever cross-platform binary trick from the Cosmopolitan project, which means the same file works across operating systems.
Where it falls short: No model browser. You manage models as files. Updates require swapping the executable. Some antivirus tools flag the binary, which is a recurring complaint in the GitHub issues.
Pricing:
- Free: open-source, no licence fee
- Paid: none
Platforms: Windows, macOS, Linux
Download: github.com/Mozilla-Ocho/llamafile
Bottom line: Pick Llamafile for local LLMs if you want the absolute lowest-ceremony way to share a working model with someone who has never heard of Hugging Face.
8. Text Generation WebUI — best power-user playground
Text Generation WebUI (sometimes called oobabooga) is the kitchen-sink option. Multiple backends, every sampler under the sun, an extensions system that adds RAG, character cards, voice, and image-grounded chat. Researchers and tinkerers who care about sampler tuning, contrastive decoding, and obscure quant formats land here.
Where it falls short: Setup is fiddlier than the other options on this list, with Python environments and CUDA toolkits in the mix. The UI is information-dense in a way that overwhelms casual users.
Pricing:
- Free: open-source, no licence fee
- Paid: none
Platforms: Windows, macOS, Linux
Download: github.com/oobabooga/text-generation-webui
Bottom line: Pick Text Generation WebUI for local LLMs if you want every knob exposed and you are comfortable in a Python environment.
How to pick the right one
If you want the simplest path to a working setup, install Ollama and pair it with a chat front end you like.
If you want one app that does everything in a polished window, install LM Studio.
If open source matters to you, install Jan.
If your laptop is older or has no GPU, install GPT4All and stick to its curated small models.
If you actively compare models, install Msty.
If you want a shared chat server for the household, run Open WebUI with Ollama behind it.
If you want zero ceremony, download a Llamafile for the model you care about.
If you want every knob, install Text Generation WebUI and budget an afternoon for the first run.
FAQ
Do local LLMs work on a laptop without a discrete GPU?
Yes. Quantised 3B and 7B models run on integrated graphics or pure CPU, slowly but usefully. GPT4All and Llamafile both ship small models tuned for this case.
How much VRAM do I need to run a local LLM?
For a comfortable experience with a 7B model at Q4 quantisation, around 6 GB of VRAM. For 14B at Q4, around 10 GB. For 70B class models, count on 24 GB or more, or split across CPU RAM and GPU at lower speeds.
Is Ollama the best app for local LLMs?
It is the best backend for most users. If you also want a polished chat UI in the same window, LM Studio or Jan is closer to “best app”. Ollama plus a separate UI is the most common stack.
Are local LLMs really private?
Yes, with one caveat. Inference runs entirely on your machine. The catch is that some apps phone home for analytics or update checks by default. Jan and GPT4All make the off switch obvious. LM Studio has it under settings.
Can I use a local LLM with my code editor?
Yes. Any app that exposes an OpenAI-compatible endpoint (Ollama, LM Studio, Jan, Msty) can be set as the base URL in editor extensions that target OpenAI. Continue, Cursor’s bring-your-own-key mode, and most VS Code extensions accept this.