Private LLM: Private, Uncensored AI Chat for iPhone, iPad, and Mac

No Cloud, No Tracking, No Logins.

Run AI Offline on Your iPhone, iPad, and Mac

Private LLM runs entirely on your iPhone, iPad, or Mac. Your conversations never leave the device, and no internet is required after the first model download. No account, no tracking, no logs. One purchase unlocks the app across every Apple device you own and your Family Sharing group.
Screenshot: the Private LLM chat interface on iPhone, with a prompt entered and the language model running entirely on-device.

Run DeepSeek R1, Llama 3.3, Qwen3, and Gemma 3 Locally

Private LLM runs the leading open-source models directly on your Apple devices — DeepSeek R1 Distill, Llama 3.3 70B, Qwen3 4B, Phi 4, Google Gemma 3, and more. Every conversation stays on-device, and every model is quantized in-house for the best possible quality on your hardware.
Screenshot: the model download list in Private LLM on iPhone, showing the range of LLMs available for offline use.

Local AI in Siri and Apple Shortcuts — No Code

Private LLM plugs directly into Siri and the Shortcuts app. Build AI-driven workflows that summarize text, generate writing, or pipe responses into any of the 70+ apps that support the x-callback-url specification. No code required.
Screenshot: a Private LLM action inside an Apple Shortcut on iPhone.

One Purchase, No Subscription — Family Sharing for Six

Ditch the subscriptions. A single purchase unlocks Private LLM on iPhone, iPad, and Mac, and Family Sharing extends it to up to six family members at no extra cost. One price, every device, no recurring fees.
Screenshot: Private LLM on macOS, with a prompt typed into the input field for an instant offline response.

AI Writing Tools Built Into macOS

Select any text in any macOS app, right-click, and Private LLM rewrites, summarizes, or corrects it, entirely on-device. Supports English and major Western European languages.
Screenshot showing the Private LLM integration within the macOS system-wide services menu.

Built by Two Engineers, Not VCs

Private LLM is built by two engineers in the EU — bootstrapped, no VC funding, no growth-hacking roadmap. We are the only app on the App Store with OmniQuant and GPTQ quantization, which produce measurably better output than the RTN quantization used by MLX and llama.cpp wrapper apps like Ollama and LM Studio. We answer to users, not investors — which is why your data stays on-device and always will.

OmniQuant and GPTQ Quantization: Better Output, Less Memory

Private LLM uses OmniQuant and GPTQ quantization. When LLMs are quantized for on-device inference, outlier weight values hurt text generation quality. OmniQuant modulates outlier weights with a learnable, optimization-based clipping mechanism that minimizes quantization error. GPTQ uses approximate second-order (Hessian) information to minimize reconstruction error on the weights that matter most. The affine RTN quantization used by MLX-based apps like LM Studio, and the block-wise RTN variants used by llama.cpp-based apps like Ollama, skip this kind of per-weight optimization, which is why those apps produce lower-quality output on the same Apple hardware.

We constantly explore advanced quantization methods, work that wrapper apps built on third-party inference engines cannot take on. OmniQuant and GPTQ, paired with optimized model-specific Metal kernels, let Private LLM deliver text generation that is both fast and high-quality on Apple hardware.
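A toy sketch of why range clipping matters: with plain round-to-nearest, a single outlier weight stretches the quantization scale, wasting precision on the many small weights. Clipping the range trades a little outlier accuracy for finer steps everywhere else. The grid search below is only a crude stand-in for OmniQuant's learnable clipping (which optimizes the clipping parameters per weight group by gradient descent against layer outputs), but it shows the same effect on weight-level error.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy weight row: mostly small values plus one large outlier --
# the pattern that degrades plain round-to-nearest (RTN) quantization.
w = rng.normal(0.0, 0.02, 512)
w[0] = 0.5  # outlier weight

def rtn_quantize(w, bits=4, clip=1.0):
    """Symmetric RTN quantization. clip < 1 shrinks the representable
    range: the outlier gets truncated, but every other weight gets a
    finer quantization step."""
    levels = 2 ** (bits - 1) - 1
    scale = clip * np.abs(w).max() / levels
    q = np.clip(np.round(w / scale), -levels, levels)
    return q * scale

def mse(a, b):
    return float(np.mean((a - b) ** 2))

err_plain = mse(w, rtn_quantize(w, clip=1.0))

# Search over clipping factors for the one minimizing weight error.
clips = np.linspace(0.05, 1.0, 96)
errs = [mse(w, rtn_quantize(w, clip=c)) for c in clips]
best_clip = float(clips[int(np.argmin(errs))])
```

On this synthetic row the best clipping factor is well below 1.0 and yields lower overall error than the un-clipped baseline; optimization-based methods find this trade-off systematically rather than by brute force.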

Download the Best Open Source LLMs

iOS

DeepSeek R1 Distill Based Models

For iPhones/iPads with 16GB+ RAM
DeepSeek R1 Distill Qwen 14B

Google Gemma 3 1B Based Models

For iPhones/iPads with 4GB+ RAM
Gemma 3 1B IT 💎
Gemma 3 1B IT Abliterated (Uncensored)
Amoral Gemma 3 1B v2 (Uncensored)

Google Gemma 2 2B Based Models

For iPhones/iPads with 4GB+ RAM
Gemma-2 2B IT 💎
SauerkrautLM Gemma-2 2B IT

Qwen 2.5 Based Models

For iPhones/iPads with 8GB+ RAM
Qwen 2.5 Coder 7B

Qwen 2.5 14B Based Models

For iPhones/iPads with 16GB+ RAM
Qwen 2.5 Coder 14B
EVA Qwen2.5 14B v0.2 (Role-Play/Story Writing)

Phi-3 Mini 3.8B Based Models

For iPhones/iPads with 6GB+ RAM
Phi-3 Mini 4K Instruct
Kappa-3 Phi Abliterated (Uncensored)

Google Gemma Based Models

For iPhones/iPads with 8GB+ RAM
Gemma 2B IT 💎
Gemma 1.1 2B IT 💎

Llama 2 7B Based Models

For iPhones/iPads with 6GB+ RAM
Airoboros l2 7b 3.0
Spicyboros 7b 2.2 🌶️

H2O Danube Based Models

For iPhones/iPads with 4GB+ RAM
H2O Danube 1.8B Chat

StableLM 3B Based Models

For iPhones/iPads with 4GB+ RAM
StableLM 2 Zephyr 1.6B 🪁
Nous-Capybara-3B V1.9
Rocket 3B 🚀

TinyLlama 1.1B Based Models

For iPhones/iPads with 4GB+ RAM
TinyLlama 1.1B Chat 🦙
TinyDolphin 2.8 1.1B Chat 🐬

Yi 6B Based Models

For iPhones/iPads with 6GB+ RAM
Yi 6B Chat 🇨🇳

macOS

DeepSeek R1 Distill Based Models

For Apple Silicon Macs with 32GB+ RAM
Fuse O1 DeepSeek R1 QwQ SkyT1 32B
DeepSeek R1 Distill Qwen 32B Abliterated (Uncensored)

DeepSeek R1 Distill Based Models

For Apple Silicon Macs with 48GB+ RAM
DeepSeek R1 Distill Llama 70B
R1 1776 Distill Llama 70B

Google Gemma 3 1B Based Models

For Apple Silicon Macs with 8GB+ RAM
Gemma 3 1B IT 💎
Gemma 3 1B IT Abliterated (Uncensored)
Amoral Gemma 3 1B v2 (Uncensored)

Phi-4 14B Based Models

For Apple Silicon Macs with 16GB+ RAM
Phi-4

Meta Llama 3.1 70B Based Models

For Apple Silicon Macs with 64GB+ RAM
Meta Llama 3.1 70B Instruct 🦙

Qwen 2.5 14B Based Models

For Apple Silicon Macs with 16GB+ RAM
Qwen 2.5 Coder 14B
EVA Qwen2.5 14B v0.2 (Role-Play/Story Writing)

Google Gemma 2 2B Based Models

For Apple Silicon Macs with 8GB+ RAM
Gemma-2 2B IT 💎
SauerkrautLM Gemma-2 2B IT

Phi-3 Mini 3.8B Based Models

For Apple Silicon Macs with 8GB+ RAM
Phi-3 Mini 4K Instruct
Kappa-3 Phi Abliterated (Uncensored)

Google Gemma Based Models

For Apple Silicon Macs with 8GB+ RAM
Gemma 2B IT 💎
Gemma 1.1 2B IT 💎

Mixtral 8x7B Based Models

For Apple Silicon Macs with 32GB+ RAM
Mixtral-8x7B-Instruct-v0.1
Dolphin 2.6 Mixtral 8x7B 🐬
Nous Hermes 2 Mixtral 8x7B DPO ☤

Llama 33B Based Models

For Apple Silicon Macs with 24GB+ RAM
WizardLM 33B v1.0 (Uncensored)

Llama 2 13B Based Models

For Apple Silicon Macs with 16GB+ RAM
Wizard LM 13B
Spicyboros 13B 🌶️
Synthia 13B 1.2
XWin-LM-13B
Mythomax L2 13B

CodeLlama 13B Based Models

For Apple Silicon Macs with 16GB+ RAM
WhiteRabbitNeo-13B-v1

Llama 2 7B Based Models

For Apple Silicon Macs with 8GB+ RAM
airoboros-l2-7b-3.0
Spicyboros 7b 2.2 🌶️
Xwin-LM-7B v0.1

Solar 10.7B Based Models

For Apple Silicon Macs with 16GB+ RAM
Nous-Hermes-2-SOLAR-10.7B ☤

Phi-2 3B Based Models

For Apple Silicon Macs with 8GB+ RAM
Phi-2 Orange 🍊
Phi-2 Orange Version 2 🍊
Dolphin 2.6 Phi-2 (Uncensored) 🐬

StableLM 3B Based Models

For Apple Silicon Macs with 8GB+ RAM
StableLM Zephyr 3B 🪁

Yi 6B Based Models

For Apple Silicon Macs with 8GB+ RAM
Yi 6B Chat 🇨🇳

Yi 34B Based Models

For Apple Silicon Macs with 24GB+ RAM
Yi 34B Chat 🇨🇳

How Can We Help?

Whether you've got a question or you're facing an issue with Private LLM, we're here to help you out. Just drop your details in the form below, and we'll get back to you as soon as we can.