Question 1

What is Private LLM?

Accepted Answer

Private LLM is your private AI chatbot, designed for privacy, convenience, and creativity. It operates entirely offline on your iPhone, iPad, and Mac, ensuring your data stays secure and confidential. Private LLM is a one-time purchase on the App Store, allowing you unlimited access without any subscription fees. nb: We hate subscriptions, and we aren’t hypocrites to subject our users to what we hate.

Question 2

How does Private LLM differentiate itself from other local AI solutions like Ollama, LM Studio, and others based on llama.cpp and MLX?

Accepted Answer

Firstly, Private LLM stands out from other local AI solutions through its advanced model quantization techniques like OmniQuant and GPTQ. Unlike the naive Round-To-Nearest (RTN) quantization used by other competing apps based on the MLX and llama.cpp frameworks, OmniQuant and GPTQ quantization are optimization based methods. These methods allow for more precise control over the quantization range, effectively maintaining the integrity of the original weight distribution. As a result, Private LLM achieves superior model performance and accuracy, nearly matching the performance of an un-quantized 16 bit floating point (fp16) model, but with significantly reduced computational requirements at inference time.

While the process of quantizing models with OmniQuant and GPTQ are computationally intensive, it's a worthwhile investment. This advanced approach ensures that the perplexity (a measure of model's text generation quality) of the quantized model remains much closer to that of the original fp16 model than is possible with the naive RTN quantization. This ensures that Private LLM users enjoy a seamless, efficient, and high-quality AI experience, setting us apart other similar applications.

Secondly, unlike almost every other competing offline LLM app, Private LLM isn’t based on llama.cpp or MLX. This means advanced features that aren’t available in llama.cpp and MLX (and by extension apps that use them) like attention sinks and sliding window attention are available in Private LLM, but unavailable[1] elsewhere. This also means that our app is significantly faster than competition on the same hardware (YouTube videos comparing performance).

Finally, we are machine learning engineers and carefully tune quantization and parameters in each model to maximize the text generation quality. For instance, we do not quantize the embeddings and gate layers in Mixtral models because quantizing them badly affects the model’s perplexity (needless to mention, our competition naively quantize everything). Similarly with the Gemma models, quantizing the weight tied embeddings hurts the model’s perplexity, so we don’t (while our competitors do). Also, on the Microsoft phi-4 model, we selectively keep a few critical layers unquantized (dynamic quantization) to maintain optimal text generation quality.

By prioritizing accuracy and computational efficiency without compromising on privacy and offline functionality, Private LLM provides a unique solution for iOS and macOS users seeking a powerful, private, and personalized AI experience.

Question 3

What devices can run Private LLM, and what are the recommended specifications?

Accepted Answer

Running large language models (LLMs) on-device is a memory-intensive process, as it requires significant RAM to load and execute models efficiently. Moreover, Private LLM usually isn’t the only app running on your iPhone, iPad, or Mac. Other apps, especially memory-hungry ones, can compete for system resources, impacting the performance of Private LLM.

On iPhones, older devices like the iPhone SE 2nd Gen (3GB RAM) can run smaller models such as Llama 3.2 1B, and Qwen 2.5 0.5B/1.5B, but the experience may be limited due to hardware constraints. Starting with the iPhone 12 (4GB RAM), performance improves with access to slightly larger 3B models. For the best experience, we recommend using the iPhone 15 Pro or newer, equipped with 8GB of RAM. These devices are capable of running larger models such as Llama 3.1 8B, or Qwen 2.5 7B with ease. While Private LLM can technically be installed on devices older than the iPhone 12, we no longer recommend purchasing the app for such devices, as user feedback has shown that outdated hardware significantly limits the experience. Users with older devices can still buy the app, but support and optimal performance are not guaranteed.

On iPads, the story is similar. Devices with at least 4GB RAM can run models comparable to those on mid-range iPhones. For the best results, the top-of-the-line iPad Pro with 16GB RAM is ideal, as it supports even larger models like Qwen 2.5 14B or Google Gemma 2 9B. This unmatched capability makes the iPad Pro a powerful choice for running Private LLM.

On Macs, the transition to Apple Silicon has set new benchmarks for local AI performance. Although Private LLM can be installed on Intel Macs, we strongly recommend using Apple Silicon based Macs for a significantly smoother experience. On Apple Silicon Macs with 8GB of RAM, you can run models comparable to those supported on the latest iPhones, such as Llama 3.1 8B and Qwen 2.5 7B. Macs with 16GB RAM, like the top-tier iPad Pro, can handle even larger models such as Qwen 2.5 14B or Google Gemma 2 9B. With 32GB RAM, Macs can run larger models like Phi-4, Qwen 2.5 32B, and for the ultimate experience, Apple Silicon Macs with at least 48GB RAM deliver optimal performance with models like Llama 3.3 70B.

Private LLM is designed to bring the power of local AI to a wide range of Apple devices, but for the best performance, we strongly recommend devices with more memory. If you’re still unsure about your device’s compatibility or need further assistance, join our Discord community to connect with us and other users who can help!

Question 4

Why is Private LLM not free?

Accepted Answer

Private LLM is a bootstrapped product built by two developers, free from VC funding. Our competitors like Ollama and LM Studio are VC backed companies. Some of them have onerous clauses in hidden in their terms of use that forbid usage for commercial or production purposes. We don’t impose any restrictions on how our users use our app. Our bootstrapped, one-time payment model isn’t perfect and has its downsides, like not being able to buy ads, influencer posts and gold checkmarks on Twitter. But the flip side of it is that we don’t have pressure from VCs to aggressively surveil and monetize our users; and we can focus 100% on building the product for our users and us.

At Private LLM, we prioritize quality and independence. To achieve superior performance, we carefully quantize every model using advanced techniques like OmniQuant and GPTQ. This process requires substantial resources, including renting GPUs, which are far from free. All our competitors use RTN (round to nearest) quantization, which is very cheap in terms of resources but results in poor quality quantized models. As a small, independent business, we spend a lot of time and resources on quantizing models with SOTA quantization algorithms, because it’s a worthwhile tradeoff in terms of quality. The result is an unparalleled AI experience that stands out in terms of accuracy and speed.

Privacy is another core value of Private LLM. We process everything locally on your device, with zero data collection or tracking. Making such a claim isn’t easy, especially when you're VC-backed and under pressure to find scalable revenue streams. By staying independent, we ensure your data always remains private.

Free products may seem enticing, especially when influencers shill them aggressively, but in reality, they often deliver inferior text generation quality. Private LLM takes a different approach, offering text generation that is leagues ahead in coherence, accuracy, and context. By charging a one-time fee, Private LLM provides an AI solution that is user-focused, privacy-first, and delivers high quality text output that our competitors cannot come anywhere close to.

Question 5

What languages does Private LLM support?

Accepted Answer

Private LLM offers a range of models to cater to diverse language needs. Our selection includes the Llama 3, Qwen 2.5 and Gemma 3 families, all supporting multiple languages. Llama 3 is proficient in English, German, French, Italian, Portuguese, Hindi, Spanish and Thai; Qwen 2.5 extends support to over 29 languages including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai and Arabic; and Gemma 3 supports more than 140 languages worldwide. For users seeking models tailored to specific non-English languages, Private LLM provides options such as SauerkrautLM Gemma-2 2B IT for German, DictaLM 2.0 Instruct for Hebrew, RakutenAI 7B Chat for Japanese, and Yi 6B Chat or Yi 34B Chat for Chinese. This diverse selection ensures that users can choose the model that best fits their language requirements.

Question 6

Why can't Private LLM load models directly from HuggingFace?

Accepted Answer

Private LLM ensures superior text generation quality and performance by utilizing advanced quantization strategies like OmniQuant and GPTQ, which take numerous hours to carefully quantize each model on GPUs. This meticulous process preserves the model's weight distribution more effectively, resulting in faster inference, improved model fidelity, and higher-quality text generation. Our 3-bit OmniQuant models outperform or match the performance of 4-bit RTN-quantized models used by other platforms. Unlike apps that support readily available GGUF files from Hugging Face, Private LLM quantizes models in-house, ensuring they are optimized for speed, accuracy, and quality. This rigorous approach is one of the reasons Private LLM is a paid app, offering much better quality compared to slower and less capable local AI chat apps.

Question 7

How can I request a model to be added?

Accepted Answer

We regularly add new models to Private LLM based on user feedback, as shown in our release notes. To request a specific model, join our Discord community and share your suggestion in the #suggestions channel. We review all requests and prioritize popular ones for future updates.

Question 8

Does Private LLM support reading documents or files?

Accepted Answer

Private LLM does not currently support reading documents or files, a feature often referred to as Retrieval-Augmented Generation (RAG). This functionality involves using external documents to enrich the model’s responses, but its effectiveness relies heavily on context length—the maximum amount of text the model can process in a single prompt. A longer context length allows for more detailed and accurate responses, but it is computationally demanding, particularly on local devices. Competitors like Ollama typically support a default context length of 2k tokens, while LM Studio defaults to a context length of 1500 tokens. In comparison, Private LLM provides 8K tokens on iPhone and iPad, and an impressive 32K tokens on Macs, making it one of the most capable local AI solutions in this regard. However, all current local AI implementations, including Private LLM, face challenges with hallucinations when processing long textual content. This limitation arises because models can generate incorrect or fabricated information when overwhelmed by extensive or incomplete input. Private LLM’s OmniQuant quantization significantly reduces hallucinations compared to RTN quantization used by our competition, but does not completely eliminate them. While we aim to introduce document-reading capabilities in the future, server-based solutions currently offer the most reliable results for RAG, as they are better equipped to handle larger context lengths and computational demands.

Question 9

Can Private LLM access the internet or real-time data?

Accepted Answer

Absolutely not. Private LLM is dedicated to ensuring your privacy, operating solely offline without any internet access for its functions or accessing real-time data. An internet connections is only required when you opt to download updates or new models, during which no personal data is collected or transmitted, exchanged or collected. Our privacy philosophy aligns with Apple's stringent privacy and security guidelines, and our app upholds the highest standards of data protection. It's worth noting that, on occasion, users might inquire if Private LLM can access the internet, leading to potential model hallucinations suggesting it can. However, these responses should not be taken as factual. If users would like to independently verify Private LLM’s privacy guarantees, we recommend using network monitoring tools like Little Snitch. This way, you can see for yourself that our app maintains strict privacy controls. For those interested in accessing real-time information, Private LLM integrates seamlessly with Apple Shortcuts, allowing you to pull data from RSS feeds, web pages, and even apps like Calendar, Reminders, Notes and more. This feature offers a creative workaround for incorporating current data into your interactions with Private LLM, while still maintaining its offline privacy-first ethos. If you have any questions or need further clarification, please don't hesitate to reach out to us.

Question 10

How can I access Private LLM on all my Apple devices?

Accepted Answer

After a one-time purchase, you can download and use Private LLM on all your Apple devices. The app supports Family Sharing, allowing you to share it with your family members.

Question 11

How can I use Private LLM for summarisation?

Accepted Answer

Private LLM can analyse and summarise lengthy paragraphs of text in seconds. Just paste in the content, and the AI will generate a concise summary, all offline. You could also use Private LLM for rephrasing and paraphrasing with prompts like:Give me a TLDR on this: [paste content here]You’re an expert copywriter. Please rephrase the following in your own words: [paste content]Paraphrase the following text so that it sounds more original: [paste content]

Question 12

Can Private LLM help with brainstorming or problem-solving?

Accepted Answer

Absolutely! Private LLM can generate insightful suggestions and ideas, making it a powerful tool for brainstorming and problem-solving tasks. Here are some example brainstorming prompts that you can try asking Private LLM. Please feel free to experiment and try out your own prompts.Can you give me some potential themes for a science fiction novel?I’m planning to open a vegan fast-food restaurant. What are the weaknesses of this idea?I run a two year old software development startup with one product that has PMF, planning on introducing a new software product in a very different market. Use the six hats method to analyse this.Utilise the Golden Circle Model to create a powerful brand for a management consulting business.

Question 13

What are the Sampling Temperature and Top-P settings and what they do?

Accepted Answer

Sampling temperature and Top-P are universal inference parameters for all autoregressive causal decoder only transformer (aka GPT) models, and are not specific to Private LLM. The app has them set to reasonable defaults (0.7 for Sampling temperature and 0.95 for Top-p), But you can always tweak them and see what happens. Please bear in mind that changes to these parameters do not take effect until the app is restarted.

These parameters control the tradeoff between deterministic text generation and creativity. Low values lead to boring but coherent response, higher values lead to creative but sometimes incoherent responses.

Question 14

How does Private LLM work?

Accepted Answer

Private LLM works offline and uses a decoder only transformer (aka GPT) model that you can casually converse with. It can also help you with summarising paragraphs of text, generating creative ideas, and provide information on a wide range of topics.

Question 15

Can I use Private LLM with the Shortcuts app?

Accepted Answer

Yes. Private LLM has two app intents that you can use with Siri and the Shortcuts app. Please look for Private LLM in the Shortcuts app. Additionally, Private LLM also supports the  x-callback-url  specification which is also  supported by Shortcuts  and many other apps. Here’s an  example shortcut  using the x-callback-url functionality in Private LLM.

Question 16

How can I store model weights on an external drive (DAS/NAS) in Private LLM for Mac?

Accepted Answer

Private LLM is a fully sandboxed macOS app, meaning models are stored within the app’s container at: ~/Library/Containers/ie.numen.personalgpt/Data/Library/Application Support/ie.numen.personalgpt/models. On macOS Sequoia and later, you can take advantage of the system’s ability to move large apps to an external disk. Moving the app will also move the models stored within it. This is the recommended method for freeing up space if your Mac’s internal storage is running low.

Question 17

Why am I having issues downloading models in Private LLM?

Accepted Answer

If you're experiencing difficulties downloading models in Private LLM, it’s often due to temporary connectivity issues with Hugging Face, where our models are hosted. You can quickly check the Hugging Face status page to see if there are any ongoing outages. In some cases, network restrictions from corporate, school, or national firewalls may limit access to Hugging Face, which can affect downloads. If you're on such a network, we recommend switching to a home Wi-Fi or mobile hotspot, or using a VPN to bypass these restrictions.

For users in China or Hong Kong, Private LLM automatically switches to hf-mirror.com to improve download reliability when your device locale is set to these regions. This helps ensure smoother access without additional setup on your end.

If you're still encountering issues after checking your connection, try restarting the app or your device. For ongoing problems, hop into our Discord community and share the details in the support channel—we’re always around to help troubleshoot!

Question 18

Why can't I run shortcuts in the background on iOS like I can on macOS with Private LLM?

Accepted Answer

The difference in functionality between iOS and macOS regarding background processing stems primarily from Apple's hardware usage policies. On iOS, Apple restricts background execution of tasks that require intensive GPU usage. This limitation is enforced to preserve battery life and maintain system performance. According to Apple's guidelines, apps attempting to run a Metal kernel in the background will be terminated immediately to prevent unauthorized resource use. For Private LLM, while we can run operations in the background on macOS leveraging the GPU, iOS versions are constrained to CPU processing when the app is not in the foreground. Running Private LLM's AI-driven tasks on the CPU is technically possible, but it would be significantly slower—more than 10× slower compared to GPU processing. This slow performance wouldn't provide the seamless, efficient user experience we strive for. We are hopeful that future updates to iOS might offer more flexibility in how background processes can utilize system resources, including potential GPU access for apps like Private LLM. Until then, we continue to optimize our iOS app within the current constraints to ensure you get the best possible performance without compromising the health of your device or the efficiency of your applications. For more technical details, you can refer to Apple's official documentation on preparing your Metal app to run in the background: Apple Developer Documentation.

iOS 26 introduces a new background GPU-access entitlement (com.apple.developer.background-tasks.continued-processing.gpu). However, it only activates following a direct user interaction (tap, swipe, etc.) from within the app, so it won't enable running local LLMs from Shortcuts entirely in the background.

Question 19

Why do models crash or fall back to built-in models during loading?

Accepted Answer

When attempting to load models in Private LLM, you might occasionally see an error message stating "Falling back on built-in model due to a crash while loading the model." This typically occurs when your device doesn't have enough available memory to load your selected model. These crashes can happen because your device has limited available RAM, other apps are using significant memory in the background, or multiple resource-intensive processes are running simultaneously. To resolve this issue, first try closing memory-intensive background apps. If the problem persists after restarting Private LLM, you can switch to a smaller model, restart your device to clear memory, or check if your device meets the recommended RAM requirements for the model. If you continue experiencing crashes after trying these solutions, consider using a smaller model that better matches your device's capabilities.

Question 20

Why does Private LLM abruptly stop generating text sometimes?

Accepted Answer

This could be due to the device running low on memory, or if the task given to Private LLM is particularly complex. In such cases, consider closing memory hungry apps that might be running in the background and try breaking down the request into smaller, more manageable tasks for the LLM to process. In the latter case, simply responding with “Continue”, “Go on” or “Tell me” also works.

Question 21

Can I use Private LLM on Android?

Accepted Answer

Yes, Private LLM is now available for Android as an early beta! While our primary focus remains on delivering the best AI experience for Apple devices, we're expanding to Android to serve more users who value privacy and offline AI. The Android version is currently available as a direct APK download (not yet on Google Play Store) and includes many of the core features from our iOS/macOS versions. As this is an early beta release, you may encounter some limitations or stability issues compared to our more mature Apple platform versions. We highly encourage you to join our Discord community to share your feedback, report any issues, and help us improve the Android experience. Your input at this stage is incredibly valuable as we work toward a full release on the Google Play Store. You can download the APK from our website.

Question 22

How can I request a refund for Private LLM?

Accepted Answer

We’re sorry to hear you’re considering a refund. You can request a refund through the Apple App Store. Simply navigate to your Apple account's purchase history, find Private LLM, and click on 'Report a Problem' to initiate the refund process. We would also love to hear from you about how we can improve. Please reach out to us with your feedback.

Question 23

Where can I ask more questions?

Accepted Answer

We would love to hear from you! Join our Discord community to share your thoughts and get support from other users. Prefer a private conversation? Use the contact form on our website to drop us an email directly.

Frequently Asked Questions