Run Local LLMs on iPhone or Mac Easily Using Private LLM


In the rapidly evolving world of artificial intelligence, Large Language Models (LLMs) have become powerful tools for a wide range of applications. However, concerns about privacy, security, and internet connectivity have left many users seeking a more secure and flexible solution. Private LLM is an innovative app that addresses these concerns by allowing users to run LLMs directly on their iPhone, iPad, and Mac, providing a secure, offline, and customizable on-device chatbot experience without an API key.

Private LLM offers a host of features that set it apart from other AI chatbot apps:

  1. Offline Functionality: Ensures privacy and security by processing data locally on your device, without an internet connection.
  2. Support for Open-Source AI Models: Provides access to a wide range of models, including Llama 3.2, Google Gemma 2, Phi-3, and several other finetunes of these model families, each with unique strengths and styles.
  3. Integration with iOS and macOS Features: Seamlessly integrates with Siri, Apple Shortcuts, and macOS services, allowing users to create powerful AI-driven workflows.
  4. One-Time Purchase: Provides a cost-effective solution with a single purchase price for all Apple devices, including Family Sharing for up to six family members. Users can download new models for free, with no subscription required. If you're not satisfied, you can request a refund through the App Store.
  5. Uncensored Models: We offer a selection of uncensored finetunes, including Llama 3.2 1B/3B Instruct Abliterated for older iPhones and Tiger Gemma 9B v3 or Llama 3.1 8B Instruct Abliterated for newer iOS devices. Additionally, models like Dolphin 2.9 Llama 3 8B 🐬, Dolphin 2.6 Phi-2 🐬, and TinyDolphin 2.8 1.1B Chat 🐬 provide a flexible and unrestricted AI chatbot experience, perfect for NSFW topics, roleplay scenarios, and other creative interactions. Check out the full list of available uncensored models on our models list.
  6. Superior Model Performance with OmniQuant Quantization: Private LLM outperforms other local AI solutions through its advanced model quantization technique known as OmniQuant. Unlike Ollama’s Round-To-Nearest (RTN) method, OmniQuant better preserves model weight distribution, leading to faster inference and improved text generation quality. In fact, Private LLM’s 3-bit OmniQuant models often match or exceed the performance of Ollama’s 4-bit RTN models, offering similar or better results in a more compact package. Unlike many competitors, Private LLM doesn’t rely on pre-made GGUF files from Hugging Face, instead custom-quantizing models using OmniQuant to ensure optimal performance. Learn more in our Ollama vs Private LLM comparison.
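
To make the comparison concrete, here is a toy sketch of the Round-To-Nearest baseline that OmniQuant improves on. The function and the sample weights are purely illustrative and are not Private LLM's actual quantization pipeline: RTN derives its scale directly from each weight group's raw min and max, so a single outlier stretches the grid and degrades the small weights, whereas OmniQuant learns its clipping and scaling parameters to better preserve the weight distribution.

```swift
// Illustrative only: a simplified Round-To-Nearest (RTN) quantizer, the baseline
// technique OmniQuant improves upon. Not Private LLM's actual pipeline.
func rtnQuantize(_ weights: [Float], bits: Int) -> [Float] {
    let levels = Float((1 << bits) - 1)                          // e.g. 15 representable steps at 4-bit
    guard let minW = weights.min(), let maxW = weights.max(), maxW > minW else { return weights }
    let scale = (maxW - minW) / levels                           // grid spacing set by raw min/max
    return weights.map { w in
        let q = ((w - minW) / scale).rounded()                   // snap to the nearest grid point
        return q * scale + minW                                  // dequantize back to a float
    }
}

let group: [Float] = [-0.82, -0.10, 0.03, 0.07, 0.95]            // made-up weight group with two outliers
print(rtnQuantize(group, bits: 4))                               // small weights end up to ~0.06 away from their true values
```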

Getting Started with Private LLM for iOS and macOS

Private LLM is the best way to run on-device LLM inference on Apple devices, from the latest models to older ones. The app comes with built-in models that work well even on older devices, ensuring that all users can enjoy the benefits of local GPT. Users can download Private LLM directly from the App Store.

Phi-3 Mini 4K Instruct Running on iPhone
Mistral 7B Instruct Running on Mac

Understanding System Prompts

A system prompt in an LLM is a set of instructions or guidelines that help steer the model's output in a desired direction. It allows users to tailor the AI's behavior for specific tasks or scenarios.

System Prompt Example on Private LLM for iPad

For example, if you wanted to use Private LLM as a creative writing assistant, you could provide a system prompt like: "You are an imaginative story writer. Expand on the user's writing prompts to create vivid, detailed scenes that draw the reader in. Focus on sensory details, character development, and plot progression."

Or if you needed help studying for a history exam, your prompt might be: "Act as an expert history tutor. When the user asks about a historical event or period, provide a concise summary of the key facts, dates, and figures. Then ask follow-up questions to test their understanding and help reinforce the material."

The intuitive system prompt screen lets you engage in dynamic conversations with your very own secure, on-device AI assistant

Common Use Cases for System Prompts Include:

  • Roleplaying, where the prompt sets the character and context for the AI to adopt
  • Task-oriented prompts that instruct the AI to perform actions like summarization, translation, code generation, or data analysis
  • Persona prompts to give the AI a consistent personality and communication style
  • Creative prompts for brainstorming ideas, writing poetry or jokes, or generating artwork descriptions

The possibilities are endless. By crafting targeted system prompts, users can get the most out of LLMs and adapt them to their unique needs. Experimenting with different prompts is a great way to explore the on-device capabilities of models like those in Private LLM.
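
Under the hood, the system prompt is simply placed ahead of your messages using the chat template each model was trained with. Private LLM applies the correct template for you automatically; the sketch below only illustrates the idea, using the publicly documented Llama 3 Instruct format (the function name is ours for illustration, not part of any API).

```swift
// Illustrative sketch of the Llama 3 Instruct chat template. Private LLM handles
// this formatting automatically — you never need to build prompts like this yourself.
func llama3Prompt(system: String, user: String) -> String {
    return "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n" +
           "\(system)<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n" +
           "\(user)<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
}

print(llama3Prompt(
    system: "You are an imaginative story writer. Focus on sensory details, character development, and plot progression.",
    user: "Write the opening scene of a mystery set in an old lighthouse."
))
```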

Understanding Sampling Settings

Private LLM provides users with control over the sampling settings used by the AI model during text generation. The two main settings are Temperature and Top-P.

Temperature

Temperature is a value that ranges from 0 to 1 and controls the randomness of the model's output. A lower temperature (closer to 0) makes the output more focused and deterministic, which is ideal for analytical tasks or answering multiple-choice questions. A higher temperature (closer to 1) introduces more randomness, making the output more diverse and creative. This is better suited for open-ended, generative tasks like story writing or brainstorming.

For example, if you're using Private LLM to generate a poem, you might set the temperature to around 0.8 to encourage the model to come up with novel and imaginative lines. On the other hand, if you're asking the AI to solve a math problem, a temperature of 0.2 would ensure it stays focused on finding the correct answer.
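
If you are curious about what the setting does mathematically, here is a small, self-contained sketch (not Private LLM's code): the model's raw scores (logits) are divided by the temperature before being turned into probabilities, so low temperatures sharpen the distribution around the most likely token and high temperatures flatten it.

```swift
import Foundation

// Toy illustration of temperature scaling: logits are divided by the temperature
// before the softmax, changing how concentrated the resulting probabilities are.
func softmax(_ logits: [Double], temperature: Double) -> [Double] {
    let scaled = logits.map { $0 / max(temperature, 1e-6) }   // guard against division by zero
    let maxLogit = scaled.max() ?? 0
    let exps = scaled.map { exp($0 - maxLogit) }              // subtract the max for numerical stability
    let total = exps.reduce(0, +)
    return exps.map { $0 / total }
}

let logits = [2.0, 1.0, 0.1]                    // made-up scores for three candidate tokens
print(softmax(logits, temperature: 0.2))        // ≈ [0.99, 0.01, 0.00] — nearly deterministic
print(softmax(logits, temperature: 0.8))        // ≈ [0.72, 0.21, 0.07] — noticeably more varied
```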

Private LLM's Sampling Settings on iPhone

Top-P

Top-P, also known as nucleus sampling, is another way to control the model's output. It works by selecting the next token (word or subword) from the most probable options until the sum of their probabilities reaches the specified Top-P value. A Top-P of 1 means the model will consider all possible tokens, while a lower value like 0.5 will only consider the most likely ones that together make up 50% of the probability mass.

Top-P can be used in combination with temperature to fine-tune the model's behavior. For instance, if you're generating a news article, you might set the temperature to 0.6 and Top-P to 0.9. This would ensure the output is coherent and relevant (by limiting the randomness with temperature) while still allowing for some variation in word choice (by considering a larger pool of probable tokens with Top-P).
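
As a rough sketch of the mechanics (again, not Private LLM's internal code), nucleus sampling sorts the candidate tokens by probability, keeps just enough of the top ones to reach the Top-P threshold, renormalizes, and samples only from that reduced set:

```swift
// Toy illustration of nucleus (Top-P) filtering: keep the most probable tokens whose
// cumulative probability reaches topP, then renormalize so they sum to 1.
func topPFilter(_ probs: [Double], topP: Double) -> [Double] {
    let order = probs.indices.sorted { probs[$0] > probs[$1] }   // most probable tokens first
    var kept = Array(repeating: 0.0, count: probs.count)
    var cumulative = 0.0
    for i in order {
        kept[i] = probs[i]
        cumulative += probs[i]
        if cumulative >= topP { break }                          // the nucleus is complete
    }
    let total = kept.reduce(0, +)
    return kept.map { $0 / total }
}

let probs = [0.45, 0.30, 0.15, 0.07, 0.03]      // made-up probabilities for five candidate tokens
print(topPFilter(probs, topP: 0.9))             // keeps the top three (0.45 + 0.30 + 0.15 = 0.90)
print(topPFilter(probs, topP: 1.0))             // keeps everything — equivalent to ordinary sampling
```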

Private LLM Settings

In general, it's recommended to adjust either temperature or Top-P, but not both at the same time. Experiment with different values to find the sweet spot for your specific use case. Keep in mind that the optimal settings may vary depending on the model and the type of task you're performing.

Choosing the Right AI Model for iPhone and iPad

When selecting a model in Private LLM for your iPhone or iPad, consider the capabilities of your device. Here’s what we recommend:

  • iPhone 15 Pro and newer: Llama 3.1 8B, Llama 3.2 3B, or any other finetunes from these model families.
  • Older iPhones with 6GB RAM (e.g., iPhone 12 Pro, 13 Pro, 14 Pro): Meta Llama 3.2 3B, Google Gemma 2 2B, Phi-3 Mini 3.8B, Mistral 7B models.
  • Other older iPhones with at least 4GB RAM: Meta Llama 3.2 1B model family.
  • iPad Pro with 16GB RAM: Google Gemma 2 9B or any other finetunes from this model family.

Meta Llama 3 8B Instruct Running on iPad

Choosing the Right AI Model for Mac

For Mac users, Private LLM offers robust model support for various hardware configurations:

  • Macs with at least 8GB RAM: Meta Llama 3.2 1B/3B, Llama 3.1 8B, or Google Gemma 2 2B.
  • Macs with at least 16GB RAM: Google Gemma 2 9B or its uncensored finetune, Tiger Gemma 9B.
  • Macs with at least 32GB RAM: Mixtral 8x7B-based models.
  • Macs with 64GB or more RAM: Meta Llama 3.1 70B.

Choose a model based on your Mac’s specifications and desired use cases for optimal performance.

Understanding Context Length

Context length refers to the maximum amount of text that an LLM can process in a single input. It's important to choose a model with the appropriate context length for your needs.
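
Context length is measured in tokens rather than characters or words. A common rule of thumb for English text is roughly four characters per token, which gives you a quick way to estimate whether a document will fit; the snippet below is just that heuristic, not a real tokenizer.

```swift
// Rough heuristic (≈ 4 characters per token for English); real tokenizers vary by model.
func roughTokenEstimate(for text: String) -> Int {
    text.count / 4
}

let document = String(repeating: "word ", count: 5_000)     // about 25,000 characters of sample text
let tokens = roughTokenEstimate(for: document)              // ≈ 6,250 tokens
print("Estimated tokens: \(tokens)")
print("Fits in an 8K context: \(tokens <= 8_192)")          // true — leaving ~2,000 tokens for the reply
```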

iPhone or iPad

For iOS devices, if you plan to feed large amounts of text into Private LLM, opt for models like Llama 3 8B or Google Gemma, which offer a full 8K context. This allows the AI to understand and respond to longer passages, making it ideal for tasks like document analysis or extended conversations on your device. For older iOS devices, H2O Danube 1.8B provides a 4K context, while TinyLlama 1.1B offers a 2K context for more lightweight applications.

Mac

On Mac, the Mistral Instruct v0.2, Nous Hermes 2 Mistral 7B DPO, and BioMistral 7B models now support a full 32K context. This extended context length enables the AI to process and respond to even longer passages, making it perfect for tasks involving large documents, complex analyses, or in-depth discussions. For users who require a balance between context length and performance, models like Llama 3 8B or Google Gemma, with their 8K context, remain excellent choices.

Downloading New AI Models

To download new models in Private LLM, navigate to the app's settings and select the desired model. Keep in mind that model sizes vary, typically ranging from a few hundred megabytes to several gigabytes.

Due to iOS limitations, you must keep the app open during the download process on your iPhone or iPad, as background downloads are not supported. Additionally, users can only download one model at a time on iOS devices. These limitations do not apply to Mac.

Be patient, as larger models may take some time to download fully. Remember, with Private LLM's one-time purchase, you can download new models for free forever, with no subscription required.

Downloadable LLM Models on Private LLM for iPad
Downloadable LLM Models on Private LLM for Mac

Integrating with iOS and macOS Features and Custom Workflows

Private LLM's integration with Apple Shortcuts is one of its most powerful features. Users can automate tasks and create custom workflows by combining Private LLM with this built-in app. This integration allows users to incorporate on-device AI into their daily routines and develop personalized, efficient solutions for various tasks.

Shortcuts created with Private LLM on iPhone
Shortcuts created with Private LLM on Mac

By crafting specific prompts and integrating them with Apple Shortcuts, users can guide the AI model to produce desired outputs. This technique, known as prompt engineering, enables the creation of powerful workflows tailored to individual needs.

For example, the "Ask Llama 3" shortcut demonstrates a simple way to interact with a private on-device AI. Users can dictate their query in English, which is processed by the "Dictate Text" action and sent to Private LLM via the "Start a new chat with Private LLM" action. The AI's response can be displayed instantly or copied to the clipboard for later use. This user-friendly shortcut showcases the seamless integration of voice commands and AI chatbot functionality, making it an excellent introduction to the potential of combining these technologies.

Ask Llama 3 Shortcut Example on iPhone
unChatGPTfy Shortcuts created using Private LLM on Mac

With Private LLM's support for various models, users can experiment with different prompts and settings to find the perfect combination for their needs, creating workflows such as generating meeting summaries from voice recordings, translating text between languages, creating personalized workout plans, and analyzing emails for better organization. The possibilities are vast, and users can unlock the full potential of on-device AI in their daily lives through Private LLM and Apple Shortcuts.
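
On the Mac, these shortcuts also compose with other automation, because macOS ships a `shortcuts` command-line tool that can run any saved shortcut by name. The sketch below simply invokes the "Ask Llama 3" shortcut described above from Swift; the shortcut name is whatever you saved it as, and this is a general macOS capability rather than a Private LLM API.

```swift
import Foundation

// Minimal sketch for macOS: launch a saved shortcut via the built-in `shortcuts` CLI.
// The shortcut itself (e.g. "Ask Llama 3" from the example above) handles the Private LLM call.
func runShortcut(named name: String) throws {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/usr/bin/shortcuts")
    process.arguments = ["run", name]                 // equivalent to: shortcuts run "Ask Llama 3"
    try process.run()
    process.waitUntilExit()
}

do {
    try runShortcut(named: "Ask Llama 3")
} catch {
    print("Could not launch the shortcut: \(error)")
}
```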

Additionally, Private LLM supports the x-callback-url specification, which is used by over 70 popular iOS and macOS applications, so it can seamlessly add on-device AI functionality to those apps. Furthermore, Private LLM integrates with macOS Services to offer grammar and spelling checking, rephrasing, and shortening of text in any app through the macOS context menu.

Private LLM integration within the macOS system-wide services menu
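
To give a feel for what an x-callback-url call looks like in general, here is a hedged sketch. The scheme, action, and parameter names below ("privatellm", "ask", "prompt") are placeholders for illustration only, not Private LLM's documented URL scheme; consult the app's documentation for the actual values. Only the overall structure (the "x-callback-url" host and the x-success parameter) comes from the public specification.

```swift
import Foundation

// Hypothetical example of the x-callback-url pattern. The scheme ("privatellm"),
// action ("ask"), and "prompt" parameter are placeholders, NOT Private LLM's real API.
var components = URLComponents()
components.scheme = "privatellm"                                        // placeholder URL scheme
components.host = "x-callback-url"                                      // required by the x-callback-url spec
components.path = "/ask"                                                // placeholder action name
components.queryItems = [
    URLQueryItem(name: "prompt", value: "Summarize my meeting notes"),  // placeholder parameter
    URLQueryItem(name: "x-success", value: "callingapp://result")       // where the reply is sent back
]
if let url = components.url {
    print(url)   // privatellm://x-callback-url/ask?prompt=...&x-success=...
    // Open with NSWorkspace.shared.open(url) on macOS or UIApplication.shared.open(url) on iOS.
}
```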

Simplify Prompt Management with Apple Shortcuts in Private LLM

For users familiar with ChatGPT's custom GPTs, we recommend creating Apple Shortcuts for your frequently used prompts in Private LLM. You can even copy and paste prompts from your custom GPTs on ChatGPT directly into Apple Shortcuts for ease of use. For inspiration, explore the community's shared shortcuts here.

Uncensored AI Models

We provide uncensored finetunes of popular models, allowing users to select models best suited to their device specifications. For users with older iPhones, we recommend Llama 3.2 1B/3B Instruct Abliterated for a smooth performance. For those using newer iOS devices, Tiger Gemma 9B v3 and Llama 3.1 8B Instruct Abliterated deliver excellent results. Explore our full list of uncensored or Abliterated models to find one that meets your needs.

Uncensored AI models can be highly versatile for a variety of unique and personalized use cases. Here are a few common scenarios where these models shine:

  1. Roleplay and Creative Writing: Uncensored models are popular for their ability to engage in imaginative storytelling, in-depth character roleplay, and detailed fictional narratives that restricted models often refuse to produce. This makes them ideal for users interested in writing assistance, script development, and other creative work.

  2. NSFW and Adult Conversations: Some users seek uncensored models for unrestricted conversations on sensitive or adult topics. These models are useful for those who prefer or require AI that doesn’t filter or limit discussions based on certain content categories.

  3. Psychology and Emotive Interactions: For those exploring therapeutic prompts, emotional support, or psychology-based interactions, uncensored models offer greater flexibility and depth without automatic content filters. This may be helpful in providing nuanced responses to personal, philosophical, or existential questions.

  4. Exploratory Research and Thought Experimentation: Uncensored models can be particularly helpful for brainstorming, generating unfiltered opinions or perspectives, and diving into controversial topics for research purposes. Scholars, writers, and content creators may find these models helpful in testing boundaries and gaining fresh perspectives on complex ideas.

Dolphin 2.9 Llama 3 8B Uncensored running on iPhone
Dolphin 2.6 Mixtral 8x7B Uncensored running on Mac

Note: Uncensored models offer a richer, less-filtered experience but may produce unsuitable or offensive content. Use them responsibly and ethically, and be aware of the potential risks associated with uncensored AI models.

Regional Language Model Recommendations

To better serve users across the globe, Private LLM also provides models for various languages. Llama 3.2 officially supports a range of languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, making it an excellent choice for multilingual applications.

In addition, Private LLM offers models tailored to specific non-English languages, further enhancing its utility and providing a more native conversational experience for speakers of German, Hebrew, Japanese, and Chinese.

Download Private LLM to Run LLMs Locally on iPhone, iPad, and Mac

Private LLM is the best way to run on-device LLM inference on Apple devices, providing a secure, offline, and customizable experience without an API key. With its support for over 30 models, seamless integration with iOS and macOS features, and the ability to create powerful custom workflows through prompt engineering, Private LLM empowers users to harness the full potential of on-device AI. Whether you're using the built-in model on an older device or exploring uncensored conversations with Dolphin models for NSFW content and AI roleplay, Private LLM offers an unparalleled on-device GPT experience.


Download Private LLM on the App Store
Stay connected with Private LLM! Follow us on X for the latest updates, tips, and news. Want to chat with fellow users, share ideas, or get help? Join our vibrant community on Discord to be part of the conversation.