Compact language models now make it possible to run capable LLMs directly on smartphones, with no internet connection required. Six open-source LLMs have been optimized for mobile use: Gemma 2B, Phi-2, Falcon-RW-1B, StableLM-3B, TinyLlama, and LLaMA-2-7B. They vary in size and performance, and some outperform larger models on certain benchmarks.

Gemma 2B (Google) and Phi-2 (Microsoft) stand out for strong performance despite their small size. Falcon-RW-1B and StableLM-3B balance performance against model size. TinyLlama and LLaMA-2-7B are optimized for computational efficiency and can be quantized to lower bit-widths, which shrinks their memory footprint for mobile deployment.
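
To make the effect of quantization concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone would occupy at different bit-widths. The parameter counts are nominal values read off the model names and should be treated as approximations. The helper names (`weight_bytes`, `weight_gib`, `fits_in_ram`) and the 1.2x overhead factor are hypothetical choices for illustration, not figures from any of the model releases.

```python
# Nominal parameter counts inferred from the model names (assumption:
# actual counts differ somewhat from these round figures).
NOMINAL_PARAMS = {
    "Falcon-RW-1B": 1.0e9,
    "TinyLlama": 1.1e9,
    "Gemma 2B": 2.0e9,
    "Phi-2": 2.7e9,
    "StableLM-3B": 3.0e9,
    "LLaMA-2-7B": 7.0e9,
}


def weight_bytes(num_params: float, bits_per_weight: int) -> float:
    """Bytes needed to store the weights alone at the given bit-width."""
    return num_params * bits_per_weight / 8


def weight_gib(num_params: float, bits_per_weight: int) -> float:
    """Same quantity expressed in GiB (2**30 bytes)."""
    return weight_bytes(num_params, bits_per_weight) / 2**30


def fits_in_ram(num_params: float, bits_per_weight: int,
                ram_gib: float, overhead: float = 1.2) -> bool:
    """Rough feasibility check.

    `overhead` is a hypothetical fudge factor standing in for the KV cache,
    activations, and runtime; real overhead depends on context length and
    the inference engine.
    """
    return weight_gib(num_params, bits_per_weight) * overhead <= ram_gib


if __name__ == "__main__":
    for name, n in NOMINAL_PARAMS.items():
        sizes = ", ".join(
            f"{bits}-bit: {weight_gib(n, bits):.2f} GiB" for bits in (16, 8, 4)
        )
        print(f"{name:13s} {sizes}")
```

Under these assumptions, halving the bit-width halves the weight storage, which is why a 7B-parameter model that is out of reach at 16 bits can become plausible on a phone at 4 bits.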

These models use techniques such as FlashAttention and rotary positional embeddings (RoPE) to improve efficiency while maintaining strong performance. They can be integrated into existing mobile apps with minimal changes. Some models require devices with sufficient RAM, but together they give developers a compelling option for building intelligent language features that run entirely on-device.
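
As an illustration of the RoPE idea mentioned above, the following is a minimal pure-Python sketch, not the implementation used by any of these models. It rotates each adjacent pair of dimensions by an angle proportional to the token position, so the dot product between a rotated query and a rotated key depends only on their relative offset. The function names `rope` and `dot`, the pairing of adjacent dimensions, and the base of 10000 are assumptions chosen for the example; production implementations may pair dimensions differently and operate on batched tensors.

```python
import math


def rope(vec, position, base=10000.0):
    """Apply a rotary position embedding to a single even-length vector.

    Dimensions (i, i+1) are treated as a 2-D point and rotated by
    position * base ** (-i / d), where d is the vector length.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even number of dimensions")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # Standard 2-D rotation of the pair (x, y) by angle theta.
        out.extend((x * c - y * s, x * s + y * c))
    return out


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0, 4.0]
    k = [0.5, -1.0, 2.0, 0.25]
    # Both pairs of positions are 2 apart, so the scores should agree
    # up to floating-point rounding.
    print(dot(rope(q, 5), rope(k, 3)))
    print(dot(rope(q, 12), rope(k, 10)))
```

Because position enters only through rotations, no learned position table is needed, and attention scores carry relative-position information by construction.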

