Model size is a crucial aspect of large language models (LLMs), influencing both their capabilities and the environments in which they can run. Stability AI, renowned for its pioneering Stable Diffusion text-to-image generative AI technology, recently unveiled one of its most compact models to date: Stable LM 2 1.6B. The model marks a significant stride in text generation, building on the company’s 3-billion- and 7-billion-parameter releases of April 2023. Introduced in 2024, Stable LM 2 1.6B is Stability AI’s second release of the year, following the earlier debut of Stable Code 3B.
This new iteration of Stable LM is not just compact but also robust, aiming to democratize access and bring a broader range of developers into the generative AI landscape. It is multilingual, supporting seven languages: English, Spanish, German, Italian, French, Portuguese, and Dutch. The model incorporates recent advances in language modeling algorithms to strike what Stability AI describes as a harmonious balance between speed and performance.
Carlos Riquelme, the Head of the Language Team at Stability AI, shared insights with VentureBeat, stating, “Larger models trained on similar datasets and methodologies generally outperform smaller ones. However, as advancements are made in model algorithms and training data quality and quantity, newer smaller models occasionally surpass older, larger ones.”
The argument for the smaller Stable LM model’s superiority is grounded in its performance. On most benchmarks, it outshines other small language models with under 2 billion parameters, such as TinyLlama 1.1B and Falcon 1B. Impressively, the new Stable LM even surpasses some larger models, including Microsoft’s Phi-2 (2.7B) and Stability AI’s own previous Stable LM 3B.
Riquelme elaborated, “The Stable LM 2 1.6B model has shown superior performance compared to some larger models developed a few months prior. This trend of improvement over time is akin to the evolution seen in computers, televisions, and microchips, which have become smaller, thinner, and more efficient.”
However, it is important to acknowledge the limitations inherent in the smaller size of Stable LM 2 1.6B. As Stability AI cautions, smaller, low-capacity language models may exhibit issues such as high rates of hallucination or the generation of potentially toxic language.
The shift towards smaller yet more potent LLM options has been a focus for Stability AI over recent months. In December 2023, the company released the StableLM Zephyr 3B model, offering enhanced performance in a more compact form than its initial April models.
Riquelme explained that the new Stable LM 2 models are trained on an expanded dataset, including multilingual documents in six languages in addition to English. An interesting aspect of this training process is the strategic presentation of different data types at various stages, potentially enhancing the model’s learning efficacy.
In a novel approach, Stability AI offers the new models in both pre-trained and fine-tuned formats, as well as a unique format described as the “last model checkpoint before the pre-training cooldown.” This strategy is aimed at providing developers with more versatile tools and foundations for innovation and customization.
Riquelme detailed the training process: as the model is sequentially updated, its performance improves. It starts with no knowledge and progressively learns from the data; as training nears completion, however, the model becomes less adaptable. To address this, Stability AI chose to release a checkpoint from just before the final training stage, providing a more malleable foundation for specialization on various tasks or datasets.
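In practice, a developer choosing among the three formats described above might map their use case to a model variant before loading it. The sketch below illustrates that decision; the Hugging Face repository identifiers are assumptions for illustration (the pre-cooldown checkpoint name in particular is hypothetical), and the commented-out `transformers` call shows where actual loading would happen.

```python
# Hypothetical mapping of Stable LM 2 1.6B release formats to repo ids.
# Repo names are assumptions for illustration; check Stability AI's
# Hugging Face organization for the actual identifiers.
VARIANTS = {
    "pretrained": "stabilityai/stablelm-2-1_6b",                  # base model
    "finetuned": "stabilityai/stablelm-2-zephyr-1_6b",            # instruction-tuned
    "pre-cooldown": "stabilityai/stablelm-2-1_6b-pre-cooldown",   # hypothetical name
}

def pick_variant(use_case: str) -> str:
    """Map a rough use case to a model repository id."""
    if use_case == "chat":
        # Fine-tuned variant: ready for dialogue out of the box.
        return VARIANTS["finetuned"]
    if use_case == "continue-pretraining":
        # Pre-cooldown checkpoint: the most adaptable starting point
        # for further training, per Riquelme's reasoning above.
        return VARIANTS["pre-cooldown"]
    # Default: the fully pre-trained base model for text completion.
    return VARIANTS["pretrained"]

# Loading would then look like (requires the `transformers` library and weights):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(pick_variant("chat"))

print(pick_variant("continue-pretraining"))
```

The point of the pre-cooldown option is that further training from it behaves more like continuing pre-training than like fine-tuning a finished model, which is why it gets its own branch here.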
“We’re not certain of the outcome, but we strongly believe in the community’s ability to creatively utilize these new tools and models,” Riquelme stated, expressing confidence in the innovative potential of the AI community.
Stability AI’s commitment to this endeavor reflects a broader trend in the AI industry, where the pursuit of smaller, more efficient models is paralleled by a focus on transparency and expanded data utilization. This approach signals a new era in AI development, where the emphasis is on creating more accessible, adaptable, and powerful tools, tailored to a wide array of applications and challenges in the field.