Alibaba Cloud, the cloud division of the Chinese tech giant Alibaba, has released a new math-focused large language model (LLM) named Qwen2-Math, positioning it as the world’s most powerful AI model in mathematical problem-solving. This launch marks a significant development in the field of AI, particularly in the STEM sectors, where math capabilities are critical.
Qwen2-Math is part of the broader Qwen family of LLMs, which Alibaba began releasing in August 2023 under the sub-brand “Tongyi Qianwen.” The Qwen lineup includes several open-source models, such as Qwen-7B, Qwen-72B, and Qwen-1.8B, along with multimodal variants like Qwen-Audio and Qwen-VL. These models have garnered significant attention, especially in China, with over 90,000 enterprises adopting Qwen models in their operations within the first year.
The newly unveiled Qwen2-Math series, designed specifically for English-language mathematical tasks, outperforms notable competitors in Alibaba’s own benchmarks, including OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Math-Gemini Specialized 1.5 Pro. The flagship model, Qwen2-Math-72B-Instruct, reaches 84% accuracy on the MATH benchmark, which consists of 12,500 challenging competition-level math problems. It also scores 96.7% on the GSM8K benchmark for grade school math and 47.8% on a college-level math benchmark.
Interestingly, Alibaba did not include Microsoft’s Orca-Math model in its benchmark comparisons. Orca-Math, a 7-billion parameter model, comes close to Qwen2-Math-7B-Instruct on GSM8K, scoring 86.81% against Qwen2-Math-7B-Instruct’s 89.9%. Even the smallest Qwen2-Math variant, with 1.5 billion parameters, delivers strong results, scoring 84.2% on GSM8K and 44.2% on college math benchmarks.
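To make the GSM8K comparison concrete, the quoted scores can be laid side by side and the gap between two models computed in percentage points. The short Python sketch below uses only the figures reported in this article; the dictionary name, the model labels, and the helper function are illustrative choices, not part of any official tooling.

```python
# GSM8K accuracy (percent) as quoted in this article.
# Labels and names here are illustrative, not official identifiers.
gsm8k_scores = {
    "Qwen2-Math-72B-Instruct": 96.7,
    "Qwen2-Math-7B-Instruct": 89.9,
    "Orca-Math (7B)": 86.81,
    "Qwen2-Math (1.5B)": 84.2,
}


def gap_in_points(model_a: str, model_b: str) -> float:
    """Return model_a's score minus model_b's score, in percentage points."""
    return round(gsm8k_scores[model_a] - gsm8k_scores[model_b], 2)


# Rank the models from highest to lowest GSM8K score.
ranked = sorted(gsm8k_scores.items(), key=lambda item: item[1], reverse=True)

for name, score in ranked:
    print(f"{name:<26} {score:6.2f}%")

gap = gap_in_points("Qwen2-Math-7B-Instruct", "Orca-Math (7B)")
print(f"Qwen2-Math-7B-Instruct leads Orca-Math by {gap} points")
```

On these numbers the two 7-billion parameter models are separated by roughly three percentage points, which is why the omission of Orca-Math from the official comparison stands out.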
Qwen2-Math is aimed at solving complex mathematical problems, an area where LLMs have historically struggled despite the mathematical foundations of coding. Alibaba’s researchers say they hope Qwen2-Math will become a valuable tool for the community in tackling intricate mathematical challenges.
While Qwen2-Math is not entirely open-source, its license is permissive for commercial use. Enterprises and individuals with fewer than 100 million monthly active users can use the model commercially without additional permissions, making it accessible to a wide range of users, from startups to large enterprises.
Featured Image courtesy of DALL-E by ChatGPT