Sarvam AI, a Bengaluru-based startup, has launched a 24-billion-parameter large language model (LLM) built to excel at Indian languages and at reasoning-heavy tasks such as mathematics and programming. The model, named Sarvam-M (the "M" stands for Mistral), is a significant advance among open-weights hybrid models. It builds on Mistral Small, a compact yet powerful open-source language model, extending its capabilities through specialized training and optimization.
Sarvam-M: A Hybrid Approach to Language Modeling
Sarvam-M stands out for its hybrid approach, which combines an open-source foundation with proprietary enhancements. Building on Mistral Small lets Sarvam AI draw on the research, community support, and transparency that surround an established open model, while layering on optimizations tailored to the Indian market. Mistral Small is a deliberate choice of base: it is recognized for its efficiency and performance, and its relatively modest size makes it practical to fine-tune without massive computational resources. The hybrid design also supports iterative development, since improvements to the open-source base can be folded into Sarvam-M to keep it current with advances in the field. The result is a model that balances reuse of existing resources against proprietary technology, and is both powerful and adaptable.
Supervised Fine-Tuning: Precision and Accuracy
To improve accuracy, Sarvam AI applied supervised fine-tuning: training the model on a carefully curated, labelled dataset spanning Indian languages, mathematics, and programming. The quality of this data is crucial, so the process likely involved identifying relevant datasets, removing noise and inconsistencies, and annotating examples with accurate labels. Diversity in the training set helps the model generalize to new and unseen inputs.

Supervised fine-tuning also requires choosing training parameters such as the learning rate, batch size, and number of epochs, and Sarvam AI likely experimented to find a good configuration. The approach permits targeted improvement: if the model struggles with a particular kind of question-answering task, the training data can be augmented with more examples of that kind. Because the data is labelled, progress is easy to monitor on a validation set, and the cycle of training, evaluation, and refinement can be repeated until accuracy targets are met.
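To make the mechanics concrete, here is a minimal sketch of how supervised fine-tuning data is commonly prepared for causal language models. The prompt tokens are masked out of the labels with the conventional sentinel value -100 (the default `ignore_index` in PyTorch's cross-entropy loss), so the model is trained only to predict the response. The token IDs are toy values, and this is an illustration of the general convention, not Sarvam AI's actual pipeline.

```python
# Toy sketch of supervised fine-tuning data preparation.
# The -100 sentinel tells loss functions such as PyTorch's cross_entropy
# (ignore_index=-100) to skip those positions, so only response tokens
# contribute to the training signal.

def build_sft_example(prompt_ids, response_ids):
    """Concatenate prompt and response; mask prompt positions in the labels."""
    input_ids = prompt_ids + response_ids
    labels = [-100] * len(prompt_ids) + response_ids
    return {"input_ids": input_ids, "labels": labels}

example = build_sft_example(prompt_ids=[101, 7, 42], response_ids=[9, 13, 102])
print(example["input_ids"])  # [101, 7, 42, 9, 13, 102]
print(example["labels"])     # [-100, -100, -100, 9, 13, 102]
```

In a real pipeline these records are then padded, batched, and fed to the trainer; the masking is what keeps the model from being penalized for "predicting" the prompt it was given.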
Reinforcement Learning with Verifiable Rewards: Decision-Making Prowess
Alongside supervised fine-tuning, Sarvam AI incorporated reinforcement learning with verifiable rewards to enhance the model's decision-making. Where supervised learning relies on labelled data, reinforcement learning (RL) uses a reward signal: the model is rewarded for reaching clear, measurable goals, such as correctly solving a mathematical problem, and penalized otherwise. Verifiable rewards keep that signal accurate and unambiguous; for a math problem, the reward can simply be whether the model's answer matches the known solution.

The training environment matters as well. It should present a diverse mix of mathematical problems, programming challenges, and reasoning tasks so the model can generalize its skills. RL is particularly well suited to tasks that require sequential decision-making, where solving a problem involves a series of steps and each step is a decision to optimize against the overall reward. Exposure to varied scenarios also improves robustness and adaptability, which matters for language models that routinely face diverse and unpredictable inputs.
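The "verifiable" part of this setup can be as simple as an exact-match check against a reference answer. The sketch below shows one plausible shape for such a reward function, with light normalization so that equivalent numbers score the same; the details are illustrative assumptions, not Sarvam AI's published reward code.

```python
def verifiable_math_reward(model_answer: str, reference: str) -> float:
    """Return 1.0 if the model's final answer matches the reference, else 0.0.

    Both strings are normalized (whitespace, thousands separators, numeric
    form) so that '1,000' and '1000.0' count as the same answer.
    """
    def normalize(s: str):
        s = s.strip().replace(",", "")
        try:
            return float(s)  # compare numerically when possible
        except ValueError:
            return s.lower()  # fall back to case-insensitive text match

    return 1.0 if normalize(model_answer) == normalize(reference) else 0.0

print(verifiable_math_reward("1,000", "1000"))  # 1.0
print(verifiable_math_reward("42", "41"))       # 0.0
```

Because the reward is computed mechanically from ground truth rather than from human judgment, it is objective and cheap to evaluate at scale, which is exactly what RL training loops need.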
Optimized for Real-Time Use: Efficiency and Responsiveness
Recognizing the importance of real-time performance, Sarvam AI optimized Sarvam-M to respond quickly and accurately during interactive use, minimizing latency and maximizing throughput. Real-time behavior is critical for chatbots, virtual assistants, and search engines: users expect fast, relevant answers, and delays breed frustration.

The optimization likely combined several standard techniques. Model compression removes redundant or less important parameters; quantization reduces the numerical precision of the remaining parameters, shrinking the model and speeding it up; and hardware acceleration runs computation on GPUs or other specialized hardware. Algorithmic work matters too, such as reducing the computational cost of the attention mechanism that identifies the most relevant parts of the input sequence.

Handling many concurrent requests is its own challenge for high-traffic deployments. Batching groups multiple requests so they can be processed together more efficiently, and caching keeps frequently requested results in memory rather than recomputing them. Together, these techniques make Sarvam-M suitable for deployment in high-demand environments.
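The two serving-side ideas mentioned above, caching and batching, can be sketched in a few lines. The model call here is a stand-in stub, and the batch size and cache size are arbitrary illustrative values, not anything from Sarvam AI's serving stack.

```python
from functools import lru_cache

CALLS = {"n": 0}  # counts how many times the "model" actually runs

@lru_cache(maxsize=1024)
def cached_generate(prompt: str) -> str:
    """Identical prompts are served from the cache, skipping the model."""
    CALLS["n"] += 1
    return f"answer to: {prompt}"  # stand-in for a real model call

cached_generate("hello")
cached_generate("hello")  # cache hit: no second model invocation
print(CALLS["n"])  # 1

def batched_generate(prompts, max_batch=4):
    """Process prompts in fixed-size batches, as a serving stack might."""
    out = []
    for i in range(0, len(prompts), max_batch):
        batch = prompts[i:i + max_batch]
        out.extend(f"answer to: {p}" for p in batch)  # one model call per batch
    return out

print(len(batched_generate(["a", "b", "c", "d", "e"])))  # 5
```

Production systems refine both ideas considerably (dynamic batching with deadlines, KV-cache reuse, cache invalidation), but the principle is the same: avoid redundant work and amortize fixed costs across requests.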
Benchmarking Performance: Setting New Standards
Sarvam AI's claim that Sarvam-M sets a new benchmark for models of its size on Indian languages, math, and programming is backed by benchmarking data. Benchmarking allows an objective comparison between models: a set of widely recognized tasks is chosen to match the model's intended applications, the model is evaluated on them, and its results are compared against other state-of-the-art models. Sarvam AI likely complemented these comparisons with ablation studies, in which individual components of the model are removed or modified to measure their contribution to overall performance. The findings feed back into an iterative loop of evaluation and adjustment to the model's architecture and training procedures.
Indian Language Benchmarks: A 20% Average Performance Gain
According to Sarvam AI's blog post, Sarvam-M improves on the base model by an average of 20% on Indian language benchmarks, a substantial gain that underscores the effectiveness of the supervised fine-tuning process. The benchmarks covered tasks such as text classification (categorizing text, as in sentiment analysis or topic detection), question answering over a given text, and machine translation between languages, all of which demand a grasp of the grammar, vocabulary, and cultural context of Indian languages. The training data was likely annotated with exactly this kind of linguistic and cultural information. Handling these nuances well is essential for adoption in the Indian market, where the model targets applications such as customer service, education, and content creation.
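An "average performance gain" of this kind is typically the mean of per-benchmark relative improvements. The sketch below works through that arithmetic with hypothetical scores chosen for illustration; they are not Sarvam AI's published numbers.

```python
def relative_gain(base: float, new: float) -> float:
    """Percentage improvement of `new` over `base`."""
    return 100.0 * (new - base) / base

# Hypothetical (baseline, fine-tuned) scores per benchmark.
scores = {
    "text_classification": (60.0, 72.0),   # +20.0%
    "question_answering":  (50.0, 61.0),   # +22.0%
    "machine_translation": (40.0, 47.0),   # +17.5%
}
gains = [relative_gain(base, new) for base, new in scores.values()]
average_gain = sum(gains) / len(gains)
print(round(average_gain, 1))  # 19.8
```

Note that a mean of relative gains weights each benchmark equally regardless of its difficulty or score scale, which is one reason published reports usually list the per-benchmark numbers alongside the headline average.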
Math Tasks: A 21.6% Average Performance Gain
Sarvam-M also posts strong gains on math tasks, with an average improvement of 21.6%, highlighting the effectiveness of reinforcement learning with verifiable rewards in sharpening the model's reasoning. The benchmarks drew problems from domains such as algebra, calculus, and statistics, which exercise different skills: logical deduction, pattern recognition, and mathematical manipulation. Crucially, the model was assessed not only on whether its answers were correct but on whether it could show its reasoning and justify its solutions. That guards against mere memorization, and it matters in practice: applications such as financial modeling, scientific research, and data analysis need models that understand the underlying concepts and whose reasoning users can inspect and trust.
Programming Tests: A 17.6% Average Performance Gain
Sarvam-M's performance on programming tests is equally noteworthy, with an average gain of 17.6%, reflecting its ability to understand and generate code across programming languages. The tests likely included code completion, where the model suggests code from surrounding context; code repair, where it identifies and fixes errors; and code generation from natural language descriptions, where it translates human-readable instructions into functional code. Evaluation focused on whether the generated code was syntactically correct and semantically meaningful, satisfying the stated requirements. This proficiency makes the model valuable for code generation, bug detection, and automated testing, raising productivity and efficiency in software development workflows.
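Code-generation benchmarks are typically scored by executing the model's output against unit tests. Here is a toy harness in that spirit; real evaluations (such as those built around HumanEval-style problems) sandbox the execution and enforce timeouts, which this deliberately minimal sketch omits.

```python
def passes_tests(candidate_code: str, test_code: str) -> bool:
    """Run candidate code and its unit tests in a shared namespace.

    Returns True only if the candidate defines what the tests need and
    every assertion passes. Real harnesses sandbox this and add timeouts.
    """
    env = {}
    try:
        exec(candidate_code, env)  # define the candidate's functions
        exec(test_code, env)       # run assertions against them
        return True
    except Exception:
        return False

generated = "def add(a, b):\n    return a + b\n"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"
print(passes_tests(generated, tests))                              # True
print(passes_tests("def add(a, b):\n    return a - b\n", tests))   # False
```

Scoring is then a matter of counting: the fraction of problems whose generated solution passes all tests gives the familiar pass@1 metric.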
Combined Tasks: Exceptional Performance
The model performs even better on tasks that combine Indian languages and math, which demand both linguistic and reasoning skills at once. It achieved an 86% improvement on a romanized Indian language version of the GSM-8K benchmark, a widely used dataset of grade-school math problems expressed in natural language. Solving such problems means understanding the problem statement, identifying the relevant quantities, and applying the right operations to reach the answer; doing so in a romanized Indian language demonstrates that Sarvam-M can process intricate instructions that blend math and language.
Comparison with Other Models: Sarvam-M Holds Its Own
Sarvam AI's blog post compares Sarvam-M with other prominent language models, putting its capabilities in context so users can judge its suitability for their needs. According to the post, Sarvam-M outperforms Llama-2 7B on most benchmarks and is comparable to larger dense models such as Llama-3 70B and Gemma 27B, which are pre-trained on significantly more tokens. Matching larger models with fewer parameters speaks to the efficiency of Sarvam-M's training methodology, and it has practical consequences: lower computational cost and faster inference make the model more accessible for many users.
English Knowledge-Based Benchmarks: Room for Improvement
Despite these results, Sarvam AI acknowledges that Sarvam-M still lags on English knowledge-based benchmarks such as MMLU, scoring about 1 percentage point below the baseline model. The dip suggests the training data skewed toward Indian languages and reasoning tasks at some cost to English knowledge. Sarvam AI says it is addressing the gap by adding more English-language data to the training set and tuning the model for English knowledge-based tasks, with the goal of reaching parity with other state-of-the-art models. Naming the weakness publicly signals a commitment to realistic assessment and iterative progress, and it lets the team target the specific shortfall rather than broadly retool the model.
Versatility and Applications: A Wide Range of Possibilities
Sarvam-M is built for versatility and designed to support a wide range of applications, including conversational agents, translation, and educational tools. Its command of Indian languages, combined with its reasoning capabilities, makes it a valuable asset for businesses and organizations operating in the Indian market: better customer interaction, fewer language obstacles, and new avenues for learning. That breadth gives Sarvam-M a significant role in India's technological progress.
Conversational Agents: Enhancing Customer Service
Sarvam-M can power conversational agents that interact with customers in their native languages, delivering personalized and efficient service. Such agents can answer frequently asked questions, provide product information, and resolve complaints, and they can be deployed across websites, mobile apps, and messaging platforms for a seamless communication experience. Letting customers communicate in their preferred language improves both satisfaction and loyalty.
Translation: Breaking Down Language Barriers
Sarvam-M's translation capabilities can break down language barriers, translating text and speech between English and various Indian languages. That lets businesses expand into new markets and helps individuals connect across cultures. These translation services can be integrated into document translation tools, website translation plugins, and real-time translation apps, giving users seamless and accurate translation wherever they need it.
Educational Tools: Personalized Learning Experiences
Sarvam-M can also underpin educational tools that personalize learning for students of all ages: generating customized learning materials, giving feedback on student work, and answering student questions. Tailoring the experience to each student's needs and learning style improves engagement and academic performance. These tools can be delivered through online learning platforms, mobile apps, and interactive textbooks, putting personalized resources within reach anytime, anywhere.
Access and Availability: Empowering Developers
Sarvam AI has made Sarvam-M readily accessible to developers and researchers, fostering innovation and collaboration within the AI community. The model is available for download on Hugging Face, a popular platform for sharing open-source AI models; it can be tested in Sarvam AI's playground, a web-based interface for experimenting with its capabilities; and APIs let developers integrate it into their own applications and services. By lowering the barrier to entry, Sarvam AI is inviting the community to build innovative, AI-powered solutions on top of the model.
Future Plans: Building a Sovereign AI Ecosystem in India
Sarvam-M is the first in a series of models Sarvam AI plans to release regularly as part of its effort to build a sovereign AI ecosystem in India. The company aims to develop and deploy AI technologies aligned with the needs and values of the Indian people, reduce India's reliance on foreign technology, and promote economic growth and social development. Its stated vision is an AI ecosystem that is both innovative and inclusive, so that all Indians have access to the benefits of AI.
In late April, the Indian government selected Sarvam to build the country's sovereign LLM as part of the IndiaAI Mission, a national effort to strengthen domestic capabilities in emerging technologies through research and development, innovation and entrepreneurship, and a skilled AI workforce. The selection underscores the government's confidence in Sarvam AI's ability to deliver on its vision of a sovereign AI ecosystem, and the partnership marks a significant step toward establishing India as a global leader in AI.