Malaysia's Open-Source AI Opportunity

The dawn of the DeepSeek R1 large language model (LLM) earlier this year signaled a transformative moment for generative artificial intelligence (Gen AI). This event marked a significant leap forward, not just technologically, but also from a commercial and strategic standpoint. DeepSeek demonstrated that sophisticated LLMs could be developed at significantly lower costs than previously believed, and crucially, that this innovation wasn’t confined to Silicon Valley.

The emergence of DeepSeek has profound implications for Malaysia’s AI ecosystem that extend beyond the ongoing tech rivalry between the US and China and the temporary cooling of the Malaysian stock market’s enthusiasm for data centers.

The Significance of Open Source

A key aspect of DeepSeek’s LLMs is their foundation in open-source technology. Models like DeepSeek R1 are available under open-source or open-weight licenses, which means they can be freely downloaded, modified, and used. This open-source nature has substantial implications for the evolution and commercialization of LLMs.

For years, Chinese tech giants like Baidu, Alibaba, and Tencent have been actively developing open-source AI models. This strategy, backed by Chinese universities and government initiatives, adopts an "open innovation" approach, aiming to accelerate research and development, and potentially surpass the United States in AI capabilities.

However, the commitment to open-source AI extends beyond China. Meta and Google have also released open-weight LLMs, motivated by competitive factors. The rationale behind this is rooted in the business strategy of “commoditizing the complement”. If a company relies heavily on Gen AI, it may be more beneficial to invest in open-source alternatives rather than relying solely on proprietary models like ChatGPT. Even if proprietary LLMs are still utilized, the availability of good open-source models weakens the pricing power of key vendors like OpenAI.

This strategy mirrors actions taken by Oracle, a producer of servers and networking equipment. Oracle supported the open-source Linux operating system to curb the pricing dominance of Microsoft’s Windows OS.

Regardless of the motivations, the availability of high-quality, open-weight LLMs significantly reduces costs for countries like Malaysia, opening new doors for innovation.

Benefits for Government and Businesses

For the Malaysian government, open-source LLMs offer the opportunity to operate its own AI models without transferring sensitive data to commercial third parties or foreign nations. This strengthens data autonomy and sovereignty: the government retains control over its data, which remains within national borders and under Malaysian jurisdiction. This is especially critical when dealing with sensitive government information, citizen data, and national security matters. Open-source LLMs also allow for greater transparency and auditability. The government can inspect the model’s code and weights, study how it behaves, and identify potential vulnerabilities or biases. This level of control is not typically available with proprietary models.

Moreover, engaging with open-source AI fosters the growth of local AI talent by giving local experts opportunities to train, collaborate, and contribute to the development of the technology.

For Malaysian companies, especially startups, open-weight LLMs create a level playing field. They can access the same fundamental LLMs as their counterparts in China and the US, fostering innovation and competition. Startups often face resource constraints, and the high cost of proprietary LLMs can be a significant barrier to entry. Open-source LLMs remove this barrier, enabling these companies to experiment, innovate, and develop AI-powered products and services without incurring exorbitant expenses.

Open-source licensing also allows companies to modify LLMs and tailor them to specific needs. This enables Malaysian businesses to build highly specialized AI applications that are better suited to their particular market or industry and that provide a competitive advantage.

Addressing Cultural and Political Biases

The rise of Chinese AI also spotlights a significant challenge: cultural and political bias. Chinese LLMs are often trained to reflect the Chinese Communist Party (CCP)’s historical narrative and political perspectives, adhering to the censorship norms within mainland China. This can manifest in various ways, including the generation of biased content that favors the CCP’s ideology or suppresses dissenting viewpoints.

Even without explicit censorship, AI models inherently carry the biases present in their training data. A model trained primarily on English texts will reflect Western cultural viewpoints and biases, which can lead to outputs that are inappropriate, insensitive, or even offensive to people from other cultures. This underscores the importance of diversity in the training corpus.

The good news is that LLMs can be retrained relatively easily. Similar to how Chinese LLMs receive guardrails to promote CCP loyalty, other open-source projects have demonstrated that DeepSeek R1 can be post-trained to mitigate perceived biases. This post-training process, also known as fine-tuning, involves feeding the LLM additional data that is specifically designed to address biases and promote fairness.

Localization and Cultural Sensitivity

This experience emphasizes the need for countries like Malaysia to develop their own capacity to localize, train, and post-train LLMs to align with local conditions. Models that don’t take into account Malaysia’s racial and religious sensitivities, social hierarchies, or local slang could underperform or generate harmful content.

Failing to account for Malaysia’s linguistic nuances can lead to misunderstandings and frustration among users. Similarly, neglecting the country’s diverse traditions and beliefs can result in content that is offensive to certain groups. Fine-tuning can help align a model’s output with local cultural references and contexts. This means modifying models to understand and respond appropriately to nuances in Malaysian English, Bahasa Malaysia, and other local languages, and incorporating culturally relevant examples into the training data.

Malaysia already possesses some LLM development capabilities. For instance, the local startup Mesolitica released the open-source MaLLaM LLM in January, demonstrating a more nuanced understanding of Bahasa Malaysia than mainstream LLMs like ChatGPT. This shows that local organizations can build solutions tuned to local languages and highlights the potential for Malaysian companies to contribute to culturally appropriate LLMs.

However, it remains unclear how aware Malaysian policymakers are of the potential of open-source AI and the importance of local LLM development. Limited awareness could hinder the adoption of open-source AI and restrict the opportunities for Malaysian companies and researchers to participate in this technological revolution.

National AI Strategy

The National AI Roadmap, drafted in 2021, makes little mention of open source. Similarly, recent documents from the new National AI Office (NAIO) do not emphasize open-source AI. Disregarding open-source initiatives places Malaysia at a considerable disadvantage.

While predicting the future of AI development remains challenging, the open-source nature of the current generation of LLMs provides Malaysia with an exceptional opportunity to catch up with technology leaders. Malaysia can leverage open-source AI to accelerate its AI development, reduce costs, and address its unique cultural and linguistic needs.

Seizing the Opportunity

To capitalize on this, Malaysia needs to update its policies to accommodate the emergence of smaller and more affordable LLMs. This includes simplifying the adoption of these models, making Gen AI more accessible to small and medium enterprises (SMEs), and enabling local deployment, particularly in rural areas with limited internet access. Adoption can be simplified by providing regulatory clarity: businesses need clear guidelines on how to use AI in compliance with data privacy and other relevant regulations. The government should also streamline its procurement process and relax overly stringent requirements, making it easier for SMEs to incorporate AI into their workflows. To address bandwidth constraints, Malaysia can develop LLMs that are better optimized to run over lower-bandwidth networks.

Expanding Malaysia’s capacity to develop LLMs, making them more relevant to local languages and mindful of local culture, is crucial. Investing in LLM training, potentially anchored at local universities, can be considered a public good that fosters domestic talent and propels local research and development. Such investment builds local expertise, so that LLM engineers and scientists can be sourced domestically.

Data Autonomy and National Security

For Malaysia, hosting its own LLMs is vital for ensuring national data autonomy. The data collected by LLMs can be valuable, and instead of being exploited by foreign entities, it should be stored and utilized by local organizations. Local hosting keeps that data within the country.

Here’s a more detailed breakdown of how Malaysia can specifically capitalize on the open-source AI movement:

  • Policy Updates: Existing policies should be reviewed and updated to reflect the current AI landscape, with a specific focus on the opportunities and benefits of open-source LLMs. This includes streamlining regulations for data usage (while maintaining appropriate privacy safeguards), providing funding and incentives for open-source AI research and development, and promoting the adoption of open-source AI solutions throughout the government sector. The goal should be to accelerate AI innovation while safeguarding data privacy.

  • Investment in Talent Development: Building a skilled workforce is crucial. Malaysia needs to invest in educational programs and training initiatives focused on AI, machine learning, and natural language processing. These programs should emphasize open-source tools and technologies, ensuring that graduates are well-equipped to contribute to the local AI ecosystem. Scholarships, research grants, and industry partnerships can further encourage students to pursue careers in AI, and Malaysia can also collaborate with established AI research communities.

  • University-Led Research: Local universities should be at the forefront of AI research and development. The government can provide funding to establish AI research centers at universities, focusing on areas such as LLM customization, cultural adaptation, and the development of new open-source AI tools tailored to the specific needs of Malaysia. Collaborative platforms between universities and industry can accelerate the transfer of knowledge and technology. Providing universities with cloud resources would give AI faculty and research students access to state-of-the-art compute for training AI models.

  • Support for Startups and SMEs: Open-source AI offers a significant opportunity for startups and SMEs to innovate and compete. Malaysia should provide targeted support to these businesses, including access to funding, mentorship, and technical expertise. This support could include grants for developing AI-powered products and services, incubators and accelerators focused on AI, and programs that connect startups with potential customers and investors. It can be delivered through existing agencies such as MDEC and Cradle.

  • Data Governance Framework: Establishing a robust data governance framework is essential for ensuring the responsible and ethical use of AI. This framework should address issues such as data privacy, security, and bias, and should be aligned with international best practices. It should also promote the sharing of data within the AI ecosystem, while protecting sensitive information. This could be achieved through the creation of a national data repository, the establishment of clear guidelines for data access and usage, and the establishment of an independent AI ethics body.

  • Public-Private Partnerships: Collaboration between the public and private sectors is critical for driving AI innovation. The government can partner with private companies to develop and deploy AI solutions in areas such as healthcare, education, and transportation. These partnerships can leverage the expertise and resources of both sectors, leading to more effective and impactful outcomes. The government should also clarify intellectual property guidelines around the use of open-source LLMs.

  • National AI Infrastructure: Investing in a national AI infrastructure, including high-performance computing resources and data storage facilities, is essential for supporting AI research and development. This infrastructure should be accessible to researchers, startups, and businesses across the country, providing them with the tools they need to innovate and compete. Malaysia must also ensure the provision of high-throughput internet access.

  • Cultural Adaptation of LLMs: Malaysia should invest in projects focused on adapting open-source LLMs to reflect the country’s unique cultural and linguistic landscape. This includes developing models that are fluent in Bahasa Malaysia and other local languages, and that are sensitive to the diverse cultures and traditions of Malaysia. This requires a multidisciplinary approach involving linguists, cultural experts, and AI engineers. Fine-tuning AI models to handle local slang is also crucial.

  • Cybersecurity Considerations: As AI becomes more integrated into critical infrastructure, cybersecurity must be a top priority. Malaysia needs to invest in research and development of AI-specific cybersecurity solutions, and to establish clear guidelines for securing AI systems. This includes developing robust mechanisms for detecting and mitigating AI-related threats, and providing education on how to mitigate specific types of AI-related cyberattacks, such as adversarial attacks and data poisoning.

  • Promoting Ethical AI: Ensuring that AI is developed and used in an ethical and responsible manner is essential. Malaysia should establish a national AI ethics framework that outlines the principles and values that should guide AI development and deployment. This framework should address issues such as fairness, transparency, and accountability, and should set out a clear process for reporting and addressing ethical concerns related to AI.

By taking these proactive steps, Malaysia can harness the power of open-source AI to drive economic growth, improve public services, and address some of the country’s most pressing challenges. The window of opportunity is open, and Malaysia must act decisively to seize it. The commitment to fostering a strong, inclusive, and ethical AI ecosystem will be critical for realizing the full potential of this transformative technology. The focus should always be on empowerment, innovation, and long-term sustainable development.