The High Stakes in the Global AI Hardware Race
The landscape of artificial intelligence development is increasingly defined not just by algorithmic breakthroughs but by access to the sophisticated hardware required to train and run massive models. At the heart of this hardware equation lies the graphics processing unit (GPU), a component initially designed for rendering images but now indispensable for the parallel processing demands of AI. For years, Nvidia Corporation has stood as the undisputed titan in this arena, its advanced GPUs becoming the gold standard, powering innovation across Silicon Valley and beyond. However, this dominance has placed the company, and its customers, directly in the crosshairs of geopolitical tensions.
Washington’s imposition of stringent export controls aimed at curbing China’s access to cutting-edge semiconductor technology has fundamentally reshaped the market. These restrictions specifically target high-performance GPUs, like those produced by Nvidia, deemed critical for advanced AI applications, including those with potential military uses. The immediate effect was a scramble within China’s burgeoning tech sector. Companies heavily invested in AI, from established giants to ambitious start-ups, faced the sudden prospect of being cut off from the essential tools driving the next wave of technological progress. This created an urgent imperative: find viable alternatives or risk falling behind in a globally competitive field. The challenge wasn’t merely about replacing one chip with another; it involved navigating a complex web of performance differentials, software compatibility issues, and the sheer scale required for training models with hundreds of billions, or even trillions, of parameters.
Ant Group Charts a Course Toward Compute Independence
Against this backdrop of supply chain uncertainty and escalating technological rivalry, Ant Group, the fintech behemoth affiliated with Alibaba Group Holding, has taken a significant stride towards greater computational self-sufficiency. Recent revelations, detailed in a research paper by the company’s Ling team – the division spearheading its large language model (LLM) initiatives – indicate a successful deviation from the Nvidia-centric path. The core of this achievement lies in the team’s ability to effectively train a sophisticated AI model using domestically produced GPUs.
The model in question, named Ling-Plus-Base, is no lightweight. It’s designed using a Mixture-of-Experts (MoE) architecture, a technique gaining traction for its efficiency in scaling up LLMs. Boasting a substantial 300 billion parameters, Ling-Plus-Base operates in a league comparable to other prominent global models. The crucial differentiator, however, is the hardware underpinning its training. According to the research findings, this powerful model can be nurtured to maturity on what the team describes as ‘lower-performance devices.’ This carefully chosen phrase points directly towards the utilization of processing units that fall outside the scope of US export restrictions, strongly implying the use of chips designed and manufactured within China.
This development is more than just a technical workaround; it represents a potential strategic pivot. By demonstrating the capacity to train state-of-the-art models without relying exclusively on the highest-tier, restricted foreign hardware, Ant Group is not only mitigating supply chain risks but also potentially unlocking significant cost efficiencies.
The Economic Equation: Slashing Training Costs
One of the most compelling figures emerging from the Ling team’s research is a reported 20 percent reduction in computing costs during the critical pre-training phase of the Ling-Plus-Base model. Pre-training is notoriously resource-intensive, involving feeding the model vast datasets to learn language patterns, context, and knowledge. It constitutes a major portion of the overall expense associated with developing foundational LLMs. Achieving a one-fifth cost reduction in this phase, therefore, translates into substantial savings, potentially freeing up capital for further research, development, or deployment at scale.
How is this cost saving achieved? While the paper doesn’t detail the exact cost breakdown, several factors likely contribute:
- Hardware Procurement: Domestically produced GPUs, even if less powerful individually than Nvidia’s top offerings, may come at a lower purchase price or offer more favorable volume discounts within the Chinese market, especially considering the constrained supply of high-end Nvidia chips.
- Energy Efficiency: Though not explicitly stated, domestic chips may draw less power per unit (albeit with lower per-chip performance), which could reduce operational energy costs, a significant factor in running large data centers.
- Algorithmic and Architectural Optimization: The use of the MoE architecture itself is key. MoE models activate only specific ‘expert’ sub-networks for a given input, rather than engaging the entire model as dense architectures do. This inherent sparsity can significantly reduce the computational load during both training and inference, making it feasible to achieve good results even with less raw processing power per chip. Ant’s success suggests sophisticated software and algorithmic tuning to maximize the efficiency of the available domestic hardware.
This cost reduction is not merely an accounting benefit; it lowers the barrier to entry for developing large-scale models and could accelerate the pace of AI innovation within the company and potentially across the broader Chinese tech ecosystem if the methods prove replicable.
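The leverage that MoE sparsity gives over raw hardware can be illustrated with a rough back-of-envelope calculation. Everything below except the roughly 300-billion total parameter count is an illustrative assumption (expert count, top-k routing, shared-parameter share, token budget), not a figure from Ant’s paper:

```python
# Back-of-envelope comparison of dense vs MoE training compute.
# All configuration numbers below are illustrative assumptions,
# not figures disclosed by Ant's Ling team.

TOTAL_PARAMS = 300e9      # ~300B parameters (from the article)
NUM_EXPERTS = 64          # assumed number of experts
TOP_K = 2                 # assumed experts activated per token
SHARED_FRACTION = 0.2     # assumed share of params outside the expert layers
TOKENS = 1e12             # assumed training token budget

# Parameters actually exercised per token in the MoE model:
expert_params = TOTAL_PARAMS * (1 - SHARED_FRACTION)
active_params = TOTAL_PARAMS * SHARED_FRACTION + expert_params * TOP_K / NUM_EXPERTS

# Common rule of thumb: training FLOPs ~= 6 * params * tokens.
dense_flops = 6 * TOTAL_PARAMS * TOKENS
moe_flops = 6 * active_params * TOKENS

print(f"active params per token: {active_params / 1e9:.1f}B")
print(f"MoE / dense training FLOPs: {moe_flops / dense_flops:.2%}")
```

Under these assumed settings, each token touches roughly a fifth of the model’s parameters, which is the mechanism that makes training feasible on less powerful chips; the real ratio depends entirely on the expert configuration Ant actually used.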
Performance Parity: Bridging the Hardware Gap?
Cost savings are attractive, but they mean little if the resulting AI model underperforms significantly. Ant’s Ling team addresses this directly, asserting that Ling-Plus-Base achieves performance comparable to other well-regarded models in the field. Specifically, they benchmarked their creation against models like Qwen2.5-72B-Instruct (developed by affiliate Alibaba) and DeepSeek-V2.5-1210-Chat, another prominent Chinese LLM.
The claim of ‘comparable performance’ despite using ‘lower-performance devices’ is noteworthy. It suggests that Ant has potentially found effective ways to compensate for any raw computational deficit through:
- Advanced Model Architecture: The MoE design is instrumental here, efficiently distributing the workload.
- Software Optimization: Tailoring the training software stack (like parallelization frameworks and numerical libraries) specifically for the architecture of the domestic GPUs being used is crucial. This often involves significant engineering effort.
- Data Curation and Training Techniques: Sophisticated methods for selecting training data and refining the training process itself can significantly impact final model quality, sometimes compensating for hardware limitations.
It’s important to approach performance claims with nuance. ‘Comparable’ can encompass a range of outcomes across various benchmarks (e.g., language understanding, reasoning, generation, coding). Without access to detailed benchmark results across multiple standardized tests, a precise comparison remains challenging. However, the assertion itself signals Ant’s confidence that its approach does not necessitate a crippling trade-off between cost/accessibility and capability. It demonstrates a pathway to maintaining competitiveness even within the constraints imposed by hardware restrictions.
The researchers themselves highlighted the broader implications: ‘These results demonstrate the feasibility of training state-of-the-art large-scale MoE models on less powerful hardware, enabling a more flexible and cost-effective approach to foundational model development with respect to computing resource selection.’ This points towards a democratization of sorts, allowing cutting-edge AI development to proceed even when access to the absolute pinnacle of processing power is limited.
Understanding the Mixture-of-Experts (MoE) Advantage
The Mixture-of-Experts architecture is central to Ant Group’s reported success. It represents a departure from traditional ‘dense’ neural network models where every input activates every parameter. In an MoE model:
- The model is composed of numerous smaller, specialized ‘expert’ networks.
- A ‘gating network’ or ‘router’ mechanism learns to direct incoming data (tokens, in the case of LLMs) to the most relevant expert(s) for processing.
- Only the selected expert(s) – often just one or two out of potentially hundreds – perform computations for that specific piece of data.
This approach offers several key advantages, particularly relevant in the context of hardware constraints:
- Scalability: MoE allows models to grow to enormous parameter counts (trillions are becoming feasible) without a proportional increase in the computational cost for processing each input token during inference or even during training steps. This is because only a fraction of the total parameters are active at any given time.
- Training Efficiency: While training MoE models has its own complexities (like load balancing across experts), the reduced computation per token can translate into faster training times or, as Ant demonstrates, the ability to train effectively on less powerful hardware within reasonable timeframes.
- Specialization: Each expert can potentially specialize in different types of data, tasks, or knowledge domains, potentially leading to higher quality outputs in specific areas.
Leading AI labs worldwide have embraced MoE, including Google (GShard, Switch Transformer), Mistral AI (Mixtral models), and within China, companies like DeepSeek and Alibaba (whose Qwen models incorporate MoE elements). Ant’s Ling-Plus-Base firmly places it within this vanguard, leveraging architectural innovation to navigate hardware realities.
The Domestic Hardware Ecosystem: Filling the Nvidia Void
While the Ant research paper refrained from explicitly naming the hardware used, subsequent reporting, notably by Bloomberg, indicated that the feat involved domestically designed chips. This includes processors potentially originating from Ant’s affiliate, Alibaba, which has its own chip design unit T-Head (producing CPUs like the Yitian 710 and previously exploring AI accelerators), and crucially, Huawei Technologies.
Huawei, despite facing intense US sanctions itself, has been aggressively developing its Ascend series of AI accelerators (like the Ascend 910B) as a direct alternative to Nvidia’s offerings within the Chinese market. These chips are reportedly being adopted by major Chinese tech firms. The ability of Ant Group to effectively utilize such hardware for a model as large as Ling-Plus-Base would represent a significant validation of these domestic alternatives.
It’s crucial to note that Ant Group hasn’t entirely abandoned Nvidia. The reports suggest that Nvidia chips remain part of Ant’s AI development toolkit, likely used for tasks where their specific performance characteristics or mature software ecosystem (like CUDA) offer advantages, or for legacy systems. The move isn’t necessarily about complete replacement overnight but about building viable, parallel pathways that reduce strategic vulnerability and control costs. This hybrid approach allows the company to leverage the best available tools while cultivating independence. Ant Group itself maintained a degree of corporate discretion, declining to comment officially on the specific chips used.
A Wider Trend: China’s Collective Push for AI Self-Reliance
Ant Group’s initiative is not occurring in isolation. It mirrors a broader strategic push across China’s technology sector to innovate around the limitations imposed by US export controls. The ‘tech war’ has catalyzed efforts to achieve greater self-sufficiency in critical technologies, particularly semiconductors and AI.
Other major players are pursuing similar goals:
- ByteDance: The parent company of TikTok is also reportedly working to secure and utilize alternative chips, including domestic options, for its AI ambitions, which span recommendation algorithms, generative AI, and more.
- DeepSeek: This AI start-up, known for its powerful open-source models, has made training efficiency an explicit focus and has developed models using the MoE architecture, aligning with strategies that are less dependent on vast fleets of only the most powerful GPUs.
- Baidu, Tencent, and others: All major Chinese cloud and tech companies are investing heavily in AI and are inevitably exploring hardware diversification strategies, including optimizing for domestic chips and potentially developing their own custom silicon.
The collective message is clear: while access to Nvidia’s top-tier products remains desirable, the Chinese tech industry is actively developing and validating alternative solutions. This involves a multi-pronged approach: embracing efficient model architectures like MoE, intense software optimization for different hardware backends, and supporting the development and adoption of domestically produced chips.
Beyond Language Models: Ant’s AI Expansion in Healthcare
Ant Group’s AI endeavors extend beyond foundational LLMs. Concurrent with the news about its training efficiencies, the company unveiled significant upgrades to its suite of AI solutions tailored for the healthcare sector. This initiative leverages a distinct, self-developed healthcare-centric AI model.
The upgraded solutions feature multimodal capabilities (processing various data types like text, images, and potentially other medical data) and sophisticated medical reasoning. These are integrated into what Ant describes as ‘all-in-one machines,’ presumably devices or platforms designed for clinical settings or health management.
While seemingly separate from the Ling-Plus-Base LLM news, there’s a potential underlying connection. The ability to train powerful AI models more cost-effectively, potentially using a mix of hardware including domestic options, could underpin the economic viability of developing and deploying specialized models for sectors like healthcare. Lowering the foundational costs of AI development allows resources to be channeled into domain-specific applications, potentially accelerating the rollout of practical AI tools in critical industries. This healthcare push underscores Ant’s ambition to apply its AI expertise broadly, moving beyond its fintech roots.
Implications for the Future: A Fork in the AI Road?
Ant Group’s successful training of a large-scale MoE model using non-Nvidia, likely domestic, GPUs carries significant implications:
- Validation for Domestic Chips: It serves as a crucial proof point for the viability of Chinese-designed AI accelerators like Huawei’s Ascend, potentially boosting their adoption within China.
- Competitive Landscape: It demonstrates that Chinese companies can remain competitive in cutting-edge AI development despite restrictions, leveraging architectural and software innovation.
- Cost Dynamics: The 20% cost reduction highlights a potential competitive advantage for companies able to effectively utilize alternative hardware, potentially influencing global AI pricing and accessibility.
- Nvidia’s Position: While Nvidia remains dominant globally, this trend underscores the challenges it faces in the significant Chinese market due to regulations and the rise of local competitors. It may accelerate Nvidia’s development of export-compliant chips tailored for China, but also validates the alternative path.
- Technological Bifurcation?: In the long term, continued divergence in hardware access and software optimization could lead to partially distinct AI ecosystems, with models and tools optimized for different underlying silicon.
The journey undertaken by Ant Group’s Ling team is emblematic of the resourcefulness being spurred by geopolitical constraints. By cleverly combining advanced model architectures like MoE with a willingness to optimize for and utilize available domestic hardware, they have charted a course that ensures continued progress in the critical field of artificial intelligence, potentially reshaping the cost structures and strategic dependencies that define the industry. It’s a testament to the idea that innovation often flourishes most vibrantly under pressure.