Shanghai Fund's AI Training Claims Challenge DeepSeek

Decoding Goku’s SASR Training Framework

Shanghai Goku Technologies, founded in 2015, has introduced a novel AI training framework dubbed SASR, or step-wise adaptive hybrid training. This approach aims to address the perceived limitations of prevalent methods like supervised fine-tuning (SFT) and reinforcement learning (RL). Goku argues that SASR, inspired by the way humans develop reasoning skills, offers a more adaptive and efficient pathway to build advanced AI models.

SFT and RL are considered cornerstones in the AI training process, employed by industry giants like OpenAI and DeepSeek. DeepSeek has explicitly emphasized the critical role of these techniques in optimizing the performance of its V3 model, which was released in December and sparked significant interest within the technology sector.

According to Goku’s research paper, co-authored with researchers from Shanghai Jiao Tong University and its newly formed AI subsidiary, Shanghai AllMind Artificial Intelligence Technology, SASR delivers better results than the established alternatives. “Experimental results demonstrate that SASR outperforms SFT, RL and static hybrid training methods,” the Goku team wrote.

The Implications of Goku’s Advancement

Goku’s claimed AI training breakthrough underlines China’s continued progress in the field, and it highlights the limits of current US government policies intended to hinder China’s AI advancement through hardware restrictions. Jensen Huang, CEO of Nvidia, has recently questioned the effectiveness of these restrictions, noting that “China has 50 percent of the world’s AI developers.”

DeepSeek, a Chinese AI startup that emerged from the High-Flyer hedge fund, has gained widespread recognition for showcasing China’s potential for AI leadership through advanced algorithms and integration of hardware and software.

AllMind’s Role in Goku’s AI Strategy

The establishment of AllMind, coinciding with the publication of Goku’s research, indicates a strategic move to dedicate resources to AI research and development. Chinese business registry records indicate that AllMind was officially registered on the same day Goku released its research.

Wang Xiao, the founder of Goku and the legal representative of AllMind, has stated that the new entity was created to explore new AI boundaries. This mirrors the approach taken by High-Flyer, which established DeepSeek as a separate entity in 2023.

As of the end of last year, Goku managed over 15 billion yuan (approximately US$2.1 billion) in both domestic and international assets, utilizing AI-driven strategies, according to information available on its official website.

Delving Deeper into SASR: A Step-Wise Adaptive Hybrid Training Framework

Goku’s SASR framework presents an interesting alternative in the landscape of AI model training. To truly appreciate its potential impact, a more detailed understanding of its components and workings is essential.

The “step-wise” aspect of SASR implies a multi-stage training process in which the model undergoes iterative refinement. Each stage likely has specific objectives and uses distinct training data to nurture particular capabilities within the model. This phased approach can mitigate the difficulty of training complex models from scratch, allow tailored optimization at each stage, and give developers finer control over which capabilities are developed, and when.
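Goku has not published the details of its schedule, but the general idea of staged refinement can be sketched in a few lines of Python. This is a toy illustration with a single-parameter "model" and made-up stages, not Goku's code:

```python
# Toy sketch of step-wise training (illustration only, not Goku's code).
# The "model" is a single parameter; each stage supplies its own data and
# learning rate, and refines whatever the previous stage produced.

def gradient_step(w, grad, lr):
    """One step of gradient descent on the parameter w."""
    return w - lr * grad

def train_stepwise(w, stages):
    """Run the stages in order, each building on the last."""
    for stage in stages:
        for target in stage["data"]:
            grad = 2.0 * (w - target)  # gradient of (w - target)^2
            w = gradient_step(w, grad, lr=stage["lr"])
    return w

# Stage 1 moves the parameter coarsely; stage 2 fine-tunes with a smaller step.
stages = [
    {"data": [1.0] * 50, "lr": 0.1},
    {"data": [1.2] * 50, "lr": 0.01},
]
w = train_stepwise(0.0, stages)
```

The point of the sketch is structural: each stage inherits the previous stage's result rather than starting over, which is the property that makes phased training cheaper than training from scratch.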

The “adaptive” element suggests that the training process isn’t static but responds dynamically to the model’s performance and characteristics. This adaptability could involve adjusting hyperparameters, modifying the training data distribution, or dynamically weighting the contribution of different training objectives. An adaptive process allows the AI to learn and improve more effectively, and may be less prone to catastrophic forgetting, in which a model unintentionally degrades an existing capability while being trained to acquire a new one.
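The paper's actual adaptation mechanism is not described in this article, but one plausible rule — entirely hypothetical, offered only to make the idea concrete — is to shift weight from the supervised objective toward the RL objective as supervised progress plateaus:

```python
# Hypothetical adaptive weighting rule (an assumption for illustration, not
# SASR's published update rule): keep training supervised while the SFT loss
# is still falling fast, and hand more weight to the RL objective as it stalls.

def sft_weight(sft_losses, window=3, floor=0.1):
    """Return the SFT weight in [floor, 1.0]; the RL weight is 1 - sft_weight."""
    if len(sft_losses) < window + 1:
        return 1.0  # too little history: start fully supervised
    recent_avg = sum(sft_losses[-window:]) / window
    improvement = sft_losses[-window - 1] - recent_avg
    # Large recent improvement -> keep SFT dominant; a plateau -> fall to floor.
    return max(floor, min(1.0, improvement * 10.0))

def combined_loss(w, sft_loss, rl_loss):
    """Blend the two objectives with the current weight."""
    return w * sft_loss + (1.0 - w) * rl_loss
```

With a steeply falling loss history such as `[1.0, 0.5, 0.4, 0.3]` the rule keeps the supervised weight at 1.0; with a plateaued history such as `[1.0, 0.99, 0.985, 0.984]` it hands most of the weight to the RL side.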

The “hybrid” nature of SASR indicates that it combines elements of different training methodologies. This matters because SFT and RL each have distinct strengths and weaknesses; a blend lets the model leverage the benefits of each approach while compensating for its limitations. By integrating these three characteristics, SASR is in theory better tuned to developing logic and reasoning, making it more likely to yield a flexible, capable model that does not become narrowly focused in its skill set.

Comparing SASR with Traditional Methods

Supervised fine-tuning (SFT) traditionally relies on a large, labeled dataset from which the AI model learns to map inputs to desired outputs; its effectiveness depends heavily on the scale and quality of that data. It rewards consistent behavior and generally produces a reliable model. Reinforcement learning (RL) trains the model through trial and error, rewarding or penalizing actions to maximize a specific objective, which more readily simulates open-ended, real-world conditions.
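The contrast between the two update styles can be made concrete with a toy, textbook-style sketch (not any lab's actual implementation): the supervised update follows a gradient toward a known label, while the reinforcement-style update samples a perturbation and keeps it only when it earns a higher reward — no labels required.

```python
import random

def sft_update(w, x, label, lr=0.1):
    # Supervised: descend the gradient of the squared error (w*x - label)^2.
    return w - lr * 2.0 * (w * x - label) * x

def rl_update(w, reward_fn, lr=0.1, noise=0.5):
    # Reinforcement-style: sample a perturbed parameter and move toward it
    # only if it scores a higher reward (crude hill climbing, no labels).
    candidate = w + random.uniform(-noise, noise)
    if reward_fn(candidate) > reward_fn(w):
        w = w + lr * (candidate - w)
    return w

random.seed(0)
reward = lambda w: -(w - 2.0) ** 2  # reward peaks at w = 2.0

w_sft = 0.0
for _ in range(100):
    w_sft = sft_update(w_sft, x=1.0, label=2.0)  # converges toward the label

w_rl = 0.0
for _ in range(300):
    w_rl = rl_update(w_rl, reward)  # drifts toward higher-reward parameters
```

The supervised path converges quickly but needs the label `2.0` to exist; the reinforcement path needs only a reward signal, at the cost of slower, noisier progress.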

SASR attempts to integrate the two while overcoming each method’s limitations. SFT is heavily dependent on the quality and comprehensiveness of its labeled data, and in many real-world scenarios, obtaining sufficient, accurate data is both time-consuming and expensive. Because an SFT model is trained directly on a carefully curated dataset, it can also become overly specialized, struggling with situations the dataset does not cover. RL, while not requiring labeled data, can be unstable and prone to reward hacking, in which the model discovers unintended ways to maximize its reward, potentially leading to undesired behavior. RL can also be an incredibly time-consuming method of training, as success is not guaranteed: developers must design ways of guiding the model to better results, or even start again from scratch. It tends to be energy-intensive as well.
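Reward hacking is easy to demonstrate with a toy metric. The example below is a generic illustration of the failure mode, unrelated to Goku's experiments: if a summarizer is rewarded purely on keyword overlap, the highest-scoring "summary" is a keyword dump rather than coherent text.

```python
# Toy reward-hacking demo: a naive reward that counts target keywords can be
# gamed by output that contains the keywords but no coherent content.

KEYWORDS = {"ai", "training", "model"}

def keyword_reward(summary):
    """Naive reward: how many target keywords appear in the summary."""
    words = set(summary.lower().split())
    return len(KEYWORDS & words)

honest = "The model improves with more training data"
hacked = "ai ai training training model model"  # exploits the metric

r_honest = keyword_reward(honest)
r_hacked = keyword_reward(hacked)
```

The incoherent string outscores the genuine sentence, which is exactly the behavior an RL loop would amplify if this were its only reward signal.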

Goku’s framework could improve on the limitations of SFT and RL. According to the paper, the SASR algorithm helps the model retain a range of abilities, remain flexible in the face of new challenges, and make good use of both labeled and unlabeled datasets. However, continued testing is required to verify the initial results the company documented, and more detailed information about the specific parameters, configurations, and datasets used in Goku’s SASR experiments would benefit the community.

Algorithmic Innovation and Hardware Constraints

The news of Goku’s SASR framework is particularly relevant in the context of US-China tech relations. For some time, the US government has attempted to curtail China’s rise in the AI domain by restricting access to advanced computing hardware, particularly high-end GPUs from companies like Nvidia. The idea behind these restrictions is that limiting China’s access to powerful hardware will slow down their AI development efforts. The US’s motivation is centered on maintaining its competitive advantage in the field, ensuring national security, and restricting possibly detrimental applications of AI research.

However, comments by Nvidia CEO Jensen Huang and advancements emerging from Chinese AI labs suggest these policies may not be as effective as intended. Huang has noted that China possesses a significant share of the world’s AI developer talent, and that restricting hardware access may simply incentivize those developers to find alternative solutions. By cutting off external access, the US may in fact be encouraging internal innovation and self-sufficiency. Huang has also argued that the restrictions may hurt American companies more than Chinese ones, which can go elsewhere for chip manufacturing services.

Goku’s claimed AI breakthrough suggests that algorithmic innovation can potentially offset hardware limitations, at least to some extent. If Chinese researchers can develop more efficient training algorithms, they may be able to achieve comparable AI performance with less powerful hardware. This could have significant implications for the global AI landscape, as it suggests China may be able to continue advancing its AI capabilities despite ongoing restrictions. China may also seek to develop and refine its own local supply chains for AI hardware, further diminishing the effects of restrictions.

This is not to suggest that hardware is irrelevant. Advanced GPUs are still critical for training cutting-edge AI models, and access to the latest hardware undoubtedly offers a significant competitive advantage. However, Goku’s work demonstrates the importance of investing in both hardware and software, and that progress in one area can potentially compensate for limitations in the other. A holistic approach that considers all components is therefore essential for all actors in this new AI-dominated landscape. With the right support for talented workers, the appropriate regulatory environment, and well-calibrated strategic investments, developers in any region can advance AI to new heights.

The Rise of Chinese AI: Beyond DeepSeek

DeepSeek’s emergence as a prominent player in the AI arena has been a catalyst, demonstrating China’s determination to become a global leader in this transformative technology. However, DeepSeek is simply one example, and the rise of Goku, with its SASR training framework, further illustrates the growing strength and innovation within the Chinese AI ecosystem. This increasingly diverse landscape reflects a broadening focus on AI across multiple sectors and for a growing range of applications.

Several factors contribute to this momentum. First, China has a vast pool of data, which is essential for training AI models. With a large population and widespread adoption of digital technologies, Chinese companies have access to massive datasets that can be used to develop and refine their AI algorithms. These datasets allow AI models to be trained more thoroughly and efficiently, possibly giving them a competitive advantage over algorithms trained on less information.

Second, China has a strong emphasis on STEM education, producing a large number of talented engineers and scientists. This has created a highly skilled workforce capable of driving innovation in AI and related fields. The commitment to STEM produces a workforce ready to tackle current challenges and innovate new solutions to meet future challenges. China has made massive investments in educational institutions, which has helped increase the quality of its graduates.

Third, the Chinese government has made AI a strategic priority, providing significant funding and support for research and development. This has created a fertile environment for AI startups and fostered collaboration between academia and industry. These investments help drive innovation without being subject to short-term market volatility, allowing more theoretical and long-term projects to succeed.

Finally, Chinese companies are often willing to take a more pragmatic and risk-taking approach to innovation, which allows them to move quickly and experiment with new ideas. This allows developers the flexibility to adapt their work as they go, potentially accelerating breakthroughs by using a more agile workflow.

As a result of these factors, China is rapidly catching up to the US in terms of AI capabilities. While the US still holds a lead in certain areas, such as fundamental research and high-end hardware, China is making significant strides in areas such as computer vision, natural language processing, and robotics. This suggests progress in hardware is not the only route to advances in AI and related fields.

The emergence of companies like Goku and DeepSeek suggests that China is well-positioned to continue its rise in the AI domain in the years to come. Also, the emergence of new companies helps create competition and innovation. The effects of this progress can reverberate across many industries.

Shanghai Goku Technologies: The Company Behind the Innovation

Shanghai Goku Technologies is a quantitative trading fund founded in 2015. It manages significant assets using AI-driven strategies. The company’s stated mission is to “combine technology and fundamental analysis” to deliver better returns for its clients. Beyond its core business in asset management, Goku has demonstrated a commitment to pushing the boundaries of AI research. AllMind Artificial Intelligence Technology, the AI subsidiary, represents a strategic move to formalize and accelerate its AI research efforts. The integration of technological research and data-driven financial analysis is a key component of the company’s plan.

Details about the company’s internal structure and operational dynamics remain relatively scarce. However, its public statements and recent activities offer insights into its approach. The company’s slogan, which translates to “logic and truth are the only principles we obey”, reflects a data-driven and analytical culture. The investment in AI research and development indicates a long-term vision and an awareness of the transformative potential of AI, not only within the financial sector but also across various industries. Goku likely intends to leverage insights from AI research to improve its trading strategies and gain a competitive edge in the market, and the company has stated that its long-term vision extends beyond finance, suggesting the AI subsidiary may eventually focus on sectors other than asset management.