Tag: Moonshot

Moonshot AI Muon and Moonlight LLM

Moonshot AI introduces Muon, a new optimizer, and Moonlight, a model trained with it. Muon improves the efficiency and stability of large language model training by orthogonalizing each weight matrix's momentum update, reaching comparable or better performance at lower computational cost. Moonlight outperforms comparable models across benchmarks, demonstrating Muon's effectiveness, and both have been open-sourced to promote further research into efficient training methods.
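
Below is a minimal sketch of the core Muon update for a single 2D weight matrix, assuming the quintic Newton-Schulz coefficients used in the open-source Muon reference implementation; the RMS-matching scale factor and decoupled weight decay follow Moonshot's description, but the exact constants and hyperparameters here are illustrative, not Moonshot's production settings.

```python
import torch

def newton_schulz5(G: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    # Approximately orthogonalize G with a quintic Newton-Schulz iteration.
    # Coefficients (a, b, c) follow the open-source Muon reference code.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)          # normalize so the iteration converges
    transpose = G.shape[0] > G.shape[1]
    if transpose:
        X = X.T                       # iterate on the short-and-wide orientation
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transpose else X

@torch.no_grad()
def muon_step(weight, grad, momentum_buf, lr=0.02, beta=0.95, weight_decay=0.01):
    # One Muon update for a single 2D weight matrix (illustrative sketch).
    momentum_buf.mul_(beta).add_(grad)       # heavy-ball momentum
    update = newton_schulz5(momentum_buf)    # orthogonalized update direction
    # RMS-matching scale (assumed constant) so update magnitudes resemble
    # AdamW's, plus decoupled weight decay, per Moonshot's technical report.
    scale = 0.2 * max(weight.shape) ** 0.5
    weight.mul_(1 - lr * weight_decay)
    weight.add_(update, alpha=-lr * scale)
```

Because the update direction is approximately orthogonal, every singular direction of the weight matrix moves at a similar rate, which is the intuition behind Muon's stability at scale.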

Kimi Moonlight 3B/16B MoE Model

Moonshot AI unveils Moonlight, a Mixture-of-Experts (MoE) model with 3B activated parameters out of 16B total, trained with the Muon optimizer on 5.7 trillion tokens. It achieves superior performance on the Pareto frontier of performance versus training compute, with the optimizer delivering roughly twice the computational efficiency of AdamW, making large language model training more accessible and sustainable.
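
To make "activated vs. total parameters" concrete, here is a generic top-k MoE layer sketch; it is not Moonlight's actual architecture, and all names and dimensions (TopKMoE, d_model, n_experts) are invented for illustration. Every expert counts toward the total parameters, but each token runs through only k experts, so just a fraction of the parameters is activated per token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    # Generic top-k Mixture-of-Experts feed-forward layer.
    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                          # x: (tokens, d_model)
        scores = self.router(x)                    # (tokens, n_experts)
        top_scores, top_idx = scores.topk(self.k, dim=-1)
        gates = F.softmax(top_scores, dim=-1)      # weights over chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):  # only routed tokens reach each expert
            for slot in range(self.k):
                mask = top_idx[:, slot] == e
                if mask.any():
                    out[mask] += gates[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(10, 64)
print(TopKMoE()(x).shape)  # torch.Size([10, 64])
```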

Moonshot AI's Kimi k1.5 Model Rivals OpenAI's o1

Moonshot AI's Kimi k1.5 model achieves performance comparable to OpenAI's full o1, marking a significant advancement in reasoning models. It excels in mathematics, coding, and multimodal reasoning, and its short-CoT variant outperforms GPT-4o and Claude 3.5 Sonnet. The release highlights innovation from China's AI labs and a more open, collaborative approach to AI research.
