Meta’s Llama 4: A Questionable Release Strategy
The Large Language Model (LLM) landscape has recently been reshuffled, with Google asserting itself as a major contender while Meta and OpenAI grapple with setbacks. OpenAI initially led the field, propelled by its pioneering GPT models, which set new standards for LLM capability. Meta also carved out a prominent position with its open-weight models, which paired impressive performance with freely downloadable weights that anyone could use, modify, and deploy.
That early dominance left other tech giants, including Google, playing catch-up. Despite publishing the groundbreaking 2017 research paper on the transformer architecture, the very foundation on which LLMs are built, Google saw its early efforts eclipsed by the widely criticized 2023 debut of Bard.
More recently, fortunes have shifted: Google has introduced a suite of potent new LLMs just as Meta and OpenAI have stumbled, reshaping the dynamics of the LLM ecosystem.
Meta’s surprise launch of Llama 4 on Saturday, April 5th, caught the industry off guard. Releasing a major model on a weekend is unusual, and the timing led to a subdued reception, with the announcement buried beneath the following week’s deluge of news.
Llama 4 has genuine strengths, including multimodal capabilities (the ability to process images as well as text) and availability in three versions of varying size and capability: Llama 4 Behemoth, Maverick, and Scout. Llama 4 Scout, in particular, touts a context window of up to 10 million tokens, enabling the model to process and generate enormous amounts of text within a single session. Even so, the rollout drew a wave of criticism.
The model’s initial acclaim soon faded as discrepancies emerged in Meta’s results on LMArena, a prominent platform that ranks LLMs by user votes. The Llama 4 model used for the rankings turned out to differ significantly from the version released to the public; LMArena stated that Meta had provided “a customized model to optimize for human preference.”
Meta’s claims about Llama 4 Scout’s 10-million-token context window also drew skepticism. Although the figure is technically valid, benchmark tests showed Llama 4 trailing its competitors on long-context performance.
Compounding the concerns, Meta released neither a Llama 4 “reasoning” or “thinking” model nor any smaller variants, although the company has said a reasoning model is forthcoming.
Ben Lorica, founder of the AI consulting firm Gradient Flow, observed that Meta deviated from the standard practice of a more systematic release in which all components are ready. That suggests Meta was eager to ship a new model even without essential elements such as a reasoning model and smaller versions, hinting at a rushed or incomplete launch.
OpenAI’s GPT-4.5: A Premature Withdrawal
OpenAI, another titan in the LLM space, has also encountered a series of challenges in recent months.
GPT-4.5, initially unveiled as a research preview on February 27th, was heralded as the company’s “largest and best model for chat yet.” OpenAI’s own benchmarks indicated that GPT-4.5 generally outperformed its predecessor, GPT-4o, across a variety of tasks.
However, the model’s pricing quickly became a point of contention. OpenAI set API access at a hefty US$150 per million output tokens, a 15-fold increase over GPT-4o’s $10 per million. (The API is the channel through which developers integrate OpenAI’s models into their own applications and services.)
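To put that pricing gap in concrete terms, here is a quick back-of-the-envelope calculation using the published per-million-output-token rates; the 50-million-token workload is an arbitrary illustrative figure, not one from OpenAI:

```python
# API cost comparison at the published per-million-output-token prices.
# The 50M-token workload below is hypothetical, chosen only for illustration.
PRICES_PER_MILLION = {"gpt-4.5": 150.00, "gpt-4o": 10.00}

def output_cost(model: str, output_tokens: int) -> float:
    """Dollar cost of generating `output_tokens` output tokens with `model`."""
    return PRICES_PER_MILLION[model] * output_tokens / 1_000_000

workload = 50_000_000  # 50 million output tokens (illustrative)
cost_45 = output_cost("gpt-4.5", workload)  # $7,500.00
cost_4o = output_cost("gpt-4o", workload)   # $500.00
print(f"GPT-4.5: ${cost_45:,.2f}  GPT-4o: ${cost_4o:,.2f}  ratio: {cost_45 / cost_4o:.0f}x")
```

At any volume, the ratio stays fixed at 15x, which is why the price drew such scrutiny for bulk, developer-facing workloads.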
Alan D. Thompson, an AI consultant and analyst at Life Architect, estimated that GPT-4.5, at roughly 5.4 trillion parameters, was likely the largest traditional LLM released in the first quarter of 2025. He argued that such scale is hard to justify on current hardware and makes serving a large user base a significant challenge.
In a surprising turn on April 14th, OpenAI announced it would discontinue GPT-4.5 access via the API less than three months after launch. GPT-4.5 will remain available, but only to ChatGPT users through the ChatGPT interface, effectively limiting its broader application.
The announcement coincided with the introduction of GPT-4.1, priced at a more economical $8 per million output tokens. OpenAI’s benchmarks suggest GPT-4.1 is not quite as capable as GPT-4.5 overall, although it performs better on certain coding benchmarks, a trade-off between cost and overall capability.
OpenAI also recently unveiled two new reasoning models, o3 and o4-mini, with o3 posting particularly strong benchmark results, signaling a continued focus on advanced reasoning. Cost remains a concern, however: API access to o3 is priced at $40 per million output tokens, which may limit its use in some applications.
Google’s Ascendancy: Capitalizing on Opportunity
The mixed receptions of Llama 4 and GPT-4.5 created a window of opportunity for competitors, and several have seized the moment.
Meta’s troubled Llama 4 launch is unlikely to deter developers from alternatives such as DeepSeek-V3, Google’s Gemma, and Alibaba’s Qwen2.5. These models, introduced in late 2024, have rapidly gained traction and become the preferred open-weight models on the LMArena and Hugging Face leaderboards. They rival or surpass Llama 4 on popular benchmarks, offer more affordable API access, and, in some cases, can be downloaded and run on consumer-grade hardware, putting them within reach of a much wider range of users.
However, it is Google’s cutting-edge LLM, Gemini 2.5 Pro, that has truly captured the attention of the AI community and industry observers alike.
Launched on March 25th, Gemini 2.5 Pro is a “thinking model” akin to OpenAI’s o1 and DeepSeek-R1, employing self-prompting techniques to reason through complex tasks and problems. It is also multimodal, has a context window of one million tokens, and supports in-depth research, making it a versatile tool for a wide range of applications.
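As a minimal sketch of the self-prompting idea, and emphatically not of Gemini’s internal implementation, a “thinking model” can be pictured as a loop that first elicits intermediate reasoning and then conditions the final answer on that reasoning. The `stub_model` function below is a hypothetical stand-in for a real LLM call:

```python
# Toy illustration of the self-prompting loop behind "thinking" models.
# stub_model is a canned stand-in; real systems generate reasoning tokens
# with the LLM itself before committing to a final answer.

def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM API call."""
    if "Think step by step" in prompt:
        return "17 * 6 = 10*6 + 7*6 = 60 + 42 = 102"
    return "102"

def reason_then_answer(question: str, model=stub_model) -> str:
    # Step 1: prompt the model to produce intermediate reasoning ("thinking").
    reasoning = model(f"Think step by step about: {question}")
    # Step 2: feed that reasoning back as context and ask for a final answer.
    return model(f"Question: {question}\nReasoning: {reasoning}\nFinal answer:")

print(reason_then_answer("What is 17 * 6?"))  # prints "102"
```

Production reasoning models fold this loop into a single generation pass, but the core idea is the same: spend extra tokens on intermediate reasoning before answering, trading cost and latency for accuracy on complex tasks.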
Gemini 2.5 Pro has racked up a string of benchmark wins, including the top spot on SimpleBench (later ceded to OpenAI’s o3 on April 16th) and on Artificial Analysis’s combined AI Intelligence Index. It currently leads LMArena, a testament to its performance and user satisfaction, and as of April 14th Google models occupied 5 of the top 10 slots there: Gemini 2.5 Pro, three variants of Gemini 2.0, and Gemma 3-27B.
Beyond its performance, Google is also emerging as a price leader. Gemini 2.5 is free to use through the Gemini app and Google’s AI Studio website, and Google’s API pricing is exceptionally competitive: $10 per million output tokens for Gemini 2.5 Pro and just 40 cents per million tokens for Gemini 2.0 Flash, among the most affordable options available.
Lorica notes that for high-volume reasoning tasks he often opts for DeepSeek-R1 or Google Gemini because of their combination of performance and cost-effectiveness, while OpenAI’s higher prices demand more careful consideration.
Meta and OpenAI are hardly on the verge of collapse; OpenAI in particular benefits from the enormous popularity of ChatGPT, which reportedly has one billion users. But Gemini’s rankings and benchmark results mark a significant shift in the LLM landscape, one that currently favors Google. The combination of strong performance, competitive pricing, and well-timed releases has made Google a formidable contender, while the rise of open-weight alternatives gives developers and users more choice and control than ever. The coming months will show whether Google can maintain its momentum and solidify its position as a leader in the LLM arena.