Meta Debuts Llama 4 AI: Scout, Maverick, Behemoth

Meta Platforms, the parent company of Facebook, Instagram, and WhatsApp, has strengthened its position in artificial intelligence with the introduction of its Llama 4 series. The launch marks the next generation of the company's widely used Llama family of open models, signaling a continued commitment to competing at the forefront of AI development and potentially reshaping the sector's competitive landscape. The release comprises three distinct models, each built with different capabilities and computational profiles, intended to serve applications ranging from everyday chat to complex data-processing workloads.

Introducing the Llama 4 Family: Scout, Maverick, and Behemoth

The initial deployment of the Llama 4 generation features three models: Llama 4 Scout, Llama 4 Maverick, and the still-in-training Llama 4 Behemoth. Meta says the models were trained on large datasets of unlabeled text, images, and video. This multimodal training strategy is intended to give the models a 'broad visual understanding', extending their abilities beyond text-only interaction.

Llama 4's development appears to have been shaped by competitive pressure in a fast-moving field. Reports suggest that the arrival of strikingly efficient open models from international labs, notably China's DeepSeek, pushed Meta to accelerate its own efforts. Meta reportedly committed substantial resources, possibly including dedicated 'war rooms', to study how competitors such as DeepSeek had lowered the computational cost of running and deploying advanced AI models. That undercurrent underscores the fierce race among major technology firms and research bodies for gains in both AI performance and operating efficiency.

Availability differs across the new models. Scout and Maverick are openly available to developers and the public through established channels, including Meta's dedicated Llama.com portal and partner platforms such as the popular AI development hub Hugging Face, reflecting Meta's strategy of cultivating a broad ecosystem around Llama. Behemoth, billed as the most powerful model in the series, is still in training and has not yet been released. Meanwhile, Meta is folding the new capabilities into its consumer products: the company announced that Meta AI, its assistant built into WhatsApp, Messenger, and Instagram, has been updated to run on Llama 4. The rollout covers forty countries, although the multimodal features (combining text, images, and potentially other modalities) are initially limited to English-language users in the United States.

Despite the openness of some models, use of Llama 4 is governed by licensing terms that may pose problems for certain developers and organizations. One notable restriction forbids users and companies based or headquartered in the European Union from using or distributing the models. The geographic carve-out is likely a response to the strict governance requirements of the EU's AI Act and existing data privacy law such as GDPR; navigating those regulatory regimes appears to be a major factor in Meta's deployment strategy for the region.

Furthermore, echoing the licensing framework of earlier Llama versions, Meta imposes a requirement on large-scale businesses: companies with more than 700 million monthly active users must apply for a special license directly from Meta, which can grant or deny it at its 'sole discretion'. The clause effectively gives Meta control over how its most advanced models are used by potentially rival technology giants, preserving strategic oversight despite the 'open' branding of parts of the Llama ecosystem. These terms underline the balance Meta is striking between encouraging open innovation and protecting its position in a high-stakes field.

In its official statements accompanying the launch, Meta characterized the Llama 4 release as a critical juncture. ‘These Llama 4 models mark the beginning of a new era for the Llama ecosystem,’ the company declared in a blog post, adding further, ‘This is just the beginning for the Llama 4 collection.’ This forward-looking declaration implies a plan for ongoing development and growth within the Llama 4 generation, positioning this launch not as a final point but as a notable achievement in a continuous journey of AI progress.

Architectural Innovations: The Mixture of Experts (MoE) Approach

A key technical feature setting the Llama 4 series apart is its Mixture of Experts (MoE) architecture; Meta says this is the first Llama generation to use the design. MoE represents a significant shift in how large language models are structured and trained, offering considerable efficiency gains both during the compute-intensive training phase and at inference time, when the model handles user requests.

Essentially, an MoE architecture operates by breaking down complex data processing tasks into smaller, more easily handled subtasks. These subtasks are then intelligently directed or assigned to a group of smaller, specialized neural network units, known as ‘experts’. Each expert is typically trained to perform exceptionally well on specific kinds of data or tasks. A gating mechanism within the architecture decides which expert or combination of experts is most suitable for managing a specific portion of the input data or query. This differs from conventional dense model architectures where the entire model processes every segment of the input.

The efficiency improvements arise because only a fraction of the model’s total parameters (the ‘active’ parameters associated with the chosen experts) are utilized for any specific task. This selective activation markedly lowers the computational burden compared to activating the entirety of a huge, dense model.
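The routing described above can be illustrated with a minimal sketch. This is a toy, dependency-free Python example, not Meta's implementation: the experts are simple linear layers, the gate is a learned score per expert, and only the top-k experts actually run on a given input. All names (`make_expert`, `moe_forward`, `gate_w`, etc.) are invented for illustration.

```python
import math
import random

random.seed(0)

# Toy "expert": a single linear layer on a small vector.
def make_expert(dim):
    w = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    def expert(x):
        return [sum(w[i][j] * x[j] for j in range(dim)) for i in range(dim)]
    return expert

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, gate_w, top_k=2):
    """Route input x to the top_k highest-scoring experts only."""
    # Gating scores: one score per expert, from a learned weight vector.
    scores = [sum(gw[j] * x[j] for j in range(len(x))) for gw in gate_w]
    probs = softmax(scores)
    # Keep only the top_k experts and renormalise their weights.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    out = [0.0] * len(x)
    for i in top:
        y = experts[i](x)           # only the selected experts run
        weight = probs[i] / norm
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, top

dim, n_experts = 4, 8
experts = [make_expert(dim) for _ in range(n_experts)]
gate_w = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_experts)]

x = [0.5, -0.2, 0.1, 0.9]
out, chosen = moe_forward(x, experts, gate_w, top_k=2)
print(len(chosen))  # 2 of 8 experts activated
```

Here only 2 of the 8 experts execute for this input, which is exactly why MoE inference touches only a fraction of the total parameters.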

Meta offered specific details demonstrating this architecture:

  • Maverick: This model has a large total parameter count of 400 billion. Nevertheless, due to the MoE design integrating 128 separate ‘experts’, only 17 billion parameters are actively used at any point during processing. Parameters are often seen as an approximate measure of a model’s capacity for learning and complex problem-solving.
  • Scout: Structured similarly, Scout contains 109 billion total parameters spread across 16 ‘experts’, leading to the same 17 billion active parameters as Maverick.

This architectural decision enables Meta to construct models with immense overall capacity (high total parameter counts) while keeping computational requirements for inference (query processing) manageable, making them potentially more feasible to deploy and run at scale.
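The parameter figures Meta quotes make the efficiency gain easy to quantify. A quick back-of-envelope calculation, using only the numbers stated above, shows what fraction of each model is active per token:

```python
# Active-parameter fractions for the figures Meta quotes (in billions).
models = {
    "Maverick": {"total_b": 400, "active_b": 17, "experts": 128},
    "Scout":    {"total_b": 109, "active_b": 17, "experts": 16},
}

for name, m in models.items():
    frac = m["active_b"] / m["total_b"]
    print(f"{name}: {m['active_b']}B of {m['total_b']}B parameters active "
          f"(~{frac:.0%} of the model per token)")
```

So Maverick activates only about 4% of its total weights on any given token, and Scout about 16%, which is the core of the inference-cost argument for MoE.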

Performance Benchmarks and Model Specializations

Meta has positioned its new models competitively, publishing internal benchmark outcomes comparing Llama 4 against leading models from competitors like OpenAI, Google, and Anthropic.

Maverick, which Meta pitches as ideal for 'general assistant and chat' uses such as creative writing and code generation, reportedly outperforms models such as OpenAI's GPT-4o and Google's Gemini 2.0 Flash on certain benchmarks. These benchmarks cover coding, logical reasoning, multilingual ability, long-context handling, and image understanding. Meta's own data, however, shows that Maverick does not consistently beat the strongest models currently available, such as Google's Gemini 2.5 Pro, Anthropic's Claude 3.7 Sonnet, or OpenAI's GPT-4.5. Maverick, in other words, aims for a strong position in the high-performance tier rather than the absolute top spot against rivals' newest flagships.

Scout, by contrast, is built for different strengths: summarizing long documents and reasoning over large, complex codebases. Its most distinctive feature is an unusually large context window of up to 10 million tokens. Tokens are the sub-word units that language models process (a word like 'understanding' might be split into pieces such as 'under' and 'standing'). A 10-million-token window means the model can take in an enormous amount of information at once, on the order of millions of words or entire code libraries, letting Scout maintain coherence across extremely long documents or sprawling programming projects, something models with smaller windows struggle to do. It can also process images alongside this textual input.

The hardware needed to run these models reflects their size and architecture. Based on Meta’s estimates:

  • Scout is comparatively efficient, able to run on a single high-performance Nvidia H100 GPU.
  • Maverick, with its greater total parameter count despite MoE efficiency, needs more significant resources, requiring an Nvidia H100 DGX system (which usually includes multiple H100 GPUs) or similar computational power.

The upcoming Behemoth model is anticipated to necessitate even more substantial hardware infrastructure. Meta disclosed that Behemoth is engineered with 288 billion active parameters (out of nearly two trillion total parameters, distributed across 16 experts). Initial internal benchmarks place Behemoth ahead of models like GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro (though importantly, not the more advanced Gemini 2.5 Pro) on several evaluations concentrating on STEM (Science, Technology, Engineering, and Mathematics) skills, especially in areas like solving complex mathematical problems.

It is worth noting, however, that none of the Llama 4 models revealed so far is a dedicated 'reasoning' model along the lines of OpenAI's o1 and o3-mini. Reasoning models typically fact-check themselves internally and iteratively refine their answers, which can make them more reliable and accurate, particularly on factual questions. The trade-off is latency: they take longer to respond than conventional large language models like those in the Llama 4 family, which prioritize faster generation.

Adjusting the Conversational Boundaries: Contentious Topics

An interesting element of the Llama 4 introduction concerns Meta’s intentional modification of the models’ response behavior, especially regarding sensitive or controversial issues. The company explicitly mentioned that it has fine-tuned the Llama 4 models to be less inclined to refuse answering ‘contentious’ questions compared to their predecessors in the Llama 3 series.

According to Meta, Llama 4 is now more disposed to engage with ‘debated’ political and social subjects where earlier versions might have declined or offered a generic refusal. Additionally, the company asserts that Llama 4 displays a ‘dramatically more balanced’ stance concerning the types of prompts it will refuse to engage with entirely. The declared objective is to deliver helpful and factual responses without imposing judgment.

A Meta spokesperson elaborated on this change, telling TechCrunch: ‘[Y]ou can count on [Llama 4] to provide helpful, factual responses without judgment… [W]e’re continuing to make Llama more responsive so that it answers more questions, can respond to a variety of different viewpoints […] and doesn’t favor some views over others.’

This modification takes place amidst ongoing public and political discussion about perceived biases in artificial intelligence systems. Certain political groups and commentators, including notable figures linked to the Trump administration like Elon Musk and venture capitalist David Sacks, have leveled accusations that popular AI chatbots display a political bias, often termed ‘woke’, supposedly censoring conservative perspectives or presenting information biased towards a liberal viewpoint. Sacks, for example, has previously criticized OpenAI’s ChatGPT specifically, alleging it was ‘programmed to be woke’ and unreliable regarding political topics.

Nevertheless, achieving genuine neutrality and eliminating bias in AI is widely regarded within the technical community as an exceptionally hard, perhaps intractable, problem. Models learn patterns and associations from the enormous datasets they are trained on, and those datasets inevitably reflect the biases of the human-created text and images they contain. Even companies explicitly striving for unbiased or politically neutral AI have struggled: Elon Musk's own venture, xAI, has reportedly had difficulty building a chatbot that avoids endorsing some political positions over others.

Despite the inherent technical obstacles, the trend among leading AI developers, including Meta and OpenAI, seems to be shifting towards adjusting models to be less resistant to controversial subjects. This entails carefully calibrating safety filters and response guidelines to permit engagement with a broader spectrum of questions than previously allowed, while still striving to lessen the generation of harmful or overtly biased content. This fine-tuning illustrates the delicate balancing act AI companies must undertake between fostering open dialogue, guaranteeing user safety, and managing the intricate sociopolitical expectations surrounding their potent technologies. The launch of Llama 4, with its explicitly stated adjustments in addressing contentious queries, signifies Meta’s most recent move in navigating this complex terrain.