Meta's Llama AI Runs on Windows 98 PC

In a fascinating collision of technological eras, a narrative has emerged that bridges the nascent days of widespread home computing with the cutting edge of artificial intelligence. Marc Andreessen, a prominent figure in the tech world and co-founder of the influential venture capital firm Andreessen Horowitz, recently spotlighted a remarkable feat: a compact version of Meta’s Llama artificial intelligence model was successfully operated on a computer running the venerable Windows 98 operating system, equipped with a mere 128 megabytes of RAM. This revelation serves as a potent reminder of technological potential and raises intriguing questions about the historical trajectory of computing.

The very notion of running a sophisticated AI, even a scaled-down one, on hardware dating back over a quarter-century seems almost paradoxical. Modern generative AI, the technology powering tools like ChatGPT and Microsoft’s own Copilot, is typically associated with powerful processors, substantial memory allocations, and often, cloud-based infrastructure. Microsoft itself has heavily invested in integrating AI capabilities, particularly its Copilot assistant, deeply into its latest operating system, Windows 11, and a new generation of hardware dubbed Copilot+ PCs, designed explicitly with AI workloads in mind. This contrast makes the Windows 98 experiment all the more striking. It challenges our assumptions about the resources truly necessary for certain AI functions and offers a glimpse into an alternate technological timeline.

Resurrecting the Past: The Herculean Effort Behind the Experiment

While Andreessen brought wider attention to this accomplishment, the technical heavy lifting appears to stem from earlier work, notably by the team at Exo Labs. Their journey to coax a modern AI onto such vintage machinery was far from straightforward; it was an exercise in digital archaeology and creative problem-solving, highlighting the vast differences between computing then and now.

The first hurdle involved basic logistics and hardware compatibility. Finding functional hardware from the Windows 98 era is challenging enough. But beyond just booting up the machine, the team needed peripherals. Modern USB interfaces, ubiquitous today, were not standard fare in the prime of Windows 98. This necessitated sourcing compatible input devices using the older PS/2 connectors – keyboards and mice that many younger tech enthusiasts may have never encountered.

Once the physical setup was addressed, the next significant obstacle was data transfer. How do you get the necessary AI model files and development tools onto a machine lacking modern connectivity options like high-speed USB ports or seamless network integration? This likely involved resorting to older, slower methods, perhaps burning files onto CDs or utilizing limited network protocols of the time, turning a simple file copy into a potentially time-consuming process.

The core technical challenge, however, lay in compiling modern code for an ancient environment. The AI model, based on Meta’s Llama architecture, is built using contemporary programming practices and languages. Making this code understandable and executable by Windows 98 required a compiler – a program that translates source code into machine language – that could run on the old operating system and handle the complexities of the AI code.

Exo Labs initially turned to Borland C++ 5.02, itself a piece of software history – a 26-year-old integrated development environment (IDE) and compiler combination that ran natively on Windows 98. This choice represented a potential bridge between the modern code base and the vintage operating system. However, the path was fraught with complications. The intricacies of modern C++ standards and libraries proved difficult to reconcile with the capabilities and limitations of the Borland compiler and the Windows 98 environment. Compatibility issues arose, forcing the team to pivot.

Their solution was to fall back to plain C, written to an older standard. While C is a foundational language and the direct ancestor of C++, giving up C++ meant sacrificing higher-level abstractions such as classes, templates, and standard containers. Coding became more laborious, with memory, buffers, and data structures managed by hand, and progress was inevitably slower, demanding meticulous attention to detail to avoid errors that the older development tools might not easily catch.
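To make that trade-off concrete, here is a minimal, hypothetical sketch in the older C style such compilers expect: declarations gathered at the top of each block, buffers allocated and freed by hand, and none of the containers C++ provides. It is illustrative only and not drawn from Exo Labs’ actual code.

```c
/* Hypothetical sketch only -- not Exo Labs' actual source. C89, as
   accepted by compilers of the Windows 98 era, requires declarations
   at the top of each block and offers no C++ containers, so buffers
   are allocated and freed by hand. */
#include <stdio.h>
#include <stdlib.h>

/* Naive matrix-vector multiply: out = W (rows x cols) * x */
static void matvec(const float *w, const float *x, float *out,
                   int rows, int cols)
{
    int r, c;                     /* declarations precede statements */
    for (r = 0; r < rows; r++) {
        float acc = 0.0f;
        for (c = 0; c < cols; c++) {
            acc += w[r * cols + c] * x[c];
        }
        out[r] = acc;
    }
}

int main(void)
{
    int rows = 4, cols = 3, i;
    float *w, *x, *out;

    /* In modern C++ these would likely be std::vector<float>. */
    w   = (float *)malloc(sizeof(float) * (size_t)(rows * cols));
    x   = (float *)malloc(sizeof(float) * (size_t)cols);
    out = (float *)malloc(sizeof(float) * (size_t)rows);
    if (!w || !x || !out) return 1;

    for (i = 0; i < rows * cols; i++) w[i] = 0.01f * (float)i;
    for (i = 0; i < cols; i++)        x[i] = 1.0f;

    matvec(w, x, out, rows, cols);
    printf("out[0] = %f\n", out[0]);

    free(w); free(x); free(out);
    return 0;
}
```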

The Memory Squeeze: Taming Llama for Limited Resources

Perhaps the most daunting constraint was the extremely limited Random Access Memory (RAM). The target machine possessed only 128 megabytes of RAM. To put this in perspective, modern smartphones routinely ship with 8, 12, or even 16 gigabytes of RAM (a gigabyte being roughly 1000 megabytes). High-end PCs designed for gaming or professional work often feature 32GB, 64GB, or more. Running a complex application like an AI model within such a minuscule memory footprint is akin to performing intricate surgery in a broom closet.

Meta’s Llama family of models, while generally considered more resource-efficient than behemoths like OpenAI’s GPT-4, still encompasses versions with billions of parameters. The Llama 2 architecture, for instance, includes models scaling up to 70 billion parameters. These larger models demand significant computational power and, crucially, vast amounts of memory to load the model weights and manage the calculations involved in processing information and generating responses. A standard Llama 2 model would be utterly incapable of running within a 128MB constraint.
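A rough calculation makes the mismatch obvious. Assuming around two bytes per parameter at 16-bit precision, and ignoring activations and other overhead, the sketch below estimates weight memory for a few model sizes; the smaller figures are hypothetical sizes chosen for illustration, not the exact checkpoint used in the experiment.

```c
/* Back-of-the-envelope estimate of weight memory at 16-bit precision.
   The sub-billion-parameter sizes are illustrative assumptions, not
   the actual model run in the experiment. */
#include <stdio.h>

int main(void)
{
    const double bytes_per_param = 2.0;   /* fp16/bf16 weights */
    const double params[] = { 70e9, 7e9, 100e6, 15e6 };
    const char  *labels[] = { "Llama 2 70B", "Llama 2 7B",
                              "hypothetical 100M", "hypothetical 15M" };
    int i;

    for (i = 0; i < 4; i++) {
        double mb = params[i] * bytes_per_param / (1024.0 * 1024.0);
        printf("%-18s ~ %12.1f MB of weights\n", labels[i], mb);
    }
    printf("%-18s   %12.1f MB total RAM\n", "Windows 98 machine", 128.0);
    return 0;
}
```

Even a 100-million-parameter model at 16-bit precision would overflow the machine’s 128MB of RAM on weights alone, which is why only a drastically reduced and compressed model stands any chance.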

Therefore, the success of the experiment hinged on using or developing a highly optimized, significantly smaller iteration of the Llama architecture. This specialized version had to be tailored specifically to function under severe hardware limitations. It likely involved techniques such as model quantization (reducing the precision of the numbers used in the model’s calculations) and pruning (removing less important parts of the neural network) to drastically shrink its memory and computational footprint. Exo Labs made their adapted version available on GitHub, showcasing the specific modifications needed.
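For a sense of what quantization means in practice, the following sketch compresses a block of 32-bit float weights to 8-bit integers sharing a single scale factor. It shows the generic technique only, not the specific optimizations Exo Labs published.

```c
/* Minimal sketch of symmetric int8 quantization of a block of weights.
   Purely illustrative; the actual modifications in Exo Labs' repository
   are more involved. */
#include <math.h>
#include <stdio.h>

/* Map floats to int8 with one shared scale; memory per weight drops
   from 4 bytes to 1 (plus the scale), at some cost in precision. */
static float quantize_block(const float *w, signed char *q, int n)
{
    float max_abs = 0.0f, scale;
    int i;

    for (i = 0; i < n; i++) {
        float a = (float)fabs(w[i]);
        if (a > max_abs) max_abs = a;
    }
    scale = (max_abs > 0.0f) ? (max_abs / 127.0f) : 1.0f;

    for (i = 0; i < n; i++) {
        float r = w[i] / scale;
        q[i] = (signed char)(r >= 0.0f ? r + 0.5f : r - 0.5f);
    }
    return scale;   /* dequantize later as w[i] ~ q[i] * scale */
}

int main(void)
{
    float w[8] = { 0.12f, -0.05f, 0.30f, -0.28f,
                   0.01f,  0.22f, -0.17f, 0.09f };
    signed char q[8];
    float scale;
    int i;

    scale = quantize_block(w, q, 8);
    for (i = 0; i < 8; i++) {
        printf("w=%+.3f  q=%+4d  back=%+.3f\n", w[i], q[i], q[i] * scale);
    }
    return 0;
}
```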

This tiny AI, running on antiquated hardware, wouldn’t possess the broad knowledge or nuanced conversational abilities of its larger, cloud-run cousins. Its capabilities would be restricted. Yet, the very fact that it could run and perform basic generative tasks represents a significant technical achievement. It demonstrates that the core concepts of large language models can, in principle, be scaled down dramatically, even if practical utility is limited at such extremes.

Andreessen’s Provocation: A Lost Timeline for Conversational Computing?

Marc Andreessen seized upon this technical demonstration to make a broader, more provocative point about the history and potential future of computing. His reflection wasn’t merely about the technical curiosity of running new software on old hardware; it was a musing on a possible alternate history of human-computer interaction.

He articulated this by suggesting that the successful operation of Llama on a 26-year-old Dell PC implies a missed opportunity spanning decades. ‘All of those old PCs could literally have been smart all this time,’ Andreessen posited. ‘We could have been talking to our computers for 30 years now.’

This statement invites us to imagine a world where the trajectory of AI development converged differently with the rise of personal computing. Instead of PCs primarily being tools for calculation, document creation, and eventually, accessing the internet, perhaps they could have evolved into conversational partners much earlier. The image conjured is one of users interacting with their Windows 95, 98, or even earlier machines through natural language, asking questions, getting assistance, and engaging in dialogue in a way that only became mainstream reality with the advent of modern digital assistants and sophisticated LLMs.

Of course, this is a significant counterfactual leap. Generative AI, as we understand it today, with its reliance on massive datasets, sophisticated neural network architectures (like the Transformer architecture underlying Llama and GPT models), and immense computational power for training, is a relatively recent phenomenon. The AI research of the 1980s and 1990s, while ambitious, focused on different paradigms, such as expert systems and symbolic reasoning. The hardware of the era, while capable of running the stripped-down Llama demonstrated by Exo Labs, was orders of magnitude less powerful than today’s systems, and the vast digital datasets needed to train capable generative models simply did not exist in an accessible form.

Andreessen acknowledged this context, noting the optimism of the 1980s AI boom: ‘A lot of smart people in the 80s thought all this was going to happen then.’ That era saw significant investment and research into artificial intelligence, but it ultimately led to an ‘AI winter’ – a period of reduced funding and interest when the technology failed to deliver on its most ambitious promises. The limitations in computational power, data availability, and algorithmic approaches were profound.

Therefore, Andreessen’s comment is perhaps best understood not as a literal claim that sophisticated, human-like AI was feasible on 1990s hardware in the way we experience it now, but rather as a thought experiment. It highlights the potential that might have been unlocked if research priorities, algorithmic breakthroughs, and hardware development had followed a different course. It underscores the idea that the building blocks for some form of intelligent interaction might have been technically achievable, even if the result would have been far simpler than today’s AI.

Contrasting Eras: From Dial-Up Dreams to AI-Infused Reality

The Windows 98 experiment serves as a stark point of contrast to the current landscape of AI integration. Today, AI is rapidly moving from a cloud-centric service to being deeply embedded within the operating system and even the hardware itself.

Microsoft’s push with Copilot and Copilot+ PCs exemplifies this trend. Windows 11 features numerous entry points for Copilot, offering AI assistance for tasks ranging from summarizing documents and drafting emails to generating images and adjusting system settings. The new Copilot+ PC specification mandates the inclusion of a Neural Processing Unit (NPU) – specialized silicon designed to accelerate AI computations efficiently. This signifies a fundamental shift where AI processing is becoming a core function of the personal computer, handled locally rather than solely relying on remote servers.

This modern approach assumes, and leverages, abundant resources. Copilot+ PCs require a minimum of 16GB of RAM and fast solid-state storage, specifications vastly exceeding the humble 128MB of the Windows 98 machine. The AI models employed, while optimized for client-side execution, are far more complex and capable than the miniature Llama version used in the experiment. They benefit from decades of algorithmic refinement, massive training datasets, and hardware specifically architected for their needs.

The contrast illuminates several points:

  1. Software Optimization vs. Bloat: The Exo Labs experiment is a testament to extreme optimization, forcing modern algorithms into a highly constrained environment. It implicitly critiques the tendency for modern software to assume ever-increasing hardware resources, sometimes leading to inefficiency or ‘bloat.’
  2. Evolution of Hardware: The sheer difference in computational power and memory between a typical 1998 PC and a 2024 Copilot+ PC is staggering, representing multiple generations of Moore’s Law and architectural innovation.
  3. Accessibility of Data: The training of modern LLMs relies on internet-scale datasets that were unimaginable in the Windows 98 era. The digital universe was simply too small and disconnected then.
  4. Algorithmic Breakthroughs: The development of architectures like the Transformer model in 2017 was a pivotal moment, enabling the scaling and performance seen in today’s generative AI. Earlier AI approaches had fundamental limitations.

While Andreessen imagines computers we could have been talking to 30 years ago, the reality is that the confluence of hardware power, data availability, and algorithmic innovation required for today’s AI experience came together only recently.

What Does It All Mean? Reflections Beyond Nostalgia

Is the successful deployment of a Llama model on Windows 98 merely a clever hack, a nostalgic stunt for tech enthusiasts? Or does it hold deeper significance? It arguably serves several purposes:

  • Demonstrating Extreme Scalability: It proves that the fundamental principles behind large language models can be adapted to operate under incredibly tight resource constraints. This has potential implications for deploying AI on low-power embedded systems, IoT devices, or older hardware that remains in use in various parts of the world.
  • Highlighting the Power of Constraints: Working within severe limitations often forces innovation and efficiency. The Exo Labs team had to find creative solutions and optimize ruthlessly, skills that are valuable even in resource-rich environments.
  • Challenging Assumptions: It prompts reflection on whether all the computational power and memory used by modern applications are strictly necessary for the value they provide. Could some software be leaner and more efficient?
  • Illustrating the Contingency of Technological Paths: History rarely follows a straight line. The fact that some rudimentary AI might have been possible on older hardware underscores how different choices, research directions, or even chance discoveries could have led us down a different technological path.

This experiment doesn’t rewrite history, nor does it mean that the sophisticated AI experiences of 2024 were somehow achievable in 1998. The gap in enabling technologies – processing power, memory, data, algorithms – remains immense. However, it does provide a fascinating data point, a testament to engineering ingenuity, and a catalyst for contemplating the winding road of technological progress. It reminds us that yesterday’s limitations can sometimes be overcome with today’s knowledge, yielding surprising results and prompting us to reconsider what might be possible, both now and in the future. The ghost in the old machine whispers not just of what was, but perhaps also of untapped potential residing in simplicity and efficiency.