The Promise of AGI
In the rapidly evolving field of artificial intelligence, the concept of “artificial general intelligence” (AGI) has become a central focus. Industry leaders are increasingly suggesting that we are on the verge of creating virtual agents capable of matching, or even surpassing, human understanding and performance across a wide range of cognitive tasks. This anticipation has fueled a race among tech companies, each striving to be the first to achieve this groundbreaking milestone.
OpenAI, a major player in the AI arena, is subtly hinting at the imminent arrival of a “PhD-level” AI agent. This agent, they suggest, could operate autonomously, performing at the level of a “high-income knowledge worker.” Elon Musk, the ever-ambitious entrepreneur, has made even bolder predictions, stating that we will likely have AI “smarter than any one human” by the end of 2025. Dario Amodei, CEO of Anthropic, another prominent AI company, offers a slightly more conservative timeline but shares a similar vision, suggesting that AI could be “better than humans at almost everything” by the end of 2027. These pronouncements paint a picture of a near future where AI transcends human limitations in virtually all domains of cognitive ability.
Anthropic’s ‘Claude Plays Pokémon’ Experiment
Amid this backdrop of ambitious predictions, Anthropic introduced its “Claude Plays Pokémon” experiment last month. This project, presented as a step toward the predicted AGI future, was described as showcasing “glimmers of AI systems that tackle challenges with increasing competence, not just through training but with generalized reasoning.” Anthropic garnered significant attention by highlighting how Claude 3.7 Sonnet’s “improved reasoning capabilities” enabled the company’s latest model to make progress in the classic Game Boy RPG Pokémon Red in ways that “older models had little hope of achieving.”
The company emphasized that Claude 3.7 Sonnet’s “extended thinking” allowed the new model to “plan ahead, remember its objectives, and adapt when initial strategies fail.” These, Anthropic argued, are “critical skills for battling pixelated gym leaders. And, we posit, in solving real-world problems too.” The implication was clear: Claude’s progress in Pokémon was not just a game; it was a demonstration of the AI’s burgeoning ability to tackle complex, real-world challenges. The experiment was framed as a tangible demonstration of AI moving beyond narrow, task-specific capabilities towards a more general and adaptable form of intelligence.
The Reality Check: Claude’s Struggles in the Game
However, the initial excitement surrounding Claude’s Pokémon performance has been tempered by a dose of reality. While Claude 3.7 Sonnet undoubtedly outperformed its predecessors, it has not achieved mastery over the game. Thousands of viewers on Twitch have witnessed Claude’s ongoing struggles, observing its frequent missteps and inefficiencies. The live stream of Claude’s gameplay has provided a transparent, and at times humbling, view of the AI’s actual capabilities.
Despite extended “thinking” pauses between moves – during which viewers can observe the system’s simulated reasoning process – Claude often finds itself:
- Revisiting completed towns: The AI frequently returns to areas it has already explored, seemingly without purpose. This suggests a lack of spatial awareness or an inability to efficiently track progress.
- Getting stuck in blind corners: Claude often becomes trapped in corners of the map for extended periods, unable to navigate its way out. This highlights difficulties in understanding the game’s 2D environment and planning effective movement.
- Repeatedly interacting with unhelpful NPCs: The AI has been observed engaging in fruitless conversations with the same non-player characters over and over again. This indicates a challenge in distinguishing between relevant and irrelevant information, and a potential inability to learn from past interactions. (A sketch after this list shows one way such repetitive loops could be flagged.)
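Anthropic hasn’t published the harness’s internals, but behaviors like these are straightforward to characterize mechanically. As a rough illustration, a watchdog that flags the agent pacing the same few tiles might look like the following Python sketch; the class, parameters, and thresholds are all hypothetical rather than anything from the actual project:

```python
from collections import Counter, deque

# Hypothetical watchdog for the repetitive behaviors described above.
# Positions are (map_id, x, y) tuples; every name and threshold here is
# illustrative, not part of Anthropic's actual harness.
class StuckDetector:
    def __init__(self, window: int = 200, revisit_threshold: int = 40):
        self.history: deque = deque(maxlen=window)  # recent positions
        self.revisit_threshold = revisit_threshold

    def record(self, position: tuple) -> None:
        self.history.append(position)

    def is_stuck(self) -> bool:
        """Flag when a handful of tiles dominates recent movement."""
        if len(self.history) < self.history.maxlen:
            return False  # not enough evidence yet
        (_, most_visits), = Counter(self.history).most_common(1)
        # If one tile accounts for a fifth of the recent window, the
        # agent is likely pacing a corner rather than exploring.
        return most_visits >= self.revisit_threshold
```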
These examples of distinctly sub-human in-game performance paint a picture far removed from the superintelligence envisioned by some. Watching Claude struggle with a game designed for children, it becomes difficult to imagine that we are witnessing the dawn of a new era of computer intelligence. The contrast between the projected capabilities of AGI and Claude’s actual performance in Pokémon raises questions about the current state of AI and the timeline for achieving truly general intelligence.
Lessons Learned from Sub-Human Performance
Despite its shortcomings, Claude’s current level of Pokémon performance offers valuable insights into the ongoing quest for generalized, human-level artificial intelligence. Even its struggles hold significant lessons that could inform future development efforts. The experiment, while not demonstrating AGI, provides a valuable benchmark and reveals specific areas where current AI models fall short.
In a sense, it’s remarkable that Claude can play Pokémon at all. When developing AI systems for games like Go and Dota 2, engineers typically provide their algorithms with extensive knowledge of the game’s rules and strategies, along with a reward function to guide their learning. This “supervised” approach provides the AI with a clear framework for understanding and mastering the game. In contrast, David Hershey, the developer behind the Claude Plays Pokémon project, started with an unmodified, generalized Claude model that had not been specifically trained or tuned to play Pokémon games.
Hershey explained to Ars, “This is purely the various other things that [Claude] understands about the world being used to point at video games.” He added, “So it has a sense of a Pokémon. If you go to claude.ai and ask about Pokémon, it knows what Pokémon is based on what it’s read… If you ask, it’ll tell you there’s eight gym badges, it’ll tell you the first one is Brock… it knows the broad structure.” This “unsupervised” approach, relying on Claude’s pre-existing knowledge base, makes the AI’s ability to play the game, even imperfectly, all the more impressive.
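The harness’s code is not public, but the basic shape of such an experiment, an off-the-shelf model shown a screenshot and asked for the next button press, can be sketched against Anthropic’s public Messages API. The prompt wording, button vocabulary, and fallback below are assumptions for illustration, not the project’s actual implementation:

```python
import base64
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

BUTTONS = ["up", "down", "left", "right", "a", "b", "start", "select"]

def choose_button(screenshot_png: bytes, notes: str) -> str:
    """Ask an unmodified Claude model for the next Game Boy input.

    `notes` stands in for the agent's self-maintained knowledge base;
    the prompt here is invented, not Anthropic's actual one.
    """
    response = client.messages.create(
        model="claude-3-7-sonnet-20250219",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/png",
                            "data": base64.b64encode(screenshot_png).decode()}},
                {"type": "text",
                 "text": "You are playing Pokémon Red on a Game Boy.\n"
                         f"Your notes so far:\n{notes}\n\n"
                         f"Reply with exactly one button from {BUTTONS}."},
            ],
        }],
    )
    reply = response.content[0].text.strip().lower()
    return reply if reply in BUTTONS else "a"  # crude fallback on bad output
```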
The Challenges of Visual Interpretation in a Pixelated World
In addition to monitoring key Game Boy RAM addresses for game state information, Claude interprets the game’s visual output much like a human player would. However, despite recent advancements in AI image processing, Claude still struggles to interpret the low-resolution, pixelated world of a Game Boy screenshot with the same accuracy as a human. The Game Boy’s low visual fidelity presents a distinct challenge for AI models trained primarily on high-resolution images and text.
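The memory-reading half of that setup is the comparatively easy part: fan-made RAM maps of Pokémon Red (derived from the pokered disassembly) document where the game keeps its state, so structured facts like position and badge count can be read directly. A minimal sketch, with `read_byte` standing in for whatever memory-inspection hook the emulator exposes:

```python
# Addresses from community RAM maps of Pokémon Red; treat them as
# reference values to double-check rather than guaranteed constants.
WRAM_BADGES      = 0xD356  # one bit per gym badge earned
WRAM_CURRENT_MAP = 0xD35E  # id of the map the player is standing on
WRAM_PLAYER_Y    = 0xD361  # player's Y coordinate on that map
WRAM_PLAYER_X    = 0xD362  # player's X coordinate on that map

def read_game_state(read_byte) -> dict:
    """Pull a minimal game state out of emulator memory.

    `read_byte(addr) -> int` is an assumed hook; any Game Boy emulator
    with memory inspection (PyBoy, for example) can provide one.
    """
    return {
        "map_id": read_byte(WRAM_CURRENT_MAP),
        "position": (read_byte(WRAM_PLAYER_X), read_byte(WRAM_PLAYER_Y)),
        "badge_count": bin(read_byte(WRAM_BADGES)).count("1"),  # popcount
    }
```

It is the pixels, not the RAM, that cause the trouble.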
“Claude’s still not particularly good at understanding what’s on the screen at all,” Hershey admitted. “You will see it attempt to walk into walls all the time.” This difficulty in accurately perceiving the game environment is a major contributor to Claude’s navigational struggles.
Hershey suspects that Claude’s training data likely lacks detailed textual descriptions of images resembling Game Boy screens. This means that, somewhat counterintuitively, Claude might actually perform better with more realistic imagery. The lack of specific training data for low-resolution, pixelated graphics creates a gap in Claude’s understanding of the visual world of Pokémon.
“It’s one of those funny things about humans that we can squint at these eight-by-eight pixel blobs of people and say, ‘That’s a girl with blue hair,’” Hershey noted. “People, I think, have that ability to map from our real world to understand and sort of grok that… so I’m honestly kind of surprised that Claude’s as good as it is at being able to see there’s a person on the screen.” This highlights the inherent human ability to interpret and abstract visual information, even when it is degraded or incomplete, a capability that current AI models are still striving to replicate.
Different Strengths, Different Weaknesses: A Comparison to Human Play
Even with perfect visual interpretation, Hershey believes Claude would still struggle with 2D navigation challenges that are trivial for humans. “It’s pretty easy for me to understand that [an in-game] building is a building and that I can’t walk through a building,” he said. “And that’s [something] that’s pretty challenging for Claude to understand… It’s funny because it’s just kind of smart in different ways, you know?” This difference in cognitive strengths and weaknesses underscores the fundamental differences between human intelligence and current AI models.
Where Claude excels, according to Hershey, is in the more text-based aspects of the game. During battles, Claude readily notices when the game indicates that an electric-type Pokémon’s attack is “not very effective” against a rock-type opponent. It then stores this information in its written knowledge base for future reference. Claude can also integrate multiple pieces of knowledge into sophisticated battle strategies, even extending these strategies into long-term plans for catching and managing teams of Pokémon. This demonstrates Claude’s strength in processing and using textual information, a capability that often surpasses human performance in speed and recall.
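That text-first strength is easy to mirror in code. The sketch below shows one hypothetical way an agent could turn battle text into reusable notes and let those notes steer move selection; the schema and scoring are invented for illustration, not Anthropic’s knowledge-base format:

```python
# Invented note format: maps (attack_type, defender_type) to what the
# battle text revealed about that matchup.
effectiveness_notes: dict[tuple[str, str], str] = {}

def record_battle_text(attack_type: str, defender_type: str, game_text: str) -> None:
    """Turn an observed battle message into a persistent note."""
    if "not very effective" in game_text:
        effectiveness_notes[(attack_type, defender_type)] = "resisted"
    elif "super effective" in game_text:
        effectiveness_notes[(attack_type, defender_type)] = "super"

def prefer_move(moves: list[tuple[str, str]], defender_type: str) -> str:
    """Pick a (name, type) move, avoiding matchups noted as resisted
    and favoring those previously noted as super effective."""
    def score(move: tuple[str, str]) -> int:
        note = effectiveness_notes.get((move[1], defender_type))
        return {"super": 2, None: 1, "resisted": 0}[note]
    return max(moves, key=score)[0]
```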
Claude even demonstrates surprising “intelligence” when the game’s text is intentionally misleading or incomplete. Hershey cited an early-game task where the player is told to find Professor Oak next door, only to discover he’s not there. “As a 5-year-old, that was very confusing to me,” Hershey said. “But Claude actually typically goes through that same set of motions where it talks to mom, goes to the lab, doesn’t find [Oak], says, ‘I need to figure something out’… It’s sophisticated enough to sort of go through the motions of the way [humans are] actually supposed to learn it, too.” This ability to adapt to incomplete or misleading information, and to infer the correct course of action, is a significant step towards more human-like reasoning.
These contrasting strengths and weaknesses, compared to human-level play, reflect the overall state of AI research and capabilities, Hershey explained. “I think it’s just a sort of universal thing about these models… We built the text side of it first, and the text side is definitely… more powerful. How these models can reason about images is getting better, but I think it’s a decent bit behind.” This highlights the ongoing development of AI, with text-based reasoning currently outpacing visual interpretation and spatial reasoning.
The Limits of Memory and the Context Window
Beyond challenges with visual and textual interpretation, Hershey acknowledged that Claude struggles with “remembering” what it has learned. The current model has a “context window” of 200,000 tokens, which limits the amount of information it can hold in its “memory” at any given time. When the system’s expanding knowledge base fills this window, Claude undergoes an elaborate summarization process, condensing detailed notes into shorter summaries that inevitably lose some fine-grained details. This limited memory capacity, and the loss of information during summarization, contribute to Claude’s difficulty in maintaining a consistent and comprehensive understanding of the game state.
This can lead to Claude “having a hard time keeping track of things for a very long time and really having a great sense of what it’s tried so far,” Hershey said. “You will definitely see it occasionally delete something that it shouldn’t have. Anything that’s not in your knowledge base or not in your summary is going to be gone, so you have to think about what you want to put there.” This highlights the challenge of managing and prioritizing information within a limited memory capacity, a problem that is less pronounced in human cognition.
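The pressure Hershey describes can be pictured as a simple budget check; in the sketch below, the 80 percent trigger, the four-characters-per-token heuristic, and the summarize callback (which would itself be a model call) are all assumptions:

```python
CONTEXT_BUDGET_TOKENS = 200_000                  # Claude 3.7 Sonnet's window
SUMMARIZE_AT = int(CONTEXT_BUDGET_TOKENS * 0.8)  # assumed headroom

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough heuristic: ~4 characters per token

def maybe_condense(notes: str, summarize) -> str:
    """Swap detailed notes for a shorter summary as the budget nears.

    `summarize(text) -> str` stands in for a model call. Whatever fine-
    grained detail the summary omits is gone for good, which is exactly
    the failure mode described above.
    """
    if estimate_tokens(notes) < SUMMARIZE_AT:
        return notes
    return summarize(notes)
```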
The Perils of Incorrect Information and Confirmation Bias
More problematic than forgetting important information is Claude’s tendency to inadvertently insert incorrect information into its knowledge base. Like a conspiracy theorist building a worldview on a flawed premise, Claude can be remarkably slow to recognize when an error in its self-authored knowledge base is leading its Pokémon play astray. This demonstrates a form of confirmation bias, where the AI prioritizes information that confirms its existing beliefs, even if those beliefs are incorrect.
“The things that are written down in the past, it sort of trusts pretty blindly,” Hershey said. “I have seen it become very convinced that it found the exit to [in-game location] Viridian Forest at some specific coordinates, and then it spends hours and hours exploring a little small square around those coordinates that are wrong instead of doing anything else. It takes a very long time for it to decide that that was a ‘fail.’” This highlights the difficulty of correcting errors in the AI’s internal knowledge representation, and the potential for these errors to significantly impact performance.
Despite these challenges, Hershey noted that Claude 3.7 Sonnet is significantly better than earlier models at “questioning its assumptions, trying new strategies, and keeping track over long horizons of various strategies to [see] whether they work or not.” While the new model still “struggles for really long periods of time” retrying the same actions, it ultimately tends to “get a sense of what’s going on and what it’s tried before, and it stumbles a lot of times into actual progress from that,” Hershey said. This indicates progress in the AI’s ability to learn from its mistakes and adapt its strategies, albeit at a slower pace than a human player.
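One way to picture that “questioning its assumptions” behavior is demotion logic like the sketch below, in which a note the agent wrote earlier loses its standing after repeatedly failing in practice rather than being trusted indefinitely. This is purely illustrative; nothing here reflects how Claude’s reasoning actually works:

```python
from dataclasses import dataclass

@dataclass
class Belief:
    """An invented stand-in for one self-authored knowledge-base note,
    e.g. claim="Viridian Forest exit at (25, 31)"."""
    claim: str
    failures: int = 0
    suspect: bool = False

def update_belief(belief: Belief, succeeded: bool, max_failures: int = 3) -> None:
    """Reset the failure count on success; mark the note suspect once
    acting on it has failed `max_failures` times in a row."""
    if succeeded:
        belief.failures, belief.suspect = 0, False
    else:
        belief.failures += 1
        if belief.failures >= max_failures:
            belief.suspect = True  # stop exploring around this "fact"
```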
The Path Forward: Incremental Improvements and Future Challenges
One of the most fascinating aspects of observing Claude Plays Pokémon across multiple iterations, Hershey said, is seeing how the system’s progress and strategy can vary significantly between runs. Sometimes, Claude shows it is “capable of actually building a pretty coherent strategy” by “keeping detailed notes about the different paths to try,” he explained. But “most of the time it doesn’t… most of the time, it wanders into the wall because it’s confident it sees the exit.” This variability highlights the stochastic nature of the AI’s behavior and the difficulty of achieving consistent, reliable results.
One of the major limitations of the current version of Claude, according to Hershey, is that “when it derives that good strategy, I don’t think it necessarily has the self-awareness to know that one strategy [it] came up with is better than another.” And that, he acknowledged, is not a trivial problem to solve. The ability to evaluate and compare different strategies, and to consciously select the most effective one, is a key aspect of human intelligence that is currently lacking in Claude.
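A crude stand-in for that missing self-awareness would be to score each recorded strategy against measurable progress and prefer the winner; the metrics and weights below are invented for illustration, and they sidestep the genuinely hard part, which is getting the model itself to perform this comparison:

```python
def strategy_score(new_tiles: int, badges_gained: int, turns_spent: int) -> float:
    """Progress per turn, with badges weighted far above exploration.
    The 100x weight is an arbitrary illustrative choice."""
    return (new_tiles + 100 * badges_gained) / max(turns_spent, 1)

def best_strategy(outcomes: dict[str, tuple[int, int, int]]) -> str:
    """`outcomes` maps a strategy note to (new_tiles, badges, turns)."""
    return max(outcomes, key=lambda name: strategy_score(*outcomes[name]))
```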
Nevertheless, Hershey sees “low-hanging fruit” for improving Claude’s Pokémon play by enhancing the model’s understanding of Game Boy screenshots. “I think there’s a chance it could beat the game if it had a perfect sense of what’s on the screen,” he said, suggesting that such a model would likely perform “a little bit short of human.” This suggests that improvements in visual interpretation could lead to significant gains in overall performance.
Expanding the context window for future Claude models will also likely enable them to “reason over longer time frames and handle things more coherently over a long period of time,” Hershey added. Future models will improve by getting “a little bit better at remembering, keeping track of a coherent set of what it needs to try to make progress,” he said. These improvements in memory capacity and information management are crucial for enabling the AI to develop more complex and long-term strategies.
While further improvements to AI models are all but certain, Claude’s current Pokémon performance does not suggest that it is on the verge of ushering in an era of human-level, fully generalizable artificial intelligence. Hershey conceded that watching Claude 3.7 Sonnet get stuck on Mt. Moon for 80 hours can make it “seem like a model that doesn’t know what it’s doing.” The extended periods of aimless wandering and repeated mistakes highlight the significant gap between current AI capabilities and the ambitious goals of AGI.
However, Hershey remains impressed by the occasional glimmers of awareness that Claude’s new reasoning model displays, noting that it will sometimes “kind of tell that it doesn’t know what it’s doing and know that it needs to be doing something different. And the difference between ‘can’t do it at all’ and ‘can kind of do it’ is a pretty big one for these AI things for me,” he continued. “You know, when something can kind of do something it typically means we’re pretty close to getting it to be able to do something really, really well.”

This suggests that even incremental progress in AI capabilities can represent significant advancements toward the ultimate goal of creating truly intelligent machines. The “Claude Plays Pokémon” experiment, while not demonstrating AGI, provides valuable insights into the current state of AI and the challenges that lie ahead. It serves as a reminder that the path to general intelligence is complex and iterative, and that even seemingly simple tasks can reveal the profound differences between human cognition and current AI models.