Gemini Live Camera Mode: A Glimpse of the AI Future on iOS

The arrival of Gemini Live’s camera mode marks a significant step forward in the evolution of artificial intelligence, bringing a tangible piece of the future directly to our fingertips. While early adopters with Pixel 9 and Samsung Galaxy S25 devices have enjoyed this innovative feature for some time, Google’s recent announcement at its I/O conference expands access to a much wider audience, encompassing both Android and iOS users. This development is particularly exciting for iPhone owners, who can now experience one of the most compelling AI functionalities currently available, especially considering that the camera mode was initially rolled out to other Android users back in April.

Unveiling the Power of Sight: How Gemini’s Camera Mode Works

At its core, Gemini Live’s camera mode grants the AI the ability to "see," enabling it to recognize and identify objects placed within the camera’s field of view. This is not merely a superficial gimmick; it’s a powerful tool that allows users to interact with their surroundings in a more intuitive and informative way.

Beyond simple object recognition, Gemini can also answer questions about the identified items, providing context and insights on demand. Furthermore, users can share their screen with Gemini, allowing the AI to analyze and identify elements displayed on their phone’s screen. To initiate a live session with the camera mode, users simply enable the live camera view, allowing them to engage in a conversation with the chatbot about anything the camera captures.

First Impressions: A Test Drive with Gemini Live

During my initial testing phase with Gemini Live on a Pixel 9 Pro XL, I was thoroughly impressed by its capabilities. One particularly memorable experience involved asking Gemini to locate my misplaced scissors.

The AI responded with remarkable accuracy: "I just spotted your scissors on the table, right next to the green package of pistachios. Do you see them?"

To my surprise, Gemini was spot on. The scissors were exactly where it indicated, despite the fact that I had only briefly passed the camera in front of them during a 15-minute live session where I was essentially giving the AI chatbot a tour of my apartment.

Intrigued by this initial success, I eagerly explored the camera mode further. In another, more extended test, I activated the feature and began walking through my apartment, prompting Gemini to identify the objects it saw. It accurately recognized various items, including fruit, ChapStick, and other everyday objects. The rediscovery of my scissors, however, remained the most striking demonstration of its capabilities.

The fact that Gemini identified the scissors without any prior prompting was particularly impressive. The AI had silently recognized them at some point during the session and accurately recalled their location with remarkable precision. This experience truly felt like a glimpse into the future, prompting me to conduct further investigations into its potential.

Drawing Inspiration: Google’s Vision for Live Video AI

My experimentation with Gemini Live’s camera feature mirrored the demo showcased by Google the previous summer, which offered a first look at these live video AI capabilities. The demo featured Gemini reminding the user where they had left their glasses, a seemingly too-good-to-be-true feat. However, as I discovered, this level of accuracy was indeed achievable.

Gemini Live is capable of recognizing far more than just household items. Google claims that it can assist users in navigating crowded train stations or identifying the fillings in pastries. It can also provide deeper insights into artwork, such as its origin and whether it’s a limited edition piece.

This functionality goes beyond a standard Google Lens search: you can hold an ongoing conversation with the AI, a far more natural back-and-forth than anything Google Assistant offers.

Google has also released a YouTube video demonstrating the feature, which now has its own page on the Google Store.

To begin, open Gemini, switch on the camera, and start talking.
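For readers curious what this kind of image-and-question interaction looks like programmatically, Google’s Gemini API supports the same sort of multimodal prompting that underlies the camera mode. The sketch below is illustrative only and is not the consumer Live feature itself; the model name, file name, and API-key handling are assumptions.

```python
# Minimal sketch: asking a Gemini model a question about a photo with the
# google-genai SDK. Illustrative only; the model name and image file are
# placeholders, and the API key is assumed to come from Google AI Studio.
from google import genai
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")

photo = Image.open("living_room.jpg")  # hypothetical snapshot of the room

response = client.models.generate_content(
    model="gemini-2.0-flash",  # assumed vision-capable model
    contents=[photo, "Do you see a pair of scissors anywhere? Describe where."],
)
print(response.text)
```

The consumer camera mode, in effect, wraps this same ask-about-what-the-camera-sees idea in live video and voice.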

Gemini Live builds on Google’s Project Astra, first shown last year and perhaps the company’s biggest "we’re in the future" feature: an experimental next step for generative AI that goes beyond typing or even speaking prompts into a chatbot like ChatGPT, Claude, or Gemini.
AI companies are continuously improving the capabilities of AI tools, from video creation to raw processing power. Apple’s Visual Intelligence, which the iPhone maker released in beta last year, is comparable to Gemini Live.

Gemini Live has the potential to revolutionize how we connect with our environment, merging our digital and physical surroundings as we simply hold the camera in front of something and ask. This integration could redefine interactions with everyday objects, turning smartphones into intelligent guides and aides. Imagine it describing surroundings in real time for visually impaired users, or helping tourists understand complex historical sites. The possibilities stretch far beyond typical applications.

Putting Gemini Live to the Test: Real-World Scenarios

The first time I used it, Gemini accurately recognized a very specific gaming collectible, a stuffed rabbit, in my camera’s view. The second time, I showed the feature to a friend in an art gallery. It immediately recognized the tortoise on a cross (don’t ask me) and identified and translated the kanji right next to it, giving both of us chills and leaving us slightly creeped out. In a positive way, I believe.

I began to consider how I might stress-test the feature. (When I attempted to screen-record it in action, it consistently failed.) What if I strayed from the usual path? I am a big fan of the horror genre (films, television series, and video games) and own a plethora of collectibles, trinkets, and other items. How well would it perform with more obscure items, like my horror-themed collectibles? This curiosity led to a deep dive into the quirky side of object recognition.

First, I must state that Gemini can be both unbelievably amazing and incredibly irritating in the same round of questions. I had around 11 objects that I wanted Gemini to identify, and the longer a live session lasted, the worse it performed, so I had to limit sessions to one or two objects. In my estimation, Gemini attempted to use contextual information from previously recognized items to guess at new ones, which makes sense to some extent, but it eventually benefited neither me nor it. The implication is that the AI becomes overly reliant on previous data, clouding its judgment of new information.

Sometimes, Gemini was quite accurate, providing the correct answers easily and without confusion, although this happened more frequently with newer or more popular objects. I was surprised, for example, when it immediately deduced that one of my test objects was not only from Destiny 2 but also a limited edition from a seasonal event the previous year. That precision impressed me and hinted at the AI’s capacity for detailed, up-to-date recognition.

Gemini would frequently be completely off the mark, requiring further hints from me before it came anywhere near the correct answer. Sometimes it appeared to be drawing on context from my previous live sessions, identifying multiple objects as coming from Silent Hill when they were not. I have a display case devoted to the game series, so I can understand why it would reach for that well so quickly. This points to a critical limitation: Gemini’s tendency to latch onto themes once established, even when they no longer apply.

Unveiling Imperfections: Bugs and Quirks in the System

Gemini can also bug out entirely at times. On occasion, it misidentified one of the objects as a fictitious character from the unreleased Silent Hill: f game, clearly combining parts of different titles into something that never existed. This highlighted the AI’s struggle to distinguish real items from fictional creations and its susceptibility to hallucinating details. The other consistent bug I encountered: when Gemini gave an incorrect answer and I corrected it, offered a closer hint, or simply told it the answer, it would repeat the incorrect answer as if it were a new guess. When that occurred, I would close the session and begin a new one, which was not always helpful. The lack of immediate learning and adaptive feedback proved to be a recurring flaw.

One thing I discovered was that some conversations were more effective than others. If I went through my Gemini conversation list, tapped an old chat that had gotten a particular item correct, and then went live again from that chat, it would identify the items without any problems. While this is not entirely unexpected, it was intriguing to note that certain conversations performed better than others, even when I used the same wording. This pointed to a possible inconsistency in the AI’s memory or recall mechanism, warranting further investigation.

Google did not respond to my requests for additional information on how Gemini Live works. That lack of transparency made it difficult to understand the underlying algorithms and processes driving the feature.

I wanted Gemini to successfully answer my challenging, sometimes highly specific questions, so I offered plenty of hints to help it do so. The nudges proved useful, but not always. This demonstrates the ongoing necessity of human guidance in interacting with AI, highlighting the current limitations of fully autonomous recognition.

A Transformative Technology: Gemini Live’s Potential Impact

Gemini Live represents a paradigm shift in how we interact with our surroundings, seamlessly merging the digital and physical realms through the lens of our cameras. While the technology is still in its early stages, its potential applications are vast and transformative. The promise of augmenting our perception and understanding of the real world holds remarkable weight.

Imagine using Gemini Live to:

  • Navigate unfamiliar environments: Simply point your camera at street signs or landmarks, and Gemini will provide real-time directions and information. Imagine tourists easily navigating foreign cities or commuters quickly finding their way through bustling train stations, empowering them to experience the world with confidence.
  • Learn about historical artifacts: When visiting a museum, use Gemini to identify and provide context for artwork and historical objects. Think of enhancing the educational experience, making museums more engaging through dynamic narratives surrounding art and historical pieces – bridging the gap between objects and stories.
  • Cook complex recipes: Ask Gemini to guide you through each step of a recipe, identifying ingredients and suggesting alternative techniques. This transforms the kitchen into an interactive learning space, letting users explore new culinary skills with AI support.
  • Diagnose simple household problems: Point your camera at a malfunctioning appliance, and Gemini will provide troubleshooting tips and potential solutions. This could relieve homeowners from dependence on professional technicians for minor malfunctions, promoting self-sufficiency and resource management.

These are just a few examples of the myriad ways in which Gemini Live can enhance our daily lives. As the technology continues to evolve and improve, its potential to revolutionize how we interact with the world around us is truly limitless. Future developments will undoubtedly integrate more refined contextual awareness and improved adaptive learning curves, solidifying its place in our daily routines.

The integration of Gemini Live into iOS devices further expands its reach, bringing AI-powered vision to a wider audience. As AI technology continues to advance, features like Gemini Live offer a glimpse into a future where our devices are not only tools for communication and entertainment but also intelligent companions that help us navigate, understand, and interact with the world around us in new and meaningful ways. This kind of seamless integration has the potential to redefine how we live, learn, and experience our world. Improved iterations of the technology could also reshape the workplace, bringing efficiency and accuracy to sectors that depend on detail-oriented observation and quick analysis.