Unveiling Gemini: Google’s Next-Gen AI Family
Gemini is Google’s ambitious entry in the next generation of AI models. A collaborative effort between DeepMind and Google Research, Gemini is not a single model but a family of models, each designed for specific tasks and performance levels. This family includes:
- Gemini Ultra: The most powerful model, designed for highly complex tasks. (Currently unavailable)
- Gemini Pro: A robust model, smaller than Ultra, capable of a wide range of tasks. Gemini 2.0 Pro, the latest iteration, is currently Google’s flagship.
- Gemini Flash: A streamlined, ‘distilled’ version of Pro, prioritizing speed and efficiency.
- Gemini Flash-Lite: A slightly reduced and quicker version of Gemini Flash.
- Gemini Flash Thinking: A model showcasing ‘reasoning’ abilities.
- Gemini Nano: Comprising two compact models, Nano-1 and the slightly more potent Nano-2, designed for offline operation on devices.
A key feature of all Gemini models is their inherent multimodality. Unlike models trained only on text, like Google’s LaMDA, Gemini models can process and analyze various data types. They’ve been trained on a massive dataset of public, proprietary, and licensed audio, images, videos, codebases, and text in multiple languages.
This multimodality allows Gemini to surpass the limitations of text-only models. While LaMDA is restricted to text input and output, Gemini models, especially newer versions of Flash and Pro, can natively generate images and audio alongside text.
However, the ethical and legal implications of training AI models on public data, often without explicit consent, remain complex. While Google offers an AI indemnification policy for some Google Cloud customers, it has limitations. Users, especially those using Gemini commercially, should be cautious.
Gemini Apps vs. Gemini Models: Understanding the Distinction
It’s important to distinguish between the Gemini models and the Gemini apps available on web and mobile platforms (formerly Bard).
The Gemini apps act as clients, connecting to various Gemini models and providing a user-friendly, chatbot-like interface. They are the front end for interacting with Google’s generative AI.
On Android, the Gemini app replaces the Google Assistant app. On iOS, the Google and Google Search apps serve as Gemini clients.
Android users can invoke a Gemini overlay to ask questions about on-screen content, like a YouTube video. This is triggered by holding a supported phone’s power button or saying ‘Hey Google.’
The Gemini apps accept images, voice commands, and text as input. They can process files like PDFs, uploaded directly or from Google Drive, and generate images. Conversations started with Gemini apps on mobile sync with Gemini on the web, if logged into the same Google Account.
Gemini Advanced: Unlocking Premium AI Features
The Gemini apps aren’t the only way to use Gemini models. Google is integrating Gemini-powered features into its core apps and services, including Gmail and Google Docs.
To fully utilize these, users typically need the Google One AI Premium Plan. This plan, technically part of Google One, costs $20 per month and grants access to Gemini in Google Workspace apps like Docs, Slides, Sheets, Drive, and Meet, as well as in Maps. It also unlocks ‘Gemini Advanced,’ which provides access to Google’s more sophisticated Gemini models within the Gemini apps.
Gemini Advanced users get benefits like priority access to new features and models, the ability to execute and modify Python code within Gemini, and expanded limits for NotebookLM, Google’s tool for turning PDFs into AI-generated podcasts. A recent addition is a memory feature that stores user preferences and lets Gemini reference past conversations for context.
One exclusive Gemini Advanced feature is ‘Deep Research.’ This uses Gemini models with enhanced reasoning to generate detailed briefs. For a prompt like ‘How should I redesign my kitchen?’, Deep Research creates a multi-step research plan, searches the web, and compiles a comprehensive answer.
In Gmail, Gemini resides in a side panel, composing emails and summarizing threads. A similar panel in Docs helps with writing, refinement, and brainstorming. In Slides, Gemini generates slides and custom images. In Google Sheets, it aids in data tracking, organization, and formula creation.
Gemini is also in Google Maps, aggregating reviews and offering recommendations, like itinerary suggestions. The chatbot’s capabilities extend to Drive, summarizing files and folders.
Gemini has recently been integrated into Google’s Chrome browser as an AI writing tool. This tool can create new content or rewrite existing text, considering the context of the current web page.
Beyond these, traces of Gemini are in Google’s database products, cloud security tools, and app development platforms (including Firebase and Project IDX). It also powers features in apps like Google Photos (natural language search), YouTube (video idea brainstorming), and Meet (caption translation).
Code Assist (formerly Duet AI for Developers), Google’s AI-powered tools for code completion and generation, uses Gemini for computationally intensive tasks. Google’s security products, like Gemini in Threat Intelligence, use Gemini to analyze malicious code and facilitate natural language searches for threats.
Gemini Extensions and Gems: Tailoring the AI Experience
Gemini Advanced users can create ‘Gems,’ custom chatbots powered by Gemini models, accessible on desktop and mobile. Gems can be generated from natural language descriptions, like ‘You’re my running coach. Give me a daily running plan,’ and can be shared or kept private.
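Gems are exclusive to the Gemini apps, but developers can approximate the same pattern through the Gemini API’s system instruction field. Below is a minimal sketch of such a request body using the running-coach example; the field names mirror the public generateContent REST API, but treat the exact shape as illustrative rather than canonical.

```python
# Sketch: approximating a 'Gem' as a system instruction on a Gemini API
# request. Field names mirror the public generateContent REST API; the
# prompt text is illustrative.
gem_request = {
    "system_instruction": {
        "parts": [{"text": "You're my running coach. Give me a daily running plan."}]
    },
    "contents": [
        {"role": "user", "parts": [{"text": "I have 40 minutes free today."}]}
    ],
}
```

The system instruction persists across every turn of the conversation, which is essentially what a Gem’s natural language description does in the consumer apps.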
The Gemini apps can integrate with various Google services through ‘Gemini extensions.’ These allow Gemini to interact with Drive, Gmail, YouTube, and others, responding to queries like ‘Could you summarize my last three emails?’
Gemini Live: Engaging in In-Depth Voice Conversations
‘Gemini Live’ offers an immersive experience, allowing users to have detailed voice conversations with Gemini. This feature is available within the Gemini apps on mobile and on the Pixel Buds Pro 2, accessible even when the phone is locked.
With Gemini Live, users can interrupt Gemini while it’s speaking to ask clarifying questions, and the chatbot adapts to speech patterns in real-time. Live is also designed as a virtual coach, assisting with event preparation, brainstorming, and other tasks. For example, Live can suggest skills to highlight during a job interview and provide public speaking tips.
Gemini for Teens: A Tailored AI Experience for Students
Google provides a specialized Gemini experience for teenage students.
This teen-focused version incorporates ‘additional policies and safeguards,’ including a customized onboarding process and an AI literacy guide. Apart from these, it closely resembles the standard Gemini experience, including the ‘double-check’ feature that verifies Gemini’s responses by cross-referencing information on the web.
Exploring the Capabilities of the Gemini Models
The multimodal nature of the Gemini models allows them to perform a wide range of tasks, from speech transcription to real-time image and video captioning. Many of these capabilities are already in Google’s products, with more advancements promised.
However, Google, like its competitors, hasn’t fully addressed inherent challenges with generative AI, such as encoded biases and the tendency to fabricate information (hallucinations). These limitations should be considered when using Gemini, especially for critical applications.
Gemini Pro’s Prowess
Google claims its latest Pro model, Gemini 2.0 Pro, is its most advanced for coding and handling complex prompts. 2.0 Pro surpasses its predecessor, Gemini 1.5 Pro, in benchmarks assessing programming, reasoning, math, and factual accuracy.
Within Google’s Vertex AI platform, developers can customize Gemini Pro for specific contexts through fine-tuning or ‘grounding.’ For example, Pro (and other Gemini models) can be instructed to use data from third-party providers like Moody’s, Thomson Reuters, ZoomInfo, and MSCI, or to source information from corporate datasets or Google Search instead of its broader knowledge base. Gemini Pro can also connect to external, third-party APIs to perform actions, like automating back-office workflows.
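As a concrete illustration, grounding in Google Search is exposed as a ‘tool’ attached to a generateContent request. The sketch below builds such a request body as a plain dict; the field names follow the public Gemini API, but the exact shape should be verified against current documentation.

```python
# Sketch: a generateContent request grounded in Google Search rather than
# the model's built-in knowledge. Field names follow the public Gemini API;
# verify against current documentation before relying on them.
grounded_request = {
    "contents": [
        {"role": "user", "parts": [{"text": "What changed in Gemini pricing this month?"}]}
    ],
    # Attaching the google_search tool tells the model to retrieve fresh web
    # results instead of answering purely from its training data.
    "tools": [{"google_search": {}}],
}
```

Grounding against a corporate dataset follows the same pattern, with a retrieval tool pointed at the private corpus instead of Search.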
Google’s AI Studio platform provides templates for creating structured chat prompts with Pro. Developers can control the model’s creative range, provide examples to guide tone and style, and fine-tune Pro’s safety settings.
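In API terms, those controls map to generation and safety parameters on each request. A hedged sketch follows; the category and threshold names match the documented Gemini API, but the specific values are illustrative.

```python
# Sketch: tuning a Gemini request's 'creative range' and safety behavior.
# Temperature narrows or widens output variability; safety settings adjust
# the blocking threshold per harm category.
prompt_config = {
    "generation_config": {
        "temperature": 0.2,        # low temperature -> more deterministic output
        "max_output_tokens": 512,
    },
    "safety_settings": [
        {
            "category": "HARM_CATEGORY_HARASSMENT",
            "threshold": "BLOCK_ONLY_HIGH",  # block only high-probability harms
        }
    ],
}
```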
Gemini Flash: Lightweight Efficiency and Gemini Flash Thinking’s Reasoning Abilities
Gemini 2.0 Flash is capable of using Google Search and other external APIs. Despite being smaller, it outperforms some of the larger 1.5 models on benchmarks measuring coding and image analysis. As a derivative of Gemini Pro, Flash is designed for efficiency, targeting narrow, high-frequency generative AI tasks.
Google highlights Flash’s suitability for applications such as summarization, chat applications, image and video captioning, and data extraction from lengthy documents and tables. Meanwhile, Gemini 2.0 Flash-Lite, a more compact iteration of Flash, surpasses Gemini 1.5 Flash in performance while maintaining the same price and speed, according to Google.
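Calling an external API works through function declarations: the developer describes a function’s schema, the model responds with a structured call, and the application executes it. Here is a minimal sketch of such a declaration; the `get_weather` function is hypothetical, and while the shape follows the Gemini API’s function-calling convention (an OpenAPI-style subset), the exact casing and fields should be checked against current docs.

```python
# Sketch: declaring a hypothetical external API ('get_weather') so a Flash
# model can request it via function calling. The schema is an OpenAPI-style
# subset per the Gemini API's function-calling convention; exact field
# casing may differ in current documentation.
weather_tool = {
    "function_declarations": [
        {
            "name": "get_weather",  # hypothetical backend function
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        }
    ]
}
```

The model never runs the function itself; it emits the name and arguments, and the calling application performs the actual request and feeds the result back.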
In December 2024, Google introduced a ‘thinking’ variant of Gemini 2.0 Flash with ‘reasoning’ capabilities. This model spends a few seconds working backward through a problem before answering, which can improve reliability.
Gemini Nano: On-Device AI Power
Gemini Nano is a compact version of Gemini, designed to operate directly on compatible devices, eliminating the need to send tasks to a remote server. Currently, Nano powers features on the Pixel 8 Pro, Pixel 8, Pixel 9 Pro, Pixel 9, and Samsung Galaxy S24, including Summarize in Recorder and Smart Reply in Gboard.
The Recorder app, which records and transcribes audio, incorporates a Gemini-powered summarization feature for recorded conversations, interviews, presentations, and other audio. These summaries are generated even without a network connection, and no data leaves the user’s device.
Nano also powers Smart Reply in Gboard, Google’s keyboard replacement. This feature suggests responses in messaging apps like WhatsApp, streamlining conversations.
A future Android iteration will use Nano to alert users to potential scams during phone calls. The new weather app on Pixel phones uses Gemini Nano to generate personalized weather reports. TalkBack, Google’s accessibility service, uses Nano to create aural descriptions of objects for users with visual impairments.
Gemini Ultra: Awaiting its Return
Gemini Ultra has been relatively absent recently. The model is not currently available within the Gemini apps, nor is it listed on Google’s Gemini API pricing page. However, Google may reintroduce Ultra in the future.
Pricing Structure for the Gemini Models
Gemini 1.5 Pro, 1.5 Flash, 2.0 Flash, and 2.0 Flash-Lite are accessible through Google’s Gemini API for developing applications. They operate on a pay-as-you-go basis. The base pricing, excluding add-ons, as of February 2025, is:
- Gemini 1.5 Pro: $1.25 per 1 million input tokens (prompts up to 128K tokens) or $2.50 per 1 million input tokens (longer prompts); $5 per 1 million output tokens (prompts up to 128K tokens) or $10 per 1 million output tokens (longer prompts)
- Gemini 1.5 Flash: $0.075 per 1 million input tokens (prompts up to 128K tokens) or $0.15 per 1 million input tokens (longer prompts); $0.30 per 1 million output tokens (prompts up to 128K tokens) or $0.60 per 1 million output tokens (longer prompts)
- Gemini 2.0 Flash: $0.10 per 1 million input tokens ($0.70 per 1 million for audio input); $0.40 per 1 million output tokens
- Gemini 2.0 Flash-Lite: $0.075 per 1 million input tokens; $0.30 per 1 million output tokens
Tokens are subdivided units of raw data, like the syllables ‘fan,’ ‘tas,’ and ‘tic’ in ‘fantastic.’ One million tokens is equivalent to roughly 750,000 words. ‘Input’ refers to tokens fed into the model, while ‘output’ refers to tokens the model generates.
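Putting the rates and the token definition together, per-request cost is a straightforward pro-rating of the per-million prices. A small worked example using the Gemini 2.0 Flash text rates listed above ($0.10 per 1 million input tokens, $0.40 per 1 million output tokens):

```python
def gemini_cost(input_tokens: int, output_tokens: int,
                input_rate: float, output_rate: float) -> float:
    """Cost in dollars, given per-1M-token rates in dollars."""
    return (input_tokens / 1_000_000) * input_rate \
         + (output_tokens / 1_000_000) * output_rate

# Gemini 2.0 Flash text rates: $0.10 in, $0.40 out, per 1 million tokens.
# 2M input tokens (roughly 1.5M words) and 500K output tokens:
cost = gemini_cost(2_000_000, 500_000, 0.10, 0.40)
print(f"${cost:.2f}")  # → $0.40
```

Note that real bills can differ: audio input is priced separately for 2.0 Flash, and the 1.5-series rates step up once a prompt exceeds 128K tokens.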
Pricing for 2.0 Pro is yet to be announced, and Nano remains in early access.
Gemini’s Potential Arrival on the iPhone
Gemini may well make its way onto the iPhone.
Apple has indicated that it’s in talks to bring Gemini and other third-party models to features within its Apple Intelligence suite. Following a keynote at WWDC 2024, Apple SVP Craig Federighi confirmed plans to work with additional models, including Gemini, but declined to share further details.