To celebrate Global Accessibility Awareness Day (GAAD), we’re launching new updates to Android and Chrome, along with new resources for developers building speech recognition tools. Advancements in Artificial Intelligence (AI) are continuously making our world more accessible.
More AI-Powered Android Innovations
We’re building on our efforts by integrating the best of Google AI and Gemini into core mobile experiences tailored for visual and auditory needs.
Get All the Details with Gemini and TalkBack
Last year, we introduced Gemini’s capabilities into TalkBack, Android’s screen reader, providing AI-generated image descriptions to people who are blind or have low vision, even when alt text isn’t available. Today, we’re expanding this Gemini integration so that people can ask questions and get responses about their images.
This means that the next time a friend sends you a picture of their new guitar, you can get a description and ask follow-up questions about the brand and color, or even what else is in the image. Now, people can also get descriptions and ask questions about their entire screen. So, if you’re shopping for the latest deals on your favorite shopping app, you can ask Gemini about the material of the product or if there are any discounts.
This update makes image descriptions richer. Instead of being confined to static descriptions, people can engage with an image, pose specific questions, and receive nuanced answers. For instance, someone could share a photograph of a historical landmark and ask about its architectural style, when it was built, or other details, and Gemini will parse the image and respond in an easily digestible format.
The integration goes beyond images to screen content, too. If you’re having difficulty navigating a complex webpage or using an unfamiliar app, you can activate TalkBack and ask Gemini for clarification or guidance. Gemini analyzes what’s on screen, identifies the key elements, and explains them clearly and concisely, helping people who are blind or have low vision navigate the digital world with greater confidence and independence.
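Under the hood, this kind of follow-up conversation requires keeping the image and prior turns in context. Here is a minimal, purely illustrative sketch of that pattern in Python; the `answer_fn` backend and all names are hypothetical stand-ins for a multimodal model call, not Gemini’s actual API:

```python
# Illustrative sketch: a conversation that keeps one image in context
# so follow-up questions ("what brand is it?") can refer back to it.
# `answer_fn` is a hypothetical stand-in for a multimodal model call.

class ImageConversation:
    def __init__(self, image_ref, answer_fn):
        self.image_ref = image_ref   # path, URI, or raw bytes
        self.history = []            # list of (question, answer) turns
        self.answer_fn = answer_fn

    def ask(self, question):
        # Each turn re-sends the image plus the running history, so the
        # model can resolve references like "it" or "the same one".
        prompt = {
            "image": self.image_ref,
            "history": list(self.history),
            "question": question,
        }
        answer = self.answer_fn(prompt)
        self.history.append((question, answer))
        return answer

# Usage with a toy backend that just echoes the question it was asked:
convo = ImageConversation("guitar.jpg", lambda p: f"answer to: {p['question']}")
convo.ask("What color is the guitar?")
convo.ask("What brand is it?")
print(len(convo.history))  # both turns are retained for later follow-ups
```

The key design point is that follow-up questions only work because earlier turns stay attached to the image reference; a stateless call per question would lose the thread.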
Understanding More of the Emotion Behind Captions
With Expressive Captions, your phone can provide real-time captions for anything with sound on most apps on your phone — using AI to not only capture what someone says, but also how they say it. We know that one way people express themselves is through the stretching of their words, which is why we developed a new duration feature, so you can know when the sports broadcaster is yelling “amaaazing shot,” or the video message isn’t “no” but “nooooo.” You’ll also receive more sound labels so you can know when someone is whistling or clearing their throat. This new release is rolling out in English in the U.S., U.K., Canada, and Australia for devices running Android 15 and higher.
Expressive Captions capture subtle inflections, speech tempo, and sound cues. A simple “good” can express agreement, excitement, or sarcasm; traditional captions record only the words, while Expressive Captions convey these hidden cues to the audience through text. A sigh might denote frustration or weariness, while a chuckle indicates amusement. These nonverbal cues add depth and context for people who are deaf or hard of hearing, or who prefer to rely on visual aids.
The duration feature adds another layer of realism. By reflecting stretched and elongated words, captions convey the intensity of the speaker’s emotion: a drawn-out “Nooo!” communicates more resistance than a concise “No,” and a prolonged “Amaaazing” conveys excitement and awe. This attention to detail makes captions more compelling, informative, and relatable.
Expressive Captions also include sound labels that identify audio cues such as whistling, laughter, and applause. These labels add context and help viewers grasp the full audio environment even if their hearing is limited, bridging the gap between auditory and visual information.
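To make the duration idea concrete, here is a small sketch (not Google’s implementation) of how a captioning pipeline might flag a “stretched” word like “nooooo” so a renderer can preserve the elongation instead of normalizing it away:

```python
import re

# Illustrative sketch: detect elongated words by looking for the same
# letter repeated three or more times in a row, and offer a normalized
# form for comparison. Not Expressive Captions' actual algorithm.

STRETCH = re.compile(r"(\w)\1{2,}")  # same character, 3+ times in a row

def is_stretched(word: str) -> bool:
    """True if the word contains an elongated character run."""
    return bool(STRETCH.search(word))

def collapse(word: str) -> str:
    """Reduce each long run to a single character: 'amaaazing' -> 'amazing'."""
    return STRETCH.sub(r"\1", word)

print(is_stretched("nooooo"))   # True
print(is_stretched("no"))       # False
print(collapse("amaaazing"))    # amazing
```

A real system would work from audio timing rather than spelling, but the text-side heuristic shows why duration is a distinct signal from the words themselves.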
Improving Speech Recognition Around the World
In 2019, we launched Project Euphonia to find ways to make speech recognition more accessible for people with non-standard speech. Now, we’re supporting developers and organizations around the world as they bring this work to more languages and cultural contexts.
New Developer Resources
To improve the ecosystem of tools globally, we’re making Project Euphonia’s open-source repository available to developers on its GitHub page. Developers can use it to build personalized audio tools for research, or to train models that adapt to different speech patterns.
The open-source repository lets developers, researchers, and organizations build on and contribute to Project Euphonia’s work. Sharing code, datasets, and models accelerates speech recognition for non-standard speech and extends it to a wider range of languages and cultural contexts.
These resources also make it easier to customize speech recognition tools for specific needs. Researchers can study different speech patterns and develop algorithms that accurately transcribe a wide variety of speaking styles, while startups and small businesses can integrate them into applications and services to improve inclusivity. Lowering the barrier to entry empowers developers to build meaningful tools that help people with speech impairments communicate and interact with the world.
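As a toy illustration of per-user adaptation (not part of the Euphonia codebase; every name here is made up), one simple post-processing approach is to snap a recognizer’s raw hypothesis onto the closest word in a small personalized vocabulary:

```python
from difflib import SequenceMatcher

# Toy sketch of speech-pattern adaptation: map a recognizer's raw output
# word onto the most similar word in a user's personal vocabulary.
# Real systems adapt the acoustic and language models themselves; this
# only illustrates the idea of personalization as a separate layer.

def adapt(raw_word: str, vocabulary: list[str], threshold: float = 0.6) -> str:
    best, score = raw_word, 0.0
    for word in vocabulary:
        s = SequenceMatcher(None, raw_word.lower(), word.lower()).ratio()
        if s > score:
            best, score = word, s
    # Fall back to the raw word when nothing in the vocabulary is close.
    return best if score >= threshold else raw_word

user_vocab = ["water", "lights", "television"]
print(adapt("wadder", user_vocab))     # water
print(adapt("telvision", user_vocab))  # television
print(adapt("zzz", user_vocab))        # zzz (no close match)
```

The design choice worth noting is the threshold: without it, every utterance would be forced onto some vocabulary word, which is worse than passing unknown speech through unchanged.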
Supporting New Projects in Africa
Earlier this year, we partnered with Google.org to support the creation of the Centre for Digital Language Inclusion (CDLI) at University College London. CDLI is working to improve speech recognition for non-English speakers in Africa by creating open-source datasets in 10 African languages, building new speech recognition models, and continuing to support the ecosystem of organizations and developers in this space.
Google.org’s support for the Centre for Digital Language Inclusion (CDLI) reflects a commitment to bridging the technology gap for African languages. CDLI’s large, open datasets are an essential step in training robust speech recognition systems: by collecting and annotating speech samples in African languages, the Centre is laying the groundwork for speech recognition that accurately transcribes African speakers, regardless of their language or accent.
In addition to creating datasets, CDLI is building speech recognition models designed for the linguistic characteristics of African languages, accounting for tonal variation, speech patterns, and vocabulary that often differ from English and other widely studied languages. Tailoring models to these complexities improves their accuracy and reliability for the people who use them.
Just as important, CDLI supports the ecosystem of organizations and developers on the African continent through training programs, mentorship opportunities, and funding, helping to build a community of skilled experts, create economic opportunities, and lay the foundation for a robust, inclusive digital future.
Expanding Accessibility Options for Students
Accessibility features are especially useful for students with disabilities, from using facial gestures to navigate their Chromebook with Face Controls to using Reading Mode to customize their reading experience.
Now, when you’re using the College Board’s Bluebook testing application on Chromebooks (where students can take the SAT and most Advanced Placement exams), you’ll be able to use all of Google’s built-in accessibility features. This includes the ChromeVox screen reader and dictation, as well as the College Board’s own digital testing tools.
Here’s how accessibility features can revolutionize the learning experiences of students with different disabilities:
- Students with visual impairments can use the ChromeVox screen reader, which reads on-screen text aloud, granting access to written content they cannot see. ChromeVox also describes images, buttons, and links, so students can navigate the web and applications with ease.
- Students with motor impairments may find the Face Controls feature, which allows them to navigate their Chromebook using facial expressions such as a smile or raised eyebrows, incredibly useful. This hands-free control method can be a game-changer for students who are unable to use a keyboard or mouse in the traditional manner.
- Students with learning disabilities can use Reading Mode to customize their reading experience. Reading Mode allows students to adjust font sizes, colors, and spacing, making it easier for them to read text. It can also remove distractions, such as images and advertisements, enabling students to focus on the content.
Overall, Google’s accessibility features open a world of possibilities for students with disabilities. By providing tailored access and support, these tools empower students to overcome barriers, reach their full potential, and succeed academically.
Making Chrome More Accessible
Over two billion people use Chrome every day, and we’re always working to make our browser easier to use and accessible for everyone, with features like Live Caption and image descriptions for screen reader users.
Easier Access to PDFs on Chrome
Previously, if you opened a scanned PDF in the Chrome browser on your desktop, you wouldn’t be able to interact with it using a screen reader. Now with Optical Character Recognition (OCR), Chrome will automatically recognize these types of PDFs, so you can highlight, copy, and search text, and have it read aloud by a screen reader, just like any other page.
Optical Character Recognition (OCR) changes how people who are blind or have visual impairments, or who prefer screen readers, work with PDF files. Previously, scanned PDFs were essentially inaccessible to screen readers because they were treated as images rather than machine-readable text, so there was no way to read, search, or interact with their content.
With OCR technology, Chrome is now able to automatically analyze scanned PDFs, recognize the text within the files, and convert it into a machine-readable format. This process allows screen readers to read the text within the PDFs, making the files accessible and usable for individuals with visual impairments in the same way as any other digital document.
The benefits of OCR integration are multifold:
- Enhanced Accessibility: OCR makes previously inaccessible scanned PDF files accessible to users of screen readers. This opens up a world of possibilities for individuals who were previously unable to independently access scanned documents.
- Improved User Experience: OCR allows users to interact with scanned PDF files in the same way that they would with any other digital document. They can highlight text, copy sections, and search for specific words or phrases, enhancing their reading and research experience.
- Increased Efficiency: OCR eliminates the need for manual transcription of text from scanned PDF files. This saves time and effort, allowing users to focus on the task at hand rather than struggling to access information.
Together, these changes make PDFs far more accessible for people with visual impairments. By making previously inaccessible documents searchable, readable, and interactive, Chrome helps bridge the digital divide for people who face barriers to reading.
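What OCR actually produces is positioned text, and that is what makes highlighting, search, and screen-reader output possible. A small sketch, using a hand-written word list as a stand-in for a real OCR engine’s output (word plus bounding-box origin):

```python
# Illustrative sketch: once OCR converts a scanned page into positioned
# words, the page becomes searchable and can be read in order. The list
# below stands in for real OCR output as (word, x, y) tuples.

ocr_words = [
    ("Invoice", 40, 20), ("Total", 40, 120), ("due:", 95, 120),
    ("$120", 150, 120), ("Thank", 40, 200), ("you", 100, 200),
]

def reading_order_text(words):
    # Sort top-to-bottom, then left-to-right: the order in which a
    # screen reader would speak the page.
    ordered = sorted(words, key=lambda w: (w[2], w[1]))
    return " ".join(w[0] for w in ordered)

def find(words, query):
    # Return the position of every matching word, e.g. to highlight it.
    q = query.lower()
    return [(x, y) for word, x, y in words if word.lower().strip(":.,") == q]

print(reading_order_text(ocr_words))  # Invoice Total due: $120 Thank you
print(find(ocr_words, "total"))       # [(40, 120)]
```

The same positional data also drives copy and highlight: a selection rectangle can be mapped back to the words whose boxes it covers.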
Easy Reading with Page Zoom
Page zoom now lets you increase the size of the text you see in Chrome on Android, without impacting the web page layout or your browsing experience — just like it already works on Chrome desktop. You can customize how much you want to zoom, and easily apply your preference to all pages you visit or just specific pages.
The page zoom feature can be a game changer for people who have low vision or who prefer larger text for clarity and ease of reading. By letting users adjust the text size without affecting the layout of the webpage, Chrome keeps text comfortable and readable without the risk of overlapping text or broken formatting.
The benefits of the page zoom feature are:
- Improved Readability: Page zoom allows users to adjust the size of the text they see, making it easier and more enjoyable to read. This is particularly helpful for individuals with low vision, dyslexia, or other visual impairments.
- Enhanced Comfort: Page zoom allows users to customize the text size to suit their personal preferences and visual requirements. This helps to reduce eye strain and make reading more comfortable for extended periods.
- Preserved Layout: Unlike simply zooming in on the entire webpage, page zoom allows users to increase or decrease only the text size, while maintaining the integrity of the original layout. This ensures that the webpage is still easy to navigate and that all elements are positioned as intended.
- Flexible Customization: Page zoom provides a range of customization options, allowing users to fine-tune the text size to meet their specific needs. Users can select from predefined zoom levels or enter a custom value, and they can apply their preference to all webpages or just specific websites.
To get started, simply tap the three-dot menu in the top right of Chrome, and set your zoom preferences.