Building Multimodal RAG Apps: Amazon Bedrock Data Automation and Amazon Bedrock Knowledge Bases

Organizations today deal with vast amounts of unstructured data in various formats, including documents, images, audio files, and video files. Extracting meaningful insights from this diverse data has traditionally required intricate processing pipelines and significant development effort. Generative AI is changing this: it can automate the processing and analysis of these varied formats and the extraction of insights from them, significantly reducing manual effort while improving accuracy and scalability.

With Amazon Bedrock Data Automation and Amazon Bedrock Knowledge Bases, you can now easily build powerful multimodal RAG applications. Together, they enable organizations to efficiently process, organize, and retrieve information from their multimodal content, transforming how they manage and leverage unstructured data.

This article will guide you through building a full-stack application that uses Amazon Bedrock Data Automation to process multimodal content, stores the extracted information in an Amazon Bedrock Knowledge Base, and enables natural language querying through a RAG-based question-answering interface.

Real-World Use Cases

The integration of Amazon Bedrock Data Automation and Amazon Bedrock Knowledge Bases provides powerful solutions for handling large volumes of unstructured data across various industries, as illustrated by the following examples:

  • In healthcare, organizations need to process a large volume of patient records, including medical forms, diagnostic images, and consultation audio recordings. Amazon Bedrock Data Automation can automate the extraction and structuring of this information, while Amazon Bedrock Knowledge Bases allows medical professionals to use natural language queries, such as “What was the patient’s last blood pressure reading?” or “Show the treatment history for diabetes patients.”
  • Financial institutions deal with thousands of documents daily, ranging from loan applications to financial statements. Amazon Bedrock Data Automation can extract key financial metrics and compliance information, while Amazon Bedrock Knowledge Bases allows analysts to ask questions like, “What risk factors were mentioned in the latest quarterly report?” or “Show all loan applications with high credit scores.”
  • Law firms need to handle a large volume of case files, including court documents, evidentiary photographs, and witness testimonies. Amazon Bedrock Data Automation can process these diverse sources, while Amazon Bedrock Knowledge Bases allows lawyers to query, “What evidence was presented regarding the events on March 15?” or “Find all witness statements that mention the defendant.”
  • Media companies can use this integration to achieve intelligent contextual advertising. Amazon Bedrock Data Automation processes video content, subtitles, and audio to understand the scene context, dialog, and sentiment, while also analyzing ad assets and campaign requirements. Then, Amazon Bedrock Knowledge Bases allows for complex queries to match ads with relevant content moments, such as “Find positive outdoor activity scenes for ads with sports equipment” or “Identify travel ad segments that discuss tourism.” This intelligent contextual matching provides more relevant and effective ad placements while maintaining brand safety.

These examples showcase how the combination of Amazon Bedrock Data Automation’s extraction capabilities with Amazon Bedrock Knowledge Bases’ natural language querying can transform how organizations interact with their unstructured data.

Solution Overview

This solution demonstrates Amazon Bedrock’s advanced capabilities for processing and analyzing multimodal content (documents, images, audio files, and video files) through three key components: Amazon Bedrock Data Automation, Amazon Bedrock Knowledge Bases, and foundation models accessible through Amazon Bedrock. Users can upload various types of content, including audio files, images, videos, or PDFs, for automated processing and analysis.

Upon content upload, Amazon Bedrock Data Automation processes it using either standard or custom blueprints to extract valuable insights. The extracted information is stored in JSON format within an Amazon Simple Storage Service (Amazon S3) bucket, while job statuses are tracked through Amazon EventBridge and persisted in Amazon DynamoDB. Custom parsing of the extracted JSON is performed to create knowledge base-compatible documents, which are then stored in Amazon Bedrock Knowledge Bases and indexed.
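The custom parsing step can be sketched as a small pure function. Note that the input shape below (a `segments` list with `text` and `modality` fields) is an illustrative assumption, not the actual Amazon Bedrock Data Automation output schema, which varies by modality and blueprint:

```python
import json
from typing import Any


def parse_bda_result(result: dict[str, Any], source_uri: str) -> list[dict[str, str]]:
    """Flatten one extracted-JSON result into knowledge base-compatible documents.

    The field names ("segments", "text", "modality") are hypothetical
    placeholders for whatever the deployed parsing code actually reads.
    """
    documents = []
    for segment in result.get("segments", []):
        text = segment.get("text", "").strip()
        if not text:
            continue  # skip segments with no extractable text
        documents.append({
            "content": text,
            "metadata": json.dumps({
                "source": source_uri,
                "modality": segment.get("modality", "unknown"),
            }),
        })
    return documents


sample = {
    "segments": [
        {"text": "Patient blood pressure: 120/80.", "modality": "document"},
        {"text": "", "modality": "image"},  # empty segments are dropped
    ]
}
docs = parse_bda_result(sample, "s3://bucket/records/patient-1.pdf")
```

Keeping the source URI and modality in each document’s metadata lets the retrieval step later attribute answers back to the original uploaded file.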

Through an intuitive user interface, the solution displays both the uploaded content and the extracted information. Users can interact with the processed data through a Retrieval Augmented Generation (RAG)-based question-answering system powered by Amazon Bedrock foundation models. This integrated approach allows organizations to efficiently process, analyze, and gain insights from various content formats, all while using a robust and scalable infrastructure deployed using the AWS Cloud Development Kit (AWS CDK).
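A minimal sketch of the question-answering call, using the `retrieve_and_generate` API of the `bedrock-agent-runtime` boto3 client. The knowledge base ID and model ARN are placeholders, and the request-building is split into a pure function so it can be inspected without AWS credentials:

```python
from typing import Any


def build_rag_request(question: str, kb_id: str, model_arn: str) -> dict[str, Any]:
    """Assemble a retrieve_and_generate request payload for a knowledge base."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }


def answer_question(question: str, kb_id: str, model_arn: str) -> str:
    """Run one RAG turn against the knowledge base (requires AWS credentials)."""
    import boto3

    client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve_and_generate(
        **build_rag_request(question, kb_id, model_arn)
    )
    return response["output"]["text"]
```

A caller would invoke something like `answer_question("What was the patient's last blood pressure reading?", "KB12345", model_arn)`, where `KB12345` and the model ARN come from the deployed stack’s outputs.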

Architecture

The following architecture diagram illustrates the flow of the solution:

  1. Users interact with the front-end application, authenticating via Amazon Cognito.
  2. API requests are handled by Amazon API Gateway and AWS Lambda functions.
  3. Files are uploaded to an S3 bucket for processing.
  4. Amazon Bedrock Data Automation processes the files and extracts information.
  5. EventBridge manages job statuses and triggers post-processing.
  6. Job statuses are stored in DynamoDB, and processed content is stored in Amazon S3.
  7. Lambda functions parse the processed content and index it in Amazon Bedrock Knowledge Bases.
  8. A RAG-based question-answering system uses Amazon Bedrock foundation models to answer user queries.
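Steps 5 and 6 above can be sketched as a Lambda handler that receives the EventBridge event for a Data Automation job and maps it to a DynamoDB item. The event detail field names (`jobId`, `jobStatus`, `outputS3Uri`) are assumptions for illustration, not the documented event schema:

```python
import time
from typing import Any


def build_status_item(detail: dict[str, Any]) -> dict[str, Any]:
    """Map an event's detail payload to a DynamoDB item (plain Python types)."""
    return {
        "job_id": detail.get("jobId", "unknown"),      # assumed field name
        "status": detail.get("jobStatus", "UNKNOWN"),  # assumed field name
        "output_uri": detail.get("outputS3Uri", ""),   # assumed field name
        "updated_at": int(time.time()),
    }


def handler(event: dict[str, Any], context: Any) -> dict[str, Any]:
    """Lambda entry point triggered by the EventBridge rule."""
    item = build_status_item(event.get("detail", {}))
    # In the deployed stack this would persist the item, for example:
    #   import boto3, os
    #   table = boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"])
    #   table.put_item(Item=item)
    return item
```

Separating the mapping from the `put_item` call keeps the status-tracking logic unit-testable without AWS access.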

Prerequisites

Backend

For the backend, you need the following prerequisites:

  • An AWS account.
  • Python 3.11 or later.
  • Docker.
  • Git (if cloning the code repository).
  • AWS CDK. See Getting Started with the AWS CDK for more details and prerequisites.
  • Access to the following foundation models enabled in Amazon Bedrock:
    • Anthropic’s Claude 3.5 Sonnet v2.0
    • Amazon Nova Pro v1.0
    • Anthropic’s Claude 3.7 Sonnet v1.0

Frontend

For the frontend, you need the following prerequisites:

  • Node/npm: v18.12.1
  • A deployed backend.
  • At least one user added to the relevant Amazon Cognito user pool (required for authenticated API calls).

Everything you need is available as open source code in our GitHub repository.

Deployment Guide

This sample application codebase is organized into the following key folders:

  • backend/: Contains the application’s backend infrastructure code built with the AWS CDK. This folder includes the Python scripts that define the AWS resources needed to run the application, such as AWS Lambda functions, Amazon API Gateway endpoints, Amazon S3 buckets, Amazon DynamoDB tables, Amazon EventBridge rules, Amazon Bedrock Data Automation configurations, and Amazon Bedrock Knowledge Base configurations.
  • frontend/: Contains the React-based frontend application source code. It houses the user interface components, API call logic, and state management necessary for interacting with the backend services.
  • docs/: Contains reference documentation and architecture diagrams.

Step 1: Clone the Repository

Clone the GitHub repository containing the solution code: