This project is licensed under the MIT-0 License. See the LICENSE file for details.
The IDP Solution is a serverless application that automates the processing of identity documents using Amazon Bedrock's Claude-3 model. It provides intelligent document classification and data extraction capabilities, with specific optimization for birth certificate processing.
The solution leverages AWS services to create a scalable document processing pipeline. When users upload identity documents to an S3 bucket, the system automatically classifies the document type and, for birth certificates, extracts detailed information from the application forms. The extracted data is stored in DynamoDB tables for easy access and management. This automation significantly reduces manual data entry efforts and improves accuracy in document processing workflows.
This code is linked to an AWS blog post.
idp-genai/
├── lambda/ # Contains Lambda function implementations
│ ├── bedrock_invoker/ # Lambda function for document processing using Claude-3
│ │ └── invoke_bedrock_claude3.py
│ └── dynamodb_inserter/ # Lambda function for storing extracted data
│ └── insert_into_dynamodb.py
└── template.yml # CloudFormation template defining AWS infrastructure
- AWS Account with appropriate permissions
- AWS CLI installed and configured
- Python 3.12 or later
- Terraform CLI installed (see the Install Terraform guide)
- Make sure the desired foundation model (e.g., Anthropic Claude 3 Sonnet) is enabled in Bedrock. Refer to the AWS documentation to add access to foundation models; a quick access check is sketched after this list.
- S3 bucket permissions for document storage
- IAM permissions for creating and managing AWS resources
- Although this solution could work in multiple regions, it has only been tested in us-east-1, and using that region is recommended.
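To verify model access before deploying, a quick sanity check with boto3 (not part of the solution; the region is an assumption per the note above) lists the Anthropic models visible to your account:

# Sanity check: list Anthropic foundation models enabled in this region
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")
for model in bedrock.list_foundation_models(byProvider="Anthropic")["modelSummaries"]:
    print(model["modelId"])

If Claude 3 Sonnet does not appear in the output, request model access in the Bedrock console first.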
- Clone the repository:
git clone <repository-url>
cd idp-genai
- Make sure your AWS credentials are properly configured on the CLI.
- If multiple CLI profiles are configured, run this command to set the correct one:
export AWS_PROFILE=<profile-name>
- The credentials used in the CLI should have admin privileges (preferred) or adequate permissions to create and update the resources used in this solution.
- Init Terraform:
terraform init
- Review the resources that will be created by this code:
terraform plan
- Once you are ready, apply the changes:
terraform apply
- Deploy the CloudFormation stack:
sam deploy --guided
- Upload a birth certificate image to the S3 bucket:
aws s3 cp path/to/birth-certificate.jpeg s3://your-bucket-name/birth_certificates/images/
The system will automatically:
- Classify the document type
- Extract information (for birth certificates)
- Store the data in DynamoDB
- Query the extracted data (a Python equivalent is sketched after the example below):
aws dynamodb scan \
  --table-name BirthCertificates
Birth Certificate Data Extraction:
# Example of extracted data structure
{
"applicantDetails": {
"applicantName": "John Doe",
"dayPhoneNumber": "555-0123",
"address": "123 Main St"
},
"BirthCertificateDetails": {
"nameOnBirthCertificate": "John Doe",
"dateOfBirth": "2000-01-01",
"cityOfBirth": "Charleston"
}
}
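A minimal read-back sketch in Python, assuming the deployed table is named BirthCertificates (as in the scan command above) and that items follow the example structure; names may differ in your deployment:

# Scan the table and print a couple of fields from each extracted record
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("BirthCertificates")

for item in table.scan()["Items"]:
    details = item.get("applicantDetails", {})
    print(item.get("Id"), details.get("applicantName"))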
Image Processing Failures
- Error: "Invalid image format"
- Solution: Ensure images are in supported formats (JPEG, PNG, PDF)
- Check file permissions in the S3 bucket
Claude-3 Model Issues
- Error: "Model invocation failed"
- Enable CloudWatch logs for the Lambda function
- Verify Bedrock service permissions
- Check model quota limits
DynamoDB Insertion Errors
- Check DynamoDB table permissions
- Verify table schema matches data structure
- Monitor CloudWatch logs for error details (see the log-pull sketch after this section)
S3 bucket creation takes longer than 2 mins with "Still creating" message
- Make sure that you provided a globally unique bucket name
- Terminate the Terraform execution by pressing Ctrl+C and rerun with a globally unique S3 bucket name
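When debugging model invocation or DynamoDB insertion errors, pulling recent Lambda log events can help; a small sketch, where the log group name is an assumption based on the function name listed in Resources:

# Fetch the most recent log events for the Bedrock invoker Lambda
import boto3

logs = boto3.client("logs")
events = logs.filter_log_events(
    logGroupName="/aws/lambda/InvokeBedrockFunction",
    limit=50,
)
for event in events["events"]:
    print(event["message"], end="")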
The solution processes documents through a serverless pipeline that handles document classification and data extraction.
Upload → S3 Bucket (image files) → Lambda (Claude-3 processing) → Extraction (JSON output) → SQS Queue → Lambda → DynamoDB (storage)
Key Component Interactions:
- S3 triggers Lambda function on document upload
- Bedrock Lambda processes image using Claude-3
- Classification results stored directly in DynamoDB
- Extracted data sent to SQS queue
- Second Lambda processes queue messages
- Structured data stored in birth certificates table
- Error handling and retries managed by SQS
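A minimal sketch of the first Lambda (the bedrock_invoker component), illustrating the interactions above; the prompt text, model ID, and QUEUE_URL environment variable are assumptions, not the exact contents of invoke_bedrock_claude3.py:

# Sketch: S3-triggered Lambda that sends an image to Claude-3 on Bedrock
# and forwards the extraction result to SQS
import base64
import json
import os

import boto3

s3 = boto3.client("s3")
bedrock = boto3.client("bedrock-runtime")
sqs = boto3.client("sqs")

MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"
QUEUE_URL = os.environ["QUEUE_URL"]  # assumption: injected by the template

def handler(event, context):
    # S3 put event: fetch the uploaded image
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]
    image = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    # Ask Claude-3 to classify the document and extract fields as JSON
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/jpeg",
                            "data": base64.b64encode(image).decode()}},
                {"type": "text",
                 "text": "Classify this document. If it is a birth "
                         "certificate application, return the applicant "
                         "and certificate details as JSON."},
            ],
        }],
    })
    response = bedrock.invoke_model(modelId=MODEL_ID, body=body)
    extracted = json.loads(response["body"].read())["content"][0]["text"]

    # Hand the extracted JSON to the second Lambda via SQS
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=extracted)

Decoupling the model call from the database write through SQS lets the pipeline absorb bursts of uploads and retry failed insertions independently.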
- S3 Bucket (DocumentBucket)
  - Versioning enabled
  - Public access blocked
- DynamoDB Tables
  - BirthCertificatesTable (Primary key: Id)
- Lambda Functions
  - InvokeBedrockFunction (Python 3.12)
  - InsertDynamoDBFunction (Python 3.12)
- SQS Queue (ExtractedDataQueue)
  - 60-second visibility timeout
- IAM Roles
  - InvokeBedrockRole
    - Bedrock model invocation
    - S3 read access
    - SQS message publishing
  - InsertDynamoDBRole
    - DynamoDB write access
    - SQS message processing
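For completeness, a minimal sketch of the second Lambda (the dynamodb_inserter component); the table name and message shape are assumptions based on the resource list above and the extraction example, not the exact contents of insert_into_dynamodb.py:

# Sketch: SQS-triggered Lambda that writes extracted records to DynamoDB
import json
import uuid

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("BirthCertificates")  # assumed deployed table name

def handler(event, context):
    # The SQS event source mapping may deliver a batch of messages
    for record in event["Records"]:
        item = json.loads(record["body"])
        item["Id"] = str(uuid.uuid4())  # "Id" is the table's primary key
        table.put_item(Item=item)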
