AWS Architecture Blog·June 9, 2026

Automating Medical Record Digitization with a Serverless FHIR Pipeline on AWS

This article details a serverless, event-driven architecture on AWS for automating the digitization of paper medical records into FHIR R4-compliant data. It outlines a pipeline using Amazon Bedrock Data Automation, AWS Lambda, S3, and HealthLake to extract, transform, and store clinical information, providing a blueprint for building scalable, interoperable healthcare data solutions.

Read original on AWS Architecture Blog

Healthcare organizations hold large volumes of unstructured paper medical records, which leads to care gaps and high manual data entry costs. The technical challenge is to transform scanned documents into standardized, interoperable health data without extensive custom machine learning development. The solution is a serverless pipeline on AWS that uses managed services to automate the whole process, from document ingestion to FHIR-compliant data storage.

Event-Driven Serverless Architecture Overview

The proposed architecture is fully event-driven and serverless, so it needs no polling loops or scheduled jobs. It combines Amazon S3, AWS Lambda, Amazon Bedrock Data Automation (BDA), and AWS HealthLake into a single pipeline. The stages are loosely coupled and scale independently, which matters when data volumes vary widely, as they do in enterprise environments.

Core Components and Their Roles

  • Amazon S3: Acts as the pipeline's backbone, serving as both the entry point for raw PDF documents and the intermediate storage layer between processing stages. S3 event notifications trigger subsequent Lambda functions.
  • Amazon Bedrock Data Automation (BDA): The intelligence layer responsible for extracting over 50 structured clinical fields from scanned PDFs. It uses a custom medical blueprint and returns confidence scores for each extraction, removing the need for custom ML models or templates.
  • AWS Lambda: The transformation layer, composed of two functions. The *BDA Trigger Lambda* initiates BDA jobs upon S3 events, while the *FHIR Processor Lambda* converts BDA's JSON output into FHIR R4 Bundles and triggers HealthLake imports. This separation of concerns allows for independent testing and replacement.
  • AWS HealthLake: The FHIR data store. It ingests NDJSON, validates resources against the FHIR R4 specification, establishes relationships between resources, indexes data for efficient querying, and exposes data via standard FHIR API endpoints.
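The transformation step described above can be sketched in a few lines. This is a minimal illustration, not the article's implementation: the BDA output shape (`{"field": {"value": ..., "confidence": ...}}`), the field names (`patient_last_name`, `diagnosis`, and so on), and the 0.8 confidence cutoff are all assumptions made for the example. For brevity it writes individual resources as NDJSON rather than wrapping them in a Bundle.

```python
import json
import uuid

# Assumed cutoff for illustration; a real pipeline would likely tune this per field.
CONFIDENCE_THRESHOLD = 0.8


def accepted_fields(bda_output, threshold=CONFIDENCE_THRESHOLD):
    """Keep only extracted values whose confidence meets the threshold."""
    return {
        name: field["value"]
        for name, field in bda_output.items()
        if field.get("value") and field.get("confidence", 0.0) >= threshold
    }


def build_patient(fields, patient_id):
    """Map hypothetical blueprint fields onto a FHIR R4 Patient resource."""
    patient = {"resourceType": "Patient", "id": patient_id}
    name = {}
    if "patient_last_name" in fields:
        name["family"] = fields["patient_last_name"]
    if "patient_first_name" in fields:
        name["given"] = [fields["patient_first_name"]]
    if name:
        patient["name"] = [name]
    if "date_of_birth" in fields:
        patient["birthDate"] = fields["date_of_birth"]  # expected as YYYY-MM-DD
    return patient


def build_condition(fields, patient_id):
    """Build a Condition that references the Patient, or None if no diagnosis."""
    if "diagnosis" not in fields:
        return None
    return {
        "resourceType": "Condition",
        "id": str(uuid.uuid4()),
        "subject": {"reference": f"Patient/{patient_id}"},
        "code": {"text": fields["diagnosis"]},
    }


def to_ndjson(resources):
    """Serialize resources as newline-delimited JSON, one resource per line."""
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in resources)
```

Dropping low-confidence fields instead of guessing keeps questionable extractions out of the FHIR store; a production pipeline might route them to human review instead.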

Scalability and Maintainability through Decoupling

The use of S3 event notifications to trigger Lambda functions ensures that each stage of the pipeline operates independently. This decoupling allows each component to scale automatically based on demand and simplifies maintenance, as changes in one part of the pipeline have minimal impact on others. This is a fundamental principle in designing resilient distributed systems.
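To make the trigger mechanism concrete, here is a hedged sketch of how a Lambda function might read an S3 event notification. The record layout (`Records[].s3.bucket.name`, `Records[].s3.object.key`) is the standard S3 notification format, and object keys arrive URL-encoded. The prefixes `incoming/` and `bda-output/` and the stage names are assumptions for illustration; the article does not specify them.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        if record.get("eventSource") != "aws:s3":
            continue  # skip records from other event sources
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in the event ("+" for spaces, "%2B" for "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def stage_for_key(key):
    """Decide which pipeline stage an object belongs to (hypothetical prefixes)."""
    lowered = key.lower()
    if key.startswith("incoming/") and lowered.endswith(".pdf"):
        return "start_bda_job"
    if key.startswith("bda-output/") and lowered.endswith(".json"):
        return "convert_to_fhir"
    return "ignore"
```

Keeping each stage's input and output under separate prefixes also helps avoid a function re-triggering itself on the objects it writes.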

Security and Infrastructure as Code

The entire infrastructure is provisioned as code using AWS CloudFormation, which makes deployments repeatable and version-controlled. Because the pipeline handles Protected Health Information (PHI), security is built into each layer: IAM roles enforce least-privilege permissions between services, AWS KMS encrypts HealthLake data at rest, and Amazon CloudWatch and AWS CloudTrail provide monitoring and audit trails.
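As a rough illustration of least privilege, the sketch below builds an IAM policy document for the FHIR Processor Lambda as a Python dictionary, the same JSON structure a CloudFormation template would embed. The bucket prefixes and statement IDs are hypothetical, and the action list is an assumption about what that function needs, not the article's actual policy. A real deployment would likely need further permissions, such as `iam:PassRole` for the role HealthLake uses to read the import files.

```python
import json


def fhir_processor_policy(bucket, datastore_arn):
    """Build a narrowly scoped policy document (hypothetical prefixes and Sids)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadBdaOutput",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/bda-output/*",
            },
            {
                "Sid": "WriteFhirNdjson",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/fhir-ndjson/*",
            },
            {
                "Sid": "StartHealthLakeImport",
                "Effect": "Allow",
                "Action": ["healthlake:StartFHIRImportJob"],
                "Resource": datastore_arn,
            },
        ],
    }


def _as_list(value):
    """IAM allows a single string or a list for Action and Resource."""
    return [value] if isinstance(value, str) else list(value)


def wildcard_statements(policy):
    """Return the Sids of statements that use '*' actions or a '*' resource."""
    flagged = []
    for stmt in policy["Statement"]:
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        broad_action = any(a == "*" or a.endswith(":*") for a in actions)
        broad_resource = any(r == "*" for r in resources)
        if broad_action or broad_resource:
            flagged.append(stmt.get("Sid", "<no Sid>"))
    return flagged
```

A check like `wildcard_statements` could run in a test suite so that overly broad grants are caught before a template is deployed.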

