This article details the architecture of Doczy.ai, an intelligent contract interpretation solution built on AWS, leveraging generative AI to transform unstructured legal documents into structured, actionable insights. It highlights the use of AWS services like S3, Lambda, Textract, and large language models, alongside proprietary "smart chunking" and dual clustering algorithms, to achieve high accuracy and scalability in document processing.
Read original on AWS Architecture Blog
Doczy.ai addresses the significant challenge of extracting critical business information from large volumes of unstructured legal documents. Traditional manual or rules-based approaches are unscalable, error-prone, and inefficient. Doczy.ai uses a cloud-native, AI-driven approach to automate this process, achieving high accuracy and delivering substantial cost savings and operational efficiencies for clients in healthcare and financial services.
The Doczy.ai platform is built entirely on AWS, designed to manage the full document processing lifecycle. It processes documents from user upload through intelligent extraction, semantic and structural analysis, and finally, structured data generation for business intelligence. Key AWS services form the backbone of this scalable and secure architecture.
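The first hop of such a lifecycle, from upload to text extraction, might look like the following minimal sketch. It assumes an S3 upload notification triggers a Lambda function that starts an asynchronous Textract job; the source names these services but does not describe how they are wired together, so the `handler` function and its injectable `textract` parameter are hypothetical and shown only for illustration.

```python
from urllib.parse import unquote_plus


def handler(event, context=None, textract=None):
    """Start an asynchronous Textract job for each object in an S3 event.

    Hypothetical wiring: the article does not specify Doczy.ai's handler.
    """
    if textract is None:
        # Imported lazily so the function can be exercised with a stub client.
        import boto3
        textract = boto3.client("textract")

    job_ids = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 event notifications URL-encode object keys (spaces arrive as "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        response = textract.start_document_text_detection(
            DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
        )
        job_ids.append(response["JobId"])
    return {"job_ids": job_ids}
```

Passing the client as a parameter keeps the handler testable without AWS credentials. A production version would also need to collect the job results, for example through a completion notification, before any downstream analysis can run.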
System Design Lessons
This architecture demonstrates a robust pattern for building AI-powered document processing systems. It highlights the importance of using specialized services (such as Textract for OCR), applying proprietary algorithms for data preparation (smart chunking), and combining different AI approaches (embeddings for semantics, pattern recognition for structure, LLMs for generation) to achieve high accuracy on complex unstructured data. Iterative prompting with a feedback loop is a key technique for improving LLM performance in a specific domain.
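Iterative prompting with a feedback loop can be sketched generically as follows. This is not Doczy.ai's implementation, which the source does not describe. The function `extract_with_feedback`, the `generate` callable (standing in for an LLM call), and the `validate` callable (standing in for domain checks) are all hypothetical names introduced for illustration.

```python
from typing import Callable, List, Optional, Tuple


def extract_with_feedback(
    base_prompt: str,
    generate: Callable[[str], str],
    validate: Callable[[str], List[str]],
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Re-prompt until validate() reports no problems or rounds run out.

    Returns (accepted_output_or_None, rounds_used).
    """
    prompt = base_prompt
    for round_number in range(1, max_rounds + 1):
        output = generate(prompt)
        problems = validate(output)
        if not problems:
            return output, round_number
        # Feed the validator's complaints back into the next prompt.
        feedback = "\n".join(f"- {p}" for p in problems)
        prompt = (
            f"{base_prompt}\n\nPrevious answer:\n{output}\n\n"
            f"Fix these problems and answer again:\n{feedback}"
        )
    return None, max_rounds
```

Capping the number of rounds bounds both cost and latency. Returning `None` when the cap is reached lets the caller route the document to manual review rather than accept an unvalidated answer.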
Doczy.ai has processed 2.5 million contracts (50 million pages) and handled billions of tokens across 137 million API calls to Amazon Bedrock, demonstrating its production readiness and scalability. This approach has led to a 97% reduction in manual processing time and significant financial savings for clients.