AWS Architecture Blog·April 1, 2026

Designing a Scalable Computer Vision System for Workplace Safety Monitoring on AWS

This article outlines a serverless, event-driven architecture for a computer vision and generative AI-based workplace safety monitoring system. It details how to scale the solution to hundreds of sites and thousands of cameras, focusing on image collection, anonymization, training pipelines, and inference. Key system design considerations include data privacy, security, and continuous model improvement through feedback loops.



The architecture automates workplace safety monitoring with computer vision and generative AI. It addresses the limitations of manual safety audits by providing continuous, real-time oversight of personal protective equipment (PPE) compliance and zone-based hazard detection across many facilities. The design is serverless and event-driven so that it scales with the number of sites and cameras.

Core Architectural Principles

Serverless and Event-Driven: The architecture is built on AWS Lambda, Amazon S3, AWS Step Functions, and Amazon EventBridge. These services scale automatically with load, so image data from thousands of cameras can be processed efficiently and cost-effectively.
Distributed Across AWS Accounts: To ensure proper security, operational segregation, and data isolation, the solution is distributed across multiple AWS accounts. This includes separate accounts for training pipelines, image collection, the end-user web application, and analytics.
Privacy-Preserving: Amazon Rekognition and custom Python code blur human faces and other identifiable features immediately after image capture, before images are replicated or used for training or inference. This protects Personally Identifiable Information (PII).
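The anonymization step can be sketched as follows. This is a minimal illustration, not the article's implementation: it assumes faces are located with the Rekognition `detect_faces` call, whose bounding boxes are ratios of the image size, and it stands in for a real blur by replacing each face region with its mean value on a grayscale image held as nested lists. The helper names (`to_pixel_box`, `blur_region`, `anonymize`, `detect_face_boxes`) are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # left, top, right, bottom in pixels (right/bottom exclusive)


def to_pixel_box(bbox: Dict[str, float], width: int, height: int) -> Box:
    """Convert a ratio-based bounding box to pixel coordinates clamped to the image."""
    left = max(0, int(bbox["Left"] * width))
    top = max(0, int(bbox["Top"] * height))
    right = min(width, math.ceil((bbox["Left"] + bbox["Width"]) * width))
    bottom = min(height, math.ceil((bbox["Top"] + bbox["Height"]) * height))
    return left, top, right, bottom


def blur_region(pixels: List[List[int]], box: Box) -> None:
    """Overwrite the region with its mean value (a crude stand-in for a real blur)."""
    left, top, right, bottom = box
    region = [pixels[y][x] for y in range(top, bottom) for x in range(left, right)]
    if not region:
        return
    mean = sum(region) // len(region)
    for y in range(top, bottom):
        for x in range(left, right):
            pixels[y][x] = mean


def anonymize(pixels: List[List[int]], face_boxes: List[Dict[str, float]]) -> List[List[int]]:
    """Blur every detected face region in place and return the image."""
    height = len(pixels)
    width = len(pixels[0]) if pixels else 0
    for bbox in face_boxes:
        blur_region(pixels, to_pixel_box(bbox, width, height))
    return pixels


def detect_face_boxes(bucket: str, key: str) -> List[Dict[str, float]]:
    """Fetch face bounding boxes for an image stored in S3 (requires AWS credentials)."""
    import boto3  # imported lazily so the pure helpers above run without AWS access

    client = boto3.client("rekognition")
    response = client.detect_faces(Image={"S3Object": {"Bucket": bucket, "Name": key}})
    return [face["BoundingBox"] for face in response["FaceDetails"]]
```

A production version would operate on real image data with an imaging library and would also cover the other identifiable features the article mentions, which face detection alone does not find.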

Key System Components and Data Flows

The workflow begins when images are collected from site cameras and stored temporarily in a restricted Amazon S3 bucket for anonymization. The anonymized images are then replicated to S3 buckets in different accounts for training, inference, and the web application. Because anonymization happens before replication, only blurred images reach the downstream accounts.
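The article does not say which mechanism performs the cross-account copy. One plausible option is S3 Replication with one rule per destination account. The sketch below only builds the configuration dictionary that the boto3 S3 `put_bucket_replication` call accepts. The `anonymized/` prefix, the rule IDs, and the shape of the `destinations` argument are assumptions made for illustration.

```python
from typing import Any, Dict, List


def build_replication_config(role_arn: str, destinations: List[Dict[str, str]]) -> Dict[str, Any]:
    """Build one replication rule per destination bucket, each with a distinct priority."""
    rules = []
    for priority, dest in enumerate(destinations, start=1):
        rules.append(
            {
                "ID": f"replicate-to-{dest['name']}",
                "Priority": priority,
                "Status": "Enabled",
                # Hypothetical prefix: replicate only objects written after blurring.
                "Filter": {"Prefix": "anonymized/"},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": dest["bucket_arn"],
                    "Account": dest["account_id"],
                    # Hand object ownership to the destination account.
                    "AccessControlTranslation": {"Owner": "Destination"},
                },
            }
        )
    return {"Role": role_arn, "Rules": rules}


# Applying it would look like this (needs AWS credentials and versioned buckets):
#   boto3.client("s3").put_bucket_replication(
#       Bucket="source-bucket", ReplicationConfiguration=config)
```

S3 Replication requires versioning on both the source and destination buckets, and each destination account must grant the replication role permission to write.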

ML Training Pipeline

Ground Truth Generation: AWS Step Functions orchestrate Amazon SageMaker Ground Truth labeling jobs, triggered by Amazon EventBridge. This workflow integrates Zone User feedback and saved ML model predictions to prioritize data labeling for underperforming classes and cameras.
Data Processing and Storage: AWS Lambda transforms completed labeling jobs into a suitable format, storing metadata in Amazon DynamoDB and annotations in an S3 bucket.
Model Building and Promotion: Amazon SageMaker AI Pipelines execute model training workflows. When a model is approved, an EventBridge event is emitted, and an AWS Lambda function uses it to update the SageMaker AI endpoint via a CI/CD pipeline. This decouples model science from application deployments.
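The prioritization of labeling work toward underperforming classes and cameras could look roughly like the sketch below. It assumes feedback arrives as `(camera_id, class_label, prediction_was_correct)` tuples and ranks each camera and class pair by error rate. The record shape, the `min_samples` threshold, and the function name are all hypothetical, since the article does not describe the scoring logic.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (camera_id, class_label)


def rank_labeling_priorities(
    feedback: Iterable[Tuple[str, str, bool]], min_samples: int = 5
) -> List[Tuple[Key, float]]:
    """Return (camera, class) pairs sorted by descending error rate."""
    totals: Dict[Key, int] = defaultdict(int)
    errors: Dict[Key, int] = defaultdict(int)
    for camera_id, label, correct in feedback:
        key = (camera_id, label)
        totals[key] += 1
        if not correct:
            errors[key] += 1
    # Skip pairs with too little feedback to give a meaningful rate.
    ranked = [
        (key, errors[key] / totals[key]) for key in totals if totals[key] >= min_samples
    ]
    # Highest error rate first; ties are broken by key for a stable order.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

The pairs at the top of the ranking would then be the first candidates for the next labeling job.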

Inference Pipeline

Each safety use case has its own inference pipeline. Once an anonymized image lands in its dedicated S3 bucket, it triggers an Amazon SNS notification, initiating the hazard detection process. This continuous monitoring acts as a digital safety supervisor, distinguishing normal workflows from potential hazards.
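If a Lambda function is subscribed to that SNS topic (an assumption, since the article does not name the subscriber), the S3 event arrives wrapped inside the SNS message body as a JSON string. The sketch below unwraps it and returns the bucket and key of each new image. The `handler` return shape is illustrative only, and the actual hazard detection call is left out.

```python
import json
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus


def extract_s3_objects(event: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Unwrap SNS records and return (bucket, key) for every S3 object they announce."""
    objects: List[Tuple[str, str]] = []
    for record in event.get("Records", []):
        # The SNS message body is the S3 event notification serialized as JSON.
        message = json.loads(record["Sns"]["Message"])
        # S3 test notifications carry no "Records", so default to an empty list.
        for s3_record in message.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys are URL-encoded in S3 event notifications.
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            objects.append((bucket, key))
    return objects


def handler(event: Dict[str, Any], context: Any) -> Dict[str, List[str]]:
    """Lambda entry point: list the images that should go to hazard detection."""
    objects = extract_s3_objects(event)
    # A real pipeline would invoke the use case's model here (not shown).
    return {"images": [f"s3://{bucket}/{key}" for bucket, key in objects]}
```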


Scalability and Maintainability

The design leverages serverless services for automatic scaling and manages model updates through a well-defined CI/CD process. Decoupling the model training and deployment cycles ensures that data scientists can iterate on models without directly impacting the application code, and engineers can manage infrastructure changes independently.
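The hand-off from model approval to deployment can be sketched as a small Lambda handler. It assumes the EventBridge rule forwards SageMaker "Model Package State Change" events and that the CI/CD pipeline is an AWS CodePipeline pipeline. The article names neither, so the pipeline service, the `DEPLOY_PIPELINE_NAME` environment variable, and its default value are hypothetical, and the event field names should be checked against the SageMaker event documentation.

```python
import os
from typing import Any, Callable, Dict, Optional


def approved_model_package(event: Dict[str, Any]) -> Optional[str]:
    """Return the model package ARN if the event reports an approval, else None."""
    if event.get("source") != "aws.sagemaker":
        return None
    if event.get("detail-type") != "SageMaker Model Package State Change":
        return None
    detail = event.get("detail", {})
    if detail.get("ModelApprovalStatus") != "Approved":
        return None
    return detail.get("ModelPackageArn")


def start_deployment_pipeline(pipeline_name: str) -> None:
    """Start the deployment pipeline (assumes AWS CodePipeline; requires credentials)."""
    import boto3  # imported lazily so the event check can be tested offline

    boto3.client("codepipeline").start_pipeline_execution(name=pipeline_name)


def handler(
    event: Dict[str, Any],
    context: Any,
    start: Callable[[str], None] = start_deployment_pipeline,
) -> Dict[str, Any]:
    """Lambda entry point: start a deployment only for approved models."""
    arn = approved_model_package(event)
    if arn is None:
        return {"deployed": False}
    start(os.environ.get("DEPLOY_PIPELINE_NAME", "model-deploy"))
    return {"deployed": True, "model_package_arn": arn}
```

Because the handler reacts only to the approval event, data scientists can register and approve models without touching application code, which is the decoupling described above.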

Tags: computer vision, generative AI, AWS, serverless, event-driven, machine learning, workplace safety, system architecture


Architecture Design


Design a scalable, real-time workplace safety monitoring system using computer vision and generative AI, capable of processing video streams from thousands of cameras across hundreds of facilities. The system must include modules for image collection, PII anonymization, automated hazard detection (e.g., PPE compliance, zone violations), and continuous ML model training and deployment. Focus on architectural choices for high availability, low-latency inference, data privacy, and efficient feedback loops for model improvement.


Other design angles

· Design just the ML training and model promotion pipeline for a computer vision system, emphasizing data labeling, model evaluation, and seamless deployment to production endpoints.
· Design the real-time inference and alert generation component of a safety monitoring system, focusing on minimizing latency for hazard detection and integrating with existing incident response workflows.
· Design a multi-tenant platform for facility monitoring, allowing different organizations to configure their own safety rules, camera feeds, and user roles while ensuring data isolation and privacy.