AWS Architecture Blog · June 3, 2026

Achieving High Availability for Oracle Databases on AWS with FSx for ONTAP

This article outlines a cloud-native architecture for achieving high availability (HA) for Oracle databases on AWS. It combines Amazon FSx for NetApp ONTAP for shared storage, Amazon EC2 Auto Scaling groups for automated instance recovery, AWS Backup for consistent AMI creation, and AWS Lambda with Amazon EventBridge for orchestration. The approach reduces RTO and RPO, keeps host configuration consistent across replacements, and automates routine operational tasks for mission-critical Oracle workloads.


Introduction to Highly Available Oracle on AWS

Traditional high availability solutions for Oracle databases are often complex, costly, and prone to single points of failure. This architecture uses managed cloud services to address these problems, providing an automated and resilient setup for Oracle databases on AWS. The core idea is to decouple compute (EC2 instances) from storage (FSx for NetApp ONTAP) and to automate the lifecycle of the compute layer, including keeping the AMI that replacement instances boot from up to date.

Key Architectural Components and Interactions

  • Amazon FSx for NetApp ONTAP (FSxN): Provides Multi-AZ shared storage, synchronously replicating data across two Availability Zones. Oracle database files, software, and configurations therefore remain accessible even if an EC2 instance or an entire AZ fails.
  • Amazon EC2 Auto Scaling groups: Manage the lifecycle of the Oracle EC2 instances. When an instance fails, Auto Scaling automatically launches a replacement from the latest AMI, which then reconnects to the existing Oracle database files on FSxN.
  • AWS Backup: Captures the complete state of the Oracle EC2 instance by creating AMIs. This ensures that the operating system, Oracle software, patches, and configurations are consistently baked into a new AMI.
  • Amazon EventBridge & AWS Lambda: EventBridge detects the completion of AWS Backup jobs and triggers a Lambda function. The function extracts the new AMI ID from the backup recovery point, updates a Systems Manager Parameter Store parameter, and cleans up old AMIs, which together automate AMI updates.
  • AWS Systems Manager Parameter Store: Stores the current, validated AMI ID. Auto Scaling launch templates reference this parameter, ensuring that all newly launched instances use the most up-to-date Oracle host configuration.
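
The EventBridge-to-Lambda step in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the original article's code: the parameter name `/oracle/ha/ami-id` and the helper names are hypothetical, the event's `detail` is assumed to carry `state` and `backupJobId` fields, and the EC2 recovery point ARN is assumed to end in `image/ami-...`.

```python
import re

# AWS Backup recovery points for EC2 are AMIs; the ARN is assumed to look like
# arn:aws:ec2:<region>::image/ami-<hex>.
_AMI_ARN = re.compile(r"^arn:aws[\w-]*:ec2:[\w-]+::image/(ami-[0-9a-f]+)$")

PARAMETER_NAME = "/oracle/ha/ami-id"  # hypothetical parameter name


def ami_id_from_recovery_point_arn(arn: str) -> str:
    """Extract the AMI ID from an EC2 recovery point ARN."""
    match = _AMI_ARN.match(arn)
    if match is None:
        raise ValueError(f"not an EC2 image recovery point ARN: {arn!r}")
    return match.group(1)


def publish_ami(recovery_point_arn: str, ssm_client,
                parameter_name: str = PARAMETER_NAME) -> str:
    """Write the AMI ID into Parameter Store and return it."""
    ami_id = ami_id_from_recovery_point_arn(recovery_point_arn)
    ssm_client.put_parameter(
        Name=parameter_name, Value=ami_id, Type="String", Overwrite=True
    )
    return ami_id


def handler(event, context):
    """Lambda entry point for a backup job state change event (assumed shape)."""
    detail = event["detail"]
    if detail.get("state") != "COMPLETED":
        return None  # ignore jobs that have not finished successfully
    import boto3  # imported lazily so the helpers above work without AWS access
    job = boto3.client("backup").describe_backup_job(
        BackupJobId=detail["backupJobId"]
    )
    return publish_ami(job["RecoveryPointArn"], boto3.client("ssm"))
```

With a parameter like this in place, a launch template can set its image ID to `resolve:ssm:/oracle/ha/ami-id`, so each new launch picks up whatever AMI ID the function last wrote. Any validation of the AMI before publishing is left out of this sketch.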

How FSxN Enables Oracle HA

Unlike EBS volumes, which are tied to a single AZ, an FSxN Multi-AZ file system keeps data persistent and available across AZs. When an EC2 instance is replaced, the new instance can log in to the existing iSCSI LUNs on FSxN and access the Oracle database files without a data restore. Multipath I/O must be configured on the host so that storage connectivity is maintained during an AZ failover.
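
As a rough illustration of the reconnect step, a replacement instance could build its iSCSI discovery and login commands as shown below, for example from a boot-time script. This is a sketch only: the function name is hypothetical, the IQN and portal addresses are placeholders, and the multipath daemon itself is assumed to be configured separately (for instance, baked into the AMI).

```python
def iscsi_login_commands(target_iqn: str, portals: list[str]) -> list[list[str]]:
    """Build iscsiadm commands that discover and log in through every portal.

    Logging in through more than one portal gives the multipath layer
    several paths to the same LUNs.
    """
    if len(portals) < 2:
        raise ValueError("multipath needs at least two iSCSI portals")
    commands = []
    # Discover the target through each portal first.
    for portal in portals:
        commands.append(
            ["iscsiadm", "--mode", "discovery", "--type", "sendtargets",
             "--portal", portal]
        )
    # Then open one session per portal.
    for portal in portals:
        commands.append(
            ["iscsiadm", "--mode", "node", "--targetname", target_iqn,
             "--portal", portal, "--login"]
        )
    return commands


if __name__ == "__main__":
    # Placeholder IQN and addresses, not values from the article.
    for command in iscsi_login_commands(
        "iqn.1992-08.com.netapp:sn.example:vs.3", ["10.0.1.10", "10.0.2.10"]
    ):
        print(" ".join(command))
```

The commands are returned as argument lists rather than executed, so they can be inspected or passed to `subprocess.run` by a script running as root.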

Automated AMI Updates and Instance Recovery

The combination of AWS Backup, EventBridge, Lambda, and Parameter Store creates a self-healing and continuously updated system. Regular backups create fresh AMIs, which are then automatically registered for use by the Auto Scaling group. As a result, any Oracle instance launched after a failure or scaling event starts with the most recently captured software and configuration, which shortens recovery and reduces operational burden.
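
The AMI cleanup part of this loop needs a retention rule. The source does not say which rule is used, so the following is one possible sketch: keep the newest few AMIs, never select the AMI currently published in Parameter Store, and return the rest as candidates for removal. The function name and the default of three retained images are assumptions; the input dictionaries use the `ImageId` and `CreationDate` keys found in EC2 image descriptions.

```python
from typing import Iterable


def select_stale_amis(images: Iterable[dict], current_ami_id: str,
                      keep: int = 3) -> list[str]:
    """Return the IDs of AMIs outside the retention window, oldest first."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # ISO 8601 UTC timestamps in a single format sort correctly as strings.
    newest_first = sorted(
        images, key=lambda image: image["CreationDate"], reverse=True
    )
    stale = [
        image["ImageId"]
        for image in newest_first[keep:]
        if image["ImageId"] != current_ami_id  # never remove the AMI in use
    ]
    return stale[::-1]


if __name__ == "__main__":
    # Illustrative data only.
    sample = [
        {"ImageId": "ami-1", "CreationDate": "2026-05-01T02:00:00.000Z"},
        {"ImageId": "ami-2", "CreationDate": "2026-05-08T02:00:00.000Z"},
        {"ImageId": "ami-3", "CreationDate": "2026-05-15T02:00:00.000Z"},
        {"ImageId": "ami-4", "CreationDate": "2026-05-22T02:00:00.000Z"},
    ]
    print(select_stale_amis(sample, current_ami_id="ami-4", keep=2))
```

Only the selection is shown here. How the selected images are actually removed (for example, by deleting the corresponding recovery points) depends on how the backups are managed.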

Target SLOs

This architecture aims for a Recovery Time Objective (RTO) of 2–5 minutes, with the recovered instance running the latest Oracle configuration, and a near-zero Recovery Point Objective (RPO) through the synchronous Multi-AZ replication provided by FSxN.

