Object Storage (S3 Pattern)

How object storage works: S3-compatible APIs, the consistency model, versioning, lifecycle policies, and when to use it instead of block or file storage.

Three Storage Paradigms

Before diving into object storage, it's important to understand how it differs from the other two storage paradigms systems engineers work with.

Paradigm	Access Model	Examples	Best For
Block Storage	Fixed-size blocks via disk interface (like a hard drive)	AWS EBS, GCP Persistent Disk	OS filesystems, databases, VMs — anything needing random read/write access
File Storage (NAS)	Hierarchical filesystem with directory tree	AWS EFS, NFS, Samba	Shared filesystems accessed by multiple servers simultaneously
Object Storage	Flat namespace of objects via HTTP API (GET/PUT/DELETE)	AWS S3, GCS, Azure Blob	Large unstructured data: images, videos, backups, logs, ML datasets

How Object Storage Works

Object storage has no directory hierarchy. It is a flat namespace where each object is identified by a bucket name plus an object key (a string that can look like a path: `photos/user-42/avatar.jpg`). Objects are immutable once written: there is no in-place partial update, so to change a file you overwrite the entire object. This immutability simplifies replication, which helps make possible the durability and scalability S3 is known for.
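To make the flat namespace concrete, here is a minimal in-memory sketch. `ToyBucket` is a hypothetical stand-in, not a real client; its `list` method loosely imitates the prefix and delimiter parameters that S3-style list calls expose, which is how a flat set of keys can be presented as folders.

```python
from typing import Dict, List, Tuple


class ToyBucket:
    """In-memory stand-in for a bucket: a flat dict from key string to bytes."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> None:
        # A PUT replaces the whole object; there is no partial, in-place update.
        self._objects[key] = body

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list(self, prefix: str = "", delimiter: str = "") -> Tuple[List[str], List[str]]:
        """Return (keys, common_prefixes) for keys that start with `prefix`.

        With a delimiter, keys that contain it after the prefix are rolled up
        into a single "common prefix" entry, which makes keys look like folders.
        """
        keys: List[str] = []
        common = set()
        for key in sorted(self._objects):
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if delimiter and delimiter in rest:
                common.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
            else:
                keys.append(key)
        return keys, sorted(common)


bucket = ToyBucket()
bucket.put("photos/user-42/avatar.jpg", b"v1")
bucket.put("photos/user-42/banner.jpg", b"v1")
bucket.put("photos/user-7/avatar.jpg", b"v1")
bucket.put("readme.txt", b"hello")

# "Top level" listing: one real key plus one rolled-up pseudo-folder.
top_keys, top_prefixes = bucket.list(delimiter="/")
# Listing "inside" a pseudo-folder is just a prefix match on the key string.
user_keys, _ = bucket.list(prefix="photos/user-42/", delimiter="/")
```

Nothing in the store knows about directories; the folder view is computed at list time from the key strings.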

S3 internal architecture: stateless frontends route to replicated data nodes based on metadata lookup

AWS S3 is designed for 11 nines of durability (99.999999999%), which S3 Standard achieves by storing each object redundantly across at least three Availability Zones. A write is acknowledged only after the object has been stored redundantly. S3 Standard is also designed for 99.99% availability, which is distinct from durability. Durability means the data still exists; availability means you can access it right now.
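A quick back-of-the-envelope calculation shows how different the two figures are. This sketch assumes the durability number is read as a per-object, per-year probability and that the availability number applies over a full year; the function names are illustrative.

```python
from fractions import Fraction

# Exact fractions avoid floating-point noise when subtracting numbers this close to 1.
DURABILITY = Fraction("0.99999999999")  # 11 nines, read as per object per year
AVAILABILITY = Fraction("0.9999")       # design target quoted for S3 Standard


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year for a given object count."""
    return object_count * (1 - DURABILITY)


def expected_downtime_minutes_per_year() -> Fraction:
    """Minutes per year the service may be unreachable at the availability target."""
    minutes_per_year = 365 * 24 * 60
    return minutes_per_year * (1 - AVAILABILITY)


losses = expected_annual_losses(10_000_000)
years_per_lost_object = 1 / losses
downtime = float(expected_downtime_minutes_per_year())
```

With ten million objects, the expected loss works out to one object every 10,000 years, while the availability target still allows roughly 52.56 minutes per year in which requests may fail even though no data is lost.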

Consistency Model

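Since December 2020, Amazon S3 has provided strong read-after-write consistency for GET, PUT, DELETE, and LIST operations. Before that, overwrites, deletes, and listings were eventually consistent, which caused surprising bugs: you could overwrite an object and then read the old version, or PUT a new object and not see it in a LIST. A GET immediately after a first-time PUT could even return a 404 if the key had been requested before it existed. Under the current model, a successful PUT is immediately visible to subsequent GETs and LISTs, a significant usability improvement.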
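The difference between the two models can be illustrated with a toy store. This is a deliberately simplified sketch, not a description of how S3 is built: `ToyReplicatedStore`, its primary/replica split, and the `replicate` step are all assumptions made for the illustration.

```python
from typing import Dict, Optional


class ToyReplicatedStore:
    """Toy model: writes land on a primary; reads are served by a replica."""

    def __init__(self, strong: bool) -> None:
        self.strong = strong
        self._primary: Dict[str, bytes] = {}
        self._replica: Dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> None:
        self._primary[key] = body
        if self.strong:
            # Strong model: the write is not acknowledged until readers can see it.
            self._replica[key] = body

    def replicate(self) -> None:
        # Eventual model: the replica catches up some time after the write.
        self._replica = dict(self._primary)

    def get(self, key: str) -> Optional[bytes]:
        return self._replica.get(key)


# Eventual consistency: an overwrite is acknowledged, but a read still sees the old data.
eventual = ToyReplicatedStore(strong=False)
eventual.put("report.csv", b"v1")
eventual.replicate()
eventual.put("report.csv", b"v2")
stale_read = eventual.get("report.csv")      # old version until replication runs
eventual.replicate()
converged_read = eventual.get("report.csv")  # new version after replication

# Strong consistency: the overwrite is visible to the very next read.
strong = ToyReplicatedStore(strong=True)
strong.put("report.csv", b"v1")
strong.put("report.csv", b"v2")
strong_read = strong.get("report.csv")
```

In the eventual model, application code had to tolerate or work around the stale read; in the strong model, the upload-then-process workflow needs no extra coordination.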

ℹ️

S3 Eventual Consistency History

If you see older system design resources warning about S3's eventual consistency for overwrites, note this was fixed in 2020. Current S3 offers strong consistency for all operations. This matters for patterns like: upload a file, then immediately process it — that workflow is now safe without extra synchronization.

Key Features for System Design

Versioning

Enable versioning on a bucket to retain every version of every object. Deletes become delete markers — the object isn't physically removed. You can restore any previous version. This is essential for backup systems, data pipelines where source data must be reprocessable, and compliance requirements.
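The behavior described above can be sketched with a small in-memory model. `ToyVersionedBucket` is hypothetical, and the sequential version IDs (`v1`, `v2`, ...) are an assumption for readability; real version IDs are opaque strings.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Version:
    version_id: str
    body: Optional[bytes]  # None marks a delete marker


class ToyVersionedBucket:
    """Keeps every version of every key; deletes only add a marker."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Version]] = {}
        self._ids = itertools.count(1)

    def _append(self, key: str, body: Optional[bytes]) -> str:
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append(Version(version_id, body))
        return version_id

    def put(self, key: str, body: bytes) -> str:
        return self._append(key, body)

    def delete(self, key: str) -> str:
        # A plain delete stacks a delete marker on top; older versions remain.
        return self._append(key, None)

    def get(self, key: str, version_id: Optional[str] = None) -> Optional[bytes]:
        history = self._versions.get(key, [])
        if version_id is None:
            # Latest version wins; a delete marker on top reads as "not found".
            return history[-1].body if history else None
        for version in history:
            if version.version_id == version_id:
                return version.body
        return None

    def restore(self, key: str, version_id: str) -> str:
        # Restore by copying an old version forward as the newest version.
        body = self.get(key, version_id)
        if body is None:
            raise KeyError(f"no restorable version {version_id!r} for {key!r}")
        return self.put(key, body)


bucket = ToyVersionedBucket()
first = bucket.put("config.json", b"{}")
second = bucket.put("config.json", b'{"debug": true}')
bucket.delete("config.json")
after_delete = bucket.get("config.json")      # hidden by the delete marker
old_copy = bucket.get("config.json", first)   # still retrievable by version ID
bucket.restore("config.json", second)
after_restore = bucket.get("config.json")
```

Because nothing is physically removed, every retained version keeps consuming storage, which is one reason versioning is usually paired with lifecycle rules.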

Lifecycle Policies

Lifecycle policies automate cost optimization. You define rules like: transition objects to S3 Infrequent Access after 30 days (cheaper storage, higher retrieval cost), transition to S3 Glacier after 90 days (very cheap, minutes-to-hours retrieval), and expire (delete) objects after 365 days. This automatically tiers data without manual intervention.

json

{
  "Rules": [
    {
      "ID": "log-lifecycle",
      "Filter": { "Prefix": "" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
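To see what the rule above does to a single object over time, the sketch below evaluates it for a given object age. It assumes objects start in the STANDARD class and that a transition applies once the age in days reaches the rule's `Days` value; `storage_class_at` is an illustrative helper, not part of any SDK, and the real service applies lifecycle actions asynchronously, so actual timing may lag.

```python
from typing import Optional

LIFECYCLE_RULE = {
    "ID": "log-lifecycle",
    "Status": "Enabled",
    "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"},
    ],
    "Expiration": {"Days": 365},
}


def storage_class_at(age_days: int, rule: dict = LIFECYCLE_RULE) -> Optional[str]:
    """Return the storage class at a given age, or None once the object has expired."""
    if rule.get("Status") != "Enabled":
        return "STANDARD"  # a disabled rule changes nothing
    expiration = rule.get("Expiration", {}).get("Days")
    if expiration is not None and age_days >= expiration:
        return None
    current = "STANDARD"
    # Walk transitions in age order; the last one whose threshold is reached wins.
    for transition in sorted(rule.get("Transitions", []), key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


timeline = {age: storage_class_at(age) for age in (0, 29, 30, 89, 90, 364, 365)}
```

The resulting timeline shows the object in STANDARD until day 29, STANDARD_IA from day 30, GLACIER from day 90, and gone from day 365.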

Pre-Signed URLs

A pre-signed URL grants temporary access to a private S3 object without requiring the recipient to have AWS credentials. The server signs a URL with its own credentials and an expiry time (e.g., 15 minutes), scoped to one operation on one object key. The client uses this URL to upload or download directly from S3. This is the standard pattern for user file uploads: the client asks your server only for the URL, and the file bytes never pass through your server, which reduces server load and cost.

Pre-signed URL pattern: client uploads directly to S3, bypassing your application server
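The idea behind a pre-signed URL can be sketched with a plain HMAC. This is a simplified illustration of the concept, not AWS Signature Version 4: the secret, the endpoint `storage.example.com`, the query parameter names, and the `presign` and `verify` helpers are all hypothetical. In practice you would let the SDK produce the URL, for example with boto3's `generate_presigned_url`.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"server-side-secret"          # hypothetical; never sent to the client
BASE_URL = "https://storage.example.com"    # hypothetical storage endpoint


def _signature(method: str, path: str, expires_at: int) -> str:
    # The signature binds the HTTP method, the object path, and the expiry together.
    message = f"{method}\n{path}\n{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def presign(method: str, bucket: str, key: str,
            ttl_seconds: int = 900, now: Optional[float] = None) -> str:
    """Application server side: build a URL valid for one method on one key."""
    now = time.time() if now is None else now
    expires_at = int(now) + ttl_seconds
    path = f"/{bucket}/{key}"  # assumes the key needs no URL escaping
    query = urlencode({"expires": expires_at,
                       "signature": _signature(method, path, expires_at)})
    return f"{BASE_URL}{path}?{query}"


def verify(method: str, url: str, now: Optional[float] = None) -> bool:
    """Storage service side: accept only if the URL is unexpired and untampered."""
    now = time.time() if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    expected = _signature(method, parsed.path, expires_at)
    return hmac.compare_digest(signature, expected)


issued_at = 1_700_000_000
url = presign("PUT", "user-uploads", "user-42/avatar.jpg", ttl_seconds=900, now=issued_at)
upload_ok = verify("PUT", url, now=issued_at + 60)
wrong_method = verify("GET", url, now=issued_at + 60)
expired = verify("PUT", url, now=issued_at + 901)
tampered = verify("PUT", url.replace("avatar.jpg", "other.jpg"), now=issued_at + 60)
```

The client only ever holds the URL, so it can perform exactly the operation that was signed, on exactly that key, until the expiry passes.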

Object Storage Anti-Patterns

Using S3 as a database: S3 has no general query capability. LIST can filter only by key prefix, so finding objects by metadata means listing every key and inspecting each object. Use a database (e.g., DynamoDB) to store metadata and object keys, and S3 for the actual blobs.
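Hot key prefix: S3 partitions its index by key prefix and applies request-rate limits per prefix (AWS documents at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix). If all your traffic hits one prefix (e.g., a date `2024-01-15/`), that prefix becomes the bottleneck. Since 2018 S3 scales partitions automatically, so randomized prefixes are no longer required for most workloads, but spreading keys across several prefixes still helps very high-throughput workloads.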
Storing small files: S3 charges per request, and some storage classes (such as Standard-IA) bill a minimum object size. Millions of tiny files (< 1 KB) therefore cost more to operate than fewer larger files. Aggregate small files into batches before storing.
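The first anti-pattern's remedy, keeping metadata in a database and blobs in object storage, can be sketched as follows. In-memory SQLite stands in for the metadata database (the text suggests DynamoDB) and a plain dict stands in for the bucket; the table layout and the helper names `save_object` and `find_keys` are assumptions for illustration.

```python
import sqlite3
from typing import Dict, List

# In-memory SQLite stands in for the metadata database.
db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE object_index (
        object_key   TEXT PRIMARY KEY,
        owner_id     INTEGER NOT NULL,
        content_type TEXT NOT NULL,
        size_bytes   INTEGER NOT NULL
    )
    """
)
db.execute("CREATE INDEX idx_owner_type ON object_index (owner_id, content_type)")

blob_store: Dict[str, bytes] = {}  # stand-in for the bucket: key -> bytes


def save_object(key: str, owner_id: int, content_type: str, body: bytes) -> None:
    # Write the blob first, then the index row, so the index never
    # points at an object that does not exist yet.
    blob_store[key] = body
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO object_index VALUES (?, ?, ?, ?)",
            (key, owner_id, content_type, len(body)),
        )


def find_keys(owner_id: int, content_type: str) -> List[str]:
    # The metadata query hits the database index; the object store is never listed.
    rows = db.execute(
        "SELECT object_key FROM object_index "
        "WHERE owner_id = ? AND content_type = ? ORDER BY object_key",
        (owner_id, content_type),
    )
    return [row[0] for row in rows]


save_object("photos/user-42/avatar.jpg", 42, "image/jpeg", b"jpeg-bytes")
save_object("docs/user-42/resume.pdf", 42, "application/pdf", b"pdf-bytes")
save_object("photos/user-7/avatar.jpg", 7, "image/jpeg", b"jpeg-bytes")

jpeg_keys = find_keys(42, "image/jpeg")
avatar_bytes = blob_store[jpeg_keys[0]]  # fetch the blob by key once it is found
```

The database answers "which objects match?" and the object store answers "give me the bytes for this key", so neither system is used for a job it does poorly.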

💡

Interview Tip

Object storage appears in almost every large system design. When you mention 'users upload photos' or 'we store video files', immediately follow up with the pre-signed URL pattern: 'The client requests a pre-signed URL from our API server, then uploads directly to S3. This keeps large binary transfers off our application servers and dramatically reduces bandwidth costs.' Bonus points: mention CDN in front of S3 for read-heavy content (user avatars, thumbnails).
