This article explores common pitfalls and effective architectural strategies for transferring large datasets (e.g., millions of rows) from a backend service to a client. It details various techniques, including pagination, server-side streaming, batching, and advanced serialization formats like Protocol Buffers and Parquet, alongside compression methods, to optimize performance, memory usage, and network efficiency. The discussion highlights crucial trade-offs and considerations across database, network, and client layers.
Sending large volumes of data from a backend to a client is a common challenge in system design. A naive approach, such as fetching all data and serializing it into a single JSON response, can lead to severe performance bottlenecks and failures: high server memory consumption (OOM errors), long serialization times that block event loops, significant network latency, and client-side crashes caused by excessive parsing memory requirements.
Impact of a Single Large JSON Response
Attempting to send a million rows as a single JSON object can consume 1-2 GB of server RAM per request, block the event loop for tens of seconds, transfer 50-500 MB over the network (even compressed), and force client browsers to allocate 1-2 GB, leading to crashes or unresponsiveness. This approach simply does not scale.
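A quick back-of-the-envelope check makes these numbers concrete. The sketch below serializes a single hypothetical row (the field names are illustrative, not from the article) and extrapolates to a million rows, which lands squarely in the 50-500 MB range cited above:

```python
import json

# Hypothetical row shape; field names are illustrative assumptions.
row = {
    "id": 1,
    "name": "Ada Lovelace",
    "email": "ada@example.com",
    "created_at": "2024-01-01T00:00:00Z",
}

bytes_per_row = len(json.dumps(row).encode("utf-8"))
estimated_payload_mb = bytes_per_row * 1_000_000 / (1024 * 1024)
print(f"~{bytes_per_row} bytes/row -> ~{estimated_payload_mb:.0f} MB for 1M rows")
```

Note this only measures the serialized payload; the in-memory Python objects and the intermediate string built by `json.dumps` typically cost several times more, which is where the 1-2 GB of server RAM comes from.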
Pagination is the most fundamental strategy for avoiding sending all data at once: it breaks a large dataset into smaller, manageable chunks. The two primary types are offset-based pagination (LIMIT/OFFSET, simple but increasingly slow on deep pages because skipped rows are still scanned) and cursor-based (keyset) pagination, which resumes from the last-seen key value:
-- Cursor-Based Pagination Example
SELECT * FROM users WHERE id > 12345 ORDER BY id LIMIT 50;
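The same cursor-based query can drive a paging loop in application code. The following is a minimal sketch using an in-memory SQLite database (the `users` table, its columns, and the page size of 50 are illustrative assumptions); the client remembers the last `id` it saw and passes it back as the cursor for the next page:

```python
import sqlite3

def fetch_page(conn, after_id, limit=50):
    # Cursor-based pagination: the WHERE clause seeks past the last-seen id,
    # so the database never scans the skipped rows (unlike OFFSET).
    cur = conn.execute(
        "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, limit),
    )
    return cur.fetchall()

# Demo with an in-memory table (schema is illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [(f"user{i}",) for i in range(200)],
)

cursor_id, pages = 0, 0
while True:
    page = fetch_page(conn, cursor_id)
    if not page:
        break
    cursor_id = page[-1][0]  # the last id becomes the next cursor
    pages += 1

print(pages)  # 200 rows / 50 per page = 4 pages
```

Because each page starts with an indexed seek on `id`, page 10,000 costs the same as page 1; the trade-off is that clients can only walk forward (or backward) from a cursor, not jump to an arbitrary page number.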