This article explores common pitfalls and effective architectural strategies for transferring large datasets (e.g., millions of rows) from a backend service to a client. It details various techniques, including pagination, server-side streaming, batching, and advanced serialization formats like Protocol Buffers and Parquet, alongside compression methods, to optimize performance, memory usage, and network efficiency. The discussion highlights crucial trade-offs and considerations across database, network, and client layers.
Sending large volumes of data from a backend to a client is a common challenge in system design. A naive approach, such as fetching all data and serializing it into a single JSON response, can lead to severe performance bottlenecks and failures: high server memory consumption (OOM errors), long serialization times that block event loops, significant network latency, and client-side crashes caused by excessive parsing memory requirements.
Impact of a Single Large JSON Response
Attempting to send a million rows as a single JSON object can consume 1-2 GB of server RAM per request, block the event loop for tens of seconds, transfer 50-500 MB over the network (even compressed), and force client browsers to allocate 1-2 GB, leading to crashes or unresponsiveness. This approach simply does not scale.
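A quick back-of-the-envelope check makes these numbers concrete. The sketch below serializes a single hypothetical row (the field names are illustrative, not from the article) and extrapolates to a million rows, which lands squarely in the 50-500 MB range cited above:

```python
import json

# Hypothetical row shape; field names are illustrative assumptions.
row = {
    "id": 1,
    "name": "Ada Lovelace",
    "email": "ada@example.com",
    "created_at": "2024-01-01T00:00:00Z",
}

bytes_per_row = len(json.dumps(row).encode("utf-8"))
estimated_payload_mb = bytes_per_row * 1_000_000 / (1024 * 1024)
print(f"~{bytes_per_row} bytes/row -> ~{estimated_payload_mb:.0f} MB for 1M rows")
```

Note this only measures the serialized payload; the in-memory Python objects and the intermediate string built by `json.dumps` typically cost several times more, which is where the 1-2 GB of server RAM comes from.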
Pagination is the most fundamental strategy for avoiding sending all data at once: it breaks a large dataset into smaller, manageable chunks. The two primary types are offset-based pagination (LIMIT/OFFSET, simple but increasingly slow on deep pages because skipped rows are still scanned) and cursor-based (keyset) pagination, which resumes from the last-seen key value:
-- Cursor-Based Pagination Example
SELECT * FROM users WHERE id > 12345 ORDER BY id LIMIT 50;
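The same cursor-based query can drive a paging loop in application code. The following is a minimal sketch using an in-memory SQLite database (the `users` table, its columns, and the page size of 50 are illustrative assumptions); the client remembers the last `id` it saw and passes it back as the cursor for the next page:

```python
import sqlite3

def fetch_page(conn, after_id, limit=50):
    # Cursor-based pagination: the WHERE clause seeks past the last-seen id,
    # so the database never scans the skipped rows (unlike OFFSET).
    cur = conn.execute(
        "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, limit),
    )
    return cur.fetchall()

# Demo with an in-memory table (schema is illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [(f"user{i}",) for i in range(200)],
)

cursor_id, pages = 0, 0
while True:
    page = fetch_page(conn, cursor_id)
    if not page:
        break
    cursor_id = page[-1][0]  # the last id becomes the next cursor
    pages += 1

print(pages)  # 200 rows / 50 per page = 4 pages
```

Because each page starts with an indexed seek on `id`, page 10,000 costs the same as page 1; the trade-off is that clients can only walk forward (or backward) from a cursor, not jump to an arbitrary page number.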