Menu
Dev.to #systemdesign·March 31, 2026

Optimizing Combinatorial Data Generation from O(10^7) to SQL Joins

This article details a system design problem involving the generation of valid phone numbers based on complex regulatory rules. It highlights a common pitfall in algorithmic design where an initial O(N^M) iterative approach, suitable for small datasets, fails dramatically at scale. The core system design lesson is the transformation of a compute-intensive algorithmic problem into a data-centric problem, leveraging the power of relational databases for efficient set-based operations.

Read original on Dev.to #systemdesign

The Problem: Generating Valid Phone Numbers at Scale

The objective was to generate lists of valid 10-digit phone numbers for specific cities in Mexico, adhering to IFT (Instituto Federal de Telecomunicaciones) regulations. These regulations involve prefixes (2-4 digits) tied to localities and state, and complex rules dictating which suffixes are valid for the remaining 6-8 digits. The challenge was to efficiently generate all valid combinations for a given city prefix.

Initial Approach: Iterative Brute-Force

The first implementation relied on deeply nested `for` loops. For each digit position, it would iterate through all possible values (0-9) and apply `if` conditions to filter invalid combinations based on IFT rules. In the worst case, for cities with 3-digit prefixes (leaving 7 digits to generate), this resulted in a time complexity of O(10^7) just for combination generation, before even applying the filtering logic. While this worked for small cities, it took hours for large cities like CDMX or Monterrey, pushing local systems to memory and CPU limits.

⚠️

Scalability Trap

This scenario highlights a crucial system design lesson: an algorithm that is logically correct can be an architectural disaster if its complexity isn't considered against expected data scale. Relying on iterative, in-memory processing for combinatorial problems quickly becomes a bottleneck.

The Architectural Shift: Data-Centric Approach

The key insight was recognizing that the problem was not a pure algorithmic puzzle best solved with imperative code, but rather a data combination and filtering problem. Relational database engines are highly optimized for this kind of set-based operation. The paradigm shifted from "generating combinations with code" to "modeling rules as data and letting the database perform the joins and filtering."

Solution: SQL with Temporary Tables and Joins

The refined solution involved transforming the IFT rules from `if` statements into database tables (often temporary). Each table represented valid digits for a specific position, conditioned on the state and locality. The generation of valid phone numbers then became a series of `JOIN` operations across these tables, efficiently combining the valid segments and filtering out invalid ones. This leverages the database's optimized query planners and indexing capabilities, drastically reducing computation time from hours to seconds or minutes.

sql
 -- Simplified example of rule conversion and combination
 CREATE TEMPORARY TABLE valid_prefix_cdmx AS
 SELECT prefix FROM ift_rules WHERE city = 'CDMX';

 CREATE TEMPORARY TABLE valid_suffix_part1 AS
 SELECT suffix_digits FROM ift_rules_suffix WHERE position = 1;

 CREATE TEMPORARY TABLE valid_suffix_part2 AS
 SELECT suffix_digits FROM ift_rules_suffix WHERE position = 2;

 SELECT
   p.prefix || s1.suffix_digits || s2.suffix_digits
 FROM
   valid_prefix_cdmx p
 JOIN
   valid_suffix_part1 s1 ON true -- or relevant join condition
 JOIN
   valid_suffix_part2 s2 ON true -- or relevant join condition
 WHERE
   LENGTH(p.prefix || s1.suffix_digits || s2.suffix_digits) = 10;

This approach demonstrates a powerful technique in system design: recognizing when an algorithmic problem can be reframed as a data manipulation task, allowing the use of specialized, highly optimized tools like database systems to achieve superior performance and scalability.

SQLDatabase OptimizationAlgorithmic ComplexityScalabilityData GenerationCombinatorial ProblemPerformance TuningSystem Design Patterns

Comments

Loading comments...