Menu
GitHub Engineering·May 13, 2025

Rebuilding GitHub Issues Search with Nested Queries and Boolean Operators

GitHub re-engineered its Issues search to support complex nested queries and boolean operators. This involved transitioning from a flat query parser to an Abstract Syntax Tree (AST)-based approach and mapping these structures to Elasticsearch's boolean query capabilities, all while ensuring backward compatibility and maintaining performance at high QPS.

Read original on GitHub Engineering

GitHub's Issues search functionality was enhanced to move beyond simple flat queries, enabling users to construct advanced searches with logical AND/OR operators and nested parentheses. This upgrade addressed a long-standing user request for more flexible search capabilities across all issue fields.

Architectural Shift: From Flat to AST-Based Parsing

The core architectural change involved replacing the existing IssuesQuery search module with a new ConditionalIssuesQuery module. The key innovation here was the adoption of an Abstract Syntax Tree (AST) for parsing user input. Unlike the previous flat list approach, an AST can recursively represent complex, nested query structures, which is crucial for handling boolean logic and parentheses.

ruby
class Parser < Parslet::Parser
  rule(:space) { match[" "].repeat(1) }
  rule(:space?) { space.maybe }
  rule(:lparen) { str("(") >> space? }
  rule(:rparen) { str(")") >> space? }
  rule(:and_operator) { str("and") >> space? }
  rule(:or_operator) { str("or") >> space? }
  rule(:var) { str("var") >> match["0-9"].repeat(1).as(:var) >> space? }

  # The primary rule deals with parentheses.
  rule(:primary) { lparen >> or_operation >> rparen | var }

  # Note that following rules are both right-recursive.
  rule(:and_operation) { (primary.as(:left) >> and_operator >> and_operation.as(:right)).as(:and) | primary }
  rule(:or_operation) { (and_operation.as(:left) >> or_operator >> or_operation.as(:right)).as(:or) | and_operation }

  # We start at the lowest precedence rule.
  root(:or_operation)
end

Query Transformation to Elasticsearch

After parsing, the AST is traversed recursively to generate an Elasticsearch query document. The beauty of this approach is how naturally nested AST nodes and boolean operators map to Elasticsearch's `bool` query clauses (e.g., AND to `must`, OR to `should`, NOT to `must_not`). This allows for efficient execution of complex logical searches within the underlying search engine.

Addressing System Design Challenges

  • Backward Compatibility: The new system was rigorously tested against existing unit and integration tests, and dark-shipped to 1% of production traffic to compare results with the old system, ensuring no regressions for millions of users.
  • Performance Degradation: To prevent performance issues with more complex queries (2000 QPS average), GitHub used its `scientist` library to run equivalent queries against both systems in parallel, comparing execution times and resource usage.
  • User Experience: Collaborating with product and design, limitations (e.g., 5 nested levels) and UI/UX cues (e.g., highlighting operators, auto-complete) were implemented to maintain usability despite increased query complexity.
  • Minimizing Risk: A gradual rollout strategy began with the GraphQL API and a specific UI tab, collecting feedback before expanding to the entire Issues dashboard and REST API, ensuring minimal blast radius in case of issues.
💡

System Design Lessons

When introducing significant changes to a critical system, especially one with high traffic and legacy usage, prioritize: 1. Backward Compatibility: Extensive testing and dark-shipping are crucial. 2. Performance Monitoring: Baseline new functionality and compare against existing to prevent regressions. 3. Gradual Rollout: Limit blast radius and gather feedback incrementally. 4. User Experience: Complex features should not degrade usability; thoughtful design and limitations are key.

search engineElasticsearchparsingASTbackward compatibilityperformance testinggradual rolloutsystem re-architecture

Comments

Loading comments...
Rebuilding GitHub Issues Search with Nested Queries and Boolean Operators | SysDesAi