QueryLab: A Deep-Dive Case Study
February 22, 2026
Reading a raw PostgreSQL execution plan is a specialized skill. Engineers often struggle with "loops traps", hidden disk spills, and cardinality drift that lead to suboptimal join orders.
QueryLab was inspired by the need for a "linter for query plans"—a tool that doesn't just show data but points directly to the architectural flaw.
The Core Technical Idea
All analysis happens in a single pass of the execution tree (O(N)), keeping analysis predictable even for deeply nested plans.
Data Flow Pipeline
graph LR
Input[Raw Text/JSON] --> Parser[Custom Regex Parser]
Parser --> Canonicalizer[Normalization Layer]
Canonicalizer --> StatsEngine[Recursive Enrichment]
StatsEngine --> RuleEngine[Heuristic Evaluation]
RuleEngine --> UI[React Dashboard]
At its heart, QueryLab leverages a recursive tree-processing engine to enrich raw execution plans with derived metrics (Self Time, Amplification Factors, Cardinality Ratios) and evaluates them against a heuristic rule engine.
Technical Challenges
Indentation-Based Tree Reconstruction
Postgres text plans use visual indentation to represent hierarchy. Parsing this reliably required a stack-based algorithm. The primary challenge was distinguishing between a new node (marked with ->) and supplemental node metadata (like Filter: or Buffers:) which are also indented.
The solution involved a worker-aware line processor that tracks the _indent of the last "Node Line." Any non-node line is treated as metadata for the current top-of-stack node until the indentation level decreases.
The Self-Time Calculation Paradox
In an execution tree, a parent node's time includes all children's time. Identifying the actual bottleneck requires subtracting child effectiveTime from the parent. Parallel workers and loops complicate this; a node might execute 1,000,000 times, each taking 5ms.
I calculated effectiveTime = totalTime * loops at every node before performing the subtraction. This normalized time across different loop counts, allowing for an accurate "Self Time" percentage.
Cardinality Mismatch Detection
Simply dividing Actual Rows by Plan Rows isn't enough. A mismatch of 0.1 is just as bad as 10. I implemented a "Folded Ratio" using Math.max(ratio, 1/ratio). This treats over-estimates and under-estimates with equal severity, highlighting model errors where the planner is fundamentally "wrong" about the data distribution.
Key Performance Features
Rule-Based Insight System
Each rule is a functional predicate that inspects a node's derived stats. This shifts the burden of "expert knowledge" from the developer to the tool. For example, the Nested Loop Amplification rule detects when a join is processing 1,000x more rows than it returns, signaling a missing index or poor join order.
Automated Optimization Priority
Instead of showing dozens of minor issues, QueryLab uses a scoring algorithm:
Score = Time% * log10(RowError) * SpillMultiplier.
This identifies the single node that, if fixed, provides the highest ROI for performance optimization.
Privacy and Security
Since QueryLab is a static client-side app, sensitive query parameters (which often appear in Filter conditions) never leave the user's browser. There is no database connection required, removing security friction and allowing engineers to analyze plans from production without needing direct DB access.
Lessons Learned
- Normalization is Essential: Moving to a canonical internal schema was the best architectural decision. It decoupled the system from Postgres version-specific JSON schemas and made the Rule Engine significantly cleaner.
- Heuristics over Hard Limits: Performance is relative. Rules must be based on impact (percentage of total time) rather than just absolute values.
- Zero-Config Utility: Removing the need for a database connection significantly increased adoption by removing security hurdles.
Future Direction
I am looking into SQL schema integration to provide deeper index recommendations based on CREATE TABLE statements, and AI-assisted recommendations to generate specific SQL rewrite suggestions based on the identified bottlenecks.