Designing a Lazy-Loaded Tree

November 4, 2024

MongoDBAggregation PipelinesLazy LoadingAPI Design

The Problem

As our dataset grew, one of our core APIs started degrading significantly in performance.

I was working with a contract_master collection that stored hierarchical contract data:

Master contracts
Child contracts
Sub-child contracts

On the frontend, I needed to display this as an expandable tree (Ant Design Tree), where users could progressively explore the hierarchy.

At first glance, this sounds straightforward. In practice, it turned into one of the slowest endpoints in the system.

The Original Approach

The API was implemented entirely in the application layer using Node.js.

Flow

Fetch paginated master contracts from MongoDB
For each master contract:
- Recursively fetch children using parent_id
- Then fetch sub-children the same way
No pagination for children or sub-children
Reconstruct the full tree structure in JavaScript
Send the entire hierarchy to the frontend

Why This Was a Problem

This approach had several critical issues:

N+1 query problem due to recursive fetching
Multiple database round trips per request
No pagination for nested levels
Large payloads being sent over the network
CPU-heavy tree construction in Node.js

Effectively, the API was doing iterative graph traversal in the application layer instead of letting the database handle it.

The Bottleneck

With just ~2000 contracts in the collection:

Fetching 10 master contracts took 8–10 seconds
Each request returned a massive nested structure
Performance degraded rapidly as data grew

The system was not scalable.

The Key Insight

The turning point was realizing this:

Hierarchical traversal should happen inside the database, not in the application layer.

MongoDB provides an operator specifically designed for this use case: $graphLookup.

Understanding $graphLookup

At a high level, $graphLookup allows you to recursively traverse relationships within a collection.

Basic Example

{
  $graphLookup: {
    from: "contract_master",
    startWith: "$_id",
    connectFromField: "_id",
    connectToField: "parent_id",
    as: "children",
    maxDepth: 1
  }
}

What Each Field Means

startWith: The starting node (current contract)
connectFromField: Field used to traverse outward
connectToField: Field used to match relationships
as: Output array containing related documents
maxDepth: Controls how deep the traversal goes

This allowed me to replace recursive queries with a single aggregation pipeline.

The New Architecture

Instead of fetching the entire tree at once, I redesigned the API to be lazy-loaded and interaction-driven.

I split the logic into three modes handled by a single API.

Step 1: Fetch Master Contracts (Paginated)

I only fetch master contracts with minimal data required for rendering.

Key Ideas

Return only essential fields (name, type, id)
Check if a contract has children
Do not fetch children yet

Example Pipeline

[
  { $match: { parent_id: null } },
  { $sort: { createdAt: -1 } },
  { $skip: 0 },
  { $limit: 10 },
  {
    $lookup: {
      from: "contract_master",
      localField: "_id",
      foreignField: "parent_id",
      as: "children"
    }
  },
  {
    $addFields: {
      hasChildren: { $gt: [{ $size: "$children" }, 0] }
    }
  },
  {
    $project: {
      name: 1,
      type: 1,
      hasChildren: 1
    }
  }
]

This allowed the frontend to render expandable nodes without loading unnecessary data.

Step 2: Fetch Children (On Expand)

When a user expands a master contract:

The same API is called with the master contract ID
A conditional aggregation pipeline runs
$graphLookup fetches children
I also check if each child has further descendants

Example

[
  { $match: { _id: ObjectId(masterId) } },
  {
    $graphLookup: {
      from: "contract_master",
      startWith: "$_id",
      connectFromField: "_id",
      connectToField: "parent_id",
      as: "children",
      maxDepth: 0
    }
  },
  { $unwind: "$children" },
  {
    $lookup: {
      from: "contract_master",
      localField: "children._id",
      foreignField: "parent_id",
      as: "grandChildren"
    }
  },
  {
    $addFields: {
      "children.hasChildren": {
        $gt: [{ $size: "$grandChildren" }, 0]
      }
    }
  },
  {
    $project: {
      "children._id": 1,
      "children.name": 1,
      "children.type": 1,
      "children.hasChildren": 1
    }
  }
]

Important Detail

I only fetch one level at a time. This keeps queries fast and predictable.

Step 3: Fetch Sub-Children

The same API handles deeper levels:

Pass child contract ID
Run similar aggregation
Return only that level’s data

This makes the system consistent and reusable.

Lazy Loading the Tree

Instead of loading the entire hierarchy upfront:

Data is fetched only when the user expands a node
Each level is fetched independently
The frontend simply merges new nodes into the tree

This drastically reduces both:

Initial load time
Total data transferred

Pagination Strategy

Pagination is applied at every level:

Masters → paginated
Children → paginated
Sub-children → paginated

This ensures:

Predictable performance
No unbounded data growth per request

Payload Optimization

I avoided sending full documents.

Each node only includes:

_id
name
type
hasChildren

This keeps responses lightweight and fast.

Frontend Simplification

Previously:

Backend built the entire tree
Frontend consumed a large nested structure

After refactor:

Backend returns flat, minimal nodes
Frontend incrementally builds the tree

This made the UI logic much simpler and more maintainable.

The Outcome

The improvements were significant:

Metric	Before	After
Response Time	8–10s	0.2–0.4s
Data Volume	Very large	Minimal
Query Pattern	Recursive (N+1)	Single aggregation
Scalability	Poor	Stable

Most importantly:

Performance became independent of total dataset size.

Key Takeaways

Move hierarchical logic into the database when possible
Avoid recursive API patterns for relational traversal
Design APIs around UI interaction patterns
Use lazy loading for tree-like data
Always control payload size

When Not to Use This Approach

$graphLookup is powerful, but not always the best choice:

Extremely deep hierarchies with high fan-out
Very large graphs where traversal becomes expensive
Cases where denormalization is more efficient

Final Thoughts

This wasn’t just a query optimization.

It was a shift in how I designed APIs:

From eager loading → lazy loading
From application-driven traversal → database-driven traversal
From heavy responses → minimal, interaction-based data

That shift is what unlocked the performance gains.