Designing a Lazy-Loaded Tree

November 4, 2024

MongoDB, Aggregation Pipelines, Lazy Loading, API Design

The Problem

As our dataset grew, one of our core APIs started degrading significantly in performance.

I was working with a contract_master collection that stored hierarchical contract data:

  • Master contracts
  • Child contracts
  • Sub-child contracts

On the frontend, I needed to display this as an expandable tree (Ant Design Tree), where users could progressively explore the hierarchy.

At first glance, this sounds straightforward. In practice, it turned into one of the slowest endpoints in the system.


The Original Approach

The API was implemented entirely in the application layer using Node.js.

Flow

  1. Fetch paginated master contracts from MongoDB
  2. For each master contract:
    • Recursively fetch children using parent_id
    • Then fetch sub-children the same way
  3. No pagination for children or sub-children
  4. Reconstruct the full tree structure in JavaScript
  5. Send the entire hierarchy to the frontend

Why This Was a Problem

This approach had several critical issues:

  • N+1 query problem due to recursive fetching
  • Multiple database round trips per request
  • No pagination for nested levels
  • Large payloads being sent over the network
  • CPU-heavy tree construction in Node.js

Effectively, the API was doing iterative graph traversal in the application layer instead of letting the database handle it.
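
To make the cost concrete, here is a rough reconstruction of that pattern, not the original code. It swaps MongoDB for an in-memory array, and the findByParent helper and sample documents are hypothetical. Each call to findByParent stands in for one database round trip.

```javascript
// Hypothetical in-memory stand-in for the contract_master collection.
const contracts = [
  { _id: "m1", parent_id: null, name: "Master 1" },
  { _id: "c1", parent_id: "m1", name: "Child 1" },
  { _id: "c2", parent_id: "m1", name: "Child 2" },
  { _id: "s1", parent_id: "c1", name: "Sub-child 1" },
];

let queryCount = 0;

// Stand-in for one database round trip: find documents by parent_id.
function findByParent(parentId) {
  queryCount += 1;
  return contracts.filter((c) => c.parent_id === parentId);
}

// Eager approach: one extra query per node to discover its children.
function buildSubtree(node) {
  const children = findByParent(node._id).map((child) => buildSubtree(child));
  return { ...node, children };
}

function buildFullTree() {
  queryCount = 0;
  const masters = findByParent(null); // the masters query
  return masters.map((master) => buildSubtree(master));
}

const tree = buildFullTree();
console.log(queryCount); // 5: one masters query plus one query per node
```

In this sketch the number of queries grows with the number of nodes in the returned tree, which is the N+1 shape described above.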


The Bottleneck

With just ~2000 contracts in the collection:

  • Fetching 10 master contracts took 8–10 seconds
  • Each request returned a massive nested structure
  • Performance degraded rapidly as data grew

The endpoint could not scale: every contract added to the collection meant more queries and a larger payload per request.


The Key Insight

The turning point was realizing this:

Hierarchical traversal should happen inside the database, not in the application layer.

MongoDB provides an operator specifically designed for this use case: $graphLookup.


Understanding $graphLookup

At a high level, $graphLookup allows you to recursively traverse relationships within a collection.

Basic Example

{
  $graphLookup: {
    from: "contract_master",
    startWith: "$_id",
    connectFromField: "_id",
    connectToField: "parent_id",
    as: "children",
    maxDepth: 1
  }
}

What Each Field Means

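  • from: The collection to search (here, contract_master itself)
  • startWith: The value the traversal starts from (the current contract’s _id)
  • connectFromField: The field read from each matched document to continue the traversal (_id)
  • connectToField: The field in other documents that is matched against that value (parent_id)
  • as: Output array containing every document reached
  • maxDepth: Maximum number of recursive hops; 0 returns direct children only, 1 also returns grandchildren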

This allowed me to replace recursive queries with a single aggregation pipeline.
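
For intuition about how maxDepth bounds the traversal, the following is a small in-memory model of the operator. It is a teaching sketch, not how MongoDB implements $graphLookup, and the sample documents are made up.

```javascript
// In-memory model of $graphLookup semantics, for intuition only.
function graphLookup(docs, startValue, { connectFromField, connectToField, maxDepth = Infinity }) {
  const results = [];
  const seen = new Set(); // each document is returned at most once
  let frontier = [startValue];

  // Depth 0 is the initial lookup; each further depth is one more hop.
  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth += 1) {
    const next = [];
    for (const doc of docs) {
      if (frontier.includes(doc[connectToField]) && !seen.has(doc._id)) {
        seen.add(doc._id);
        results.push(doc);
        next.push(doc[connectFromField]);
      }
    }
    frontier = next;
  }
  return results;
}

// Hypothetical sample data.
const docs = [
  { _id: "m1", parent_id: null },
  { _id: "c1", parent_id: "m1" },
  { _id: "c2", parent_id: "m1" },
  { _id: "s1", parent_id: "c1" },
];
const fields = { connectFromField: "_id", connectToField: "parent_id" };

const directChildren = graphLookup(docs, "m1", { ...fields, maxDepth: 0 });
const twoLevels = graphLookup(docs, "m1", { ...fields, maxDepth: 1 });
console.log(directChildren.map((d) => d._id)); // [ 'c1', 'c2' ]
console.log(twoLevels.map((d) => d._id)); // [ 'c1', 'c2', 's1' ]
```

The real operator does not guarantee the order of the output array, so any ordering has to come from a later $sort stage.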


The New Architecture

Instead of fetching the entire tree at once, I redesigned the API to be lazy-loaded and interaction-driven.

I split the logic into three modes handled by a single API.


Step 1: Fetch Master Contracts (Paginated)

I only fetch master contracts with minimal data required for rendering.

Key Ideas

  • Return only essential fields (name, type, id)
  • Flag whether each contract has children (hasChildren)
  • Do not return the children themselves yet

Example Pipeline

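[
  // Master contracts are the documents with no parent.
  { $match: { parent_id: null } },
  { $sort: { createdAt: -1 } },
  // Page 1 is shown here; for page n, skip (n - 1) * limit.
  { $skip: 0 },
  { $limit: 10 },
  {
    // Join each master to its direct children, only to compute hasChildren.
    $lookup: {
      from: "contract_master",
      localField: "_id",
      foreignField: "parent_id",
      as: "children"
    }
  },
  {
    $addFields: {
      hasChildren: { $gt: [{ $size: "$children" }, 0] }
    }
  },
  {
    // _id is included by default; the children array is dropped here.
    $project: {
      name: 1,
      type: 1,
      hasChildren: 1
    }
  }
]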

This allowed the frontend to render expandable nodes without loading unnecessary data.
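
The pipeline above hardcodes the first page. A minimal sketch of a builder that derives $skip and $limit from request parameters might look like this. The function names, the 1-based page convention, and the page-size cap are assumptions for illustration, not details from the original API.

```javascript
// Assumed caps; tune to your own API.
const MAX_PAGE_SIZE = 100;
const MAX_PAGE = 1000000;

// Clamp a possibly-untrusted value (e.g. a query-string parameter) to an integer range.
function clampInt(value, min, max) {
  const n = Math.floor(Number(value));
  if (!Number.isFinite(n)) return min;
  return Math.min(max, Math.max(min, n));
}

// Build the master-contracts pipeline for one page. `page` is 1-based.
function buildMastersPipeline({ page = 1, pageSize = 10 } = {}) {
  const size = clampInt(pageSize, 1, MAX_PAGE_SIZE);
  const skip = (clampInt(page, 1, MAX_PAGE) - 1) * size;
  return [
    { $match: { parent_id: null } },
    { $sort: { createdAt: -1 } },
    { $skip: skip },
    { $limit: size },
    {
      $lookup: {
        from: "contract_master",
        localField: "_id",
        foreignField: "parent_id",
        as: "children",
      },
    },
    { $addFields: { hasChildren: { $gt: [{ $size: "$children" }, 0] } } },
    { $project: { name: 1, type: 1, hasChildren: 1 } },
  ];
}

console.log(buildMastersPipeline({ page: 3, pageSize: 10 })[2]); // { '$skip': 20 }
```

With the Node.js driver, the result would be passed to something like db.collection("contract_master").aggregate(buildMastersPipeline({ page: 2 })).toArray().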


Step 2: Fetch Children (On Expand)

When a user expands a master contract:

  • The same API is called with the master contract ID
  • A conditional aggregation pipeline runs
  • $graphLookup fetches children
  • I also check if each child has further descendants

Example

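[
  // ObjectId comes from the mongodb driver: const { ObjectId } = require("mongodb");
  { $match: { _id: new ObjectId(masterId) } },
  {
    // maxDepth: 0 stops after the first hop, so only direct children are returned.
    $graphLookup: {
      from: "contract_master",
      startWith: "$_id",
      connectFromField: "_id",
      connectToField: "parent_id",
      as: "children",
      maxDepth: 0
    }
  },
  // One output document per child.
  { $unwind: "$children" },
  {
    // Look one level further down, only to compute hasChildren for each child.
    $lookup: {
      from: "contract_master",
      localField: "children._id",
      foreignField: "parent_id",
      as: "grandChildren"
    }
  },
  {
    $addFields: {
      "children.hasChildren": {
        $gt: [{ $size: "$grandChildren" }, 0]
      }
    }
  },
  {
    $project: {
      "children._id": 1,
      "children.name": 1,
      "children.type": 1,
      "children.hasChildren": 1
    }
  }
]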

Important Detail

I only fetch one level at a time. This keeps queries fast and predictable.


Step 3: Fetch Sub-Children

The same API handles deeper levels:

  • Pass child contract ID
  • Run similar aggregation
  • Return only that level’s data

This makes the system consistent and reusable.
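
The post does not show how the single endpoint tells its modes apart. One plausible sketch, assuming hypothetical parentId and page query parameters, keys the decision on whether a parent ID is present. Expanding a child then reuses the same branch as expanding a master, just with a different ID.

```javascript
// ObjectId values travel over HTTP as 24-character hex strings.
const OBJECT_ID_HEX = /^[0-9a-fA-F]{24}$/;

// Decide which pipeline a request needs. `parentId` and `page` are
// hypothetical query parameter names, not taken from the original API.
function resolveRequest(query = {}) {
  const parsedPage = Number.parseInt(query.page ?? "1", 10);
  const page = Number.isInteger(parsedPage) && parsedPage > 0 ? parsedPage : 1;

  // No parent: the caller wants a page of master contracts (Step 1).
  if (query.parentId === undefined || query.parentId === "") {
    return { mode: "masters", page };
  }

  // Reject malformed IDs before they reach the database layer.
  if (!OBJECT_ID_HEX.test(query.parentId)) {
    throw new TypeError("parentId must be a 24-character hex string");
  }

  // Any parent, master or child: return the one level below it (Steps 2 and 3).
  return { mode: "children", parentId: query.parentId, page };
}

console.log(resolveRequest({}).mode); // masters
console.log(resolveRequest({ parentId: "64b7f0c2a1b2c3d4e5f60718" }).mode); // children
```

A route handler would call resolveRequest on the incoming query, then run the masters pipeline or the one-level children pipeline accordingly.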


Lazy Loading the Tree

Instead of loading the entire hierarchy upfront:

  • Data is fetched only when the user expands a node
  • Each level is fetched independently
  • The frontend simply merges new nodes into the tree

This drastically reduces both:

  • Initial load time
  • Total data transferred
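
On the frontend, the merge step can be a small pure function. The sketch below follows the pattern commonly used with Ant Design Tree's loadData prop: find the expanded node by key and attach the freshly fetched children, without mutating the existing treeData. The function name and sample nodes are illustrative.

```javascript
// Return a new tree in which the node with `parentKey` has `children` attached.
// Untouched nodes are reused as-is, so the update is cheap and immutable.
function mergeChildren(nodes, parentKey, children) {
  return nodes.map((node) => {
    if (node.key === parentKey) {
      return { ...node, children };
    }
    if (node.children) {
      return { ...node, children: mergeChildren(node.children, parentKey, children) };
    }
    return node;
  });
}

// Hypothetical usage: expand a master, then expand one of its children.
const initial = [
  { key: "m1", title: "Master 1" },
  { key: "m2", title: "Master 2" },
];
const afterFirstExpand = mergeChildren(initial, "m1", [{ key: "c1", title: "Child 1" }]);
const afterSecondExpand = mergeChildren(afterFirstExpand, "c1", [
  { key: "s1", title: "Sub-child 1", isLeaf: true },
]);
console.log(afterSecondExpand[0].children[0].children[0].title); // Sub-child 1
```

In a React component, the function passed to loadData would fetch one level for the expanded node's key and call a state setter with mergeChildren(previousTree, key, fetchedNodes).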

Pagination Strategy

Pagination is applied at every level:

  • Masters → paginated
  • Children → paginated
  • Sub-children → paginated

This ensures:

  • Predictable performance
  • No unbounded data growth per request
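
The Step 2 example does not show its pagination stages, so here is one way they could be added, as a sketch under assumptions: the function name, the defaults, the page-size cap, and sorting children by createdAt with an _id tie-breaker are all illustrative choices. $replaceRoot promotes each child to the top level so the later stages work on flat nodes.

```javascript
// Build a paginated "one level below parentId" pipeline.
// `parentId` is expected to already be an ObjectId instance,
// e.g. new ObjectId(id) from the mongodb driver.
function buildChildrenPipeline(parentId, { page = 1, pageSize = 20 } = {}) {
  // Assumed cap of 100 per page; invalid values fall back to 1.
  const size = Math.min(100, Math.max(1, Math.trunc(pageSize) || 1));
  const skip = (Math.max(1, Math.trunc(page) || 1) - 1) * size;
  return [
    { $match: { _id: parentId } },
    {
      $graphLookup: {
        from: "contract_master",
        startWith: "$_id",
        connectFromField: "_id",
        connectToField: "parent_id",
        as: "children",
        maxDepth: 0, // direct children only
      },
    },
    { $unwind: "$children" },
    { $replaceRoot: { newRoot: "$children" } }, // each child becomes a top-level document
    { $sort: { createdAt: -1, _id: 1 } }, // _id tie-breaker keeps pages stable (an addition)
    { $skip: skip },
    { $limit: size },
    {
      // Runs only for the current page of children.
      $lookup: {
        from: "contract_master",
        localField: "_id",
        foreignField: "parent_id",
        as: "grandChildren",
      },
    },
    { $addFields: { hasChildren: { $gt: [{ $size: "$grandChildren" }, 0] } } },
    { $project: { name: 1, type: 1, hasChildren: 1 } },
  ];
}
```

Placing $skip and $limit before the hasChildren lookup means that lookup runs only for the nodes on the requested page. Note that $graphLookup still reads all direct children of the parent before the page is cut, so for very wide nodes a direct $match on parent_id may be worth comparing.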

Payload Optimization

I avoided sending full documents.

Each node only includes:

  • _id
  • name
  • type
  • hasChildren

This keeps responses lightweight and fast.


Frontend Simplification

Previously:

  • Backend built the entire tree
  • Frontend consumed a large nested structure

After refactor:

  • Backend returns flat, minimal nodes
  • Frontend incrementally builds the tree

This made the UI logic much simpler and more maintainable.
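
As an illustration of how little the frontend needs to do with each flat node, this sketch maps one API node to the shape Ant Design's Tree reads from treeData (key, title, isLeaf). The function name and the extra contractType field are assumptions.

```javascript
// Convert one minimal API node into an Ant Design Tree node.
function toTreeNode(node) {
  return {
    key: String(node._id),
    title: node.name,
    // isLeaf: true hides the expand arrow, so no load is triggered for this node.
    isLeaf: !node.hasChildren,
    contractType: node.type, // extra field kept for custom rendering (assumed use)
  };
}

// Hypothetical API response for one expanded level.
const apiNodes = [
  { _id: "c1", name: "Child 1", type: "child", hasChildren: true },
  { _id: "c2", name: "Child 2", type: "child", hasChildren: false },
];
console.log(apiNodes.map(toTreeNode).map((n) => n.isLeaf)); // [ false, true ]
```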


The Outcome

The improvements were significant:

Metric           Before             After
Response Time    8–10s              0.2–0.4s
Data Volume      Very large         Minimal
Query Pattern    Recursive (N+1)    Single aggregation
Scalability      Poor               Stable

Most importantly:

Response time now depends mainly on how much data a single page or expand request touches, not on the total size of the collection.


Key Takeaways

  • Move hierarchical logic into the database when possible
  • Avoid recursive API patterns for relational traversal
  • Design APIs around UI interaction patterns
  • Use lazy loading for tree-like data
  • Always control payload size

When Not to Use This Approach

$graphLookup is powerful, but not always the best choice:

  • Extremely deep hierarchies with high fan-out
  • Very large graphs where traversal becomes expensive
  • Cases where denormalization is more efficient

Final Thoughts

This wasn’t just a query optimization.

It was a shift in how I designed APIs:

  • From eager loading → lazy loading
  • From application-driven traversal → database-driven traversal
  • From heavy responses → minimal, interaction-based data

That shift is what unlocked the performance gains.