From Polling to Event-Driven: A 70x Throughput Rewrite

March 4, 2026

Event-Driven, Queues, Batching, PostgreSQL

We noticed that our PostgreSQL database was pinned at 70–95% CPU for long stretches. These were not spikes but sustained load, and it began slowing down unrelated queries.

Looking at the active queries, there were always processes streaming rows. The load wasn’t coming from one expensive query, but from something running continuously.

That eventually led back to a cron-driven pipeline.

The Old Pipeline

The system was split across a few services:

Cron service → triggered processing every 30 minutes
Versioning service → handled version creation and publishing
Product + User services → provided data needed for processing

The flow looked like this:

cron service scans the product table
filters records to process
calls the versioning service per record
versioning service fetches product + user data
performs versioning logic
writes back to the database

This worked fine at smaller scale. It didn’t hold up once the dataset grew.

Where It Started Breaking

The issue wasn’t a single slow query. It was how the system did work.

Every run scanned millions of rows, even if only a small subset had changed
Processing was sequential across services
Each record triggered the following:
- one HTTP call (cron → versioning)
- multiple DB reads (product + user)
- multiple DB writes

For ~10,000 records, this meant:

10,000 HTTP calls
60,000+ database operations (6 or more per record)

Throughput stayed at roughly 1 record per second.

At that rate:

a single batch of 10,000 records took close to three hours
new cron runs started before previous ones finished
load stacked instead of resetting
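
The arithmetic behind that backlog can be checked in a few lines. This is a rough sketch that uses only the figures quoted above (10,000 records, about 1 record per second, a 30-minute cron interval) and assumes every run takes the same time; the variable names are illustrative.

```python
import math

RECORDS_PER_RUN = 10_000     # records per run, as quoted above
THROUGHPUT_PER_SEC = 1.0     # observed rate of about 1 record per second
CRON_INTERVAL_MIN = 30       # cron fires every 30 minutes

run_seconds = RECORDS_PER_RUN / THROUGHPUT_PER_SEC
run_hours = run_seconds / 3600

# Runs in flight at once in steady state, assuming no run slows down
# (in practice contention would make each run slower still).
overlapping_runs = math.ceil(run_seconds / (CRON_INTERVAL_MIN * 60))

print(f"one run takes {run_hours:.1f} h")
print(f"up to {overlapping_runs} runs overlap")
```

Under these assumptions a run lasts several cron intervals, so new runs keep piling onto unfinished ones.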

That’s when the database started getting pinned.

What Changed

OLD (Cron-based, per-record)

Cron Trigger (30min interval)
↓
Cron Service
↓
Product Table (scan)
↓
Versioning Service (per record)
↓
Product + User Services
↓
Database Writes

Repeated scans
1 request per record
High DB + network overhead

NEW (Event-driven, batched)

Product Update
↓
Queue Table
↓
Cron Worker (batch consumer)
↓
Versioning Service
↓
Product + User Services (bulk reads)
↓
Database Writes (batched)

No table scans
Batched processing
Parallel workers

The fix was not to speed up the per-record loop, but to remove it entirely.

Work is no longer discovered by scanning, but captured at the time of change and processed in batches.

Queue (Triggering Work)

When a product becomes eligible, it is inserted into a queue table at write time (via a database trigger).

Instead of scanning large tables, the cron service now pulls only pending records from the queue.
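
A trigger-fed queue can be sketched as follows. To keep the example runnable anywhere it uses Python's built-in sqlite3 module as a stand-in; in PostgreSQL the same idea would be written as a trigger function attached with CREATE TRIGGER. The table names, the `eligible` flag, and the eligibility rule are all assumptions for illustration, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    eligible INTEGER NOT NULL DEFAULT 0   -- hypothetical eligibility flag
);

-- Hypothetical queue table: at most one pending row per product.
CREATE TABLE version_queue (
    product_id  INTEGER PRIMARY KEY REFERENCES product(id),
    enqueued_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Enqueue at write time, only when the updated row is eligible.
CREATE TRIGGER product_enqueue
AFTER UPDATE ON product
WHEN NEW.eligible = 1
BEGIN
    INSERT OR IGNORE INTO version_queue (product_id) VALUES (NEW.id);
END;
""")

conn.executemany("INSERT INTO product (id, name) VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])
conn.execute("UPDATE product SET eligible = 1 WHERE id = 2")
conn.execute("UPDATE product SET name = 'b2' WHERE id = 2")  # second update, no duplicate row

# The worker reads only the queue, never the full product table.
pending = [row[0] for row in conn.execute("SELECT product_id FROM version_queue")]
print(pending)
```

Keying the queue on the product ID means repeated updates to the same product collapse into one pending row.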

Batching (Reducing Cross-Service Overhead)

Previously, the cron service called the versioning service once per record.

This was replaced with batched processing:

the cron service sends batches of 150 IDs
the versioning service processes the entire batch in one request

This reduces both network overhead and repeated database access.
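
As a minimal sketch of the batching step, assuming the pending IDs are already in memory: the helper name `chunked` and the 10,000-ID list are illustrative, and only the batch size of 150 comes from the pipeline described here.

```python
from typing import Iterator, Sequence

BATCH_SIZE = 150  # batch size quoted above


def chunked(ids: Sequence[int], size: int = BATCH_SIZE) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


pending_ids = list(range(1, 10_001))  # stand-in for 10,000 queued product IDs
batches = list(chunked(pending_ids))

print(len(batches))      # number of requests sent to the versioning service
print(len(batches[-1]))  # size of the final, partial batch
```

For 10,000 IDs this comes to 67 requests instead of 10,000.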

Concurrency (Parallelizing Safely)

Multiple workers process the queue in parallel using:

FOR UPDATE SKIP LOCKED

Each worker claims a batch by locking its rows and skips any rows another worker has already locked. Workers can run concurrently without an external coordinator and without duplicate work.
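
A claim query along these lines might look like the sketch below. The table and column names (`version_queue`, `product_id`, `enqueued_at`) are assumptions, and `cur` is assumed to be a DB-API cursor that uses `%s` placeholders, as psycopg does. The function has to run inside an open transaction, because the row locks are released when that transaction ends.

```python
from typing import Any, List

# PostgreSQL query: lock up to N pending rows, skipping rows that another
# worker's open transaction has already locked.
CLAIM_SQL = """
SELECT product_id
FROM version_queue            -- hypothetical queue table name
ORDER BY enqueued_at
LIMIT %s
FOR UPDATE SKIP LOCKED
"""


def claim_batch(cur: Any, batch_size: int = 150) -> List[int]:
    """Claim a batch of queued product IDs inside the caller's transaction.

    The locks last until the transaction commits or rolls back, so the
    batch should be processed and its queue rows deleted before the commit.
    """
    cur.execute(CLAIM_SQL, (batch_size,))
    return [row[0] for row in cur.fetchall()]
```

If a worker crashes mid-batch, its transaction rolls back, the locks disappear, and the rows become claimable again.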

Bulk Processing Across Services (Doing Work Once)

Inside the versioning service, the workflow is no longer record-by-record:

product data is fetched in bulk
user data is fetched in bulk
versioning logic runs in-memory across the batch
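
The in-memory step can be sketched as a join over two bulk-fetched result sets. Everything here is hypothetical: the `Product` and `User` shapes, the `owner_id` link, and the placeholder "versioning" rule, since the actual logic is not described. The two lists are assumed to come from one bulk query per service (for example a single `WHERE id = ANY(...)` lookup).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Product:
    id: int
    owner_id: int
    name: str


@dataclass
class User:
    id: int
    plan: str


def build_versions(products: List[Product], users: List[User]) -> List[dict]:
    """Combine bulk-fetched products and users without further DB calls."""
    users_by_id: Dict[int, User] = {u.id: u for u in users}
    versions = []
    for p in products:
        owner = users_by_id.get(p.owner_id)
        if owner is None:
            continue  # owner missing from the bulk result; skip this product
        # Placeholder for the real versioning logic.
        versions.append({"product_id": p.id, "label": f"{p.name}@{owner.plan}"})
    return versions


products = [Product(1, 10, "lamp"), Product(2, 11, "desk"), Product(3, 99, "chair")]
users = [User(10, "free"), User(11, "pro")]
print(build_versions(products, users))
```

The dictionary lookup replaces what used to be a user query per record.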

Database writes are also grouped.

All operations for a batch are executed within a single transaction:

insert versions
update products
upsert published data
remove processed queue rows

This reduces write amplification and transaction overhead, since hundreds of operations are committed as a single batch instead of as individual transactions. Those per-record transactions had contributed to the sustained CPU load described earlier.
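
The four write steps above can be sketched as one all-or-nothing transaction. The example again uses sqlite3 as a runnable stand-in with a made-up schema; on PostgreSQL each step would more likely be a single multi-row statement, with the upsert written as `INSERT ... ON CONFLICT ... DO UPDATE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product       (id INTEGER PRIMARY KEY, current_version INTEGER NOT NULL DEFAULT 0);
CREATE TABLE version       (product_id INTEGER NOT NULL, number INTEGER NOT NULL);
CREATE TABLE published     (product_id INTEGER PRIMARY KEY, number INTEGER NOT NULL);
CREATE TABLE version_queue (product_id INTEGER PRIMARY KEY);
INSERT INTO product (id) VALUES (1), (2), (3);
INSERT INTO version_queue (product_id) VALUES (1), (2), (3);
""")


def commit_batch(conn: sqlite3.Connection, rows: list) -> None:
    """Apply all four write steps for a batch; `rows` is [(product_id, number), ...]."""
    ids = [(pid,) for pid, _ in rows]
    with conn:  # commits on success, rolls back if any statement raises
        conn.executemany("INSERT INTO version (product_id, number) VALUES (?, ?)", rows)
        conn.executemany("UPDATE product SET current_version = ? WHERE id = ?",
                         [(n, pid) for pid, n in rows])
        conn.executemany(
            "INSERT OR REPLACE INTO published (product_id, number) VALUES (?, ?)", rows)
        conn.executemany("DELETE FROM version_queue WHERE product_id = ?", ids)


commit_batch(conn, [(1, 1), (2, 1)])
remaining = [r[0] for r in conn.execute("SELECT product_id FROM version_queue")]
print(remaining)
```

Deleting the queue rows in the same transaction as the writes means a failed batch stays queued rather than being half-applied.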

Measured Impact

The figures below were measured before and after moving from cron-based polling to queue-driven batching.

Metric	Before	After
Execution	Cron + scan	Queue + batch
Service calls	1 / record	1 / batch
DB operations	~6–8 / record	~4 / batch
Throughput	~1/sec	~70/sec
Scaling	Table size	Change volume

The shift to queue-driven batching increased throughput by ~70x while reducing database load and cross-service overhead.
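
To put the table in concrete terms, here is the same comparison worked through for a 10,000-record workload. It is back-of-the-envelope arithmetic using only the per-record and per-batch figures above plus the batch size of 150, not an additional measurement.

```python
import math

RECORDS = 10_000
BATCH_SIZE = 150

before_calls = RECORDS                         # 1 service call per record
after_calls = math.ceil(RECORDS / BATCH_SIZE)  # 1 service call per batch

before_db_ops = (6 * RECORDS, 8 * RECORDS)     # roughly 6-8 operations per record
after_db_ops = 4 * after_calls                 # roughly 4 operations per batch

before_minutes = RECORDS / 1 / 60              # about 1 record per second
after_minutes = RECORDS / 70 / 60              # about 70 records per second

print(before_calls, after_calls)
print(before_db_ops, after_db_ops)
print(round(before_minutes), round(after_minutes, 1))
```

On these numbers the same workload drops from close to three hours to a couple of minutes.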

What I Learned

Polling does not scale with data size
Time-based scans repeatedly process the same dataset, even when little has changed. This leads to unnecessary load and poor scalability.
Work should be captured at the time of change
Moving from periodic scanning to change-driven triggers eliminates the need to search for work.
Batching reduces overhead across every layer
Network calls, database operations, and transactions become significantly cheaper when work is grouped instead of processed per record.
Throughput is often limited by system design, not query performance
The bottleneck was not a slow query, but the way work was distributed and executed.
Concurrency needs coordination primitives
Using FOR UPDATE SKIP LOCKED allowed safe parallel processing without duplicate work and without workers blocking on each other's locks.
Cross-service boundaries amplify inefficiency
Per-record service calls multiplied latency and load. Batch-based communication reduced this drastically.
Write amplification matters at scale
Grouping writes into a single transaction reduced database pressure and improved consistency.
Scaling should depend on change volume, not table size
Systems that scale with total data size degrade over time. Systems that scale with actual changes remain stable.

Closing

This wasn’t a database problem. It was a work distribution problem.

Once the system stopped scanning and started reacting to changes, both performance and scalability followed naturally.