TraceSense: Supplier Risk Intelligence
January 13, 2026
I wanted a way to answer a specific question: given a shipment context, which supplier is most likely to deliver on time? Average delay stats don't cut it—a supplier that's on time 90% of the time but occasionally runs 5 days late is a different risk profile than one that's consistently 1 day late. I needed a model that captured that variance, not just the mean.
Architecture
The system is three discrete stages I run sequentially: SQL extraction that builds training_data.csv, a PyTorch training run that produces supplier_model.pt, and a simulation script that loads the checkpoint and runs counterfactual inference.
graph LR
DB[(Supabase: POs, Batches, Dispatches)]
DB --> FE[build_training_dataset.py]
FE --> |training_data.csv| NN[train_supplier_embeddings.py]
NN --> |supplier_model.pt| CF[counterfactual_simulation.py]
CF --> |ranked transit predictions| OUT[Procurement Output]
There's no real-time inference here yet—it's an offline pipeline. The simulation script loads a fixed checkpoint and re-runs inference against all known suppliers for a given delivery context.
Feature Engineering
The tricky part was that nothing useful is stored in a single table. I had to join dispatches, purchase_orders, products, and users to get a single training row. The canonical query filters only DELIVERED dispatches with a non-null delivery_date—anything else is either in-flight or a data entry artifact I didn't want to train on.
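A minimal sketch of that join and filter, run against an in-memory SQLite database so it is self-contained. The four table names come from the paragraph above. Every column name (status, purchase_order_id, supplier_id, expected_delivery_start, and so on) is an assumption for illustration rather than the real Supabase schema, and SQLite stands in for Postgres.

```python
import sqlite3

# In-memory stand-in for the Supabase (Postgres) database.
# Table names follow the post; all column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products (id INTEGER PRIMARY KEY, shelf_life_days INTEGER);
CREATE TABLE purchase_orders (
    id INTEGER PRIMARY KEY, supplier_id INTEGER, product_id INTEGER,
    quantity INTEGER, expected_delivery_start TEXT, expected_delivery_end TEXT);
CREATE TABLE dispatches (
    id INTEGER PRIMARY KEY, purchase_order_id INTEGER, status TEXT,
    dispatch_date TEXT, delivery_date TEXT);

INSERT INTO users VALUES (1, 'Supplier A');
INSERT INTO products VALUES (1, 14);
INSERT INTO purchase_orders VALUES (1, 1, 1, 500, '2026-01-05', '2026-01-08');
INSERT INTO dispatches VALUES (1, 1, 'DELIVERED', '2026-01-03', '2026-01-07');
INSERT INTO dispatches VALUES (2, 1, 'IN_TRANSIT', '2026-01-04', NULL);
INSERT INTO dispatches VALUES (3, 1, 'DELIVERED', '2026-01-04', NULL);
""")

# Keep only delivered dispatches that actually carry a delivery date.
QUERY = """
SELECT d.id, u.id, po.quantity, p.shelf_life_days,
       d.dispatch_date, d.delivery_date,
       po.expected_delivery_start, po.expected_delivery_end
FROM dispatches d
JOIN purchase_orders po ON po.id = d.purchase_order_id
JOIN products p ON p.id = po.product_id
JOIN users u ON u.id = po.supplier_id
WHERE d.status = 'DELIVERED' AND d.delivery_date IS NOT NULL
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Of the three sample dispatches, only the first survives: the second is still in transit and the third is marked delivered but has no delivery date.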
From that join I derived three signals:
- actual_transit_days: delivery_date - dispatch_date
- lateness_days: delivery_date - expected_delivery_end
- log_quantity: log1p(quantity), to compress the range
I originally fed raw quantity into the model and the gradients were dominated by large orders. The model would fit those well and generalize poorly to smaller ones. Switching to log1p fixed that immediately.
I also filtered out rows where actual_transit_days < 0. Those exist because some dispatch records have the dates entered in the wrong order. It's not a lot of rows but they'd push the model in the wrong direction.
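The derivation and the negative-transit filter can be sketched in plain Python. This is an illustration under assumptions: the row dictionaries, the derive_features helper, and the sample dates are hypothetical, and the real pipeline may compute these in SQL or pandas instead.

```python
import math
from datetime import date

def derive_features(row):
    """Return the derived signals for one joined row, or None if the row is dropped."""
    actual_transit_days = (row["delivery_date"] - row["dispatch_date"]).days
    if actual_transit_days < 0:
        return None  # dispatch and delivery dates were entered in the wrong order
    return {
        "actual_transit_days": actual_transit_days,
        # Negative lateness means the delivery arrived before the window closed.
        "lateness_days": (row["delivery_date"] - row["expected_delivery_end"]).days,
        "log_quantity": math.log1p(row["quantity"]),
    }

rows = [
    {"dispatch_date": date(2026, 1, 3), "delivery_date": date(2026, 1, 7),
     "expected_delivery_end": date(2026, 1, 8), "quantity": 500},
    # Dates swapped at data entry: this row is filtered out.
    {"dispatch_date": date(2026, 1, 9), "delivery_date": date(2026, 1, 6),
     "expected_delivery_end": date(2026, 1, 10), "quantity": 20},
]

features = [f for f in map(derive_features, rows) if f is not None]
print(features)
```

To see the compression: log1p(10) is about 2.4 and log1p(100000) is about 11.5, so a 10,000x spread in raw quantity becomes roughly a 5x spread in the feature.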
Supplier Embeddings
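I originally tried treating supplier ID as a one-hot categorical feature. That worked fine with a small number of suppliers but doesn't generalize: the input grows by one column per supplier, and if a new supplier appears, the model has no representation for it. (An embedding table needs a row for that supplier too, so a truly unseen supplier still requires retraining or a reserved "unknown" index.)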
I switched to a nn.Embedding layer that maps each supplier to an 8-dimensional learned vector. During training, the MSE loss on actual_transit_days gradually adjusts those vectors so suppliers with similar delivery behavior end up close to each other in the latent space. I concatenate the supplier embedding with three context features—expected_window_width_days, log_quantity, shelf_life_days—then pass the combined vector through a two-layer MLP.
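import torch
import torch.nn as nn

class SupplierEmbeddingModel(nn.Module):
    def __init__(self, num_suppliers, embedding_dim=8):
        super().__init__()
        # One learned vector per supplier index.
        self.supplier_embedding = nn.Embedding(num_suppliers, embedding_dim)
        # Embedding plus the three context features -> hidden layer -> transit days.
        self.network = nn.Sequential(
            nn.Linear(embedding_dim + 3, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, supplier_idx, features):
        emb = self.supplier_embedding(supplier_idx)  # (batch, embedding_dim)
        x = torch.cat([emb, features], dim=1)  # (batch, embedding_dim + 3)
        # squeeze(-1) keeps the batch dimension even when the batch size is 1.
        return self.network(x).squeeze(-1)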
The 8-dimensional size was a guess. I tried 4 and the loss plateaued earlier; 16 didn't improve things meaningfully. 8 felt like enough to capture the variance without overfitting on a relatively small dataset.
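The training step described in this section can be sketched as follows. It is a toy run, not the actual train_supplier_embeddings.py: the six data rows, the supplier IDs, the Adam optimizer, the learning rate, and the epoch count are all assumptions chosen to keep the example small.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the toy run is repeatable

class SupplierEmbeddingModel(nn.Module):
    def __init__(self, num_suppliers, embedding_dim=8):
        super().__init__()
        self.supplier_embedding = nn.Embedding(num_suppliers, embedding_dim)
        self.network = nn.Sequential(
            nn.Linear(embedding_dim + 3, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, supplier_idx, features):
        emb = self.supplier_embedding(supplier_idx)
        return self.network(torch.cat([emb, features], dim=1)).squeeze(-1)

# Hypothetical stand-in for training_data.csv.
supplier_ids = ["sup_a", "sup_b", "sup_c", "sup_a", "sup_b", "sup_c"]
# Map each supplier ID to a stable integer index for the embedding table.
supplier_to_idx = {s: i for i, s in enumerate(sorted(set(supplier_ids)))}
supplier_idx = torch.tensor([supplier_to_idx[s] for s in supplier_ids])

# Columns: expected_window_width_days, log_quantity, shelf_life_days
features = torch.tensor([[3.0, 6.2, 14.0], [3.0, 4.1, 14.0], [2.0, 5.0, 7.0],
                         [4.0, 3.3, 30.0], [2.0, 6.9, 7.0], [3.0, 4.6, 14.0]])
target = torch.tensor([2.0, 4.0, 6.0, 2.5, 4.5, 5.5])  # actual_transit_days

model = SupplierEmbeddingModel(num_suppliers=len(supplier_to_idx))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

losses = []
for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(supplier_idx, features), target)
    loss.backward()  # gradients reach both the MLP weights and the embedding rows
    optimizer.step()
    losses.append(loss.item())

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

Only the embedding rows for suppliers present in a batch receive gradient, which is why suppliers with very few rows end up with vectors close to their random initialization.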
Counterfactual Simulation
The simulation script takes one delivery context—a fixed row from the training set—and runs inference across every supplier in the index. The output is a sorted list of predicted transit times, one per supplier.
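results = []
model.eval()  # inference mode
# The [:10] slice caps this run at the first 10 suppliers; drop it to score every supplier.
for supplier_id, supplier_idx in list(supplier_to_idx.items())[:10]:
    supplier_tensor = torch.tensor([supplier_idx], dtype=torch.long)
    with torch.no_grad():
        # features is the fixed context row, shape (1, 3).
        prediction = model(supplier_tensor, features).item()
    results.append((supplier_id, prediction))
# Ascending sort: fastest predicted transit first.
results.sort(key=lambda x: x[1])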
This is the core "what-if" loop. For a given shipment—say, a cold-chain order with a 3-day delivery window—I can see which suppliers the model predicts will be fastest for that specific context, rather than just looking at historical averages.
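The same what-if loop can also run as a single batched forward pass by repeating the context row once per supplier. This is a sketch under assumptions: the model below is freshly initialized rather than loaded from supplier_model.pt, and the supplier IDs and context values are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class SupplierEmbeddingModel(nn.Module):
    def __init__(self, num_suppliers, embedding_dim=8):
        super().__init__()
        self.supplier_embedding = nn.Embedding(num_suppliers, embedding_dim)
        self.network = nn.Sequential(
            nn.Linear(embedding_dim + 3, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, supplier_idx, features):
        emb = self.supplier_embedding(supplier_idx)
        return self.network(torch.cat([emb, features], dim=1)).squeeze(-1)

# Hypothetical supplier index; an untrained model stands in for the checkpoint.
supplier_to_idx = {"sup_a": 0, "sup_b": 1, "sup_c": 2}
model = SupplierEmbeddingModel(num_suppliers=len(supplier_to_idx))
model.eval()

# One delivery context, shape (1, 3):
# expected_window_width_days, log_quantity, shelf_life_days
context = torch.tensor([[3.0, 6.2, 14.0]])

supplier_ids = list(supplier_to_idx)
idx = torch.tensor([supplier_to_idx[s] for s in supplier_ids], dtype=torch.long)

with torch.no_grad():
    # Same context for every supplier; only the embedding lookup differs.
    batch_features = context.expand(len(supplier_ids), -1)
    predictions = model(idx, batch_features)  # shape (num_suppliers,)

# Ascending sort: fastest predicted transit first.
results = sorted(zip(supplier_ids, predictions.tolist()), key=lambda x: x[1])
print(results)
```

Because the context is held fixed, any difference between two suppliers' predictions comes entirely from their embedding vectors.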
The limitation is that it's ranking based on predicted mean transit time only. There's no variance estimate. A supplier that's consistently 3 days is ranked the same as one that's sometimes 1 day and sometimes 5. That's the next thing I'd fix.
Synthetic Data Generation
I built a Node.js data generator in sample_dataset/ to populate the Supabase schema with realistic events. The generator uses Faker.js to produce suppliers, buyers, purchase orders, batches, and dispatches with configurable delay probabilities per supplier profile.
I designed it to be deterministic via seeding so I could run reproducible experiments—change the model, regenerate the exact same dataset, and compare results cleanly.
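The seeding idea, sketched in Python rather than the generator's Node.js. The generate_dispatches function, its parameters, and the delay ranges are hypothetical. The only point is that a private, seeded random source makes two runs produce identical rows.

```python
import random

def generate_dispatches(seed, n=5, delay_probability=0.2):
    """Generate n toy dispatch rows from a private, seeded random source."""
    rng = random.Random(seed)  # independent of the global random state
    rows = []
    for i in range(n):
        delayed = rng.random() < delay_probability
        base_days = rng.randint(1, 3)
        extra_days = rng.randint(2, 5) if delayed else 0
        rows.append({
            "dispatch_id": i,
            "delayed": delayed,
            "transit_days": base_days + extra_days,
        })
    return rows

run_a = generate_dispatches(seed=42)
run_b = generate_dispatches(seed=42)
print(run_a == run_b)
```

Passing the seed in explicitly, instead of seeding a global generator, keeps the output stable even if other code draws random numbers in between.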
Lessons Learned
- Log-transforming skewed features isn't optional: Raw quantity values across several orders of magnitude made training unstable. log1p was the fix.
- One-hot encoding doesn't scale to new entities: The switch to embeddings was necessary the moment I wanted the model to say something useful about suppliers it hadn't seen many times.
- Offline pipelines are underrated: Not having a real-time inference endpoint forced me to be deliberate about what the model was actually trying to answer. The counterfactual framing came directly from that constraint.
What I'd Do Differently
The model currently predicts a single transit time with no uncertainty estimate. I'd add a quantile regression head—or switch to a probabilistic output entirely—so the simulation can surface variance, not just expected value. For procurement decisions, knowing that Supplier A has a tighter predicted range matters as much as the mean.
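One common way to train a quantile head is the pinball (quantile) loss. The pure-Python sketch below is not part of the current model. It only shows why the loss surfaces variance: the two supplier histories are hypothetical, share a mean of 3 days, and still get different 90th-percentile estimates.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss for a quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += max(q * diff, (q - 1) * diff)
    return total / len(y_true)

def best_constant(y, q, candidates):
    """Pick the constant prediction that minimizes the pinball loss on y."""
    return min(candidates, key=lambda c: pinball_loss(y, [c] * len(y), q))

# Two hypothetical suppliers with the same mean transit time (3 days).
steady = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
erratic = [1, 5, 1, 5, 1, 5, 1, 5, 1, 5]

candidates = [x / 2 for x in range(13)]  # 0.0, 0.5, ..., 6.0
p90_steady = best_constant(steady, 0.9, candidates)
p90_erratic = best_constant(erratic, 0.9, candidates)

print(p90_steady, p90_erratic)  # → 3.0 5.0
```

In the PyTorch model, the equivalent change would be to give the MLP one output per quantile and swap the MSE loss for this one, so the simulation could rank on a pessimistic quantile instead of the mean.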
I'd also move inference to a FastAPI endpoint so it can be queried in real time from the traceability UI, rather than running as a script against a saved checkpoint.