Wills Education

Chapter 01

Surge isn't one model, it's three

The popular framing of Uber's surge pricing is that it's a single "raise the price when demand exceeds supply" rule. The actual implementation is three distinct ML systems running in real time, each with different inputs, different time horizons, and different optimisation targets.

	Demand model	Supply model	Matching engine
What it predicts	Ride requests per hex	Available drivers per hex	Optimal price multiplier
Time horizon	5 / 30 / 60 min	5 / 30 min	Real-time (per request)
Model class	GBT + deep learning	Sequence model	Reinforcement learning
Drives	Surge trigger	Driver incentives	1.4x / 2.1x badge

The first model is demand forecasting. Given a hex (Uber's spatial unit, roughly a city block), it predicts how many ride requests will arrive in the next 5 minutes, the next 30, and the next hour. The 5-minute prediction is what drives surge. The longer horizons drive driver incentives ("head to downtown in the next hour") and city-wide repositioning suggestions.

The second model is supply forecasting. Drivers are not a static pool: they log on and off, accept or reject trips, finish trips, and reposition. Predicting how many drivers will be available where, and when, is a separate problem from predicting demand. Uber's system explicitly models driver behaviour as a function of recent trip earnings, time-of-day patterns, and the historical responsiveness of each driver to incentives.

The third model is the matching/pricing engine itself. Given a forecasted supply-demand imbalance, what's the price multiplier that brings the market back into balance without losing too many riders to abandonment or burning out drivers with surge fatigue? This is the model that surfaces as the "1.4x" or "2.1x" badge in the app.

Chapter 02

Why this is reinforcement learning, not rules

An early version of surge was a simple rules engine: if demand/supply > X then multiplier = Y. It worked, and it was interpretable, but it broke down under three conditions Uber kept hitting in practice. It over-corrected during transient spikes (a concert getting out doesn't need an hour of surge, it needs ten minutes). It under-corrected during slow-burn imbalances (rainy Saturday evenings). And it created perverse driver behaviour (drivers learned to log off in non-surge hexes and migrate, which made surge worse).
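The rule described above can be written in a few lines. This is a minimal sketch, not Uber's code: the function name, the threshold of 1.2, and the multiplier of 1.5 are all hypothetical stand-ins for the X and Y in the text.

```python
def rule_based_multiplier(demand: float, supply: float,
                          threshold: float = 1.2,
                          surge: float = 1.5) -> float:
    """Return a price multiplier from a single demand/supply ratio rule.

    threshold and surge are illustrative values, not real parameters.
    """
    if supply <= 0:
        # No drivers at all: treat the hex as imbalanced.
        return surge
    return surge if demand / supply > threshold else 1.0


# The rule only sees the current ratio, not how long it has lasted.
print(rule_based_multiplier(demand=30, supply=10))  # ratio 3.0, above threshold
print(rule_based_multiplier(demand=10, supply=10))  # ratio 1.0, below threshold
```

Because the function has no memory and no forecast, a ten-minute concert spike and a sustained rainy-evening imbalance look identical to it at any given instant, which is the limitation the paragraph describes.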

The current generation is a reinforcement learning system. The reward function balances completed trips, average rider price, average driver earnings, and a penalty for surge "fatigue" (repeated price hikes in the same area). The system learns to be more aggressive in some markets, more conservative in others, and to anticipate events (sports, concerts, weather) that the rules engine couldn't reason about.
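To make the shape of such a reward concrete, here is a hedged sketch of a scalar reward built from the four terms named above. The class names, field names, weights, and even the signs of the weights are assumptions chosen for illustration; the source does not say how the terms are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HexOutcome:
    """Observed result of one pricing action in one hex (hypothetical fields)."""
    completed_trips: int
    avg_rider_price: float
    avg_driver_earnings: float
    recent_surge_count: int  # price hikes in this area within a recent window


@dataclass(frozen=True)
class RewardWeights:
    """Illustrative weights only; real values would be tuned or set by policy."""
    trips: float = 1.0
    rider_price: float = -0.1      # assumed: higher rider prices are penalised
    driver_earnings: float = 0.1   # assumed: higher driver earnings are rewarded
    fatigue: float = 0.5           # penalty per recent surge in the same area


def reward(o: HexOutcome, w: RewardWeights = RewardWeights()) -> float:
    """Combine the four terms into one scalar for the RL agent."""
    return (w.trips * o.completed_trips
            + w.rider_price * o.avg_rider_price
            + w.driver_earnings * o.avg_driver_earnings
            - w.fatigue * o.recent_surge_count)


fresh = HexOutcome(100, 20.0, 30.0, recent_surge_count=0)
fatigued = HexOutcome(100, 20.0, 30.0, recent_surge_count=3)
print(reward(fresh) > reward(fatigued))  # identical outcome, but repeated surges cost reward
```

The fatigue term is what discourages the agent from surging the same area repeatedly even when each individual surge looks profitable.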

This is not magic. These are the same reinforcement learning patterns used in robotics, but with the constraint that every action affects real users in real time, and a wrong call costs Uber money in churn and driver attrition. Uber publishes papers on this, which makes it one of the better-documented production RL systems anywhere.

Chapter 03

The marketplace constraint nobody else has to solve

Pricing models are common. Recommender systems are common. What makes Uber's problem unusual is that every model decision affects two sets of users with conflicting interests, with a regulator looking over your shoulder.

If surge is too aggressive, riders churn and the press calls you predatory. If surge is too conservative, drivers churn because they can earn more on Lyft, DoorDash, or just by going home. If the matching is too greedy on rider experience (always serve the closest driver), you over-utilise some drivers and starve others. If the matching is too fair to drivers (round-robin), riders wait longer and the experience suffers. Every objective has a counter-objective, and the trade-off is regulated in many jurisdictions.

Hard cap: rider wait time (model constraint, not soft weight)
Hard floor: driver earnings/hr (audit-defensible policy)
Hard ceiling: surge in emergencies (regulator-facing rule)

Uber's answer is multi-objective optimisation with explicit constraints rather than implicit weighting. The rider satisfaction model has a hard cap on wait time. The driver model has a hard floor on earnings per active hour. The pricing model has hard ceilings during declared emergencies. These constraints are not hyperparameters tuned on a validation set; they are policy decisions that get audited externally.
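One way to picture "explicit constraints rather than implicit weighting" is to filter candidate actions against the policy first and only then optimise. The sketch below assumes that structure; the class names, fields, and numeric limits are hypothetical and not taken from Uber.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Policy:
    """Hard limits set by policy owners, not tuned by the model (values assumed)."""
    max_wait_min: float = 8.0
    min_driver_earnings_hr: float = 20.0
    emergency_multiplier_ceiling: float = 1.0


@dataclass(frozen=True)
class Candidate:
    """A candidate price multiplier with its predicted consequences."""
    multiplier: float
    expected_wait_min: float
    expected_driver_earnings_hr: float
    score: float  # whatever objective the optimiser maximises


def is_feasible(c: Candidate, policy: Policy, emergency: bool) -> bool:
    """A candidate violating any hard limit is rejected regardless of its score."""
    if c.expected_wait_min > policy.max_wait_min:
        return False
    if c.expected_driver_earnings_hr < policy.min_driver_earnings_hr:
        return False
    if emergency and c.multiplier > policy.emergency_multiplier_ceiling:
        return False
    return True


def choose(candidates: Iterable[Candidate], policy: Policy,
           emergency: bool = False) -> Optional[Candidate]:
    """Return the best-scoring feasible candidate, or None if none qualifies."""
    feasible = [c for c in candidates if is_feasible(c, policy, emergency)]
    return max(feasible, key=lambda c: c.score, default=None)


policy = Policy()
candidates = [
    Candidate(1.0, 7.0, 21.0, score=0.50),
    Candidate(1.8, 4.0, 30.0, score=0.90),
    Candidate(1.2, 9.5, 25.0, score=0.95),  # best score, but breaks the wait cap
]
print(choose(candidates, policy).multiplier)                  # 1.8
print(choose(candidates, policy, emergency=True).multiplier)  # 1.0
```

The highest-scoring candidate never wins if it breaks a limit, and the limits live in a plain `Policy` object that can be reviewed without reading the optimiser.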

Chapter 04

Latency is the entire story

Every Uber data scientist learns this in their first week: the model can be brilliant, but if it doesn't return in under 200ms, it doesn't ship. The end-to-end matching pipeline (rider taps button → driver gets ping) has a hard latency budget, and ML inference is one of many things competing for it.

Figure: Where 200ms goes (rider tap → driver ping). ML inference is one of seven things competing for the same budget. Engineering choices everywhere are downstream of this constraint.

This forces architectural choices that most ML teams never think about. Models are pre-warmed and held in memory. Feature stores are sharded by city. Heavy feature computations run upstream of the request, not inline. The system caches recent forecasts and only recomputes when the underlying signals have moved meaningfully. None of this is glamorous; all of it is what makes the difference between a notebook prototype and a production marketplace.
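The last of those choices, caching forecasts and recomputing only when signals move, can be sketched as follows. This is an illustration under assumptions: the `ForecastCache` class, the single scalar "signal" per hex, and the 10% relative tolerance are all hypothetical simplifications.

```python
from typing import Callable, Dict, Tuple


class ForecastCache:
    """Serve a cached forecast until the input signal drifts past a tolerance."""

    def __init__(self, compute: Callable[[str, float], float],
                 tolerance: float = 0.1) -> None:
        self._compute = compute      # the expensive model call
        self._tolerance = tolerance  # relative change that triggers a recompute
        self._entries: Dict[str, Tuple[float, float]] = {}  # hex_id -> (signal, forecast)
        self.recomputes = 0

    def get(self, hex_id: str, signal: float) -> float:
        cached = self._entries.get(hex_id)
        if cached is not None:
            old_signal, forecast = cached
            # Compare against the signal used for the last real computation,
            # so slow drift still triggers a refresh eventually.
            if abs(signal - old_signal) <= self._tolerance * max(abs(old_signal), 1e-9):
                return forecast
        forecast = self._compute(hex_id, signal)
        self._entries[hex_id] = (signal, forecast)
        self.recomputes += 1
        return forecast


# Stand-in for a slow model: here the "forecast" is just twice the signal.
cache = ForecastCache(lambda hex_id, signal: signal * 2.0, tolerance=0.1)
print(cache.get("hex_a", 100.0))  # 200.0, computed
print(cache.get("hex_a", 105.0))  # 200.0, served from cache (5% move)
print(cache.get("hex_a", 120.0))  # 240.0, recomputed (20% move)
print(cache.recomputes)           # 2
```

The trade-off is staleness against latency: a looser tolerance saves more inference calls but serves forecasts that lag the underlying signal further.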

Chapter 05

What you can copy from this playbook

Two ideas are universally useful even if you'll never build a marketplace. First: separate forecasting from action. Many ML systems collapse "predict the future state" and "decide what to do about it" into a single model, and they regret it later when the action policy needs to change for business reasons but the prediction is still valuable. Uber's separation between supply/demand forecasting and the surge/matching policy is a clean example: the forecasts get reused across many downstream actions.
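A small sketch of that separation: one forecast object, consumed by two independent policies that can each be changed without retraining anything. The type, the function names, and the thresholds are hypothetical and exist only to show the interface boundary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Forecast:
    """Output of the forecasting layer; it knows nothing about pricing."""
    hex_id: str
    demand: float  # predicted ride requests
    supply: float  # predicted available drivers


def surge_policy(f: Forecast, cap: float = 3.0) -> float:
    """Action 1 (assumed rule): price multiplier from the forecast ratio, capped."""
    if f.supply <= 0:
        return cap
    return min(cap, max(1.0, f.demand / f.supply))


def reposition_policy(f: Forecast, min_shortfall: float = 5.0) -> bool:
    """Action 2 (assumed rule): suggest repositioning if the shortfall is large."""
    return (f.demand - f.supply) >= min_shortfall


forecast = Forecast("hex_a", demand=30.0, supply=20.0)
print(surge_policy(forecast))       # 1.5
print(reposition_policy(forecast))  # True
```

Changing the surge cap for business reasons touches only `surge_policy`; the forecast and the repositioning logic are unaffected.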

Second: treat constraints as policy, not hyperparameters. If your model has to balance two objectives, codify the floor or ceiling explicitly rather than embedding it in a weighted loss function. The team that has to defend the trade-off, to executives, regulators, or end users, will thank you. "We never let driver earnings drop below X" is a sentence you can say in court. "We optimised the weighted Lagrangian" is not.

Chapter 06

Next-Gen Routing: Graph Neural Networks & Autonomous Fleets

Uber has upgraded its routing engine by transitioning from classical road segment weight calculations to deep Graph Neural Networks (GNNs). GNNs represent the city's road network as a dynamic graph, updating edge weights based on real-time transit telemetry from millions of active vehicles.
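The data structure underneath that description, a road graph whose edge weights move with live observations, can be sketched without any neural network. The example below is explicitly not a GNN: it is a classical weighted graph with exponentially smoothed edge updates and a Dijkstra query, shown only to illustrate "dynamic graph with telemetry-driven edge weights". The class name, the smoothing factor, and the node labels are assumptions.

```python
import heapq
from typing import Dict


class RoadGraph:
    """Directed graph of road segments with travel times in seconds."""

    def __init__(self, smoothing: float = 0.5) -> None:
        self._edges: Dict[str, Dict[str, float]] = {}
        self._smoothing = smoothing  # weight given to each new observation

    def add_edge(self, src: str, dst: str, travel_time_s: float) -> None:
        self._edges.setdefault(src, {})[dst] = travel_time_s
        self._edges.setdefault(dst, {})

    def observe(self, src: str, dst: str, observed_s: float) -> None:
        """Blend a telemetry observation into the edge weight (exponential smoothing)."""
        old = self._edges[src][dst]
        a = self._smoothing
        self._edges[src][dst] = (1 - a) * old + a * observed_s

    def eta(self, src: str, dst: str) -> float:
        """Shortest travel time from src to dst using Dijkstra's algorithm."""
        best = {src: 0.0}
        heap = [(0.0, src)]
        while heap:
            d, node = heapq.heappop(heap)
            if node == dst:
                return d
            if d > best.get(node, float("inf")):
                continue  # stale heap entry
            for nxt, w in self._edges.get(node, {}).items():
                nd = d + w
                if nd < best.get(nxt, float("inf")):
                    best[nxt] = nd
                    heapq.heappush(heap, (nd, nxt))
        return float("inf")


g = RoadGraph(smoothing=0.5)
g.add_edge("A", "B", 60.0)
g.add_edge("B", "C", 60.0)
g.add_edge("A", "C", 150.0)
print(g.eta("A", "C"))       # 120.0 via B
g.observe("B", "C", 260.0)   # congestion reported on B -> C
print(g.eta("A", "C"))       # 150.0, the direct edge is now faster
```

A learned model would replace the hand-written smoothing rule with predicted edge or route times, but the graph representation and the need to re-query as weights change are the same.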

45% ETA error reduction since adopting deep GNN route models
150ms query latency, evaluating city-wide routes in real time
Waymo fleet integration, coordinating driverless marketplace dispatches

Uber's surge pricing: a masterclass in real-time ML economics.
