Reference Guides - Optimal Execution

From Almgren-Chriss to Reinforcement Learning

Apr 09, 2026

Every institutional trader faces the same fundamental problem: you must liquidate (or acquire) a large position X over a finite horizon T, but each share you trade moves the price against you. Trade too fast and you pay excessive market impact; trade too slowly and you bear the risk that the price drifts away while you still hold inventory. Optimal execution theory formalises this tension as a calculus-of-variations problem and delivers closed-form or numerically tractable schedules that balance expected cost against cost variance. This guide walks through ten progressively richer formulations, each one patching a deficiency in the last.

1. TWAP Baseline

The simplest possible schedule is Time-Weighted Average Price: divide the parent order X into equal slices and trade at a constant rate. The remaining inventory at time t is

x(t) = X\left(1 - \frac{t}{T}\right)

and the trading rate is the constant v = X/T. TWAP implicitly assumes that temporary market impact is linear in trade rate, ΔS = η v, and that the impact coefficient η does not change throughout the day. Under these assumptions the total expected cost is simply

E[C] = \int_0^T \eta\, v^2\, dt = \frac{\eta\, X^2}{T}

because each infinitesimal slice of size v dt pays slippage η v per share, a cost of η v² dt, and integrating η v² over [0, T] gives η X²/T.

TWAP corresponds to zero risk aversion: the trader cares only about minimising expected impact and is indifferent to the variance of the execution cost. In a world where the mid-price follows a random walk with volatility σ, the cost variance of a TWAP schedule is the largest among mean-variance-optimal schedules, because TWAP holds inventory longer than any risk-averse optimal schedule does. This is the baseline against which every subsequent model improves.
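A quick numerical check of the two TWAP numbers, as a sketch assuming the linear impact model above and the discrete risk term σ²τ∑ₖxₖ² used by Almgren and Chriss. Slicing X into N equal pieces gives an expected cost of exactly η X²/T for every N, while the variance tends to the continuous value σ²∫₀ᵀ x(t)² dt = ⅓ σ² X² T. The function name and parameter values are hypothetical.

```python
def twap_cost_and_variance(X, T, eta, sigma, N):
    """Expected temporary-impact cost and cost variance of an N-slice TWAP.

    Assumes linear temporary impact (slippage eta * rate per share) and a
    random-walk mid-price with volatility sigma, using the discrete
    Almgren-Chriss risk term: variance = sigma^2 * tau * sum_k x_k^2.
    """
    tau = T / N                                  # length of each slice
    n = X / N                                    # shares traded per slice
    expected_cost = N * eta * n**2 / tau         # N slices, each costing eta * n^2 / tau
    # inventory remaining after trade k: x_k = X (1 - k/N), k = 1..N
    variance = sigma**2 * tau * sum((X * (1 - k / N)) ** 2 for k in range(1, N + 1))
    return expected_cost, variance


X, T, eta, sigma = 1e5, 1.0, 2e-6, 0.02          # hypothetical order and parameters
for N in (10, 100, 1000):
    cost, var = twap_cost_and_variance(X, T, eta, sigma, N)
    print(N, cost / (eta * X**2 / T), var / (sigma**2 * X**2 * T / 3))
```

The first ratio is 1 for every N; the second approaches 1 from below as the slices shrink.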

2. VWAP Extension

Intraday liquidity is not uniform. Equity markets exhibit a pronounced U-shaped volume profile: heavy volume at the open and close, thin volume over lunch. If temporary impact scales inversely with instantaneous volume, ηₖ = η₀ / Vₖ, then trading equal quantities in every period is suboptimal. Volume-Weighted Average Price adapts the schedule so that the number of shares traded in period k is proportional to that period's expected volume:

n_k = X \,\frac{V_k}{\sum_{j=1}^{N} V_j}

With τ the length of each period, the cost under time-varying impact becomes ∑ₖ ηₖ nₖ²/τ = η₀ ∑ₖ nₖ²/(Vₖ τ). Minimising this sum subject to ∑ₖ nₖ = X shows that trading proportionally to volume is optimal, because it equalises the marginal cost of trading, 2η₀ nₖ/(Vₖ τ), across periods. The proof is a direct application of the Cauchy-Schwarz inequality: (∑ₖ nₖ)² ≤ (∑ₖ nₖ²/Vₖ)(∑ₖ Vₖ), with equality exactly when nₖ ∝ Vₖ.
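A minimal sketch of the VWAP argument, using a hypothetical U-shaped volume profile and illustrative parameters: the volume-proportional schedule attains the lower bound η₀X²/(τ∑ₖVₖ), while equal slices pay more. Function names are illustrative.

```python
def vwap_schedule(X, volumes):
    """Shares per period proportional to expected volume: n_k = X V_k / sum_j V_j."""
    total = sum(volumes)
    return [X * v / total for v in volumes]


def impact_cost(schedule, volumes, eta0, tau):
    """Temporary-impact cost sum_k eta_k n_k^2 / tau with eta_k = eta0 / V_k."""
    return sum(eta0 * n**2 / (v * tau) for n, v in zip(schedule, volumes))


# Hypothetical U-shaped volume profile: heavy at the open and close, thin over lunch.
volumes = [9.0, 6.0, 4.0, 3.0, 3.0, 4.0, 6.0, 10.0]
X, eta0, tau = 10_000.0, 0.5, 1.0

vwap = vwap_schedule(X, volumes)
equal = [X / len(volumes)] * len(volumes)       # TWAP-style equal slices
bound = eta0 * X**2 / (tau * sum(volumes))      # Cauchy-Schwarz lower bound

print(impact_cost(vwap, volumes, eta0, tau) / bound)   # equals 1 up to rounding
print(impact_cost(equal, volumes, eta0, tau) / bound)  # strictly above 1
```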

VWAP is the workhorse benchmark of agency execution desks. Its limitation is the same as TWAP's: it ignores price risk entirely. A trader who must sell a volatile stock still holds large inventory through thin midday periods, exposed to adverse price moves. To address this we need a framework that explicitly penalises risk.

3. Almgren-Chriss Linear Impact

Almgren and Chriss (2000) introduced the canonical formulation of optimal execution. The model decomposes market impact into two components: temporary impact η v that affects only the current trade, and permanent impact g nₖ that shifts the fundamental price for all subsequent trades. Dividing [0, T] into N periods of length τ = T/N, the trader chooses a discrete schedule {n₁, …, n_N} with nₖ = xₖ₋₁ − xₖ to minimise the mean-variance objective

J = \sum_{k=1}^{N} \eta\,\frac{n_k^2}{\tau} \;+\; g\sum_{k=1}^{N} n_k\, x_k \;+\; \lambda\,\sigma^2\,\tau\sum_{k=1}^{N} x_k^2

subject to the boundary conditions x₀ = X and x_N = 0. The first term is temporary impact cost, the second is permanent impact cost, and the third is the risk penalty weighted by the trader's risk-aversion parameter λ.

A key insight from Almgren and Chriss is that the permanent impact contribution to expected cost is essentially path-independent. Because nₖ xₖ = ½(xₖ₋₁² − xₖ²) − ½ nₖ², the permanent term telescopes to g ∑ₖ nₖ xₖ = ½ g X² − ½ g ∑ₖ nₖ². The second piece can be absorbed into the temporary-impact coefficient (η → η − ½ g τ) and vanishes in the continuous-time limit, so permanent impact contributes ½ g X² regardless of the schedule and does not affect the optimal schedule at all. The optimisation reduces to a tradeoff between temporary impact (which favours slow trading) and variance (which favours fast trading), with λ controlling the balance.
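A short numerical check of the path-independence claim on two hypothetical ten-slice schedules. The permanent term g∑ₖnₖxₖ equals ½gX² minus a schedule-dependent piece ½g∑ₖnₖ², which shrinks as the slices get finer. The function name and numbers are illustrative.

```python
def permanent_cost(schedule, X, g):
    """Permanent-impact term g * sum_k n_k x_k, with x_k the inventory after trade k."""
    x, total = X, 0.0
    for n in schedule:
        x -= n
        total += g * n * x
    return total


X, g = 1_000.0, 0.01
# Two hypothetical schedules that both sell X shares in 10 slices.
twap = [X / 10] * 10
front_loaded = [400.0, 250.0, 150.0, 100.0, 50.0, 25.0, 15.0, 6.0, 3.0, 1.0]

for schedule in (twap, front_loaded):
    telescoped = 0.5 * g * X**2 - 0.5 * g * sum(n**2 for n in schedule)
    print(permanent_cost(schedule, X, g), telescoped)

# With many small slices the schedule-dependent piece vanishes: cost -> g X^2 / 2.
print(permanent_cost([X / 10_000] * 10_000, X, g), 0.5 * g * X**2)
```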

4. Continuous-Time Solution

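Taking the continuous-time limit τ → 0, the permanent term becomes the constant ½ g X² and the objective reduces to

J = \int_0^T \left[\eta\,\dot{x}^2 + \lambda\,\sigma^2\,x^2\right] dt

Applying the Euler-Lagrange equation yields the second-order ODE

\ddot{x}(t) = \kappa^2\, x(t)

where the urgency parameter is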

\kappa = \sigma\sqrt{\frac{\lambda}{\eta}}

This is a linear ODE with constant coefficients. Its unique solution satisfying x(0) = X and x(T) = 0 is

x^*(t) = X\,\frac{\sinh\!\bigl(\kappa(T - t)\bigr)}{\sinh(\kappa T)}

The optimal trading rate follows by differentiation:

v^*(t) = -\dot{x}^*(t) = \kappa X\,\frac{\cosh\!\bigl(\kappa(T - t)\bigr)}{\sinh(\kappa T)}

The dimensionless product κ T is the single number that governs the shape of the entire schedule. When κ T ≪ 1 (low urgency: the stock is not very volatile, the trader is not very risk-averse, or impact is high), the schedule is nearly linear and close to TWAP. When κ T ≫ 1 (high urgency), the schedule front-loads aggressively, liquidating most of the position early and then trickling out the remainder. The transition between these regimes is smooth and monotonic.
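The effect of κT on the shape is easy to see numerically. A minimal sketch that evaluates x*(t) at the midpoint of the horizon for a few hypothetical values of κT; the function name is illustrative, and the formula is rewritten with exponentials so that sinh does not overflow for large κT.

```python
import math


def ac_inventory(t, X, T, kappa):
    """Almgren-Chriss optimal inventory x*(t) = X sinh(kappa (T - t)) / sinh(kappa T).

    Rewritten with exponentials so it neither overflows for large kappa*T
    nor loses precision for small kappa*T; kappa = 0 is the TWAP limit.
    """
    if kappa == 0.0:
        return X * (1.0 - t / T)
    # sinh(k(T-t)) / sinh(kT) = e^{-kt} (1 - e^{-2k(T-t)}) / (1 - e^{-2kT})
    return X * math.exp(-kappa * t) * math.expm1(-2.0 * kappa * (T - t)) / math.expm1(-2.0 * kappa * T)


X, T = 1.0, 1.0
for kT in (0.01, 1.0, 3.0, 10.0):
    held = ac_inventory(T / 2, X, T, kT / T)
    print(f"kappa*T = {kT:5.2f}: fraction of X still held at T/2 = {held:.3f}")
```

At κT = 0.01 the midpoint holding is essentially one half (TWAP); as κT grows it falls towards zero, which is the front-loading described above.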

5. Efficient Frontier of Execution

As the risk-aversion parameter λ varies from zero to infinity, the optimal schedule traces out an efficient frontier in the plane of expected cost versus cost standard deviation, directly analogous to the Markowitz mean-variance frontier in portfolio theory. The expected temporary-impact cost of the optimal schedule (the permanent contribution ½ g X² is the same for every schedule and is omitted) is

E[C^*] = \frac{1}{2}\,\eta\,\kappa\,X^2\left(\coth(\kappa T) + \frac{\kappa T}{\sinh^2(\kappa T)}\right)

and the variance of execution cost is

\text{Var}[C^*] = \frac{\sigma^2 X^2}{2\kappa}\left(\coth(\kappa T) - \frac{\kappa T}{\sinh^2(\kappa T)}\right)

In the limit λ → 0 (equivalently κ → 0), we recover TWAP: minimal expected cost η X²/T but maximal variance ⅓ σ² X² T. As λ → ∞ (κ → ∞), the trader executes immediately at time zero, paying the maximum impact cost η X²/τ_min, where τ_min is the shortest feasible trading interval, but bearing zero variance.

The frontier is convex and decreasing: initial reductions in variance are cheap in terms of added cost, but further reductions become progressively more expensive. In practice, most execution algorithms operate in the mildly risk-averse region where κ T is between 1 and 3, achieving a substantial variance reduction for a modest increase in expected cost. The frontier also provides a natural way to benchmark any execution algorithm: if its average cost and cost standard deviation, measured over many executions of comparable orders, lie northeast of the frontier, the algorithm is suboptimal.
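A sketch that traces the frontier numerically, assuming the continuous Almgren-Chriss model above and hypothetical parameters. Integrating η v*(t)² directly gives the expected temporary-impact cost ½ηκX²(coth(κT) + κT/sinh²(κT)), which tends to η X²/T as κ → 0. The code evaluates that closed form next to the variance formula and cross-checks the cost against a midpoint-rule integral; function names are illustrative.

```python
import math


def ac_frontier_point(X, T, eta, sigma, lam):
    """Expected temporary-impact cost and cost variance of the continuous
    Almgren-Chriss schedule for risk aversion lam > 0."""
    kappa = sigma * math.sqrt(lam / eta)
    kT = kappa * T
    coth = 1.0 / math.tanh(kT)
    edge = kT / math.sinh(kT) ** 2
    expected = 0.5 * eta * kappa * X**2 * (coth + edge)
    variance = sigma**2 * X**2 / (2.0 * kappa) * (coth - edge)
    return expected, variance


def integrated_cost(X, T, eta, sigma, lam, steps=100_000):
    """Independent check: midpoint-rule integral of eta * v*(t)^2 over [0, T]."""
    kappa = sigma * math.sqrt(lam / eta)
    dt = T / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        v = kappa * X * math.cosh(kappa * (T - t)) / math.sinh(kappa * T)
        total += eta * v**2 * dt
    return total


X, T, eta, sigma = 1.0, 1.0, 0.1, 0.3          # hypothetical parameters
for lam in (1e-4, 1e-2, 1.0, 10.0, 100.0):
    e, var = ac_frontier_point(X, T, eta, sigma, lam)
    print(f"lam={lam:<7g} E[C]={e:.4f}  sd[C]={math.sqrt(var):.4f}")

print(ac_frontier_point(X, T, eta, sigma, 1.0)[0], integrated_cost(X, T, eta, sigma, 1.0))
```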

6. Square-Root Impact Model

Empirical studies across equities, futures, and foreign exchange consistently find that market impact follows a concave power law rather than a linear relationship. The widely cited square-root law states

\Delta S \approx \sigma\, c\left(\frac{n}{V}\right)^{\delta}

with δ ≈ 0.5, where n is the order size, V the daily traded volume, and c a dimensionless constant of order unity. This concavity means that doubling the trade size raises the impact by a factor of only about √2 rather than 2, which has important implications for optimal scheduling.

Substituting square-root temporary impact into the continuous-time objective gives

J = \int_0^T \left[\eta\,|\dot{x}|^{3/2} + \lambda\,\sigma^2\,x^2\right] dt

The Euler-Lagrange equation is now nonlinear:

\frac{3}{4}\,\eta\,|\dot{x}|^{-1/2}\,\ddot{x} = 2\,\lambda\,\sigma^2\,x

No elementary closed-form solution is available for this ODE, so the schedule is usually computed numerically. Numerical solutions reveal a schedule that is qualitatively similar to the Almgren-Chriss hyperbolic-sine trajectory but with two important differences: execution is more aggressive at the start (because the concave impact function makes large early trades relatively cheaper) and gentler near the end. The total cost is lower than the linear-impact prediction for the same parameters, reflecting the empirical reality that impact is less punishing than linear models assume.
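Because the Lagrangian L = η|ẋ|^{3/2} + λσ²x² has no explicit time dependence, the Beltrami identity ẋ ∂L/∂ẋ − L = const gives a first integral along the optimal path, ½η|ẋ|^{3/2} − λσ²x² = C. The selling rate is then a function of inventory alone, and the schedule follows from one quadrature plus a one-dimensional search for C. The sketch below takes that route; it is one of several possible numerical approaches (a shooting method or a direct discretised optimisation would also work), assumes NumPy, and uses a hypothetical function name and parameter values.

```python
import numpy as np


def sqrt_impact_schedule(X, T, eta, lam, sigma, n_grid=20_001, n_times=101):
    """Optimal inventory path for J = ∫ [eta |x'|^{3/2} + lam sigma^2 x^2] dt,
    x(0) = X, x(T) = 0, found from the first integral
        0.5 * eta * |x'|^{3/2} - lam * sigma^2 * x^2 = C.
    """
    xs = np.linspace(0.0, X, n_grid)           # inventory grid 0..X
    dx = xs[1] - xs[0]

    def elapsed(C):
        # Selling rate as a function of inventory: v(x) = (2 (lam sigma^2 x^2 + C) / eta)^{2/3}.
        inv_rate = (2.0 * (lam * sigma**2 * xs**2 + C) / eta) ** (-2.0 / 3.0)
        # F[i] = ∫_0^{xs[i]} dx / v(x), cumulative trapezoidal rule.
        return np.concatenate(([0.0], np.cumsum(0.5 * (inv_rate[1:] + inv_rate[:-1]) * dx)))

    # Total liquidation time F[-1] decreases as C grows: bracket T, then bisect in log C.
    lo = hi = 1.0
    while elapsed(lo)[-1] < T:
        lo /= 10.0
    while elapsed(hi)[-1] > T:
        hi *= 10.0
    for _ in range(100):
        mid = np.sqrt(lo * hi)
        if elapsed(mid)[-1] > T:
            lo = mid                           # too slow: need a larger C
        else:
            hi = mid
    F = elapsed(np.sqrt(lo * hi))

    # Inventory xs[i] is reached at time F[-1] - F[i]; invert by interpolation.
    t_of_x = F[-1] - F
    times = np.linspace(0.0, T, n_times)
    return times, np.interp(times, t_of_x[::-1], xs[::-1])


times, x = sqrt_impact_schedule(X=1.0, T=1.0, eta=0.1, lam=1.0, sigma=0.3)
print(np.round(x[::10], 3))                    # inventory every T/10
```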

7. Obizhaeva-Wang Transient Impact

A fundamental limitation of the Almgren-Chriss framework is the assumption that temporary impact vanishes instantaneously after each trade. In reality, the order book takes time to replenish after a large trade. Obizhaeva and Wang (2013) model this through a transient impact kernel that decays exponentially:

G(t) = G_0\, e^{-\rho t}

where ρ is the resilience rate, the speed at which liquidity providers refill the book. The cumulative price impact at time t from a sequence of trades {nₖ} executed at times {tₖ} is

S(t) - S(0) = \sum_{t_k \leq t} G_0\, n_k\, e^{-\rho(t - t_k)}

The optimal strategy in this model is qualitatively different from continuous Almgren-Chriss. In the risk-neutral case, Obizhaeva and Wang find that the trader should execute a discrete block trade at the start, trade continuously at a constant rate while the book partially recovers, and finish with a second block trade at T. The split between blocks and continuous trading is governed by ρT: the slower the resilience, the larger the share of the order placed in the two blocks. In the limit ρ → ∞ (instant recovery), the blocks vanish, the model reduces to pure temporary impact, and the continuous constant-rate schedule of risk-neutral Almgren-Chriss is recovered. In the limit ρ → 0, impact becomes permanent, the cost is path-independent, and the schedule is irrelevant.

The Obizhaeva-Wang framework also explains why execution algorithms that "ping" the book with rapid small orders gain little: if resilience is slow relative to the spacing between orders, each successive ping hits a book that has not yet refilled, so the total impact approaches that of a single block of the same size. Splitting an order pays off only when the pauses between child orders are long enough for the book to recover. The practical implication is that execution algorithms should estimate the resilience rate from order-book data and space trades accordingly.
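The exponential-kernel result can be checked on a discrete grid. Assuming a block-shaped (linear) order book, so that a trade of size nₖ pays the impact still outstanding from earlier trades plus ½ G₀ nₖ, the total impact cost of trades at times t₁, …, t_N is ½ G₀ nᵀKn with K_{jk} = e^{−ρ|t_j − t_k|}. Minimising it subject to ∑ₖ nₖ = X gives n ∝ K⁻¹1. The sketch solves that linear system with NumPy; the function name, grid, and parameter values are hypothetical, and price risk is ignored.

```python
import numpy as np


def ow_discrete_schedule(X, T, rho, N):
    """Risk-neutral optimal trades at N equally spaced times in [0, T] under
    transient impact with kernel G(t) = G0 exp(-rho t) and a block-shaped book.

    Each trade pays the impact still outstanding from earlier trades plus
    0.5 * G0 * n_k, so the total impact cost is 0.5 * G0 * n^T K n with
    K[j, k] = exp(-rho |t_j - t_k|). Minimising subject to sum(n) = X gives
    n proportional to K^{-1} 1; G0 scales the cost but not the schedule.
    """
    t = np.linspace(0.0, T, N)
    K = np.exp(-rho * np.abs(t[:, None] - t[None, :]))
    w = np.linalg.solve(K, np.ones(N))
    return t, X * w / w.sum()


X, T, rho = 1.0, 1.0, 5.0                       # hypothetical order and resilience
t, n = ow_discrete_schedule(X, T, rho, N=101)
print(n[0], n[1], n[50], n[-1])                 # first, interior, interior, last trade
print(X / (rho * T + 2.0))                      # continuous-time block size
```

The first and last trades come out equal and much larger than the identical interior trades. As the grid is refined, each block approaches X/(ρT + 2), and the remaining ρT X/(ρT + 2) is traded at a constant rate in between.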

8. Stochastic Liquidity

All preceding models treat the impact coefficient η as either constant or deterministically time-varying. In practice, liquidity is itself stochastic: bid-ask spreads widen and narrow unpredictably, depth fluctuates, and volatility clusters. When the impact coefficient η(t) is a known function of time (the simplest extension), the Euler-Lagrange equation becomes a variable-coefficient ODE:

2\lambda\sigma^2 x - \frac{d}{dt}\!\bigl[2\,\eta(t)\,\dot{x}\bigr] = 0

Expanding the derivative gives η(t) ẍ + η̇(t) ẋ - λσ² x = 0, which must generally be solved numerically. The qualitative behaviour is intuitive: the optimal schedule concentrates trading in periods where η(t) is low (liquidity is abundant) and reduces trading when η(t) is high (liquidity is thin).
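For a deterministic η(t), the discrete version of the problem is quadratic in the inventories, so setting the gradient to zero gives a tridiagonal linear system: the discrete analogue of the variable-coefficient Euler-Lagrange equation above. A minimal NumPy sketch, with a hypothetical function name, volume profile, and parameters; ηₖ is the impact coefficient of period k.

```python
import numpy as np


def time_varying_schedule(X, eta, lam, sigma, tau):
    """Inventories x_0..x_N minimising
        sum_k eta[k-1] (x_{k-1} - x_k)^2 / tau + lam sigma^2 tau sum_k x_k^2
    with x_0 = X, x_N = 0, where eta[k-1] is the impact coefficient of period k.
    Requires at least two periods.

    Setting the derivative with respect to each interior x_k to zero gives
        (eta_k + eta_{k+1} + lam sigma^2 tau^2) x_k - eta_k x_{k-1} - eta_{k+1} x_{k+1} = 0.
    """
    eta = np.asarray(eta, dtype=float)
    m = len(eta) - 1                            # number of interior inventories
    A = np.zeros((m, m))
    b = np.zeros(m)
    for i in range(m):                          # row i corresponds to x_{i+1}
        A[i, i] = eta[i] + eta[i + 1] + lam * sigma**2 * tau**2
        if i > 0:
            A[i, i - 1] = -eta[i]
        if i < m - 1:
            A[i, i + 1] = -eta[i + 1]
    b[0] = eta[0] * X                           # known boundary value x_0 = X
    return np.concatenate(([X], np.linalg.solve(A, b), [0.0]))


# Hypothetical U-shaped volume profile; impact is cheapest when volume is high.
volumes = np.array([9.0, 6.0, 4.0, 3.0, 3.0, 4.0, 6.0, 10.0])
x = time_varying_schedule(X=1.0, eta=0.05 / volumes, lam=1.0, sigma=0.3, tau=1 / 8)
print(np.round(-np.diff(x), 3))                 # fraction of X traded in each period
```

With constant η the solution coincides with the discrete Almgren-Chriss sinh schedule (with urgency κ̃ given by cosh(κ̃τ) = 1 + κ²τ²/2). With λ = 0 and ηₖ = η₀/Vₖ it reproduces VWAP exactly, since the first-order conditions then force ηₖ nₖ to be equal in every period.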

When η(t) is taken to be proportional to the reciprocal of the expected volume profile, the solution naturally produces something resembling a risk-adjusted VWAP. This provides the theoretical justification for VWAP-like strategies: they are not merely convenient benchmarks but approximate solutions to the optimal execution problem with realistic intraday liquidity variation.

The fully stochastic case, where η(t) is a random process observed in real time, leads to a dynamic programming formulation. The trader must decide at each instant whether current liquidity is good enough to trade aggressively or whether to wait for better conditions. The Hamilton-Jacobi-Bellman equation for the value function V(x, η, t) is

\partial_t V + \min_v\!\left[\eta\, v^2 + \lambda\sigma^2 x^2 - v\,\partial_x V + \mathcal{L}_\eta V\right] = 0

where ℒ_η is the infinitesimal generator of the liquidity process. Closed-form solutions exist only for special cases (e.g., mean-reverting Ornstein-Uhlenbeck liquidity), but the framework is general.
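The inner minimisation can be carried out by hand, which is the usual first step whether the remaining equation is then solved analytically or numerically. The bracket η v² − v ∂ₓV is a convex quadratic in v, minimised at

v^*(x, \eta, t) = \frac{\partial_x V}{2\eta}

Substituting back gives

\partial_t V + \lambda\sigma^2 x^2 - \frac{(\partial_x V)^2}{4\eta} + \mathcal{L}_\eta V = 0

Because the running cost is quadratic in x and ℒ_η acts only on η, the ansatz V(x, η, t) = h(η, t) x² (a standard guess for linear-quadratic problems, assumed here) removes x entirely and leaves an equation in (η, t) alone:

\partial_t h + \lambda\sigma^2 - \frac{h^2}{\eta} + \mathcal{L}_\eta h = 0, \qquad v^* = \frac{h(\eta, t)}{\eta}\, x

As a sanity check, with constant η (so ℒ_η h = 0) this Riccati equation is solved by h = ηκ coth(κ(T − t)), giving v* = κ coth(κ(T − t)) x: the Almgren-Chriss schedule written in feedback form.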

9. Signal-Aware Execution

The models above assume the trader has no view on future price direction. In practice, many institutional orders are initiated precisely because the trader has a short-term alpha signal. If the signal predicts that the price will move against the position at rate α, delaying execution is costly not just because of variance but because of expected adverse drift. The augmented objective is

J = \int_0^T \left[\eta\,\dot{x}^2 + \lambda\,\sigma^2\,x^2 + \alpha\, x\right] dt

The Euler-Lagrange equation becomes

\ddot{x} = \kappa^2\, x + \frac{\alpha}{2\eta}

This is a nonhomogeneous linear ODE. Its particular solution is the constant x_p = −α/(2λσ²), which shifts the entire trajectory: when α > 0 (the price is expected to fall, adverse for a seller), the optimal schedule front-loads more aggressively than the zero-alpha Almgren-Chriss solution. When α < 0 (the price is expected to rise, favourable for a seller), the trader can afford to slow down and reduce impact costs.
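The signal-aware ODE has an explicit solution: the constant particular solution x_p = −α/(2λσ²) plus a homogeneous sinh part fitted to the boundary conditions, x*(t) = x_p + [(X − x_p) sinh(κ(T − t)) − x_p sinh(κt)]/sinh(κT). A minimal sketch that evaluates it; the function name and parameter values are hypothetical.

```python
import math


def signal_aware_inventory(t, X, T, eta, lam, sigma, alpha):
    """x*(t) solving x'' = kappa^2 x + alpha / (2 eta) with x(0) = X, x(T) = 0.

    Note: nothing forces 0 <= x <= X; a very strong signal makes the path overshoot.
    """
    kappa = sigma * math.sqrt(lam / eta)
    xp = -alpha / (2.0 * lam * sigma**2)          # constant particular solution
    homogeneous = (X - xp) * math.sinh(kappa * (T - t)) - xp * math.sinh(kappa * t)
    return xp + homogeneous / math.sinh(kappa * T)


X, T, eta, lam, sigma = 1.0, 1.0, 0.1, 1.0, 0.3   # hypothetical parameters
for alpha in (-0.05, 0.0, 0.05):
    held = signal_aware_inventory(T / 2, X, T, eta, lam, sigma, alpha)
    print(f"alpha={alpha:+.2f}: inventory at T/2 = {held:.3f}")
```

The gap to the zero-alpha path is β[sinh(κ(T − t)) + sinh(κt) − sinh(κT)]/sinh(κT) with β = α/(2λσ²). Since sinh is superadditive on [0, ∞), the bracket is never positive, so a positive α always leaves less inventory on hand at every intermediate time.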

In the multi-asset setting, the trader liquidates a vector of positions x(t) ∈ ℝᵈ with cross-asset covariance Σ and vector of alpha signals α. The objective generalises to

J = \int_0^T \left[\dot{\mathbf{x}}^\top H\, \dot{\mathbf{x}} + \lambda\,\mathbf{x}^\top \Sigma\, \mathbf{x} + \boldsymbol{\alpha}^\top \mathbf{x}\right] dt

where H is the matrix of impact coefficients (diagonal if cross-impact is negligible). The Euler-Lagrange system is a coupled set of linear ODEs that can be solved via matrix exponentials. Cross-asset covariance introduces a new effect: even if two assets have independent impact, correlated price moves mean that the optimal schedule for one asset depends on the inventory of the other. A risk-averse trader holding long positions in two highly correlated assets will liquidate both faster than if they were uncorrelated, because the combined position carries more risk.
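The Euler-Lagrange system of the multi-asset objective is Hẍ = λΣx + ½α. When H is diagonal (no cross-impact), the substitution y = H^{1/2}x turns it into ÿ = My + ½H^{−1/2}α with the symmetric matrix M = λH^{−1/2}ΣH^{−1/2}; diagonalising M decouples the assets into independent scalar problems of the signal-aware form. The sketch uses this eigendecomposition route rather than an explicit matrix exponential, assumes Σ is positive definite, and uses a hypothetical function name and inputs.

```python
import numpy as np


def multi_asset_inventory(t, X, T, H_diag, Sigma, lam, alpha):
    """x(t) solving H x'' = lam Sigma x + alpha / 2, x(0) = X, x(T) = 0,
    for a diagonal impact matrix H = diag(H_diag) and positive-definite Sigma."""
    h = 1.0 / np.sqrt(np.asarray(H_diag, dtype=float))    # diagonal of H^{-1/2}
    M = lam * h[:, None] * np.asarray(Sigma, dtype=float) * h[None, :]
    mu, Q = np.linalg.eigh(M)                             # M = Q diag(mu) Q^T
    k = np.sqrt(mu)                                       # per-mode urgency
    z0 = Q.T @ (np.asarray(X, dtype=float) / h)           # modal coords of y(0) = H^{1/2} X
    c = Q.T @ (0.5 * h * np.asarray(alpha, dtype=float))  # modal forcing from the signals
    zp = -c / mu                                          # constant particular solutions
    z = zp + ((z0 - zp) * np.sinh(k * (T - t)) - zp * np.sinh(k * t)) / np.sinh(k * T)
    return h * (Q @ z)                                    # x = H^{-1/2} Q z


# Hypothetical two-asset book: equal impact and volatility, long both.
X, T, lam, H = np.array([1.0, 1.0]), 1.0, 1.0, [0.1, 0.1]
for corr in (0.0, 0.9):
    Sigma = 0.09 * np.array([[1.0, corr], [corr, 1.0]])
    x_mid = multi_asset_inventory(T / 2, X, T, H, Sigma, lam, alpha=np.zeros(2))
    print(f"correlation {corr}: inventory at T/2 = {np.round(x_mid, 3)}")
```

With correlation 0.9, both holdings at T/2 come out below the uncorrelated case: the long-long position loads on the high-variance eigenmode of Σ, which has a larger urgency.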

10. Adaptive & RL-Based Execution

All closed-form models rely on parametric assumptions about impact, volatility, and liquidity that are at best approximations. Reinforcement learning offers a model-free alternative that can adapt to complex, non-stationary market dynamics. The execution problem maps naturally onto the RL framework: the state is sₜ = (xₜ, t, fₜ) where xₜ is remaining inventory, t is elapsed time, and fₜ is a vector of market features (spread, depth, recent volatility, order-flow imbalance). The action aₜ = vₜ is the trading rate. The one-step cost is

c(s_t, a_t) = \left[\eta(s_t)\, v_t^2 + \lambda\,\sigma^2\, x_t^2\right]\Delta t

and the Bellman equation for the optimal value function is

V(s_t) = \min_{v}\!\left\{c(s_t, v) + \mathbb{E}\!\left[V(s_{t+\Delta t}) \,\middle|\, s_t,\ a_t = v\right]\right\}, \qquad x_{t+\Delta t} = x_t - v\,\Delta t

Model-free approaches such as deep Q-learning or policy gradient methods learn V or the optimal policy π∗(s) = argminᵥ Q(s, v) directly from historical execution data or a realistic simulator. The agent discovers schedule shapes that resemble Almgren-Chriss in calm markets but deviate substantially during high-volatility episodes, news events, or periods of unusual order-flow.

The practical advantage of RL is its ability to incorporate features that are impossible to model analytically: queue position, limit-order-book shape, time-of-day effects, and cross-asset signals can all enter the state vector. The practical disadvantage is sample efficiency: execution is an episodic problem with sparse, noisy rewards (the total cost is revealed only at the end of the schedule), and the state space is high-dimensional. Sim-to-real transfer is also challenging because simulator fidelity directly bounds policy quality. Current best practice combines a parametric baseline (typically Almgren-Chriss) with an RL-based residual policy that learns corrections to the baseline, inheriting the baseline's stability while capturing nonlinear effects.
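When η and σ are known and the state is only (x, t), the Bellman recursion above can be solved exactly by backward induction. The sketch below does this for the constant-η case, with inventory restricted to integer lots. It is model-based dynamic programming, not reinforcement learning: it computes the object that an RL agent approximates from data once η depends on market features and no model is available. Function names and parameter values are hypothetical.

```python
import numpy as np


def dp_schedule(X_units, N, eta, lam, sigma, tau):
    """Backward induction on
        V_k(x) = min_y [eta (x - y)^2 / tau + lam sigma^2 tau y^2 + V_{k+1}(y)],  0 <= y <= x,
    with V_N(0) = 0 and V_N(x) = inf for x > 0 (the position must be flat at T).
    Returns the optimal integer inventory path x_0..x_N and its total cost."""
    inv = np.arange(X_units + 1, dtype=float)
    x = inv[:, None]                       # inventory before the trade
    y = inv[None, :]                       # inventory after the trade
    n = x - y                              # shares sold this step
    step_cost = np.where(n >= 0, eta * n**2 / tau + lam * sigma**2 * tau * y**2, np.inf)

    V = np.full(X_units + 1, np.inf)
    V[0] = 0.0                             # terminal condition V_N
    policy = []
    for _ in range(N):                     # steps N-1, ..., 0
        total = step_cost + V[None, :]
        best = total.argmin(axis=1)
        V = total[np.arange(X_units + 1), best]
        policy.append(best)
    policy.reverse()                       # policy[k][x_k] gives x_{k+1}

    path = [X_units]
    for best in policy:
        path.append(int(best[path[-1]]))
    return np.array(path), float(V[X_units])


def schedule_cost(path, eta, lam, sigma, tau):
    """Discrete Almgren-Chriss objective for an inventory path x_0..x_N."""
    path = np.asarray(path, dtype=float)
    n = -np.diff(path)
    return float(np.sum(eta * n**2 / tau) + lam * sigma**2 * tau * np.sum(path[1:] ** 2))


X_units, N, eta, lam, sigma, tau = 100, 10, 0.1, 1.0, 0.3, 0.1   # hypothetical
path, J = dp_schedule(X_units, N, eta, lam, sigma, tau)
print(path, J)
```

The integer-lot optimum is bracketed by the continuous discrete-time Almgren-Chriss solution (a lower bound) and that solution rounded to whole lots (an upper bound), which gives a simple correctness check for any learned policy evaluated on the same model.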

These ten formulations compose a hierarchy of increasing realism. TWAP is the zero-information, zero-risk-aversion baseline that every practitioner understands. Almgren-Chriss introduces the fundamental cost-risk tradeoff and delivers the elegant hyperbolic-sine schedule. The square-root impact model corrects the linear impact assumption to match empirical data. Obizhaeva-Wang adds order-book resilience and explains why discrete block trades can outperform continuous schedules. Stochastic liquidity models capture the reality that market conditions fluctuate unpredictably. Signal-aware execution integrates alpha forecasts into the schedule. And reinforcement learning handles the full complexity of real markets at the expense of requiring substantial data and infrastructure. In practice, most systematic execution desks start with an Almgren-Chriss backbone calibrated to their empirical impact estimates, layer on intraday volume adjustment to approximate stochastic liquidity, and increasingly augment with learned components for latency-sensitive or alpha-driven flow. The mathematical progression from TWAP to RL is not merely academic: it mirrors the actual evolution of execution algorithms deployed across global markets over the past two decades.


Delphic Alpha
