5 HFT Secrets Every Quant Trader Should Know

From order flow to lock-free buffers — the building blocks of high-frequency trading

Mar 04, 2026

Whether you're prepping for a quant trading interview or sharpening your market microstructure intuition, these five drills cover the core building blocks of high-frequency trading: reading order flow, detecting toxic liquidity, estimating fair value, quoting optimally, and building the infrastructure that makes it all run in nanoseconds.

Each drill includes the problem statement, the key insight, and a working Python implementation.

1. Order Flow Imbalance (OFI)

The drill: Compute OFI from L1 order book snapshots (best bid/ask price & size).

Why it matters: OFI, introduced by Cont, Kukanov & Stoikov (2014), captures supply/demand shifts at the top of book. When the bid queue grows or the ask queue shrinks, buying pressure increases. The authors report that a simple linear regression of price changes on OFI explains a large share of short-term price variance (roughly 40-65% at 10-second horizons, depending on the instrument), which is why OFI is one of the most widely used microstructure signals.

The key insight is tracking changes in queue sizes, conditioned on whether the price level itself moved.

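OFI(t) = ΔBid(t) − ΔAsk(t)
where ΔBid = Qbid · 1{Pbid ↑} + (Qbid − Qbid,prev) · 1{Pbid =} − Qbid,prev · 1{Pbid ↓}
and ΔAsk = Qask · 1{Pask ↓} + (Qask − Qask,prev) · 1{Pask =} − Qask,prev · 1{Pask ↑}
ΔP = β · OFI + ε (R² ≈ 0.40–0.65 at 10s horizons, with OFI summed over each interval)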
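The regression in the last line can be sketched on synthetic data. Everything below is an assumption for illustration: the OFI values are random draws rather than market data, the slope `beta_true` is made up, and `np.polyfit` stands in for whatever estimator a production system would use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, illustrative inputs (not market data): OFI summed over
# fixed intervals and the mid-price change over the same intervals.
n_intervals = 1000
ofi = rng.normal(0.0, 100.0, n_intervals)
beta_true = 0.01  # hypothetical price impact per unit of OFI
noise = rng.normal(0.0, 0.5, n_intervals)
dp = beta_true * ofi + noise

# Ordinary least squares fit of dP = alpha + beta * OFI.
# np.polyfit returns coefficients from highest degree to lowest.
beta_hat, alpha_hat = np.polyfit(ofi, dp, deg=1)

# R^2: share of price-change variance explained by OFI
resid = dp - (alpha_hat + beta_hat * ofi)
r_squared = 1.0 - resid.var() / dp.var()

print(f"beta_hat={beta_hat:.4f}  R^2={r_squared:.2f}")
```

On real data the fit would be run per instrument and per interval length, since both the slope and the R² vary with liquidity.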

import pandas as pd

def compute_ofi(ob: pd.DataFrame) -> pd.Series:
    """ob must have columns: bid_price, bid_size, ask_price, ask_size"""
    bp, bs = ob['bid_price'], ob['bid_size']
    ap, asize = ob['ask_price'], ob['ask_size']

    # Bid side contribution
    bid_up   = (bp > bp.shift(1)).astype(int)
    bid_same = (bp == bp.shift(1)).astype(int)
    bid_dn   = (bp < bp.shift(1)).astype(int)
    ofi_bid  = bid_up * bs + bid_same * (bs - bs.shift(1)) - bid_dn * bs.shift(1)

    # Ask side contribution
    ask_dn   = (ap < ap.shift(1)).astype(int)
    ask_same = (ap == ap.shift(1)).astype(int)
    ask_up   = (ap > ap.shift(1)).astype(int)
    ofi_ask  = ask_dn * asize + ask_same * (asize - asize.shift(1)) - ask_up * asize.shift(1)

    # First row is NaN because there is no previous snapshot to diff against
    return ofi_bid - ofi_ask  # positive = net buying pressure

O(n) time · O(1) space per tick
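For a live feed, the same rule can be applied one update at a time, keeping only the previous quote. This is a minimal sketch: `L1Quote` and `ofi_tick` are hypothetical names, and the four quotes are made-up numbers chosen so each branch of the rule fires.

```python
from dataclasses import dataclass

@dataclass
class L1Quote:
    bid_price: float
    bid_size: float
    ask_price: float
    ask_size: float

def ofi_tick(prev: L1Quote, cur: L1Quote) -> float:
    """OFI contribution of a single top-of-book update."""
    # Bid side: price up adds the new queue, price down removes the old one
    if cur.bid_price > prev.bid_price:
        d_bid = cur.bid_size
    elif cur.bid_price == prev.bid_price:
        d_bid = cur.bid_size - prev.bid_size
    else:
        d_bid = -prev.bid_size
    # Ask side: mirror image, price down adds the new queue
    if cur.ask_price < prev.ask_price:
        d_ask = cur.ask_size
    elif cur.ask_price == prev.ask_price:
        d_ask = cur.ask_size - prev.ask_size
    else:
        d_ask = -prev.ask_size
    return d_bid - d_ask

quotes = [
    L1Quote(100.00, 500, 100.01, 400),
    L1Quote(100.00, 700, 100.01, 400),  # bid queue grows by 200
    L1Quote(100.01, 300, 100.02, 600),  # bid and ask both move up
    L1Quote(100.00, 800, 100.02, 650),  # bid moves down, ask queue grows by 50
]
ticks = [ofi_tick(a, b) for a, b in zip(quotes, quotes[1:])]
print(ticks)  # [200, 700, -350]
```

The second update scores +700 because the new bid queue (300) appears at a higher price and the old ask queue (400) disappears as the ask moves up.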

2. VPIN — Volume-Synchronized Probability of Informed Trading

The drill: Implement VPIN using Bulk Volume Classification on trade data.

Why it matters: VPIN (Easley, López de Prado & O'Hara, 2012) addresses a question every market maker faces: is the flow hitting my quotes informed or noise? It groups trades into equal-volume buckets (volume time instead of clock time) and measures how one-sided each bucket is. The authors reported that VPIN rose sharply ahead of the 2010 Flash Crash, a finding that later studies have disputed. It remains a widely used toxicity metric for market makers deciding whether to widen spreads or pull quotes entirely.

VPIN = (1/n) · ∑τ=1..n |Vbuy(τ) − Vsell(τ)| / Vbucket
Vbuy = V · Φ(ΔP / σ)
Vsell = V · (1 − Φ(ΔP / σ))
where Φ is the standard normal CDF (Bulk Volume Classification) and σ is the standard deviation of price changes
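To make the classification rule concrete, here is a small worked sketch using only the standard library. `bvc_split` is a hypothetical helper, and the volume, price changes, and σ are illustrative numbers.

```python
from statistics import NormalDist

def bvc_split(volume: float, dp: float, sigma: float) -> tuple[float, float]:
    """Split one volume figure into (buy, sell) with Bulk Volume Classification."""
    buy_frac = NormalDist().cdf(dp / sigma)  # standard normal CDF
    return volume * buy_frac, volume * (1.0 - buy_frac)

sigma = 0.02  # illustrative standard deviation of price changes

flat = bvc_split(1000, 0.00, sigma)   # no price change: split 50/50
up = bvc_split(1000, 0.02, sigma)     # +1 sigma move: about 84% buy
down = bvc_split(1000, -0.04, sigma)  # -2 sigma move: about 2% buy

for label, (buy, sell) in [("flat", flat), ("up", up), ("down", down)]:
    print(f"{label:>4}: buy={buy:7.1f} sell={sell:7.1f}")
```

The split is probabilistic rather than a hard buy/sell label, which is why a bucket's imbalance grows smoothly with the size of the price move.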

import numpy as np
from scipy.stats import norm

def compute_vpin(prices, volumes, bucket_size, n_buckets=50):
    """
    prices, volumes: arrays of trade prices and quantities.
    bucket_size: volume per bucket (e.g. daily_vol / 50).
    Returns rolling VPIN, one value per window of n_buckets buckets.
    """
    prices = np.asarray(prices, dtype=float)
    volumes = np.asarray(volumes, dtype=float)

    # BVC: classify each trade's volume as buy or sell
    dp = np.diff(prices, prepend=prices[0])
    moves = dp[dp != 0]
    sigma = moves.std() if moves.size else 0.0
    sigma = sigma if sigma > 0 else 1e-10  # guard: flat or constant-step prices
    buy_pct = norm.cdf(dp / sigma)
    signed_vol = volumes * (2 * buy_pct - 1)  # buy volume minus sell volume

    # Accumulate into approximately equal-volume buckets. Each trade goes
    # whole into the bucket where it starts; trades are not split, so a
    # bucket can overshoot bucket_size by up to one trade.
    vol_before = np.cumsum(volumes) - volumes
    bucket_ids = (vol_before // bucket_size).astype(int)

    # Single pass over the trades: |buy - sell| per bucket
    imb = np.abs(np.bincount(bucket_ids, weights=signed_vol))

    # Rolling VPIN over n_buckets
    if len(imb) < n_buckets:
        return np.array([imb.sum() / (bucket_size * len(imb))])
    vpin = np.convolve(imb, np.ones(n_buckets), 'valid') / (n_buckets * bucket_size)
    return vpin

# VPIN near 0 = balanced flow; near 1 = highly toxic

O(n) time · O(B) space where B = num buckets
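One design choice in the code above is that each trade lands whole in a single bucket, so buckets are only approximately equal in volume. A hedged sketch of the alternative follows: split a trade at the bucket boundary so every completed bucket holds exactly `bucket_size`. `exact_bucket_imbalances` is a hypothetical helper name, and it assumes buy and sell volumes have already been classified.

```python
def exact_bucket_imbalances(buy_vol, sell_vol, bucket_size):
    """|buy - sell| per completed bucket, splitting trades at bucket boundaries."""
    imbalances = []
    b_buy = b_sell = filled = 0.0
    for bv, sv in zip(buy_vol, sell_vol):
        remaining = bv + sv
        if remaining <= 0:
            continue
        buy_frac = bv / remaining  # keep the trade's buy/sell mix in every piece
        while remaining > 0:
            room = bucket_size - filled
            take = min(room, remaining)
            b_buy += take * buy_frac
            b_sell += take * (1.0 - buy_frac)
            filled += take
            remaining -= take
            if take == room:  # bucket is full: record it and start a new one
                imbalances.append(abs(b_buy - b_sell))
                b_buy = b_sell = filled = 0.0
    return imbalances  # a trailing partial bucket is dropped

# Illustrative numbers: three trades, bucket size 100
buy = [60.0, 0.0, 30.0]
sell = [0.0, 80.0, 30.0]
imb = exact_bucket_imbalances(buy, sell, bucket_size=100.0)
vpin = sum(imb) / (len(imb) * 100.0)
print(imb, vpin)  # [20.0, 40.0] 0.3
```

The second trade (80 sold) is split 40/40: the first half completes bucket one at 60 buy versus 40 sell, and the second half opens bucket two.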

3. Microprice

The drill: Compute the microprice from L1 order book data.

Why it matters: The midprice treats bid and ask equally, which is misleading when queue sizes differ. The estimator used here weights each side's price by the opposite side's depth, so it incorporates queue imbalance: if bid size is much larger than ask size, it sits above the mid, pointing to upward pressure. Strictly speaking this is the imbalance-weighted mid, the usual first approximation to the micro-price of Stoikov (2018), who defines the micro-price as the expected future mid conditional on imbalance and spread. That estimator is a martingale by construction; the weighted mid is not, but it is the simplest meaningful upgrade over the midprice.

Pmicro = Pask · Qbid / (Qbid + Qask) + Pbid · Qask / (Qbid + Qask)
Pmicro − Pmid = spread · (imbalance − 0.5)
where imbalance = Qbid / (Qbid + Qask)

import pandas as pd

def microprice(bid_price, ask_price, bid_size, ask_size):
    """Vectorized microprice computation."""
    total = bid_size + ask_size
    return (bid_size * ask_price + ask_size * bid_price) / total

def microprice_signal(ob: pd.DataFrame, lag: int = 1) -> pd.Series:
    """Microprice return as a signal."""
    mp = microprice(ob['bid_price'], ob['ask_price'],
                    ob['bid_size'], ob['ask_size'])
    return mp.pct_change(lag)

# micro vs mid: micro adjusts for imbalance
# If bid_size >> ask_size, micro > mid (predicts price going up)
# Difference: micro - mid = spread * (imbalance - 0.5)

O(1) per tick · O(n) vectorized
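A quick numeric check of the identity above, with made-up quote values (a 2-cent spread and a 9:1 bid-heavy book):

```python
import math

bid_price, ask_price = 100.00, 100.02
bid_size, ask_size = 900.0, 100.0

imbalance = bid_size / (bid_size + ask_size)  # 0.9, bid-heavy book
mid = (bid_price + ask_price) / 2
micro = ask_price * imbalance + bid_price * (1 - imbalance)
spread = ask_price - bid_price

# Both sides of: micro - mid = spread * (imbalance - 0.5)
lhs = micro - mid
rhs = spread * (imbalance - 0.5)
identity_holds = math.isclose(lhs, rhs, abs_tol=1e-9)

print(f"mid={mid:.3f} micro={micro:.3f} diff={lhs:.4f} identity_holds={identity_holds}")
```

With 90% of the displayed size on the bid, the estimate sits 0.8 cents above the mid, i.e. 40% of the spread.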

4. Avellaneda-Stoikov Optimal Quotes

The drill: Compute optimal bid/ask quotes using the Avellaneda-Stoikov model.

Why it matters: This is the foundational market-making model. Avellaneda & Stoikov (2008) solved the problem of where to place bid and ask quotes given current inventory, the asset's volatility, and the time remaining in the trading session. The key idea: your reservation price (the price at which you are indifferent between buying and selling one more unit) shifts away from mid in proportion to your inventory. When you're long, the reservation price drops below mid and both quotes move down with it, so your ask is more likely to trade and your bid less likely to fill. The optimal spread depends on volatility, risk aversion, time remaining, and the parameter k, which measures how quickly fill intensity decays as quotes move away from mid.

r(t) = S(t) − q · γ · σ² · (T − t)
δ = γ · σ² · (T − t) + (2/γ) · ln(1 + γ/k)
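bid = r − δ/2
ask = r + δ/2
where S = mid price, q = inventory, γ = risk aversion, σ = volatility, T − t = time remaining, and k = decay rate of order arrival intensity with distance from mid, λ(δ) = A · e^(−kδ)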

import numpy as np

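def avellaneda_stoikov(mid, inventory, sigma, gamma, k, T_rem):
    """Optimal quotes; T_rem is time remaining (T - t), in the same time units as sigma."""
    # Reservation price: mid skewed against current inventory
    reservation = mid - inventory * gamma * sigma**2 * T_rem
    # Total spread: inventory-risk term plus liquidity term
    spread = gamma * sigma**2 * T_rem + (2/gamma) * np.log(1 + gamma/k)
    bid = reservation - spread / 2
    ask = reservation + spread / 2
    return {'bid': bid, 'ask': ask, 'reservation': reservation,
            'spread': ask - bid, 'skew': mid - reservation}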

O(1) time · O(1) space
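To see the inventory skew in numbers, the sketch below evaluates the same two formulas at three inventory levels. `as_quotes` is a hypothetical compact helper, and the parameter values are illustrative rather than calibrated.

```python
import math

def as_quotes(mid, q, sigma, gamma, k, t_rem):
    """Return (reservation, bid, ask) from the two formulas above."""
    reservation = mid - q * gamma * sigma**2 * t_rem
    spread = gamma * sigma**2 * t_rem + (2 / gamma) * math.log(1 + gamma / k)
    return reservation, reservation - spread / 2, reservation + spread / 2

# Illustrative parameters (assumptions, not calibrated values)
params = dict(sigma=2.0, gamma=0.1, k=1.5, t_rem=1.0)

results = {q: as_quotes(100.0, q, **params) for q in (-5, 0, 5)}
for q, (r, bid, ask) in results.items():
    print(f"q={q:+d}  reservation={r:7.3f}  bid={bid:7.3f}  ask={ask:7.3f}")
```

The spread is the same in all three rows. Only the center moves: long inventory pulls both quotes down, short inventory pushes both up.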

5. SPSC Lock-Free Ring Buffer

The drill: Implement a single-producer single-consumer lock-free ring buffer.

Why it matters: Every HFT system needs to pass market data from a feed handler to a strategy engine with minimal latency. The SPSC ring buffer is the fundamental primitive for that handoff: zero locks, zero allocations, deterministic latency. The producer writes at the tail, the consumer reads at the head, and neither ever blocks on the other. A well-tuned C++ implementation can run at roughly 2-5 nanoseconds per operation, which works out to about 200-500 million ops/sec. The critical optimization is cache-line padding between head and tail to prevent false sharing.

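class SPSCRingBuffer:
    """Lock-free SPSC ring buffer (Python simulation of C++ pattern).

    One slot always stays empty so that full and empty states differ,
    which means the buffer holds at most capacity - 1 items.
    """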
    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.buffer = [None] * capacity
        self.head = 0  # consumer reads here
        self.tail = 0  # producer writes here
        # In C++: head/tail would be std::atomic<size_t>
        # with cache line padding (64 bytes) between them

    def push(self, item) -> bool:
        """Producer only. Returns False if full."""
        next_tail = (self.tail + 1) % self.capacity
        if next_tail == self.head:  # full
            return False
        self.buffer[self.tail] = item
        # In C++: tail.store(next_tail, std::memory_order_release);
        # pop() reads tail with std::memory_order_acquire
        self.tail = next_tail
        return True

    def pop(self):
        """Consumer only. Returns None if empty."""
        if self.head == self.tail:  # empty
            return None
        item = self.buffer[self.head]
        # In C++: head.store(next_head, std::memory_order_release);
        # push() reads head with std::memory_order_acquire
        self.head = (self.head + 1) % self.capacity
        return item

# C++ version: ~2-5ns per push/pop (roughly 200-500M ops/sec)
# Key: cache line padding between head and tail (64 bytes)
# alignas(64) std::atomic<size_t> head;
# alignas(64) std::atomic<size_t> tail;

O(1) push/pop · zero locks · zero allocations
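As a sanity check on the single-writer-per-index design, the sketch below runs one producer thread and one consumer thread and verifies FIFO order. It restates the class so the snippet is self-contained. It relies on CPython's interpreter lock for memory visibility, so it illustrates the protocol only and says nothing about latency; `N_ITEMS` and the small capacity are arbitrary choices.

```python
import threading
import time

class SPSCRingBuffer:
    """Same layout as the class above: one empty slot separates full from empty."""
    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.buffer = [None] * capacity
        self.head = 0  # written only by the consumer
        self.tail = 0  # written only by the producer

    def push(self, item) -> bool:
        next_tail = (self.tail + 1) % self.capacity
        if next_tail == self.head:
            return False
        self.buffer[self.tail] = item
        self.tail = next_tail  # publish only after the slot is written
        return True

    def pop(self):
        if self.head == self.tail:
            return None
        item = self.buffer[self.head]
        self.head = (self.head + 1) % self.capacity
        return item

N_ITEMS = 5000
rb = SPSCRingBuffer(capacity=64)  # small on purpose, to force wrap-around
received = []

def producer():
    for i in range(N_ITEMS):
        while not rb.push(i):
            time.sleep(0)  # buffer full: yield to the consumer

def consumer():
    while len(received) < N_ITEMS:
        item = rb.pop()
        if item is None:
            time.sleep(0)  # buffer empty: yield to the producer
        else:
            received.append(item)

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(received), received == list(range(N_ITEMS)))
```

Because `pop` signals "empty" with `None`, this version cannot carry `None` as a payload; a C++ version would typically return a bool and write the item through an output parameter instead.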

These five drills span the full HFT stack — from reading the order book (OFI, Microprice), to detecting when to pull quotes (VPIN), to setting optimal prices (Avellaneda-Stoikov), to building the infrastructure that delivers data in nanoseconds (SPSC buffer). Master these and you'll have a solid foundation for any quant trading interview or system design discussion.

This post is part of the Quant Trading Drill Series — 150 hands-on coding exercises covering microstructure, statistical testing, market making, ML for alpha, and HFT systems.


Delphic Alpha
