Backtesting explained

Updated: 2026-04-24

Backtesting runs a strategy against historical data to estimate how it would have performed. This page covers walk-forward validation, look-ahead bias, survivorship bias, realistic slippage, and the specific reasons backtests of crypto strategies routinely overstate results.

Backtesting is the simulation of a strategy over historical data. The goal is to decide whether the strategy has enough edge to be worth running live. The backtest output is a PnL curve, a drawdown profile, and trade statistics — from which the operator chooses to deploy, re-tune, or discard.

The problem with backtests is not that they lie. It is that they tell a very specific kind of truth — "how the strategy would have performed on this history with these assumptions" — and operators consistently misread that truth as a forecast.

What an honest backtest reports

Metric	What it tells you	What it does not tell you
Total return	Cumulative PnL over the sample	How noisy the path was
Sharpe ratio	Return per unit of volatility	Tail risk; downside vs upside volatility
Max drawdown	Worst peak-to-trough in the sample	Drawdown possible outside the sample
Win rate	Percent of trades profitable	Distribution of win and loss sizes
Profit factor	Sum(wins) / Sum(losses)	How stable this ratio is over time
Exposure time	Percent of time capital was at work	Opportunity cost of idle capital
Trade count	Sample size of results	Whether all fills were realistic
Slippage + fee accounting	Post-cost profitability	Real-book depth at order size

If a backtest does not report all of these, it is an advertisement, not a backtest.
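As a rough illustration of how several of these metrics fall out of raw backtest output, the sketch below computes them from an equity curve and a list of per-trade PnLs. The function names and the zero risk-free rate are assumptions made for this example, not any particular engine's API.

```python
import math
from statistics import mean, stdev


def total_return(equity):
    """Cumulative return over the sample, as a fraction."""
    return equity[-1] / equity[0] - 1.0


def max_drawdown(equity):
    """Worst peak-to-trough decline as a positive fraction of the peak.
    Assumes equity values stay positive."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def sharpe_ratio(returns, periods_per_year=365):
    """Annualised mean / stdev of per-period returns.
    Assumption: risk-free rate of zero, daily bars on a 365-day market."""
    if len(returns) < 2:
        return float("nan")
    sd = stdev(returns)
    if sd == 0:
        return float("nan")
    return mean(returns) / sd * math.sqrt(periods_per_year)


def trade_stats(trade_pnls):
    """Trade count, win rate, and profit factor from per-trade PnLs."""
    wins = [p for p in trade_pnls if p > 0]
    gross_loss = sum(-p for p in trade_pnls if p < 0)
    return {
        "trade_count": len(trade_pnls),
        "win_rate": len(wins) / len(trade_pnls) if trade_pnls else float("nan"),
        "profit_factor": sum(wins) / gross_loss if gross_loss else float("inf"),
    }
```

None of these functions knows whether the fills behind the PnLs were realistic, which is exactly the right-hand column of the table.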

The four biases that kill retail backtests

1. Look-ahead bias

The strategy uses data that was not available at the time of the decision. The classic case is computing an indicator on the current bar's close and then trading inside that same bar. A related mistake is rebalancing against a universe chosen with knowledge of which tokens survived to today; that is look-ahead at the universe level, and it is the survivorship bias described next.

Fix: decisions made at time t must only use data available at t. Enforce by shifting all signals by at least one bar and by trading on the next bar's open, not the current bar's close.
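A minimal sketch of that rule, assuming a toy "close above its moving average" entry signal: the signal computed from bar i's close is only ever acted on at bar i+1's open. The signal, the function names, and the window length are illustrative choices, not part of any specific strategy.

```python
def sma(values, window):
    """Simple moving average; None until `window` values are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def next_open_entries(opens, closes, window=3):
    """Return (bar_index, fill_price) for each entry.

    The signal on bar i uses closes up to and including bar i,
    and the fill happens at the open of bar i + 1.
    """
    avg = sma(closes, window)
    entries = []
    for i in range(len(closes) - 1):  # the last bar has no next open
        if avg[i] is not None and closes[i] > avg[i]:
            entries.append((i + 1, opens[i + 1]))
    return entries


# Hypothetical bars for illustration only.
opens = [10, 11, 12, 13, 14]
closes = [11, 12, 13, 14, 15]
entries = next_open_entries(opens, closes)
```

Filling at `closes[i]` instead of `opens[i + 1]` would be the look-ahead version: the close that triggers the signal cannot also be the price traded.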

2. Survivorship bias

The universe you are testing against is the universe that exists today. Every delisted token, every dead exchange, every failed protocol is missing. A mean-reversion strategy that "works" on today's universe could have been decimated on the universe that existed five years ago: it would have kept buying dips in tokens that never recovered, and those losers are now gone from the data.

Fix: test against a point-in-time universe — the set of assets that were tradable on each date — which is expensive to assemble for crypto and nearly impossible for long-tail tokens. The next best fix is to limit backtest scope to top-N assets by liquidity, acknowledge the bias, and size accordingly.
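A point-in-time universe can be sketched as a filter over listing and delisting dates. The `LISTINGS` records below are made-up placeholders; assembling the real table is the expensive part.

```python
from datetime import date

# Hypothetical listing records: (symbol, listed_on, delisted_on or None).
LISTINGS = [
    ("AAA", date(2019, 1, 1), None),
    ("BBB", date(2020, 6, 1), date(2022, 5, 15)),
    ("CCC", date(2023, 3, 1), None),
]


def universe_on(day, listings):
    """Symbols tradable on `day`: already listed and not yet delisted."""
    return sorted(
        symbol
        for symbol, listed, delisted in listings
        if listed <= day and (delisted is None or day < delisted)
    )


# The backtest asks for the universe on each rebalance date,
# instead of using today's survivors for every date.
universe_2021 = universe_on(date(2021, 1, 1), LISTINGS)
universe_2024 = universe_on(date(2024, 1, 1), LISTINGS)
```

In this toy data `BBB` is tradable in 2021 and absent in 2024; a backtest built from today's list would never have held it.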

3. Sample-period bias

The backtest window is a single slice of market history, and the slice you pick drives the result more than the strategy does. A grid on BTC/USDT from 2023-01 to 2024-01 looks perfect (range-bound). The same grid from 2024-02 to 2025-04 looks terrible (trending). Neither window is wrong; both are incomplete.

Fix: report results across multiple out-of-sample windows, including a full bull-bear-bull cycle. Report the distribution, not the single number.
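One way to report a distribution rather than a single number is sketched below. It assumes the backtest has already produced one equity curve per window; the labels and summary fields are illustrative.

```python
from statistics import median


def window_returns(equity_by_window):
    """Total return per window from {label: [equity values]}."""
    return {
        label: curve[-1] / curve[0] - 1.0
        for label, curve in equity_by_window.items()
    }


def summarize(returns):
    """Worst, median, and best window, plus the share of profitable windows."""
    values = sorted(returns.values())
    return {
        "windows": len(values),
        "worst": values[0],
        "median": median(values),
        "best": values[-1],
        "share_positive": sum(v > 0 for v in values) / len(values),
    }


# Hypothetical equity curves, one per out-of-sample window.
curves = {"window_a": [100, 110], "window_b": [100, 90], "window_c": [100, 100]}
summary = summarize(window_returns(curves))
```

A strategy whose best window is excellent and whose median window is flat reads very differently from one that is modestly positive everywhere, even if the headline totals match.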

4. Slippage under-modelling

The backtest fills at the historical mid price. Live markets fill you against the spread, and sometimes outside it when the book is thin or the move is fast. For grid bots running hundreds of trades per day, a 5-bps slippage error compounds to a very different end equity.
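To put a rough number on that compounding, the sketch below treats the slippage error as a fixed cost per trade applied to the fraction of equity each trade puts to work. Both the cost model and the example inputs (200 trades per day, 2% of equity per trade) are simplifying assumptions.

```python
def equity_multiplier(trades, slippage_bps, notional_fraction):
    """Factor by which end equity shrinks from an unmodelled per-trade cost.

    Assumes every trade loses `slippage_bps` on `notional_fraction` of
    current equity, compounding from trade to trade.
    """
    per_trade_cost = slippage_bps / 10_000 * notional_fraction
    return (1.0 - per_trade_cost) ** trades


# Illustrative inputs: 200 trades/day for a year, each using 2% of equity.
drag = equity_multiplier(trades=200 * 365, slippage_bps=5, notional_fraction=0.02)
```

With these inputs the multiplier comes out at roughly 0.48, so a cost that looks negligible per trade removes about half of end equity over the year.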

Fix: model realistic fills:

Taker orders fill at the worst visible price needed to complete the requested size at that timestamp.

Maker orders fill only if price trades through the posted level, not just touches it.

During high-volatility bars, widen the spread model; during low-liquidity hours, cap order size to a realistic fraction of the bar volume.
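The fill rules above can be sketched as three small functions. This assumes an order-book snapshot is available as (price, quantity) levels and that bars carry a low and a volume; the function names and the 5% participation cap are assumptions for illustration, and the volatility-dependent spread widening is not modelled here.

```python
def taker_buy_fill(asks, size):
    """Walk the ask side of the book for a market buy.

    asks: list of (price, quantity), best price first.
    Returns (average_price, worst_price), or None if the visible
    book cannot fill the whole size.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    remaining = size
    cost = 0.0
    for price, qty in asks:
        take = min(remaining, qty)
        cost += take * price
        remaining -= take
        if remaining <= 0:
            return cost / size, price
    return None


def maker_buy_filled(limit_price, bar_low):
    """A resting buy counts as filled only if price traded strictly through it."""
    return bar_low < limit_price


def cap_order_size(desired, bar_volume, max_participation=0.05):
    """Limit order size to a fraction of the bar's traded volume."""
    return min(desired, bar_volume * max_participation)
```

The conservative reading of the taker rule is to charge the whole order at the returned worst price rather than the average; the function returns both so the backtest can choose.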

No public backtest engine nails all of these. The pragmatic approach is to run the backtest, then discount the result — 20–40% lower expected return, 30–50% higher drawdown — to get something closer to what the live strategy will actually do.
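As a worked example of that discount, the sketch below applies the two ranges to a backtest's headline numbers. It assumes a positive backtest return and takes both inputs as fractions; the function name is made up for this example.

```python
def discounted_expectation(backtest_return, backtest_drawdown,
                           return_haircut=(0.20, 0.40),
                           drawdown_uplift=(0.30, 0.50)):
    """Apply the haircut ranges to backtest numbers given as fractions.

    Assumes backtest_return is positive; ranges are (low, high).
    """
    low_cut, high_cut = return_haircut
    low_up, high_up = drawdown_uplift
    return {
        "return_range": (backtest_return * (1 - high_cut),
                         backtest_return * (1 - low_cut)),
        "drawdown_range": (backtest_drawdown * (1 + low_up),
                           backtest_drawdown * (1 + high_up)),
    }


# Illustrative backtest: 50% return with a 20% max drawdown.
expected = discounted_expectation(0.50, 0.20)
```

For that illustrative backtest the discounted expectation is a 30% to 40% return with a 26% to 30% drawdown.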

Walk-forward validation

The honest replacement for "train on all history, claim it works" is walk-forward validation:

Pick an in-sample window (e.g. 2021-01 to 2022-01) and tune the strategy on it.

Pick an out-of-sample window (2022-01 to 2022-04) and run the tuned strategy against it without further tuning.

Slide the window forward (2021-04 to 2022-04 in-sample, 2022-04 to 2022-07 out-of-sample) and repeat.

Concatenate all the out-of-sample PnLs. That concatenation is what the strategy can actually be expected to produce.
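The steps above can be sketched as a window generator plus a loop. The `tune` and `run` callbacks are hypothetical stand-ins for whatever the operator's own tooling provides; the 12-month in-sample and 3-month out-of-sample lengths match the example dates.

```python
from datetime import date


def add_months(d, months):
    """Shift a first-of-month date by a whole number of months."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def walk_forward_windows(start, end, train_months=12, test_months=3):
    """Yield (train_start, train_end, test_start, test_end), end-exclusive."""
    train_start = start
    while True:
        train_end = add_months(train_start, train_months)
        test_end = add_months(train_end, test_months)
        if test_end > end:
            break
        yield train_start, train_end, train_end, test_end
        train_start = add_months(train_start, test_months)


def walk_forward(start, end, tune, run, **window_kwargs):
    """Tune on each in-sample window, run once out-of-sample, concatenate PnL.

    tune(train_start, train_end) -> params        (hypothetical callback)
    run(params, test_start, test_end) -> [pnl]    (hypothetical callback)
    """
    out_of_sample_pnl = []
    for tr_start, tr_end, te_start, te_end in walk_forward_windows(
        start, end, **window_kwargs
    ):
        params = tune(tr_start, tr_end)
        out_of_sample_pnl.extend(run(params, te_start, te_end))  # no re-tuning
    return out_of_sample_pnl


windows = list(walk_forward_windows(date(2021, 1, 1), date(2022, 7, 1)))
```

The design point is that `run` never sees data from its own window during tuning; any parameter search has to live inside `tune`.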

Walk-forward routinely reduces reported returns by 30–60% vs a single-window fit. Operators who do not run walk-forward are getting an over-fitted number.

Crypto-specific pitfalls

Exchange migration. A backtest of BTC/USDT on Exchange A from 2019 may stitch together data from an exchange that no longer exists. Liquidity and spreads are not transferable.

Stablecoin depeg. A strategy that uses USDT as the quote currency is assuming USDT = $1 at every bar. This has been wrong for extended windows (May 2022, March 2023) and the backtest usually does not correct for it.

Token dilution / airdrop. Ongoing changes in token supply silently change what the "price" means over long windows.

Fee-schedule changes. Exchanges change maker/taker fees quarterly. A 2020 backtest using 2026 fees is optimistic.

Futures funding baselines. Funding rates have trended down since 2021 as liquidity matured; a 2018 funding-arb backtest is not a 2026 forecast.
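Two of these pitfalls, fee-schedule changes and stablecoin depegs, lend themselves to a small point-in-time lookup. The sketch below is illustrative only: the fee history is invented placeholder data, and converting PnL at a single USDT/USD rate is a simplification of marking the whole account in USD.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical fee history: (effective_from, maker_bps, taker_bps), sorted by date.
FEE_HISTORY = [
    (date(2019, 1, 1), 10.0, 10.0),
    (date(2021, 7, 1), 2.0, 7.5),
    (date(2024, 1, 1), 1.0, 5.0),
]


def fees_on(day, history=FEE_HISTORY):
    """Return (maker_bps, taker_bps) in force on `day`."""
    effective_dates = [effective for effective, _, _ in history]
    i = bisect_right(effective_dates, day) - 1
    if i < 0:
        raise ValueError(f"no fee schedule known for {day}")
    _, maker, taker = history[i]
    return maker, taker


def usdt_pnl_to_usd(pnl_usdt, usdt_usd_rate):
    """Convert a USDT-denominated PnL to USD at that bar's USDT/USD rate."""
    return pnl_usdt * usdt_usd_rate


# A 2020 trade is charged the fees in force in 2020, not the latest ones.
maker_bps, taker_bps = fees_on(date(2020, 5, 1))
```

The same lookup pattern applies to any assumption that changes over the sample, such as minimum order sizes or funding intervals.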

Specific notes per strategy

Grid trading strategy: grid backtests over a single range always look perfect. Re-backtest the same grid over the 2022 bear and the 2024 Q1 breakout; the numbers are very different.

DCA bot strategy: DCA backtests are the most honest but are path-dependent on the start date. A multi-start backtest, which repeats the run from many different start dates, is the fix.

Arbitrage bots: backtests ignore counterparty risk and transfer latency, which are the two largest live loss sources.

Signal trading bots: the signal provider's backtest almost always suffers from survivorship bias; re-run it against the operator's own execution policy.
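For the DCA case, a multi-start backtest can be sketched as the same fixed-amount schedule replayed from every possible start index. The price series is made up, fees and slippage are ignored, and the position is marked at the last purchase price, all of which are simplifying assumptions.

```python
def dca_return(prices, start, buys, amount=100.0):
    """Buy `amount` at each of `buys` consecutive prices from `start`.

    Returns the return on invested capital, marked at the last
    purchase price. Fees and slippage are ignored in this sketch.
    """
    window = prices[start : start + buys]
    units = sum(amount / price for price in window)
    invested = amount * len(window)
    return units * window[-1] / invested - 1.0


def multi_start_returns(prices, buys):
    """DCA return for every start index that leaves a full window."""
    return [dca_return(prices, s, buys) for s in range(len(prices) - buys + 1)]


# Hypothetical price path: the start date alone flips the sign of the result.
results = multi_start_returns([100, 50, 100, 200], buys=2)
```

On this toy path the first start loses 25% and the other two gain 50%, which is the start-date dependence a single-start DCA backtest hides.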

The broader discipline is covered in Risk management in automated trading: no amount of backtest accuracy removes the need for live-account caps, because the one variable the backtest cannot simulate is the operator.