Backtesting explained
Backtesting is the simulation of a strategy over historical data. The goal is to decide whether the strategy has enough edge to be worth running live. The backtest output is a PnL curve, a drawdown profile, and trade statistics — from which the operator chooses to deploy, re-tune, or discard.
The problem with backtests is not that they lie. It is that they tell a very specific kind of truth — "how the strategy would have performed on this history with these assumptions" — and operators consistently misread that truth as a forecast.
What an honest backtest reports
| Metric | What it tells you | What it does not tell you |
|---|---|---|
| Total return | Cumulative PnL over the sample | How noisy the path was |
| Sharpe ratio | Excess return per unit of volatility | Tail risk; downside vs upside volatility |
| Max drawdown | Worst peak-to-trough in the sample | Drawdown possible outside the sample |
| Win rate | Percent of trades profitable | Distribution of win and loss sizes |
| Profit factor | Sum(wins) / abs(Sum(losses)) | How stable this ratio is over time |
| Exposure time | Percent of time capital was at work | Opportunity cost of idle capital |
| Trade count | Sample size of results | Whether all fills were realistic |
| Slippage + fee accounting | Post-cost profitability | Real-book depth at order size |
If a backtest does not report all of these, it is an advertisement, not a backtest.
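As a rough illustration of how several of these metrics fall out of the same two inputs, here is a minimal sketch. It assumes per-bar strategy returns and per-trade PnLs are already available, a zero risk-free rate, and daily bars in a 24/7 market (`bars_per_year=365`); the function name `summarize` and the sample numbers are hypothetical.

```python
import math
from statistics import mean, stdev

def summarize(bar_returns, trade_pnls, bars_per_year=365):
    """Compute headline backtest metrics from per-bar returns and per-trade PnLs."""
    # Equity curve starting at 1.0, compounding each bar's return.
    equity = [1.0]
    for r in bar_returns:
        equity.append(equity[-1] * (1.0 + r))

    # Max drawdown: largest peak-to-trough decline along the curve.
    peak, max_dd = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        max_dd = max(max_dd, (peak - value) / peak)

    # Annualized Sharpe ratio, assuming a zero risk-free rate.
    vol = stdev(bar_returns)
    sharpe = mean(bar_returns) / vol * math.sqrt(bars_per_year) if vol > 0 else float("nan")

    wins = [p for p in trade_pnls if p > 0]
    losses = [p for p in trade_pnls if p < 0]
    gross_loss = -sum(losses)
    return {
        "total_return": equity[-1] - 1.0,
        "sharpe": sharpe,
        "max_drawdown": max_dd,
        "win_rate": len(wins) / len(trade_pnls) if trade_pnls else float("nan"),
        "profit_factor": sum(wins) / gross_loss if gross_loss > 0 else float("inf"),
        "trade_count": len(trade_pnls),
    }

# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
stats = summarize([0.10, -0.50, 0.20], [10.0, -5.0, 20.0, -5.0])
print(stats)
```

Exposure time and cost accounting are left out because they depend on position and fill records that this sketch does not model.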
The four biases that kill retail backtests
1. Look-ahead bias
The strategy uses data that was not available at the time of the decision. The classic case is computing an indicator on the current bar's close and then trading inside that same bar. Also common: rebalancing against a universe chosen with knowledge of which tokens survived to today, a look-ahead error that overlaps with survivorship bias.
Fix: decisions made at time t must only use data available at t. Enforce by shifting all signals by at least one bar and by trading on the next bar's open, not the current bar's close.
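One way to enforce the one-bar shift is to build it into the return calculation itself. The sketch below assumes a list of bar opens and a list of target positions decided at each bar's close; the function name and the sample prices are hypothetical.

```python
def next_bar_returns(opens, signals):
    """Per-bar strategy returns with each signal acted on at the next bar's open.

    signals[i] is the target position (+1, 0, -1) computed from data up to and
    including the close of bar i. It is filled at opens[i + 1] and held until
    opens[i + 2], so no return is earned from prices seen before the decision.
    """
    returns = []
    for i in range(len(opens) - 2):
        entry, exit_price = opens[i + 1], opens[i + 2]
        returns.append(signals[i] * (exit_price / entry - 1.0))
    return returns

opens = [100.0, 110.0, 121.0, 108.9]
signals = [1, 1, 0, 0]  # decided at the close of each bar
shifted = next_bar_returns(opens, signals)
print(shifted)
```

The move from 100 to 110 is never credited to the strategy, because the first signal only exists once that bar has closed.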
2. Survivorship bias
The universe you are testing against is the universe that exists today. Every delisted token, every dead exchange, every failed protocol is missing. A mean-reversion strategy that "works" on today's universe would have been decimated on the universe that existed five years ago, because today's universe no longer contains the assets that collapsed and never recovered.
Fix: test against a point-in-time universe — the set of assets that were tradable on each date — which is expensive to assemble for crypto and nearly impossible for long-tail tokens. The next best fix is to limit backtest scope to top-N assets by liquidity, acknowledge the bias, and size accordingly.
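A point-in-time universe can be sketched as a date filter over listing records. The symbols and dates below are hypothetical placeholders; assembling the real records is the expensive part.

```python
from datetime import date

# Hypothetical listing records: (symbol, listed_on, delisted_on or None).
LISTINGS = [
    ("AAA", date(2019, 1, 1), None),
    ("BBB", date(2020, 6, 1), date(2022, 5, 15)),
    ("CCC", date(2023, 3, 1), None),
]

def universe_on(day, listings=LISTINGS):
    """Symbols that were tradable on `day`, including ones delisted later."""
    return sorted(
        symbol
        for symbol, listed, delisted in listings
        if listed <= day and (delisted is None or day < delisted)
    )

print(universe_on(date(2021, 1, 1)))  # includes BBB, which is gone today
print(universe_on(date(2024, 1, 1)))
```

A backtest that calls `universe_on` on each rebalance date sees the losers while they were still tradable, rather than only today's survivors.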
3. Sample-period bias
The backtest window is a single slice of market history, and the slice you pick drives the result more than the strategy does. A grid on BTC/USDT from 2023-01 to 2024-01 looks perfect (range-bound). The same grid from 2024-02 to 2025-04 looks terrible (trending). Neither window is wrong; both are incomplete.
Fix: report results across multiple out-of-sample windows, including a full bull-bear-bull cycle. Report the distribution, not the single number.
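Reporting a distribution rather than a single number might look like the sketch below. It assumes a list of per-bar strategy returns and fixed-length, non-overlapping windows; the function names and the sample returns are hypothetical.

```python
from statistics import median

def windowed_returns(bar_returns, window, step):
    """Compounded return of each window of `window` bars, advancing by `step`."""
    results = []
    for start in range(0, len(bar_returns) - window + 1, step):
        growth = 1.0
        for r in bar_returns[start:start + window]:
            growth *= 1.0 + r
        results.append(growth - 1.0)
    return results

def describe(results):
    """Summarize the spread of per-window returns."""
    return {
        "windows": len(results),
        "worst": min(results),
        "median": median(results),
        "best": max(results),
        "share_positive": sum(r > 0 for r in results) / len(results),
    }

bar_returns = [0.01, 0.01, -0.02, -0.02, 0.03, 0.03]
summary = describe(windowed_returns(bar_returns, window=2, step=2))
print(summary)
```

The worst window and the share of positive windows usually say more about deployability than the best window does.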
4. Slippage under-modelling
The backtest fills at the historical mid price. Live markets fill you against the spread, and sometimes outside it when the book is thin or the move is fast. For grid bots running hundreds of trades per day, a 5-bps slippage error compounds to a very different end equity.
Fix: model realistic fills:
- Taker orders fill at the worst visible price needed to absorb the requested size at that timestamp.
- Maker orders fill only if price trades through the posted level, not just touches it.
- During high-volatility bars, widen the spread model; during low-liquidity hours, cap order size to a realistic fraction of the bar volume.
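The three fill rules above can be sketched as small helpers. This assumes an order-book snapshot is available as a list of `(price, quantity)` levels and that bars carry a low and a volume; the function names and the 1% volume cap are illustrative assumptions, not values from any particular engine.

```python
def taker_fill_price(book_levels, size):
    """Worst price touched when a market order of `size` walks the book.

    book_levels is a list of (price, quantity) for the side being hit, ordered
    from best to worst. Returns None if visible depth cannot absorb the order.
    """
    remaining = size
    for price, quantity in book_levels:
        remaining -= quantity
        if remaining <= 0:
            return price
    return None

def maker_buy_filled(limit_price, bar_low):
    """A resting buy counts as filled only if the bar traded below the limit."""
    return bar_low < limit_price

def cap_order_size(requested, bar_volume, max_fraction=0.01):
    """Limit order size to a fraction of the bar's traded volume (assumed 1%)."""
    return min(requested, bar_volume * max_fraction)

asks = [(100.0, 1.0), (100.5, 2.0), (101.0, 5.0)]
print(taker_fill_price(asks, 2.5))     # walks into the second level
print(maker_buy_filled(100.0, 100.0))  # a touch alone is not a fill
print(cap_order_size(5.0, 200.0))
```

Pricing the whole taker order at the worst level touched is deliberately pessimistic; a volume-weighted average across levels would be the less conservative alternative.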
No public backtest engine nails all of these. The pragmatic approach is to run the backtest, then discount the result — 20–40% lower expected return, 30–50% higher drawdown — to get something closer to what the live strategy will actually do.
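As a worked example of that discount, the sketch below applies the 20–40% return haircut and 30–50% drawdown uplift to a hypothetical backtest showing a 50% return and a 20% max drawdown. The function name and the input figures are illustrative only.

```python
def discounted_range(backtest_return, backtest_drawdown):
    """Apply the 20-40% return haircut and 30-50% drawdown uplift."""
    return {
        "return_low": backtest_return * (1 - 0.40),
        "return_high": backtest_return * (1 - 0.20),
        "drawdown_low": backtest_drawdown * (1 + 0.30),
        "drawdown_high": backtest_drawdown * (1 + 0.50),
    }

planning = discounted_range(backtest_return=0.50, backtest_drawdown=0.20)
print(planning)
```

Under these assumptions the planning range is a 30–40% return with a 26–30% drawdown, which is the pair of numbers to size against.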
Walk-forward validation
The honest replacement for "train on all history, claim it works" is walk-forward validation:
- Pick an in-sample window (e.g. 2021-01 to 2022-01) and tune the strategy on it.
- Pick an out-of-sample window (2022-01 to 2022-04) and run the tuned strategy against it without further tuning.
- Slide the window forward (2021-04 to 2022-04 in-sample, 2022-04 to 2022-07 out-of-sample) and repeat.
- Concatenate all the out-of-sample PnLs. That concatenation is what the strategy can actually be expected to produce.
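The sliding schedule described above can be sketched as a window generator plus a loop. The 12-month in-sample and 3-month out-of-sample lengths mirror the example dates; `tune` and `evaluate` are hypothetical callables standing in for whatever optimizer and backtest engine is in use.

```python
from datetime import date

def add_months(d, months):
    """Shift a first-of-month date by a whole number of months."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)

def walk_forward_windows(start, end, in_sample_months=12, out_of_sample_months=3):
    """Yield (is_start, is_end, oos_start, oos_end) as half-open date ranges."""
    is_start = start
    while True:
        is_end = add_months(is_start, in_sample_months)
        oos_end = add_months(is_end, out_of_sample_months)
        if oos_end > end:
            break
        yield is_start, is_end, is_end, oos_end
        is_start = add_months(is_start, out_of_sample_months)

def walk_forward(windows, tune, evaluate):
    """Tune on each in-sample range, then collect untouched out-of-sample PnL."""
    oos_pnl = []
    for is_start, is_end, oos_start, oos_end in windows:
        params = tune(is_start, is_end)  # parameters fitted on in-sample data only
        oos_pnl.extend(evaluate(params, oos_start, oos_end))
    return oos_pnl

windows = list(walk_forward_windows(date(2021, 1, 1), date(2022, 7, 1)))
for w in windows:
    print(w)
```

Because each out-of-sample range starts exactly where its in-sample range ends, no bar is ever used for both tuning and evaluation within a step.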
Walk-forward routinely reduces reported returns by 30–60% vs a single-window fit. Operators who do not run walk-forward are getting an over-fitted number.
Crypto-specific pitfalls
- Exchange migration. A backtest of BTC/USDT on Exchange A from 2019 may stitch together data from an exchange that no longer exists. Liquidity and spreads are not transferable.
- Stablecoin depeg. A strategy that uses USDT as the quote currency is assuming USDT = $1 at every bar. This has been wrong for extended windows (May 2022, March 2023) and the backtest usually does not correct for it.
- Token dilution / airdrop. Ongoing changes in token supply silently change what the "price" represents over long windows.
- Fee-schedule changes. Exchanges change maker/taker fees quarterly. A backtest of 2020 data that applies 2026 fees is optimistic.
- Futures funding baselines. Funding rates have trended down since 2021 as liquidity matured; a 2018 funding-arb backtest is not a 2026 forecast.
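Two of these pitfalls, fee-schedule changes and the USDT = $1 assumption, can be handled by making the inputs date-dependent. The sketch below is illustrative only: the fee history is a hypothetical placeholder, not any exchange's real schedule, and the USDT/USD rate is assumed to come from a separate per-bar series.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical taker-fee history: (effective_from, fee in basis points).
FEE_HISTORY = [
    (date(2019, 1, 1), 10.0),
    (date(2021, 7, 1), 7.5),
    (date(2024, 1, 1), 5.0),
]
_EFFECTIVE_DATES = [d for d, _ in FEE_HISTORY]

def taker_fee_bps(day):
    """Fee in force on `day`; raises if `day` predates the history."""
    i = bisect_right(_EFFECTIVE_DATES, day) - 1
    if i < 0:
        raise ValueError(f"no fee schedule known for {day}")
    return FEE_HISTORY[i][1]

def quote_to_usd(quote_amount, usdt_usd):
    """Convert a USDT-denominated amount to USD using that bar's USDT/USD rate."""
    return quote_amount * usdt_usd

print(taker_fee_bps(date(2020, 5, 1)))  # the fee that applied then, not today's
print(quote_to_usd(1000.0, 0.95))       # PnL in USD during a hypothetical depeg
```

Charging each simulated trade the fee that was in force on its own date avoids flattering old periods with current pricing.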
Specific notes per strategy
- Grid trading strategy: grid backtests over a single range always look perfect. Re-backtest the same grid over the 2022 bear and the 2024 Q1 breakout; the numbers are very different.
- DCA bot strategy: DCA backtests are the most honest but are path-dependent on the start date. Running the backtest from multiple start dates is the fix.
- Arbitrage bots: backtests ignore counterparty risk and transfer latency, which are the two largest live loss sources.
- Signal trading bots: the signal provider's backtest almost always suffers from survivorship bias; re-run against the operator's own execution policy.
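The start-date dependence of DCA results can be made visible by running the same schedule from every possible start. The sketch below assumes a fixed amount is bought on every bar and ignores fees and slippage; the function names and the price series are hypothetical.

```python
def dca_return(prices, start, budget_per_buy=100.0):
    """Return on invested capital from buying a fixed amount on every bar from `start`."""
    window = prices[start:]
    units = sum(budget_per_buy / p for p in window)
    invested = budget_per_buy * len(window)
    return units * window[-1] / invested - 1.0

def multi_start_dca(prices, min_buys=2):
    """DCA return for every start index that leaves at least `min_buys` purchases."""
    return [dca_return(prices, s) for s in range(len(prices) - min_buys + 1)]

prices = [100.0, 50.0, 100.0, 200.0]
results = multi_start_dca(prices)
print(results)
```

On this toy series the return ranges from 50% to about 133% depending only on when the schedule began, which is the spread a single-start backtest hides.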
The broader discipline is covered in Risk management in automated trading: no amount of backtest accuracy removes the need for live-account caps, because the one variable the backtest cannot simulate is the operator.
Further reading in this knowledge base
- What is automated crypto trading? The broader category.
- Risk management in automated trading. The caps that bound any strategy's downside regardless of what the backtest said.