PULSAR.INK

Backtesting explained

Reading time: 7 min. Updated: 2026-04-24
Backtesting runs a strategy against historical data to estimate how it would have performed. This page covers walk-forward validation, look-ahead bias, survivorship bias, realistic slippage, and the specific reasons backtests of crypto strategies routinely overstate results.

Backtesting is the simulation of a strategy over historical data. The goal is to decide whether the strategy has enough edge to be worth running live. The backtest output is a PnL curve, a drawdown profile, and trade statistics — from which the operator chooses to deploy, re-tune, or discard.

The problem with backtests is not that they lie. It is that they tell a very specific kind of truth — "how the strategy would have performed on this history with these assumptions" — and operators consistently misread that truth as a forecast.

What an honest backtest reports

| Metric | What it tells you | What it does not tell you |
|---|---|---|
| Total return | Cumulative PnL over the sample | How noisy the path was |
| Sharpe ratio | Return per unit of volatility | Tail risk; downside vs upside volatility |
| Max drawdown | Worst peak-to-trough in the sample | Drawdown possible outside the sample |
| Win rate | Percent of trades profitable | Distribution of win and loss sizes |
| Profit factor | Sum(wins) / Sum(losses) | How stable this ratio is over time |
| Exposure time | Percent of time capital was at work | Opportunity cost of idle capital |
| Trade count | Sample size of results | Whether all fills were realistic |
| Slippage + fee accounting | Post-cost profitability | Real-book depth at order size |

If a backtest does not report all of these, it is an advertisement, not a backtest.

The four biases that kill retail backtests

1. Look-ahead bias

The strategy uses data that was not available at the time of the decision. The classic case is computing an indicator on the current bar's close and then trading inside that same bar. Also common: rebalancing against a universe chosen with knowledge of which tokens survived to today, a universe-level form of look-ahead covered next as survivorship bias.

Fix: decisions made at time t must only use data available at t. Enforce by shifting all signals by at least one bar and by trading on the next bar's open, not the current bar's close.
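The shift-and-trade-next-open rule can be sketched in pandas; the data and the crossover signal below are illustrative, not part of the original page:

```python
import pandas as pd

# Hypothetical OHLC bars indexed by timestamp.
bars = pd.DataFrame({
    "open":  [100.0, 101.0, 99.5, 102.0, 103.5],
    "close": [101.0, 99.5, 102.0, 103.5, 103.0],
}, index=pd.date_range("2024-01-01", periods=5, freq="D"))

# A signal computed on the close of bar t...
fast = bars["close"].rolling(2).mean()
slow = bars["close"].rolling(3).mean()
raw_signal = (fast > slow).astype(int)

# ...is only actionable on bar t+1: shift by one bar, and fill at the
# NEXT bar's open, never the close the signal was computed on.
position = raw_signal.shift(1).fillna(0)

# PnL therefore uses open-to-open returns while the shifted position is held.
returns = bars["open"].pct_change().shift(-1)
pnl = (position * returns).dropna()
```

Forgetting the `shift(1)` is the single most common way look-ahead bias enters a pandas backtest: the equity curve silently trades on information from the close of the bar it is still inside.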

2. Survivorship bias

The universe you are testing against is the universe that exists today. Every delisted token, every dead exchange, every failed protocol is missing. A mean-reversion strategy that "works" on today's universe would have been decimated by the universe that existed five years ago, because the losers are gone.

Fix: test against a point-in-time universe — the set of assets that were tradable on each date — which is expensive to assemble for crypto and nearly impossible for long-tail tokens. The next best fix is to limit backtest scope to top-N assets by liquidity, acknowledge the bias, and size accordingly.
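A point-in-time universe can be sketched as a simple lookup over listing and delisting dates. The symbols and dates below are illustrative placeholders, not a real dataset:

```python
from datetime import date

# Hypothetical listing metadata: (listed, delisted_or_None) per symbol.
LIFESPANS = {
    "BTC":  (date(2013, 1, 1), None),
    "LUNA": (date(2019, 7, 1), date(2022, 5, 13)),   # gone after the collapse
    "FTT":  (date(2019, 8, 1), date(2022, 11, 11)),  # gone after the exchange
}

def universe_at(d: date) -> set[str]:
    """Symbols actually tradable on date d (point-in-time universe)."""
    return {
        sym for sym, (listed, delisted) in LIFESPANS.items()
        if listed <= d and (delisted is None or d < delisted)
    }
```

Rebalancing against `universe_at(each_bar_date)` instead of today's listings is what removes the survivorship edge; the hard part is sourcing accurate delisting dates, not the code.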

3. Sample-period bias

The backtest window is a single slice of market history, and the slice you pick drives the result more than the strategy does. A grid on BTC/USDT from 2023-01 to 2024-01 looks perfect (range-bound). The same grid from 2024-02 to 2025-04 looks terrible (trending). Neither window is wrong; both are incomplete.

Fix: report results across multiple out-of-sample windows, including a full bull-bear-bull cycle. Report the distribution, not the single number.

4. Slippage under-modelling

The backtest fills at the historical mid price. Live markets fill you against the spread, and sometimes outside it when the book is thin or the move is fast. For grid bots running hundreds of trades per day, an unmodelled 5 bps per fill is several percent of traded notional per day, and it compounds to a very different end equity.

Fix: model realistic fills:

  - Fill against the recorded bid/ask, not the mid price, at that timestamp.
  - Count a limit order as filled only when price trades through the level, not just touches it.
  - During low-liquidity hours, cap order size to a realistic fraction of the bar volume.
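The fill rules above can be sketched as two small functions; the function names, the 5 bps default spread, and the 5% participation cap are my assumptions, not figures from this page:

```python
def limit_fill(order_price: float, side: str, bar: dict,
               max_participation: float = 0.05) -> float:
    """Conservative limit-order fill for one OHLCV bar.

    Fills only if price trades THROUGH the level (strict inequality,
    so a touch does not count), and caps filled size at a fraction of
    the bar's volume. Returns fillable size, or 0.0 for no fill.
    """
    if side == "buy":
        traded_through = bar["low"] < order_price
    else:
        traded_through = bar["high"] > order_price
    if not traded_through:
        return 0.0
    return max_participation * bar["volume"]

def market_fill(mid: float, side: str,
                spread_bps: float = 5.0, impact_bps: float = 2.0) -> float:
    """Market-order fill price: cross half the spread plus an impact haircut."""
    sign = 1.0 if side == "buy" else -1.0
    return mid * (1.0 + sign * (spread_bps / 2.0 + impact_bps) / 1e4)
```

For example, on a bar with `low=99.0`, a buy limit at 99.0 does not fill (touched, never traded through), while a buy limit at 99.5 fills up to 5% of the bar's volume.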

No public backtest engine nails all of these. The pragmatic approach is to run the backtest, then discount the result — 20–40% lower expected return, 30–50% higher drawdown — to get something closer to what the live strategy will actually do.

Walk-forward validation

The honest replacement for "train on all history, claim it works" is walk-forward validation:

  1. Pick an in-sample window (e.g. 2021-01 to 2022-01) and tune the strategy on it.
  2. Pick an out-of-sample window (2022-01 to 2022-04) and run the tuned strategy against it without further tuning.
  3. Slide the window forward (2021-04 to 2022-04 in-sample, 2022-04 to 2022-07 out-of-sample) and repeat.
  4. Concatenate all the out-of-sample PnLs. That concatenation is what the strategy can actually be expected to produce.
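The loop above can be sketched as a generic driver. The `tune` and `run` callables, and the 252/63-bar (year/quarter) window sizes, are my placeholders for whatever the operator's strategy supplies:

```python
import pandas as pd

def walk_forward(prices: pd.Series, tune, run,
                 train_bars: int = 252, test_bars: int = 63) -> pd.Series:
    """Slide a train/test split across the series and stitch the
    out-of-sample PnL segments together.

    tune(train_slice) -> params; run(test_slice, params) -> PnL Series.
    """
    oos = []
    start = 0
    while start + train_bars + test_bars <= len(prices):
        train = prices.iloc[start : start + train_bars]
        test = prices.iloc[start + train_bars : start + train_bars + test_bars]
        params = tune(train)            # parameters see in-sample data only
        oos.append(run(test, params))   # no further tuning out of sample
        start += test_bars              # roll forward by one test window
    return pd.concat(oos)
```

The concatenated series it returns is the number to report; the in-sample segments exist only to feed `tune` and never contribute PnL.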

Walk-forward routinely reduces reported returns by 30–60% vs a single-window fit. Operators who do not run walk-forward are getting an over-fitted number.

Crypto-specific pitfalls

  - Exchange history: a price series reaching back to 2019 may stitch together data from an exchange that no longer exists. Liquidity and spreads are not transferable.
  - Stablecoin pegs: backtesting in a USDT quote currency is assuming USDT = $1 at every bar. This has been wrong for extended windows (May 2022, March 2023) and the backtest usually does not correct for it.
  - Token migrations: contract swaps, redenominations, and ticker reuse silently change the "price" over long windows.
  - Fee schedules: exchange fee tiers change, sometimes quarterly. A 2020 backtest using 2026 fees is optimistic.
  - Funding rates: perp funding has compressed since 2021 as liquidity matured; a 2018 funding-arb backtest is not a 2026 forecast.
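The USDT = $1 assumption is the easiest of these to correct mechanically, assuming a USDT/USD reference series is available; the function and series names below are mine:

```python
import pandas as pd

def to_usd(px_usdt: pd.Series, usdt_usd: pd.Series) -> pd.Series:
    """Convert a USDT-quoted price series to USD terms.

    px_usdt:  asset price in USDT per bar.
    usdt_usd: USDT's own dollar price per bar (1.0 while the peg holds).
    Bars missing a reference quote fall back to the last known peg value.
    """
    peg = usdt_usd.reindex(px_usdt.index).ffill()
    return px_usdt * peg
```

During a depeg window this repricing moves every USDT-quoted fill, PnL mark, and drawdown in the backtest, which is exactly the correction a flat $1 assumption skips.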

Specific notes per strategy

  - Grid bots: backtests over a range-bound window always look perfect. Re-backtest the same grid over the 2022 bear and the 2024 Q1 breakout; the numbers are very different.
  - DCA: results are heavily path-dependent on the start date. Multi-start backtest is the fix.
  - Cross-exchange arbitrage: backtests rarely model withdrawal fees and transfer latency, which are the two largest live loss sources.
  - Copy trading: a published signal history always suffers from survivorship bias; re-run against the operator's own execution policy.
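The multi-start fix for path dependence can be sketched as a small driver that runs the same backtest from staggered start offsets and reports the spread of outcomes; the function names and the 12-start default are my assumptions:

```python
import statistics

def multi_start(run_backtest, price_path, n_starts: int = 12,
                min_len: int = 200) -> dict:
    """Run one backtest from several start offsets and summarise the
    distribution of outcomes instead of a single path-dependent number.

    run_backtest(prices) -> final equity multiple (e.g. 1.10 = +10%).
    """
    step = max(1, (len(price_path) - min_len) // n_starts)
    results = [
        run_backtest(price_path[s:])
        for s in range(0, len(price_path) - min_len, step)
    ]
    return {
        "median": statistics.median(results),
        "worst": min(results),
        "best": max(results),
        "n": len(results),
    }
```

Reporting the worst and median alongside the best start date is the point: a DCA curve that only looks good from one launch date is a property of the date, not the strategy.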

The broader discipline is covered in Risk management in automated trading: no amount of backtest accuracy removes the need for live-account caps, because the one variable the backtest cannot simulate is the operator.

Further reading in this knowledge base

  - Risk management in automated trading: position caps and kill-switches that bound a strategy's downside regardless of what the backtest said.