What Is Backtesting and Why It Matters
Backtesting is the process of testing a trading strategy against historical data to evaluate how it would have performed. It is the most important step before risking real capital.
A backtest answers fundamental questions: Does this strategy have a positive expected value? What is the maximum drawdown I should expect? How many consecutive losses might I face? Without backtesting, you are gambling; with it, you are making informed risk decisions based on statistical evidence.
However, backtesting is also the most commonly misused tool in trading. Poor methodology leads to strategies that look profitable in backtests but lose money in live trading. This guide covers how to backtest correctly.
The Overfitting Problem
Overfitting is the single biggest risk in backtesting. It happens when a strategy is tuned to fit historical data so precisely that it captures noise rather than signal.
Signs of overfitting include: a strategy that works perfectly on one specific date range but fails on others; a strategy with many parameters (more than 5-6) that each have narrow optimal ranges; or a strategy that requires frequent re-optimization to remain profitable.
The solution is out-of-sample testing. Never evaluate a strategy on the same data used to optimize it. FerroQuant uses walk-forward validation: the historical data is divided into rolling windows where each window has a training period (for optimization) followed by a test period (for evaluation). The strategy must prove itself on data it has never seen, repeatedly, across 7 years.
Walk-Forward Validation Explained
Walk-forward validation is the gold standard for backtesting. Here is how it works:
1. Divide your data into sequential windows (e.g., 6-month training + 2-month test)
2. Optimize strategy parameters on the training window
3. Run the strategy with those parameters on the test window (unseen data)
4. Record the test window results
5. Slide the window forward and repeat
6. Aggregate all test window results; this is your true expected performance
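The steps above can be sketched in a few lines of Python. This is a minimal illustration, not FerroQuant's implementation: the names `walk_forward_windows`, `walk_forward`, `optimize`, and `evaluate` are hypothetical, window sizes are counted in bars, and the window slides forward by one test length so that test periods never overlap (an assumption; other step sizes are possible).

```python
from typing import Any, Callable, List, Sequence, Tuple

Window = Tuple[int, int, int, int]  # (train_start, train_end, test_start, test_end), ends exclusive


def walk_forward_windows(n_bars: int, train_size: int, test_size: int) -> List[Window]:
    """Step 1: split n_bars into sequential train/test windows."""
    windows = []
    start = 0
    while start + train_size + test_size <= n_bars:
        train_end = start + train_size
        windows.append((start, train_end, train_end, train_end + test_size))
        start += test_size  # Step 5: slide forward by one test period (assumed step)
    return windows


def walk_forward(
    data: Sequence[Any],
    train_size: int,
    test_size: int,
    optimize: Callable[[Sequence[Any]], Any],       # hypothetical: returns fitted parameters
    evaluate: Callable[[Sequence[Any], Any], Any],  # hypothetical: returns a test result
) -> List[Any]:
    """Steps 2-6: optimize on each training slice, evaluate on the unseen test slice."""
    results = []
    for tr_start, tr_end, te_start, te_end in walk_forward_windows(
        len(data), train_size, test_size
    ):
        params = optimize(data[tr_start:tr_end])               # Step 2
        results.append(evaluate(data[te_start:te_end], params))  # Steps 3-4
    return results  # Step 6: aggregate these however suits your metrics
```

The key property is that `evaluate` only ever receives bars that come after the slice passed to `optimize`, so no test result is influenced by data the optimizer has seen.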
FerroQuant runs walk-forward validation across 7 years with overlapping windows. Each of the 165+ strategies is individually validated per instrument. A strategy that passes walk-forward validation on BTCUSDT may not pass on ETHUSDT, because each instrument-strategy combination is treated independently.
This is computationally expensive, which is why we use Rust, but it produces the most honest assessment of strategy performance.
Key Metrics: Beyond Win Rate
Win rate alone is meaningless. A strategy with a 90% win rate can still lose money if the average loss is 10x the average win.
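The arithmetic behind this claim is the per-trade expectancy: win rate times average win, minus loss rate times average loss. A small sketch, with hypothetical function names and payoffs expressed in arbitrary units:

```python
def expectancy(win_rate: float, avg_win: float, avg_loss: float) -> float:
    """Expected profit per trade; avg_loss is given as a positive number."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss


def breakeven_win_rate(avg_win: float, avg_loss: float) -> float:
    """Win rate at which expectancy is exactly zero."""
    return avg_loss / (avg_win + avg_loss)


# The example from the text: 90% win rate, average loss 10x the average win.
per_trade = expectancy(0.90, avg_win=1.0, avg_loss=10.0)  # about -0.1 units per trade
needed = breakeven_win_rate(1.0, 10.0)                    # about 0.909, i.e. above 90%
```

With a 10:1 loss-to-win ratio, the strategy needs to win roughly 91% of the time just to break even, so a 90% win rate loses about a tenth of an average win on every trade.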
The metrics that matter are:
- Profit Factor: gross profit / gross loss. Above 1.5 is good, above 2.0 is excellent.
- Maximum Drawdown: the largest peak-to-trough decline. This tells you the worst-case pain you will experience.
- Sharpe Ratio: risk-adjusted return. Above 1.0 is acceptable, above 2.0 is strong.
- Sortino Ratio: like Sharpe but only penalizes downside volatility. More relevant for directional strategies.
- Recovery Factor: net profit / maximum drawdown. Tells you how much the strategy earned relative to its worst decline, which indicates how readily it recovers from losses.
- Trade Count: a strategy with a high Sharpe but only 10 trades is statistically unreliable. FerroQuant requires a minimum of 50 trades per walk-forward window.
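For reference, here is one way these metrics could be computed from a list of per-trade profits, an equity curve, and per-period returns. The definitions below are common textbook forms, not necessarily the exact ones FerroQuant uses. The annualization factor of 252 periods per year, the zero risk-free rate, and the zero Sortino target are assumptions to adjust for your market (crypto, for example, trades every day of the year).

```python
import math
from statistics import mean, stdev
from typing import Sequence


def profit_factor(trade_pnls: Sequence[float]) -> float:
    """Gross profit divided by gross loss (infinite if there are no losing trades)."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    return math.inf if gross_loss == 0 else gross_profit / gross_loss


def max_drawdown(equity: Sequence[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def sharpe_ratio(returns: Sequence[float], periods_per_year: int = 252,
                 risk_free: float = 0.0) -> float:
    """Annualized mean excess return over sample standard deviation."""
    excess = [r - risk_free for r in returns]
    return math.sqrt(periods_per_year) * mean(excess) / stdev(excess)


def sortino_ratio(returns: Sequence[float], periods_per_year: int = 252,
                  target: float = 0.0) -> float:
    """Like Sharpe, but the denominator only counts returns below the target."""
    downside_sq = [min(0.0, r - target) ** 2 for r in returns]
    downside_dev = math.sqrt(mean(downside_sq))
    return math.sqrt(periods_per_year) * (mean(returns) - target) / downside_dev


def recovery_factor(net_profit: float, max_drawdown_amount: float) -> float:
    """Net profit divided by maximum drawdown, both in the same units."""
    return net_profit / max_drawdown_amount
```

Note that `sharpe_ratio` and `sortino_ratio` will raise a division error on constant returns or on a series with no downside periods; a production version would need to decide how to report those cases.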
When comparing strategies, focus on Sharpe ratio and maximum drawdown first. A strategy with a lower return but significantly lower drawdown is often preferable: you can increase position size to match the higher return, and because drawdown scales up roughly in proportion, the resulting risk profile can still be better.
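To make the position-sizing argument concrete, the sketch below scales a lower-return strategy up to a target return and reports the drawdown that comes with it. The numbers are hypothetical, and the linear scaling is a first-order approximation that ignores compounding, financing costs, margin limits, and liquidity.

```python
from typing import Tuple


def scale_to_match_return(ret: float, max_dd: float,
                          target_ret: float) -> Tuple[float, float, float]:
    """Return (size multiplier, scaled return, scaled max drawdown).

    Assumes return and drawdown both scale linearly with position size.
    """
    k = target_ret / ret
    return k, ret * k, max_dd * k


# Hypothetical comparison: strategy A returns 40% with a 30% max drawdown,
# strategy B returns 20% with a 10% max drawdown.
multiplier, scaled_ret, scaled_dd = scale_to_match_return(0.20, 0.10, target_ret=0.40)
# B at 2x size matches A's 40% return with roughly a 20% drawdown instead of 30%.
```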
Common Backtesting Mistakes
These mistakes invalidate backtest results:
1. Look-ahead bias: using information that would not have been available at trade time (e.g., using daily close price to decide a trade made at market open).
2. Survivorship bias: only testing on instruments that exist today. Delisted tokens and bankrupt companies are excluded, inflating results.
3. Ignoring slippage and fees: a strategy that makes 0.1% per trade looks great until you account for 0.04% exchange fees + 0.02% average slippage, which together consume 60% of that gross edge.
4. Cherry-picking date ranges: testing only during bull markets or avoiding known crash periods.
5. Insufficient data: testing a daily strategy on 6 months of data produces statistically meaningless results.
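Two of these mistakes are easy to illustrate in code. The sketch below uses hypothetical helper names. `lag_signals` guards against look-ahead bias by delaying each signal so that a decision computed from bar t is only acted on at bar t+1, and `net_edge_pct` subtracts costs from the gross edge. Whether the quoted fee and slippage apply once per trade or once per side (entry and exit) is not stated in the text, so the `sides` parameter is an assumption that lets you check both.

```python
from typing import List, Sequence


def lag_signals(signals: Sequence[int], bars: int = 1) -> List[int]:
    """Delay signals by `bars`, padding the start with 0 (no position).

    A signal derived from bar t's close can then only trade from bar t+bars.
    """
    return ([0] * bars + list(signals))[: len(signals)]


def net_edge_pct(gross_pct: float, fee_pct: float, slippage_pct: float,
                 sides: int = 1) -> float:
    """Per-trade edge in percent after fees and slippage, charged `sides` times."""
    return gross_pct - sides * (fee_pct + slippage_pct)


delayed = lag_signals([1, -1, 0, 1])                 # [0, 1, -1, 0]
one_side = net_edge_pct(0.10, 0.04, 0.02)            # about 0.04% left per trade
both_sides = net_edge_pct(0.10, 0.04, 0.02, sides=2)  # about -0.02%: the edge is gone
```

If the costs in the example are paid on both entry and exit, the 0.1% strategy is not merely less attractive, it is unprofitable.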
FerroQuant addresses all of these: we use full historical data including delisted symbols, apply realistic slippage models per market, and require walk-forward validation across the full 7-year dataset.