Common Quantitative Trading Mistakes in 2022


The field of quantitative finance is enthralling. The ability to make sense of financial markets using data, mathematics, and statistics is, in my opinion, a fantastic notion. I knew my career in trading would follow the quantitative path (rather than the discretionary one) from the moment I discovered the paradise of the derivatives market.


Although Bachelier's option pricing model (which was later followed by the Black-Scholes option pricing model) can be traced back to the early 1900s, the true evolution of quantitative finance began in the mid-'80s, when mathematicians and statisticians began developing quantitative models to predict (and trade) financial markets.


We can trace the origins of funds like Winton Capital, AHL (now Man AHL), Aspect Capital, and Renaissance Technologies back to the mid-'80s. These funds used large databases and quantitative models (an early version of what we know today as data science) to find and trade opportunities in financial markets.

Quant/systematic funds exploded in popularity in the 1990s, when firms like Millennium Partners, D. E. Shaw, LTCM, and AQR (to mention a few) raised large sums of money to trade systematic strategies. Thanks to the exponential rise in processing power and growing interest from the quantitative community (mainly Ph.D./MS graduates from hard-science departments), quantitative funds became the trendiest area for investors to flock to.


Quant trading encompasses a wide range of strategies (anything from big-data analysis to HFT market-making). We'll concentrate on quant analysis and data science in this piece because they're extensively employed by many types of traders (both on the institutional and the retail side). Based on my experience with quant trading, there are four significant areas where a quantitative trading strategy can go wrong:


1. Understanding probability and statistics.

2. Implementation of the model.

3. Back-testing and simulation of a strategy.

4. Risk management.

Now that we've identified these traps, let's explore where we can go wrong.


Understanding Probability and Statistics

Statistical analysis is the foundation of data science and quant trading. If we don't have a sufficient understanding of probability and statistics, we can easily fall into a variety of traps when analyzing data (particularly time series).


The Normal Distribution

When it comes to modeling the dynamics of financial assets, normality is by far the weakest assumption we can make. Although numerous publications have challenged the assumption of normality in financial asset time series, we often choose the Gaussian (normal) distribution because it makes the data quick and easy to analyze.

Because we recognize that "normality" can often be a weak assumption, we can compute the realized distribution of our asset's returns (over different time frames) and check how well it fits the normal distribution.
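As a quick sanity check, we can compare the sample excess kurtosis of our returns against the zero value the Gaussian implies. The sketch below uses simulated returns (a Student-t draw as a stand-in for fat-tailed market data, an illustrative assumption) rather than real prices:

```python
import numpy as np

rng = np.random.default_rng(42)

def excess_kurtosis(x):
    """Sample excess kurtosis: ~0 for a normal distribution, >0 for fat tails."""
    z = (x - x.mean()) / x.std()
    return (z ** 4).mean() - 3.0

# Hypothetical daily returns: a Student-t draw mimics the fat tails
# commonly observed in asset returns, versus a Gaussian benchmark.
normal_returns = rng.normal(0.0, 0.01, 10_000)
fat_tail_returns = rng.standard_t(df=5, size=10_000) * 0.01

print(excess_kurtosis(normal_returns))    # typically close to 0
print(excess_kurtosis(fat_tail_returns))  # clearly positive
```

A large positive excess kurtosis is a quick warning that a normality assumption will understate tail risk.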



Correlation

Most quant analysts and traders adore correlation, yet it is one of the most commonly used and most misunderstood statistical concepts. The most widely used coefficient is the Pearson correlation (named after Karl Pearson). In a nutshell, correlation measures the linear relationship between two variables (X, Y) and ranges from -1 (perfect negative linear relationship) to +1 (perfect positive linear relationship). Since it all seems so simple, how could we possibly go wrong?

To begin, we must define what correlation is not: it is not a predictor (it shows a linear relationship, not causation). When using correlation, we must avoid the following mistakes:


  1. Correlating prices instead of returns (either log returns or simple returns) — We frequently deal with asset prices when working with time series. Asset prices are "non-stationary": the series exhibits a trend (a non-mean-reverting process). We can plainly observe the effect of using non-stationary data if we look at the Gold spot price vs. the US 10yr real yield over time.


  2. Not thinking enough about sample size and frequency — When analyzing correlation, just as when analyzing volatility, sample size and frequency matter a great deal; both heavily influence our correlation estimate. If we select a window that is too short, we risk mistaking short-term activity for a long-term correlation. On the other hand, a window that is too long (for example, a 1-year correlation when trading a short-term strategy) is also a concern. The difference between a 20-day window and a 180-day window is plainly visible in a correlation matrix heatmap.


  3. Assuming long-term correlations persist — One of the most common misconceptions in correlation analysis is that long-term correlations never break. In times of financial crisis and market collapse, this assumption is frequently thrown out the window, as correlations tend to break and shift to extremes. We can easily see the big change in relationships if we look around March 2020.
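The first pitfall is easy to demonstrate: two completely independent random walks frequently show a large price correlation, while their returns correlate near zero. A minimal sketch with simulated (not real) series:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two *independent* streams of daily returns, turned into price paths.
r1 = rng.normal(0, 0.01, 2_000)
r2 = rng.normal(0, 0.01, 2_000)
p1 = 100 * np.exp(np.cumsum(r1))
p2 = 100 * np.exp(np.cumsum(r2))

price_corr = np.corrcoef(p1, p2)[0, 1]   # often spuriously large in magnitude
return_corr = np.corrcoef(r1, r2)[0, 1]  # near zero, as it should be

print(price_corr, return_corr)
```

Because the price paths trend, their Pearson correlation is dominated by the trends rather than any genuine co-movement; the return correlation tells the true story.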


As traders, we want to enter trades with a good risk-reward ratio, and the Z-score is one method we use to estimate it. In a nutshell, the Z-score measures the distance (in standard deviations) between an observation and the distribution's mean (the score can also easily be converted into percentile terms). Naturally, we want to seek out extreme events because they offer the best risk-reward (assuming some sort of reversion/convergence toward the mean). The tricky thing about the Z-score is that, like everything else in statistics, it is strongly dependent on sample size and frequency.
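As a rough sketch of that window dependence, here is a trailing-window Z-score; the same latest observation can look ordinary against a short window and extreme against a long one (the series below is an illustrative simulation):

```python
import numpy as np

def zscore(series, window):
    """Z-score of the latest observation against a trailing window."""
    w = np.asarray(series, dtype=float)[-window:]
    return (w[-1] - w.mean()) / w.std(ddof=1)

rng = np.random.default_rng(1)
# Hypothetical spread: a long calm stretch followed by a volatile recent patch.
series = np.concatenate([rng.normal(0, 0.5, 250), rng.normal(0, 2.0, 20)])

print(zscore(series, 20))   # judged only against the recent, volatile regime
print(zscore(series, 250))  # judged against the mostly calm history
```

Neither number is "the" Z-score; the window choice embeds a view about which regime is relevant for the trade.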



Implementation of the Model

Model fitting is both an art and a science. When fitting our model, we must strike a balance between overfitting and underfitting, as either bias is likely to hurt our model's performance.

Overfitting – overfitting occurs when our model captures noise in the data rather than the underlying dynamic; it usually involves a large number of explanatory variables. Underfitting is the opposite bias: the model is too simple (too few variables) to capture the dynamic at all.


Our goal while training the model is to employ the fewest variables possible while still achieving the highest predictive power. The idea is that we want to calibrate our model as little as possible while still obtaining reliable results. The more variables we add, the more calibration we'll have to do, and the less able the model will be to cope with rapidly changing markets.


In-Sample and Out-of-Sample Data

The distinction between in-sample and out-of-sample data is critical in any model construction, because overlap between our "training set" and "test set" leads to erroneous findings. I can't tell you how many times I've been shown models that appeared incredibly effective and profitable, only to discover they had been evaluated on the training set alone (which explains why they did so well...). When preparing our data, we must split it into a "training set" and a "test set" (and make sure the two don't mix). To appreciate the significance of this split, consider the following scenario: we want to use the lagged 1-month realized volatility to predict the current 1-month USDJPY realized volatility...


If we regress 1-month volatility against its value one day earlier (i.e., with a 1-day lag), we get a near-perfect fit. Why? Because almost all of the underlying data is shared: with a roughly 20-day window, 19 of the 20 observations appear in both series (we added one new observation and dropped one). We may wrongly conclude that the current 1-month realized vol is a good predictor of future vol, yet regressing it against a longer lag (so the windows don't overlap) yields a completely different result.
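The overlap effect is easy to reproduce with simulated i.i.d. returns, where realized vol has no genuine predictability at all:

```python
import numpy as np

rng = np.random.default_rng(3)
returns = rng.normal(0, 0.01, 2_000)  # i.i.d., so vol is genuinely unpredictable

window = 21  # ~1 month of trading days
# Rolling realized volatility.
vol = np.array([returns[i - window:i].std() for i in range(window, len(returns))])

def lagged_corr(series, lag):
    """Correlation of a series with its own lagged values."""
    return np.corrcoef(series[:-lag], series[lag:])[0, 1]

corr_1d = lagged_corr(vol, 1)       # near 1: 20 of 21 observations overlap
corr_1m = lagged_corr(vol, window)  # no overlap: collapses toward zero

print(corr_1d, corr_1m)
```

The 1-day-lag correlation is almost pure data overlap, not forecasting skill; the non-overlapping correlation reveals there was nothing to predict.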


Handling Outliers

Whether we like it or not, outlier observations are part of our data set. We can't ignore them, but we do need to know how to deal with them so that our model isn't skewed by extreme observations. While we may be tempted to overlook (or eliminate) outliers, we should resist this temptation, because live trading will almost certainly feed our model outliers. Obviously, we must distinguish between types of outliers: if an observation is clearly erroneous (a data error), we should discard it; if it is a genuine observation, we should keep it and let our model handle it.
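One common way to tame genuine (non-erroneous) outliers without deleting them is winsorization, i.e., clipping observations to chosen percentiles. The percentile thresholds and data below are illustrative assumptions:

```python
import numpy as np

def winsorize(x, lower_pct=1, upper_pct=99):
    """Clip observations to the chosen percentiles instead of deleting them."""
    lo, hi = np.percentile(x, [lower_pct, upper_pct])
    return np.clip(x, lo, hi)

# Hypothetical return sample where 8.0 is a genuine but extreme observation.
data = np.array([0.1, -0.2, 0.05, 0.3, -0.1, 8.0, 0.2, -0.15])
clipped = winsorize(data)

print(clipped)
```

The extreme point still participates in the analysis, but its leverage over means, variances, and regression fits is bounded.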


Back-testing and Simulation

Now that we have a solid model based on rigorous statistical/data analysis, we wish to back-test (or simulate) it using historical (or generated) data. This is an important stage in the creation of our model, since it allows us to examine (and study) how the model behaves in a controlled environment. Although there are fewer opportunities for errors at this step (relative to previous stages), these errors can be costly if they cause us to miss flaws in our model.


Testing Across Different Market Regimes

When we create a model, we want it to work perfectly every time. Unfortunately, this is practically impossible, because different strategies perform well in different market environments (think of a trend-following strategy in a choppy market). While we can't create a foolproof model, we can pinpoint where the model falls short. To discover these weak spots, we should test our strategy under several market regimes (a regime-switch model is a good way to identify these regimes).
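A crude sketch of regime-conditional evaluation: split the backtest by a simple volatility threshold and compare performance per regime. A proper regime-switch model (e.g., an HMM) would replace this cut, and all numbers here are simulated assumptions:

```python
import numpy as np

rng = np.random.default_rng(11)

# Hypothetical daily strategy returns alongside the market's annualized vol.
strat_returns = rng.normal(0.0005, 0.01, 1_000)
market_vol = np.concatenate([np.full(500, 0.10), np.full(500, 0.35)])

# Crude regime split: calm vs. stressed, by a fixed vol threshold.
calm = strat_returns[market_vol < 0.20]
stressed = strat_returns[market_vol >= 0.20]

def sharpe(r):
    """Annualized Sharpe ratio (zero risk-free rate assumed)."""
    return r.mean() / r.std() * np.sqrt(252)

print(sharpe(calm), sharpe(stressed))
```

Reporting one aggregate Sharpe would hide exactly the regime dependence we're trying to detect; per-regime numbers make the weak spots visible.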


Accounting for Transaction Costs

When we back-test/simulate our model, we frequently use historical or simulated datasets, and in doing so we tend to overlook transaction costs (TC), as they complicate the analysis. Ignoring TC produces unrealistic outcomes (a relatively higher P&L than we should expect in live trading). When studying TC, we must consider the asset's liquidity, bid-ask spread, slippage, and other factors. Taking all TC into account brings our testing results much closer to real-world results, allowing us to properly judge the model's profitability.
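A minimal sketch of netting TC out of a backtest, assuming a flat cost charged per unit of turnover whenever the position changes. The cost figure, returns, and positions are all illustrative assumptions; real TC modeling would also capture spread and slippage:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical backtest: daily asset returns and the strategy's position (-1/0/+1).
asset_returns = rng.normal(0.0, 0.01, 1_000)
positions = rng.choice([-1, 0, 1], size=1_000)

cost_per_trade = 0.0005  # assumed cost per unit of turnover (5 bps)
turnover = np.abs(np.diff(positions, prepend=0))  # position changes incur costs

gross_pnl = (positions * asset_returns).sum()
net_pnl = gross_pnl - (turnover * cost_per_trade).sum()

print(gross_pnl, net_pnl)
```

Even this crude adjustment can flip a high-turnover strategy from apparently profitable to clearly unprofitable, which is precisely the reality check we want before going live.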


Risk Management

The risk management process is external to our model, but it is the mechanism that ensures our trading activity's survival. Even if our model loses money, prudent risk management will ensure that our capital is not completely wiped out. Position sizing, risk limits, and exit points (S/L or T/P) are all aspects of good risk management.


Position Sizing

Traders often underestimate the importance of position sizing and give it little thought. Many factors must be considered when sizing a position, including account size, desired level of risk, and the margin required to maintain the position. Furthermore, we should consider the underlying asset's variance when sizing our position.


Ignoring these considerations will backfire when the market becomes volatile. When a trade goes against them, some traders double their position to average their entry level, but this can expose them to even larger losses if the market keeps moving against them (eventually forcing them to stop out). A good way to avoid this is to set the total position size, and the levels at which we would expand the position, from the start (taking into account all of the above variables).
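One simple, commonly used sizing rule that respects the underlying variance is volatility targeting: choose the position so that a one-standard-deviation daily move risks a fixed fraction of equity. The function and numbers below are an illustrative sketch, not a complete sizing framework (it ignores margin requirements and correlation across positions):

```python
def position_size(account_equity, risk_fraction, asset_vol_daily, price):
    """Units to hold so that a one-standard-deviation daily move risks
    roughly `risk_fraction` of equity. All inputs are assumptions."""
    dollar_risk = account_equity * risk_fraction
    per_unit_risk = asset_vol_daily * price  # 1-sigma dollar move per unit
    return dollar_risk / per_unit_risk

# e.g. a $100k account, risking 0.5% of equity per 1-sigma daily move,
# in an asset with 2% daily vol trading at $50:
units = position_size(100_000, 0.005, 0.02, 50.0)
print(units)  # 500.0
```

Note the built-in discipline: when the asset's volatility doubles, the rule automatically halves the position rather than leaving size to discretion.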


Risk Limits

As part of our risk management process, our risk limits should be defined independently of our trading model/strategy. Risk limits should define the extent of our exposures and the largest losses we are willing to accept given our constraints (account balance, margins, risk tolerance, etc.). If our risk limits are imprecise, we are more likely to incur substantial losses when a position turns against us, so we must be disciplined about not exceeding them when running our strategy. Different strategies call for different risk limits: an option book's risk, for example, will be characterized as a function of the greeks (delta, gamma, vega, theta, and so on), while a trend-following strategy's limit will most likely be specified in terms of drawdown (or max loss).
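For a trend-following book, the drawdown-style limit mentioned above can be monitored with a few lines; the 10% limit and the equity curve below are arbitrary assumptions for illustration:

```python
import numpy as np

def max_drawdown(equity_curve):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    equity = np.asarray(equity_curve, dtype=float)
    peaks = np.maximum.accumulate(equity)
    return ((peaks - equity) / peaks).max()

def breaches_limit(equity_curve, dd_limit=0.10):
    """True once the strategy has violated its (assumed 10%) drawdown limit."""
    return max_drawdown(equity_curve) > dd_limit

curve = [100, 105, 110, 99, 103, 97]
print(max_drawdown(curve))    # (110 - 97) / 110 ≈ 0.118
print(breaches_limit(curve))  # True
```

Checks like this belong outside the strategy code itself, so a misbehaving model cannot quietly override its own limits.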


We hope you found this blog post useful. We will keep producing content like this to inform and inspire. Keep following Finage for more.

You can get your Real-Time and Historical Forex Data with a free Finage FX Data API key.

Build with us today!

Start Free Trial