
by Finage at June 13, 2021 7 MIN READ


Algorithmic Trading with Reinforcement Learning 

 

We have been following Quantitative Finance offerings for a long time. We believe the field does not address the root of people's problems; nevertheless, we feel there is value in using Quantitative Finance as a challenge for designing better tools that do precisely that.

This is because working on an algorithmic trading challenge can help us better harness the creativity of machines when devising trading strategies. AlphaGo and AlphaGo Zero are truly creative, not only because they can devise new moves, but also because they can be modeled as generative models, which lets us treat strategizing as a creative endeavor. I'd like to explain the benefits of Deep Reinforcement Learning for algorithmic trading from the perspective of someone who isn't in finance.


A Quantitative Trading Theorem


Assuming that people do not withdraw money from the financial market except in unusual circumstances such as a market crash, what they do is simply move money from one asset to another in the hope of increasing its value. As a result, all trading approaches based on this idea ultimately come down to constructing a correlation matrix over all assets, because money withdrawn from one asset is reflected in the values of the others. If we have a portfolio with n assets and only one characteristic per asset, we already end up with a massive n by n matrix, which is why we may want to compute the correlation matrix with an expressive function approximator such as a neural network.
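To make the idea concrete, here is a minimal sketch (not Finage's method) of estimating such a correlation matrix from historical prices with NumPy. The random-walk prices and the choice of log returns are assumptions for illustration; a neural-network estimator would simply replace the closed-form correlation step.

```python
import numpy as np

def correlation_matrix(prices: np.ndarray) -> np.ndarray:
    """prices: (T, n) array of closing prices for n assets over T periods."""
    log_returns = np.diff(np.log(prices), axis=0)     # (T-1, n) log returns
    return np.corrcoef(log_returns, rowvar=False)     # (n, n) correlation matrix

# Illustrative usage with random-walk prices for 4 hypothetical assets.
rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0.0, 0.01, size=(250, 4)), axis=0))
print(correlation_matrix(prices).round(2))
```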


“At the end of the day, all trading approaches boil down to computing a correlation matrix for all assets.” 


Why Deep Reinforcement Learning Is Beneficial


Algorithmic trading requires a high level of speed. Quantum Finance introduces a new computing paradigm, but it still requires logic for making trade decisions. Ideally, we would like a framework that alleviates the decision-making bottleneck as much as possible. Self-supervised learning has a promising future in AI because it combines the superior performance of supervised methods with the "no labels, no problem" flexibility of unsupervised methods. Self-supervised learning methods all have one thing in common: they create the learning objectives used to teach them. Their learning procedure is as follows: the agent (1) makes a prediction, (2) observes the actual outcome, (3) measures the prediction error, and finally (4) learns from that error. Yes, I just described what Reinforcement Learning is all about.
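As a rough illustration of that four-step loop, here is a hedged Python sketch. `env` and `agent` are hypothetical stand-ins with an assumed `reset`/`step`/`act`/`learn` interface, not any particular library's API.

```python
def run_episode(env, agent, horizon: int = 1000) -> float:
    """One pass through the predict -> observe -> error -> learn loop."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(horizon):
        action = agent.act(state)                       # (1) make a prediction / take a decision
        next_state, reward, done = env.step(action)     # (2) observe the actual outcome
        agent.learn(state, action, reward, next_state)  # (3) measure the error, (4) learn from it
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward
```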

Let's look at why Reinforcement Learning works for algorithmic trading now that we know what it is. 


1. Exploit and Explore in a Timely Manner


Rather than constructing models for domain-specific problems with well-defined objective functions, we should focus on improving the representation of the environment used to train the model. Yet even with an acceptable representation, people find it difficult to design a trading strategy because there are simply too many alternatives; over time, human traders tend to rely on only a few signals as their main tools. Fortunately, unlike other frameworks, reinforcement learning deals with both prediction (for example, of market prices) and control (e.g., portfolio allocation). With offline reinforcement learning, we can train trading strategies offline and still venture into unexplored territory, which drastically reduces the execution latency required for high-frequency trading.

 

Offline Reinforcement Learning, in contrast to Online Reinforcement Learning, trains the agent on a fixed dataset with no further incoming data. This improves learning efficiency while ensuring policy "completeness." Because the current policy in online learning is based on a stochastic future, it is necessarily incomplete; although today's complete policy may not remain applicable in the future, a complete policy is more robust.
In offline reinforcement learning, the agent is trained "offline" on a preset dataset of experiences rather than on the fly (online). Offline Reinforcement Learning usually faces an existential challenge: given a dataset with fixed targets, we would not know the targets of the agent's actions if it strayed too far off the beaten path. In algorithmic trading, however, we can always compute those targets from the prices, which saves us a lot of time and effort on the control side.
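A small sketch of that last point, assuming the reward is simply the one-step portfolio return computed from recorded prices: any counterfactual allocation the offline agent proposes can be scored against the same history, so the dataset never has missing targets. Transaction costs are ignored here for brevity.

```python
import numpy as np

def counterfactual_reward(prices_t: np.ndarray,
                          prices_t1: np.ndarray,
                          weights: np.ndarray) -> float:
    """One-step return of holding `weights` (summing to 1) from t to t+1.
    Transaction costs are ignored in this sketch."""
    asset_returns = prices_t1 / prices_t - 1.0
    return float(weights @ asset_returns)

# Any allocation the offline agent proposes can be scored against recorded prices.
print(counterfactual_reward(np.array([100.0, 50.0]),
                            np.array([101.0, 49.5]),
                            np.array([0.6, 0.4])))
```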


“...even with an appropriate representation, deriving a trading strategy from it is a challenging task for humans since there are simply too many options.” 


2. Invest in the future rather than the past. 


Despite the fact that historical data is all we have, it is simply not a good predictor of the future for most non-stationary problems, especially in financial markets where events are constantly changing. Model-based reinforcement learning has a distinct philosophy: instead of attempting to forecast the future based on the past, the agent may envision all of the future possibilities and plan ahead. If you don't want to do your homework, for example, you can either (1) gamble that your teacher will not bring it up, or (2) prepare a response for each of the two possibilities, as your teacher may or may not bring it up.

You will be in control of the situation if you choose the latter option, regardless of what happens next. When there are many options, planning for the future (rather than forecasting it) can help to alleviate the stress of ambiguity. Because we focus on preparing for the most likely possibilities, we are now dealing with a problem of capacity rather than precision, which is much easier to manage. In fact, I feel that RL's use of the Markov Decision Process embodies this distinctive approach to continuous predictive control, in which we use only the current state as the latent space to shape the response to the future while discarding the past.
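Below is a toy sketch of this "plan over imagined futures" idea. The Gaussian return simulator, the candidate allocations, and the mean-minus-risk score are all illustrative assumptions, not a recommended market model.

```python
import numpy as np

def simulate_paths(mu, sigma, n_paths=1000, horizon=20, rng=None):
    """Imagined future per-step returns for each asset, shape (n_paths, horizon, n_assets)."""
    rng = rng or np.random.default_rng(0)
    mu, sigma = np.asarray(mu), np.asarray(sigma)
    return rng.normal(mu, sigma, size=(n_paths, horizon, len(mu)))

def plan(candidate_weights, mu, sigma):
    """Pick the allocation with the best risk-penalized outcome across imagined futures."""
    paths = simulate_paths(mu, sigma)                            # (P, H, n)
    cumulative = paths.sum(axis=1)                               # (P, n) total return per path
    outcomes = cumulative @ np.asarray(candidate_weights).T      # (P, n_candidates)
    score = outcomes.mean(axis=0) - 0.5 * outcomes.std(axis=0)   # reward minus risk penalty
    return candidate_weights[int(np.argmax(score))]

candidates = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
print(plan(candidates, mu=[0.0004, 0.0002], sigma=[0.02, 0.01]))
```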

 

Markovian Reinforcement Learning can be viewed as a GAN-like generative model in which a generator (our policy) acts as a Bayes model, predicting a posterior (an action) from sample states s drawn from a prior dataset. In a single-agent setting the discriminator is fully trained to recognize d(s), so the model plays a pure reward-maximization game. In a multi-agent setting, other players influence the discriminator (the market), so d(s) becomes non-stationary and the agent becomes an empirical Bayes estimator in a minimax game that converges at a Nash equilibrium.


Because the financial market is inherently non-stationary, a plug-in framework such as Model-based Reinforcement Learning is worth exploring as an empirical Bayes estimator. Regardless, we must remember that the state-action value prediction is occasionally wrong, which calls for decoupling prediction from decision.


“Despite the fact that historical data is all we have, it is simply insufficient to predict the future for most non-stationary problems.” 


3. Prediction and decision are separated.


Separating prediction and decision is a long-standing concept in inferential statistics. The logic goes: “making a correct prediction at a specific time does not guarantee that the decision based on that prediction will be correct.” There are many reasons for this; for instance, imagine two targets that follow two different normal distributions with the same expectation but very different standard deviations. The higher standard deviation implies more uncertainty, in which case we should take different actions (e.g., seek more diversification).

 

A linear model with lower variance is more robust than one with higher variance; to reduce the uncertainty in the high-variance case, we may need additional information at each step.
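A toy numerical illustration of the point: two predictors with the same expected return but different spreads should lead to different position sizes. The inverse-variance sizing rule used here is an assumption chosen for simplicity.

```python
import numpy as np

rng = np.random.default_rng(1)
pred_a = rng.normal(0.01, 0.02, size=10_000)   # same mean, low spread
pred_b = rng.normal(0.01, 0.08, size=10_000)   # same mean, high spread

for name, preds in [("A", pred_a), ("B", pred_b)]:
    mean, var = preds.mean(), preds.var()
    size = mean / var                          # shrink the position as uncertainty grows
    print(f"{name}: mean={mean:.4f}  std={preds.std():.4f}  position_size={size:.1f}")
```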


Because stochastic predictors (e.g., stock predictors) frequently need to compute the expectation of a target distribution, the argument for separating prediction and decision is especially strong in Algorithmic Trading. Reinforcement Learning currently uses Actor-Critic approaches, in which a model consists of a Critic (which predicts the state-action value) and an Actor (which makes decisions based on, but not limited by, the Critic's forecast). There is no need to worry about their alignment, because both functions are trained simultaneously and end-to-end. The value function (the Critic) should, however, be regularized to account for the non-stationary nature of the state distribution.
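For concreteness, here is a compact Actor-Critic sketch in PyTorch. The network sizes, the discrete action set, and the one-step advantage update are illustrative assumptions rather than a production trading design.

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Shared body with a decision head (Actor) and a prediction head (Critic)."""
    def __init__(self, state_dim: int, n_actions: int = 3, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.actor = nn.Linear(hidden, n_actions)   # policy logits (the decision)
        self.critic = nn.Linear(hidden, 1)          # state value (the prediction)

    def forward(self, state):
        h = self.body(state)
        return self.actor(h), self.critic(h)

def update(model, opt, state, action, reward, next_state, gamma=0.99):
    """One advantage actor-critic step: the Critic predicts, the Actor decides."""
    logits, value = model(state)
    with torch.no_grad():
        _, next_value = model(next_state)
        target = reward + gamma * next_value
    advantage = (target - value).detach()
    log_prob = torch.log_softmax(logits, dim=-1)[action]
    loss = (-log_prob * advantage + (target - value).pow(2)).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Illustrative setup; the 8-dimensional state encoding is a placeholder.
model = ActorCritic(state_dim=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
```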


“Making an accurate prediction at a specific time does not ensure that the choice based on that prediction will be accurate.” 


Finally, I'd like to make a few observations on why this is the ideal game for someone who enjoys playing games.


For the common person, trading is nothing more than a game of "Greater Fools": your objective is to find people more ignorant than you and sell them something worth less than they believe. If everyone knows an asset will fail, no one will be willing to trade it; hence, the financial market is one of the few human inventions that generates wealth from chaos, and if winning in any form is about minimizing environmental uncertainty, there is no inherent win-win scenario in trading.

 

To put it another way, trading financial assets is not the same as trading commodities; it is an adversarial game in which your only goal is to defeat a stochastic opponent (the market, or another trading bot) who plays by the same rules (ideally) but with incomplete information. Even better, this stochastic adversary is made up of many players with whom you can collaborate: working together for mutual benefit lets you ride the trend, but only for a limited time.

 

That's all there is to it. Reinforcement Learning seems to be particularly good at games with simple rules but a great deal of stochasticity. Like Go, this game has rules that can be learned in under 60 minutes, yet it has more possible configurations than there are atoms in the Universe. If Reinforcement Learning can beat the game of Go, it can beat the stock market.


