Today, the development of technology and the constant change of the needs of the modern world have made it necessary to make innovations in the financial sector. For these reasons, using the right technology in line with the needs is as important as determining the right trading strategy in the competitive market environment. For this reason, it has become important to receive, store and process correct data from the market. So in this article a review and an example of structuring the data obtained from Finage will be examined.
The importance of trading infrastructure strategy
The most important feature of quantitative finance is that it requires a significant amount of data engineering. These requirements are felt more when dealing with shorter periods than daily data. These are the methodologies, technologies, software, procedures and infrastructures used for data engineering and data acquisition, storage, separation and distribution to end customers in a quality manner.
This area may shine less than algorithms for achieving successful trading strategies, but this infrastructure setup is part of the professional operation. Defining the right infrastructure plays an important role in the medium and long-term journey of making money as part of business activities.
To date, there are many different approaches that have been developed to overcome this data engineering need. Although there are significant tradeoffs between all of these approaches, they can be equally valid, and understanding your own resources and environment is essential in choosing the right one. Let's briefly examine some of these approaches below.
Vendor environment approach
Some companies carry large amounts of trading capital that does not require high speed or fine details of time, so these companies can opt for the Microsoft ecosystem to move and store data. This approach has the advantage of using the same technology for everything, but provides seamless integration between different components. There are also advantages to using a commercial solution, such as providing vendor support. This approach seems appropriate for well-established companies trading in commodities. If there are processes that require dealing with high-speed or very specific or machine-learning-intensive strategies, this probably won't be the best environment.
Building a pure Python solution is another approach widely used in the industry. As an advantage of Python encoding and counting with many ML libraries, Jupyter notebooks can simplify the maintenance of structures and support self-documentation of procedures. Python may not be the best solution sometimes, but there are many operational benefits it can provide in a mixed environment. While Python performs extremely well for coding high-level maintenance software layers, it may not be a good choice for speed. Python outperforms the performance handicap, with shorter development times and easier collaboration between teams. This pure Python approach can be used very successfully in a small hedge fund with a well-established firm.
Another approach is to use C ++ or Java as an environment and to develop a custom framework for processing data. This approach emerges as an approach frequently used by companies operating intraday or high frequency with large amounts of data. You may need this approach for at least tick data in certain areas of your infrastructure. Regarding the choice of language to use, C ++ is the preferred choice for flagship enterprise firms, while Java is available at a lower operational cost, and some smaller firms have the ability to offer everything in Java.
It is natural for the data itself to be in a database and usually stored, but that could be an error. It can be said that this is valid for both SQL and non-SQL approaches. As the most important advantages of this approach; data can be easily consumed, reviewed and accessed, effectively eliminating another point of maintenance and trouble.
The common denominator of these described approaches is that they are generally based on a specific language or technology rather than a systems engineering approach. While this does not mean that there is no specific system engineering involved, it should be overlooked that it is often not the main approach.
An alternative can be used quite successfully, especially for independent traders, small hedge funds operating with limited resources, or small prop trading firms. Disclaimer: This approach will hurt feelings as it promotes the reuse of 40 years of technology.
UNIX: a file-based and collaborative small tools approach
It can be very useful to reuse the classic UNIX approach of files and small collaborative tools to set up operations based on UNIX systems engineering rather than DevOps or programming language based infrastructure.
The reason behind the word UNIX used, which also applies to Linux, is because we use FreeBSD.
Zero maintenance is essential for those who have reached a certain age (experience) like me and have already made the mistake of building complex things in the past. Using simple solutions is a must for small operations that lack both resources and teams.
In the sections below, we will talk about how to set up the FOREX data infrastructure service, how to backtest systems and how to accelerate them.
Quantitative analysis as a Business Need
In quantitative analysis, beyond any technical need, the priority is always a business need. And behind all technical solutions, it is important to have a full cost / resource analysis for both project and operational expenses. These basic checks are not always done specifically for people with only a technology background who may not have a natural understanding of medium and long term operational costs.
Finage for Collecting market data feeds
You always need a data provider, both in historical and real time. If you are going to use your data, it makes sense to store it locally as a cache, but private data companies will provide much better continuity and reliability in the process of storing and protecting this data.
There are several alternatives for retrieving and storing this data, using Python and SQL as an example of these alternatives. Since the number of records returned by Finage is limited to 10,000, you will definitely need to use a specific programming language as you will have to do a few searches until you get all the information.
Finage’s tips for the task can be listed as follows:
Tip 1: decision making about storage of your data
We will not use any database here. Instead, we would prefer to store all of the Buy / Sell information in a single file as space separated values. A maximum of 7300 files over a 20-year period is a number within the limits that any modern operating system can handle.
Tip 2: deciding to the effective time zones
Time zones matter, they often come across as a challenge and a point of pain. The best option we found was to simplify them and always use Eastern Time. The most important factor in doing this is that the New York cash session is the most important period of the day in terms of trading.
Tip 3: session opening time is a small detail
The answer to the question of whether we should group our data by New York cash session may be yes. However, from our experience, we think it is much better to use natural days in the Eastern Time zone. There may be objections to this, of course, but again, even when dealing with these questions, the simplest answer we found was to store the data by grouping them by natural days in Eastern Time.
Tip 4: arrange files in directories
The directory structure, which is a great success in the UNIX world, is free and should definitely be used because it is maintenance-free. For this reason, you can group your files in directories according to different time periods.
Tip 5: fixed-size fields are useful for storing the data
It does not store the data in CSV, and the use of fixed-size fields allows you to have great performance advantages, even if it seems a subtle dark. This simplifies the use of low-level routines in C and Java and enables faster retrieval of your backtest data.
Tip 6: code a simple tool to get market data API
Since you define where the data goes for each day, simply write a simple, standalone command line tool to retrieve and store all data for a given day. And surprisingly, you won't even need to program anything to get the data. You may still need a small script to submit requests because there will be approximately more than 10,000 records each day.
Tip 7: build helper commands which reuse your base commands.
Since retrieving data every day is an extremely time consuming activity, it would be quite logical to write a command to retrieve data for an entire year. In this case, typing in Python can make things easier, as parsing dates is much easier in Python.
In this article, we briefly reviewed how things are usually planned in terms of storing financial data for backtesting, as well as providing a really simple way to retrieve and store data. Based on this article, it has been seen that using software languages in the process of retrieving, storing and processing data will facilitate the process. It is also mentioned how to avoid data confusion and repetitive operations. we hope it can be used as a resource that can be used by anyone involved in this business.