*Using Commitments of Traders data from the CFTC to identify potential correlations to inform trade positioning*
Welcome back to the blog! Happy Friday to you all, I hope you have had a wonderful week and you are getting stuck in to 2021 to make it a good year. Most of us are stuck at home and with the entertainment industry closed, our time usually spent on hobbies is now sat waiting to be used. I am doing my best to create new and exciting content like this post and my previous blog post that was about stock portfolio optimisation in an attempt to keep you entertained and also expand your minds.
In this blog post I am going to once again show you a new project that I have been working on for a friend. I will share with you the project brief, how I planned to initially start the project and the Python programming code and data analysis used to complete it. I will also discuss the key elements of the project from a financial markets/trading perspective and finish with some thoughts on the future potential of this project.
Let’s get right in to it and start by looking at the project brief.
My friend messaged me asking for my help on investigating potential relationships between Commitment Of Traders (COT) report data from the Commodity Futures Trading Commission and the price of the Commodity futures he is trading on his own account. His initial thinking was to look to see if there was a correlation between the long and short positioning of both Money Managers and Commodity Producers and the price of the underlying Commodity.
The CFTC is the Commodity Futures Trading Commission which is an independent agency of the US government created in 1974, that regulates the U.S. derivatives markets, which includes futures, swaps, and certain kinds of options.
The COT Report. The CFTC publishes the Commitments of Traders (COT) reports to help the public understand market dynamics. Specifically, the COT reports provide a breakdown of each Tuesday’s open interest for futures and options on futures markets in which 20 or more traders hold positions equal to or above the reporting levels established by the CFTC.
The Disaggregated reports are a futures only report and a combined futures and options report which break down the reportable open interest positions into four classifications. The two in bold are the position classes that interested my friend the most.
- 1. **Producer/Merchant/Processor/User**
- 2. Swap Dealers
- 3. **Managed Money**
- 4. Other Reportables
The majority of this project was simply data cleansing because the CFTC has a huge amount of free data available but it tends to be offered in large spreadsheets with a lot of unnecessary data. You can see in the image of the CFTC website above that for each report type they have individual spreadsheets for each year. Within each report there is weekly data for every classification for a large number of commodities across multiple exchanges. An example of this is shown later in this blog post.
Once the correct data was obtained, I worked on formatting it so that it could be compared with the asset price. I then calculated correlation coefficients to determine whether there is a relationship and, more importantly, whether it has the potential to be used to successfully influence trading.
Managed Money – A “money manager,” for the purpose of this report, is a registered commodity trading advisor (CTA); a registered commodity pool operator (CPO); or an unregistered fund identified by CFTC. These traders are engaged in managing and conducting organized futures trading on behalf of clients.
Producers/Merchants/Processors/Users – A “producer/merchant/processor/user” is an entity that predominantly engages in the production, processing, packing or handling of a physical commodity and uses the futures markets to manage or hedge risks associated with those activities.
I was asked by my friend to specifically look at the positioning data for these two market participants because it would appear they are the best representations of the “smart money” and where the biggest positions on the futures themselves would come from. Upon completion of this project, I actually went back to the CFTC website and looked through the classifications and found a 3rd class which I will talk about at the end of this blog post.
Now let’s get in to the good stuff and start looking at the data. The first step was to gather the raw data from the CFTC website and find what I wanted to analyse. The image below shows an example of a single disaggregated futures only report for the year 2020.
As you can see, there is a lot of data there. For reference, there are more than 10,780 rows of data and each row has 188 columns which makes for a total of more than 2.02 million fields of data! This is because there is weekly data (each row is a week) for multiple futures traded on various exchanges.
For the purpose of this project I started by looking at WTI crude oil. The crude oil asset is linked to multiple types of futures within the report which can be seen in the list below, however we chose to start with WTI FINANCIAL CRUDE OIL – NEW YORK MERCANTILE EXCHANGE. Much like with the market participant classifications, there is further scope to investigate other WTI futures data in future versions of this project.
The columns that contained the data I needed were labelled as follows:
- Report_Date_as_MM_DD_YYYY – This is for indexing and comparing position data vs asset price data.
- Prod_Merc_Positions_Long_ALL – Total producer long positions.
- Prod_Merc_Positions_Short_ALL – Total producer short positions.
- M_Money_Positions_Long_ALL – Total money manager long positions.
- M_Money_Positions_Short_ALL – Total money manager short positions.
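A minimal sketch of narrowing the raw report down to one market and the five columns listed above. The function name `select_cot_columns` is hypothetical, and the `Market_and_Exchange_Names` column used for filtering is an assumption about the spreadsheet layout, so check it against the header row of your own download.

```python
import pandas as pd

# Column names as listed in the post.
COT_COLUMNS = [
    "Report_Date_as_MM_DD_YYYY",
    "Prod_Merc_Positions_Long_ALL",
    "Prod_Merc_Positions_Short_ALL",
    "M_Money_Positions_Long_ALL",
    "M_Money_Positions_Short_ALL",
]


def select_cot_columns(raw, market, market_col="Market_and_Exchange_Names"):
    """Keep the rows for one market and only the five columns of interest.

    `market_col` is an assumed column name; adjust it if your file differs.
    """
    rows = raw[raw[market_col] == market]
    return rows[COT_COLUMNS].reset_index(drop=True)
```

The `market` argument has to match the market name exactly as it is written in the spreadsheet, so it is safest to copy it straight from the file.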
Uploading excel/csv data in to a Python notebook is very quick and easy. It requires only 1 line of code and works wonders for quick data analysis if you already have the data in this format. The image below shows the code for both reading excel data and exporting data to excel.
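For reference, the one-liners in question are `pd.read_excel(path)` for reading and `df.to_excel(path)` for exporting. The sketch below wraps them in two small helpers (the names `load_table` and `save_table` are hypothetical) that also cope with csv files.

```python
import pandas as pd


def load_table(path):
    """Read a .csv or Excel file into a DataFrame, chosen by file extension."""
    if str(path).lower().endswith(".csv"):
        return pd.read_csv(path)
    # read_excel needs an Excel engine installed (openpyxl for .xlsx, xlrd for .xls).
    return pd.read_excel(path)


def save_table(df, path):
    """Write a DataFrame to .csv or Excel, chosen by file extension."""
    if str(path).lower().endswith(".csv"):
        df.to_csv(path, index=False)
    else:
        df.to_excel(path, index=False)
```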
In regards to asset price data, I once again used the Pandas DataReader tool to connect to Yahoo Finance and download the price data for the timeframe I required. This can be done manually by heading to https://uk.finance.yahoo.com, searching for the stock ticker, downloading the price history to Excel and then using the code above to upload it to your Python notebook. However, this is not needed and can be done much quicker by using the code shown below.
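A rough sketch of the download step using `pandas_datareader`. The helper name `download_close` is hypothetical, and the Yahoo reader in that package has been unreliable at times, so treat this as illustrative rather than guaranteed to work today.

```python
def download_close(ticker, start, end, source="yahoo"):
    """Download daily price history and return only the Close column."""
    # Imported inside the function so a notebook without the package still loads.
    from pandas_datareader import data as pdr

    prices = pdr.DataReader(ticker, source, start, end)
    return prices["Close"]
```

`start` and `end` can be date strings or `datetime` objects, and `ticker` is whatever symbol Yahoo Finance lists for the future you are interested in.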
The price chart output isn’t necessary but I like to add this step as an extra point of validation to make sure that all the data was imported and that the data is reasonably correct. If this chart were to have lots of breaks in it, or if it didn’t look remotely like the Crude Oil price chart on my trading terminal, then I would know that something had gone wrong.
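The same check can also be done numerically alongside the chart. This is a small sketch (the helper `price_sanity_report` and its 5-day threshold are assumptions, not part of the original notebook) that counts missing values and looks for long calendar gaps in a price series indexed by date.

```python
import pandas as pd


def price_sanity_report(close, max_gap_days=5):
    """Summarise missing values and the longest calendar gap in a price series.

    Expects a Series with a DatetimeIndex.
    """
    close = close.sort_index()
    # Days between consecutive rows; the first entry is NaN.
    gaps = close.index.to_series().diff().dt.days
    return {
        "rows": len(close),
        "missing": int(close.isna().sum()),
        "longest_gap_days": int(gaps.max()) if len(close) > 1 else 0,
        "has_long_gap": bool((gaps > max_gap_days).any()),
    }
```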
Test 1 – Starting with Crude Oil.
Okay, so I now have all the raw data I need to start formatting it and calculating the correlations between the various elements. The first step is to create a dataframe (table) with the exact data I want to analyse; this is essentially data cleansing. The code below shows this.
There are 2 main features to the code above that you need to be aware of. The first is I made sure to set the DATE column in the COT report data to be the index. The Index is the address for all other data in the table and when I combine the COT report data with the Crude Oil asset price data, I want to be able to look for them by date and column name.
The second is the combining of asset price data into the COT report data to make one dataframe. This is done by simply making an extra column in the COT report dataframe (COTdf) and making that equal the Close price column in the asset price data. The reason this works with one line of code is that I have made the DATE column the Index, which is also how the asset price dataframe is indexed. Pandas then lines the two up by date automatically, so combining the 2 dataframes is quick and easy.
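A minimal sketch of the two features described above, assuming the COT data has already been cut down to the columns of interest and the price data is indexed by date. The function name `build_cot_frame` is hypothetical; `COTdf` and the `Asset` column follow the naming used in this post.

```python
import pandas as pd


def build_cot_frame(cot, prices, date_col="Report_Date_as_MM_DD_YYYY"):
    """Index the COT data by report date and attach the matching close price."""
    COTdf = cot.copy()
    # Feature 1: make the report date the index.
    COTdf[date_col] = pd.to_datetime(COTdf[date_col])
    COTdf = COTdf.set_index(date_col).sort_index()
    # Feature 2: one line to add the price; pandas aligns the rows by date.
    COTdf["Asset"] = prices["Close"]
    return COTdf
```

Any report date with no matching price row (a market holiday, for example) ends up with a missing `Asset` value, which is worth checking before calculating correlations.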
The .head() and .tail() functions can be used to see the first 5 rows (head) and last 5 rows (tail) of a dataframe. If you type a number into the empty parentheses (brackets), you can define the number of rows returned, counting from the top or bottom (head or tail) of your dataframe. In the example above, I generated 15 rows from the top and displayed them. This can be used for checking that your calculations and combining of dataframes have worked.
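A quick illustration on a made-up 20-row dataframe (the data here is a placeholder, not COT data):

```python
import pandas as pd

df = pd.DataFrame({"value": range(20)})

first_five = df.head()        # default is 5 rows from the top
first_fifteen = df.head(15)   # 15 rows from the top
last_five = df.tail()         # default is 5 rows from the bottom
```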
And finally, I produced a correlation matrix for the data in the dataframe. This can be seen below.
A correlation matrix is a good way of displaying the correlation coefficients between multiple assets/datasets. In this example you can see that both the rows and columns contain each of the column names from the COTdf dataframe. If you want to see the correlation coefficient of Asset price vs Producer long positions, you can simply look along the Asset row until you reach the Producer long column.
The red highlighted cells are the lowest correlation coefficient in each column which is essentially the largest negative correlation. This is just a feature I copied across from my FX Majors correlation table which is updated weekly on this blog site in my Weekly Chart Packs.
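A minimal sketch of producing the matrix and the red highlighting with pandas. The helper names are hypothetical; `DataFrame.corr()` uses the Pearson coefficient by default, and the styling call needs the `jinja2` package installed.

```python
import pandas as pd


def correlation_matrix(df):
    """Pairwise Pearson correlation between every column of the dataframe."""
    return df.corr()


def lowest_per_column(corr):
    """Row label of the lowest coefficient in each column of the matrix."""
    return corr.idxmin()


def highlight_lowest(corr):
    """Styled matrix with the lowest value in each column shaded red."""
    return corr.style.highlight_min(color="red", axis=0)
```

In a notebook, leaving `highlight_lowest(correlation_matrix(COTdf))` as the last line of a cell displays the coloured table.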
My findings from this first test weren’t great but that is always a possible (and common) outcome when looking at new data and analysing it. You can see in the correlation matrix above that across the Asset row (crude oil asset price) there were no examples of a strong negative or positive correlation vs any of the COT report positional data. The most significant value was a correlation coefficient of negative 0.356 on crude oil asset price vs managed money long positions.
Test 2 – Moving on to Cotton futures.
Because I found very little evidence of a meaningful correlation between the COT report data and Crude Oil asset price, my friend asked me to move on to a new commodity that he also trades. This time it was Cotton futures. Much like before, I downloaded the COT report data from the CFTC website and uploaded this to a new Python notebook using the .read_excel() function. I also used the pandas DataReader function to download price data for Cotton futures which trade under the Ticker CT=F on Yahoo Finance.
After I had collected the data, cleansed it within Python and combined the 2 dataframes in to one (all of this uses the same code as the crude oil example), I then added a new step in the code to provide a visual aid.
The chart above is a simple line chart of all elements within the combined dataframe. I decided to produce this to see if there were any obvious correlations between the various producer and managed money positions over time. I think the most obvious one is the strong positive correlation between producer short positions and managed money long positions. This is reflected in the correlation matrix below.
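A simple sketch of a line chart like this one, assuming the combined dataframe is indexed by report date. The function name and chart title are placeholders.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_all_columns(df, title="COT positions and asset price"):
    """Draw every column of the combined dataframe as a line on one chart."""
    fig, ax = plt.subplots(figsize=(12, 6))
    df.plot(ax=ax, title=title)
    ax.set_xlabel("Report date")
    return ax
```

Because prices and contract counts sit on very different scales, passing `secondary_y="Asset"` to `df.plot` is one option for putting the price on its own axis.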
The results of this correlation matrix are a bit more interesting. Firstly, you will notice I have included both the long and short positions of producers which I did to widen the search for potentially tradable correlations. You will also see that there are 2 positive correlations on the Asset row of the table and both of these are showing as much stronger than in the crude oil example.
The strongest correlation is between Asset price and the long positioning of managed money, which has a coefficient of positive 0.596. I would consider this to be a mid-level strength. The closer the coefficient is to 1 (or -1 for negative correlations), the stronger the correlation.
Finding this reasonably strong positive correlation between asset price and managed money positioning then led me to expand the data across a longer time period and look into the correlation data in more detail. This is covered in the next section.
Test 3 – Expanding on Cotton futures.
The reason I have expanded the data to 5 years is mainly to confirm that the positive correlation between Asset price (cotton futures) and the managed money long positions still remains over a longer period of time. It also provides me with more data to output in graph format and look for potential tradable opportunities. I will show you this later in this blog post.
As before, the first few sections of code are still the same but I have used 5 years’ worth of data for both asset price (cotton futures CT=F) and COT report data. The new correlation matrix can be seen below.
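Since the CFTC offers one spreadsheet per year, a five-year window means stitching several files together first. A minimal sketch, assuming each year has already been loaded into its own dataframe (the helper name `combine_years` is hypothetical):

```python
import pandas as pd


def combine_years(yearly_frames, date_col="Report_Date_as_MM_DD_YYYY"):
    """Stack yearly COT dataframes into one, ordered by report date."""
    combined = pd.concat(yearly_frames, ignore_index=True)
    combined[date_col] = pd.to_datetime(combined[date_col])
    # Drop any rows repeated across files, then put everything in date order.
    combined = combined.drop_duplicates().sort_values(date_col)
    return combined.reset_index(drop=True)
```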
By moving to 5 years worth of cotton futures price data and COT report data, I did confirm that the strong positive correlation between asset price and long positions of managed money remained. In fact, the positive correlation got even stronger going from a coefficient of positive 0.596 to positive 0.696.
There was also a surprising result in the 5 year correlation data… an even stronger correlation of positive 0.801 between asset price and short positions of producers. The 2 charts below show these correlations.
The chart above shows the positive correlation between the asset price (cotton futures) and the long positions of managed money. I think it is fairly evident that there is a positive correlation with both asset price and managed money positions increasing and decreasing in unison.
It would be wise to assume this would be the case because managed money tends to be the “smart money” and over a longer period of time (5 years), you would like to think money managers would be on the right side of the market. However, does the chart above produce any tradable opportunities? Remember, you can’t place trades on the positions of managed money (orange line) but you can place trades on the price of cotton futures (black line).
This second chart shows the surprise positive correlation that appeared when I ran the 5 year data through my code. According to the correlation matrix, this chart should show an even stronger correlation between asset price and the short positions of producers compared to the first chart. I think it does that.
You can see asset price tracks the positioning of the producers very well and this, again, does make sense. Producers often use futures contracts to hedge and remove any risk outside of their normal business of producing cotton. They will sell futures contracts to lock in a fixed price for their cotton now and avoid any potential for drastic losses (and gains) in value due to circumstances outside of their control. I have covered this in more detail in a previous Market Dynamics blog post which you can read by clicking here.
I also believe that over a long period of time, producers would tend to know what the market is going to be doing in their specific industry. As such, they will sell (short) cotton futures when the asset price is likely to fall and reduce short exposure when the value of cotton might start to increase. This is reflected in the chart above.
A good thing to point out on both charts is that the positioning of managed money and producers doesn’t always precede/lead the asset price. Therefore, it might be difficult to use them to predetermine exactly when you should buy or sell the underlying asset (cotton futures). I do think that this does show potential though and this data does act as a good starting point to dive deeper.
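One way to start digging into the lead/lag question is to correlate today's price with positions from earlier reports. This is only a sketch of that idea, not something from the original notebook; the function name, the `Asset` column default and the 4-report maximum lag are all assumptions.

```python
import pandas as pd


def lagged_correlations(df, position_col, price_col="Asset", max_lag=4):
    """Correlate price with positions reported 0..max_lag rows earlier."""
    out = {}
    for lag in range(max_lag + 1):
        # shift(lag) moves older position values down onto later price rows.
        out[lag] = df[price_col].corr(df[position_col].shift(lag))
    return pd.Series(out, name="correlation")
```

If the coefficient at lag 1 or 2 were clearly higher than at lag 0, that would hint that positioning tends to move before price, although it would still need proper testing before being relied on.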
The main objective of this post was to show you all a little bit about Python programming and its uses in finance and trading. Hopefully, if you start to look into basic Python programming for your own use, you can use some of the code I have shared in this blog post to make your life easier.
Another key point to take from all of the findings in this blog post is that it does not constitute a trading strategy. You cannot simply go and find a correlated asset/indicator and open positions without looking in to risk management, costs of trading and where to exit a trade. A lot of new traders focus so much on trade entry criteria and often neglect an even more important part of successful trading… when to get out!
In regards to the future potential of this project, I personally would like to investigate the positioning of producers and managed money for cotton over an even longer period of time. This would either further support the theory of a positive correlation as seen in 2 year and 5 year data, or it could change things and make me aware that there are time duration limitations.
I have also already started work on producing an easier to understand indicator based on the long and short positioning of producers and managed money by creating a ratio. This could then be used to find levels in the ratio where, more often than not, the asset price has then started to reverse direction. This would get us one step closer to finding a tradable strategy. An example of this ratio on a chart can be seen below.
As you can see, the spikes in the long/short ratio of managed money positions do often align with high asset prices for cotton futures where a drop in price follows. This is keeping me interested and wanting to dive deeper into this project.
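A minimal sketch of a long/short ratio, assuming it is simply long positions divided by short positions (the exact formula behind the chart is not spelled out here, and the helper name is hypothetical).

```python
import pandas as pd


def long_short_ratio(df, long_col, short_col):
    """Long positions divided by short positions, row by row."""
    # Treat zero short positions as missing to avoid dividing by zero.
    shorts = df[short_col].where(df[short_col] != 0)
    return df[long_col] / shorts
```

For managed money this would be called with `"M_Money_Positions_Long_ALL"` and `"M_Money_Positions_Short_ALL"` as the two column names.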
If you are interested in learning my personal trading strategies, please consider my Mastering The Markets – Retail Trading Course. Head over to my Financial Analysis Education page to check out all of my education packages and the deals available.
All my technical analysis is done using the TradingView platform. You can get access via the link below.
My preferred broker of choice is IC Markets. Low spreads and trading costs really help long term profitability. A link to their site is below.
Thanks for reading and please don’t forget to LIKE, SHARE and FOLLOW my blog to stay up to date with the latest market analysis and trading education posts.
DISCLAIMER: None of the information posted on this site is to be considered investment/financial advice. Trading is high risk and you should only trade with money you can afford to lose.