{"id":1019,"date":"2024-05-17T13:26:12","date_gmt":"2024-05-17T13:26:12","guid":{"rendered":"https:\/\/blogs.mathworks.com\/finance\/?p=1019"},"modified":"2024-05-30T15:15:57","modified_gmt":"2024-05-30T15:15:57","slug":"deep-learning-in-quantitative-finance-multiagent-reinforcement-learning-for-financial-trading","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/finance\/2024\/05\/17\/deep-learning-in-quantitative-finance-multiagent-reinforcement-learning-for-financial-trading\/","title":{"rendered":"Deep Learning in Quantitative Finance: Multiagent Reinforcement Learning for Financial Trading"},"content":{"rendered":"<p><em><span class=\"TextRun SCXW194797580 BCX0\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SCXW194797580 BCX0\"><img decoding=\"async\" loading=\"lazy\" width=\"150\" height=\"150\" class=\"size-thumbnail wp-image-1043 alignnone\" src=\"http:\/\/blogs.mathworks.com\/finance\/files\/2024\/05\/1707752446335-150x150.jpg\" alt=\"\" \/><\/span><\/span><\/em><\/p>\n<p><em><span class=\"TextRun SCXW194797580 BCX0\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SCXW194797580 BCX0\">The following blog was written by\u00a0<a href=\"https:\/\/www.linkedin.com\/in\/adam-peters-580838197\/\">Adam Peters<\/a>, Software Engineer<\/span><\/span> at Mathworks.<\/em><\/p>\n<p><strong>Download<\/strong> the code for this example from <a href=\"https:\/\/github.com\/matlab-deep-learning\/reinforcement_learning_financial_trading\">Github here<\/a><\/p>\n<p><strong>Overview: <\/strong><\/p>\n<p>Financial trading optimization involves developing a strategy that maximizes expected returns among a set of investments. For instance, based off key market indicators, a learned strategy may decide to reallocate, hold, or sell stocks on a day-to-day basis. Reinforcement learning is a common approach to learning an optimal strategy for any problem. 
In this machine learning paradigm, an agent makes decisions and receives penalties\/rewards, learning to optimize its actions through trial and error. Given the inherently competitive nature of the financial market, multiagent reinforcement learning, whereby multiple agents compete in a shared environment, offers a natural and exciting way to develop trading strategies.<\/p>\n<p>New multiagent functionality has been added to the Reinforcement Learning Toolbox in MATLAB R2023b, allowing you to create agents that can compete, cooperate, take turns, act simultaneously, share learnings, and more. Using the <a href=\"https:\/\/www.mathworks.com\/help\/reinforcement-learning\/ref\/rl.env.rlmultiagentfunctionenv.html\">rlMultiAgentFunctionEnv<\/a> function, one can easily create multiagent environments that seamlessly integrate with the Reinforcement Learning Episode Manager to view your training progress.<\/p>\n<p>In this blog post, we describe a working demo that uses multiagent reinforcement learning to create optimal trading strategies for three simulated stocks, and demonstrate that competing agents outperform noncompeting agents.<\/p>\n<p>&nbsp;<\/p>\n<p><strong>Agents: <\/strong><\/p>\n<p>We must first introduce some terminology and define the agents that will learn to trade stocks. In reinforcement learning, agents can be thought of as functions that take in some state, called an <em>observation<\/em>, and then output an <em>action<\/em>. Depending on how this action interacts with the <em>environment<\/em>, the agent receives a <em>reward<\/em> for that action and updates its strategy based on the strength of that reward. This strategy is called a <em>policy<\/em>, and is a function that maps observations to actions. 
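<\/p>
<p>This observe-act-reward loop is easy to sketch in code. The snippet below is an illustrative Python sketch using a simple bandit-style agent with hypothetical names and parameter values; it is not the MATLAB implementation from the demo, which uses Reinforcement Learning Toolbox.<\/p>

```python
import random

def run_bandit(n_steps=2000, epsilon=0.1, alpha=0.1, seed=0):
    """Minimal agent-environment loop: observe, act, get a reward, update the policy."""
    rng = random.Random(seed)
    true_means = [0.2, 0.5, 0.8]   # hidden average reward of each action
    q = [0.0, 0.0, 0.0]            # the agent's learned value estimates

    for _ in range(n_steps):
        # Policy: usually exploit the best current estimate, sometimes explore.
        if rng.random() < epsilon:
            action = rng.randrange(3)
        else:
            action = max(range(3), key=lambda i: q[i])

        # Environment: returns a noisy reward for the chosen action.
        reward = true_means[action] + rng.gauss(0.0, 0.1)

        # Update: move the estimate for this action toward the observed reward.
        q[action] += alpha * (reward - q[action])

    return q
```

<p>After enough iterations the agent's value estimates single out the best action, which is the essence of this loop.<\/p>
<p>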
This loop, which can be seen in Figure 1, repeats to iteratively update the policy and maximize expected rewards.<\/p>\n<p>&nbsp;<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" width=\"509\" height=\"419\" class=\"size-full wp-image-1022 aligncenter\" src=\"http:\/\/blogs.mathworks.com\/finance\/files\/2024\/05\/Figure-1.png\" alt=\"\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>In this example, we will create two <a href=\"https:\/\/www.mathworks.com\/help\/reinforcement-learning\/ug\/ppo-agents.html\">Proximal Policy Optimization (PPO) agents<\/a>\u00a0that attempt to outcompete each other, and one PPO agent that acts independently, which will serve as a control. These agents use neural networks to represent their policies and learn with the PPO algorithm, developed by OpenAI, which is popular in reinforcement learning for the stability and efficiency of its policy updates. These agents could be defined using the\u00a0<a href=\"https:\/\/www.mathworks.com\/help\/deeplearning\/gs\/get-started-with-deep-network-designer.html\">Deep Network Designer<\/a>, although this example defines the networks programmatically. The network architecture for each agent follows:<\/p>\n<p>&nbsp;<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" width=\"956\" height=\"1006\" class=\"size-full wp-image-1025 aligncenter\" src=\"http:\/\/blogs.mathworks.com\/finance\/files\/2024\/05\/Figure-2.png\" alt=\"\" \/> <img decoding=\"async\" loading=\"lazy\" width=\"956\" height=\"1006\" class=\"size-full wp-image-1028 aligncenter\" src=\"http:\/\/blogs.mathworks.com\/finance\/files\/2024\/05\/Figure-3.png\" alt=\"\" \/><\/p>\n<p>&nbsp;<\/p>\n<p><strong>Environment: <\/strong><\/p>\n<p>With the agents defined, we must now define the environment, which consists of an observation space, action space, and reward function.<\/p>\n<p>&nbsp;<\/p>\n<p><u>Observation Space: <\/u><\/p>\n<p>Our environment consists of three stocks simulated via geometric Brownian motion and $20,000 in starting cash. 
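<\/p>
<p>Geometric Brownian motion price paths like these can be simulated in a few lines. The sketch below is illustrative Python with hypothetical parameter values; the demo itself simulates the stocks in MATLAB.<\/p>

```python
import math
import random

def simulate_gbm(s0=100.0, mu=0.05, sigma=0.2, n_days=252, seed=42):
    """Simulate one geometric Brownian motion price path with daily steps."""
    rng = random.Random(seed)
    dt = 1.0 / 252.0  # one trading day, in years
    prices = [s0]
    for _ in range(n_days):
        z = rng.gauss(0.0, 1.0)  # standard normal shock
        # Exact GBM update: S(t+dt) = S(t) * exp((mu - sigma^2/2)*dt + sigma*sqrt(dt)*Z)
        step = math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
        prices.append(prices[-1] * step)
    return prices
```

<p>Each path starts at the initial price and stays strictly positive, which is one reason GBM is a standard choice for simulated stock prices.<\/p>
<p>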
At each time step, each agent sees 19 different values:<\/p>\n<ul>\n<li>Stocks Owned (3)<\/li>\n<li>Price Difference from Purchase Price (3)<\/li>\n<li>Cash In Hand (1)<\/li>\n<li>Price change from yesterday (3)<\/li>\n<li>% Price change from 2 days ago (3)<\/li>\n<li>% Price change from 7 days ago (3)<\/li>\n<li>% Price change from average price of 7 days ago (3)<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p><u>Action Space:<\/u><\/p>\n<p>After receiving a new observation, the agent must make one of 27 possible actions: every combination of entirely buying, entirely selling, and entirely holding for all 3 stocks. For example, (buy, buy, buy), (buy, sell, hold), or (sell, sell, buy).<\/p>\n<p>&nbsp;<\/p>\n<p><u>Reward:<\/u><\/p>\n<p>There are two different reward functions at play in this example.<\/p>\n<p>Shared reward for all agents: give the agent +1 if it made a profit, or if it sold stocks while its indicators suggested a negative trajectory; give the agent -1 otherwise. This reward prioritizes making any profit, and can be thought of as \u201cget a reward if you make a profit and get a reward if you avoid losing money\u201d.<\/p>\n<p>Competitive reward for agents 1 and 2: give the agent +0.5 if it has made more of a profit than its competitor; give the agent -0.5 otherwise.<\/p>\n<p>Because agent 3 does not receive the competitive reward, stronger performance from agents 1 and 2 would indicate that competition aids learning.<\/p>\n<p>&nbsp;<\/p>\n<p><strong>Training:<\/strong><\/p>\n<p>Now that we have done the heavy lifting and defined our agents and environment, all we need to do is call the <a href=\"https:\/\/www.mathworks.com\/help\/reinforcement-learning\/ref\/rl.agent.rlqagent.train.html\">train<\/a> function, specifying 2,500 episodes of 2,597 steps each. The training results, as displayed in the Episode Manager, are shown below. 
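<\/p>
<p>As an aside, the 27-way action space and the two reward functions described above are simple to sketch. The following is illustrative Python with hypothetical names, not the MATLAB reward code from the demo.<\/p>

```python
from itertools import product

# Every joint action: one of buy / sell / hold for each of the 3 stocks -> 3^3 = 27.
ACTIONS = list(product(["buy", "sell", "hold"], repeat=3))

def shared_reward(profit, sold_on_downtrend):
    """+1 for making a profit, or for selling when indicators predicted a drop; -1 otherwise."""
    return 1.0 if profit > 0 or sold_on_downtrend else -1.0

def competitive_reward(my_profit, rival_profit):
    """Extra +0.5 for out-earning the competitor, -0.5 otherwise (agents 1 and 2 only)."""
    return 0.5 if my_profit > rival_profit else -0.5
```

<p>For example, ACTIONS contains tuples such as (\"buy\", \"sell\", \"hold\"), matching the combinations listed in the Action Space section.<\/p>
<p>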
Agents 1 and 2 compete against each other in every episode, and this back-and-forth is visible in the more jagged shape of their training curves.<\/p>\n<p>&nbsp;<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" width=\"1181\" height=\"703\" class=\"size-full wp-image-1031 aligncenter\" src=\"http:\/\/blogs.mathworks.com\/finance\/files\/2024\/05\/Figure-4.png\" alt=\"\" \/><\/p>\n<p>&nbsp;<\/p>\n<p><strong>Conclusion:<\/strong><\/p>\n<p>As shown in the results below, agents 1 and 2 outperform agent 3, suggesting that their competitive reward aids learning. Further, agents 1 and 2 grow their initial $20,000 by more than 1.5x on the test dataset. While the competing agents do not always outperform their solitary counterpart (agent 3), these results demonstrate the effectiveness of competitive agents in financial trading and the utility of the multiagent paradigm.<\/p>\n<p>In the future, this idea could be expanded to many other areas of finance; multiagent reinforcement learning is an active area of research, as its competitive nature lends itself well to the inherently competitive world of finance.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" width=\"844\" height=\"656\" class=\"size-full wp-image-1034 aligncenter\" src=\"http:\/\/blogs.mathworks.com\/finance\/files\/2024\/05\/Figure-5.png\" alt=\"\" \/><\/p>\n","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img src=\"https:\/\/blogs.mathworks.com\/finance\/files\/2024\/05\/Figure-5.png\" class=\"img-responsive attachment-post-thumbnail size-post-thumbnail wp-post-image\" alt=\"\" decoding=\"async\" loading=\"lazy\" \/><\/div>\n<p>\nThe following blog was written by\u00a0Adam Peters, Software Engineer at MathWorks.<br \/>\nDownload the code for this example from GitHub here<br \/>\nOverview:<br \/>\nFinancial trading optimization involves developing a&#8230; <a class=\"read-more\" 
href=\"https:\/\/blogs.mathworks.com\/finance\/2024\/05\/17\/deep-learning-in-quantitative-finance-multiagent-reinforcement-learning-for-financial-trading\/\">read more >><\/a><\/p>\n","protected":false},"author":201,"featured_media":1034,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[10,22,4,43],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/finance\/wp-json\/wp\/v2\/posts\/1019"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/finance\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/finance\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/finance\/wp-json\/wp\/v2\/users\/201"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/finance\/wp-json\/wp\/v2\/comments?post=1019"}],"version-history":[{"count":7,"href":"https:\/\/blogs.mathworks.com\/finance\/wp-json\/wp\/v2\/posts\/1019\/revisions"}],"predecessor-version":[{"id":1058,"href":"https:\/\/blogs.mathworks.com\/finance\/wp-json\/wp\/v2\/posts\/1019\/revisions\/1058"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/finance\/wp-json\/wp\/v2\/media\/1034"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/finance\/wp-json\/wp\/v2\/media?parent=1019"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/finance\/wp-json\/wp\/v2\/categories?post=1019"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/finance\/wp-json\/wp\/v2\/tags?post=1019"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}