# AMS-520-Project

Combining (deep) reinforcement learning and goals-based investing in QWIM, using FinRL.

## Reinforcement Learning algorithms

Reinforcement learning is a machine learning paradigm that rewards desirable actions and penalizes undesirable ones. In general, a reinforcement learning agent senses its environment, takes actions, and learns through trial and error.
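As a minimal sketch of this loop (using a Gymnasium-style interface and a placeholder task, not the project's trading environment):

```python
import gymnasium as gym

# Generic agent-environment loop: observe a state, act, receive a reward,
# and (in a real agent) use that feedback to improve the policy.
env = gym.make("CartPole-v1")  # placeholder task for illustration
state, _ = env.reset(seed=0)

for _ in range(1_000):
    action = env.action_space.sample()  # a trained agent would query its policy
    state, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        state, _ = env.reset()
env.close()
```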

The RL algorithms in FinRL are based on openai/baselines. The one used in this project is:

### DDPG

Deep Deterministic Policy Gradient (DDPG) is an algorithm that concurrently learns a Q-function and a policy. It uses off-policy data and the Bellman equation to learn the Q-function, and uses the Q-function to learn the policy. DDPG has an actor-critic architecture and maintains four networks: a Q network $\theta^Q$, a deterministic policy network $\theta^\mu$, a target Q network $\theta^{Q'}$, and a target policy network $\theta^{\mu'}$.

The Q network and policy network are similar to those in basic Advantage Actor-Critic, except that in DDPG the actor maps states directly to actions (the network's output is the action itself) rather than outputting a probability distribution over a discrete action space. The target networks are time-delayed duplicates of the original networks that slowly track the learnt networks. Using these target networks considerably improves learning stability: in approaches without target networks, the update targets for a network depend on the values produced by the network itself, which makes training prone to divergence.
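The slow tracking is typically implemented as a Polyak (soft) update of the target parameters after each gradient step. A minimal PyTorch sketch, where the network shape and `TAU` value are illustrative rather than FinRL's actual code:

```python
import copy
import torch
import torch.nn as nn

# Illustrative actor: maps an 8-dim state directly to a 2-dim continuous action.
actor = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2), nn.Tanh())
target_actor = copy.deepcopy(actor)  # time-delayed duplicate of the learnt network

TAU = 0.005  # small tau => the target only slowly tracks the learnt network

@torch.no_grad()
def soft_update(net: nn.Module, target: nn.Module, tau: float = TAU) -> None:
    # theta_target <- tau * theta + (1 - tau) * theta_target
    for p, p_targ in zip(net.parameters(), target.parameters()):
        p_targ.mul_(1.0 - tau).add_(tau * p)

soft_update(actor, target_actor)  # called once per DDPG gradient step
```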

## Feature engineering

Technical indicators are heuristic or pattern-based signals derived from a security's or contract's price, volume, and/or open interest, and are employed by traders who apply technical analysis. Technical analysts use indicators to forecast future price movements by evaluating historical data.

Most of them are momentum indicators. A momentum indicator (oscillator) is a technical indicator that compares current and previous values to show trend direction and measure the rate of price change. It is one of the main tools for gauging how quickly a security's price is moving, and it is usually used in conjunction with other indicators.

The technical indicators used in this project are MACD, BOLL, RSI, CCI, and ADX, defined as follows (a pandas sketch of the simpler ones appears after the list):

- $\mathrm{MACD} = \mathrm{EMA}(TP, 12) - \mathrm{EMA}(TP, 26)$
- $\mathrm{BOLL_U} = \mathrm{MA}(TP, 20) + 2 \cdot \mathrm{stdDev}(TP, 20)$
- $\mathrm{BOLL_D} = \mathrm{MA}(TP, 20) - 2 \cdot \mathrm{stdDev}(TP, 20)$
- $\mathrm{RSI} = 100 - \dfrac{100}{1 + \text{avg. gain}/\text{avg. loss}}$
- $\mathrm{CCI} = \dfrac{TP - \mathrm{SMA}(TP, 20)}{0.015 \times \text{Mean Deviation}(TP, 20)}$
- $\mathrm{ADX} = \dfrac{|{+}DI - ({-}DI)|}{|{+}DI + ({-}DI)|} \times 100$
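As an illustration, the simpler indicators can be computed directly with pandas over an OHLC DataFrame; the column names and window lengths below are assumptions (ADX is omitted for brevity, and FinRL computes these via its own feature-engineering utilities):

```python
import pandas as pd

def add_indicators(df: pd.DataFrame) -> pd.DataFrame:
    # df is assumed to have 'high', 'low' and 'close' columns.
    tp = (df["high"] + df["low"] + df["close"]) / 3  # typical price

    # MACD: fast EMA minus slow EMA.
    df["macd"] = (tp.ewm(span=12, adjust=False).mean()
                  - tp.ewm(span=26, adjust=False).mean())

    # Bollinger bands: 20-period mean +/- 2 standard deviations.
    ma20, sd20 = tp.rolling(20).mean(), tp.rolling(20).std()
    df["boll_ub"] = ma20 + 2 * sd20
    df["boll_lb"] = ma20 - 2 * sd20

    # RSI(14): based on the ratio of average gains to average losses.
    delta = df["close"].diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    df["rsi"] = 100 - 100 / (1 + gain / loss)

    # CCI(20): deviation of TP from its mean, scaled by mean absolute deviation.
    mad = tp.rolling(20).apply(lambda x: (x - x.mean()).abs().mean(), raw=False)
    df["cci"] = (tp - ma20) / (0.015 * mad)
    return df
```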

## Data Description

The data source is yfinance, which is based on data from Yahoo Finance. The asset universe is the 30 Dow Jones stocks. The training set ranges from 2017-01-01 to 2021-01-01, and the trading set ranges from 2021-01-04 to 2021-11-02. The algorithm stops updating after the training period.
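A minimal sketch of the data pull and split with yfinance (only two representative tickers shown; the project uses all 30 Dow constituents):

```python
import yfinance as yf

TICKERS = ["AAPL", "MSFT"]  # representative subset of the Dow 30

# Daily bars covering both the training and the trading periods.
data = yf.download(TICKERS, start="2017-01-01", end="2021-11-03",
                   group_by="ticker")

# The agent is trained on the first window only; its weights are then
# frozen before trading on the second window.
train = data.loc["2017-01-01":"2021-01-01"]
trade = data.loc["2021-01-04":"2021-11-02"]
```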

## Environment parameters

Under "StockPortfolioEnv", the hmax defines maximum number of shares to trade. "initial_amount" defines start money in training. "transaction_cost_pct" defines transaction cost percentage per trade. ["hmax","initial_amount","transaction_cost_pct"]

## RL parameters

Under "MODEL_PARAMS" in each branch, each RL parameter can be set differently. For example in DDPG, parameters are: ["batch_size","buffer_size","learning_rate","total_timesteps"]

## Backtesting

We use a 60-day lookback window over the four-year period. We use QuantStats for the performance report, which compares the strategy to the minimum-variance (minVar) portfolio of the asset universe.
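A minimal QuantStats sketch, assuming `daily_returns` (the strategy's daily returns over the trading window) and `minvar_returns` (the minVar benchmark's daily returns) have been computed upstream:

```python
import quantstats as qs

# Generates an HTML tear sheet comparing the strategy to the minVar benchmark.
qs.reports.html(daily_returns, benchmark=minvar_returns,
                title="DDPG vs. minVar", output="ddpg_vs_minvar.html")
```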