The forex and synthetic indices markets are marked by constant price swings between highs and lows. Fundamental and technical analysis both attempt to determine the true state of price at each point in a time frame, and this has led many financial experts to look beyond the market itself when offering advice to traders around the world.
Many will argue that indicators such as Fibonacci Retracement, Moving Average, Stochastic Oscillator, Relative Strength Index (RSI) and Bollinger Bands, which are essentially built on mathematical functions, are better suited to understanding price. Another group of experts prefers to study the price chart over a long period across the Higher Time Frame (HTF) and the Lower Time Frame (LTF), an approach they call price action.
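To make the indicator idea concrete, the sketch below computes two of the indicators named above, a simple moving average and the Relative Strength Index, as candidate numeric features. It is an illustration only, assuming pandas and a series of closing prices; it is not part of the project's code.

import pandas as pd

def simple_moving_average(close: pd.Series, window: int = 20) -> pd.Series:
    # Rolling mean of the closing price over the last `window` candles.
    return close.rolling(window).mean()

def relative_strength_index(close: pd.Series, period: int = 14) -> pd.Series:
    # RSI from the ratio of average gains to average losses over `period`
    # candles (simple-average variant).
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(period).mean()
    loss = (-delta.clip(upper=0)).rolling(period).mean()
    rs = gain / loss
    return 100 - 100 / (1 + rs)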
This algorithm intends to merge both indicator-based and price-action understanding with deep learning, in order to study price and help forecast it over short and long periods of time.
Our architecture involves building a feedforward neural network, trained by backpropagation, that acts as a Q-network. The network consists of 3 hidden layers of 20 ReLU neurons each, followed by an output layer of 3 linear neurons, and it is trained entirely inside a simulated financial market.
The sizes of the hidden layers were set experimentally, while the three linear output neurons are inherent to the system design: each represents the Q-value of a given action.
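The training script further below uses an LSTM-based model, but purely as an illustration of the feedforward Q-network described here, a minimal Keras sketch could look as follows (the layer sizes are taken from the text above; everything else is assumed):

from keras.models import Sequential
from keras.layers import Dense

def build_q_network(state_dim, nb_actions=3):
    # Three hidden layers of 20 ReLU neurons each, followed by an output
    # layer of linear neurons, one Q-value estimate per action.
    model = Sequential()
    model.add(Dense(20, activation='relu', input_shape=(state_dim,)))
    model.add(Dense(20, activation='relu'))
    model.add(Dense(20, activation='relu'))
    model.add(Dense(nb_actions, activation='linear'))
    return model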
Our network interacts with a simulated market environment in discrete steps t = 0, 1, 2, ..., receiving a state vector S_t as input at each step. After a forward pass, each of the three linear neurons outputs the Q-network's current estimate Q(S_t, a; W_k) of the value of one of the three possible actions a, where W_k denotes the set of network weights after k updates.
The estimates are fed to an ε-greedy action selection method, which selects the action for step t either greedily, as the action with the highest estimated Q-value (with probability 1 − ε), or uniformly at random (with probability ε).
An external constraint forces the agent to invest position_size of the chosen asset at a time (a value set by the user), leaving it with five actions: open long position, open short position, close long position, close short position, and do nothing.
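A minimal sketch of this ε-greedy selection step is shown below; the action names and the value of ε are illustrative assumptions, not taken from the project's code.

import numpy as np

ACTIONS = ['open_long', 'open_short', 'close_long', 'close_short', 'do_nothing']

def epsilon_greedy(q_values, epsilon=0.1):
    # With probability epsilon explore with a random action,
    # otherwise exploit the action with the highest estimated Q-value.
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_values))
    return int(np.argmax(q_values))

# Example: action_index = epsilon_greedy(q_estimates_for_state_St)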
The selected action A_t is then passed to the simulated market environment, whose role is to provide an accurate simulation of the foreign exchange market and to coordinate the flow of information reaching the system so that it follows the reinforcement learning paradigm.
Each state produced by the environment includes the following information:
- Type of current open position;
- Value of any open position given the simulated market's current prices Bid_i and Ask_i, where i indexes the entries in the dataset used by the market;
- Current size of the trading account;
- Feature vector F_i, created from the market data entries by a preprocessing stage inspired by the technical-analysis approach (a sketch of how such a state could be assembled follows this list).
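As a rough sketch of how such a state could be assembled (the field names, position encoding and valuation formula here are assumptions for illustration; the OhlcvEnv environment used later builds its own observation):

import numpy as np

def build_state(position_type, bid, ask, entry_price, position_size,
                account_size, features):
    # position_type: 0 = none, 1 = long, -1 = short (assumed encoding).
    # Value the open position at the current bid/ask prices.
    if position_type == 1:
        open_value = (bid - entry_price) * position_size
    elif position_type == -1:
        open_value = (entry_price - ask) * position_size
    else:
        open_value = 0.0
    # Concatenate position info, account size and the technical feature vector F_i.
    return np.concatenate(([position_type, open_value, account_size], features))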
As for the reward signal fed back to the network for backpropagation, each action is rewarded as follows (a sketch follows the list):
- Opening a position is rewarded by the unrealized profit it creates;
- Keeping a position open is rewarded by the fluctuation of the position's unrealized profit;
- Closing a position is rewarded with the attained profit;
- Doing nothing receives zero reward.
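A minimal sketch of these reward rules (the function and variable names are assumptions for illustration, not the project's actual implementation):

def reward_for(action, has_open_position, unrealized_pnl, prev_unrealized_pnl, realized_pnl):
    if action in ('open_long', 'open_short'):
        # Opening a position is rewarded by the unrealized profit it creates.
        return unrealized_pnl
    if action in ('close_long', 'close_short'):
        # Closing a position is rewarded with the attained (realized) profit.
        return realized_pnl
    if has_open_position:
        # Keeping a position open is rewarded by the fluctuation of its unrealized profit.
        return unrealized_pnl - prev_unrealized_pnl
    # Doing nothing with no open position receives zero reward.
    return 0.0

The full training setup, a dueling DQN agent from keras-rl interacting with the OhlcvEnv environment, follows below.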
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation
from keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import EpsGreedyQPolicy
# project-local modules: the OHLCV trading environment and the observation
# normalizer used by the agent below (import paths assumed)
from TraderEnv import OhlcvEnv
from util import NormalizerProcessor

ENV_NAME = 'OHLCV-v0'
TIME_STEP = 30
# data paths
TRAIN_PATH = "./data/train"
TEST_PATH = "./data/test"
env_train = OhlcvEnv(TIME_STEP, path=TRAIN_PATH)
env_test = OhlcvEnv(TIME_STEP, path=TEST_PATH)
np.random.seed(456)
env_train.seed(562)
# create the model (create_model is defined at the end of this snippet)
nb_actions = env_train.action_space.n
model = create_model(shape=env_train.shape, nb_actions=nb_actions)
print(model.summary())
# finally, we configure and compile our agent
memory = SequentialMemory(limit=50000, window_length=TIME_STEP)
policy = EpsGreedyQPolicy()
# enable the dueling network architecture
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, nb_steps_warmup=200,
               enable_dueling_network=True, dueling_type='avg', target_model_update=1e-2, policy=policy,
               processor=NormalizerProcessor())
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
while True:
    # train
    dqn.fit(env_train, nb_steps=5500, nb_max_episode_steps=10000, visualize=False, verbose=2)
    try:
        # validate on the held-out test environment
        info = dqn.test(env_test, nb_episodes=1, visualize=False)
        n_long, n_short, total_reward, portfolio = (info['n_trades']['long'], info['n_trades']['short'],
                                                    info['total_reward'], int(info['portfolio']))
        # save the validation info and the model weights, tagged with the validation results
        np.array([info]).dump(
            './info/duel_dqn_{0}_weights_{1}LS_{2}_{3}_{4}.info'.format(ENV_NAME, portfolio, n_long, n_short,
                                                                        total_reward))
        dqn.save_weights(
            './model/duel_dqn_{0}_weights_{1}LS_{2}_{3}_{4}.h5f'.format(ENV_NAME, portfolio, n_long, n_short,
                                                                        total_reward),
            overwrite=True)
    except KeyboardInterrupt:
        continue
def create_model(shape, nb_actions):
    # Recurrent Q-network: two stacked LSTMs over the TIME_STEP window of OHLCV data,
    # a dense hidden layer, and one linear Q-value output per action.
    model = Sequential()
    model.add(LSTM(64, input_shape=shape, return_sequences=True))
    model.add(LSTM(64))
    model.add(Dense(32))
    model.add(Activation('relu'))
    model.add(Dense(nb_actions, activation='linear'))
    return model
(Guide - RL) Reinforcement Q-Learning from Scratch in Python with OpenAI Gym: https://www.learndatasci.com/tutorials/reinforcement-q-learning-scratch-python-openai-gym/
Train a Deep Q Network with TF-Agents: https://www.tensorflow.org/agents/tutorials/1_dqn_tutorial
Deep Reinforcement Learning for Automated Stock Trading: https://towardsdatascience.com/deep-reinforcement-learning-for-automated-stock-trading-f1dad0126a02
Trading Environment (OpenAI Gym) + DDQN (Keras-RL): https://github.com/miroblog/deep_rl_trader
[Mofan Python] Reinforcement Learning (Bilibili, in Chinese): https://www.bilibili.com/video/BV13W411Y75P?from=search&seid=13844167983297755236
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.