The forex and synthetic indices markets are marked by constant price swings between highs and lows. Fundamental and technical analysis both attempt to determine the true state of price at each point in a time frame, and this has led many financial experts to look beyond the market itself when offering advice to traders around the world.
Many will argue that indicators such as Fibonacci Retracement, Moving Average, Stochastic Oscillator, Relative Strength Index (RSI) and Bollinger Bands, which are essentially built on mathematical functions, are better suited to understanding price. Another group of experts prefers to study the price chart over a long period across the Higher Time Frame (HTF) and the Lower Time Frame (LTF), an approach they call price action.
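To make the indicator idea concrete, the sketch below computes two of the indicators named above, a simple moving average and the Relative Strength Index, as candidate numeric features. It is an illustration only, assuming pandas and a series of closing prices; it is not part of the project's code.

import pandas as pd

def simple_moving_average(close: pd.Series, window: int = 20) -> pd.Series:
    # Rolling mean of the closing price over the last `window` candles.
    return close.rolling(window).mean()

def relative_strength_index(close: pd.Series, period: int = 14) -> pd.Series:
    # RSI from the ratio of average gains to average losses over `period`
    # candles (simple-average variant).
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(period).mean()
    loss = (-delta.clip(upper=0)).rolling(period).mean()
    rs = gain / loss
    return 100 - 100 / (1 + rs)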
This algorithm intends to merge both indicator-based and price-action understanding with deep learning, in order to study price and help forecast it over short and long periods of time.
Our architecture involves building a feedforward neural network, trained by backpropagation, that acts as a Q-network. The network consists of 3 hidden layers of 20 ReLU neurons each, followed by an output layer of 3 linear neurons, and it is trained entirely inside a simulated financial market.
The sizes of the hidden layers were set experimentally, while the three linear output neurons are inherent to the system design: each represents the Q-value of a given action.
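The training script further below uses an LSTM-based model, but purely as an illustration of the feedforward Q-network described here, a minimal Keras sketch could look as follows (the layer sizes are taken from the text above; everything else is assumed):

from keras.models import Sequential
from keras.layers import Dense

def build_q_network(state_dim, nb_actions=3):
    # Three hidden layers of 20 ReLU neurons each, followed by an output
    # layer of linear neurons, one Q-value estimate per action.
    model = Sequential()
    model.add(Dense(20, activation='relu', input_shape=(state_dim,)))
    model.add(Dense(20, activation='relu'))
    model.add(Dense(20, activation='relu'))
    model.add(Dense(nb_actions, activation='linear'))
    return model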
Our network interacts with a simulated market environment in discrete steps t = 0, 1, 2, ..., receiving a state vector S_t as input at each step. After a forward pass, each of the three linear neurons outputs the Q-network's current estimate Q(S_t, a; W_k) of the value of one of the three possible actions a, where W_k denotes the set of network weights after k updates.
The estimates are fed to an ε-greedy action selection method, which selects the action for step t either greedily, as the action with the highest estimated Q-value (with probability 1 − ε), or uniformly at random (with probability ε).
An external constraint forces the agent to invest position_size of the chosen asset at a time (a value set by the user), leaving it with five actions: open long position, open short position, close long position, close short position, and do nothing.
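A minimal sketch of this ε-greedy selection step is shown below; the action names and the value of ε are illustrative assumptions, not taken from the project's code.

import numpy as np

ACTIONS = ['open_long', 'open_short', 'close_long', 'close_short', 'do_nothing']

def epsilon_greedy(q_values, epsilon=0.1):
    # With probability epsilon explore with a random action,
    # otherwise exploit the action with the highest estimated Q-value.
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_values))
    return int(np.argmax(q_values))

# Example: action_index = epsilon_greedy(q_estimates_for_state_St)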
The selected action A_t is then passed to the simulated market environment, whose role is to provide an accurate simulation of the foreign exchange market and to coordinate the flow of information reaching the system so that it follows the reinforcement learning paradigm.
Each state produced by the environment includes the following information:
- Type of current open position;
- Value of any open position given the simulated market's current prices Bid_i and Ask_i, where i indexes the entries in the dataset used by the market;
- Current size of the trading account;
- Feature vector F_i, created from the market data entries by a preprocessing stage inspired by the technical-analysis approach (a sketch of how such a state could be assembled follows this list).
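As a rough sketch of how such a state could be assembled (the field names, position encoding and valuation formula here are assumptions for illustration; the OhlcvEnv environment used later builds its own observation):

import numpy as np

def build_state(position_type, bid, ask, entry_price, position_size,
                account_size, features):
    # position_type: 0 = none, 1 = long, -1 = short (assumed encoding).
    # Value the open position at the current bid/ask prices.
    if position_type == 1:
        open_value = (bid - entry_price) * position_size
    elif position_type == -1:
        open_value = (entry_price - ask) * position_size
    else:
        open_value = 0.0
    # Concatenate position info, account size and the technical feature vector F_i.
    return np.concatenate(([position_type, open_value, account_size], features))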
As for the reward signal fed back to the network for backpropagation, each action is rewarded as follows (a sketch follows the list):
- Opening a position is rewarded by the unrealized profit it creates;
- Keeping a position open is rewarded by the fluctuation of the position's unrealized profit;
- Closing a position is rewarded with the attained profit;
- Doing nothing receives zero reward.
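A minimal sketch of these reward rules (the function and variable names are assumptions for illustration, not the project's actual implementation):

def reward_for(action, has_open_position, unrealized_pnl, prev_unrealized_pnl, realized_pnl):
    if action in ('open_long', 'open_short'):
        # Opening a position is rewarded by the unrealized profit it creates.
        return unrealized_pnl
    if action in ('close_long', 'close_short'):
        # Closing a position is rewarded with the attained (realized) profit.
        return realized_pnl
    if has_open_position:
        # Keeping a position open is rewarded by the fluctuation of its unrealized profit.
        return unrealized_pnl - prev_unrealized_pnl
    # Doing nothing with no open position receives zero reward.
    return 0.0

The full training setup, a dueling DQN agent from keras-rl interacting with the OhlcvEnv environment, follows below.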
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation
from keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import EpsGreedyQPolicy
# project-local modules: the OHLCV trading environment and the observation
# normalizer used by the agent below (import paths assumed)
from TraderEnv import OhlcvEnv
from util import NormalizerProcessor

ENV_NAME = 'OHLCV-v0'
TIME_STEP = 30
# data paths
TRAIN_PATH = "./data/train"
TEST_PATH = "./data/test"
env_train = OhlcvEnv(TIME_STEP, path=TRAIN_PATH)
env_test = OhlcvEnv(TIME_STEP, path=TEST_PATH)
np.random.seed(456)
env_train.seed(562)
# create the model (create_model is defined at the end of this snippet)
nb_actions = env_train.action_space.n
model = create_model(shape=env_train.shape, nb_actions=nb_actions)
print(model.summary())
# finally, we configure and compile our agent
memory = SequentialMemory(limit=50000, window_length=TIME_STEP)
policy = EpsGreedyQPolicy()
# enable the dueling network architecture
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, nb_steps_warmup=200,
               enable_dueling_network=True, dueling_type='avg', target_model_update=1e-2, policy=policy,
               processor=NormalizerProcessor())
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
while True:
    # train
    dqn.fit(env_train, nb_steps=5500, nb_max_episode_steps=10000, visualize=False, verbose=2)
    try:
        # validate on the held-out test environment
        info = dqn.test(env_test, nb_episodes=1, visualize=False)
        n_long, n_short, total_reward, portfolio = (info['n_trades']['long'], info['n_trades']['short'],
                                                    info['total_reward'], int(info['portfolio']))
        # save the validation info and the model weights, tagged with the validation results
        np.array([info]).dump(
            './info/duel_dqn_{0}_weights_{1}LS_{2}_{3}_{4}.info'.format(ENV_NAME, portfolio, n_long, n_short,
                                                                        total_reward))
        dqn.save_weights(
            './model/duel_dqn_{0}_weights_{1}LS_{2}_{3}_{4}.h5f'.format(ENV_NAME, portfolio, n_long, n_short,
                                                                        total_reward),
            overwrite=True)
    except KeyboardInterrupt:
        continue
def create_model(shape, nb_actions):
    # Recurrent Q-network: two stacked LSTMs over the TIME_STEP window of OHLCV data,
    # a dense hidden layer, and one linear Q-value output per action.
    model = Sequential()
    model.add(LSTM(64, input_shape=shape, return_sequences=True))
    model.add(LSTM(64))
    model.add(Dense(32))
    model.add(Activation('relu'))
    model.add(Dense(nb_actions, activation='linear'))
    return model
(Guide - RL) Reinforcement Q-Learning from Scratch in Python with OpenAI Gym: https://www.learndatasci.com/tutorials/reinforcement-q-learning-scratch-python-openai-gym/
Train a Deep Q Network with TF-Agents: https://www.tensorflow.org/agents/tutorials/1_dqn_tutorial
Deep Reinforcement Learning for Automated Stock Trading: https://towardsdatascience.com/deep-reinforcement-learning-for-automated-stock-trading-f1dad0126a02
Trading Environment (OpenAI Gym) + DDQN (Keras-RL): https://github.com/miroblog/deep_rl_trader
[Mofan Python] Reinforcement Learning (Bilibili, in Chinese): https://www.bilibili.com/video/BV13W411Y75P?from=search&seid=13844167983297755236
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.