I've managed to run the code from chapter 8 successfully, and `update_q` seems to be producing Q-values for states.
Now I want to run the simulation 100 times and then make predictions on the same (or other) prices using what the policy has learnt.
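Here is a minimal sketch of that train-then-predict workflow. It uses a tabular stand-in for the chapter's TensorFlow network, and `TabularQPolicy`, `toy_reward`, `run_simulation`, the "up"/"down" states and the reward rule are all hypothetical, not the book's code. The sketch shows two things. First, all 100 runs update one policy object. Second, predictions query that same object greedily, with no exploration. A newly constructed policy would start from untrained values.

```python
import random


class TabularQPolicy:
    """Hypothetical tabular stand-in for the chapter's TensorFlow-backed policy."""

    def __init__(self, actions, alpha=0.1, gamma=0.0, epsilon=0.1, seed=0):
        self.actions = actions
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount; 0 keeps this toy a one-step problem
        self.epsilon = epsilon  # exploration rate, used only while training
        self.q = {}             # state -> list of Q-values, one per action
        self.rng = random.Random(seed)

    def q_values(self, state):
        # Unseen states start with all-zero Q-values
        return self.q.setdefault(state, [0.0] * len(self.actions))

    def predict_index(self, state):
        # Greedy query: no exploration; ties go to the lowest index (like np.argmax)
        qs = self.q_values(state)
        return max(range(len(qs)), key=qs.__getitem__)

    def select_action(self, state):
        # Epsilon-greedy choice used during training
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.actions))
        return self.predict_index(state)

    def update_q(self, state, action_idx, reward, next_state):
        # Standard Q-learning update toward reward + gamma * max Q(next_state)
        qs = self.q_values(state)
        target = reward + self.gamma * max(self.q_values(next_state))
        qs[action_idx] += self.alpha * (target - qs[action_idx])


def toy_reward(state, action):
    # Hypothetical rule: BUY pays on "up", SELL pays on "down", HOLD earns nothing
    if (state, action) in {("up", "BUY"), ("down", "SELL")}:
        return 1.0
    return 0.0 if action == "HOLD" else -1.0


def run_simulation(policy, steps=50):
    state = policy.rng.choice(["up", "down"])
    for _ in range(steps):
        a = policy.select_action(state)
        r = toy_reward(state, policy.actions[a])
        next_state = policy.rng.choice(["up", "down"])
        policy.update_q(state, a, r, next_state)
        state = next_state


policy = TabularQPolicy(["BUY", "SELL", "HOLD"])
for _ in range(100):  # 100 training runs, all updating the SAME policy object
    run_simulation(policy)

# Query the trained object greedily
print(policy.actions[policy.predict_index("up")])    # BUY
print(policy.actions[policy.predict_index("down")])  # SELL
```

With a neural network instead of a table, there is no per-state storage. Querying a state means running a forward pass through the trained weights, so those weights must belong to the same object (and session) that was trained.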
I tried adding the following method to `QDecisionPolicy`:
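```python
# Method added to QDecisionPolicy (assumes `import numpy as np` at module level)
def predict(self, state):
    # Forward pass only: evaluate the Q-network output for this state
    action_q_vals = self.sess.run(self.q, feed_dict={self.x: state})
    # Greedy choice: index of the largest Q-value
    action_idx = np.argmax(action_q_vals)
    action = self.actions[action_idx]
    print('Action {}, Q {}, STATE :{}'.format(action, action_q_vals, state))
    return action
```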
This always prints Action 'HOLD', Q [[0. 0. 0.]] for any state I pass in, even though I've printed the same values inside `update_q` and confirmed that the Q-values for the state I'm passing to `predict` are being updated to non-zero values.
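An all-zero Q vector always yields the same action because `np.argmax` returns the first index among tied maxima. So `[[0. 0. 0.]]` maps to whatever action sits at index 0 of `self.actions`. A trained network can still print all zeros if its output layer passes through a ReLU, because any negative pre-activation is clamped to 0. Whether the chapter 8 network does this is an assumption; check that code before relying on it. The pure-Python sketch mimics both steps. `relu`, `greedy_index`, the action order and the pre-activation values are hypothetical.

```python
def relu(values):
    # Elementwise max(0, x), the same function tf.nn.relu applies
    return [max(0.0, v) for v in values]


def greedy_index(q_values):
    # Mirrors np.argmax on a 1-D list: ties resolve to the FIRST maximal index
    return max(range(len(q_values)), key=q_values.__getitem__)


actions = ["BUY", "SELL", "HOLD"]          # hypothetical order
pre_activation = [-0.7, -2.3, -0.1]        # hypothetical final-layer outputs before ReLU

q = relu(pre_activation)
print(q)                                       # [0.0, 0.0, 0.0]
print(actions[greedy_index(q)])                # BUY (index 0 wins the tie)
print(actions[greedy_index([0.2, 0.5, 0.5])])  # SELL (first of the tied maxima)
```

Q-values can legitimately be negative, so Q-networks usually use a linear (identity) output layer rather than a ReLU.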
How can I query the trained policy? Or is there some other mechanism I should be using to make predictions with the learnt policy?
Thanks