Ch 8 predicting? #35

I've managed to run the code from chapter 8 successfully, and update_q seems to be producing Q values for states.

I now want to run the simulation 100 times and then predict on the same (or other) prices using the learnt knowledge.

I tried adding the following method to QDecisionPolicy:

```
    def predict(self, state):
        # Evaluate the Q network for this state (no exploration, no training step)
        action_q_vals = self.sess.run(self.q, feed_dict={self.x: state})
        # Pick the action with the highest Q value
        action_idx = np.argmax(action_q_vals)
        action = self.actions[action_idx]
        print('Action {}, Q {}, STATE :{}'.format(action, action_q_vals, state))
        return action
```

This always prints out Action 'HOLD', Q [[0. 0. 0.]] for any state given,
even though I've tested printing the same values in update_q and can see that the Q values for the state I'm passing to predict are being updated to non-zero values.

How can I query the policy, or is there some other mechanism that I should be using to predict using the learnt policy?
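
The workflow described above (train over many runs, then query greedily with no exploration and no further updates) can be sketched as follows. This is a minimal NumPy stand-in, not the chapter's TensorFlow code: `TinyQPolicy`, its linear Q function, the zero initialisation, and the toy reward are all hypothetical and exist only to show the train-then-query pattern on a single policy object.

```python
import numpy as np


class TinyQPolicy:
    """Hypothetical stand-in for QDecisionPolicy: linear Q(s) = s @ W."""

    def __init__(self, actions, input_dim, lr=0.01, gamma=0.9):
        self.actions = actions
        self.lr = lr
        self.gamma = gamma
        # Assumption of this sketch: weights start at zero, so an
        # untrained policy returns all-zero Q values.
        self.W = np.zeros((input_dim, len(actions)))

    def q_values(self, state):
        return np.asarray(state, dtype=float) @ self.W

    def update_q(self, state, action, reward, next_state):
        # One semi-gradient step toward r + gamma * max_a' Q(s', a').
        idx = self.actions.index(action)
        target = reward + self.gamma * np.max(self.q_values(next_state))
        td_error = target - self.q_values(state)[idx]
        self.W[:, idx] += self.lr * td_error * np.asarray(state, dtype=float)

    def predict(self, state):
        # Greedy query: argmax over Q values, no exploration, no learning.
        return self.actions[int(np.argmax(self.q_values(state)))]


actions = ['Buy', 'Sell', 'Hold']
policy = TinyQPolicy(actions, input_dim=3)
rng = np.random.default_rng(1)

for _ in range(100):          # 100 simulated runs
    for _ in range(20):       # 20 toy transitions per run
        state = rng.random(3)
        next_state = rng.random(3)
        action = actions[rng.integers(len(actions))]
        reward = 1.0 if action == 'Buy' else 0.0  # toy reward, not real trading
        policy.update_q(state, action, reward, next_state)

probe = np.array([0.5, 0.5, 0.5])
trained_q = policy.q_values(probe)                         # queried on the trained object
fresh_q = TinyQPolicy(actions, input_dim=3).q_values(probe)  # a new, untrained object
print(policy.predict(probe), trained_q, fresh_q)
```

The point of the sketch is only that `predict` reads from the same object (and the same weights) that `update_q` wrote to. The untrained policy returns zeros here purely because the stand-in initialises its weights to zero; that is a property of this sketch, not a diagnosis of the chapter's network.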

Thanks
