The action space is Discrete(4), and the state space is Discrete(16)
Our initial state is: 0
SFFF
FHFH
FFFH
HFFG
Taking an action: 2
The state we arrive at is 0
The reward we received is 0.0
Are we done? False
(Right)
SFFF
FHFH
FFFH
HFFG
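Taking action 2 (Right) from state 0 left us in state 0 because the lake is slippery: the listing for state 0, action 2 gives three equally likely next states (4, 1, and 0). The following is a minimal sketch of sampling one such step, not the environment's own code; `OUTCOMES` is copied by hand from that listing, and `sample_step` is a hypothetical helper introduced only for illustration.

```python
import random

# Outcomes listed for state 0, action 2: (prob, next_state, reward, done).
# Copied by hand from the transition listing; not read from the environment.
OUTCOMES = [
    (1 / 3, 4, 0.0, False),
    (1 / 3, 1, 0.0, False),
    (1 / 3, 0, 0.0, False),
]


def sample_step(outcomes, rng):
    """Draw one (next_state, reward, done) according to the listed probabilities."""
    weights = [prob for prob, _, _, _ in outcomes]
    _, next_state, reward, done = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward, done


# Repeat the same action many times and count where we end up.
rng = random.Random(0)
counts = {}
for _ in range(3000):
    next_state, _, _ = sample_step(OUTCOMES, rng)
    counts[next_state] = counts.get(next_state, 0) + 1

print(counts)
```

Each of the three next states should come up roughly a third of the time, so staying in state 0 after moving Right is an ordinary outcome.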
Probability Transition Function = env.P = dict(state: dict(action: [(prob, next_state, reward, done), ...]))
In state = 0
If we take action 0,
We have a probability = 0.3333333333333333 of the next state being 0, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 0, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 4, receiving reward = 0.0, and done = False
If we take action 1,
We have a probability = 0.3333333333333333 of the next state being 0, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 4, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 1, receiving reward = 0.0, and done = False
If we take action 2,
We have a probability = 0.3333333333333333 of the next state being 4, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 1, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 0, receiving reward = 0.0, and done = False
If we take action 3,
We have a probability = 0.3333333333333333 of the next state being 1, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 0, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 0, receiving reward = 0.0, and done = False
In state = 1
If we take action 0,
We have a probability = 0.3333333333333333 of the next state being 1, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 0, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 5, receiving reward = 0.0, and done = True
If we take action 1,
We have a probability = 0.3333333333333333 of the next state being 0, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 5, receiving reward = 0.0, and done = True
We have a probability = 0.3333333333333333 of the next state being 2, receiving reward = 0.0, and done = False
If we take action 2,
We have a probability = 0.3333333333333333 of the next state being 5, receiving reward = 0.0, and done = True
We have a probability = 0.3333333333333333 of the next state being 2, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 1, receiving reward = 0.0, and done = False
If we take action 3,
We have a probability = 0.3333333333333333 of the next state being 2, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 1, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 0, receiving reward = 0.0, and done = False
In state = 2
If we take action 0,
We have a probability = 0.3333333333333333 of the next state being 2, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 1, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 6, receiving reward = 0.0, and done = False
If we take action 1,
We have a probability = 0.3333333333333333 of the next state being 1, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 6, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 3, receiving reward = 0.0, and done = False
If we take action 2,
We have a probability = 0.3333333333333333 of the next state being 6, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 3, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 2, receiving reward = 0.0, and done = False
If we take action 3,
We have a probability = 0.3333333333333333 of the next state being 3, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 2, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 1, receiving reward = 0.0, and done = False
In state = 3
If we take action 0,
We have a probability = 0.3333333333333333 of the next state being 3, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 2, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 7, receiving reward = 0.0, and done = True
If we take action 1,
We have a probability = 0.3333333333333333 of the next state being 2, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 7, receiving reward = 0.0, and done = True
We have a probability = 0.3333333333333333 of the next state being 3, receiving reward = 0.0, and done = False
If we take action 2,
We have a probability = 0.3333333333333333 of the next state being 7, receiving reward = 0.0, and done = True
We have a probability = 0.3333333333333333 of the next state being 3, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 3, receiving reward = 0.0, and done = False
If we take action 3,
We have a probability = 0.3333333333333333 of the next state being 3, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 3, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 2, receiving reward = 0.0, and done = False
In state = 4
If we take action 0,
We have a probability = 0.3333333333333333 of the next state being 0, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 4, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 8, receiving reward = 0.0, and done = False
If we take action 1,
We have a probability = 0.3333333333333333 of the next state being 4, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 8, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 5, receiving reward = 0.0, and done = True
If we take action 2,
We have a probability = 0.3333333333333333 of the next state being 8, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 5, receiving reward = 0.0, and done = True
We have a probability = 0.3333333333333333 of the next state being 0, receiving reward = 0.0, and done = False
If we take action 3,
We have a probability = 0.3333333333333333 of the next state being 5, receiving reward = 0.0, and done = True
We have a probability = 0.3333333333333333 of the next state being 0, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 4, receiving reward = 0.0, and done = False
In state = 5
If we take action 0,
We have a probability = 1.0 of the next state being 5, receiving reward = 0, and done = True
If we take action 1,
We have a probability = 1.0 of the next state being 5, receiving reward = 0, and done = True
If we take action 2,
We have a probability = 1.0 of the next state being 5, receiving reward = 0, and done = True
If we take action 3,
We have a probability = 1.0 of the next state being 5, receiving reward = 0, and done = True
In state = 6
If we take action 0,
We have a probability = 0.3333333333333333 of the next state being 2, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 5, receiving reward = 0.0, and done = True
We have a probability = 0.3333333333333333 of the next state being 10, receiving reward = 0.0, and done = False
If we take action 1,
We have a probability = 0.3333333333333333 of the next state being 5, receiving reward = 0.0, and done = True
We have a probability = 0.3333333333333333 of the next state being 10, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 7, receiving reward = 0.0, and done = True
If we take action 2,
We have a probability = 0.3333333333333333 of the next state being 10, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 7, receiving reward = 0.0, and done = True
We have a probability = 0.3333333333333333 of the next state being 2, receiving reward = 0.0, and done = False
If we take action 3,
We have a probability = 0.3333333333333333 of the next state being 7, receiving reward = 0.0, and done = True
We have a probability = 0.3333333333333333 of the next state being 2, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 5, receiving reward = 0.0, and done = True
In state = 7
If we take action 0,
We have a probability = 1.0 of the next state being 7, receiving reward = 0, and done = True
If we take action 1,
We have a probability = 1.0 of the next state being 7, receiving reward = 0, and done = True
If we take action 2,
We have a probability = 1.0 of the next state being 7, receiving reward = 0, and done = True
If we take action 3,
We have a probability = 1.0 of the next state being 7, receiving reward = 0, and done = True
In state = 8
If we take action 0,
We have a probability = 0.3333333333333333 of the next state being 4, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 8, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 12, receiving reward = 0.0, and done = True
If we take action 1,
We have a probability = 0.3333333333333333 of the next state being 8, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 12, receiving reward = 0.0, and done = True
We have a probability = 0.3333333333333333 of the next state being 9, receiving reward = 0.0, and done = False
If we take action 2,
We have a probability = 0.3333333333333333 of the next state being 12, receiving reward = 0.0, and done = True
We have a probability = 0.3333333333333333 of the next state being 9, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 4, receiving reward = 0.0, and done = False
If we take action 3,
We have a probability = 0.3333333333333333 of the next state being 9, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 4, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 8, receiving reward = 0.0, and done = False
In state = 9
If we take action 0,
We have a probability = 0.3333333333333333 of the next state being 5, receiving reward = 0.0, and done = True
We have a probability = 0.3333333333333333 of the next state being 8, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 13, receiving reward = 0.0, and done = False
If we take action 1,
We have a probability = 0.3333333333333333 of the next state being 8, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 13, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 10, receiving reward = 0.0, and done = False
If we take action 2,
We have a probability = 0.3333333333333333 of the next state being 13, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 10, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 5, receiving reward = 0.0, and done = True
If we take action 3,
We have a probability = 0.3333333333333333 of the next state being 10, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 5, receiving reward = 0.0, and done = True
We have a probability = 0.3333333333333333 of the next state being 8, receiving reward = 0.0, and done = False
In state = 10
If we take action 0,
We have a probability = 0.3333333333333333 of the next state being 6, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 9, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 14, receiving reward = 0.0, and done = False
If we take action 1,
We have a probability = 0.3333333333333333 of the next state being 9, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 14, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 11, receiving reward = 0.0, and done = True
If we take action 2,
We have a probability = 0.3333333333333333 of the next state being 14, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 11, receiving reward = 0.0, and done = True
We have a probability = 0.3333333333333333 of the next state being 6, receiving reward = 0.0, and done = False
If we take action 3,
We have a probability = 0.3333333333333333 of the next state being 11, receiving reward = 0.0, and done = True
We have a probability = 0.3333333333333333 of the next state being 6, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 9, receiving reward = 0.0, and done = False
In state = 11
If we take action 0,
We have a probability = 1.0 of the next state being 11, receiving reward = 0, and done = True
If we take action 1,
We have a probability = 1.0 of the next state being 11, receiving reward = 0, and done = True
If we take action 2,
We have a probability = 1.0 of the next state being 11, receiving reward = 0, and done = True
If we take action 3,
We have a probability = 1.0 of the next state being 11, receiving reward = 0, and done = True
In state = 12
If we take action 0,
We have a probability = 1.0 of the next state being 12, receiving reward = 0, and done = True
If we take action 1,
We have a probability = 1.0 of the next state being 12, receiving reward = 0, and done = True
If we take action 2,
We have a probability = 1.0 of the next state being 12, receiving reward = 0, and done = True
If we take action 3,
We have a probability = 1.0 of the next state being 12, receiving reward = 0, and done = True
In state = 13
If we take action 0,
We have a probability = 0.3333333333333333 of the next state being 9, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 12, receiving reward = 0.0, and done = True
We have a probability = 0.3333333333333333 of the next state being 13, receiving reward = 0.0, and done = False
If we take action 1,
We have a probability = 0.3333333333333333 of the next state being 12, receiving reward = 0.0, and done = True
We have a probability = 0.3333333333333333 of the next state being 13, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 14, receiving reward = 0.0, and done = False
If we take action 2,
We have a probability = 0.3333333333333333 of the next state being 13, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 14, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 9, receiving reward = 0.0, and done = False
If we take action 3,
We have a probability = 0.3333333333333333 of the next state being 14, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 9, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 12, receiving reward = 0.0, and done = True
In state = 14
If we take action 0,
We have a probability = 0.3333333333333333 of the next state being 10, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 13, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 14, receiving reward = 0.0, and done = False
If we take action 1,
We have a probability = 0.3333333333333333 of the next state being 13, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 14, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 15, receiving reward = 1.0, and done = True
If we take action 2,
We have a probability = 0.3333333333333333 of the next state being 14, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 15, receiving reward = 1.0, and done = True
We have a probability = 0.3333333333333333 of the next state being 10, receiving reward = 0.0, and done = False
If we take action 3,
We have a probability = 0.3333333333333333 of the next state being 15, receiving reward = 1.0, and done = True
We have a probability = 0.3333333333333333 of the next state being 10, receiving reward = 0.0, and done = False
We have a probability = 0.3333333333333333 of the next state being 13, receiving reward = 0.0, and done = False
In state = 15
If we take action 0,
We have a probability = 1.0 of the next state being 15, receiving reward = 0, and done = True
If we take action 1,
We have a probability = 1.0 of the next state being 15, receiving reward = 0, and done = True
If we take action 2,
We have a probability = 1.0 of the next state being 15, receiving reward = 0, and done = True
If we take action 3,
We have a probability = 1.0 of the next state being 15, receiving reward = 0, and done = True
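The listing above follows a regular pattern that can be rebuilt from the 4x4 map alone. The sketch below is one way to do that in plain Python. It assumes the action encoding 0=Left, 1=Down, 2=Right, 3=Up (only "2 = Right" is confirmed by the rendering) and assumes that an action slips to either perpendicular direction with probability 1/3 each. `move` and `build_transitions` are hypothetical helpers, not part of the environment's API.

```python
from typing import Dict, List, Tuple

# Map copied from the rendering: S=start, F=frozen, H=hole, G=goal.
LAKE = ["SFFF", "FHFH", "FFFH", "HFFG"]
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3  # assumed encoding; only 2=Right is confirmed

Transition = Tuple[float, int, float, bool]  # (prob, next_state, reward, done)


def move(row: int, col: int, action: int, n_rows: int, n_cols: int) -> Tuple[int, int]:
    """Move one cell in the given direction, staying put at the edge of the map."""
    if action == LEFT:
        col = max(col - 1, 0)
    elif action == DOWN:
        row = min(row + 1, n_rows - 1)
    elif action == RIGHT:
        col = min(col + 1, n_cols - 1)
    elif action == UP:
        row = max(row - 1, 0)
    return row, col


def build_transitions(lake: List[str]) -> Dict[int, Dict[int, List[Transition]]]:
    """Rebuild a state -> action -> list-of-outcomes table from the map."""
    n_rows, n_cols = len(lake), len(lake[0])
    table: Dict[int, Dict[int, List[Transition]]] = {}
    for row in range(n_rows):
        for col in range(n_cols):
            state = row * n_cols + col
            table[state] = {}
            for action in range(4):
                if lake[row][col] in "GH":
                    # Holes and the goal are absorbing: a self-loop with no reward.
                    table[state][action] = [(1.0, state, 0.0, True)]
                    continue
                outcomes = []
                # Slippery ice: the intended direction and its two neighbours
                # in the action ordering are equally likely.
                for actual in ((action - 1) % 4, action, (action + 1) % 4):
                    r, c = move(row, col, actual, n_rows, n_cols)
                    tile = lake[r][c]
                    outcomes.append((1 / 3, r * n_cols + c, float(tile == "G"), tile in "GH"))
                table[state][action] = outcomes
    return table


P = build_transitions(LAKE)
for prob, next_state, reward, done in P[14][2]:
    print(prob, next_state, reward, done)
```

Under these assumptions the rebuilt table agrees with the entries printed above. From state 14, for example, actions 1, 2, and 3 each reach the goal (state 15) with probability 1/3 and reward 1.0, while action 0 cannot reach it at all.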