Q-Learning Algorithm
For each s,a initialize Q(s,a) ← 0.
Observe current state s.
Do forever:
Select an action a and execute it
Receive immediate reward r
Observe new state s'
Update the table entry for Q(s,a):
Q(s,a) ← r + γ max_{a'} [Q(s',a')]
s ← s'
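
The loop above can be sketched in Python as follows. This is only an illustration under stated assumptions, not a reference implementation: the five-state corridor, the `step` function, the reward of 100 for reaching the goal, the discount factor 0.9, and the uniformly random action choice are all hypothetical, since the algorithm itself does not fix an environment or an action-selection strategy. "Do forever" is replaced by a bounded number of episodes so that the sketch terminates.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA = 0.9           # assumed discount factor (the gamma in the update rule)


def step(s, a):
    """Hypothetical deterministic environment: return (reward, next_state)."""
    s_next = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    r = 100.0 if s_next == N_STATES - 1 else 0.0   # reward only on reaching the goal
    return r, s_next


def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    # For each s, a initialize Q(s,a) <- 0
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):                # bounded stand-in for "do forever"
        s = rng.randrange(N_STATES - 1)      # observe current (non-goal) state s
        while s != N_STATES - 1:             # an episode ends at the goal
            a = rng.choice(ACTIONS)          # select an action (assumed: uniformly at random)
            r, s_next = step(s, a)           # receive reward r, observe new state s'
            # Q(s,a) <- r + gamma * max_a' Q(s',a')
            Q[(s, a)] = r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)
            s = s_next                       # s <- s'
    return Q


if __name__ == "__main__":
    Q = q_learning()
    for s in range(N_STATES - 1):
        best = max(ACTIONS, key=lambda a: Q[(s, a)])
        print(s, "left" if best == 0 else "right", round(Q[(s, best)], 2))
```

In this assumed environment the table entries for moving right should settle at 100, 90, 81, and 72.9 for states 3, 2, 1, and 0, each being γ times the value one step closer to the goal, so the greedy action in every non-goal state is to move right.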