© 2000 Todd Neller.  A.I.M.A. text figures © 1995 Prentice Hall.  Used by permission.
Q-Learning Algorithm
•For each s,a initialize Q(s,a) ← 0.
•Observe current state s.
•Do forever:
–Select an action a and execute it
–Receive immediate reward r
–Observe new state s'
–Update the table entry for Q(s,a):
Q(s,a) ← r + γ max_{a'} [Q(s',a')]
–s ← s'
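
The loop above can be sketched in Python as follows. This is a minimal illustration, not the slide author's code. Everything about the environment is an assumption made for the example: a hypothetical four-state corridor (`step`, `ACTIONS`, `GOAL`), a discount factor of γ = 0.9, uniformly random action selection (the slide does not say how actions are chosen), a finite step count standing in for "do forever", and a restart at state 0 whenever the goal is reached.

```python
import random
from collections import defaultdict

GAMMA = 0.9                   # discount factor γ (assumed value)
ACTIONS = ("left", "right")   # hypothetical action set
GOAL = 3                      # hypothetical corridor with states 0..3


def step(s, a):
    """Hypothetical deterministic environment: return (reward, next state)."""
    s_next = min(s + 1, GOAL) if a == "right" else max(s - 1, 0)
    reward = 1.0 if s_next == GOAL else 0.0
    return reward, s_next


def q_learning(n_steps=5000, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)            # Q(s,a) ← 0 for every s,a
    s = 0                             # observe current state s
    for _ in range(n_steps):          # finite stand-in for "do forever"
        a = rng.choice(ACTIONS)       # select an action (random: an assumption)
        r, s_next = step(s, a)        # execute it; receive r, observe s'
        # Q(s,a) ← r + γ max_{a'} Q(s',a')
        Q[(s, a)] = r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)
        s = s_next                    # s ← s'
        if s == GOAL:                 # assumption: restart after reaching the goal
            s = 0
    return Q


if __name__ == "__main__":
    Q = q_learning()
    for s in range(GOAL):
        print(s, {a: round(Q[(s, a)], 3) for a in ACTIONS})
```

Because this toy environment is deterministic and the goal's own entries are never updated (they stay at 0), the table should settle at Q(2,right) = 1, Q(1,right) = γ and Q(0,right) = γ², so the greedy action in every state is "right". The update has no learning rate, matching the slide; a stochastic environment would typically need one.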