CS 104: Introduction to Computer Science

Q Learning Algorithm

•For each s, a initialize $Q(s,a) \leftarrow 0$.

•Observe current state s.

•Do forever:

–Select an action a and execute it

–Receive immediate reward r

–Observe new state s'

–Update the table entry for Q(s,a):
$Q(s,a) \leftarrow r + \gamma \max_{a'} Q(s',a')$, where $\gamma$ ($0 \le \gamma < 1$) is the discount factor

–$s \leftarrow s'$
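The loop above can be sketched in Python as follows. This is a minimal illustration, not the course's reference code. Everything about the environment is a hypothetical assumption: a five-state corridor (`N_STATES`, `GOAL`, `step`), a reward of 100 for entering the goal state, uniformly random action selection, "do forever" cut off at `n_steps`, and a restart at state 0 whenever the goal is reached. The update rule has no learning rate, so it fits a deterministic world like this one.

```python
import random

# Hypothetical toy environment (not from the slides): a corridor of states
# 0..4. "right" moves one step right, "left" one step left (state 0 stays
# put). Entering state 4, the goal, yields reward 100; any other move yields 0.
N_STATES = 5
GOAL = 4
ACTIONS = ("left", "right")


def step(s, a):
    """Deterministic transition: return (reward, new_state)."""
    s_next = min(s + 1, GOAL) if a == "right" else max(s - 1, 0)
    reward = 100 if s_next == GOAL else 0
    return reward, s_next


def q_learning(gamma=0.9, n_steps=20_000, seed=0):
    rng = random.Random(seed)
    # For each s, a initialize Q(s, a) <- 0.
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    # Observe current state s (assumed start state: 0).
    s = 0
    # "Do forever" is truncated to n_steps iterations.
    for _ in range(n_steps):
        # Select an action a (uniformly at random here) and execute it.
        a = rng.choice(ACTIONS)
        # Receive immediate reward r and observe new state s'.
        r, s_next = step(s, a)
        # Update: Q(s, a) <- r + gamma * max_a' Q(s', a').
        # Q(GOAL, .) is never updated, so it stays 0.
        Q[(s, a)] = r + gamma * max(Q[(s_next, a2)] for a2 in ACTIONS)
        # s <- s', except that reaching the goal restarts at state 0.
        s = 0 if s_next == GOAL else s_next
    return Q


if __name__ == "__main__":
    Q = q_learning()
    for s in range(GOAL):
        print(s, {a: round(Q[(s, a)], 2) for a in ACTIONS})
```

With $\gamma = 0.9$, the "right" entries should settle at 100, 90, 81 and 72.9 for states 3, 2, 1 and 0. Each step away from the goal multiplies the reward by one more factor of $\gamma$.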