CS 104: Introduction to Computer Science

Grid World Q-Learning Example

•For each s, a, initialize Q(s,a) ← 0.

•Observe current state s.

•Do forever:

–Select an action a and execute it

–Receive immediate reward r

–Observe new state s'

–Update the table entry for Q(s,a):
$Q(s,a) \leftarrow r + \gamma \max_{a'} Q(s',a')$, where $\gamma$ ($0 \le \gamma < 1$) is the discount factor

–s ← s'
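The loop above can be sketched in Python as follows. This is a minimal sketch under several assumptions that are not in the slide: a hypothetical 2×3 grid with G in the top-right corner, deterministic moves that leave the agent in place when it hits a wall, a reward of 100 for entering G and 0 otherwise, γ = 0.9, actions chosen uniformly at random, and a restart from a random non-goal state whenever G is reached (G is treated as absorbing, so its Q-values stay 0). The names `step`, `q_learning`, `GAMMA`, `GOAL`, and `ACTIONS` are illustrative only.

```python
import random

# Hypothetical 2x3 grid (assumption: the slide's actual layout is not recoverable).
# States are (row, col) pairs; the goal G sits in the top-right corner.
ROWS, COLS = 2, 3
GOAL = (0, 2)
GAMMA = 0.9  # assumed discount factor
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Deterministic move; hitting a wall leaves the agent where it is."""
    dr, dc = ACTIONS[action]
    row, col = state[0] + dr, state[1] + dc
    if not (0 <= row < ROWS and 0 <= col < COLS):
        row, col = state
    new_state = (row, col)
    reward = 100 if new_state == GOAL else 0  # assumed r(s,a): 100 only for entering G
    return new_state, reward


def q_learning(num_steps=20000, seed=0):
    rng = random.Random(seed)
    states = [(r, c) for r in range(ROWS) for c in range(COLS)]
    non_goal = [s for s in states if s != GOAL]
    # For each s, a initialize Q(s,a) <- 0.
    q = {(s, a): 0.0 for s in states for a in ACTIONS}
    s = rng.choice(non_goal)  # observe current state s
    for _ in range(num_steps):  # "do forever", truncated to num_steps
        a = rng.choice(list(ACTIONS))  # select an action (uniformly at random here)
        s_next, r = step(s, a)  # execute it, receive r, observe s'
        # Q(s,a) <- r + gamma * max_a' Q(s',a'); Q(G,.) is never updated, so it stays 0.
        q[(s, a)] = r + GAMMA * max(q[(s_next, b)] for b in ACTIONS)
        # s <- s', except that reaching G restarts from a random non-goal state.
        s = rng.choice(non_goal) if s_next == GOAL else s_next
    return q


if __name__ == "__main__":
    q = q_learning()
    for state in [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2)]:
        best = max(ACTIONS, key=lambda a: q[(state, a)])
        print(state, best, round(q[(state, best)], 1))
```

Because the moves are deterministic, each update simply overwrites the table entry, and the learned values settle at the goal reward discounted by γ once per step of distance: for example, `q[((0, 0), "right")]` should approach 0.9 × 100 = 90.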

[Figure: immediate reward values r(s,a) in the grid world, with goal state G and a reward value of 100]