CS 104: Introduction to Computer Science

• For each s, a initialize Q(s,a) ← 0.

• Observe current state s.

• Do forever:

   – Select an action a and execute it

   – Receive immediate reward r

   – Observe new state s'

   – Update the table entry for Q(s,a):

        Q(s,a) ← r + γ max_a' Q(s',a')

   – s ← s'
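
The steps above can be sketched in Python as a minimal illustration, not a definitive implementation. Everything about the environment is an assumption made for the example: a hypothetical four-state corridor (states 0 to 3, goal at state 3), a reward of 1 for entering the goal, γ = 0.9, uniformly random action selection, and a restart at state 0 whenever the goal is reached. "Do forever" is truncated to a fixed number of steps so the program terminates.

```python
import random

# Hypothetical toy environment (not from the slides): a corridor of
# states 0..3 where state 3 is the goal.
N_STATES = 4
GOAL = N_STATES - 1
ACTIONS = (-1, +1)   # move left, move right
GAMMA = 0.9          # discount factor gamma (assumed value)


def step(s, a):
    """Deterministic transition: return (reward, next_state)."""
    s_next = min(max(s + a, 0), GOAL)      # walls at both ends
    r = 1.0 if s_next == GOAL else 0.0     # reward only for reaching the goal
    return r, s_next


def q_learning(n_steps=2000, seed=0):
    rng = random.Random(seed)

    # For each s, a initialize Q(s,a) <- 0.
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    # Observe current state s.
    s = 0

    # "Do forever", truncated to n_steps.
    for _ in range(n_steps):
        # Select an action a and execute it (assumption: pure random exploration).
        a = rng.choice(ACTIONS)

        # Receive immediate reward r and observe new state s'.
        r, s_next = step(s, a)

        # Update the table entry: Q(s,a) <- r + gamma * max_a' Q(s',a').
        Q[(s, a)] = r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)

        # s <- s'
        s = s_next

        # Assumption for this sketch only: restart once the goal is reached.
        if s == GOAL:
            s = 0

    return Q


if __name__ == "__main__":
    Q = q_learning()
    for s in range(GOAL):
        best = max(ACTIONS, key=lambda a: Q[(s, a)])
        print(f"state {s}: Q(left)={Q[(s, -1)]:.3f}  Q(right)={Q[(s, +1)]:.3f}  greedy={best:+d}")
```

In this toy setup the table settles so that moving right is the greedy action in every non-goal state: Q(2, right) approaches 1, Q(1, right) approaches γ = 0.9, and Q(0, right) approaches γ² = 0.81. The update has no learning rate, exactly as on the slide, which suits this example only because its transitions and rewards are deterministic.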
