• Q(s,a) = r(s,a) + γ max_a'[Q(δ(s,a), a')]

• We don't know r and δ, but this forms the basis for an iterative update based on an observed action transition and reward.

• Having taken action a in state s, and finding oneself in state s' with immediate reward r (see the sketch below):

  Q(s,a) ← r + γ max_a'[Q(s',a')]

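A minimal sketch of this iterative update in Python, assuming a tabular setting and a hypothetical environment interface (env.reset, env.step, env.actions) that is not part of the original notes:

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, gamma=0.9, epsilon=0.1):
    """Learn Q(s, a) from observed transitions (s, a, r, s')."""
    Q = defaultdict(float)  # Q[(state, action)] -> current estimate

    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Epsilon-greedy choice of action a in state s.
            if random.random() < epsilon:
                a = random.choice(env.actions(s))
            else:
                a = max(env.actions(s), key=lambda act: Q[(s, act)])

            # Take action a, observe immediate reward r and next state s'.
            s_next, r, done = env.step(a)

            # Q(s,a) <- r + gamma * max_a' Q(s', a')
            if done:
                best_next = 0.0
            else:
                best_next = max(Q[(s_next, a2)] for a2 in env.actions(s_next))
            Q[(s, a)] = r + gamma * best_next

            s = s_next
    return Q
```

The direct assignment mirrors the update rule on the slide, which assumes deterministic rewards and transitions; in a stochastic environment the new estimate would instead be blended with the old one via a learning rate.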