CS 371 Introduction to Artificial Intelligence Homework #10

Due: Thursday, 11/16

Do "1. Learning Blackjack Strategy..." and "2. Validating Your Results".  One of the variants that follow may be done for an additional 20% extra credit.

1. Learning Blackjack Strategy with Reinforcement Learning:  Below are simplified rules for Blackjack adapted from "Play According to Hoyle - Hoyle's rules of games", 2nd rev. ed., A.H. Morehead and G. Mott-Smith eds.:

1 player with computer as dealer.

CARDS. A regular pack of 52. For simplicity, assume that all cards are equally probable (as if infinite packs of cards were shuffled to form the deck).  This makes "card counting" irrelevant and thus obviates the need to model the cards remaining in the deck.  A simple random number generator generating 13 values will do nicely for our purposes.
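
As a minimal sketch of the equal-probability assumption, card draws might be generated as below. The names `draw_rank` and `card_value` and the 1..13 rank encoding are illustrative assumptions, not a required design.

```python
import random

# Assumed encoding: 1 = ace, 2..10 = number cards, 11..13 = jack, queen, king.
def draw_rank(rng=random):
    """Draw one of 13 equally likely ranks, as if from an infinite deck."""
    return rng.randint(1, 13)

def card_value(rank):
    """Blackjack value of a rank, counting an ace as 1 and face cards as 10."""
    return min(rank, 10)
```

Passing a seeded `random.Random` instance as `rng` makes test runs repeatable.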

THE DEAL. The player places a bet before the deal. The player and dealer each receive one card face down and then one card face up.  For simplicity, assume the player always bets one dollar.

THE PLAY. The values of the cards are: Ace, 1 or 11, as the holder wishes; king, queen, jack, ten, 10 each; any other card, its number. The object is to hold two or more cards that total 21 or as nearly 21 as possible without going over 21. For example, six, four, and ace count 21; seven, four, and ace count 12, for to call the ace 11 would put the player over 21. An ace and face card or ten in the first two cards are called a natural, or blackjack, and win the bet at once.
After the initial deal, the player may either stand (on his first two cards, or at any later time), or may be dealt an additional card by saying “Hit me.” He may continue to draw additional cards, but once he says “I stand” he may draw no more cards. All additional cards are dealt face up. If an additional card puts the player’s count over 21, he must show his cards and the dealer collects the bet. The player’s cards are then placed face up on the bottom of the pack.
When the player has either stood or gone over 21, the dealer turns up his facedown card. As in gambling-house games, the dealer must take additional cards as long as his total is 16 or less and must stand when his total reaches 17 or more.  If dealer goes over 21, he pays the player. If he stands on 21 or less, he collects from the player having a lower count, pays the player having a higher count, and has a stand-off with each player having the same count.
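
The counting and dealer rules above could be sketched as follows. The function names are hypothetical, card values are assumed to be given with each ace as 1, and the dealer is assumed to count an ace as 11 whenever that does not exceed 21 (so the dealer stands on ace-six); the rules above do not settle that last point.

```python
def hand_total(values):
    """Best total for a list of card values in which each ace is given as 1.

    One ace is counted as 11 when that does not push the total over 21.
    """
    total = sum(values)
    if 1 in values and total + 10 <= 21:
        return total + 10
    return total

def is_natural(values):
    """True for a two-card hand made of an ace and a ten-valued card."""
    return len(values) == 2 and hand_total(values) == 21

def dealer_play(values, draw):
    """Dealer draws while the total is 16 or less, then stands.

    draw is any zero-argument function returning a card value (ace = 1).
    """
    values = list(values)
    while hand_total(values) <= 16:
        values.append(draw())
    return values
```

For example, `hand_total([6, 4, 1])` counts the ace as 11, while `hand_total([7, 4, 1])` counts it as 1, matching the two examples in the rules.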

SETTLEMENT. A player may bet only against the dealer. All bets are settled for even money.  Thus, in this simplified scenario, the player bets a dollar and either (1) beats the dealer and keeps the dollar bet and wins a dollar from the dealer, (2) ties the dealer and replays, or (3) loses the dollar bet to the dealer.
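
One possible way to express this settlement as a reward signal is sketched below. The name `settle` is hypothetical, a tie is modeled as a net gain of 0 rather than an actual replay, and a dealer natural is treated as an ordinary 21; all three are assumptions made for illustration.

```python
def settle(player_total, dealer_total, player_natural=False):
    """Net gain in dollars on a one-dollar bet: +1 win, 0 tie, -1 loss."""
    if player_natural:
        return 1    # a natural wins the bet at once
    if player_total > 21:
        return -1   # player went over 21; dealer collects
    if dealer_total > 21:
        return 1    # dealer went over 21; dealer pays
    if player_total > dealer_total:
        return 1
    if player_total < dealer_total:
        return -1
    return 0        # stand-off
```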

Your task: Learn optimal play for this simplified form of Blackjack using reinforcement learning.  This involves several subtasks:

• Implement reinforcement learning for arbitrary nondeterministic MDPs.  Unlike other homework assignments, the implementation design is entirely up to you.  This means that you need to take extra care to document your programming and provide means to test execution.
• Implement the simplified blackjack game described above.
• Represent the game of Blackjack as a nondeterministic MDP as compactly as you can.  This requires some thought.  If you try to create a nondeterministic MDP based on all possible hands, you will have a much larger state space to learn over than if you simply track the relevant features: the total of the player's card values (counting each ace as 1), whether or not an ace is present (in which case the total can also be counted as 10 higher), and the dealer's face-up card.
• Apply reinforcement learning and report the strategy learned.  What discount factor did you use?
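
For orientation only, here is a minimal sketch of one reinforcement learning method, tabular Q-learning with epsilon-greedy exploration, written against a generic episodic MDP. The interface (`reset`, `step`, `actions`), the parameter defaults, and the choice of Q-learning itself are assumptions; the design is still up to you, and the discount factor `gamma` is deliberately left for the caller to choose.

```python
import random
from collections import defaultdict

def q_learning(reset, step, actions, gamma, episodes=5000,
               alpha=0.1, epsilon=0.1, rng=None):
    """Tabular Q-learning for an episodic, possibly nondeterministic MDP.

    reset()             -> initial state
    step(state, action) -> (next_state, reward, done)
    actions(state)      -> list of legal actions in that state
    """
    rng = rng or random.Random()
    q = defaultdict(float)  # (state, action) -> estimated return
    for _ in range(episodes):
        state, done = reset(), False
        while not done:
            legal = actions(state)
            if rng.random() < epsilon:
                action = rng.choice(legal)                        # explore
            else:
                action = max(legal, key=lambda a: q[(state, a)])  # exploit
            nxt, reward, done = step(state, action)
            # Terminal transitions contribute no future value.
            best_next = 0.0 if done else max(q[(nxt, a)] for a in actions(nxt))
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q

def greedy_policy(q, states, actions):
    """Pick the highest-valued legal action in each listed state."""
    return {s: max(actions(s), key=lambda a: q[(s, a)]) for s in states}
```

A Blackjack environment would supply `step` by drawing cards at random, which is where the nondeterminism enters; states can be any hashable value, such as a tuple of the features listed above.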

2. Validating Your Results:  Below is a suggested strategy for Blackjack:

Strategy of Blackjack
In a game in which dealer must hit 16 and stand on 17, the player’s strategy should usually be: Always stand on 17 or higher. Stand on any number from 13 through 16 if dealer’s showing is 6 or lower, but hit if dealer’s card is 7 through 10, or an ace. Hit 12 or under. Count an ace as 1 for any number up to 17 (that is, hit four-two-ace, counting it as 7).
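
To compare a learned policy against this rule of thumb state by state, the rule could be encoded roughly as follows. This is one reading of the quoted text, not an authoritative one: the name `reference_action` and its arguments are hypothetical, and the ace rule is interpreted as "count the ace as 11 only when that gives 18 or more".

```python
def reference_action(hard_total, has_ace, dealer_card):
    """Return "hit" or "stand" according to the quoted rule of thumb.

    hard_total:  player's count with every ace counted as 1
    has_ace:     True if the player's hand contains an ace
    dealer_card: dealer's face-up card value (1 = ace, 2..10 otherwise)
    """
    count = hard_total
    # Count an ace as 11 only when the resulting total is above 17.
    if has_ace and 17 < hard_total + 10 <= 21:
        count = hard_total + 10
    if count >= 17:
        return "stand"
    if count >= 13:
        # "6 or lower" excludes the ace, which the rule lists with 7-10.
        return "stand" if 2 <= dealer_card <= 6 else "hit"
    return "hit"
```

For example, four-two-ace has a hard total of 7, so `reference_action(7, True, 10)` returns `"hit"`, as in the quoted example.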

• Does the strategy you learned above match the strategy described here?
• Can reinforcement learning be used to compute the expected payout from this simplified game of blackjack?  In other words, can you learn the expected gain/loss from playing blackjack?  If so, compute the expected payout.  If not, explain why it is not possible.

Variant 1 - Varying Settlement:

Now consider Blackjack with the following additional rules for settlement (payout):

For a natural, 1½ times the amount of the bet.
For 21 or less in five cards, double; in six cards, triple.
For 21 composed of three sevens, triple; composed of 8-7-6, double.
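
These bonus rules could be tabulated along the following lines. The function name `bonus_multiplier` is hypothetical, hands of more than six cards are assumed to pay like six-card hands, and the sketch only reports the multiplier for a hand that has not gone over 21; whether a bonus is paid regardless of the dealer's hand is not specified above and is left open here.

```python
def bonus_multiplier(values):
    """Payout multiplier for a final player hand (card values, ace = 1)."""
    total = sum(values)
    if 1 in values and total + 10 <= 21:
        total += 10                       # count one ace as 11 when safe
    if total > 21:
        raise ValueError("hand is over 21; no payout applies")
    if len(values) == 2 and total == 21:
        return 1.5                        # natural
    if len(values) >= 6:
        return 3.0                        # six cards (assumed: or more)
    if len(values) == 5:
        return 2.0                        # five cards
    if sorted(values) == [7, 7, 7]:
        return 3.0                        # 21 made of three sevens
    if sorted(values) == [6, 7, 8]:
        return 2.0                        # 21 made of 8-7-6
    return 1.0                            # ordinary even-money hand
```

Note that the multiplier depends on how many cards are held and, for some hands, on which cards they are.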

• Make appropriate changes to your nondeterministic MDP and game to handle these settlements.  How much larger must your nondeterministic MDP representation be?
• Re-learn Blackjack play policy.
• Does this change the strategy of play from your previous learning?  If so, describe how.

Variant 2 - Splitting:

Now consider Blackjack with the following additional player option:

If a player’s first two cards are a pair, such as two sixes or two jacks, he may play them as two different hands. If the player chooses to "split", the player turns both face up and places the amount of his original bet on each. Dealer gives him one card down to each. Then the player may hit, or stand on, each hand under the rules given above.

Essentially, the player dealt a pair can choose to play two games, each with a card of the pair as the face up card.
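
A minimal sketch of the split mechanics, assuming cards are tracked by rank (1..13, with 11..13 for jack, queen, king) so that a jack and a king are not mistaken for a pair. The names `can_split` and `split_hand` are hypothetical.

```python
def can_split(ranks):
    """A hand may be split only if it is exactly two cards of equal rank."""
    return len(ranks) == 2 and ranks[0] == ranks[1]

def split_hand(ranks, draw_rank):
    """Turn a pair into two hands, each completed with one newly dealt card.

    draw_rank is any zero-argument function returning a rank.
    Each resulting hand carries its own copy of the original bet.
    """
    if not can_split(ranks):
        raise ValueError("only a two-card pair can be split")
    return [[ranks[0], draw_rank()], [ranks[1], draw_rank()]]
```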

• Make appropriate changes to your nondeterministic MDP and game to handle the new action "split".  What changes did you make?
• Re-learn Blackjack play policy.
• For which pairs does your policy choose to split?

Variant 3 - Doubling Down:

Now consider Blackjack with the following additional player option:

A player may turn up both his cards, double his bet, and “take one down for double.” In such a case he may draw only the one card.
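
In code, the action might look like the sketch below. The name `double_down` is hypothetical, and restricting the action to the first two cards is an assumption read from "turn up both his cards".

```python
def double_down(values, draw, bet=1):
    """Double the bet and take exactly one more card.

    values: the player's two card values (ace = 1)
    draw:   zero-argument function returning a card value
    Returns (final card values, stake); the hand is finished afterwards.
    """
    if len(values) != 2:
        raise ValueError("double-down is assumed to apply to a two-card hand")
    return list(values) + [draw()], 2 * bet
```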

• Make appropriate changes to your nondeterministic MDP and game to handle the new action "double-down".  What changes did you make?
• Re-learn Blackjack play policy.
• For which hands does your policy choose to double-down?

(c) 2000 Todd Neller