CS 111 - Introduction to Computer Science
Homework #8
Due at the beginning of class 24.
1. Pig Game: In file PigGame.java, implement this specification. Note: You'll want to copy your PigComputerGame.java code as a starting point.
2. Statistics: In file Statistics.java, implement this specification. In addition to public static methods implementing simple statistics such as
you will also implement a bootstrapping technique for computing confidence intervals on the mean of given data. First we describe the process; then we outline the parts of the algorithm.
Imagine that you have some data, such as win or loss values from 2-player gameplay. Let a first-player win or loss be counted as value 1 or 0, respectively. Suppose our initial data array consists of 7 wins and 3 losses. Do we have 90% confidence that the first player has some advantage over the second player? Put another way, if we collected more data, would we expect the mean win value to be above 0.5 with 90% confidence? The "bootstrap" is a versatile statistical approach that can help us with confidence intervals (and more) when the data doesn't satisfy common assumptions (e.g., being normally distributed).
In this example, suppose our data array is 7 ones and 3 zeros:
[1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0]
The main assumption of bootstrapping is that this is a representative sample. The first step of bootstrapping is to resample our original data many times. With each resample, we construct a new array of the same length with randomly selected values from our original data array. For each position of the resample array, we select each element of the original array with equal probability, so a resample will usually contain duplicates and omit some values from the original array. For our purposes, we'll resample 1,000,000 times:
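The resampling step can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not the required solution: the method name `resample` and its `Random` parameter are hypothetical and may not match the names in the specification. Each position of the new array receives an element of the original chosen uniformly at random, with replacement.

```java
import java.util.Arrays;
import java.util.Random;

class ResampleSketch {
    // Hypothetical helper: builds a resample the same length as data, filling each
    // position with an element of data chosen uniformly at random (with replacement),
    // so duplicates are common and some original values are usually left out.
    static double[] resample(double[] data, Random rng) {
        double[] result = new double[data.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = data[rng.nextInt(data.length)];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0};
        Random rng = new Random(42); // arbitrary fixed seed for reproducible runs
        for (int k = 0; k < 3; k++) {
            System.out.println(Arrays.toString(resample(data, rng)));
        }
    }
}
```

Passing in a seeded `Random` makes runs reproducible while testing; one shared `Random` object should be reused across all resamples rather than created anew for each one.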
[0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0]
[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0]
[0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0]
... (999,996 resamples omitted) ...
[1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0]
If we're interested in a statistic, we compute it on each of the resamples. In this case, we want to know whether the mean value is above 0.5 with 90% confidence, so we compute the mean of each resample:
[0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0] → 0.6
[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0] → 0.9
[0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0] → 0.8
... (999,996 resamples omitted) ...
[1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0] → 0.8
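Computing the statistic on each resample only needs a mean method. The sketch below assumes a hypothetical `mean(double[])` helper (the specification's own name and signature may differ) and applies it to the first three resamples listed above.

```java
import java.util.Arrays;

class MeanSketch {
    // Hypothetical helper: arithmetic mean, i.e. the sum of the values divided by their count.
    static double mean(double[] data) {
        double sum = 0.0;
        for (double x : data) {
            sum += x;
        }
        return sum / data.length;
    }

    public static void main(String[] args) {
        double[][] resamples = {
            {0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0},
            {1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0},
            {0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0}
        };
        // Print each resample next to its mean.
        for (double[] r : resamples) {
            System.out.println(Arrays.toString(r) + " -> " + mean(r));
        }
    }
}
```

For 0/1 win data like this, the mean is simply the fraction of wins in the resample.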
Next, we put all of these resample means into an array and sort them:
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.1, 0.1, 0.1, ... (999,980 resample means omitted) ... , 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
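Putting the resampling and averaging steps together, one plausible sketch collects the mean of every resample into an array and sorts it with `java.util.Arrays.sort`. The helper names `resample`, `mean`, and `bootstrapMeans` are hypothetical, and the seed is arbitrary, so the exact values printed depend on it.

```java
import java.util.Arrays;
import java.util.Random;

class BootstrapMeansSketch {
    // Arithmetic mean of the values in data.
    static double mean(double[] data) {
        double sum = 0.0;
        for (double x : data) {
            sum += x;
        }
        return sum / data.length;
    }

    // One resample: data.length values drawn uniformly with replacement.
    static double[] resample(double[] data, Random rng) {
        double[] result = new double[data.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = data[rng.nextInt(data.length)];
        }
        return result;
    }

    // Hypothetical helper: the mean of each of numResamples resamples, sorted ascending.
    static double[] bootstrapMeans(double[] data, int numResamples, Random rng) {
        double[] means = new double[numResamples];
        for (int i = 0; i < numResamples; i++) {
            means[i] = mean(resample(data, rng));
        }
        Arrays.sort(means);
        return means;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0};
        double[] means = bootstrapMeans(data, 1_000_000, new Random(42));
        // Show the 10 smallest and the 10 largest resample means.
        System.out.println(Arrays.toString(Arrays.copyOfRange(means, 0, 10)));
        System.out.println(Arrays.toString(Arrays.copyOfRange(means, means.length - 10, means.length)));
    }
}
```

Sorting once at the end is what makes the percentile lookup a simple index into the array.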
A 90% confidence interval for the mean statistic can then be computed as the bottom and top values of the middle 90% of values in this array of resample means. Put another way, we look at the values at the 5th and 95th percentiles (i.e. 5% and 95% of the way through the array). In this case, the resample mean values at the 5th and 95th percentiles are 0.5 and 0.9, respectively. Because the lower end of this interval (the 5th percentile) is 0.5, indicating an even chance of winning or losing, we cannot conclude with 90% confidence that a player with 7 out of 10 wins has an advantage.
However, the 15th and 85th percentile values are 0.6 and 0.8, respectively, so we could say that we have 70% confidence that the first player has an advantage.
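Reading an interval off the sorted resample means might look like the sketch below. Since the body of the provided `getPercentile` is not shown here, this sketch substitutes a hypothetical `percentileSketch` that takes a fraction `p` in [0, 1] and returns the element at the index nearest to `p * (n - 1)`; the provided method may use a different scale (e.g. 0 to 100) or a different rounding or interpolation rule. `confidenceInterval` is likewise a hypothetical helper that trims `(1 - confidence) / 2` from each tail, which gives the 5th and 95th percentiles for 90% confidence and the 15th and 85th for 70%.

```java
import java.util.Arrays;
import java.util.Random;

class ConfidenceIntervalSketch {
    // Hypothetical stand-in for the provided getPercentile: p is a fraction in [0, 1],
    // and the value at the nearest index p * (n - 1) of the sorted array is returned.
    static double percentileSketch(double[] sorted, double p) {
        int index = (int) Math.round(p * (sorted.length - 1));
        return sorted[index];
    }

    // Central interval: drops (1 - confidence) / 2 of the sorted values from each end.
    static double[] confidenceInterval(double[] sortedMeans, double confidence) {
        double tail = (1.0 - confidence) / 2.0;
        return new double[] {
            percentileSketch(sortedMeans, tail),
            percentileSketch(sortedMeans, 1.0 - tail)
        };
    }

    public static void main(String[] args) {
        double[] data = {1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0};
        Random rng = new Random(2024); // arbitrary seed
        int numResamples = 1_000_000;
        double[] means = new double[numResamples];
        for (int i = 0; i < numResamples; i++) {
            // Each resample has data.length values drawn with replacement.
            double sum = 0.0;
            for (int j = 0; j < data.length; j++) {
                sum += data[rng.nextInt(data.length)];
            }
            means[i] = sum / data.length;
        }
        Arrays.sort(means);
        System.out.println("90% interval: " + Arrays.toString(confidenceInterval(means, 0.90)));
        System.out.println("70% interval: " + Arrays.toString(confidenceInterval(means, 0.70)));
    }
}
```

For this particular data, the exact probability that a resample mean is 0.5 or less is about 15.03%, so the 15th percentile sits almost exactly on the boundary between 0.5 and 0.6; the value reported at that end may vary with the random seed and with the percentile convention used.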
To program the bootstrap technique for this assignment, you are given an implementation of percentile lookup in a sorted data array (public static double getPercentile(double[] data, double percentile)), but the rest is up to you. None of the relevant methods requires a body of more than 5 lines of code if written compactly.
Rubric: (20 points total)