CS 371 - Introduction to Artificial Intelligence
ISLR Weekly Assignments


Due: Fridays before midnight
Note: Work is to be done in pairs.

Introduction

Gettysburg College requires a "fourth hour" component for each course that meets fewer than four lecture hours per week. This course satisfies the fourth hour component by allocating an additional three hours per week to independent learning and exercises performed beyond lecture.  Given that Computer Science as a discipline, and especially Artificial Intelligence as a subdiscipline, experiences rapid change and the introduction of new algorithmic techniques, your ability to read and acquire new knowledge is key to your lifelong learning and thus professional success.  Consider this activity to be a model for the type of self-instruction you should pursue in tandem with your future career.

The basis of this semester's fourth hour requirement will be a guided, weekly self-study of Machine Learning techniques through the excellent, freely available book An Introduction to Statistical Learning with Applications in R (ISLR) by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani.  Supplemented with video lectures, each week's work will require approximately 3 hours of reading, video viewing, text lab exercises, and a Moodle quiz assessment due each Friday.  These quizzes will account for 20% of the final grade.

While I am glad to assist students in their understanding of the material, the nature of this fourth hour course component requires significant independent time management and work discipline for success.  I recommend that students adopt the following weekly work rhythm:

Historical note: “Machine Learning (ML)”, the oldest term for this material, is a Computer Science Artificial Intelligence subfield encompassing and making use of all relevant techniques from statistical and AI roots, but giving greater attention to low-bias, high-variance techniques (e.g. artificial neural networks).  “Data Mining” was a “rebranding” of ML with special attention to big data applications.  “Statistical Learning”, the newest term, coined by statisticians, generally gives greater attention to low-variance, high-bias techniques.  The vast majority of work in these three similar areas concerns regression, classification, and clustering problems.  The ISLR text does an excellent job of surveying the landscape and helping one discern the tradeoffs and motivations for the use of these diverse techniques to address common problems.

Week 1

Main Topics:

  1. If you wish to perform R exercises on your personal machine, download and install R.
  2. Download the PDF textbook An Introduction to Statistical Learning with Applications in R (ISLR) to your favorite device for reading.
  3. Import library "ISLR" within R.  For my installation:
  4. Read ISLR chapter 1 and chapter 2 through section 2.1.2 (pp. 1-24).
  5. Optionally watch these supplementary videos:
  6. Take the weekly Moodle quiz to assess your learning by Friday of week 1.

Week 2

Main Topics:

  1. Read from section 2.1.3 through the end of chapter 2 (p. 51).  Do the guided lab of section 2.3.  (This isn't to be submitted, but builds your ability to apply the reading in the context of R.)
  2. Optionally watch these supplementary videos:
  3. Take the weekly Moodle quiz to assess your learning by Friday of week 2.

Week 3

Main Topics:

  1. Read chapter 3 through the end of section 3.2 (p. 82).  Do the guided lab of section 3.6 through 3.6.3.
  2. Optionally watch these supplementary videos:
  3. Take the weekly Moodle quiz to assess your learning by Friday of week 3.  You'll need these datasets: week3_1.csv, week3_2.csv

Week 4

Main Topics:

  1. Finish reading chapter 3 and complete the remainder of the guided lab (through p. 119).
  2. Optionally watch these supplementary videos:
  3. Take the weekly Moodle quiz to assess your learning by Friday of week 4.  You'll need these datasets: iris.csv, week4_1.csv, week4_2.csv, week4_3.csv

Week 5

Main Topics:

  1. Read chapter 4 and complete the guided lab.
  2. Optionally watch these supplementary videos:
  3. Take the weekly Moodle quiz to assess your learning by Friday of week 5.  You'll need these datasets: iris.csv, week5.csv.  You'll need to prepare the iris dataset for classification according to these instructions.

Week 6

Main Topics:

  1. Read chapter 5 and complete the guided lab.
  2. Optionally watch these supplementary videos:
  3. Take the weekly Moodle quiz to assess your learning by Friday of week 6.  This week, we'll work with the material outside of R to emphasize that you're not limited to R for these common machine learning tasks.  You'll use Java-based Weka for polynomial regression and validation, and you'll use my simple Java code for bootstrapping.  For this, you will need:
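
To make the idea of bootstrapping concrete before you start, here is a minimal sketch in Java.  It is not the course's bootstrapping code, which is not reproduced on this page.  The class name BootstrapSketch, its method names, the sample values, and the seed are all hypothetical choices made for illustration.  The sketch estimates the standard error of a sample mean by repeatedly resampling the data with replacement and measuring how much the resampled means vary.

```java
import java.util.Random;

// Hypothetical illustration only: not the instructor's code.
public class BootstrapSketch {

    /** Returns the arithmetic mean of xs. */
    public static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Draws one bootstrap sample: data.length values chosen uniformly with replacement. */
    public static double[] resample(double[] data, Random rng) {
        double[] sample = new double[data.length];
        for (int i = 0; i < sample.length; i++) {
            sample[i] = data[rng.nextInt(data.length)];
        }
        return sample;
    }

    /** Bootstrap estimate of the standard error of the sample mean. */
    public static double bootstrapStdErrOfMean(double[] data, int numResamples, Random rng) {
        double[] estimates = new double[numResamples];
        for (int b = 0; b < numResamples; b++) {
            estimates[b] = mean(resample(data, rng));
        }
        // Sample standard deviation of the bootstrap estimates.
        double center = mean(estimates);
        double sumSq = 0.0;
        for (double e : estimates) {
            sumSq += (e - center) * (e - center);
        }
        return Math.sqrt(sumSq / (numResamples - 1));
    }

    public static void main(String[] args) {
        double[] data = {2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7}; // made-up values
        Random rng = new Random(0L); // fixed seed so repeated runs agree
        System.out.printf("sample mean = %.3f%n", mean(data));
        System.out.printf("bootstrap SE of mean = %.3f%n",
                bootstrapStdErrOfMean(data, 1000, rng));
    }
}
```

The same resampling loop works for any statistic: replace the call to mean inside bootstrapStdErrOfMean with the estimator of interest.  More resamples give a steadier estimate at the cost of more computation.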

Week 7

Main Topics:

  1. Read chapter 7 and complete the guided lab.
  2. Optionally watch these supplementary videos:
  3. Take the weekly Moodle quiz to assess your learning by Friday of week 7.  For this, you will need datasets week7_1.csv, week7_2.csv, week7_3.csv

Week 8

Main Topics:

  1. Read chapter 8 and complete the guided lab.
  2. Optionally watch these supplementary videos:
  3. Take the weekly Moodle quiz to assess your learning by Friday of week 8.  For this, you will need dataset carseats.csv and the Weka application.  NOTE: Different installations on different platforms can provide different results, so I highly recommend doing these Weka exercises on our lab machines (command "weka").  

Week 9

Main Topics:

  1. Read chapter 9 and complete the guided lab.
  2. Optionally watch these supplementary videos:
  3. Take the weekly Moodle quiz to assess your learning by Friday of week 9. For this, you will need dataset rollHold.csv and the RStudio application with the e1071 library.

Week 10

Main Topics:

  1. Read chapter 10 and complete the guided lab.
  2. Optionally watch these supplementary videos:
  3. Take the weekly Moodle quiz to assess your learning by Friday of week 10.  For this, you will need dataset iris.csv.

Acknowledgements: My thanks to Trevor Hastie, Robert Tibshirani, and Jerome Friedman for their excellent, freely available book The Elements of Statistical Learning (ESL); to Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani for their excellent, freely available book An Introduction to Statistical Learning with Applications in R (ISLR), which provides a gentler, more accessible introduction to the important concepts of ESL; to Trevor Hastie and Robert Tibshirani for making ISLR lecture videos available via YouTube; and to Kevin Markham for creating an index to these videos and associated slides.

Todd Neller