CS 371 - Introduction to Artificial Intelligence
ISLR Weekly Assignments

Gettysburg College requires a "fourth hour" component for each course that meets for fewer than four lecture hours per week. This course satisfies the fourth hour component by allocating an additional three hours per week to **independent** learning and exercises performed beyond lecture. Because Computer Science as a discipline, and Artificial Intelligence as a subdiscipline in particular, experience rapid change and the introduction of new algorithmic techniques, your ability to read and acquire new knowledge is key to your lifelong learning and thus your professional success. Consider this activity a model for the type of self-instruction you should pursue in tandem with your future career.

The basis of this semester's fourth hour requirement will be a guided, weekly self-study of Machine Learning techniques through the excellent, freely available book An Introduction to Statistical Learning with Applications in R (ISLR) by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. Supplemented with video lectures, each week will require approximately three hours of reading, video, text lab exercises, and a Moodle quiz assessment due each Friday. These quizzes will account for **20% of the final grade**.

While I am glad to assist students in their understanding of the material, the nature of this fourth hour course component requires significant independent time management and work discipline for success. I recommend that students adopt the following weekly work rhythm:

- By Monday:
    - Note the week's main topics.
    - Read Moodle quiz questions.
    - Complete readings and video viewing, so as to be able to ask questions (should you have any) during my first office hours of the week.

- By Wednesday:
    - Complete text lab exercises and begin Moodle assessment exercises.

- By Thursday:
    - Attempt completion of the entire week's work so as to leave an extra day as a contingency for when the work is more difficult than expected.

- By Friday midnight:
    - Complete and submit Moodle assessment exercises.

Historical note: “Machine Learning (ML)”, the oldest term for this material, is a Computer Science Artificial Intelligence subfield encompassing and making use of all relevant techniques from statistical and AI roots, but giving greater attention to low-bias, high-variance techniques (e.g. artificial neural networks). “Data Mining” was a “rebranding” of ML with special attention to big data applications. “Statistical Learning”, the newest term, coined by statisticians, generally gives greater attention to low-variance, high-bias techniques. The vast majority of work in these three similar areas concerns regression, classification, and clustering problems. The ISLR text does an excellent job of surveying the landscape and helping one discern the tradeoffs and motivations for the use of these diverse techniques to address common problems.

Main Topics:

- Chapter 1: Introduction
    - Basic definitions

- Chapter 2: Statistical Learning
    - Statistical learning goal: estimate f
    - Prediction versus model interpretability tradeoff
    - Common problem classes: supervised learning (e.g. regression, classification) versus unsupervised learning (e.g. clustering)

- If you wish to perform R exercises on your personal machine, download and install R.
- Download the PDF textbook An Introduction to Statistical Learning with Applications in R (ISLR) to your favorite device for reading.
- Install and load the "ISLR" package within R (e.g. `install.packages("ISLR")` followed by `library(ISLR)`); the package provides all datasets for the text.

- Read ISLR chapter 1 and chapter 2 through section 2.1.2 (pp. 1-24).
- Optionally watch these supplementary videos:
    - Chapter 1: Introduction (slides, playlist)
        - Opening Remarks and Examples (18:18)
        - Supervised and Unsupervised Learning (12:12)
    - Chapter 2: Statistical Learning (slides, playlist)

- Take the weekly Moodle quiz to assess your learning by Friday of week 1.

Main Topics:

- Chapter 2: Statistical Learning
    - Prediction versus model interpretability tradeoff
    - Common problem classes: supervised learning (e.g. regression, classification) versus unsupervised learning (e.g. clustering)
    - Assessing model accuracy: mean squared error (MSE)
    - Bias-variance tradeoff
    - Basic introduction to R
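
Though the ISLR labs use R, the key ideas translate to any language. Below is a minimal pure-Python sketch (the toy data and the k-nearest-neighbors regressor are invented for illustration) of how a highly flexible model (k = 1) drives training MSE to zero while test MSE tells a different story — the essence of the bias-variance tradeoff:

```python
import random

random.seed(0)

def f(x):
    return x * x  # hypothetical "true" function underlying the data

def make_data(n, noise=0.1):
    xs = [random.uniform(-1, 1) for _ in range(n)]
    ys = [f(x) + random.gauss(0, noise) for x in xs]
    return xs, ys

def knn_predict(x0, xs, ys, k):
    # average the responses of the k training points nearest to x0
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

def mse(xs, ys, predict):
    # mean squared error: average of (observed - predicted)^2
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(50)
test_x, test_y = make_data(50)

for k in (1, 10):
    train_mse = mse(train_x, train_y, lambda x: knn_predict(x, train_x, train_y, k))
    test_mse = mse(test_x, test_y, lambda x: knn_predict(x, train_x, train_y, k))
    print(f"k={k}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```

With k = 1 every training point is its own nearest neighbor, so training MSE is exactly zero; only held-out test MSE reveals how well the model generalizes.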

- Read sections 2.1.3 through the end of chapter 2 (p. 51). Do the guided lab of section 2.3. (This isn't to be submitted, but builds your ability to apply the reading in the context of R.)
- Optionally watch these supplementary videos:
- Take the weekly Moodle quiz to assess your learning by Friday of week 2.

Main Topics:

- Chapter 3: Linear Regression
    - Simple linear regression
        - Coefficient estimation
        - Assessing the accuracy of coefficient estimates
        - Assessing the accuracy of the model
    - Multiple linear regression
        - Relationships between response and predictors
        - Predictor selection
        - Assessing model fit
        - Prediction and confidence in prediction

- Read chapter 3 through the end of section 3.2 (p. 82). Do the guided lab of section 3.6 through 3.6.3.
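
The simple linear regression coefficients of section 3.1 have a closed form. Here is a small pure-Python sketch of the least-squares estimates (the data points are made up for illustration):

```python
def simple_ols(xs, ys):
    # least-squares estimates for y = beta0 + beta1 * x:
    # beta1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
    # beta0 = ybar - beta1 * xbar
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    beta1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    beta0 = ybar - beta1 * xbar
    return beta0, beta1

# data lying exactly on y = 2x + 1 recovers the line
print(simple_ols([1, 2, 3, 4], [3, 5, 7, 9]))  # → (1.0, 2.0)
```

R's `lm()` computes the same estimates (plus standard errors and diagnostics); this sketch only shows where the coefficients come from.
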
- Optionally watch these supplementary videos:
    - Chapter 3: Linear Regression (slides, playlist)
        - Simple Linear Regression and Confidence Intervals (13:01)
        - Hypothesis Testing (8:24)
        - Multiple Linear Regression and Interpreting Regression Coefficients (15:38)
        - Model Selection and Qualitative Predictors (14:51) (Model Selection)

- Take the weekly Moodle quiz to assess your learning by Friday of week 3. You'll need these datasets: week3_1.csv, week3_2.csv

Main Topics:

- Chapter 3: Linear Regression
    - Qualitative predictors
    - Interaction terms
    - Non-linear relationships and polynomial regression
    - Common problems
    - Application to advertising data
    - Linear Regression versus K-Nearest Neighbors
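
Polynomial regression is still linear regression — on a design matrix whose columns are powers of the predictor. As a hedged illustration (pure Python with toy data and a hand-rolled normal-equations solver, rather than R's `lm()`), a quadratic fit looks like:

```python
def gauss_solve(A, b):
    # solve A x = b by Gaussian elimination with partial pivoting
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def lstsq(X, y):
    # least squares via the normal equations (X'X) beta = X'y
    n, p = len(X), len(X[0])
    XtX = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
           for r in range(p)]
    Xty = [sum(X[i][r] * y[i] for i in range(n)) for r in range(p)]
    return gauss_solve(XtX, Xty)

# quadratic regression: basis columns 1, x, x^2 (data exactly quadratic)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x * x for x in xs]
X = [[1.0, x, x * x] for x in xs]
beta = lstsq(X, ys)  # ≈ [0, 0, 1]
```

An interaction term is handled the same way: add a column of products x1 * x2 to the design matrix and fit as usual.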

- Complete reading through the end of chapter 3, completing the remainder of the guided lab (p. 119).
- Optionally watch these supplementary videos:
    - Chapter 3: Linear Regression (slides, playlist)
        - Model Selection and Qualitative Predictors (14:51) (Qualitative Predictors)
        - Interactions and Nonlinearity (14:16)
        - Lab: Linear Regression (22:10)

- Take the weekly Moodle quiz to assess your learning by Friday of week 4. You'll need these datasets: iris.csv, week4_1.csv, week4_2.csv, week4_3.csv

Main Topics:

- Chapter 4: Classification
    - Logistic regression and multinomial logistic regression
    - Linear discriminant analysis (LDA)
    - Quadratic discriminant analysis (QDA)
    - K-nearest neighbor classification
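
Logistic regression is fit by maximum likelihood rather than least squares. A minimal pure-Python sketch (toy one-predictor data; plain gradient ascent stands in for the iteratively reweighted least squares that R actually uses):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    # gradient ascent on the log-likelihood; the gradient for each
    # coefficient is a sum of (observed - predicted probability) terms
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys))
        g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in zip(xs, ys))
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

# toy data: class 1 tends to occur for larger x
xs = [0.0, 1.0, 3.0, 4.0]
ys = [0, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
```

Predicted probabilities come from `sigmoid(b0 + b1 * x)`; thresholding at 0.5 gives class labels. (On perfectly separable toy data like this the coefficients grow without bound — a well-known quirk of maximum likelihood with separable classes.)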

- Read chapter 4 and complete the guided lab.
- Optionally watch these supplementary videos:
    - Chapter 4: Classification (slides, playlist)
        - Introduction to Classification (10:25)
        - Logistic Regression and Maximum Likelihood (9:07)
        - Multivariate Logistic Regression and Confounding (9:53)
        - Case-Control Sampling and Multiclass Logistic Regression (7:28)
        - Linear Discriminant Analysis and Bayes Theorem (7:12)
        - Univariate Linear Discriminant Analysis (7:37)
        - Multivariate Linear Discriminant Analysis and ROC Curves (17:42)
        - Quadratic Discriminant Analysis and Naive Bayes (10:07)
        - Lab: Logistic Regression (10:14)
        - Lab: Linear Discriminant Analysis (8:22)
        - Lab: K-Nearest Neighbors (5:01)

- Take the weekly Moodle quiz to assess your learning by Friday of week 5. You'll need these datasets: iris.csv, week5.csv. You'll need to prepare the iris dataset for classification according to these instructions.

Main Topics:

- Chapter 5: Resampling Methods
    - Cross-Validation
        - Validation set method
        - Leave-one-out cross validation (LOOCV)
        - k-Fold cross validation and the bias-variance trade-off
    - The Bootstrap

- Read chapter 5 and complete the guided lab.
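
The mechanics of k-fold cross-validation are simple enough to sketch directly. This pure-Python illustration (the fit/predict pair is a deliberately trivial constant-mean "model" invented for the example) holds each fold out in turn and averages the held-out MSE:

```python
def kfold_indices(n, k):
    # partition indices 0..n-1 into k folds of near-equal size
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_mse(xs, ys, k, fit, predict):
    # average held-out MSE: train on k-1 folds, test on the remaining one
    total = 0.0
    for fold in kfold_indices(len(xs), k):
        hold = set(fold)
        tr_x = [x for i, x in enumerate(xs) if i not in hold]
        tr_y = [y for i, y in enumerate(ys) if i not in hold]
        model = fit(tr_x, tr_y)
        total += sum((ys[i] - predict(model, xs[i])) ** 2 for i in fold) / len(fold)
    return total / k

# trivial "model": predict the training mean everywhere
fit_mean = lambda xs, ys: sum(ys) / len(ys)
predict_mean = lambda m, x: m

xs = [float(i) for i in range(10)]
ys = [1.0] * 10
print(cross_val_mse(xs, ys, 5, fit_mean, predict_mean))  # → 0.0
```

Setting k = n gives LOOCV; small k (e.g. 5 or 10) trades a little bias for much less computation and variance, as the chapter discusses.
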
- Optionally watch these supplementary videos:
    - Chapter 5: Resampling Methods (slides, playlist)
        - Estimating Prediction Error and Validation Set Approach (14:01)
        - K-fold Cross-Validation (13:33)
        - Cross-Validation: The Right and Wrong Ways (10:07)
        - The Bootstrap (11:29)
        - More on the Bootstrap (14:35)
        - Lab: Cross-Validation (11:21)
        - Lab: The Bootstrap (7:40)

- Take the weekly Moodle quiz to assess your learning by Friday of week 6. This week, we'll apply the material beyond R to emphasize that you're not limited to R for these common machine learning tasks. You'll use the Java-based Weka for polynomial regression and validation, and you'll use my simple Java code for bootstrapping. For this, you will need:
    - A tutorial video for polynomial regression and validation using Weka.
    - week4_2.csv, auto.csv
    - A tutorial video for bootstrapping with my Bootstrap.java code.
    - Bootstrap.java
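
If you'd like a language-neutral picture of the algorithm before watching the tutorial, here is a pure-Python sketch of the core idea (a toy analogue, not the Bootstrap.java code itself): resample the data with replacement many times and examine the spread of the recomputed statistic.

```python
import random
import statistics

def bootstrap_se(data, stat, B=1000, seed=0):
    # estimate the standard error of `stat` by resampling
    # the data with replacement B times
    rng = random.Random(seed)
    reps = []
    for _ in range(B):
        resample = [rng.choice(data) for _ in data]
        reps.append(stat(resample))
    return statistics.stdev(reps)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
se = bootstrap_se(data, statistics.mean)
print(f"bootstrap SE of the mean: {se:.3f}")
```

For the sample mean the result should land close to the analytic s/√n, which is a handy sanity check; the payoff of the bootstrap is that it works the same way for statistics with no such formula.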

Main Topics:

- Chapter 7: Moving Beyond Linearity
    - Polynomial regression
    - Step and basis functions
    - Regression and smoothing splines
    - Local regressions
    - Generalized Additive Models (GAMs)
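
Step functions are the simplest of the chapter's basis-function approaches: cut the predictor's range at chosen points and fit a constant in each piece. A pure-Python sketch with made-up data and a single cutpoint:

```python
import bisect

def fit_step(xs, ys, cuts):
    # piecewise-constant fit: within each region the least-squares
    # estimate is simply the mean response of the training points there
    k = len(cuts) + 1
    sums, counts = [0.0] * k, [0] * k
    for x, y in zip(xs, ys):
        i = bisect.bisect_right(cuts, x)  # which region x falls in
        sums[i] += y
        counts[i] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

def predict_step(means, cuts, x):
    return means[bisect.bisect_right(cuts, x)]

xs = [0, 1, 2, 3, 4, 5]
ys = [1, 1, 1, 5, 5, 5]
means = fit_step(xs, ys, cuts=[2.5])
print(means)  # → [1.0, 5.0]
```

Splines refine this idea by replacing the indicator basis with piecewise polynomials constrained to join smoothly at the cutpoints (knots).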

- Read chapter 7 and complete the guided lab.
- Optionally watch these supplementary videos:
    - Chapter 7: Moving Beyond Linearity (slides, playlist)
        - Polynomial Regression and Step Functions (14:59)
        - Piecewise Polynomials and Splines (13:13)
        - Smoothing Splines (10:10)
        - Local Regression and Generalized Additive Models (10:45)
        - Lab: Polynomials (21:11)
        - Lab: Splines and Generalized Additive Models (12:15)

- Take the weekly Moodle quiz to assess your learning by Friday of week 7. For this, you will need datasets week7_1.csv, week7_2.csv, week7_3.csv

Main Topics:

- Chapter 8: Tree-Based Methods
    - Regression and classification decision trees
    - Bagging
    - Random Forests
    - Boosting
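
At the heart of regression trees is recursive binary splitting: at each node, choose the cut that most reduces the residual sum of squares (RSS). A pure-Python sketch of that single-split search on toy data:

```python
def best_split(xs, ys):
    # greedy search for the one cutpoint that minimizes total RSS,
    # the core step that a tree repeats recursively in each half
    def rss(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = (float("inf"), None)
    order = sorted(zip(xs, ys))
    for i in range(1, len(order)):
        cut = (order[i - 1][0] + order[i][0]) / 2  # midpoint between neighbors
        left = [y for x, y in order[:i]]
        right = [y for x, y in order[i:]]
        total = rss(left) + rss(right)
        if total < best[0]:
            best = (total, cut)
    return best[1]

xs = [1, 2, 3, 10, 11, 12]
ys = [0, 0, 0, 8, 8, 8]
print(best_split(xs, ys))  # → 6.5
```

Bagging and random forests then average many trees grown on bootstrap samples, with random forests also restricting the predictors considered at each split.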

- Read chapter 8 and complete the guided lab.
- Optionally watch these supplementary videos:
    - Chapter 8: Tree-Based Methods (slides, playlist)
        - Decision Trees (14:37)
        - Pruning a Decision Tree (11:45)
        - Classification Trees and Comparison with Linear Models (11:00)
        - Bootstrap Aggregation (Bagging) and Random Forests (13:45)
        - Boosting and Variable Importance (12:03)
        - Lab: Decision Trees (10:13)
        - Lab: Random Forests and Boosting (15:35)

- Take the weekly Moodle quiz to assess your learning by Friday of week 8. For this, you will need dataset carseats.csv and the Weka application.

Main Topics:

- Chapter 9: Support Vector Machines
    - Maximal margin classifier
    - Support vector classifier
    - Support vector machines
    - 1-vs.-1 and 1-vs.-all classification with >2 classes
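
A support vector classifier can be sketched in a few lines with subgradient descent on the hinge loss plus an L2 penalty. (This pedagogical shortcut is not the quadratic-programming formulation the text describes; the one-dimensional toy data is invented, and labels must be coded +1/-1.)

```python
def fit_svc(xs, ys, lam=0.01, lr=0.01, steps=2000):
    # minimize lam/2 * w^2 + sum of hinge losses max(0, 1 - y * (w*x + b))
    w = b = 0.0
    for _ in range(steps):
        gw, gb = lam * w, 0.0
        for x, y in zip(xs, ys):
            if y * (w * x + b) < 1:  # inside the margin (or misclassified)
                gw -= y * x
                gb -= y
        w -= lr * gw
        b -= lr * gb
    return w, b

# toy 1-D data: negative class left of 0, positive class right
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [-1, -1, -1, 1, 1, 1]
w, b = fit_svc(xs, ys)
```

Only points with margin y(wx + b) < 1 ever move the fit — those are the support vectors. Replacing the inner product with a kernel yields the full support vector machine.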

- Read chapter 9 and complete the guided lab.
- Optionally watch these supplementary videos:
- Take the weekly Moodle quiz to assess your learning by Friday of week 9. For this, you will need dataset rollHold.csv and the RStudio application with the e1071 library.

Main Topics:

- Chapter 10: Unsupervised Learning
    - Principal Component Analysis
    - K-means Clustering
    - Hierarchical Clustering
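
K-means alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. A pure-Python one-dimensional sketch on made-up data:

```python
import random

def kmeans_1d(data, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(data, k)  # random initial centroids
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid's cluster
        clusters = [[] for _ in range(k)]
        for x in data:
            j = min(range(k), key=lambda c: abs(x - centers[c]))
            clusters[j].append(x)
        # update step: each centroid moves to its cluster's mean
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return sorted(centers)

data = [1.0, 1.1, 0.9, 10.0, 10.1, 9.9]
print(kmeans_1d(data, 2))  # two well-separated groups → centroids near 1 and 10
```

Because the result depends on the random start, chapter 10 recommends running with several initializations and keeping the solution with the lowest within-cluster sum of squares (the `nstart` argument of R's `kmeans`).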

- Read chapter 10 and complete the guided lab.
- Optionally watch these supplementary videos:
    - Chapter 10: Unsupervised Learning (slides, playlist)
        - Unsupervised Learning and Principal Components Analysis (12:37)
        - Exploring Principal Components Analysis and Proportion of Variance Explained (17:39)
        - K-means Clustering (17:17)
        - Hierarchical Clustering (14:45)
        - Breast Cancer Example of Hierarchical Clustering (9:24)
        - Lab: Principal Components Analysis (6:28)
        - Lab: K-means Clustering (6:31)
        - Lab: Hierarchical Clustering (6:33)

- Take the weekly Moodle quiz to assess your learning by Friday of week 10. For this, you will need dataset iris.csv.

**Acknowledgements**: My thanks to Trevor Hastie, Robert Tibshirani, and Jerome Friedman for their excellent, freely available book The Elements of Statistical Learning (ESL); to Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani for their excellent, freely available book An Introduction to Statistical Learning with Applications in R (ISLR), which provides a gentler, more accessible introduction to the important concepts of ESL; to Trevor Hastie and Robert Tibshirani for making ISLR lecture videos available via YouTube; and to Kevin Markham for creating an index to these videos and associated slides.