CS 371 - Introduction to Artificial Intelligence
ISLR Weekly Assignments

Gettysburg College requires a "fourth hour" component for each course that meets for fewer than four lecture hours per week. This course satisfies the fourth hour component by allocating an additional three hours per week to **independent** learning and exercises performed beyond lecture. Because Computer Science as a discipline, and Artificial Intelligence as a subdiscipline in particular, experience rapid change and the introduction of new algorithmic techniques, your ability to read and acquire new knowledge is key to your lifelong learning and thus your professional success. Consider this activity a model for the type of self-instruction you should pursue in tandem with your future career.

The basis of this semester's fourth hour requirement will be a guided, weekly self-study of Machine Learning techniques through the excellent, freely available book An Introduction to Statistical Learning with Applications in R (ISLR) by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. Supplemented with video lectures, each week will require approximately three hours of reading, video, text lab exercises, and a Moodle quiz assessment due each Friday. These quizzes will account for **20% of the final grade**.

While I am glad to assist students in their understanding of the material, the nature of this fourth hour course component requires significant independent time management and work discipline for success. I recommend that students adopt the following weekly work rhythm:

- By Monday:
    - Note the week's main topics.
    - Read Moodle quiz questions.
    - Complete readings and video viewing, so as to be able to ask questions (should you have any) during my first office hours of the week.

- By Wednesday:
    - Complete text lab exercises and begin Moodle assessment exercises.

- By Thursday:
    - Attempt completion of the entire week's work so as to leave an extra day as a contingency for when the work is more difficult than expected.

- By Friday midnight:
    - Complete and submit Moodle assessment exercises.

Historical note: “Machine Learning (ML)”, the oldest term for this material, is a Computer Science Artificial Intelligence subfield encompassing and making use of all relevant techniques from statistical and AI roots, but giving greater attention to low-bias, high-variance techniques (e.g. artificial neural networks). “Data Mining” was a “rebranding” of ML with special attention to big data applications. “Statistical Learning”, the newest term, coined by statisticians, generally gives greater attention to low-variance, high-bias techniques. The vast majority of work in these three similar areas concerns regression, classification, and clustering problems. The ISLR text does an excellent job of surveying the landscape and helping one discern the tradeoffs and motivations for the use of these diverse techniques to address common problems.

Main Topics:

- Chapter 1: Introduction
    - Basic definitions

- Chapter 2: Statistical Learning
    - Statistical learning goal: estimate f
    - Prediction versus model interpretability tradeoff
    - Common problem classes: supervised learning (e.g. regression, classification) versus unsupervised learning (e.g. clustering)

- If you wish to perform R exercises on your personal machine, download and install R.
- Download the PDF textbook An Introduction to Statistical Learning with Applications in R (ISLR) to your favorite device for reading.
- Install and load the "ISLR" package within R (e.g. `install.packages("ISLR")` followed by `library(ISLR)`); the package provides all datasets for the text.

- Read ISLR chapter 1 and chapter 2 through section 2.1.2 (pp. 1-24).
- Optionally watch these supplementary videos:
    - Chapter 1: Introduction (slides, playlist)
        - Opening Remarks and Examples (18:18)
        - Supervised and Unsupervised Learning (12:12)
    - Chapter 2: Statistical Learning (slides, playlist)

- Take the weekly Moodle quiz to assess your learning by Friday of week 1.

Main Topics:

- Chapter 2: Statistical Learning
    - Prediction versus model interpretability tradeoff
    - Common problem classes: supervised learning (e.g. regression, classification) versus unsupervised learning (e.g. clustering)
    - Assessing model accuracy: mean squared error (MSE)
    - Bias-variance tradeoff
    - Basic introduction to R
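
Though the ISLR labs use R, the key ideas translate to any language. Below is a minimal pure-Python sketch (the toy data and the k-nearest-neighbors regressor are invented for illustration) of how a highly flexible model (k = 1) drives training MSE to zero while test MSE tells a different story — the essence of the bias-variance tradeoff:

```python
import random

random.seed(0)

def f(x):
    return x * x  # hypothetical "true" function underlying the data

def make_data(n, noise=0.1):
    xs = [random.uniform(-1, 1) for _ in range(n)]
    ys = [f(x) + random.gauss(0, noise) for x in xs]
    return xs, ys

def knn_predict(x0, xs, ys, k):
    # average the responses of the k training points nearest to x0
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

def mse(xs, ys, predict):
    # mean squared error: average of (observed - predicted)^2
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(50)
test_x, test_y = make_data(50)

for k in (1, 10):
    train_mse = mse(train_x, train_y, lambda x: knn_predict(x, train_x, train_y, k))
    test_mse = mse(test_x, test_y, lambda x: knn_predict(x, train_x, train_y, k))
    print(f"k={k}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```

With k = 1 every training point is its own nearest neighbor, so training MSE is exactly zero; only held-out test MSE reveals how well the model generalizes.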

- Read sections 2.1.3 through the end of chapter 2 (p. 51). Do the guided lab of section 2.3. (This isn't to be submitted, but builds your ability to apply the reading in the context of R.)
- Optionally watch these supplementary videos:
- Take the weekly Moodle quiz to assess your learning by Friday of week 2.

Main Topics:

- Chapter 3: Linear Regression
    - Simple linear regression
        - Coefficient estimation
        - Assessing the accuracy of coefficient estimates
        - Assessing the accuracy of the model
    - Multiple linear regression
        - Relationships between response and predictors
        - Predictor selection
        - Assessing model fit
        - Prediction and confidence in prediction

- Read chapter 3 through the end of section 3.2 (p. 82). Do the guided lab of section 3.6 through 3.6.3.
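
The simple linear regression coefficients of section 3.1 have a closed form. Here is a small pure-Python sketch of the least-squares estimates (the data points are made up for illustration):

```python
def simple_ols(xs, ys):
    # least-squares estimates for y = beta0 + beta1 * x:
    # beta1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
    # beta0 = ybar - beta1 * xbar
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    beta1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    beta0 = ybar - beta1 * xbar
    return beta0, beta1

# data lying exactly on y = 2x + 1 recovers the line
print(simple_ols([1, 2, 3, 4], [3, 5, 7, 9]))  # → (1.0, 2.0)
```

R's `lm()` computes the same estimates (plus standard errors and diagnostics); this sketch only shows where the coefficients come from.
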
- Optionally watch these supplementary videos:
    - Chapter 3: Linear Regression (slides, playlist)
        - Simple Linear Regression and Confidence Intervals (13:01)
        - Hypothesis Testing (8:24)
        - Multiple Linear Regression and Interpreting Regression Coefficients (15:38)
        - Model Selection and Qualitative Predictors (14:51) (Model Selection)

- Take the weekly Moodle quiz to assess your learning by Friday of week 3. You'll need these datasets: week3_1.csv, week3_2.csv

Main Topics:

- Chapter 3: Linear Regression
    - Qualitative predictors
    - Interaction terms
    - Non-linear relationships and polynomial regression
    - Common problems
    - Application to advertising data
    - Linear Regression versus K-Nearest Neighbors
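
Polynomial regression is still linear regression — on a design matrix whose columns are powers of the predictor. As a hedged illustration (pure Python with toy data and a hand-rolled normal-equations solver, rather than R's `lm()`), a quadratic fit looks like:

```python
def gauss_solve(A, b):
    # solve A x = b by Gaussian elimination with partial pivoting
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def lstsq(X, y):
    # least squares via the normal equations (X'X) beta = X'y
    n, p = len(X), len(X[0])
    XtX = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
           for r in range(p)]
    Xty = [sum(X[i][r] * y[i] for i in range(n)) for r in range(p)]
    return gauss_solve(XtX, Xty)

# quadratic regression: basis columns 1, x, x^2 (data exactly quadratic)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x * x for x in xs]
X = [[1.0, x, x * x] for x in xs]
beta = lstsq(X, ys)  # ≈ [0, 0, 1]
```

An interaction term is handled the same way: add a column of products x1 * x2 to the design matrix and fit as usual.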

- Complete reading through the end of chapter 3, completing the remainder of the guided lab (p. 119).
- Optionally watch these supplementary videos:
    - Chapter 3: Linear Regression (slides, playlist)
        - Model Selection and Qualitative Predictors (14:51) (Qualitative Predictors)
        - Interactions and Nonlinearity (14:16)
        - Lab: Linear Regression (22:10)

- Take the weekly Moodle quiz to assess your learning by Friday of week 4. You'll need these datasets: iris.csv, week4_1.csv, week4_2.csv, week4_3.csv

Main Topics:

- Chapter 4: Classification
    - Logistic regression and multinomial logistic regression
    - Linear discriminant analysis (LDA)
    - Quadratic discriminant analysis (QDA)
    - K-nearest neighbor classification
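
Logistic regression is fit by maximum likelihood rather than least squares. A minimal pure-Python sketch (toy one-predictor data; plain gradient ascent stands in for the iteratively reweighted least squares that R actually uses):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    # gradient ascent on the log-likelihood; the gradient for each
    # coefficient is a sum of (observed - predicted probability) terms
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys))
        g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in zip(xs, ys))
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

# toy data: class 1 tends to occur for larger x
xs = [0.0, 1.0, 3.0, 4.0]
ys = [0, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
```

Predicted probabilities come from `sigmoid(b0 + b1 * x)`; thresholding at 0.5 gives class labels. (On perfectly separable toy data like this the coefficients grow without bound — a well-known quirk of maximum likelihood with separable classes.)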

- Read chapter 4 and complete the guided lab.
- Optionally watch these supplementary videos:
    - Chapter 4: Classification (slides, playlist)
        - Introduction to Classification (10:25)
        - Logistic Regression and Maximum Likelihood (9:07)
        - Multivariate Logistic Regression and Confounding (9:53)
        - Case-Control Sampling and Multiclass Logistic Regression (7:28)
        - Linear Discriminant Analysis and Bayes Theorem (7:12)
        - Univariate Linear Discriminant Analysis (7:37)
        - Multivariate Linear Discriminant Analysis and ROC Curves (17:42)
        - Quadratic Discriminant Analysis and Naive Bayes (10:07)
        - Lab: Logistic Regression (10:14)
        - Lab: Linear Discriminant Analysis (8:22)
        - Lab: K-Nearest Neighbors (5:01)

- Take the weekly Moodle quiz to assess your learning by Friday of week 5. You'll need these datasets: iris.csv, week5.csv. You'll need to prepare the iris dataset for classification according to these instructions.

Main Topics:

- Chapter 5: Resampling Methods
    - Cross-Validation
        - Validation set method
        - Leave-one-out cross validation (LOOCV)
        - k-Fold cross validation and the bias-variance trade-off
    - The Bootstrap

- Read chapter 5 and complete the guided lab.
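
The mechanics of k-fold cross-validation are simple enough to sketch directly. This pure-Python illustration (the fit/predict pair is a deliberately trivial constant-mean "model" invented for the example) holds each fold out in turn and averages the held-out MSE:

```python
def kfold_indices(n, k):
    # partition indices 0..n-1 into k folds of near-equal size
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_mse(xs, ys, k, fit, predict):
    # average held-out MSE: train on k-1 folds, test on the remaining one
    total = 0.0
    for fold in kfold_indices(len(xs), k):
        hold = set(fold)
        tr_x = [x for i, x in enumerate(xs) if i not in hold]
        tr_y = [y for i, y in enumerate(ys) if i not in hold]
        model = fit(tr_x, tr_y)
        total += sum((ys[i] - predict(model, xs[i])) ** 2 for i in fold) / len(fold)
    return total / k

# trivial "model": predict the training mean everywhere
fit_mean = lambda xs, ys: sum(ys) / len(ys)
predict_mean = lambda m, x: m

xs = [float(i) for i in range(10)]
ys = [1.0] * 10
print(cross_val_mse(xs, ys, 5, fit_mean, predict_mean))  # → 0.0
```

Setting k = n gives LOOCV; small k (e.g. 5 or 10) trades a little bias for much less computation and variance, as the chapter discusses.
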
- Optionally watch these supplementary videos:
    - Chapter 5: Resampling Methods (slides, playlist)
        - Estimating Prediction Error and Validation Set Approach (14:01)
        - K-fold Cross-Validation (13:33)
        - Cross-Validation: The Right and Wrong Ways (10:07)
        - The Bootstrap (11:29)
        - More on the Bootstrap (14:35)
        - Lab: Cross-Validation (11:21)
        - Lab: The Bootstrap (7:40)

- Take the weekly Moodle quiz to assess your learning by Friday of week 6. This week, we'll apply the material beyond R to emphasize that you're not limited to R for these common machine learning tasks. You'll use the Java-based Weka for polynomial regression and validation, and you'll use my simple Java code for bootstrapping. For this, you will need:
    - A tutorial video for polynomial regression and validation using Weka.
    - week4_2.csv, auto.csv
    - A tutorial video for bootstrapping with my Bootstrap.java code.
    - Bootstrap.java
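
If you'd like a language-neutral picture of the algorithm before watching the tutorial, here is a pure-Python sketch of the core idea (a toy analogue, not the Bootstrap.java code itself): resample the data with replacement many times and examine the spread of the recomputed statistic.

```python
import random
import statistics

def bootstrap_se(data, stat, B=1000, seed=0):
    # estimate the standard error of `stat` by resampling
    # the data with replacement B times
    rng = random.Random(seed)
    reps = []
    for _ in range(B):
        resample = [rng.choice(data) for _ in data]
        reps.append(stat(resample))
    return statistics.stdev(reps)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
se = bootstrap_se(data, statistics.mean)
print(f"bootstrap SE of the mean: {se:.3f}")
```

For the sample mean the result should land close to the analytic s/√n, which is a handy sanity check; the payoff of the bootstrap is that it works the same way for statistics with no such formula.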

Main Topics:

- Chapter 7: Moving Beyond Linearity
    - Polynomial regression
    - Step and basis functions
    - Regression and smoothing splines
    - Local regressions
    - Generalized Additive Models (GAMs)
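
Step functions are the simplest of the chapter's basis-function approaches: cut the predictor's range at chosen points and fit a constant in each piece. A pure-Python sketch with made-up data and a single cutpoint:

```python
import bisect

def fit_step(xs, ys, cuts):
    # piecewise-constant fit: within each region the least-squares
    # estimate is simply the mean response of the training points there
    k = len(cuts) + 1
    sums, counts = [0.0] * k, [0] * k
    for x, y in zip(xs, ys):
        i = bisect.bisect_right(cuts, x)  # which region x falls in
        sums[i] += y
        counts[i] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

def predict_step(means, cuts, x):
    return means[bisect.bisect_right(cuts, x)]

xs = [0, 1, 2, 3, 4, 5]
ys = [1, 1, 1, 5, 5, 5]
means = fit_step(xs, ys, cuts=[2.5])
print(means)  # → [1.0, 5.0]
```

Splines refine this idea by replacing the indicator basis with piecewise polynomials constrained to join smoothly at the cutpoints (knots).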

- Read chapter 7 and complete the guided lab.
- Optionally watch these supplementary videos:
    - Chapter 7: Moving Beyond Linearity (slides, playlist)
        - Polynomial Regression and Step Functions (14:59)
        - Piecewise Polynomials and Splines (13:13)
        - Smoothing Splines (10:10)
        - Local Regression and Generalized Additive Models (10:45)
        - Lab: Polynomials (21:11)
        - Lab: Splines and Generalized Additive Models (12:15)

- Take the weekly Moodle quiz to assess your learning by Friday of week 7. For this, you will need datasets week7_1.csv, week7_2.csv, week7_3.csv

Main Topics:

- Chapter 8: Tree-Based Methods
    - Regression and classification decision trees
    - Bagging
    - Random Forests
    - Boosting
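
At the heart of regression trees is recursive binary splitting: at each node, choose the cut that most reduces the residual sum of squares (RSS). A pure-Python sketch of that single-split search on toy data:

```python
def best_split(xs, ys):
    # greedy search for the one cutpoint that minimizes total RSS,
    # the core step that a tree repeats recursively in each half
    def rss(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = (float("inf"), None)
    order = sorted(zip(xs, ys))
    for i in range(1, len(order)):
        cut = (order[i - 1][0] + order[i][0]) / 2  # midpoint between neighbors
        left = [y for x, y in order[:i]]
        right = [y for x, y in order[i:]]
        total = rss(left) + rss(right)
        if total < best[0]:
            best = (total, cut)
    return best[1]

xs = [1, 2, 3, 10, 11, 12]
ys = [0, 0, 0, 8, 8, 8]
print(best_split(xs, ys))  # → 6.5
```

Bagging and random forests then average many trees grown on bootstrap samples, with random forests also restricting the predictors considered at each split.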

- Read chapter 8 and complete the guided lab.
- Optionally watch these supplementary videos:
    - Chapter 8: Tree-Based Methods (slides, playlist)
        - Decision Trees (14:37)
        - Pruning a Decision Tree (11:45)
        - Classification Trees and Comparison with Linear Models (11:00)
        - Bootstrap Aggregation (Bagging) and Random Forests (13:45)
        - Boosting and Variable Importance (12:03)
        - Lab: Decision Trees (10:13)
        - Lab: Random Forests and Boosting (15:35)

- Take the weekly Moodle quiz to assess your learning by Friday of week 8. For this, you will need dataset carseats.csv and the Weka application.

Main Topics:

- Chapter 9: Support Vector Machines
    - Maximal margin classifier
    - Support vector classifier
    - Support vector machines
    - 1-vs.-1 and 1-vs.-all classification with >2 classes
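
A support vector classifier can be sketched in a few lines with subgradient descent on the hinge loss plus an L2 penalty. (This pedagogical shortcut is not the quadratic-programming formulation the text describes; the one-dimensional toy data is invented, and labels must be coded +1/-1.)

```python
def fit_svc(xs, ys, lam=0.01, lr=0.01, steps=2000):
    # minimize lam/2 * w^2 + sum of hinge losses max(0, 1 - y * (w*x + b))
    w = b = 0.0
    for _ in range(steps):
        gw, gb = lam * w, 0.0
        for x, y in zip(xs, ys):
            if y * (w * x + b) < 1:  # inside the margin (or misclassified)
                gw -= y * x
                gb -= y
        w -= lr * gw
        b -= lr * gb
    return w, b

# toy 1-D data: negative class left of 0, positive class right
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [-1, -1, -1, 1, 1, 1]
w, b = fit_svc(xs, ys)
```

Only points with margin y(wx + b) < 1 ever move the fit — those are the support vectors. Replacing the inner product with a kernel yields the full support vector machine.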

- Read chapter 9 and complete the guided lab.
- Optionally watch these supplementary videos:
- Take the weekly Moodle quiz to assess your learning by Friday of week 9. For this, you will need dataset rollHold.csv and the RStudio application with the e1071 library.

Main Topics:

- Chapter 10: Unsupervised Learning
    - Principal Component Analysis
    - K-means Clustering
    - Hierarchical Clustering
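
K-means alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. A pure-Python one-dimensional sketch on made-up data:

```python
import random

def kmeans_1d(data, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(data, k)  # random initial centroids
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid's cluster
        clusters = [[] for _ in range(k)]
        for x in data:
            j = min(range(k), key=lambda c: abs(x - centers[c]))
            clusters[j].append(x)
        # update step: each centroid moves to its cluster's mean
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return sorted(centers)

data = [1.0, 1.1, 0.9, 10.0, 10.1, 9.9]
print(kmeans_1d(data, 2))  # two well-separated groups → centroids near 1 and 10
```

Because the result depends on the random start, chapter 10 recommends running with several initializations and keeping the solution with the lowest within-cluster sum of squares (the `nstart` argument of R's `kmeans`).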

- Read chapter 10 and complete the guided lab.
- Optionally watch these supplementary videos:
    - Chapter 10: Unsupervised Learning (slides, playlist)
        - Unsupervised Learning and Principal Components Analysis (12:37)
        - Exploring Principal Components Analysis and Proportion of Variance Explained (17:39)
        - K-means Clustering (17:17)
        - Hierarchical Clustering (14:45)
        - Breast Cancer Example of Hierarchical Clustering (9:24)
        - Lab: Principal Components Analysis (6:28)
        - Lab: K-means Clustering (6:31)
        - Lab: Hierarchical Clustering (6:33)

- Take the weekly Moodle quiz to assess your learning by Friday of week 10. For this, you will need dataset iris.csv.

**Acknowledgements**: My thanks to Trevor Hastie, Robert Tibshirani, and Jerome Friedman for their excellent, freely available book The Elements of Statistical Learning (ESL); to Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani for their excellent, freely available book An Introduction to Statistical Learning with Applications in R (ISLR), which provides a gentler, more accessible introduction to the important concepts of ESL; to Trevor Hastie and Robert Tibshirani for making ISLR lecture videos available via YouTube; and to Kevin Markham for creating an index to these videos and associated slides.