DS 256 - Data Science Programming
Course Syllabus
Note: This syllabus is tentative and subject to change. Each reading
assignment should be completed before the class on the date indicated. If a
reading assigned in class does not match the reading assignment here, the
reading assigned in class supersedes.
APPC Note: Since I do not yet know how many class sessions there will be per
week, I can only supply week-level detail for now. Session-level details will be
filled into the schedule below once this information is known.
Week 1 - Introduction to Data Science; Python basics
- What is Data Science?
- Data Science Lifecycle, Demonstration
- Linear Regression
- Worked Examples: CSV loading, data summarization, linear regression, plotting
- Readings: A Whirlwind Tour of Python (AWToP) by Jake VanderPlas, sections 00-03
- Python syntax basics, running code, and basic semantics of variables and objects
Week 2 - Kaggle and Jupyter Notebooks; Python basics
- Readings: AWToP, sections 04-06
- Basic semantics of operators, built-in scalar types, built-in data structures
- Creating a Kaggle account
- Jupyter Notebook Basics
- Kaggle InClass Linear Regressions
Week 3 - Nonlinear and Multiple Regression; Python decisions, loops, and functions
- Readings: AWToP, sections 07-09
- Control statements, functions, exceptions, and exception handling
- Simple scientific visualization for exploratory data analysis
- Nonlinear regression and simple feature engineering
- Multiple regression
Week 4 - Logistic Regression; Python list and iteration constructs
- Readings: AWToP, sections 10-12
- Iterators, list comprehensions, generators
- Logistic Regression
- Categorical Data
- One-Hot Encoding
- Hashing Trick
Week 5 - Classification: Logistic and k-Nearest Neighbor (k-NN); Python code organization, strings and string patterns, data science packages
- Readings: AWToP, sections 13-15
- Modules, packages, strings, regular expressions, data science packages
- Decisions
- Logistic classification
- k-NN classification
Week 6 - Decision Trees and Gradient-Boosted Decision Trees; IPython
- IPython, Readings: Python Data Science Handbook (PDSH) by Jake VanderPlas, Chapter 1
- Data Science:
  - Decision Stumps
  - Decision Trees
  - Gradient-Boosted Decision Trees
Week 7 - Neural Networks and Deep Learning Basics; NumPy
- NumPy, Readings: PDSH, Chapter 2
- Data Science:
  - Artificial Neuron
  - Neural Networks
  - Deep Neural Networks
Week 8 - Clustering; Pandas
- Pandas, Readings: PDSH, Chapter 3, part 1
- Data Science:
  - k-Means Clustering
  - Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
  - Hierarchical Clustering
Week 9 - Dimensionality Reduction; Pandas
- Pandas, Readings: PDSH, Chapter 3, part 2
- Data Science:
  - Principal Component Analysis (PCA)
  - t-Distributed Stochastic Neighbor Embedding (t-SNE)
Week 10 - Data Acquisition; Machine Learning
- Machine Learning, Readings: PDSH, Chapter 5
- Data Science:
(Readings for weeks 11-14 will consist of web articles and summary notes that I will supply in Jupyter notebook format.)
Week 11 - Data Cleaning and Preparation
- Missing values: elimination and imputation
- Error values: duplicates, outliers, constraint violations
- Sampling: simple, stratified, reservoir, oversampling, undersampling
Week 12 - Exploratory Data Analysis and Feature Engineering
- Kaggle Exploratory Data Analysis Case Studies
- Review: normalization, transformation, feature selection
- Autoencoders
- Ensembles
Week 13 - Validation and Model Assessment
- Holdout, k-fold cross-validation, repeated random subsampling cross-validation
- Feature significance tools and measures
- Closing the loop: iterative feature engineering
Week 14 - Scientific Visualization
- Survey and comparison of visualization packages:
  - Matplotlib, Readings: PDSH, Chapter 4
  - Plotly
  - Seaborn
Todd Neller