Tentative Schedule: Slides, Readings, References

· (DM) Data Mining: Concepts and Techniques, 3rd edition, by Jiawei Han, Micheline Kamber, and Jian Pei. Morgan Kaufmann, 2011.

· (IR) Introduction to Information Retrieval, by Christopher Manning, Prabhakar Raghavan, and Hinrich Schütze. Cambridge University Press, 2008. https://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf

· (PY) Introduction to Machine Learning with Python: A Guide for Data Scientists, by Andreas C. Müller and Sarah Guido. O’Reilly, 2016.

 

 

Date

Slides

Chapters

Additional materials

Week 1

Syllabus and Introduction 

DM Chapters 1, 2

 

 

Preparation, Python tutorial, Jupyter tutorial

 

Week 2

Data acquisition, web scraping

 

 

 

Data labeling

Week 3

Linear algebra, probability, and statistics review

 

linear algebra quick review, probability quick review, statistics basics

Week 4

NumPy and pandas tutorials, linear algebra examples, pandas examples with simple datasets: sample datasets

 

 

 

Data cleaning

DM 3.1, 3.2; IR 2.1, 2.2

house sales data and data cleaning notebook

Week 5

Data transformations

IR 6.2, 6.3, 6.4

 

Week 5, 6

Feature engineering

 

Project 1 announced

Week 6

Overview of machine learning models

DM 8.1

 

Week 7

Linear models

 

SGD animation, linear regression SGD code

Week 8

Neural networks

DM 7.5

perceptron code, softmax regression code

Week 9

Model evaluation

DM 8.5

Project 2 announced

Week 10

Decision tree models

DM 8.2, 8.6

Week 11

Ensemble methods, model validation

 

 

Week 12

Feature selection

IR 13.5

decision tree exercise

Week 12

Dimensionality reduction

 

Clustering

DM 10.1-10.3

PCA and t-SNE demo, k-means demo

 

Frequent itemsets and rule mining

DM 6.1, 6.2