Three Units
- Tabular data
- Different types of data
- Machine learning
Tabular data
- Summarizing, visualizing, describing
- Pandas
- Vectorization (broadcasting, transformations)
- Split-apply-combine (groupby)
- Grammar of graphics (plotly, altair)
- Reshaping data (stack/unstack, melt)
- Joining data (merge)
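A minimal pandas sketch of several operations listed above: vectorized column arithmetic, split-apply-combine with groupby, reshaping with melt, and joining with merge. The table, its column names, and the manager lookup are hypothetical data made up for illustration.

```python
import pandas as pd

# Hypothetical data: quarterly sales by region (made up for illustration).
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "q1": [10, 20, 30, 40],
    "q2": [15, 25, 35, 45],
})

# Vectorization: arithmetic on whole columns at once, no explicit loop.
sales["total"] = sales["q1"] + sales["q2"]

# Split-apply-combine: split by region, sum each group, combine the results.
totals = sales.groupby("region")["total"].sum()

# Reshaping: melt the wide quarter columns into long (tidy) format.
long = sales.melt(id_vars="region", value_vars=["q1", "q2"],
                  var_name="quarter", value_name="amount")

# Joining: attach a (hypothetical) manager to each row by region.
managers = pd.DataFrame({"region": ["North", "South"], "manager": ["Ana", "Raj"]})
joined = long.merge(managers, on="region", how="left")

print(totals)
print(joined)
```

Here `totals` has one row per region, while `joined` has one row per (region, quarter) observation with the manager attached.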
Types of Data
- Tabular data
- Text
- Hierarchical (JSON, XML)
- Time series
- Geospatial
- (Image, briefly on Assignment 7B)
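Hierarchical data often has to be flattened into a table before the tabular tools apply. A small sketch using `json.loads` and `pandas.json_normalize`; the JSON records below are hypothetical, standing in for something an API might return.

```python
import json

import pandas as pd

# Hypothetical nested JSON, e.g. the body of an API response.
raw = """
[
  {"id": 1, "name": "Ana", "address": {"city": "San Luis Obispo", "zip": "93401"}},
  {"id": 2, "name": "Raj", "address": {"city": "Fresno", "zip": "93721"}}
]
"""

records = json.loads(raw)          # list of (nested) dicts
flat = pd.json_normalize(records)  # nested keys become dotted column names
print(flat)
```

The nested `address` object turns into the columns `address.city` and `address.zip`.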
Machine Learning: Supervised
- Regression
  - Linear
  - K-nearest neighbors
- Classification
  - K-nearest neighbors
  - (Decision trees and logistic regression, only briefly)
Machine Learning: Supervised (continued)
- Test vs train error
- Cross-validation
- Model selection and hyperparameter tuning
- Ensemble methods
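Test vs. train error, cross-validation, and hyperparameter tuning fit together in one workflow. The sketch below uses scikit-learn to choose k for a k-nearest-neighbors regressor by 5-fold cross-validation on the training set, then reports test MSE on held-out data. The synthetic data from `make_regression` and the grid of k values are assumptions chosen for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption: stands in for a real dataset).
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Scale features first, since KNN is distance-based.
pipe = make_pipeline(StandardScaler(), KNeighborsRegressor())

# 5-fold cross-validation over the number of neighbors k.
grid = GridSearchCV(
    pipe,
    param_grid={"kneighborsregressor__n_neighbors": [1, 3, 5, 10, 20]},
    scoring="neg_mean_squared_error",
    cv=5,
)
grid.fit(X_train, y_train)  # refits the best model on all training data

best_k = grid.best_params_["kneighborsregressor__n_neighbors"]
train_mse = mean_squared_error(y_train, grid.predict(X_train))
test_mse = mean_squared_error(y_test, grid.predict(X_test))
print(best_k, train_mse, test_mse)
```

The test set is touched only once, after tuning; choosing k by test error instead would make the reported test MSE optimistic.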
Machine Learning: Unsupervised
- K-means clustering
- Hierarchical clustering
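A from-scratch sketch of k-means (Lloyd's algorithm) in plain Python, alternating an assignment step and an update step until the centroids stop moving. Initializing with k randomly sampled data points is one common choice, assumed here; libraries such as scikit-learn use smarter initialization by default.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its cluster.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments will not change
        centroids = new_centroids
    return labels, centroids

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(pts, k=2)
print(labels, centers)
```

On these two well-separated groups, the three points near the origin end up in one cluster and the three near (10, 10) in the other.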
“Everything is Numbers”
- One-hot encoding of categorical variables
- TF/TF-IDF representation of text
- Dates
- Map projections
- (Image, briefly on Assignment 7B)
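Two of the "everything is numbers" conversions, sketched on made-up data: one-hot encoding with `pandas.get_dummies`, and TF-IDF computed by hand as raw term count times log(N / df). This unsmoothed IDF formula is one textbook variant; scikit-learn's `TfidfVectorizer` uses a smoothed, normalized variant by default, so its numbers will differ.

```python
import math
from collections import Counter

import pandas as pd

# One-hot encoding: one 0/1 column per category (hypothetical data).
colors = pd.DataFrame({"color": ["red", "green", "red"]})
one_hot = pd.get_dummies(colors["color"], prefix="color", dtype=int)

# TF-IDF for a tiny, made-up corpus.
docs = ["the cat sat", "the dog sat", "the cat ran"]
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Document frequency: number of documents containing each word.
doc_freq = Counter(word for doc in tokenized for word in set(doc))

def tf_idf(doc):
    """Raw term count times inverse document frequency log(N / df)."""
    tf = Counter(doc)
    return {w: tf[w] * math.log(n_docs / doc_freq[w]) for w in tf}

vectors = [tf_idf(doc) for doc in tokenized]
print(one_hot)
print(vectors)
```

A word that appears in every document, like "the", gets weight 0, since log(3 / 3) = 0.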
Distance Metrics
- Variability (SD versus MAD)
- Distance between observations
- Document similarity (cosine distance)
- Test and train error (MSE)
- K-nearest neighbors
- K-means clustering
- Hierarchical clustering
- Geospatial distance (haversine)
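A few of these metrics written out in plain Python. MAD is taken here to mean the mean absolute deviation from the mean (an assumption; some texts use the median absolute deviation instead), and the haversine function assumes a spherical Earth with a mean radius of 6371 km.

```python
import math
import statistics

def sd(xs):
    """Population standard deviation: root-mean-square deviation from the mean."""
    m = statistics.fmean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def mad(xs):
    """Mean absolute deviation from the mean (assumed meaning of MAD)."""
    m = statistics.fmean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

def cosine_distance(u, v):
    """1 minus the cosine of the angle between u and v (0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1 - dot / (math.hypot(*u) * math.hypot(*v))

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sd(data), mad(data))                  # squaring weights large deviations more
print(cosine_distance([1, 2], [2, 4]))      # parallel vectors: distance ~0
print(haversine_km(0, 0, 0, 90))            # a quarter of the equator
```

Cosine distance ignores vector length, which is why it suits documents of different sizes: a document and a copy of itself repeated twice have distance 0.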
Software skills
- Colab notebooks
- Python
- Pandas
- Plotly, Altair
- Beautiful Soup (webscraping)
- Working with APIs
- Scikit-learn
- Geopandas
Cross Disciplinary Studies Minor (CDSM) in Data Science
- Not really a “minor”
- Curriculum
- Successful completion of DATA 301 is a prerequisite
- See Dr. Glanz (Statistics)
- Even if you’re not interested in the minor, you might be interested in some of the courses!
DATA 401/402/403
- Three concurrent courses
- Many of the same topics as DATA 301, but more…
  - More data
  - More depth
  - More math (e.g., maximum likelihood, loss functions, gradient descent)
  - More methods (e.g., decision trees, neural nets)
  - More programming (e.g., implementing from scratch)
  - More applications (in particular, DATA 403 is a projects lab)
- Prerequisites: DATA 301, CSC 365, CSC 466, STAT 334
Thanks!
- Thanks for taking the course!
- Thanks for your patience and understanding!
- Thanks in advance for your feedback on the course evaluation!