ML_Intro_Notebooks

This is a series of notebooks tracking my progress as I read and practice the concepts presented by Müller and Guido in the book *Introduction to Machine Learning with Python: A Guide for Data Scientists*.

Chapter 1: Basic ML concepts and the first example with Iris dataset and KNN Classifier.
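A minimal sketch of the chapter 1 example (Iris dataset with a 1-nearest-neighbor classifier), assuming scikit-learn is installed:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the Iris dataset and hold out a test set
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a KNN classifier with a single neighbor and evaluate it
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
score = knn.score(X_test, y_test)
```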

Chapter 2: Overview of the main supervised ML algorithms and their trade-offs.

Nearest Neighbors
	- Easy to explain
	- Good as a baseline
	- Not well suited to large or high-dimensional datasets
	- Prediction time grows with the size of the training set

Linear Models
	- Good for large, high-dimensional, sparse datasets
	- Usually fast to train and predict
	- Easy to explain
	- Some variants (e.g., Lasso) can perform feature selection
	- Sensitive to scaling
	- Sensitive to parameter tuning
	- Decision boundaries are limited to hyperplanes
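A short sketch of the feature-selection point, assuming scikit-learn: Lasso's L1 penalty drives the coefficients of uninformative features to exactly zero (the synthetic dataset and `alpha` value here are illustrative choices, not from the book):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 30 features, only 5 of which actually carry signal
X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=1.0, random_state=0)

# The L1 penalty zeroes out coefficients of uninformative features
lasso = Lasso(alpha=1.0).fit(X, y)
n_used = int(np.sum(lasso.coef_ != 0))
```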

Naive Bayes
	- Very fast to train
	- Classification only
	- Good for large, high-dimensional datasets
	- Often less accurate than Linear Models
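A minimal sketch of a Naive Bayes classifier with scikit-learn (the dataset choice is mine, for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# GaussianNB fits per-class feature means/variances in a single pass
gnb = GaussianNB().fit(X_train, y_train)
score = gnb.score(X_test, y_test)
```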

Decision Trees
	- Very fast
	- Robust to scaling
	- Very easy to explain (the tree can be visualized)
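A small sketch (my own illustration, assuming scikit-learn) of why trees are robust to scaling: splits depend only on the ordering of feature values, so rescaling the features leaves the predictions unchanged:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Fit one tree on the raw data and one on a rescaled copy
tree_raw = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_scaled = DecisionTreeClassifier(random_state=0).fit(X * 1000, y)

# Splits only use value ordering, so predictions are identical
same = np.array_equal(tree_raw.predict(X), tree_scaled.predict(X * 1000))
```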

Random Forests
	- Better than a single Decision Tree
	- Very robust and powerful
	- Robust to scaling
	- Not well suited to high-dimensional sparse data
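A minimal Random Forest sketch with scikit-learn (the two-moons dataset is an illustrative choice on my part):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An ensemble of randomized trees; predictions are averaged over trees
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
score = forest.score(X_test, y_test)
```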

Gradient Boosted Decision Trees
	- Often better than Random Forests
	- Slower to train than Random Forests, but faster to predict and smaller in memory
	- Often needs parameter tuning
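A sketch of the main tuning knobs, assuming scikit-learn: `learning_rate` and `max_depth` are the parameters that usually need attention (the values below are illustrative defaults, not recommendations from the book):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Shallow trees added sequentially; each corrects the previous ones' errors
gbrt = GradientBoostingClassifier(learning_rate=0.1, max_depth=3,
                                  random_state=0).fit(X_train, y_train)
score = gbrt.score(X_test, y_test)
```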

Support Vector Machines
	- Powerful for medium-sized datasets
	- Requires scaling
	- Very sensitive to parameter tuning
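Because SVMs require scaling, a pipeline that chains a scaler and the classifier is the usual pattern. A minimal sketch with scikit-learn (dataset chosen by me for illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling happens inside the pipeline, so the test set is transformed
# with statistics learned from the training set only
svm = make_pipeline(StandardScaler(), SVC(C=1.0)).fit(X_train, y_train)
score = svm.score(X_test, y_test)
```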

Neural Networks
	- Can build very complex models
	- Sensitive to scaling of the data
	- Sensitive to parameter tuning
	- Long time to train
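A minimal neural-network sketch with scikit-learn's `MLPClassifier` (my own illustration; like SVMs, it is paired with a scaler because of the sensitivity to scaling noted above):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A small multilayer perceptron on standardized features
mlp = make_pipeline(StandardScaler(),
                    MLPClassifier(max_iter=1000, random_state=0))
mlp.fit(X_train, y_train)
score = mlp.score(X_test, y_test)
```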

Chapter 3: Unsupervised Learning and Preprocessing

Scaling
	- StandardScaler
	- RobustScaler
	- MinMaxScaler
	- Normalizer
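All four scalers share the same `fit`/`transform` API. A quick sketch with `MinMaxScaler`, which rescales each feature to the [0, 1] range (the toy array is mine):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0, -2.0],
              [3.0,  0.0],
              [5.0,  6.0]])

# Each column is mapped linearly so its min becomes 0 and its max becomes 1
X_scaled = MinMaxScaler().fit_transform(X)
lo, hi = X_scaled.min(), X_scaled.max()
```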

Dimensionality Reduction, Feature Extraction and Manifold Learning 
	- Principal Component Analysis (PCA)
	- Non-Negative Matrix Factorization
	- Manifold Learning with t-SNE
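A minimal PCA sketch with scikit-learn, projecting the Iris data onto its first two principal components (dataset choice is mine):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Keep the two directions of maximum variance
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
retained = pca.explained_variance_ratio_.sum()
```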

Clustering
	- *k*-Means Clustering
	- Agglomerative Clustering
	- DBSCAN
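A minimal *k*-means sketch with scikit-learn on synthetic blobs (the dataset and `n_clusters` value are illustrative choices):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Three well-separated Gaussian blobs
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# k-means alternates assigning points to centers and recomputing centers
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels = km.labels_
```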

Clustering Evaluation
	- Adjusted Rand Index (ARI)
	- Normalized Mutual Information (NMI)
	- Silhouette Coefficient
	- Robustness-based clustering metrics
	- Qualitative Method
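A quick sketch of ARI with scikit-learn: it compares two cluster assignments while ignoring the arbitrary label names, so a perfect clustering scores 1.0 even when the labels are permuted (the toy assignments are mine):

```python
from sklearn.metrics import adjusted_rand_score

# Same partition of four points, but with the cluster ids swapped
ari = adjusted_rand_score([0, 0, 1, 1], [1, 1, 0, 0])
```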

Chapter 4: Representing Data and Engineering Features

Chapter 5: Model Evaluation and Improvement
