ML_Intro_Notebooks

This is a series of notebooks tracking my progress in reading and practicing the concepts presented by Andreas C. Müller and Sarah Guido in the book Introduction to Machine Learning with Python: A Guide for Data Scientists.

Chapter 1: Basic ML concepts and a first example using the Iris dataset and a k-NN classifier.
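
A minimal sketch of that first example, assuming scikit-learn is installed. The split seed and `n_neighbors=1` are illustrative choices, not necessarily the ones used in the notebook:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the Iris data and hold out a test set (seed chosen for illustration).
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

# Fit a k-NN classifier that looks at the single nearest neighbor.
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

# Fraction of test samples classified correctly.
test_accuracy = knn.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.2f}")
```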

Chapter 2: Overview of supervised ML algorithms.

Nearest Neighbors
	- Easy to explain
	- Good as baseline
	- Not good for large or high-dimensional datasets
	- Prediction can be slow on large training sets

Linear Models
	- Good for large and high-dimensional sparse datasets
	- Usually fast
	- Easy to explain
	- Some can perform feature selection
	- Sensitive to scaling
	- Sensitive to parameter tuning
	- Models are limited to hyperplanes
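
As a rough illustration of the feature-selection point, here is a sketch using Lasso (L1-regularized linear regression) on synthetic data. The dataset shape and `alpha=1.0` are assumptions picked for the demo; in practice `alpha` needs tuning:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression problem: 50 features, only 5 carry signal.
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)

# Scale first, since linear models are sensitive to feature scale.
model = make_pipeline(StandardScaler(), Lasso(alpha=1.0))
model.fit(X, y)

# L1 regularization drives some coefficients to exactly zero.
coef = model.named_steps["lasso"].coef_
n_used = int(np.sum(coef != 0))
print(f"Features used: {n_used} of {X.shape[1]}")
```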

Naive Bayes
	- Very fast to train and predict
	- Only for classification
	- Good for large and high-dimensional datasets
	- Often less accurate than Linear Models
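
A small sketch of Naive Bayes on sparse word counts. The four-document corpus and its labels are made up for illustration only:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus (hypothetical): 1 = spam, 0 = not spam.
docs = [
    "free prize money now",
    "win money free",
    "meeting schedule tomorrow",
    "project meeting notes",
]
labels = [1, 1, 0, 0]

# Turn text into a sparse matrix of word counts.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# Fit multinomial Naive Bayes on the counts.
clf = MultinomialNB()
clf.fit(X, labels)

# "free" and "money" only occur in the spam documents.
prediction = clf.predict(vectorizer.transform(["free money"]))[0]
print(prediction)
```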

Decision Trees
	- Very fast
	- Robust to scaling
	- Very easy to explain
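
To see why trees are easy to explain, a fitted tree can be printed as plain if/else rules. This sketch assumes a scikit-learn version that provides `export_text`; `max_depth=2` is chosen only to keep the output short:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree, fitted on unscaled features.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Render the learned splits as readable text rules.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```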

Random Forests
	- Better than a Decision Tree alone
	- Very robust and powerful
	- Robust to scaling
	- Not well suited to high-dimensional sparse data
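
A sketch comparing a single tree with a forest on the breast cancer dataset that ships with scikit-learn. No scaling is applied; `n_estimators=100` and the split seed are illustrative assumptions, and the exact scores depend on the split:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=0)

# One fully grown tree versus an ensemble of 100 randomized trees.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

tree_acc = tree.score(X_test, y_test)
forest_acc = forest.score(X_test, y_test)
print(f"Tree: {tree_acc:.3f}  Forest: {forest_acc:.3f}")

# Importances are aggregated over all trees in the forest.
importances = forest.feature_importances_
```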

Gradient Boosted Decision Trees
	- Often better than Random Forests
	- Slower to train than Random Forests, but faster to predict and smaller in memory
	- Often needs parameter tuning
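
The parameters that usually need tuning are the number of trees, the learning rate, and the tree depth. A sketch with assumed values (not tuned, and not necessarily those used in the notebook):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=0)

# Shallow trees plus a moderate learning rate act as regularization.
gbrt = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=1, random_state=0)
gbrt.fit(X_train, y_train)

train_acc = gbrt.score(X_train, y_train)
test_acc = gbrt.score(X_test, y_test)
print(f"Train: {train_acc:.3f}  Test: {test_acc:.3f}")
```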

Support Vector Machines
	- Powerful for medium-size datasets
	- Requires scaling
	- Very sensitive to parameter tuning
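
A sketch of the scaling requirement: the same kernel SVM fitted on raw features and inside a pipeline with `MinMaxScaler`. `C` and `gamma` are left at illustrative defaults; both typically need tuning, and the size of the gap depends on the data and the library version:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=0)

# RBF-kernel SVM on the raw, unscaled features.
raw_svc = SVC(C=1.0, gamma="scale").fit(X_train, y_train)
raw_acc = raw_svc.score(X_test, y_test)

# Same model, with every feature rescaled to [0, 1] first.
# The scaler is fitted on the training data only.
scaled_svc = make_pipeline(MinMaxScaler(), SVC(C=1.0, gamma="scale"))
scaled_svc.fit(X_train, y_train)
scaled_acc = scaled_svc.score(X_test, y_test)

print(f"Raw: {raw_acc:.3f}  Scaled: {scaled_acc:.3f}")
```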

Neural Networks
	- Can build very complex models
	- Sensitive to scaling of the data
	- Sensitive to parameter tuning
	- Long time to train
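
A small multilayer perceptron on a synthetic two-moons dataset, as a sketch. The layer sizes, solver, and seeds are assumptions for the demo; real problems usually need a search over these settings:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Two interleaving half-circles: not separable by a straight line.
X, y = make_moons(n_samples=200, noise=0.25, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# Scale inputs, then fit a network with two hidden layers of 10 units.
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(10, 10), solver="lbfgs",
                  max_iter=1000, random_state=0),
)
mlp.fit(X_train, y_train)

mlp_acc = mlp.score(X_test, y_test)
print(f"Test accuracy: {mlp_acc:.2f}")
```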

Chapter 3: Unsupervised Learning and Preprocessing

Scaling
	- StandardScaler
	- RobustScaler
	- MinMaxScaler
	- Normalizer
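
The four scalers can be compared on a tiny made-up array. The sketch below fits each one on the training data only and applies the same transformation to the test data:

```python
import numpy as np
from sklearn.preprocessing import (MinMaxScaler, Normalizer, RobustScaler,
                                   StandardScaler)

# Hypothetical data: the second feature has a much larger range
# and one outlier (1000).
X_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0], [4.0, 1000.0]])
X_test = np.array([[2.5, 350.0]])

scalers = {
    "StandardScaler": StandardScaler(),  # zero mean, unit variance per feature
    "RobustScaler": RobustScaler(),      # median and quartiles per feature
    "MinMaxScaler": MinMaxScaler(),      # each feature mapped to [0, 1]
    "Normalizer": Normalizer(),          # each row rescaled to unit length
}

scaled = {}
for name, scaler in scalers.items():
    scaler.fit(X_train)  # learn statistics from the training set only
    scaled[name] = (scaler.transform(X_train), scaler.transform(X_test))
    print(name, scaled[name][1])
```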

Dimensionality Reduction, Feature Extraction and Manifold Learning 
	- Principal Component Analysis (PCA)
	- Non-Negative Matrix Factorization
	- Manifold Learning with t-SNE
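
A sketch applying the three techniques to a subset of the digits dataset bundled with scikit-learn. The subset size and component counts are arbitrary choices made to keep the run short:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import NMF, PCA
from sklearn.manifold import TSNE

# 8x8 digit images flattened to 64 non-negative pixel features.
digits = load_digits()
X = digits.data[:500]

# PCA: project onto the two directions of highest variance.
X_pca = PCA(n_components=2).fit_transform(X)

# NMF: non-negative parts-based representation (needs non-negative input).
nmf = NMF(n_components=10, init="nndsvd", max_iter=500, random_state=0)
X_nmf = nmf.fit_transform(X)

# t-SNE: 2D embedding for visualization; it cannot transform new data.
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)

print(X_pca.shape, X_nmf.shape, X_tsne.shape)
```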

Clustering
	- *k*-Means Clustering
	- Agglomerative Clustering
	- DBSCAN
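
The three algorithms share the same `fit_predict` interface. A sketch on synthetic blobs, where `eps` and `min_samples` for DBSCAN are untuned assumptions:

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Three synthetic clusters, scaled so distances are comparable.
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.6, random_state=0)
X = StandardScaler().fit_transform(X)

# k-means and agglomerative clustering need the number of clusters up front.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
agglo_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# DBSCAN infers the number of clusters; the label -1 marks noise points.
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

print(set(kmeans_labels), set(agglo_labels), set(dbscan_labels))
```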

Clustering Evaluation
	- Adjusted Rand Index (ARI)
	- Normalized Mutual Information (NMI)
	- Silhouette Coefficient
	- Robustness-based clustering metrics
	- Qualitative Method
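
A sketch of the first three metrics. ARI and NMI need ground-truth labels, while the silhouette coefficient only needs the data and the cluster assignment. The toy labels and blob data are made up for illustration; the robustness-based and qualitative methods are not shown:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (adjusted_rand_score,
                             normalized_mutual_info_score, silhouette_score)

# Same grouping, different label names: cluster ids are arbitrary,
# so both scores report a perfect match (1.0).
y_true = [0, 0, 1, 1]
y_pred = [1, 1, 0, 0]
ari = adjusted_rand_score(y_true, y_pred)
nmi = normalized_mutual_info_score(y_true, y_pred)

# Silhouette needs no ground truth; it ranges from -1 to 1.
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
sil = silhouette_score(X, labels)

print(f"ARI: {ari:.2f}  NMI: {nmi:.2f}  Silhouette: {sil:.2f}")
```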

Chapter 4: Representing Data and Engineering Features

Chapter 5: Model Evaluation and Improvement

