This repository holds example Jupyter notebooks and tutorials to help users get started with data science and machine learning. We recommend looking at Quickstart.ipynb first. The other example notebooks cover creating Kubeflow Pipelines, developing Plotly Dash apps, TensorFlow, scikit-learn, PyTorch and more.
- Authors: Christian Ritter, Blair Drummond, Andrew Scribner
Kubeflow Pipelines allows users to build and deploy scalable machine learning workflows on Docker containers.
Contains two basic example notebooks: building a simple pipeline using dockerized components and using lightweight Kubeflow pipeline components.
More information about KFP and the KFP SDK API
This is another Kubeflow Pipelines example, which uses the map-reduce pattern: a first step (map) runs over each input, and a second step (reduce) aggregates all of the map results. These examples also include pipelines that write data to MinIO.
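The map-reduce pattern itself can be sketched in plain Python, without the Kubeflow Pipelines SDK. This is only an illustration of the pattern, not the pipeline from the notebook: `map_step`, `reduce_step` and `run_map_reduce` are hypothetical names, and in a real pipeline each function would be packaged as a pipeline component.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_step(chunk):
    """Map: compute a partial result (here, a sum) for one chunk of data."""
    return sum(chunk)


def reduce_step(partials):
    """Reduce: combine all partial results into a single value."""
    return reduce(lambda a, b: a + b, partials, 0)


def run_map_reduce(chunks):
    # Each map call is independent of the others, so the calls can run
    # in parallel, much like a pipeline fanning out over its inputs.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_step, chunks))
    # The reduce step starts only once every map result is available.
    return reduce_step(partials)


chunks = [[1, 2, 3], [4, 5], [6]]
total = run_map_reduce(chunks)
print(total)
```

The key property is that the reduce step depends on every map result, while the map calls do not depend on each other.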
There are 3 examples of plotting libraries used: Jupyter Dash, Matplotlib and Plotly.
Using the Jupyter Dash library, it is easy to develop Plotly Dash apps interactively within Jupyter environments.
Link to Jupyter Dash Repo: https://github.com/plotly/jupyter-dash
Leveraging the Jupyter interactive widgets framework, ipympl enables the interactive features of Matplotlib in the Jupyter notebook and in JupyterLab. Overview from Matplotlib readme.
Visit https://matplotlib.org/ for more examples, references and tutorials.
Plotly is a declarative charting library with over 30 chart types, including scientific charts, 3D graphs, statistical charts, SVG maps, financial charts, and more. Overview from Plotly readme.
The Jupyter notebook tutorial contains examples of creating a 3D scatter plot, a 3D scatter plot with a regression surface, a line plot with an adjacent histogram, and a bubble map.
The CANSIM Jupyter notebook explores the CANSIM API created by Statistics Canada. It covers functions such as getCodeSets, getAllCubeList and getCubeMetaData from Statistics Canada's Web Data Service (WDS), and downloads the data as CSV files.
For more information about the WDS, visit: https://www.statcan.gc.ca/eng/developers/wds/user-guide
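As a rough sketch of what a WDS call looks like, the snippet below builds a GET request for getCodeSets using only the standard library. The base URL is an assumption (it is not stated in this README) and should be checked against the user guide; `build_wds_request` and `call_wds` are hypothetical helper names, not functions from the notebook.

```python
import json
import urllib.request

# Assumed base URL for the WDS REST endpoints; confirm it in the user guide.
WDS_BASE_URL = "https://www150.statcan.gc.ca/t1/wds/rest"


def build_wds_request(method):
    """Build (but do not send) a GET request for a WDS method such as getCodeSets."""
    return urllib.request.Request(f"{WDS_BASE_URL}/{method}", method="GET")


def call_wds(method, timeout=30):
    """Send the request and decode the JSON response. Requires network access."""
    with urllib.request.urlopen(build_wds_request(method), timeout=timeout) as resp:
        return json.load(resp)


request = build_wds_request("getCodeSets")
print(request.full_url)
```

Calling `call_wds("getCodeSets")` from a notebook with network access would return the decoded JSON. Methods that take parameters may need a different request shape, so check the user guide before adapting this.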
This Jupyter notebook walks through a PyTorch tutorial on using the torchtext library to build a dataset for text classification.
For more information and tutorials, visit the PyTorch homepage.
TorchText library Docs: https://pytorch.org/text/stable/index.html
This notebook runs SQL queries on the S3-compatible storage system MinIO. MinIO's API is compatible with the S3 SELECT API. It is not suited to joins or other relational database operations, but it is very good at extracting exactly the data you need, which keeps queries fast. Examples include querying data with SQL in .csv.gz, .parquet and .csv formats.
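A minimal sketch of an S3 SELECT query against a gzipped CSV file might look like the following. It uses boto3's `select_object_content`, which is a swapped-in choice and may differ from the client used in the notebook. The bucket name, object key, endpoint and credentials are all placeholders, and `build_select_kwargs` and `run_select` are hypothetical helper names.

```python
def build_select_kwargs(bucket, key, sql):
    """Build keyword arguments for an S3 SELECT call on a gzipped CSV with a header row."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,
        "InputSerialization": {
            "CSV": {"FileHeaderInfo": "USE"},  # first row holds column names
            "CompressionType": "GZIP",
        },
        "OutputSerialization": {"CSV": {}},
    }


def run_select(endpoint_url, access_key, secret_key, bucket, key, sql):
    """Run the query against an S3-compatible endpoint and return the matching rows as text."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    response = client.select_object_content(**build_select_kwargs(bucket, key, sql))
    chunks = []
    # The response is an event stream; only "Records" events carry row data.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks).decode("utf-8")


kwargs = build_select_kwargs(
    "example-bucket",        # hypothetical bucket name
    "data/example.csv.gz",   # hypothetical object key
    "SELECT s.* FROM S3Object s LIMIT 5",
)
print(kwargs["Expression"])
```

Only the rows and columns selected by the SQL expression are sent back, which is why this approach works well for pulling a small slice out of a large file.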
Contains a demo of creating scatter plots in R using the ggplot2 library. Documentation on ggplot2
Also contains a Jupyter notebook demo of an interactive world map visualization in R, demonstrating how well R and Jupyter work together.
Uses the Iris dataset with scikit-learn, running decision tree classifiers on feature subsets.
Link to the scikit-learn docs: https://scikit-learn.org/stable/
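One way to run decision tree classifiers on feature subsets is sketched below. It assumes the subsets are pairs of features and that accuracy is estimated with 5-fold cross-validation; the notebook may choose subsets and evaluate them differently.

```python
from itertools import combinations

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target

scores = {}
# Try every pair of the four Iris features.
for i, j in combinations(range(X.shape[1]), 2):
    clf = DecisionTreeClassifier(random_state=0)
    # Mean accuracy over 5-fold cross-validation on this two-feature subset.
    accuracy = cross_val_score(clf, X[:, [i, j]], y, cv=5).mean()
    scores[(iris.feature_names[i], iris.feature_names[j])] = accuracy

# Print the feature pairs from most to least accurate.
for pair, accuracy in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(pair, round(accuracy, 3))
```

Comparing the scores shows which pairs of measurements separate the three species best.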
Consists of demos showing how to connect to the bucket storage system using the MinIO client and the s3fs library, in Python and R.
A demo of using the Web Data Service developed by Statistics Canada, which provides access to the data and metadata that Statistics Canada releases, in R and Python.
More information about Web Data Service (WDS)
A tutorial from the TensorFlow homepage on image classification using Keras.
Example notebooks are mounted on all user notebooks in the /aaw-contrib-jupyter-notebooks folder. This is done via the start-custom.sh script in aaw-kubeflow-containers.
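To see which example notebooks are available from inside a notebook server, a small helper like the one below can list them. `list_notebooks` is a hypothetical name, and the sketch assumes the folder is mounted at the path given above; it returns an empty list if the folder is missing.

```python
from pathlib import Path

# Mount point named in the text above.
NOTEBOOK_ROOT = Path("/aaw-contrib-jupyter-notebooks")


def list_notebooks(root=NOTEBOOK_ROOT):
    """Return all .ipynb files under root, sorted, skipping checkpoint copies."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*.ipynb")
        if ".ipynb_checkpoints" not in p.parts
    )


for notebook in list_notebooks():
    print(notebook)
```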