This repository demonstrates Spark's ML (MLlib) library in PySpark, with notebooks covering regression, classification, and clustering techniques. It also shows how to create RDDs and apply transformations and actions to them.
- Regression :-> Spark-Regression-ML(LIB).ipynb
- Classification :-> Spark-Classification-ML(LIB).ipynb
- Clustering :-> Spark-Clustering-ML(LIB) .ipynb
- Spark_RDD :-> Airport_problem and Word Count
- PySpark_Cheat_sheet :-> PySpark_SQL_Cheat_Sheet_Python.pdf
All of these notebooks were created on the IBM Watson cloud platform, which provides a Python + Spark environment in a Python notebook.
In that environment, a Spark context is already created and provided as the variable sc.
To check the Spark version, type: sc.version