MLPipeline-Lab1-EdX

Power Plant Machine Learning Pipeline Application: edX Lab 1, Big Data Analysis with Apache Spark

This notebook is an end-to-end exercise of performing Extract-Transform-Load and Exploratory Data Analysis on a real-world dataset, and then applying several different machine learning algorithms to solve a supervised regression problem on the dataset.

**This notebook covers:**

  • Part 1: Business Understanding
  • Part 2: Load Your Data
  • Part 3: Explore Your Data
  • Part 4: Visualize Your Data
  • Part 5: Data Preparation
  • Part 6: Data Modeling
  • Part 7: Tuning and Evaluation

Our goal is to accurately predict power output given a set of environmental readings from various sensors in a natural gas-fired power generation plant.
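
To give a feel for what a supervised regression workflow looks like, here is a minimal sketch in plain Python. It is not the notebook's code: the notebook uses Apache Spark, while this sketch uses only the standard library. The data is synthetic and invented for illustration (a single hypothetical temperature reading and a made-up linear relationship with output), and the helper names `fit_simple_linear` and `rmse` are assumptions, not names from the lab.

```python
import math
import random


def fit_simple_linear(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Synthetic stand-in data (NOT the power plant dataset): one hypothetical
# temperature feature and an output that falls linearly with it, plus noise.
rng = random.Random(42)
temps = [rng.uniform(5.0, 35.0) for _ in range(200)]
outputs = [500.0 - 2.0 * t + rng.gauss(0.0, 3.0) for t in temps]

# Data preparation: shuffle, then hold out 20% of the rows for evaluation.
rows = list(zip(temps, outputs))
rng.shuffle(rows)
split = int(0.8 * len(rows))
train, test = rows[:split], rows[split:]

# Modeling: fit on the training rows only.
slope, intercept = fit_simple_linear(
    [t for t, _ in train], [o for _, o in train]
)

# Evaluation: score predictions on the held-out rows.
test_preds = [slope * t + intercept for t, _ in test]
test_rmse = rmse([o for _, o in test], test_preds)
print(f"slope={slope:.2f} intercept={intercept:.2f} test RMSE={test_rmse:.2f}")
```

The same split, fit, and evaluate shape carries over to a Spark-based pipeline, where the data has several sensor readings rather than one feature and the fitting is done by a library estimator rather than a hand-written formula.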