SaiprakashShetty/Big-Data-Airline-Delay-Prediction

Big-Data---Airline-Delay-Prediction

Predicting US airline delays using Apache Spark (PySpark) and Apache Arrow.

The objective of this project is to analyze historical flight data to gain insights into delays, and to build a model that predicts whether a flight will be delayed given a set of flight characteristics.

Questions to be answered post analysis:

• Which airports have the most delays?
• Which routes are typically the most delayed?
• Airport origin delay per month
• Airport origin delay per day/hour
• What are the primary causes of flight delays?
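Most of the questions above reduce to grouping flights by one or more keys and averaging a delay column. The sketch below shows that idea in plain Python so it runs without a Spark cluster; it is not the project's actual code. The field names (`Origin`, `Dest`, `Month`, `DepDelay`) are assumed to match the Data Expo 2009 schema, and the sample rows and the helper `mean_delay_by` are hypothetical. In PySpark the same grouping would typically be written with `groupBy(...).agg(...)` on a DataFrame.

```python
from collections import defaultdict

# Hypothetical sample rows; field names are assumed to follow the
# Data Expo 2009 schema (Origin, Dest, Month, DepDelay in minutes).
flights = [
    {"Origin": "ORD", "Dest": "LGA", "Month": 1, "DepDelay": 30.0},
    {"Origin": "ORD", "Dest": "LGA", "Month": 1, "DepDelay": 10.0},
    {"Origin": "ATL", "Dest": "MCO", "Month": 2, "DepDelay": 0.0},
    {"Origin": "ORD", "Dest": "SFO", "Month": 2, "DepDelay": 50.0},
]


def mean_delay_by(rows, key_fields, delay_field="DepDelay"):
    """Return the mean delay for each distinct combination of key_fields."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of delays, row count]
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        totals[key][0] += row[delay_field]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Mean departure delay per origin airport, per route, and per airport and month.
by_airport = mean_delay_by(flights, ["Origin"])
by_route = mean_delay_by(flights, ["Origin", "Dest"])
by_airport_month = mean_delay_by(flights, ["Origin", "Month"])

# Airports ranked from most to least delayed.
ranked_airports = sorted(by_airport.items(), key=lambda item: item[1], reverse=True)
```

Changing the key list is enough to switch between the airport, route, and per-month questions; a per-hour breakdown would need an hour field derived from the scheduled departure time.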

The predictive model (logistic regression) predicts whether a flight will be delayed based on certain characteristics of the flight. Such a model may help both passengers and airlines anticipate future delays and take steps to minimize them.
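To make the modelling step concrete, here is a minimal, dependency-free sketch of binary logistic regression trained with gradient descent. It is an illustration only, not the project's implementation, which uses Spark; in PySpark one would more likely reach for `pyspark.ml.classification.LogisticRegression`. Everything specific here is an assumption: the 15-minute cutoff for labelling a flight as delayed, the single feature (scheduled departure hour), the scaling in `scale_hour`, and the sample data.

```python
import math

# Assumed cutoff: a flight counts as delayed if it arrives 15+ minutes late.
DELAY_THRESHOLD_MIN = 15


def is_delayed(arr_delay_min):
    """Turn an arrival delay in minutes into a binary label (1 = delayed)."""
    return 1 if arr_delay_min >= DELAY_THRESHOLD_MIN else 0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def scale_hour(hour):
    """Hypothetical feature scaling: centre the hour on 14 and shrink the range."""
    return (hour - 14) / 10.0


def train(features, labels, lr=0.5, epochs=2000):
    """Fit weight w and bias b by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(features, labels):
            err = sigmoid(w * x + b) - y  # derivative of log loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Hypothetical samples: (scheduled departure hour, arrival delay in minutes).
samples = [(6, -3), (8, 0), (10, 5), (12, 9), (16, 22), (18, 40), (20, 31), (22, 65)]
X = [scale_hour(hour) for hour, _ in samples]
y = [is_delayed(delay) for _, delay in samples]
w, b = train(X, y)


def predict_proba(hour):
    """Estimated probability that a flight departing at this hour is delayed."""
    return sigmoid(w * scale_hour(hour) + b)
```

A flight would be classified as delayed when `predict_proba` exceeds 0.5. A realistic model would use several features (carrier, origin, destination, month, distance) and be evaluated on held-out data.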

The dataset is obtained from "http://stat-computing.org/dataexpo/2009/the-data.html" and "https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=".
