This repository contains an analysis of the pickup data provided by Uber for New York City in September 2014.
The CSV file 'uber-raw-data-sep14.csv' contains all the data, with dimensions (1028136, 4), i.e. 1,028,136 rows and 4 columns. The 4 columns are:
- Date/Time: the timestamp of the pickup
- Lat: the latitude of the pickup
- Lon: the longitude of the pickup
- Base: the TLC base company associated with the ride
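
As a rough illustration of the column layout described above, the following Python sketch parses rows with those four column names using only the standard library. It is not the repository's code (the analysis itself lives in script.R). The function name `read_pickups`, the sample rows, and the timestamp format `%m/%d/%Y %H:%M:%S` are assumptions made for this example; check the format against the actual file before relying on it.

```python
import csv
import io
from datetime import datetime

# Assumed timestamp layout (e.g. "9/1/2014 0:01:00"); adjust if the file differs.
TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M:%S"


def read_pickups(lines):
    """Yield one dict per pickup from an iterable of CSV lines (hypothetical helper)."""
    for row in csv.DictReader(lines):
        yield {
            "when": datetime.strptime(row["Date/Time"], TIMESTAMP_FORMAT),
            "lat": float(row["Lat"]),
            "lon": float(row["Lon"]),
            "base": row["Base"],
        }


# Made-up sample rows for illustration only; they are not taken from the dataset.
SAMPLE = '''"Date/Time","Lat","Lon","Base"
"9/1/2014 0:01:00",40.2201,-74.0021,"B02512"
"9/13/2014 18:05:00",40.7500,-73.9900,"B02617"
'''

pickups = list(read_pickups(io.StringIO(SAMPLE)))
print(pickups[1]["when"].hour, pickups[1]["base"])  # prints: 18 B02617
```

To read the real file instead of the sample, pass an open file handle, for example `open('uber-raw-data-sep14.csv', newline='')`, to `read_pickups`.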
The complete notebook is available at the following link: https://www.kaggle.com/syedbelall/uber-pickup-in-new-york-city
The code is provided in the script.R file, and the output is in a markdown document uploaded to Kaggle. A few of the findings are given here:
- The number of pick-ups increases as the day progresses.
- The number of pick-ups peaks around 6 pm, when people leave the office.
- Base B02617 is the best from a business perspective.
- Base B02512 is the worst from a business perspective.
- Tuesday has the highest count of Uber pick-ups.
- Weekends show the highest number of pick-ups at night.
- Uber pick-ups are most concentrated in the center of the city.
- Tuesday was the best day for all bases except B02764, whose best day was Saturday.
- 13th September had the most pick-ups.
- The data is not sufficient to predict a new Uber pick-up call.
- Even if you're provided with 3 of the variables, you can't predict the fourth. For example:
  - If you're provided with Lon, Lat, and hour, you still cannot predict which day of the week a pick-up came from, because every day of the week shows the same trend.
  - If you're provided with Lon, Lat, and weekday, you still cannot predict which hour a pick-up belongs to, because the traffic is scattered without showing a trend throughout the day.
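
Counts of the kind behind the findings above (by hour, weekday, day of month, and base) could be computed along these lines. This is a minimal Python sketch under stated assumptions, not the R code in script.R: the function name `summarize` and the three sample pickups are hypothetical and were chosen only to make the example runnable.

```python
from collections import Counter
from datetime import datetime

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def summarize(pickups):
    """Count (timestamp, base) pairs by hour, weekday, day of month, and base."""
    summary = {"hour": Counter(), "weekday": Counter(),
               "day": Counter(), "base": Counter()}
    for when, base in pickups:
        summary["hour"][when.hour] += 1
        # datetime.weekday() returns 0 for Monday through 6 for Sunday.
        summary["weekday"][WEEKDAY_NAMES[when.weekday()]] += 1
        summary["day"][when.day] += 1
        summary["base"][base] += 1
    return summary


# Made-up sample pickups for illustration only; they are not taken from the dataset.
sample = [
    (datetime(2014, 9, 2, 18, 5), "B02617"),
    (datetime(2014, 9, 2, 18, 40), "B02617"),
    (datetime(2014, 9, 13, 23, 15), "B02512"),
]

summary = summarize(sample)
print(summary["hour"].most_common(1)[0])     # prints: (18, 2)
print(summary["weekday"].most_common(1)[0])  # prints: ('Tuesday', 2)
```

Each `Counter` can then be inspected with `most_common()` to find, for instance, the busiest hour or the base with the most pick-ups in whatever data is passed in.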