- Download the 2013 taxi data using this shell script.
- [This R script](https://github.com/msr-ds3/nyctaxi/blob/master/exploratory_analysis/load_one_week.R) loads the CSVs, adds necessary and convenient columns (e.g. neighborhood names), and saves the result as `taxi_clean` in `one_week_taxi.Rdata`. To use the dataframe, simply call `load('one_week_taxi.Rdata')`.
- This R script uses `taxi_clean` to create a dataframe called `shifts_clean` of drivers (`hack_license`s) and their shifts (as measured by the cutoff analysis here), and a dataframe called `taxi_clean_shifts` with a shift number for each ride, and stores them in an Rdata file called `shifts_clean.Rdata`.
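As a rough illustration of how a per-ride shift number can be derived from a cutoff, the sketch below starts a new shift whenever a driver's idle gap between one dropoff and the next pickup exceeds a threshold. It is not the repository's script: the toy data, the `assign_shifts` helper, the `pickup_datetime`/`dropoff_datetime` column names, and the 6-hour `cutoff_hours` value are all assumptions made for the example.

```r
# Toy stand-in for taxi_clean (hypothetical data; column names are assumed).
toy_rides <- data.frame(
  hack_license = c("A", "A", "A", "A", "B", "B"),
  pickup_datetime = as.POSIXct(c(
    "2013-07-07 08:00:00", "2013-07-07 08:40:00",
    "2013-07-07 20:00:00", "2013-07-07 20:30:00",
    "2013-07-07 09:00:00", "2013-07-07 09:20:00"), tz = "UTC"),
  dropoff_datetime = as.POSIXct(c(
    "2013-07-07 08:30:00", "2013-07-07 09:10:00",
    "2013-07-07 20:20:00", "2013-07-07 21:00:00",
    "2013-07-07 09:15:00", "2013-07-07 09:45:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Hypothetical cutoff; not the value found by the cutoff analysis.
cutoff_hours <- 6

# Sort rides by driver and pickup time so gaps are between consecutive rides.
toy_rides <- toy_rides[order(toy_rides$hack_license, toy_rides$pickup_datetime), ]

# Hypothetical helper: number the shifts of ONE driver. The first ride always
# opens shift 1; later rides open a new shift when the idle gap is too long.
assign_shifts <- function(pickup, dropoff, cutoff) {
  gap <- c(Inf, as.numeric(difftime(pickup[-1], dropoff[-length(dropoff)],
                                    units = "hours")))
  cumsum(gap > cutoff)
}

# Apply the helper per driver and put the results back in row order.
toy_rides$shift_num <- unsplit(
  lapply(split(toy_rides, toy_rides$hack_license), function(d) {
    assign_shifts(d$pickup_datetime, d$dropoff_datetime, cutoff_hours)
  }),
  toy_rides$hack_license
)

# One row per (driver, shift) with the number of rides in that shift.
toy_rides$n_rides <- 1L
toy_shifts <- aggregate(n_rides ~ hack_license + shift_num,
                        data = toy_rides, FUN = sum)
```

Measuring the gap from the previous dropoff (rather than the previous pickup) keeps a long single ride from being mistaken for a break; the real analysis may define the gap differently.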
####NOTE: AS OF 7/26 YOU SHOULD MOVE ALL .RDATA FILES INTO THE RDATA FOLDER, AND SAVE ALL FUTURE RDATA FILES TO THAT FOLDER
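A minimal sketch of saving to and loading from a dedicated folder is shown below. The folder path is an assumption: a temporary directory stands in for the repository's Rdata folder so the example runs anywhere, and the two-row `taxi_clean` is a toy placeholder for the real dataframe.

```r
# Hypothetical location standing in for the repository's Rdata folder.
rdata_dir <- file.path(tempdir(), "Rdata")
dir.create(rdata_dir, showWarnings = FALSE, recursive = TRUE)

# Toy placeholder for the real taxi_clean dataframe.
taxi_clean <- data.frame(hack_license = c("A", "B"),
                         trip_distance = c(1.2, 3.4))

# Build the path with file.path() and save into the folder.
rdata_path <- file.path(rdata_dir, "one_week_taxi.Rdata")
save(taxi_clean, file = rdata_path)

# load() restores objects under their saved names and returns those names.
rm(taxi_clean)
loaded_names <- load(rdata_path)
```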
##Descriptives
- Cool figures, plots, and maps (output of some of the scripts below) are in this dir
- This script creates two functions: `visualize_trips_by_shift`, which plots the route of a random taxicab driver over the course of a shift, and `visualize_trips_by_day`, which does the same over a day of the week.
  - Usage: `visualize_trips_by_shift(df, hacklicense, shift = NULL)`. `df` is the dataframe (usually `taxi_clean`, but sometimes a subset of it). `hacklicense` is the `hack_license` of the driver (usually randomly chosen from `df`). `shift` is optional and takes a shift number; when omitted, all shifts are shown as a faceted plot.
  - Usage: `visualize_trips_by_day(df, hacklicense, day = NULL)` works in a similar manner, except that it takes a particular day in the format "Mon", "Tue", etc.
- Stats for one week of taxi rides by day of week, hour of day, pickup location, and dropoff location are computed by this R script.
- Trip-based descriptive plots (distributions of distance, time, fare, etc.) can be found here
- Neighborhood popularity plots (in R) are here
- Interactive popularity heatmaps by neighborhood can be created using this script
- Non-interactive ggmap popularity heatmaps can be created using the functions in here
- Driver-based descriptive plots (distributions of distance, time, fare, etc., by number of drivers) are here
- Visualize shifts, and rides within them, for n random drivers by calling the `visualize_rides_and_shifts()` function created by this R script.
- Some plots using shift intervals are [here](https://github.com/msr-ds3/nyctaxi/blob/master/exploratory_analysis/plots_with_shift_interval.R)
- Features to be included in the design matrix for the shifts prediction task are listed in this markdown file.
- The design matrix can be created and saved as an Rdata file using the script here
- Descriptive plots for both regression and classification, for each individual feature, are here
- Some models and efficiency predictions are here
- Future work: features to be included in the design matrix
- Visualizing flow over the day.
- Analysis on carpooling possibilities, here
- Plots on carpooling analysis.
- Probabilities of lat/lng destinations given a source neighborhood and an hour of day.
- Diving into carpool savings in more depth, at this link.
- A shiny app to visualize NYC taxi flow as a heatmap can be found here
- A shiny app (inspired by Todd Schneider's post) to visualize average trip times from neighborhood to neighborhood.
- An app to see popular neighborhood destinations, and unusual neighborhoods.
- Java code that can de-anonymize medallions and hack licenses.
- Play the "predict the driver's efficiency" guessing game using this script.
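Several of the items above (stats by day of week and hour of day, and destination probabilities given a source neighborhood and hour) come down to grouped counts over the rides. The base-R sketch below shows one way to compute them on toy data. It is an illustration under assumptions, not the repository's code: the column names `pickup_neighborhood` and `dropoff_neighborhood` are hypothetical, and destinations are simplified to neighborhoods rather than lat/lng points.

```r
# Toy rides (hypothetical data and column names).
toy_trips <- data.frame(
  pickup_datetime = as.POSIXct(c(
    "2013-07-08 08:05:00", "2013-07-08 08:20:00", "2013-07-08 08:45:00",
    "2013-07-08 17:10:00", "2013-07-09 08:30:00"), tz = "UTC"),
  pickup_neighborhood  = c("Chelsea", "Chelsea", "Chelsea", "Midtown", "Chelsea"),
  dropoff_neighborhood = c("Midtown", "Midtown", "SoHo", "Chelsea", "SoHo"),
  stringsAsFactors = FALSE
)

# Hour of day and a locale-independent day-of-week label ("Mon", "Tue", ...).
day_labels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
toy_trips$hour <- as.integer(format(toy_trips$pickup_datetime, "%H"))
toy_trips$wday <- day_labels[as.POSIXlt(toy_trips$pickup_datetime)$wday + 1]

# Ride counts by day of week and hour of day.
counts_by_day_hour <- table(wday = toy_trips$wday, hour = toy_trips$hour)

# Counts of destinations for each (source neighborhood, hour) pair.
dest_counts <- table(
  source = toy_trips$pickup_neighborhood,
  hour   = toy_trips$hour,
  dest   = toy_trips$dropoff_neighborhood
)

# Normalise over destinations within each (source, hour) cell to estimate
# P(dest | source, hour). Cells with no rides come out as NaN.
dest_probs <- prop.table(dest_counts, margin = c(1, 2))
```

For example, `dest_probs["Chelsea", "8", "Midtown"]` gives the estimated share of 8 a.m. Chelsea pickups that end in Midtown in the toy data.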