Namma Metro Ridership Tracker 🚇

Overview

The Bangalore Metro Rail Corporation Limited (BMRCL) publishes ridership data daily but does not provide historical data beyond one day. This repository contains a Python script and a Jupyter Notebook that automate downloading the ridership data from BMRCL and storing it in a CSV file. As the dataset grows over time, it will allow for analysis of ridership and usage patterns.

Namma Metro rail network of Bangalore circa November 2024. Source: www.bmrc.co.in

Features

Dynamic Content Handling: Toggles the Kannada/English button (in headless browser mode) to retrieve data in English.
CSV File Management: Automatically creates the CSV file if needed, appends new data to it, and removes duplicate entries. (This prevents duplicate rows if, for example, the script is run multiple times a day.)
Error Handling: Includes checks for connectivity issues, page load time, and element availability, so the script copes with a slow or unavailable public service website.
Script Automation: Includes cronjobs.sh, with instructions to trigger ridership.py daily at specified times.
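
The CSV File Management behaviour above (create if missing, append, skip duplicates) can be sketched as follows. This is a minimal illustration, not the code in ridership.py: requirements.txt lists pandas, so the real script may well do this differently. The function name `append_unique` and the choice of "Record Date" as the de-duplication key are assumptions, and the sketch uses only the standard library.

```python
import csv
import tempfile
from pathlib import Path


def append_unique(csv_path, row, key="Record Date"):
    """Append row to csv_path unless a row with the same key value exists.

    Creates the file with a header if it does not exist yet (or is empty).
    Assumes every row has the same columns in the same order.
    Returns True if the row was written, False if it was a duplicate.
    """
    path = Path(csv_path)
    if path.exists():
        with path.open(newline="", encoding="utf-8") as f:
            seen = {r[key] for r in csv.DictReader(f)}
        if row[key] in seen:
            return False
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)
    return True


# Demo with made-up values in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "demo.csv"
    row = {"Record Date": "2000-01-01", "Tokens": "123"}
    print(append_unique(target, row))  # first call writes the row
    print(append_unique(target, row))  # second call skips the duplicate
```

Keying on the record date means a second run on the same day leaves the file unchanged, which is what makes scheduling several runs per day safe.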

Dataset

NammaMetro_Ridership_Dataset.csv is updated daily with each row representing the previous day's ridership stats. BMRCL's Ridership page offers the following data points:

Record Date
Total Smart Cards (= Stored Value Card + One Day Pass + Three Day Pass + Five Day Pass)
Tokens, Total NCMC, Group Ticket
Total QR (= QR NammaMetro + QR WhatsApp + QR Paytm)
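
The relationships in parentheses above lend themselves to a per-row sanity check. A minimal sketch, assuming the CSV columns are named exactly as the data points listed here (the actual header in the dataset may differ) and using made-up numbers rather than real ridership figures:

```python
# Column names are assumptions for illustration; check the real CSV header.
SMART_CARD_PARTS = ["Stored Value Card", "One Day Pass", "Three Day Pass", "Five Day Pass"]
QR_PARTS = ["QR NammaMetro", "QR WhatsApp", "QR Paytm"]


def totals_consistent(row):
    """Return True if both totals equal the sum of their components."""
    smart_ok = int(row["Total Smart Cards"]) == sum(int(row[c]) for c in SMART_CARD_PARTS)
    qr_ok = int(row["Total QR"]) == sum(int(row[c]) for c in QR_PARTS)
    return smart_ok and qr_ok


# Made-up values, not real ridership data.
sample = {
    "Record Date": "2000-01-01",
    "Total Smart Cards": "100", "Stored Value Card": "90", "One Day Pass": "5",
    "Three Day Pass": "3", "Five Day Pass": "2",
    "Total QR": "60", "QR NammaMetro": "30", "QR WhatsApp": "20", "QR Paytm": "10",
}
print(totals_consistent(sample))
```

A check like this could flag days on which the published totals and their components disagree.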

The first entry of this dataset was recorded on 2024-10-26. As the dataset grows, one day and one row at a time, it will become a valuable resource for anyone interested in transportation trends and urban studies.

Installation

Clone this repository.

    git clone https://github.com/your-username/namma-metro-ridership-tracker.git
    cd namma-metro-ridership-tracker

Install the required packages. Ensure you have Python 3.8+ and install dependencies:

    pip install -r requirements.txt

requirements.txt includes selenium and pandas.

Usage

To collect the latest ridership data, run:

    python ridership.py

The Python program will automatically check for an existing dataset file NammaMetro_Ridership_Dataset.csv, create one if necessary, and append the latest data row. The included Jupyter Notebook does exactly what the program does but allows you to follow along step-by-step. Open it with:

    jupyter notebook ridership.ipynb

Set up cronjobs.sh (Optional)

The cronjobs.sh script automates the execution of ridership.py to collect daily ridership data from BMRCL at different times of the day. If the job is successful, it logs a timestamp to cron_log.txt. Otherwise, it appends the error output to a tmp folder.
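
cronjobs.sh itself is a shell script; the Python analogue below only illustrates the success/failure logging logic described above. The function name `run_and_log` and the error file name `ridership_errors.txt` are hypothetical, and the actual script may format its log entries differently.

```python
import subprocess
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def run_and_log(cmd, log_file, error_dir):
    """Run cmd; log a UTC timestamp on success, append stderr to a file on failure."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    if result.returncode == 0:
        with Path(log_file).open("a", encoding="utf-8") as f:
            f.write(f"{stamp} OK\n")
        return True
    error_dir = Path(error_dir)
    error_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical file name; the README only says errors go to a tmp folder.
    with (error_dir / "ridership_errors.txt").open("a", encoding="utf-8") as f:
        f.write(f"{stamp}\n{result.stderr}\n")
    return False
```

Under these assumptions, a scheduled run would look like `run_and_log([sys.executable, "ridership.py"], "cron_log.txt", "tmp")`.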

The jobs run at 17:37 UTC, 20:52 UTC, and 01:23 UTC. Feel free to customise as needed. Scheduling multiple cron jobs in a 24-hour period increases the likelihood that data is captured every day. The program eliminates duplication of data in the dataset.
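
Because BMRCL operates on Indian Standard Time (UTC+5:30), it can help to see when these jobs fire locally before changing them. A small sketch of the conversion; the helper name `utc_to_ist` is an assumption, and the reference date is arbitrary because IST has no daylight saving time:

```python
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30), "IST")  # fixed offset, no DST


def utc_to_ist(hour, minute):
    """Convert a UTC wall-clock time to IST.

    Returns (HH:MM string, day_offset), where day_offset is 1 if the
    IST time falls on the following calendar day.
    """
    base = datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc)
    local = base.astimezone(IST)
    day_offset = (local.date() - base.date()).days
    return local.strftime("%H:%M"), day_offset


for h, m in [(17, 37), (20, 52), (1, 23)]:
    print(f"{h:02d}:{m:02d} UTC ->", utc_to_ist(h, m))
# 17:37 UTC -> ('23:07', 0)
# 20:52 UTC -> ('02:22', 1)
# 01:23 UTC -> ('06:53', 0)
```

In local terms the three jobs run late in the evening, in the small hours, and early in the morning.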

Doable Danny is a good place to learn more about cron jobs.

Project Structure

namma-metro-ridership-tracker (repo)

README.md
ridership.py - Main Python script for scraping and storing data
ridership.ipynb - Jupyter Notebook for exploratory data analysis
requirements.txt - Required Python packages
NammaMetro_Ridership_Dataset.csv - Collected ridership dataset (growing over time)
cronjobs.sh - Shell script to automatically run the program at specific times

Future Work

Planned features and improvements include:

Data Visualization: Create plots to analyze trends in ridership.
~~Automated Scheduler: Set up a CRON job to automate daily scraping.~~ DONE!
~~Enhanced Error Handling and Logging: Failed attempts and missing data should break elegantly and be logged.~~ DONE!
Other City Metros: Metro corporations across India work in silos; each one with its own format for published data, if at all. One script to scrape 'em all!

License

This project is licensed under the BSD Zero-Clause License. See the LICENSE file for more details.
