Gathering tasks for COEP hackathon, 2 Feb 2019 #20

Open
answerquest opened this issue Jan 18, 2019 · 11 comments

Comments

answerquest commented Jan 18, 2019

Main participant audience: 3rd-year computer engineering and IT students of COEP. Attendance is optional, and the event is open to others as well.

answerquest commented Jan 18, 2019

Tree canopy of Pune measurement using satellite data.

Ref: https://www.citylab.com/environment/2018/12/urban-tree-canopy-maps-artificial-intelligence-descartes-labs/578701/
Data sources still need to be found. Start working on this before 2 Feb if you want to make any headway.
Use ward maps of Pune to create rankings of wards based on canopy coverage, etc.
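
The ward ranking step could be sketched roughly as below, assuming a canopy area per ward has already been derived from classified satellite imagery (that part is not shown). The ward names and area figures are made-up placeholders, not real measurements.

```python
def rank_wards_by_canopy(wards):
    """Return (ward, canopy_fraction) pairs sorted from greenest to least green.

    `wards` maps a ward name to (canopy_area_sq_km, ward_area_sq_km).
    """
    fractions = {
        name: canopy / total
        for name, (canopy, total) in wards.items()
        if total > 0  # skip wards with missing or zero area
    }
    return sorted(fractions.items(), key=lambda item: item[1], reverse=True)


# Hypothetical figures for illustration only.
sample_wards = {
    "Ward A": (2.0, 10.0),
    "Ward B": (4.5, 9.0),
    "Ward C": (1.0, 8.0),
}

ranking = rank_wards_by_canopy(sample_wards)
for position, (name, fraction) in enumerate(ranking, start=1):
    print(f"{position}. {name}: {fraction:.1%} canopy cover")
```

Ranking by fraction rather than absolute canopy area keeps large wards from dominating the list.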

answerquest commented Jan 19, 2019

Public transport data Animation, Visualization, Analysis

For Viz: Runparticles : http://renderfast.com/RunParticles/
Render various transport datasets into video using the above tool.

Analysis : Find the traffic choke-points, busy corridors, less-frequented stops, etc.

Possible datasets : Pune bus static GTFS, realtime logs
Hyderabad metro, Kochi metro GTFS

A large dataset of GPS logs for select bus routes in Pune will be released a few days before the event.
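
As a possible starting point for the analysis, scheduled visits per stop can be counted from a static GTFS feed's `stop_times.txt`. This is only a sketch: the inline sample with made-up trip and stop IDs stands in for a real feed, and a real run would open the actual file instead.

```python
import csv
import io
from collections import Counter

# Tiny stand-in for a GTFS stop_times.txt file (made-up IDs).
SAMPLE_STOP_TIMES = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,08:00:00,08:00:00,S1,1
T1,08:10:00,08:10:00,S2,2
T1,08:20:00,08:20:00,S3,3
T2,09:00:00,09:00:00,S1,1
T2,09:10:00,09:10:00,S2,2
T3,10:00:00,10:00:00,S1,1
"""


def count_stop_visits(stop_times_file):
    """Count how many scheduled stop events each stop_id has."""
    reader = csv.DictReader(stop_times_file)
    return Counter(row["stop_id"] for row in reader)


visits = count_stop_visits(io.StringIO(SAMPLE_STOP_TIMES))
busiest = visits.most_common()    # most visited first
least_frequented = busiest[::-1]  # least visited first
print("Busiest stops:", busiest[:2])
print("Least-frequented stops:", least_frequented[:2])
```

Joining the counts with `stops.txt` would then give names and coordinates for mapping.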

Links:

  • Folium : Python library that directly creates interactive maps

answerquest commented Jan 20, 2019

PMJDY data analysis

See this discussion thread: https://groups.google.com/d/msg/datameet/ErNY82gA7dw/TOmnF7dLFQAJ
I've forked it and am updating the data to the latest available: https://github.com/datameet-pune/pmjdy

Here's an example of data analysis done on it two years ago. We can take it further with animated or interactive visualizations, etc.: https://zenodo.org/record/263919#.XCWWT99fjZs

Use this zenodo page for citations: https://zenodo.org/record/1410405#.XCWYEN9fjZs

answerquest commented Jan 20, 2019

Pune Tree census data analysis, comparison

Get datasets from here: http://nikhilvj.co.in/files/trees/
Gathered from : http://treecensus.punecorporation.org/
Disclaimer there:

For the first time, Pune Municipal Corporation (PMC) has undertaken Geo-enabled Tree Census using GIS & GPS Technology for the Pune city. So far 3300000 trees have been censused by using this technology.

Tree census data for PMC is hereby uploaded as a draft version, for few wards on PMC website. After receiving suggestions / objections, the data will be finalised by PMC. It is hereby requested to one and all that comments/Suggestion may please be given within next 30 days. This can be done by sending the email at treecensus [at] punecorporation.org

This integration of botanical information with I.T. applications will be useful to all the residents of Pune in addition to researchers as well as public authorities. It is also hoped that this experiment will go a long way in increasing the green cover of Pune.

The uploaded data is raw data, hence suggestion will be highly appreciated and valid suggestion after approval of authorities will be inculcated in the system.

This is draft data, and PMC has requested feedback on it, so the dataset should be analysed for anomalies and similar issues.

answerquest commented Jan 20, 2019

Openstreetmap mapping: Rural roads in Maharashtra

https://tasks.teachosm.org/contribute?difficulty=ALL&organisation=datameet

answerquest commented Jan 22, 2019

Scrape data from the PMC STP (sewage treatment plant) app

App: https://play.google.com/store/apps/details?id=com.ionicframework.pmcstp846325&rdid=com.ionicframework.pmcstp846325

The department doesn't collect raw data at its end; the app-based system was set up by a vendor who has since left. They have asked the open data portal team to extract the data from the app itself.

The app fetches data dynamically. Android developers can run it in an emulator, archive the data packets, and convert them to CSV so that the Open Data Portal can publish the archived data.

The archived data can be of great value to researchers and environmental groups analysing how much sewage is treated, how much goes untreated, how it affects the water bodies, etc.
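
The packet-to-CSV step might look something like the sketch below. It assumes the captured responses are JSON objects, which has not been verified for this app; the field names (`plant`, `timestamp`, `inflow_mld`, `ph`) are hypothetical and the real packets will differ.

```python
import csv
import io
import json

# Hypothetical captured responses; real packets from the app will differ.
captured_packets = [
    '{"plant": "Plant A", "timestamp": "2019-01-22T10:00", "inflow_mld": 12.5}',
    '{"plant": "Plant B", "timestamp": "2019-01-22T10:00", "inflow_mld": 7.0, "ph": 7.2}',
]


def packets_to_csv(packets, out_file):
    """Write JSON packets as CSV rows, using the union of all keys as columns."""
    records = [json.loads(packet) for packet in packets]
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)  # keep first-seen column order
    # restval fills in blanks for packets that lack a column
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return fieldnames


buffer = io.StringIO()
columns = packets_to_csv(captured_packets, buffer)
print(buffer.getvalue())
```

Taking the union of keys means a packet with an extra or missing field does not break the export; it just leaves blanks in the CSV.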

answerquest commented Jan 25, 2019

Data Cleaning tasks for data hosted on Pune Open Data Portal

Tabular Listing:
https://drive.google.com/open?id=10DQBIXHcC5LvRD6z-kpFbC20Hv7xen9yYzO6AWPU6HE
(sorted by categories, as of 21 Jan 2019)

In some cases Excel has mangled dates by interpreting dd/mm/yyyy as mm/dd/yyyy. Multi-row headers, merged cells, and similar formatting also make some of the data unsuitable for programmatic reading. Possible things that can be done:

  • Fix dates
  • Make CSVs with one header row, no gaps, etc.
  • Create an accompanying document / cover letter that details what each column stands for, etc.
  • Make unpivoted ('narrow') versions of pivoted ('wide') data
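
Two of these cleaning tasks can be sketched in plain Python. The sketch assumes the misread dates have already been loaded as `date` objects and the wide table as a list of dicts; the ward names and counts are made up. Note that a value whose day is above 12 cannot have been misread as mm/dd, so it is returned unchanged.

```python
from datetime import date


def swap_day_month(misread):
    """Undo a dd/mm/yyyy value that was parsed as mm/dd/yyyy.

    Returns the date unchanged when the swap is impossible (day > 12),
    since such a value cannot have been misread this way.
    """
    try:
        return misread.replace(month=misread.day, day=misread.month)
    except ValueError:
        return misread


def unpivot(rows, id_column, variable_name="variable", value_name="value"):
    """Turn wide rows (one column per category) into narrow rows."""
    narrow = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            narrow.append(
                {id_column: row[id_column], variable_name: column, value_name: value}
            )
    return narrow


# 3 April 2018, wrongly read as 4 March 2018.
print(swap_day_month(date(2018, 3, 4)))

# Made-up wide table: one column per year.
wide = [
    {"ward": "Ward A", "2017": 10, "2018": 12},
    {"ward": "Ward B", "2017": 7, "2018": 9},
]
narrow_rows = unpivot(wide, "ward", variable_name="year", value_name="count")
for row in narrow_rows:
    print(row)
```

Which rows in a given sheet were actually misread still has to be decided per dataset, for example by checking whether the dates in a column run in order.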

answerquest commented Jan 25, 2019

MH Talukas PDFs and shapefiles comparison

Taluka PDF maps from MRSAC: http://www.mrsac.gov.in/en/taluka-maps
Pages run up to http://www.mrsac.gov.in/en/taluka-maps?page=0,12

Each PDF has outlines and names of villages in the Taluka.

MH Villages shapefile : https://drive.google.com/open?id=0B3gxOiUzXTR-RVdZNXh4X1huUG8

What we have to do:

  • Open PDF of taluka
  • Load MH Villages shapefile in QGIS and filter down to the taluka in focus
  • Set display properties to show village name as label
  • Visually inspect both, and make notes of any discrepancies.

Possible discrepancies

  • Some villages are shown to be under another taluka
  • One village is split into multiple parts in the shapefile.
  • Names are different for the same location/shape in PDF and shapefile
  • Boundaries don't match.

Discrepancies can be logged in this tracking sheet (request organisers for access), or compiled separately if there are more details. As far as possible, we should standardise them into tables rather than verbose notes.
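
The name checks can be partly automated once the village names from a PDF have been typed out and the names for the same taluka exported from the shapefile. A rough sketch using the standard library's `difflib`; the names and the 0.8 similarity cutoff are illustrative assumptions, and boundary issues still need visual inspection.

```python
import difflib


def compare_village_names(pdf_names, shapefile_names, cutoff=0.8):
    """Return discrepancy rows comparing two lists of village names for one taluka."""
    pdf_set = set(pdf_names)
    shp_set = set(shapefile_names)
    unmatched_shp = sorted(shp_set - pdf_set)
    rows = []
    for name in sorted(pdf_set - shp_set):
        # Look for a near-identical spelling among the unmatched shapefile names.
        close = difflib.get_close_matches(name, unmatched_shp, n=1, cutoff=cutoff)
        if close:
            rows.append({"pdf_name": name, "shapefile_name": close[0],
                         "issue": "spelling differs"})
            unmatched_shp.remove(close[0])
        else:
            rows.append({"pdf_name": name, "shapefile_name": "",
                         "issue": "missing in shapefile"})
    for name in unmatched_shp:
        rows.append({"pdf_name": "", "shapefile_name": name,
                     "issue": "missing in PDF"})
    return rows


# Made-up name lists for illustration.
pdf_villages = ["Shivapur", "Khed", "Wadgaon"]
shapefile_villages = ["Shivpur", "Manchar", "Wadgaon"]

discrepancies = compare_village_names(pdf_villages, shapefile_villages)
for row in discrepancies:
    print(row)
```

Each row is already in a tabular shape, so the output can be pasted into the tracking sheet or written out with `csv.DictWriter`.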

Larger aim of the exercise

To document discrepancies between the official PDFs and the villages shapefile that Datameet has

Why

  • Shapefile has census 2011 codes for each village, which the Taluka PDFs do not have
  • Shapefile has village maps in digitized geospatial data form, can be immensely useful for researchers
  • But it is compiled from non-official sources, so it needs to be audited to ensure that it is showing accurate information as per official data and can be relied upon.
  • The PDFs are officially published by the state agency MRSAC for public use, so they can be treated as the official source to audit the shapefile against.

@answerquest

Finding ward number geospatially

Given a dataset of entities in Pune with lat-long locations, use QGIS or other geospatial tools to determine the ward number under which each data-point falls, and create an additional column in this dataset indicating ward number.

Supporting data: Pune ward maps, latest as well as previous.

Data this exercise can be done on: http://opendata.punecorporation.org/Citizen/CitizenDatasets/Index?categoryId=37
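
Outside QGIS, the lookup boils down to a point-in-polygon test per record. Below is a minimal pure-Python sketch using ray casting; the square "wards" and entity records are made up, and real ward boundaries (with holes or multiple parts) would be better handled by a proper geospatial library.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon.

    `polygon` is a list of (lon, lat) vertices; the last vertex connects
    back to the first. Points exactly on an edge may go either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def find_ward(lon, lat, wards):
    """Return the first ward whose polygon contains the point, else None."""
    for ward_number, polygon in wards.items():
        if point_in_polygon(lon, lat, polygon):
            return ward_number
    return None


# Made-up square "wards" near Pune's coordinates, for illustration only.
wards = {
    1: [(73.80, 18.50), (73.85, 18.50), (73.85, 18.55), (73.80, 18.55)],
    2: [(73.85, 18.50), (73.90, 18.50), (73.90, 18.55), (73.85, 18.55)],
}
records = [
    {"name": "Entity X", "lat": 18.52, "lon": 73.82},
    {"name": "Entity Y", "lat": 18.53, "lon": 73.88},
    {"name": "Entity Z", "lat": 18.60, "lon": 73.82},
]
for record in records:
    # Add the ward number as an extra column; None means outside all wards.
    record["ward"] = find_ward(record["lon"], record["lat"], wards)
    print(record)
```

Records that come back with `None` are worth a second look, since they may have wrong coordinates or fall outside the ward map in use.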

answerquest commented Jan 26, 2019

Linguistic / NLP analysis on Grievances / Feedback datasets

Linguistic / NLP analysis on Grievances / Feedback datasets hosted on Pune Open Data Portal.
Bring data-driven insights into what people are saying, what people want the most.
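
A very first pass could simply count the most frequent terms across complaints. The sketch below assumes English-language text and uses a tiny illustrative stopword list; the sample grievances are made up, and text in Marathi or Hindi would need a different tokenizer.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "is", "in", "on", "of", "a", "and", "not", "for", "to",
             "near", "our", "since"}


def top_terms(texts, n=3):
    """Return the n most common non-stopword tokens across all texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(token for token in tokens if token not in STOPWORDS)
    return counts.most_common(n)


# Made-up complaints, not taken from the portal.
sample_grievances = [
    "Garbage not collected in our lane",
    "Street light not working near the garbage bin",
    "Garbage collection is irregular and the street is dirty",
]
print(top_terms(sample_grievances))
```

Grouping the counts by ward or by month would be a natural next step towards seeing what people want most, and where.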

@answerquest

GIS : Make road routes from stop locations

Pune's bus route data is in the form of a sequence of stops, which translates to a series of lat-long points as in the screenshot below.
[Screenshot: bus-route-lines]

  1. Make a program that takes in this array of lat-long points and generates an on-road route, with the sequence properly maintained.
  • You can use any routing API such as Google, OpenStreetMap, TomTom, GraphHopper, etc. Some need payment for more advanced services, so try to use the free version only.
  • One approach could be to break up the route into A>B, B>C, C>D..., make a road route for each, and then merge them into one contiguous route.
  2. Run this program on Pune's bus routes data.
  • The existing dataset will need processing to get the desired input of lat-long points.
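
The break-up-and-merge approach can be sketched as follows. The routing call is deliberately left as a pluggable function: `straight_line_leg` is a hypothetical placeholder that returns only the two endpoints, and a real version would call whichever routing API is chosen and return the on-road geometry. The stop coordinates are made up.

```python
def straight_line_leg(start, end):
    """Placeholder router: returns just the two endpoints.

    A real implementation would call a routing API here and return the
    on-road geometry between `start` and `end` as a list of points.
    """
    return [start, end]


def build_route(stops, route_leg=straight_line_leg):
    """Route consecutive stop pairs (A>B, B>C, ...) and merge into one path."""
    merged = []
    for start, end in zip(stops, stops[1:]):
        leg = route_leg(start, end)
        if merged and leg and merged[-1] == leg[0]:
            leg = leg[1:]  # drop the shared junction point to avoid duplicates
        merged.extend(leg)
    return merged


# Made-up stop coordinates (lat, lon) in sequence.
stops = [(18.50, 73.80), (18.51, 73.82), (18.53, 73.83)]
route = build_route(stops)
print(route)
```

Routing leg by leg keeps the stop sequence intact, which a single start-to-end request would not guarantee.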
