TfLStorm

Real time stream processing of the London Bus network in Apache Storm.

This repository contains the prototype code for the project. Below is a breakdown of the project structure, followed by instructions for running it.

Bolts

Contains all of the bolts defined in the Apache Storm topology, one per class.

Spouts

Contains the spouts defined in the Apache Storm topology. These accept incoming data from the various connection points and emit it into the Storm computation graph.

Connection

This package contains the interfaces through which the Storm topology accepts input and interacts with external services. Specifically, it contains the code for making requests to the TfL API and for connecting to the Redis data store.

XML

This contains the largely automatically generated code used to build a Java object representation of the road incident data XML tree. The code was generated by creating an .xsd from sample data and passing it to a JAXB project, which generated the Java classes from that specification. More information can be found here.

Timetable

This contains the classes for loading the timetable data from file. The data is not fetched from TfL directly because it is large and changes rarely. Instead, it is stored locally and read into Redis. The xml package here contains the XML parsing code for the timetable data, structured in the same way as the XML code described above.
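As a rough illustration of the load step, the sketch below parses a small timetable document and builds the key/value pairs that would then be written to Redis. It is not the project's loader: the element and attribute names (journey, route, stop, departure) and the "timetable:" key prefix are hypothetical, it uses the JDK's DOM parser rather than generated classes, and the Redis write itself is left out because it needs a client library.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class TimetableSketch {

    /**
     * Parses timetable XML and returns the entries that a loader could
     * write to Redis. The schema and key layout are assumptions made
     * for this example only.
     */
    public static Map<String, String> toRedisEntries(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            // Hypothetical schema: <journey route=".." stop=".." departure=".."/>
            NodeList journeys = doc.getElementsByTagName("journey");
            Map<String, String> entries = new LinkedHashMap<>();
            for (int i = 0; i < journeys.getLength(); i++) {
                Element journey = (Element) journeys.item(i);
                String key = "timetable:" + journey.getAttribute("route")
                        + ":" + journey.getAttribute("stop");
                // A later journey with the same route and stop overwrites
                // the earlier one in this simplified sketch.
                entries.put(key, journey.getAttribute("departure"));
            }
            return entries;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse timetable XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<timetable>"
                + "<journey route=\"25\" stop=\"A\" departure=\"08:00\"/>"
                + "<journey route=\"25\" stop=\"B\" departure=\"08:07\"/>"
                + "</timetable>";
        toRedisEntries(xml).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Building the entries in memory first keeps the parsing separate from the Redis connection, so the parsing can be checked without a running Redis instance.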

Timetabledata

This is the timetable XML data.

Util

Contains various helper classes.

Polygon

Contains the classes defining the representation of a road incident area polygon, and the line segment objects used to define the areas of road that are affected by an incident. The implementation extends the code provided here.
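To show the kind of check such a polygon representation supports, here is a minimal sketch of a point-in-polygon test using the standard ray casting technique. It is not the code from this package: the class and method names are hypothetical, and it treats coordinates as plain 2D values (x as longitude, y as latitude) with no projection.

```java
import java.util.Arrays;
import java.util.List;

public class PolygonSketch {

    /** A 2D point. In this sketch x is longitude and y is latitude. */
    public static final class Point {
        final double x;
        final double y;

        public Point(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Ray casting: count how many polygon edges a horizontal ray from p
     * towards +x crosses. An odd count means p is inside.
     */
    public static boolean contains(List<Point> polygon, Point p) {
        boolean inside = false;
        int n = polygon.size();
        for (int i = 0, j = n - 1; i < n; j = i++) {
            Point a = polygon.get(i);
            Point b = polygon.get(j);
            // Only edges whose endpoints lie on opposite sides of the
            // ray's y value can be crossed (this also rules out
            // horizontal edges, so the division below is safe).
            boolean straddles = (a.y > p.y) != (b.y > p.y);
            if (straddles) {
                double xAtY = a.x + (p.y - a.y) * (b.x - a.x) / (b.y - a.y);
                if (p.x < xAtY) {
                    inside = !inside;
                }
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        List<Point> square = Arrays.asList(
            new Point(0, 0), new Point(4, 0), new Point(4, 4), new Point(0, 4));
        System.out.println(contains(square, new Point(2, 2))); // prints "true"
        System.out.println(contains(square, new Point(5, 2))); // prints "false"
    }
}
```

Points that fall exactly on an edge are not handled specially here, which is usually acceptable for deciding whether a bus stop lies within an incident area.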

Web

Contains the web presentation layer - the interactive table and the live heatmap.

Installation and Running

Requirements

  • Built and tested on Ubuntu 16.04
  • 3+ GB of free RAM
  • Apache Storm - the project has been built on 1.1.1. Storm requires ZooKeeper as part of its installation.
  • Redis installed and available locally
  • Application keys from TfL - They can be applied for here
  • Building the .jar requires the latest JDK and Maven. Node.js is required for the web server.

Running the project

To run the project, the timetable data must first be loaded into Redis under the correct key. The TimetableLoader program will do this.

First compile the project with Maven: mvn package

Then run the .jar with the specified class path: java -cp ${path}/target/storm-starter-*.jar org.apache.storm.starter.TimetableLoader

This can take several minutes.

Once complete, the topology can be submitted to Storm with ${path}/storm jar ${path}/target/storm-starter-*.jar org.apache.storm.starter.TfLTopology ${TopologyName}

All being well, this will push the topology to Storm and start it.

Notes

  • This is a prototype - there are hardcoded file paths and URLs, and error handling is limited.
  • The system takes considerable time to set up properly, and the project has been built with assumptions about the system it will run on.
  • Lastly, this has been entirely coded without the help of an IDE. Please excuse code style problems like indentation and unused imports.

Credits

The following libraries/tools are used in this project: