This project is dedicated to enriching the metadata of the Polar TREC data. All the work is part of the coursework for CSCI 599: Content Detection and Big Data Analysis, taught by Dr. Chris Mattmann at the University of Southern California.
The project is built on integrations between Apache Tika and other libraries. We leverage Apache Tika's integrations with Stanford NLP and Solr, and we created new parsers such as Regex NER, to gain more insight into the content of the polar data. The project uses Apache Tika's tika-app, tika-server, and the tika Python library to extract content and metadata.
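To make the idea behind a regex-based NER parser concrete, here is a minimal sketch in plain Python. It is not the project's parser and does not call Tika: the entity types, the patterns, and the `extract_entities` helper are all hypothetical and chosen only for illustration. It shows how named regular expressions can turn extracted text into extra metadata fields.

```python
import re

# Hypothetical entity types and patterns, for illustration only.
# They are NOT the rules used by the project's Regex NER parser.
PATTERNS = {
    "YEAR": re.compile(r"\b(?:19|20)\d{2}\b"),
    "LATITUDE": re.compile(r"\b\d{1,2}(?:\.\d+)?°\s?[NS]\b"),
    "TEMPERATURE": re.compile(r"-?\d+(?:\.\d+)?\s?°?[CF]\b"),
}


def extract_entities(text):
    """Return a dict mapping each entity type to the list of matches in text.

    Entity types with no match are left out of the result.
    """
    entities = {}
    for entity_type, pattern in PATTERNS.items():
        matches = [m.group(0) for m in pattern.finditer(text)]
        if matches:
            entities[entity_type] = matches
    return entities


if __name__ == "__main__":
    sample = "In 2015 the station at 78.2° N recorded -12.5 C."
    for entity_type, values in extract_entities(sample).items():
        print(entity_type, values)
```

In a real pipeline, the input text would come from Tika's content extraction, and the resulting dictionary could be merged into the document's metadata before indexing.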
Team 007
USC Viterbi School of Engineering