SnehalAdsule/CSCI-599-Content-Management-and-Big-Data-Analysis

CSCI 599 Content Extraction, Metadata Enrichment and Named Entity Recognition

This project enriches the metadata of the Polar TREC data. All of the work is part of the coursework for CSCI 599: Content Detection and Big Data Analysis, taught by Dr. Chris Mattmann at the University of Southern California.

The project builds on the integration between Apache Tika and other libraries. We use Apache Tika's integration with Stanford NLP and Solr, and created new parsers such as Regex NER, to gain more insight into the content of the polar data. Content and metadata are extracted with Apache Tika's tika-app, tika-server and the tika Python library.
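As a rough illustration of the extraction and regex-based NER steps described above, here is a minimal sketch rather than the project's actual parser. It assumes the `tika` Python package is installed and can reach a Tika server. The pattern table `PATTERNS`, the helpers `regex_ner`, `extract_with_tika` and `enrich`, and the `NER_` key prefix are hypothetical names chosen for this example.

```python
import re

# Hypothetical patterns for illustration only; the project's real
# Regex NER rules are not shown in this README.
PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "TEMPERATURE": re.compile(r"-?\d+(?:\.\d+)?\s?°[CF]"),
}


def regex_ner(text):
    """Return {label: [unique matches in order of appearance]} for the text."""
    entities = {}
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text or ""):
            found = entities.setdefault(label, [])
            if match.group(0) not in found:
                found.append(match.group(0))
    return entities


def extract_with_tika(path):
    """Extract (content, metadata) from a file using the tika Python library."""
    # Imported lazily so regex_ner can be used without tika installed.
    from tika import parser

    parsed = parser.from_file(path)
    # Tika may return no content for some files (for example, images).
    return parsed.get("content") or "", parsed.get("metadata") or {}


def enrich(path):
    """Add the regex entities to the Tika metadata under assumed NER_* keys."""
    content, metadata = extract_with_tika(path)
    for label, values in regex_ner(content).items():
        metadata["NER_" + label] = values
    return metadata
```

Calling `enrich("some_polar_file.pdf")` (a placeholder path) would return the Tika metadata dictionary with the extra `NER_*` entries added. `regex_ner` can also be run on any plain string without a Tika server.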

Team 007
USC Viterbi School of Engineering

About

Knowledge Discovery & Big Data Analytics - Information Retrieval
