File Structure
Jan Ehmueller edited this page Jul 28, 2017 · 5 revisions
Contains diagrams of our pipelines and the keynote for the bachelor's podium.
Contains evaluation results created for our bachelor's theses.
Submodule of the implisense repo containing the source code of the ImpliSense import.
Contains local jars that are used as dependencies and added to the fat jar. Currently it only contains CoreNLP with Michael's model.
Contains the Luigi source code.
Contains the Mapbox python component.
Contains sbt-related files (added plugins and the sbt version).
Contains scripts used to deploy jobs and automate the CI.
Data models are located in the `model` sub-packages of the respective packages and are not listed here.
- main
  - resources: files used in jobs
    - configs: config files for jobs and normalizations
  - scala: source code
    - de/hpi/ingestion: main package
      - curation: curation-related jobs (e.g. the commit job)
      - dataimport: classes used for the import of every data source
        - dbpedia: DBpedia import and transformations
        - kompass: Kompass import and transformations
        - spiegel: import of Spiegel Online articles
        - wikidata: Wikidata import and transformations
        - wikipedia: import of Wikipedia
      - datalake: `Subject`-related classes, the import into the datalake and the CSV export for Neo4j
      - datamerge: jobs for merging new data sources and connecting relations to master nodes
      - deduplication: classes and jobs for the deduplication
        - blockingschemes: blocking schemes used for the blocking
        - similarity: similarity measures used for the deduplication
      - framework: traits for the `SparkJob` framework
      - graphxplore: jobs working with the business graph
      - implicits: classes containing self-written implicits (e.g. implicits for collections)
      - textmining: jobs for Named Entity Linking and Relation Extraction
        - kryo: serializers needed to serialize the trie
        - nel: jobs needed to perform NEL on newspaper articles
        - preprocessing: creation of the knowledge base (i.e. transforming Wikipedia)
        - re: jobs for Relation Extraction
        - tokenizer: tokenizer trait and its implementations
      - versioncontrol: jobs for restoring and diffing versions
- test
  - resources: files used for the tests
  - scala: test source code
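The split of the deduplication package into `blockingschemes` and `similarity` suggests the common blocking approach: records are first grouped by a cheap key, and only records within the same block are compared with a similarity measure. The following is a minimal, hypothetical sketch of that idea using plain Scala collections instead of Spark. `Record`, `BlockingScheme`, `NamePrefixBlocking`, `SimilarityMeasure`, `JaccardSimilarity` and `duplicates` are illustrative names and are not taken from the project's code.

```scala
// Hypothetical sketch of blocking-based deduplication; not the project's actual classes.
object DedupSketch {
  // Minimal record type assumed for illustration.
  case class Record(id: String, name: String)

  // A blocking scheme maps a record to a block key; only records sharing a key are compared.
  trait BlockingScheme {
    def key(r: Record): String
  }

  // Blocks on the lowercased first three characters of the name.
  object NamePrefixBlocking extends BlockingScheme {
    def key(r: Record): String = r.name.toLowerCase.take(3)
  }

  // A similarity measure returns a score in [0, 1].
  trait SimilarityMeasure {
    def compare(a: String, b: String): Double
  }

  // Jaccard similarity on lowercased, whitespace-separated tokens.
  object JaccardSimilarity extends SimilarityMeasure {
    def compare(a: String, b: String): Double = {
      val ta = a.toLowerCase.split("\\s+").filter(_.nonEmpty).toSet
      val tb = b.toLowerCase.split("\\s+").filter(_.nonEmpty).toSet
      val union = ta union tb
      // Two empty token sets are treated as identical.
      if (union.isEmpty) 1.0 else (ta intersect tb).size.toDouble / union.size
    }
  }

  // Returns id pairs that share a block and reach the similarity threshold.
  def duplicates(
      records: Seq[Record],
      scheme: BlockingScheme,
      sim: SimilarityMeasure,
      threshold: Double): Seq[(String, String)] =
    records.groupBy(scheme.key).values.toSeq.flatMap { block =>
      block.combinations(2).collect {
        case Seq(a, b) if sim.compare(a.name, b.name) >= threshold => (a.id, b.id)
      }
    }
}
```

Blocking trades recall for speed: duplicates that land in different blocks are never compared, which is why the choice of blocking scheme matters as much as the similarity measure.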
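The tokenizer package pairs a common trait with several implementations, so text-mining jobs can depend on the trait and swap the concrete tokenizer. A minimal sketch of that pattern follows, assuming a single `tokenize(text: String): List[String]` method; `Tokenizer`, `WhitespaceTokenizer`, `CleanTokenizer` and `countTokens` are hypothetical names and may not match the project's code.

```scala
// Hypothetical sketch of a tokenizer trait with interchangeable implementations.
object TokenizerSketch {
  // Common interface; Serializable on the assumption that instances are shipped to Spark executors.
  trait Tokenizer extends Serializable {
    def tokenize(text: String): List[String]
  }

  // Splits on runs of whitespace and keeps punctuation attached to tokens.
  class WhitespaceTokenizer extends Tokenizer {
    def tokenize(text: String): List[String] =
      text.split("\\s+").filter(_.nonEmpty).toList
  }

  // Lowercases and splits on any character that is neither a letter nor a digit.
  class CleanTokenizer extends Tokenizer {
    def tokenize(text: String): List[String] =
      text.toLowerCase.split("[^\\p{L}\\p{N}]+").filter(_.nonEmpty).toList
  }

  // A job-like function that only depends on the trait.
  def countTokens(texts: Seq[String], tokenizer: Tokenizer): Map[String, Int] =
    texts.flatMap(tokenizer.tokenize).groupBy(identity).map { case (t, ts) => t -> ts.size }
}
```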
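The implicits package holds self-written implicits, for example helpers for collections. In Scala such helpers are commonly written as an implicit class that enriches an existing type. The sketch below is hypothetical: `CollectionImplicitsSketch`, `RichSeq`, `countBy` and `cross` are illustrative names, not necessarily methods the project defines.

```scala
// Hypothetical sketch of collection implicits; not the project's actual helpers.
object CollectionImplicitsSketch {
  // Value class wrapper, so the enrichment adds no allocation in most cases.
  implicit class RichSeq[A](val xs: Seq[A]) extends AnyVal {
    // Counts how many elements map to each key produced by f.
    def countBy[K](f: A => K): Map[K, Int] =
      xs.groupBy(f).map { case (k, vs) => k -> vs.size }

    // Cartesian product with another sequence.
    def cross[B](ys: Seq[B]): Seq[(A, B)] =
      for (x <- xs; y <- ys) yield (x, y)
  }
}
```

After `import CollectionImplicitsSketch._`, the methods can be called directly on any `Seq`, e.g. `Seq("a", "bb").countBy(_.length)`.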