
This repository contains the code for ERCLab crawler back-end (Spark, Kafka, MongoDB, & HBase)


geo47/ERCLabCrawler-backend


ERCLabCrawler-backend (Java Spring Boot application)

This repository contains the code for the ERCLab crawler back-end (Java Spring Boot, Spark, Kafka, MongoDB, & HBase).

  • The ERCLabCrawler client, written in Python, acts as a Kafka producer and sends the extracted data to the ERCLabCrawler backend.
  • The backend's Kafka consumer receives the data from the client and applies data cleaning and machine-learning operations to it simultaneously using Spark and SparkML.
  • Finally, the backend stores the cleaned data in MongoDB or HBase.
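The consume, clean, and store flow described above can be sketched in plain Java. This is a minimal, hypothetical illustration, not the backend's actual code: `CrawlerPipelineSketch`, `RecordSink`, `InMemorySink`, `processBatch`, the `url` field, and the cleaning rules (trim values, drop blank fields, discard records without a URL, deduplicate by URL) are all assumptions. The Kafka consumer, the Spark/SparkML jobs, and the MongoDB/HBase writer are replaced by a list of records and an in-memory sink so the example runs with the JDK alone (Java 9+ for `Map.of`).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, stdlib-only sketch of the consume -> clean -> store flow.
// In the real backend these stages would be a Kafka consumer, Spark jobs,
// and a MongoDB/HBase writer; none of those APIs are used here.
public class CrawlerPipelineSketch {

    /** Hypothetical storage abstraction standing in for MongoDB or HBase. */
    public interface RecordSink {
        void store(Map<String, String> record);
    }

    /** In-memory sink used in place of a real database. */
    public static class InMemorySink implements RecordSink {
        private final List<Map<String, String>> rows = new ArrayList<>();

        @Override
        public void store(Map<String, String> record) {
            rows.add(record);
        }

        public List<Map<String, String>> rows() {
            return rows;
        }
    }

    /**
     * Hypothetical cleaning step: trims keys and values, drops blank fields,
     * and rejects records without a non-blank "url". Returns null to discard.
     */
    static Map<String, String> clean(Map<String, String> raw) {
        Map<String, String> cleaned = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            if (e.getKey() == null) {
                continue; // ignore fields without a name
            }
            String value = e.getValue() == null ? "" : e.getValue().trim();
            if (!value.isEmpty()) {
                cleaned.put(e.getKey().trim(), value);
            }
        }
        return cleaned.containsKey("url") ? cleaned : null;
    }

    /** Cleans one batch of consumed messages and stores each unique URL once. */
    public static int processBatch(List<Map<String, String>> batch, RecordSink sink) {
        Set<String> seenUrls = new HashSet<>();
        int stored = 0;
        for (Map<String, String> raw : batch) {
            Map<String, String> record = clean(raw);
            // Skip invalid records and URLs already seen in this batch.
            if (record != null && seenUrls.add(record.get("url"))) {
                sink.store(record);
                stored++;
            }
        }
        return stored;
    }

    public static void main(String[] args) {
        List<Map<String, String>> batch = new ArrayList<>();
        batch.add(Map.of("url", " https://example.org/a ", "title", "  Page A "));
        batch.add(Map.of("url", "https://example.org/a", "title", "Page A (dup)"));
        batch.add(Map.of("title", "no url"));
        InMemorySink sink = new InMemorySink();
        int stored = processBatch(batch, sink);
        System.out.println(stored + " record(s) stored: " + sink.rows());
    }
}
```

Keeping storage behind a small interface such as the hypothetical `RecordSink` is one way to let the same cleaning code write to either MongoDB or HBase; a real implementation would wrap the respective database client instead of the in-memory list.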
