COVID-19 Case Data Analysis (Indian States)

This Spark app analyses COVID-19 case data for Indian states and lets you create custom mathematical insights using a unified data structure and a trait method. After processing, it writes the results to Cassandra, which serves as the primary source for data visualization. The analyses included as a demonstration are:

Maximum number of deaths reported among all states up to Aug 29, 2020
Maximum number of recoveries reported among all states up to Aug 29, 2020
Effective increase in COVID-19 cases for all states, per day
Minimum effective increase among all states up to Aug 29, 2020
Effective increase in COVID-19 cases per total tests for all states, per day
Effective increase for the state of Kerala
Effective increase per million for the state of Kerala
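All of these analyses share one shape: take per-state, per-day values and derive a new property or a ranking from them. The following is a minimal sketch of how a unified record plus a trait method could express them, written with plain Scala collections instead of Spark so it runs on its own. `StateValue`, `Insight`, `Combine`, `EffectiveIncrease`, `EffectivePerTest` and `Ranking` are hypothetical names, and the effective-increase formula (confirmed minus recovered minus deceased) is an assumption; the repository may define it differently.

```scala
import java.time.LocalDate

// Hypothetical unified record: one value of one property for one state on one day.
// Mirrors the (property, state_code, date, state_value) columns of covid19.state_data.
final case class StateValue(property: String, stateCode: String, date: LocalDate, value: Double)

// Hypothetical trait: each insight derives new unified records from existing ones.
trait Insight {
  def name: String
  def compute(rows: Seq[StateValue]): Seq[StateValue]
}

// Combines several properties of the same state and day into a new property.
// Days where any input property is missing are skipped.
abstract class Combine(inputs: Seq[String]) extends Insight {
  protected def combine(values: Seq[Double]): Option[Double]

  def compute(rows: Seq[StateValue]): Seq[StateValue] =
    rows.groupBy(r => (r.stateCode, r.date)).toSeq.flatMap { case ((state, date), rs) =>
      val byProperty = rs.map(r => r.property -> r.value).toMap
      if (inputs.forall(byProperty.contains))
        combine(inputs.map(byProperty)).map(v => StateValue(name, state, date, v))
      else None
    }.sortBy(r => (r.stateCode, r.date.toEpochDay))
}

// Assumed definition: effective increase = confirmed - recovered - deceased.
object EffectiveIncrease extends Combine(Seq("confirmed", "recovered", "deceased")) {
  val name = "effective_increase"
  protected def combine(v: Seq[Double]): Option[Double] = Some(v(0) - v(1) - v(2))
}

// Effective increase divided by tests done that day; skipped when tests are 0.
object EffectivePerTest extends Combine(Seq("effective_increase", "tested")) {
  val name = "effective_increase_per_test"
  protected def combine(v: Seq[Double]): Option[Double] =
    if (v(1) > 0) Some(v(0) / v(1)) else None
}

// Totals a property per state up to and including a cutoff date, then ranks states.
object Ranking {
  def totalsUntil(rows: Seq[StateValue], property: String, cutoff: LocalDate): Map[String, Double] =
    rows.filter(r => r.property == property && !r.date.isAfter(cutoff))
      .groupBy(_.stateCode)
      .map { case (state, rs) => state -> rs.map(_.value).sum }

  def highest(totals: Map[String, Double]): Option[(String, Double)] =
    if (totals.isEmpty) None else Some(totals.maxBy(_._2))

  def lowest(totals: Map[String, Double]): Option[(String, Double)] =
    if (totals.isEmpty) None else Some(totals.minBy(_._2))
}
```

With this layout, "maximum deaths up to Aug 29, 2020" becomes `Ranking.highest(Ranking.totalsUntil(rows, "deceased", LocalDate.of(2020, 8, 29)))`, and a new insight only needs a name and a `combine` rule.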

Primary data source

We currently use two APIs maintained by covid19india:

Confirmed cases from states_daily_api
Recovered cases from states_daily_api
Deceased cases from states_daily_api
Positive cases from state_test_data_api
Negative cases from state_test_data_api
Total tested from state_test_data_api
Total people currently in quarantine from state_test_data_api
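The states_daily feed is wide (one column per state), while the state_data table is long (one row per property, state and date). Below is a sketch of that reshaping for a single record. It assumes each record is a flat object with a `date` such as `14-Mar-20`, a `status` of Confirmed, Recovered or Deceased, and one string count per lowercase state code; JSON parsing is left out, and these field names should be checked against the actual API response. `StatesDaily.flatten` and `StateValue` are hypothetical.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale
import scala.util.Try

// Hypothetical unified record, mirroring the (property, state_code, date, state_value) columns.
final case class StateValue(property: String, stateCode: String, date: LocalDate, value: Double)

object StatesDaily {
  // Assumed date format of the feed, e.g. "14-Mar-20".
  private val dateFormat = DateTimeFormatter.ofPattern("dd-MMM-yy", Locale.ENGLISH)

  // Keys that describe the record itself rather than a state (assumed).
  private val metaKeys = Set("date", "dateymd", "status")

  // Reshape one wide record (one column per state code) into long rows (one per state).
  def flatten(record: Map[String, String]): Seq[StateValue] = {
    val date = LocalDate.parse(record("date"), dateFormat)
    // "Confirmed" / "Recovered" / "Deceased" become the property name in lower case.
    val property = record("status").toLowerCase(Locale.ROOT)
    record.toSeq
      .filterNot { case (key, _) => metaKeys(key) }
      .flatMap { case (code, raw) =>
        // Blank or malformed counts are skipped rather than stored as zero.
        Try(raw.trim.toDouble).toOption.map(v => StateValue(property, code, date, v))
      }
      .sortBy(_.stateCode)
  }
}
```

Skipping unparsable counts instead of writing zeros keeps missing data distinguishable from a genuine zero in Cassandra.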

Installation

To run this app on your local system, you need the following prerequisites at the versions listed.

Prerequisites

Spark 2.4.6 built with Scala 2.12
Scala 2.12
sbt 1.3.13 or higher
Cassandra 4.0
cqlsh 5.0.1

Download and set up Cassandra and cqlsh on your local system by following the Apache Cassandra documentation here.

Start the Cassandra service:

$ sudo service cassandra start

Set up the Cassandra keyspace and table:

$ cqlsh

This opens a cqlsh session in your terminal. Now create a keyspace named exactly as below. The keyspace and table names are hard-coded in the driver script, so any change would cause the DataStax driver to throw NoNodeFoundException.

cqlsh> CREATE KEYSPACE covid19 WITH replication = {'class': 'SimpleStrategy', 'replication_factor':  '1'}  AND  durable_writes = true;

Switch to the keyspace:

cqlsh> USE covid19;

Create the table. In PRIMARY KEY (property, state_code, date), property is the partition key, and state_code and date are clustering columns:

cqlsh:covid19> CREATE TABLE state_data(property text, state_code text, state_value float, date date, PRIMARY KEY (property, state_code, date));

Installation is complete. You can now stop the Cassandra service:

$ sudo service cassandra stop

Running locally

Clone the repository from the master branch.
Rename sample-cassandra.conf in the src/main/resources folder to application.conf, and update the values under the local_cassandra object.
Start the Cassandra service.

$ sbt compile
$ sbt package
$ sbt "run local"

The app now runs on your local machine: it first fetches data from the APIs, then processes it, and finally writes the results to Cassandra. To verify, log in to cqlsh and run the following:

cqlsh> SELECT * FROM covid19.state_data LIMIT 100;

Running on an Amazon EMR Cluster with Amazon Keyspaces

Create an AWS account and create an EMR cluster by following this AWS doc here.
Set up Amazon Keyspaces using this doc here.
Rename sample-cassandra.conf in the src/main/resources folder to application.conf, and update the values under the amazon_cassandra object.
Go to the project folder on your local system and build the JAR file:

sbt assembly

This generates a covid19-assembly-0.1.0-SNAPSHOT.jar file in the src/target folder.

Create an Amazon S3 bucket by following the doc here.
Upload covid19-assembly-0.1.0-SNAPSHOT.jar to the S3 bucket.
Start the EMR cluster, SSH into its master node, and set up the Cassandra truststore file.
Download the JAR file from the S3 bucket:

aws s3 cp your_s3_path ./

Execute the spark-submit command:

spark-submit covid19-assembly-0.1.0-SNAPSHOT.jar aws

This runs the Spark app and writes the data to Amazon Keyspaces.
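The same JAR targets either Cassandra setup depending on the argument passed on the command line (`local` with sbt, `aws` with spark-submit), matching the local_cassandra and amazon_cassandra objects in application.conf. Here is a minimal sketch of how such a switch could be parsed; `RunMode`, `Local`, `Aws` and `fromArgs` are hypothetical names, not taken from the repository.

```scala
import java.util.Locale

// Hypothetical run modes, each naming the application.conf object it reads.
sealed trait RunMode { def configSection: String }
case object Local extends RunMode { val configSection = "local_cassandra" }
case object Aws extends RunMode { val configSection = "amazon_cassandra" }

object RunMode {
  // Parse the first program argument; anything else is reported as an error.
  def fromArgs(args: Array[String]): Either[String, RunMode] =
    args.headOption.map(_.toLowerCase(Locale.ROOT)) match {
      case Some("local") => Right(Local)
      case Some("aws")   => Right(Aws)
      case other =>
        Left(s"expected 'local' or 'aws' as the first argument, got ${other.getOrElse("nothing")}")
    }
}
```

Returning an `Either` lets the driver fail fast with a readable message before opening any Cassandra connection.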

nihadtp/covid19AnlaysisSpark
