Nihad TP edited this page Aug 28, 2020 · 1 revision

Data Structure

All data to be processed is converted into the following structure:

case class allStatus(stateMap: Map[String, Float], property: String, date: DateTime)

  1. stateMap maps each state code to its value for a given property on a given day. For example, if on a particular date 200 people in Maharashtra and 20 in Kerala have confirmed COVID-19, the map looks like Map("Maharashtra" -> 200f, "Kerala" -> 20f, ...).

  2. property is the kind of value being processed, e.g. Confirmed, Deceased or Recovered.

  3. date is the date on which the value was recorded.
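A minimal sketch of one allStatus value for the example in item 1. The page does not say which library its `DateTime` comes from, so this sketch assumes `java.time.LocalDate` instead, with a hypothetical date; the `f` suffix is needed because `stateMap` holds `Float` values.

```scala
import java.time.LocalDate

// Stand-in for the page's allStatus; LocalDate replaces DateTime (assumption)
case class allStatus(stateMap: Map[String, Float], property: String, date: LocalDate)

// Confirmed cases per state on one day (illustrative figures from item 1)
val confirmed = allStatus(
  stateMap = Map("Maharashtra" -> 200f, "Kerala" -> 20f),
  property = "Confirmed",
  date = LocalDate.of(2020, 5, 1) // hypothetical date
)

// Look up one state's value for this property and date
val keralaConfirmed: Float = confirmed.stateMap("Kerala")
```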

Operate Method

The operate method takes two or three allStatus values and a function that performs an arithmetic operation on their values, and returns a new allStatus.

def operate(d1: allStatus, d2: allStatus)(f: (Float, Float) => Float): allStatus

With three arguments:

def operate(d1: allStatus, d2: allStatus, d3: allStatus)(f: (Float, Float, Float) => Float): allStatus

The function is applied state by state: for each state in the stateMap of d1 and d2 (and d3), it receives that state's values on the same date and returns the output value for that state and date. These outputs form a new allStatus with the same states and date they were calculated from. Because the result is itself an allStatus, it can be passed to operate again.

Example

Suppose you have allStatus data for Confirmed and Recovered cases on the same date. For simplicity, assume there are only two states. The two allStatus values would look like this:

val day = new DateTime(2020, 5, 1, 0, 0) // 01-05-2020, assuming dd-MM-yyyy and Joda-Time's DateTime

val conf = allStatus(stateMap = Map("Kerala" -> 10f, "Maharashtra" -> 100f), property = "Confirmed", date = day)

val rec = allStatus(stateMap = Map("Kerala" -> 30f, "Maharashtra" -> 30f), property = "Recovered", date = day)

To calculate the effective increase in cases for the day, i.e. Confirmed - Recovered, call operate like this:

operate(conf, rec)((x, y) => x - y)

This returns an allStatus with the following output:

allStatus(stateMap = Map("Kerala" -> -20f, "Maharashtra" -> 70f), property = "operate2", date = day)
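One possible way operate could be implemented, as a hedged sketch rather than the project's actual code. Assumptions: `java.time.LocalDate` stands in for `DateTime`; a state missing from any input gets `Float.NaN` to stand for NA; the result keeps the first input's date; and the "operate3" label for the three-argument version is a guess by analogy with the "operate2" label in the example output.

```scala
import java.time.LocalDate

// Stand-in for the page's allStatus; LocalDate replaces DateTime (assumption)
case class allStatus(stateMap: Map[String, Float], property: String, date: LocalDate)

// Two-input operate: applies f state by state over the union of state keys.
// A state missing from either input becomes Float.NaN ("NA", assumption).
def operate(d1: allStatus, d2: allStatus)(f: (Float, Float) => Float): allStatus = {
  val states = d1.stateMap.keySet ++ d2.stateMap.keySet
  val values = states.iterator.map { s =>
    val v = (d1.stateMap.get(s), d2.stateMap.get(s)) match {
      case (Some(a), Some(b)) => f(a, b)
      case _                  => Float.NaN
    }
    s -> v
  }.toMap
  allStatus(values, "operate2", d1.date) // keeps d1's date
}

// Three-input overload; the "operate3" label is a hypothetical guess
def operate(d1: allStatus, d2: allStatus, d3: allStatus)(f: (Float, Float, Float) => Float): allStatus = {
  val states = d1.stateMap.keySet ++ d2.stateMap.keySet ++ d3.stateMap.keySet
  val values = states.iterator.map { s =>
    val v = (d1.stateMap.get(s), d2.stateMap.get(s), d3.stateMap.get(s)) match {
      case (Some(a), Some(b), Some(c)) => f(a, b, c)
      case _                           => Float.NaN
    }
    s -> v
  }.toMap
  allStatus(values, "operate3", d1.date)
}

// The Confirmed - Recovered example from the text
val day  = LocalDate.of(2020, 5, 1)
val conf = allStatus(Map("Kerala" -> 10f, "Maharashtra" -> 100f), "Confirmed", day)
val rec  = allStatus(Map("Kerala" -> 30f, "Maharashtra" -> 30f), "Recovered", day)
val effective = operate(conf, rec)((x, y) => x - y)
// effective.stateMap == Map("Kerala" -> -20f, "Maharashtra" -> 70f)
```

Taking the union of state keys (rather than the intersection) keeps a state visible as NA when one input lacks it, instead of silently dropping it.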

covidMap Method

This method applies the same kind of operation to RDDs of allStatus.

def covidMap(rdd1: RDD[allStatus], rdd2: RDD[allStatus], newPropertyName: String)(f: (Float, Float) => Float): RDD[allStatus]

It works the same way as operate, but on RDDs of allStatus. The extra parameter newPropertyName sets the property name of the resulting allStatus values.

If data for a particular date is available for one property but missing for the other, that date is skipped. For example, if there is Confirmed data for 05-05-2020 but no Recovered data for the same date, no output is computed for that date.

The same applies to state codes inside stateMap: if Kerala has a value for Confirmed cases but not for Recovered cases, the output allStatus shows Kerala as NA.
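The join behaviour described above can be sketched on plain Scala collections. This is an illustration under assumptions, not the project's covidMap: `Seq` stands in for `RDD` so it runs without Spark, `LocalDate` stands in for `DateTime`, each input is assumed to hold at most one record per date, NA is represented as `Float.NaN`, and the name `covidMapLocal` and the "Effective" property are hypothetical.

```scala
import java.time.LocalDate

// Stand-in for the page's allStatus; LocalDate replaces DateTime (assumption)
case class allStatus(stateMap: Map[String, Float], property: String, date: LocalDate)

// Hypothetical local version of covidMap; Seq stands in for RDD
def covidMapLocal(xs: Seq[allStatus], ys: Seq[allStatus], newPropertyName: String)(
    f: (Float, Float) => Float): Seq[allStatus] = {
  // Assumes at most one record per date in ys
  val ysByDate = ys.map(y => y.date -> y).toMap
  // Inner join on date: a date present in only one input yields no output
  xs.flatMap { x =>
    ysByDate.get(x.date).map { y =>
      val states = x.stateMap.keySet ++ y.stateMap.keySet
      val values = states.iterator.map { s =>
        val v = (x.stateMap.get(s), y.stateMap.get(s)) match {
          case (Some(a), Some(b)) => f(a, b)
          case _                  => Float.NaN // "NA": state missing in one input
        }
        s -> v
      }.toMap
      allStatus(values, newPropertyName, x.date)
    }
  }
}

val d1 = LocalDate.of(2020, 5, 4)
val d2 = LocalDate.of(2020, 5, 5)
val confirmed = Seq(
  allStatus(Map("Kerala" -> 10f, "Maharashtra" -> 100f), "Confirmed", d1),
  allStatus(Map("Kerala" -> 12f, "Maharashtra" -> 110f), "Confirmed", d2))
// No Recovered record for 05-05-2020, and no Kerala entry on 04-05-2020
val recovered = Seq(allStatus(Map("Maharashtra" -> 30f), "Recovered", d1))
val effective = covidMapLocal(confirmed, recovered, "Effective")(_ - _)
```

On a real RDD, the same inner join by date could be expressed with Spark's `keyBy(_.date)` on each input followed by `join`; this page does not say whether covidMap is written that way.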

Cassandra Methods

The Cassandra methods take allStatus data and write it to Cassandra. They open one session per partition and write the partitions to the database concurrently for faster writes.
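A sketch of the one-session-per-partition pattern, written against a hypothetical in-memory `FakeSession` rather than a real Cassandra driver so it runs with the standard library alone. Partitions are modelled as a `Seq[Seq[allStatus]]`, `LocalDate` replaces `DateTime`, and the one-row-per-state layout is an assumption; none of these names come from the project.

```scala
import java.time.LocalDate
import java.util.concurrent.ConcurrentLinkedQueue
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Stand-in for the page's allStatus; LocalDate replaces DateTime (assumption)
case class allStatus(stateMap: Map[String, Float], property: String, date: LocalDate)

// Hypothetical stand-in for a Cassandra session; it only records rows in memory
final class FakeSession(sink: ConcurrentLinkedQueue[(String, String, LocalDate, Float)]) {
  // Writes one row per (state, property, date), a hypothetical table layout
  def write(s: allStatus): Unit =
    s.stateMap.foreach { case (state, value) => sink.add((state, s.property, s.date, value)) }
  def close(): Unit = ()
}

// Opens one session per partition and writes all partitions concurrently
def writeAll(partitions: Seq[Seq[allStatus]], openSession: () => FakeSession): Unit = {
  val jobs = partitions.map { part =>
    Future {
      val session = openSession()
      try part.foreach(session.write)
      finally session.close()
    }
  }
  Await.result(Future.sequence(jobs), 30.seconds)
}

val sink = new ConcurrentLinkedQueue[(String, String, LocalDate, Float)]()
val day = LocalDate.of(2020, 5, 1)
val partitions = Seq(
  Seq(allStatus(Map("Kerala" -> 10f, "Maharashtra" -> 100f), "Confirmed", day)),
  Seq(allStatus(Map("Kerala" -> 30f, "Maharashtra" -> 30f), "Recovered", day)))
writeAll(partitions, () => new FakeSession(sink))
```

In Spark itself, `RDD.foreachPartition` is the usual hook for opening one connection per partition, and Spark already runs partitions in parallel on its executors; the `Future`s here only imitate that concurrency locally.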
