Dynamograph RoadMap #12

Open
18 of 57 tasks
alberskib opened this issue May 22, 2014 · 7 comments

@alberskib
Contributor

19.05 : 08.06 - First iteration GO:
Steps:
  • design of initial table layout for GO
  • get to know scarph library
  • design of (initial) scala model for GO
  • investigate usefulness of the existing GO parser
  • use existing parsers or write a new (custom) one for GO
Artifacts:
  • document describing initial table layout for GO
  • scala model for GO (code)
  • DynamoDB code for creating GO tables (according to the design), as well as code for retrieving GO data from DynamoDB
  • tests for code
  • example GO data saved into DynamoDB
  • parser for GO
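
As a rough illustration of the "scala model for GO" and table-layout artifacts above, here is a minimal sketch. The `GoTerm` and `GoEdge` classes, their fields, and the attribute names are assumptions made for illustration, not the project's actual design. Items are modeled as plain `Map[String, String]` rather than AWS SDK or scarph types so that the snippet stays self-contained.

```scala
object GoModelSketch {
  // Hypothetical model of a GO term; the field names are illustrative assumptions.
  final case class GoTerm(id: String, name: String, namespace: String)

  // Hypothetical typed edge between two terms (for example "is_a" or "part_of").
  final case class GoEdge(sourceId: String, relation: String, targetId: String)

  // Flatten a term into a key/value item; "id" is assumed to be the hash key.
  def termToItem(t: GoTerm): Map[String, String] =
    Map("id" -> t.id, "name" -> t.name, "namespace" -> t.namespace)

  // Rebuild a term from an item; returns None if any attribute is missing.
  def itemToTerm(item: Map[String, String]): Option[GoTerm] =
    for {
      id   <- item.get("id")
      name <- item.get("name")
      ns   <- item.get("namespace")
    } yield GoTerm(id, name, ns)

  // Edge item: assumed hash key "sourceId", assumed range key "relation#targetId".
  def edgeToItem(e: GoEdge): Map[String, String] =
    Map("sourceId" -> e.sourceId, "relationTarget" -> s"${e.relation}#${e.targetId}")
}
```

With a layout like this, all outgoing edges of a term would share one hash key, so a single query on `sourceId` could return them. This is only one possible layout.
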
09.06 : 29.06 - Second iteration ncbiTaxonomy:
Steps:
  • improve things related to the GO
  • design the initial table layout for ncbiTaxonomy - get to know the data
  • design the initial scala model for ncbiTaxonomy
  • investigate usefulness of the existing ncbiTaxonomy parser
  • use existing parsers or write a new (custom) one for ncbiTaxonomy
Artifacts:
  • document describing table layout for ncbiTaxonomy
  • scala model for ncbiTaxonomy (code)
  • DynamoDB code for creating ncbiTaxonomy tables (according to the design), as well as code for retrieving ncbiTaxonomy data from DynamoDB
  • tests for code
  • example ncbiTaxonomy data saved into DynamoDB
  • parser for ncbiTaxonomy
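
If a custom parser turns out to be needed, a minimal sketch could look like the following. It assumes the input is the `nodes.dmp` file from the NCBI taxonomy dump, where fields are separated by `\t|\t` and each row ends with `\t|`. The `TaxonNode` class and the choice to keep only the first three columns are assumptions for illustration.

```scala
object TaxonomyParserSketch {
  // Hypothetical model of one taxonomy node; only the first three columns are kept.
  final case class TaxonNode(taxId: String, parentId: String, rank: String)

  // Parse one nodes.dmp row; returns None for rows with fewer than three fields.
  def parseNodeLine(line: String): Option[TaxonNode] = {
    // Drop the row terminator, then split on the tab-pipe-tab field separator.
    val fields = line.stripSuffix("\t|").split("\t\\|\t", -1).map(_.trim)
    if (fields.length >= 3 && fields(0).nonEmpty)
      Some(TaxonNode(fields(0), fields(1), fields(2)))
    else None
  }

  // Lazily parse many rows, silently skipping malformed ones.
  def parseNodes(lines: Iterator[String]): Iterator[TaxonNode] =
    lines.flatMap(l => parseNodeLine(l))
}
```

Working on an `Iterator[String]` keeps memory use flat for large dumps, since rows can be streamed straight from the file into batched writes.
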
30.06 : 13.07 - Third iteration RefSeq:
Steps:
  • design table layout for RefSeq - get to know data
  • figure out connection of s3 with dynamodb
  • design scala model for RefSeq that handles s3 queries as well as dynamodb
  • search for proper parser of RefSeq data or build custom solution
Artifacts:
  • document describing table layout for RefSeq and cooperation of s3 with dynamodb for RefSeq
  • scala model for RefSeq (code)
  • DynamoDB code for creating RefSeq tables (according to the design), cooperating with S3 and handling special cases
  • tests for code
  • example RefSeq data saved into DynamoDB
  • parser for RefSeq
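
One possible shape for the S3/DynamoDB cooperation, sketched under stated assumptions: DynamoDB holds a small pointer record per sequence, the raw sequence lives in S3 as one byte per residue (no header, no line breaks), and a subsequence is fetched with an HTTP `Range` header on the S3 GET. The `SequencePointer` class and `rangeHeader` helper are hypothetical names, and no AWS call is made here.

```scala
object SequenceRangeSketch {
  // Hypothetical DynamoDB-side record pointing at sequence bytes kept in S3.
  final case class SequencePointer(id: String, bucket: String, key: String, length: Long)

  // Build the Range header value for the half-open, 0-based interval [start, end).
  // HTTP byte ranges are inclusive on both ends, hence the end - 1.
  def rangeHeader(p: SequencePointer, start: Long, end: Long): Either[String, String] =
    if (start < 0 || end <= start)
      Left(s"invalid interval [$start, $end)")
    else if (end > p.length)
      Left(s"interval end $end exceeds sequence length ${p.length}")
    else
      Right(s"bytes=$start-${end - 1}")
}
```

Validating against the length stored in DynamoDB means an out-of-bounds request can be rejected before any S3 round trip.
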
14.07 : 27.07 - Fourth iteration UniRef: // or further work on steps/artifacts from previous iteration
Steps:
  • design table layout for UniRef - get to know data
  • design scala model for UniRef
  • investigate usefulness of the existing UniRef parser
  • use existing parsers or write a new (custom) one for UniRef
Artifacts:
  • document describing table layout for UniRef
  • scala model for UniRef (code)
  • DynamoDB code for creating UniRef tables (according to the design), as well as code for retrieving UniRef data from DynamoDB
  • tests for code
  • example UniRef data saved into DynamoDB
  • parser for UniRef
28.07 : 10.08 - Fifth iteration:
Steps:
  • execute performance tests
  • introduce improvements
  • evaluate solutions with mentors
  • introduce suggestions after evaluation
  • preparation of usage examples
Artifacts:
  • report from performance tests
  • document describing evaluation of solution with places to improve
  • solution draft
  • documentation draft
  • examples showing how to use solution
11.08 : 18.08 - Final delivery/release
Steps:
  • prepare packages
  • scrub code, documentation
Artifacts:
  • project documentation
  • working solution
  • GO, ncbiTaxonomy, UniProtKB and UniRef data stored in DynamoDB

Each iteration also focuses on code quality (including refactoring, etc.).

@alberskib
Contributor Author

@bio4j/dynamograph Please take a look at the roadmap and share your opinion. Do you think the presented plan is reasonable? I mean, does it contain too many goals, or not enough? I added UniProtKB and UniRef as additional data types, but if you think some other data would be better, I will replace them.

@laughedelic
Member

@alberskib I just added checkboxes for the current period so that the current progress is more visible; please check off what is already more or less done.

@eparejatobes
Member

Looks good in general :)

I'd add AWS resource management in general (creating and destroying tables, autoscaling groups, etc.). As for the datasets, UniProt and all that is maybe too much, and I think RefSeq could be more interesting, also for seeing how a mixed DynamoDB/S3 solution performs (RefSeq includes a lot of sequence data that needs range access).

@alberskib
Contributor Author

@eparejatobes
By AWS resource management, do you mean writing code that provides such functionality, or doing these things manually?
Mixing DynamoDB with S3 seems extremely interesting, so I will definitely handle this dataset.
I have slightly modified the roadmap.

@alberskib
Contributor Author

If you know of any other datasets that should be handled, please let me know (and, more generally, if you would suggest changes to the selected datasets).

@eparejatobes
Member

@alberskib I mean code, of course, like what we talked about during our previous meeting. @evdokim can probably show you some examples.
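
For illustration only, resource management in code might follow a create/use/destroy pattern like the sketch below. The `TableAdmin` trait, `InMemoryAdmin`, and `withTables` are hypothetical names invented for this example; a real implementation would wrap the AWS SDK instead of the in-memory stand-in used here to keep the snippet runnable.

```scala
object TableLifecycleSketch {
  // Hypothetical minimal admin interface; a real one would call the AWS SDK.
  trait TableAdmin {
    def create(table: String): Unit
    def destroy(table: String): Unit
  }

  // In-memory stand-in, used only to make the sketch runnable without AWS.
  final class InMemoryAdmin extends TableAdmin {
    private val tables = scala.collection.mutable.LinkedHashSet.empty[String]
    def create(table: String): Unit = { tables += table; () }
    def destroy(table: String): Unit = { tables -= table; () }
    def existing: List[String] = tables.toList
  }

  // Create the tables, run the body, then destroy whatever was created,
  // in reverse order, even if creation or the body throws.
  def withTables[A](admin: TableAdmin, names: List[String])(body: => A): A = {
    val created = scala.collection.mutable.ListBuffer.empty[String]
    try {
      names.foreach { n => admin.create(n); created += n }
      body
    } finally created.reverse.foreach(n => admin.destroy(n))
  }
}
```

Tracking which tables were actually created means a failure halfway through setup still cleans up only what exists, which matters when resources are billed while they are provisioned.
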

@eparejatobes
Member

We're taking the midterm evaluation as an opportunity for refining and updating this. Some comments about it:

  • We are deprecating RefSeq from Bio4j (this has not been announced yet, but it will be in the next few days), so it would make no sense to work with it here. ENA would be the equivalent integrated resource.
  • We should start creating issues and milestones for the next steps, and track them in more detail.
