GWAS Toolkit Comparisons #701
Replies: 2 comments
-
(Posted by @alimanfoo)
Just to say this is very nicely put!
-
Thanks @alimanfoo! I'm curious how often you end up doing ad-hoc work in a given analysis, as opposed to simply chaining operations that already exist in scikit-allel (or any other library). Put differently, what fraction of the code in your pipelines, or of the time spent writing them, ends up being specific to a single experiment or study?
-
(Posted by @eric-czech)
We've recently been surveying the landscape of GWAS toolkits to better understand which features are supported by some libraries and not others. A summary can be found in this spreadsheet.
This should serve as a good topic for aggregating thoughts on the crucial analytical capabilities that any pipeline should support, ideally with a focus on anything needed after variants have been called (e.g. QC, population structure analysis, association testing, regression diagnostics, and possibly operations on summary statistics such as meta-analysis and fine-mapping). An implicit assumption here is that "toolkits" will become more interesting than individual "tools" (namely single-core CLI apps), because scalability challenges make it increasingly important to support ad-hoc analysis within a larger distributed computing framework. That assumption should narrow the scope of this topic significantly.
This would also be a good place for thoughts on how newly released libraries compare to the current, prominent ones in the Spark, Python, and R ecosystems.
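To make the kind of chaining discussed here concrete, below is a minimal sketch of a post-calling workflow (QC filters followed by a per-variant association test) written against plain NumPy arrays. It does not reflect the API of any library in the survey. The function names, the `-1` missing-call encoding, the thresholds, and the toy data are all assumptions made for illustration.

```python
import numpy as np

# Assumed layout: genotypes is (variants x samples), holding alt-allele
# dosages 0/1/2, with -1 marking a missing call (hypothetical encoding).

def call_rate(genotypes):
    """Fraction of non-missing calls per variant."""
    return (genotypes >= 0).mean(axis=1)

def minor_allele_frequency(genotypes):
    """MAF per variant, ignoring missing (-1) calls."""
    called = genotypes >= 0
    alt_count = np.where(called, genotypes, 0).sum(axis=1)
    n_alleles = 2 * called.sum(axis=1)
    alt_freq = alt_count / np.maximum(n_alleles, 1)
    return np.minimum(alt_freq, 1 - alt_freq)

def linear_association(genotypes, phenotype):
    """Per-variant OLS slope of phenotype on dosage.

    No covariates; assumes no missing calls and no monomorphic variants
    remain, which the QC filters below are meant to guarantee.
    """
    g = genotypes - genotypes.mean(axis=1, keepdims=True)
    y = phenotype - phenotype.mean()
    return (g @ y) / (g ** 2).sum(axis=1)

# Toy data: 4 variants x 6 samples.
genotypes = np.array([
    [0, 1, 2, 1, 0, 2],
    [0, 0, 0, 0, 0, 1],     # rare variant, fails the MAF filter
    [1, -1, -1, 2, -1, 0],  # many missing calls, fails the call-rate filter
    [2, 1, 0, 1, 2, 0],
])
phenotype = np.array([0.1, 1.0, 2.1, 0.9, 0.0, 1.9])

# Chain the steps: QC mask first, then association on the survivors.
keep = (call_rate(genotypes) >= 0.9) & (minor_allele_frequency(genotypes) >= 0.1)
betas = linear_association(genotypes[keep], phenotype)
```

Each step is an ordinary array operation, so a study-specific filter or statistic can be slotted in between the others without leaving the environment, which is harder to do when each step is a separate CLI tool.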