Commit

Merge pull request #179 from percyfal/author-update
Add PU author affiliations and acknowledgements
jeromekelleher authored Nov 21, 2024
2 parents 1157fec + a11a6b5 commit 63dce2a
Showing 1 changed file with 35 additions and 22 deletions.
paper.tex: 57 changes (35 additions, 22 deletions)
@@ -48,6 +48,7 @@
\author[8]{Sam Tallman} % https://orcid.org/0000-0001-7183-6276
\author[1]{Rafal Wojdyla} % https://orcid.org/0009-0005-0735-7090
\author[9]{Shadi Zabad} % https://orcid.org/0000-0002-8003-9284
+\author[10]{Per Unneberg} % https://orcid.org/0000-0001-5735-3315

% Senior
\author[1,\authfn{2}]{Jeff Hammerbacher} % https://orcid.org/0000-0001-6596-8563
@@ -64,6 +65,9 @@
\affil[7]{Wellcome Sanger Institute}
\affil[8]{Genomics England}
\affil[9]{School of Computer Science, McGill University, Montreal, QC, Canada}
+\affil[10]{Department of Cell and Molecular Biology, National
+Bioinformatics Infrastructure Sweden, Science for Life Laboratory,
+Uppsala University, Uppsala, Sweden}

%%% Author Notes
\authnote{\authfn{1}Joint first author.}
@@ -1022,33 +1026,34 @@ \subsection{Case study: 1,063 spruce whole-genome samples}
\toprule
{Field} & {type} & {storage} & {compress} & {\%total} \\
\midrule
-/call\_PL & int16 & 6.12 TiB & 12.0 & 92.75\% \\
-/call\_genotype & int8 & 282.45 GiB & 26.0 & 4.18\% \\
-/call\_genotype\_mask & bool & 42.15 GiB & 180.0 & 0.62\% \\
-/variant\_DP4 & int32 & 26.24 GiB & 2.1 & 0.39\% \\
-/variant\_MQSBZ & float32 & 13.08 GiB & 1.1 & 0.19\% \\
-/variant\_RPBZ & float32 & 13.04 GiB & 1.1 & 0.19\% \\
-/variant\_MQBZ & float32 & 12.97 GiB & 1.1 & 0.19\% \\
-/variant\_SCBZ & float32 & 12.94 GiB & 1.1 & 0.19\% \\
-/variant\_VDB & float32 & 12.84 GiB & 1.1 & 0.19\% \\
-/variant\_quality & float32 & 12.84 GiB & 1.1 & 0.19\% \\
-/variant\_BQBZ & float32 & 12.64 GiB & 1.1 & 0.19\% \\
-/variant\_SGB & float32 & 12.12 GiB & 1.2 & 0.18\% \\
-/variant\_position & int32 & 9.65 GiB & 1.4 & 0.14\% \\
-/variant\_AC & int16 & 7.42 GiB & 2.8 & 0.11\% \\
-/variant\_DP & int32 & 6.71 GiB & 2.1 & 0.10\% \\
-/variant\_allele & object & 5.37 GiB & 21.0 & 0.08\% \\
-/variant\_AN & int16 & 3.05 GiB & 2.3 & 0.05\% \\
-/call\_genotype\_phased & bool & 2.03 GiB & 1800.0 & 0.03\% \\
-/variant\_filter & bool & 1.46 GiB & 2.4 & 0.02\% \\
-/variant\_MQ & int8 & 489.83 MiB & 7.3 & 0.01\% \\
+/call\_PL & int16 & 6.1 TiB & 12.0 & 92.75\% \\
+/call\_genotype & int8 & 282.5 GiB & 26.0 & 4.18\% \\
+/call\_genotype\_mask & bool & 42.2 GiB & 180.0 & 0.62\% \\
+/variant\_DP4 & int32 & 26.2 GiB & 2.1 & 0.39\% \\
+/variant\_MQSBZ & float32 & 13.1 GiB & 1.1 & 0.19\% \\
+/variant\_RPBZ & float32 & 13.0 GiB & 1.1 & 0.19\% \\
+/variant\_MQBZ & float32 & 13.0 GiB & 1.1 & 0.19\% \\
+/variant\_SCBZ & float32 & 12.9 GiB & 1.1 & 0.19\% \\
+/variant\_VDB & float32 & 12.8 GiB & 1.1 & 0.19\% \\
+/variant\_quality & float32 & 12.8 GiB & 1.1 & 0.19\% \\
+/variant\_BQBZ & float32 & 12.6 GiB & 1.1 & 0.19\% \\
+/variant\_SGB & float32 & 12.1 GiB & 1.2 & 0.18\% \\
+/variant\_position & int32 & 9.7 GiB & 1.4 & 0.14\% \\
+/variant\_AC & int16 & 7.4 GiB & 2.8 & 0.11\% \\
+/variant\_DP & int32 & 6.7 GiB & 2.1 & 0.10\% \\
+/variant\_allele & object & 5.4 GiB & 21.0 & 0.08\% \\
+/variant\_AN & int16 & 3.1 GiB & 2.3 & 0.05\% \\
+/call\_genotype\_phased & bool & 2.0 GiB & 1800.0 & 0.03\% \\
+/variant\_filter & bool & 1.5 GiB & 2.4 & 0.02\% \\
+/variant\_MQ & int8 & 489.8 MiB & 7.3 & 0.01\% \\
\bottomrule
\end{tabular}
\end{table}

Table~\ref{tab-spruce-data} shows that the dataset storage size is
dominated by a major component, call\_PL, which accounts for nearly
-93\% of the total. call\_PL is yet another field with low compression potential due to inherent noisiness.
+93\% of the total. call\_PL is yet another field with low compression
+potential due to inherent noisiness.

%% FIXME: reference regarding VCF POS?
The spruce data set poses an interesting challenge to the VCF storage
@@ -1644,7 +1649,10 @@ \subsection{Funding}
JK and AM acknowledge the Bill \& Melinda Gates Foundation (INV-001927).
TM acknowledges funding from The New Zealand Institute for Plant \& Food
Research Ltd Kiwifruit Royalty Investment Programme.
+PU was supported by the SciLifeLab \& Wallenberg Data Driven Life
+Science Program, Knut and Alice Wallenberg Foundation (grants: KAW
+2020.0239 and KAW 2017.0003), and by the National Bioinformatics
+Infrastructure Sweden (NBIS) at SciLifeLab.
% \subsection{Author's Contributions}
\section{Acknowledgements}
@@ -1665,6 +1673,11 @@ \section{Acknowledgements}
Biomedical Research Centre. The views expressed are those of the author(s) and
not necessarily those of the NHS, the NIHR or the Department of Health.
+Computations for the spruce case study were enabled by resources
+provided by the National Academic Infrastructure for Supercomputing in
+Sweden (NAISS), partially funded by the Swedish Research Council
+through grant agreement no. 2022-06725.
Genozip was used under the terms of the free Genozip Academic license.
Genozip was only used on simulated data, in compliance with
the ``No Commercial Data'' criterion.
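As a quick consistency check on the statement that call_PL accounts for nearly 93% of the spruce dataset's storage, the Python sketch below recomputes each field's share from the storage column of the table changed in this commit. It is illustrative only: the names `UNITS`, `FIELDS` and `storage_shares` are hypothetical, the sizes are transcribed from the two-decimal (pre-rounding) rows, MiB/GiB/TiB are assumed to be binary units (powers of 1024), and the 20 listed fields are assumed to make up essentially the whole dataset (the %total column sums to roughly 100%). Because the transcribed sizes are already rounded, the recomputed shares only approximate the published percentages.

```python
"""Recompute per-field storage shares for the spruce table (illustrative sketch)."""

# Assumption: the table's MiB/GiB/TiB are binary (IEC) units, i.e. powers of 1024.
UNITS = {"MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

# Hypothetical transcription of (field, size, unit) from the two-decimal rows.
# These sizes are already rounded, so the recomputed shares are approximate.
FIELDS = [
    ("call_PL", 6.12, "TiB"),
    ("call_genotype", 282.45, "GiB"),
    ("call_genotype_mask", 42.15, "GiB"),
    ("variant_DP4", 26.24, "GiB"),
    ("variant_MQSBZ", 13.08, "GiB"),
    ("variant_RPBZ", 13.04, "GiB"),
    ("variant_MQBZ", 12.97, "GiB"),
    ("variant_SCBZ", 12.94, "GiB"),
    ("variant_VDB", 12.84, "GiB"),
    ("variant_quality", 12.84, "GiB"),
    ("variant_BQBZ", 12.64, "GiB"),
    ("variant_SGB", 12.12, "GiB"),
    ("variant_position", 9.65, "GiB"),
    ("variant_AC", 7.42, "GiB"),
    ("variant_DP", 6.71, "GiB"),
    ("variant_allele", 5.37, "GiB"),
    ("variant_AN", 3.05, "GiB"),
    ("call_genotype_phased", 2.03, "GiB"),
    ("variant_filter", 1.46, "GiB"),
    ("variant_MQ", 489.83, "MiB"),
]


def storage_shares(fields):
    """Map each field name to its fraction of the summed storage size."""
    # Convert every size to bytes so fields in different units are comparable.
    nbytes = {name: size * UNITS[unit] for name, size, unit in fields}
    total = sum(nbytes.values())
    return {name: n / total for name, n in nbytes.items()}


shares = storage_shares(FIELDS)
# Print the three largest fields, largest first.
for name, frac in sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(f"{name:<20} {100 * frac:6.2f}%")
```

Running it lists call_PL first at just under 93%, followed by call_genotype at about 4%, consistent with the %total column.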
