I'm being lazy and didn't feel like making 3 issues, but obviously these will be 3 PRs, and if people agree this is something we want to move forward with then we can spin out format-specific issues for discussion.
For background see samtools/bcftools#1961. Arguably that case is just an abuse: the data is redundant, and it's fundamentally an identifier rather than a number with any meaningful ordering, so it should be a string. Still, it raises the thought that maybe we should have 64-bit data elements in place before we get an issue that requires them right that instant.
For BCF, there are empty slots in the data types already (and in some cases implemented in htslib).
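To make the "empty slots" point concrete, here is a minimal Python sketch of how a single integer could be written as a BCF typed value if a 64-bit type took one of the unassigned codes. It is illustrative only: the use of code 4 for 64-bit integers, the function name `encode_bcf_int`, and the choice to reserve the eight lowest values of the 64-bit range (mirroring the narrower integer types) are all assumptions, not agreed specification text.

```python
import struct

# BCF2 atomic type codes for integers. Codes 4 and 6 are unassigned
# in the published type table.
BCF_INT8, BCF_INT16, BCF_INT32 = 1, 2, 3
BCF_INT64 = 4  # ASSUMPTION: hypothetical use of an empty slot

# (type code, little-endian struct format, width in bits)
_INT_TYPES = [
    (BCF_INT8, "<b", 8),
    (BCF_INT16, "<h", 16),
    (BCF_INT32, "<i", 32),
    (BCF_INT64, "<q", 64),
]


def encode_bcf_int(value):
    """Encode one integer as descriptor byte + payload, narrowest type first."""
    for code, fmt, bits in _INT_TYPES:
        # ASSUMPTION: the 8 lowest values of each width are reserved
        # (missing, end-of-vector, ...), as for the existing int types.
        lo = -(1 << (bits - 1)) + 8
        hi = (1 << (bits - 1)) - 1
        if lo <= value <= hi:
            descriptor = (1 << 4) | code  # count 1 in high nibble, type in low
            return bytes([descriptor]) + struct.pack(fmt, value)
    raise OverflowError(f"{value} does not fit in a signed 64-bit integer")
```

Under these assumptions, small values keep their existing encoding byte for byte, and only values outside the 32-bit range produce the new type code.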
For BAM it's a mixed bag - fixed fields are hard limited and we can't easily change them without fundamentally breaking the data layout of existing files, but for aux tags there is a trivial compatible way of adding l, L (long / 64-bit ints) and maybe d (double).
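As a rough illustration of the aux tag idea, the sketch below serialises a BAM aux field as a two-character tag, a one-character type code, and a little-endian value. The existing numeric codes are the ones in the current format; the `l`, `L` and `d` entries are the additions proposed here, and the helper name `encode_aux` is hypothetical.

```python
import struct

# Existing BAM aux numeric type codes mapped to little-endian struct formats.
AUX_FORMATS = {
    "c": "<b", "C": "<B",   # int8 / uint8
    "s": "<h", "S": "<H",   # int16 / uint16
    "i": "<i", "I": "<I",   # int32 / uint32
    "f": "<f",              # 32-bit float
}

# ASSUMPTION: the additions proposed in this issue, not part of the format.
PROPOSED_FORMATS = {"l": "<q", "L": "<Q", "d": "<d"}


def encode_aux(tag, type_code, value):
    """Serialise one numeric aux field: 2-byte tag, 1-byte type, value."""
    if len(tag) != 2:
        raise ValueError("aux tags are exactly two characters")
    fmt = AUX_FORMATS.get(type_code) or PROPOSED_FORMATS.get(type_code)
    if fmt is None:
        raise ValueError(f"unsupported aux type code {type_code!r}")
    return tag.encode("ascii") + type_code.encode("ascii") + struct.pack(fmt, value)
```

For example, `encode_aux("XL", "l", 2**40)` gives 11 bytes (2 for the tag, 1 for the type, 8 for the value), while fields using the existing type codes are serialised exactly as before.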
For CRAM the situation is similar to BAM: fixed structure elements are hard to change, but aux tags share a similar encoding to BAM.
The text formats have limits applied only for interoperability with their binary counterparts, and indeed htslib already supports longer values for some of the fields in SAM (with limited support for writing them out as BAM when present).
Java does not really have unsigned integer types (see the first paragraph of #460 (comment)), so we should probably only consider adding representations for int64_t, and not for uint64_t as well. (Surely 63 bits of magnitude is enough for anyone! 😄)
So for BAM that would mean e.g. just l (signed “long” int64_t) and maybe d (double).
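A small Python sketch of what the signed-only variant could look like when a writer picks the narrowest aux integer type for a value. The function name `smallest_aux_int_type` is hypothetical, and the `l` fallback is the proposed addition rather than anything in the current format; values beyond the int64_t range are rejected because no `L` type is assumed.

```python
# Candidate aux integer types in increasing width: (code, min, max).
# The final "l" entry is the ASSUMED signed 64-bit addition; there is
# deliberately no unsigned 64-bit "L" in this variant.
_AUX_INT_RANGES = [
    ("c", -2**7, 2**7 - 1),
    ("C", 0, 2**8 - 1),
    ("s", -2**15, 2**15 - 1),
    ("S", 0, 2**16 - 1),
    ("i", -2**31, 2**31 - 1),
    ("I", 0, 2**32 - 1),
    ("l", -2**63, 2**63 - 1),
]


def smallest_aux_int_type(value):
    """Return the narrowest aux type code whose range contains value."""
    for code, lo, hi in _AUX_INT_RANGES:
        if lo <= value <= hi:
            return code
    raise OverflowError(f"{value} is outside the signed 64-bit range")
```

With this table, anything that fits in 32 bits keeps an existing type code, so only genuinely large values would ever produce the new `l` type.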