What does . mean for an A, R, or G indexed field?
#737
Comments
Vaguely related issue: #419
The meaning of a single dot is ambiguous: it can mean both.
Is there appetite for modifying the spec to be unambiguous? I appreciate folks may be loath to add more complexity. We had a quick spitball over here and think trailing commas could resolve the ambiguity with a bit of backwards-incompatibility. Specifically: a non-missing array-type field value should end with Using a hopefully intuitive JSON-inspired syntax for the meaning column in which
Existing VCFs with empty strings in the last element of array-type field values would now be interpreted incorrectly: they'd be one element shorter than expected. For A, R, and G indexed fields we can error or fix the size automatically. For . indexed fields this is an undetectable change of semantics. VCFs using this hypothetical new spec (4.5?) would confuse existing tools. Again, the particularly bad case is . indexed fields, which have no "checkbit".
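To make the "checkbit" point concrete, here is a rough sketch of the length check that A, R, and G indexed fields allow and . indexed fields do not. The function names are made up for illustration, and diploid samples are assumed by default; the G count is the number of unordered genotypes over all alleles.

```python
from math import comb
from typing import Optional


def expected_length(number: str, n_alt: int, ploidy: int = 2) -> Optional[int]:
    """Expected element count for a field, or None when Number=. (no check possible)."""
    n_alleles = n_alt + 1  # ALT alleles plus the reference allele
    if number == "A":
        return n_alt
    if number == "R":
        return n_alleles
    if number == "G":
        # Unordered genotypes: multisets of size `ploidy` drawn from `n_alleles`.
        return comb(n_alleles + ploidy - 1, ploidy)
    if number == ".":
        return None
    return int(number)  # fixed-size fields such as Number=1


def length_ok(raw_value: str, number: str, n_alt: int, ploidy: int = 2) -> bool:
    """True if the comma-separated value has the size its Number implies."""
    expected = expected_length(number, n_alt, ploidy)
    if expected is None:
        return True  # Number=. gives nothing to check against
    return len(raw_value.split(",")) == expected
```

With two ALT alleles, `length_ok(".,.", "A", 2)` passes while a lone `.` does not, which is exactly the case the thread is trying to pin down. For Number=. the function can only ever say yes.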
My reading of the specs is that the first option isn't actually valid: Section 1.4.2: The "must" doesn't leave a lot of room for missing and the missing allowed in Section 1.6 applies to a record that is missing all INFO/FORMAT fields. Section 1.6.2 seems to disagree with this and allow it:
Yes, but not in a backwards incompatible manner.
Just so I'm clear on what is acceptable: VCFv4.4 is backwards compatible if and only if, for any VCFv4.3 file, if I change the
Hey all,
I'm trying to pin down exactly what . means for an A, R, or G indexed field. In particular, does . mean:
Hail requires the user to explicitly acknowledge that their VCF has arrays of missing values and that they know how to interpret that. This is becoming an issue because folks are showing up to the support forum with VCFs that have INFO fields that look like:
AS_VQSLOD=.,.;AS_YNG=.,.
I'd like to pin down the interpretation of ;FOO=.; so that Hail can be less pedantic about these VQSLOD and YNG annotations.
To be clear, .,. and longer arrays are clear to us: an array of missing values. As is intermingling: 3,. is an array with two values, one present and one missing.
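For illustration, here is a minimal sketch of the parsing behaviour described above. It is not Hail's implementation: `parse_array` and its `lone_dot` option names are hypothetical, values are assumed to be floats, and the two readings of a lone `.` (the whole field is missing, or an array holding a missing element) are my assumption about what the ambiguity amounts to.

```python
from typing import List, Optional

MISSING = "."


def parse_array(raw: str, lone_dot: str = "error") -> Optional[List[Optional[float]]]:
    """Parse a comma-separated field value; '.' elements become None.

    `lone_dot` picks how a bare '.' is read (option names are hypothetical):
      "missing_field"       -> the whole field is missing (returns None)
      "one_missing_element" -> an array with a single missing element
      "error"               -> refuse to guess
    """
    if raw == MISSING:
        if lone_dot == "missing_field":
            return None
        if lone_dot == "one_missing_element":
            return [None]
        raise ValueError("a lone '.' is ambiguous; choose an interpretation")
    # Two or more elements are unambiguous: each '.' is one missing element.
    return [None if tok == MISSING else float(tok) for tok in raw.split(",")]
```

Here `parse_array(".,.")` gives `[None, None]` and `parse_array("3,.")` gives `[3.0, None]`, matching the unambiguous cases. Only the bare `.` needs the caller to say which reading they want.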