Attach precise release timestamp to kafka messages #1

Open
theferrit32 opened this issue Jan 22, 2021 · 4 comments
Labels
clinvar Clinvar data exchange and reporting

Comments

@theferrit32
Contributor

theferrit32 commented Jan 22, 2021

We could attempt to look up the release file on the ClinVar FTP site. The problem is that, within our system, records might be modified between the time ClinVar publishes a release and the time that release reaches our ingest, resulting in out-of-order changes.

Add a field release_timestamp containing the precise ISO 8601 datetime at which the original upstream release message was received, so that all messages within a release share the same release_date and the same release_timestamp. With this approach, downstream records timestamped between release_timestamp and the receipt of the messages created here will be "overridden" and flagged for review. The chance of this scenario is relatively small.
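
A minimal sketch of the proposal above, not the actual ingest code. The function names (`format_release_timestamp`, `stamp_release`, `needs_review`), the dict-shaped messages, and the inclusive window bounds are all assumptions made for illustration; only the field names release_date and release_timestamp come from this issue.

```python
from datetime import datetime, timedelta, timezone


def format_release_timestamp(received_at: datetime) -> str:
    """Render a timezone-aware datetime as ISO 8601 in UTC with a +00:00 offset."""
    if received_at.tzinfo is None:
        raise ValueError("received_at must be timezone-aware")
    # Fixed microsecond precision keeps every timestamp the same width.
    return received_at.astimezone(timezone.utc).isoformat(timespec="microseconds")


def stamp_release(messages, release_date, received_at):
    """Return copies of the messages sharing one release_date and release_timestamp."""
    release_timestamp = format_release_timestamp(received_at)
    return [
        {**message, "release_date": release_date, "release_timestamp": release_timestamp}
        for message in messages
    ]


def needs_review(record_modified_at, release_received_at, message_received_at):
    """True if a downstream record changed inside the window (inclusive bounds assumed)."""
    return release_received_at <= record_modified_at <= message_received_at


# Hypothetical release message received at 10:04:05 in a UTC-5 zone.
received = datetime(2021, 1, 22, 10, 4, 5, tzinfo=timezone(timedelta(hours=-5)))
stamped = stamp_release([{"id": "example-1"}, {"id": "example-2"}], "2021-01-22", received)
```

Every message in `stamped` carries the same release_timestamp string, normalized to UTC, regardless of the offset the receiving host used.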

@larrybabb larrybabb added the clinvar Clinvar data exchange and reporting label Feb 22, 2021
@theferrit32
Contributor Author

@larrybabb if this is something we still need, we should see whether we or DSP can pull this precise value from the source FTP file. Otherwise, we should think about whether, and why, this is necessary.

@theferrit32
Contributor Author

I think this should be bumped up in priority so that we don't go too far without sounder handling of timestamps and comparison logic in downstream applications (genegraph).

@theferrit32
Contributor Author

SPARQL 1.1 supports comparison operators (=, <, >, and so on) for xsd:dateTime values. If we use a consistent, fully qualified ISO 8601 format with a +00:00 offset, we can also use simple string comparison to determine relative ordering (until the year 10000). Right now genegraph does not store record timestamps as xsd:dateTime.

https://www.w3.org/TR/sparql11-query/#OperatorMapping
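
To illustrate the string-comparison point, here is a rough sketch using only the Python standard library. The helper names and sample timestamps are made up for the example. It assumes every timestamp is normalized to UTC and written with the same fractional-second precision; the second half shows how mixed offsets break plain string ordering.

```python
from datetime import datetime, timedelta, timezone


def to_utc_string(moment: datetime) -> str:
    """Normalize to UTC and render with a fixed width (+00:00, microseconds)."""
    return moment.astimezone(timezone.utc).isoformat(timespec="microseconds")


def string_order_matches_time_order(a: str, b: str) -> bool:
    """Check whether comparing the raw strings agrees with comparing the instants."""
    return (a < b) == (datetime.fromisoformat(a) < datetime.fromisoformat(b))


# Normalized strings: lexicographic sort agrees with chronological sort.
base = datetime(2021, 1, 22, 12, 0, tzinfo=timezone.utc)
moments = [base + timedelta(hours=h) for h in (5, -3, 0, 30)]
normalized = [to_utc_string(m) for m in moments]
normalized_sorts_correctly = sorted(normalized) == [to_utc_string(m) for m in sorted(moments)]

# Mixed offsets: the string order no longer reflects the real order.
mixed_a = "2021-01-23T01:00:00+05:00"  # same instant as 2021-01-22T20:00:00+00:00
mixed_b = "2021-01-22T22:00:00+00:00"
mixed_offsets_agree = string_order_matches_time_order(mixed_a, mixed_b)
```

Here `normalized_sorts_correctly` is True while `mixed_offsets_agree` is False, which is why the normalization step has to happen before any string-based comparison is trusted.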

@larrybabb

I'll defer to @tnavatar and @toneillbroad for the final say.
