Intermittent bug fixes #341

Merged
merged 7 commits into from
Jul 30, 2024

jimmymathews
Collaborator

This PR does two things:

  1. Addresses most of "Infinite loop checking for completed metric" (#339) by implementing fallback behaviors for certain error states.
  2. Updates one use of feature insertion to fix the intermittent test failures noted in the thread of "Vectorize operations on CellDataArrays" (#338).

(1) I was not able to verify that the check noted in #339 was truly "infinite", but it was still incorrect, and it is surely related to failing jobs and to the complex logic for requesting that features be computed. Most of this complexity arises because the counts metric is the only one meant to return to clients without any "pending" flag, so that the client does not have to poll. I cleaned up this logic somewhat and introduced a 5-minute timeout that clears a feature which is still incomplete and appears to have no active jobs, so that it may compute correctly after a new request in the future. I also reduced the number of database connections made by the workers by consolidating them.
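A minimal sketch of the timeout rule described in (1), for illustration only. The names `FeatureStatus`, `should_clear_feature`, and the field names are hypothetical and not taken from the codebase; only the 5-minute threshold and the "incomplete with no active jobs" condition come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# The 5-minute threshold comes from the PR description.
FEATURE_TIMEOUT = timedelta(minutes=5)


@dataclass
class FeatureStatus:
    """Hypothetical summary of one requested feature."""
    requested_at: datetime  # when the computation was requested
    active_jobs: int        # jobs currently being worked on
    complete: bool          # True once all values are computed


def should_clear_feature(
    status: FeatureStatus,
    now: datetime,
    timeout: timedelta = FEATURE_TIMEOUT,
) -> bool:
    """Return True if the feature looks stale and should be cleared.

    A feature is treated as stale when it is incomplete, has no active
    jobs, and was requested at least `timeout` ago.
    """
    if status.complete or status.active_jobs > 0:
        return False
    return now - status.requested_at >= timeout
```

Clearing rather than retrying in place leaves the next client request free to start the computation again from a clean state.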

(2) The ADIFeaturesUploader is now used in only one place. When the schema was changed slightly to use more autoincrementing identifiers, this one usage was not updated, which led to certain errors. It is now updated.

@jimmymathews
Collaborator Author

This also fixes a timing issue that, in exceptional circumstances, could make the worker containers remain idle, failing to catch the postgres NOTIFY signals.
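One common way to guard against this kind of missed-notification idleness is to check for pending work before waiting, and to wait only with a timeout so that the worker polls again regardless. The sketch below shows that general pattern, not necessarily the change made in this PR. Here `queue.Queue` is a stand-in for the notification channel, and `next_job` and `fetch_pending_job` are hypothetical names.

```python
import queue
from typing import Callable, Optional


def next_job(
    notifications: "queue.Queue[str]",
    fetch_pending_job: Callable[[], Optional[str]],
    wait_seconds: float = 1.0,
    max_rounds: int = 3,
) -> Optional[str]:
    """Return a pending job, or None after `max_rounds` empty checks.

    `notifications` stands in for the NOTIFY channel (an assumption);
    `fetch_pending_job` is a hypothetical callable that pops one job
    from the job queue or returns None.
    """
    for _ in range(max_rounds):
        # Check for work first: a notification may have been sent
        # before this worker started waiting.
        job = fetch_pending_job()
        if job is not None:
            return job
        try:
            # Wait for a signal, but never indefinitely.
            notifications.get(timeout=wait_seconds)
        except queue.Empty:
            pass  # Timed out: fall through and poll again anyway.
    return None
```

With this shape, a lost signal costs at most one `wait_seconds` interval instead of leaving the worker idle.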

@jimmymathews
Collaborator Author

This also implements #331, since it is a minor generalization of the timeout already implemented here at the whole-feature level.

@jimmymathews jimmymathews merged commit dcc5927 into main Jul 30, 2024
1 check passed
@jimmymathews jimmymathews deleted the quickfixnotify branch September 20, 2024 22:25