[not for merge] Status poller rearrangement patch stack #3295
Draft
benclifford wants to merge 12 commits into master from benc-status-refactor
+110 −29
benclifford force-pushed the benc-status-refactor branch 2 times, most recently from 6ecc4a8 to 3518372 (March 26, 2024 11:16)
benclifford changed the title from "[not for merge] status refactor" to "[not for merge] Status poller rearrangement patch stack" (Mar 26, 2024)
benclifford force-pushed the benc-status-refactor branch 2 times, most recently from 15ad3f1 to 0460015 (March 28, 2024 16:48)
benclifford force-pushed the benc-status-refactor branch 4 times, most recently from da14f7a to e38ea68 (April 9, 2024 20:07)
benclifford force-pushed the benc-status-refactor branch from e38ea68 to 9fa50e5 (July 16, 2024 14:57)
The block ID and job ID mappings contain the full historical list of blocks, but prior to this PR, the mapping was used as the source of current jobs that should be scaled in.
…that it exists? specifically raised by khk in the context of def scale_in
…nits" that only make sense in proportion to other scaling load amounts (i.e. ratios). htex uses "tasks" as the unit; wq now uses "cores" as the unit. Variables and text inside strategy.py should explain this, and variables and docstrings should be clearer about it.
…ction behaviour changes: none
changes: none
TODO: this reveals a possible bug: FAILED entries in simulated status are not sent immediately, but instead only get sent at the next poller update, unlike submitted entries, which are sent immediately? That should be an easy fix after this PR, though...
…caling strategy will see submitted jobs immediately, before a provider status refresh happens. This makes the scaling code immediately aware of what just happened, rather than acting for one poll period as if nothing had happened. When making changes that will later be reflected in the _status table, those changes should also be made immediately, and explicitly, in the cached _status table. Before this PR, this code path does not happen in the case of a failed submission: a failure status will appear when a refresh happens, but not before. In that case the scaling code will act as if the failed submission did not happen and will continue to submit repeatedly until a refresh happens much later. This PR makes _status be updated in this case too. Fixes #3235.
it can be subclassed to add in executor-specific status (as happens with htex)
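The failed-submission behaviour described in the commit message above can be illustrated with a minimal sketch. This is not the code in this patch stack: `StatusCache`, `JobState`, `scale_out_one` and the `submit` callable are hypothetical names invented for illustration. The only idea taken from the text is that the cached `_status` table is updated at submission time, for failures as well as successes, so the scaling code does not have to wait for the next provider refresh.

```python
from enum import Enum
from typing import Callable, Dict


class JobState(Enum):
    PENDING = "PENDING"
    FAILED = "FAILED"


class StatusCache:
    """Hypothetical stand-in for an executor's cached _status table."""

    def __init__(self, submit: Callable[[str], None]) -> None:
        # `submit` models a provider submission call that may raise.
        self._submit = submit
        self._status: Dict[str, JobState] = {}

    def scale_out_one(self, block_id: str) -> None:
        try:
            self._submit(block_id)
        except Exception:
            # Record the failure in the cache straight away, rather than
            # waiting for a later status refresh to reveal it.
            self._status[block_id] = JobState.FAILED
        else:
            # Successful submissions are also visible immediately.
            self._status[block_id] = JobState.PENDING

    def failed_blocks(self) -> int:
        return sum(1 for s in self._status.values() if s is JobState.FAILED)

    def pending_blocks(self) -> int:
        return sum(1 for s in self._status.values() if s is JobState.PENDING)


def always_fails(block_id: str) -> None:
    raise RuntimeError("submission rejected")


def always_succeeds(block_id: str) -> None:
    return None


failing = StatusCache(always_fails)
failing.scale_out_one("0")

working = StatusCache(always_succeeds)
working.scale_out_one("0")
```

With the failure recorded at once, a scaling strategy that inspects the cache can count the failed block before deciding whether to submit again, instead of resubmitting for a whole poll period.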
benclifford force-pushed the benc-status-refactor branch from 9fa50e5 to 92b0c46 (July 27, 2024 13:38)
This is a patch stack with a bunch of provider status rearrangement.
The primary user-related motivation is to fix issues #3235 and #2627.
Another goal is to rationalise the handling of multiple sources of block/job status information.
As of the time of writing, this PR contains many small steps that culminate in fixing (I hope) both of those issues, but leaves the scaling code still needing cosmetic tidy-up. It looks like PollItem is now a facade that mostly handles reporting things to the monitoring system, which could also be moved into the status handling executor code...
This patch stack is managed as an stgit patch stack on my (@benclifford) laptop, so don't go pushing things to this branch because that's a hassle for me to deal with.
Because the scaling code is quite hard to comprehensively understand, and other attempts to change this code have proven very hard for everyone to review, I would like to merge this code patch-by-patch, each one its own PR, with it being clearly defined for each one whether I am expecting any behaviour change, and if so what that behaviour change is, ideally accompanied by tests; and I would like reviewers to pay attention to the behaviour of each PR, rather than assuming it's probably right.
When reviewing each PR individually, this current PR should serve as end-goal context for why a change is being made. Each commit on this branch is a future-PR-in-preparation and should make sense on its own.