Add fixes to products for when REPLAY IC's are used #2755
The linters.yaml has been updated so that shellnorms can run whenever a push occurs to EricSinsky-NOAA/develop.
The changes to linters.yaml have been reverted.
Changes have been added to account for the fixes needed to be made when there is an offset. These changes involve adjusting the atmosphere and ocean fhrs.
See some suggestions.
NDATE has been removed and has been replaced with date.
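As a rough illustration of doing cycle-time arithmetic with `date` alone, here is a minimal sketch that assumes GNU `date` and a `YYYYMMDDHH` cycle string. The helper name `add_hours` is hypothetical and is not taken from this PR; the actual scripts may call `date` differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): add a signed number of hours
# to a YYYYMMDDHH cycle string using GNU date only.
add_hours() {
  local cycle=$1 hours=$2
  # Rebuild an unambiguous timestamp: "YYYY-MM-DD HH:00:00"
  local stamp="${cycle:0:4}-${cycle:4:2}-${cycle:6:2} ${cycle:8:2}:00:00"
  local epoch
  # -u makes date interpret and print the time in UTC
  epoch=$(date -u -d "${stamp}" +%s)
  # Do the offset in epoch seconds, then format back to YYYYMMDDHH
  date -u -d "@$(( epoch + hours * 3600 ))" +%Y%m%d%H
}

add_hours 2024081206 6     # prints 2024081212
add_hours 2024081206 -12   # prints 2024081118
```

Going through epoch seconds avoids relying on how `date` parses a relative offset placed directly after a time of day.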
The determination on whether or not the replay_diag_table should be used has been moved to config.fcst.
The dest_file variable has been moved outside of if-statement in forecast_postdet.
A bugfix has been added in exglobal_stage_ic to ensure ${MEMDIR:3} is not interpreted in base 8.
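The base 8 problem can be reproduced in a few lines. This is a sketch of the general bash technique, assuming MEMDIR looks like `mem008`; the helper name `member_number` is hypothetical, and the exact change made in exglobal_stage_ic may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): turn "mem008" into the integer 8.
member_number() {
  local memdir=$1
  # ${memdir:3} is "008". In bash arithmetic a leading zero means octal,
  # so $(( 008 )) is an error and $(( 010 )) silently evaluates to 8.
  # The 10# prefix forces base 10 regardless of leading zeros.
  echo $(( 10#${memdir:3} ))
}

member_number mem008   # prints 8
member_number mem010   # prints 10
```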
A fix has been added for cases where there is an fhr that is a decimal number.
Rocoto has very little control over hanging batch system commands. If a batch system is being hammered, its commands can hang, and PBSPro is highly prone to this type of behavior: it has had scaling/threading problems in the past, which is one reason why I was very surprised to see PBSPro selected as the batch system for WCOSS. It is also very easy to overload PBSPro with naive user behaviors. Rocoto has already been highly tuned to put as little load as possible on PBSPro, since these problems were first seen and addressed on Cheyenne.
If you (or others) have other processes that run |
The enkfgdascleanup job is PENDING in the queue for the C96C48_hybatmDA case on Hera.
@TerrenceMcGuinness-NOAA This looks like an issue with Slurm based on this conversation. Here is an excerpt from Slurm support:
I'd suggest you report it to RDHPCS. @DavidHuber-NOAA I will return to this shortly. So far as I can tell (no log and no retries), this job didn't run. I still need to run Slurm queries on the Slurm job number.
Made the suggested updates to the Rocoto timeout configurations and restarted all the CI cases for this PR on WCOSS2:
Experiment C96_atm3DVar_extended_d443bf9c STALLED on Wcoss2 at 08/12/24 02:30:21 PM |
Confirmed: The STALLED condition on WCOSS2 is a false negative. No UNAVAILABLE states were observed, only UNKNOWNs. Added tasking to make the STALLED flag more robust.
Got past the PBS/Rocoto anomalies that were leading to false negatives for STALLING because of UNKNOWN/UNAVAILABLE states, and arrived at
Experiment C48_S2SW_d443bf9c FAIL on Wcoss2 at 08/12/24 03:54:24 PM Error logs:
Follow link here to view the contents of the above file(s): (link) |
@EricSinsky-NOAA FYI, I have a PR in CI testing that refactors the staging job: #2651 I ended up removing the https://github.com/KateFriedman-NOAA/global-workflow/blob/feature/issue_2475/parm/stage/stage.yaml.j2#L104 If my PR goes in first, I can work with you to make any needed updates in your branch to accommodate the staging job refactor. Let me know! |
CI Passed Hera at
Since (a) the previous failures on WCOSS were unrelated to this PR and (b) everything changed is covered by tests run on other machines, I'm going to go ahead and merge this. FYI: @KateFriedman-NOAA
…e_rocoto * origin/develop: Jenkins Pipeline Updates (NOAA-EMC#2815) Add Gaea C5 to CI (NOAA-EMC#2814) Add support for forecast-only runs on AWS (NOAA-EMC#2711) Add fixes to products for when REPLAY IC's are used (NOAA-EMC#2755) Add capability to run forecast in segments (NOAA-EMC#2795)
@KateFriedman-NOAA Thank you for letting us know about these conflicts with the replay-related variables. We can continue working to resolve these conflicts in @NeilBarton-NOAA's PR #2788. |
Description
This PR fixes a couple of issues that arise when replay initial conditions are used. These issues only occur when REPLAY_ICS is set to YES and OFFSET_START_HOUR is greater than 0. The following items are addressed in this PR.
- A diag_table (diag_table_replay) that is used only when REPLAY_ICS is set to YES. This diag_table accounts for the offset that occurs when using replay IC's.
- When OFFSET_START_HOUR is greater than 0, the first fhr is ${OFFSET_START_HOUR}+(${DELTIM}/3600), which is defined in forecast_predet.sh and will allow data for the first lead time to be generated. The filename with this lead time will still be labelled with OFFSET_START_HOUR.
This PR was split from PR #2680.
Refs #2725, #2754
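The first-fhr expression in the description yields a decimal value, which bash integer arithmetic cannot represent. The sketch below shows one way to evaluate it with awk and to build a zero-padded label from OFFSET_START_HOUR. The helper names `first_fhr` and `fhr_label`, the four-decimal precision, and the three-digit padding are all assumptions made for illustration; forecast_predet.sh may compute and format these values differently.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the PR).

# First forecast hour: OFFSET_START_HOUR + DELTIM/3600, as a decimal.
first_fhr() {
  local offset=$1 deltim=$2
  # awk does floating-point math; bash $(( )) would truncate DELTIM/3600 to 0
  awk -v o="${offset}" -v d="${deltim}" 'BEGIN { printf "%.4f\n", o + d / 3600 }'
}

# Zero-padded label built from the integer offset, not the decimal fhr.
fhr_label() {
  # 10# guards against leading zeros being read as octal
  printf "%03d\n" "$(( 10#$1 ))"
}

OFFSET_START_HOUR=3
DELTIM=1800
first_fhr "${OFFSET_START_HOUR}" "${DELTIM}"   # prints 3.5000
fhr_label "${OFFSET_START_HOUR}"               # prints 003
```

With these example values the model output is valid at hour 3.5, while the label stays at the offset hour, matching the labelling behaviour described above.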
Type of change
Change characteristics
How has this been tested?
These changes were cloned, built, and run on WCOSS2. They were tested by running GEFS.
Checklist