Crash from ParallelMap when both partition and memory are auto (default) #370
Comments
Thanks for filing @KatharineShapcott! EDIT: Another user mentioned in #369 that with (the quite greedy) setting
Hi @tensionhead, I think the problem is that the settings Syncopy is passing to ACME do not exist, right? There is no partition='auto' setting.
Mmh, that is the default internal behaviour of Syncopy apparently, and that never gave any problems before, afaik. We did not change anything in this regard. Again, I think @pantaray is best suited to address this problem. If the ACME API somehow changed (and we missed that entirely with our tests for some reason), then we need to take action.
Hi all! ACME version 2022.8 changed the default behavior when using a pre-existing parallel computing client:

```python
spyClient = spy.esi_cluster_setup(partition="8GBL", n_jobs=10)
filtered = spy.preprocessing(out, filter_class='but', order=4, freq=[600, 900], filter_type='bp', direction='twopass', rectify=True, chan_per_worker=1)
```

With this approach the parallel=True keyword is not needed any more.
Actually I get this behaviour because the spyClient fails to find any jobs and then doesn't exist. But it's interesting that parallel=True isn't necessary any more, that would save me from having this problem. It sounds like Syncopy depends on ACME 2022.8; should I upgrade?
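A quick way to see whether a dask client actually came up before handing work to Syncopy is to ask dask directly; this is a minimal sketch using only the standard dask.distributed API, and the printed messages are illustrative:

```python
from dask.distributed import get_client

try:
    # get_client() returns the client attached to the current session and
    # raises ValueError if no scheduler/client is reachable
    client = get_client()
    print("Client is up with", len(client.scheduler_info()["workers"]), "workers")
except ValueError:
    print("No dask client found: esi_cluster_setup probably failed or timed out")
```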
This is caused by a dask bug which can be avoided by manually pinning
Probably not, since the new options in ACME 2022.8 are not yet ported to Syncopy.
I don't think it's a bug; it's happening if the cluster is too full and no jobs can be allocated in the available time. My version of click is
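For checking which click (and dask) versions are actually installed in the active environment, a plain Python one-off is enough; nothing here is Syncopy-specific:

```python
# print the versions of the two packages involved in the suspected pinning issue
import click
import dask

print("click:", click.__version__)
print("dask:", dask.__version__)
```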
Ah, got it.
Yes, that's right. If the cluster is busy you can try playing around with
So this should not be a problem coming directly from Syncopy? The
I'm not sure what the problem is, to be honest. The version of ACME installed in the environment uses
syncopy/syncopy/shared/kwarg_decorators.py, line 487 in a997b17
I'm not sure where the
Here's a minimal example
Seems to happen if parallel=True is set but there's no pre-existing esi_cluster_setup.
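A sketch of that triggering pattern, assembled from the reproduction steps in the issue body below; the synthetic AnalogData construction is an assumption and only stands in for the user's out object:

```python
import numpy as np
import syncopy as spy

# stand-in for the data object `out` from the issue body
out = spy.AnalogData(data=np.random.randn(12000, 4), samplerate=12000)

# no spy.esi_cluster_setup(...) call beforehand, so no dask client exists yet;
# with parallel=True, Syncopy then passes partition='auto' and mem_per_job='auto'
# on to ACME, which is where the reported crash occurs
filtered = spy.preprocessing(out, filter_class='but', order=4,
                             freq=[600, 900], filter_type='bp',
                             direction='twopass', parallel=True,
                             rectify=True, chan_per_worker=1)
```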
I thought the culprit was this:
Mmh, so that's probably why we are all confused... this particular line 657 has not been changed in ages, and according to @pantaray the keyword value
EDIT: as you can see here,
But do any of your tests use
The traceback points into ACME internals... would you mind taking this discussion to the respective ACME issue?
Our automated tests don't call
Thanks for posting the full traceback, @KatharineShapcott! Now I understand what's going on. The key message is:
This is a bug in this version of ACME that has been fixed since, cf. esi-neuroscience/acme#42. You could try to simply update ACME using conda's
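To confirm which ACME version an environment actually picks up after updating, something like the following works; it assumes the package is importable as acme and exposes a __version__ attribute, which is not stated in the thread:

```python
# check which ACME version is importable from the current environment
import acme

print(getattr(acme, "__version__", "no __version__ attribute found"))
```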
In the test setup (before any tests are actually run), syncopy/syncopy/tests/conftest.py, line 27 in a997b17
And yes, for local clusters, the
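For completeness, a purely local dask cluster (no SLURM partitions involved) can be started directly with the standard dask.distributed API; this is a generic sketch, not the Syncopy-internal code path:

```python
from dask.distributed import Client, LocalCluster

# spin up a local cluster that parallel computations can attach to
cluster = LocalCluster(n_workers=4, threads_per_worker=1)
client = Client(cluster)
print(client)
```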
Great thanks, that makes sense now! I fixed it :)
Describe the bug
In computational_routine.compute the default values of partition and mem_per_job are both 'auto', which causes a crash in ACME.

To Reproduce
Steps to reproduce the behavior:
```python
filtered = spy.preprocessing(out, filter_class='but', order=4, freq=[600, 900], filter_type='bp', direction='twopass', parallel=True, rectify=True, chan_per_worker=1)
```
Expected behavior
I would have thought it tries to guess the partition itself with those settings, but there's no way to do that in ACME without a mem_per_job being supplied.

System Profile:
OS: ESI cluster
Created: Mon Oct 31 10:08:12 2022
System Profile:
3.8.13 (default, Mar 28 2022, 11:38:47)
[GCC 7.5.0]
ACME: 0.21
Dask: 2021.10.0
NumPy: 1.21.5
SciPy: 1.7.3