Queued experiments fail, can't see logs to understand what's up #4332
Comments
Depends on iterative/dvc#9425
cc @dberenbaum
@shcheklein Does Edit: asking to see if it's only about iterative/dvc#9425 or if it is also related to iterative/dvc#9616.
I run into the same issue. My impression is that this happens mainly when running lots of experiments (100 or more). I create experiments using the cli like this:
Also, I do not get any output from dvc queue logs. When I apply the failed experiment to the workspace, it runs without any problem.
Making it
The priority makes sense, but there are two underlying issues, and I'm not sure if we are trying to cover both here:
@dberenbaum I think eventually we try to cover both. If I understand correctly the The first item will be fixed automatically if the first one is fixed, right? What is your take? What is the complexity and scope of both on the DVC side?
What did you mean here? I don't know that either one will fix the other. We are discussing in the tickets above what the options are to solve each and what level of effort it takes.
Screen.Recording.2023-07-23.at.10.29.18.AM.mov
A single experiment runs fine, even within a queue. The failure might be related to resource allocation when the queue tries to run 2 of them, but I can't say what's going on.
This came up while I was researching a user complaint: