VM jobs run indefinitely #152
Comments
We are encountering the same problem: the Docker grading container is no longer running, but the job is not deleted from the Tango jobs queue and is therefore blocking the queue.
Was this issue resolved? I'm hitting the same problem with the default image (ubuntu).
No, it wasn't. Still no answer on our end for what was going on.
Were you able to get this to work with any image? What did you have to do for the default image?
Hm just looking at the symptoms, it looks like
Can you give me some more context on your setup? Have you had autograders run successfully in the past? Did you change anything recently? Are Autolab and Tango deployed on the same machine?
@nitsanshai AutoLab has actually worked fine for me for over two years; the problem only started when we significantly overhauled the Docker image used by the autograder. I unfortunately don't have immediate access to that image anymore (I could probably drum it up, though) but that was the only thing we changed prior to this behavior. @devanshk Yes, we've had autograders run successfully in the past. It was only when we added an entirely new Docker image for the autograder (one with significant JVM/Scala dependencies) that we observed these problems. That was the only change. Otherwise it's a vanilla one-click install on a single machine, and has otherwise been working fine with no issues.
Could you join our Slack channel? It would be easier to debug this 1-on-1 and post our findings back here.
Sure, but I can't really debug this in the short-term. I filed this ticket in June of last year when these issues cropped up, but we had to move past it months ago. I'm actively using AutoLab for my course right now and that prevents me from doing any live debugging on this issue until the semester is over.
If you're able to create a test course, we could try things out there - or if you send me your dockerfile, I can replicate your bug on my end and experiment with it.
I restarted the whole docker-compose stack, and now it seems to be working, but I got a Runtime Trace
@pratikbin did you manage to figure out what was going on? I seem to be having the same problem locally.
We're testing out a new autograder image on AutoLab, but after making a submission and the autograder spinning up, the launched job simply runs forever.
Expected Behavior
I would expect the job to eventually halt (particularly around the "Timeout" interval set in the autograder, which is 360 seconds in our case), and a Runtime Trace to be available showing the entire command and its output.
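For illustration, the expected behavior described above (stop the job once the timeout elapses and still report whatever output exists) can be sketched generically in Python. This is only a sketch of the general pattern, not Tango's or Autolab's actual implementation; the function name `run_with_timeout` and the child command are hypothetical.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (timed_out, stdout).

    Generic illustration only: subprocess.run kills the child and waits
    for it when the timeout expires, then raises TimeoutExpired.
    """
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        return False, done.stdout
    except subprocess.TimeoutExpired as exc:
        # Partial output may be None or bytes depending on the Python version.
        out = exc.stdout or ""
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
        return True, out


if __name__ == "__main__":
    # Hypothetical long-running "job": a child process that sleeps for 5 seconds.
    timed_out, output = run_with_timeout(
        [sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=1
    )
    print("timed out:", timed_out)
```

With a 360 second limit, a job handled this way would be killed and reported as timed out rather than left in the running state.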
Actual Behavior
Each time I refresh the status page, all the "time elapsed" columns increment, indicating that the jobs are still running. However, when I actually SSH into the running Tango container, I don't see anything after running the docker ps -a command? Are these jobs actually running? Or is it some bug in the database? Either way, how can I stop them?
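One way to double-check the `docker ps -a` observation is to script it and list which containers, if any, are actually up. The sketch below assumes the Docker CLI is on the PATH and uses its `--format '{{json .}}'` output (one JSON object per line); the helper names and the sample container names are hypothetical.

```python
import json
import subprocess


def parse_docker_ps(output):
    """Parse `docker ps -a --format '{{json .}}'` output into a list of dicts."""
    rows = []
    for line in output.splitlines():
        line = line.strip()
        if line:
            rows.append(json.loads(line))
    return rows


def running_containers(rows):
    # A running container's Status text starts with "Up" (e.g. "Up 5 minutes").
    return [r["Names"] for r in rows if r.get("Status", "").startswith("Up")]


def list_containers():
    """Call the Docker CLI; requires Docker to be installed and reachable."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{json .}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_docker_ps(result.stdout)


if __name__ == "__main__":
    # Hypothetical sample output, so the parsing can be tried without Docker.
    sample = (
        '{"Names":"grader-1","Status":"Up 5 minutes"}\n'
        '{"Names":"grader-2","Status":"Exited (0) 2 hours ago"}\n'
    )
    print(running_containers(parse_docker_ps(sample)))
```

If `list_containers()` returns no running containers while the status page still shows jobs with increasing elapsed time, that would suggest the queue entries are stale rather than tied to live containers.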
Steps to Reproduce the Behavior
???
Honestly not sure. The autograder settings have a 360 second timeout set, so why these jobs are still running is beyond me. There's no discernible output for any of the jobs; under the "Runtime Trace" for each job, it has entries for adding the job to the queue and eventually the "Job [x] started", but that's the very last one. Nothing after that; no debugging output.