cuda: unlock on timeout error #2469

Merged: 1 commit merged on Aug 17, 2024

Commits on Aug 16, 2024

  1. cuda: unlock on timeout error

    When attempting to checkpoint a container with CUDA processes,
    CRIU could fail with the following error:
    
    	Error (criu/cr-dump.c:1791): Timeout reached. Try to interrupt: 1
    	Error (cuda_plugin.c:143): cuda_plugin: Unable to read output of cuda-checkpoint: Interrupted system call
    	Error (cuda_plugin.c:384): cuda_plugin: PAUSE_DEVICES failed with
    
    In this situation, the target process has already been locked, but CRIU
    hits the timeout and exits with an error. We need to make sure that the
    target PID is unlocked in such a case.
    
    Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
    rst0git committed Aug 16, 2024
    Commit 39d29f3
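The fix follows an unlock-on-error pattern. Once the target has been locked, every failure path, including a read interrupted by the dump timeout (`EINTR`, "Interrupted system call"), must release the lock before returning. The sketch below models that pattern in Python under stated assumptions: `FakeCudaProcess`, `read_helper_output`, and `pause_devices` are hypothetical names chosen for illustration. They are not CRIU's plugin API, and the actual change lives in CRIU's C `cuda_plugin`.

```python
import errno


class FakeCudaProcess:
    """Hypothetical stand-in for a process whose CUDA state can be locked."""

    def __init__(self, pid):
        self.pid = pid
        self.locked = False

    def lock(self):
        self.locked = True

    def unlock(self):
        self.locked = False


def read_helper_output(interrupted):
    """Hypothetical helper read; raises EINTR when a timeout signal interrupts it."""
    if interrupted:
        raise InterruptedError(errno.EINTR, "Interrupted system call")
    return "ok"


def pause_devices(proc, interrupted=False):
    """Lock the process, then unlock it again if any later step fails."""
    proc.lock()
    try:
        read_helper_output(interrupted)
    except OSError as exc:
        # Error path: undo the lock so the target is not left frozen.
        proc.unlock()
        print(f"PAUSE_DEVICES failed for pid {proc.pid}: {exc.strerror}")
        return False
    # Success path: the process intentionally stays locked for the dump.
    return True
```

Calling `pause_devices(FakeCudaProcess(1234), interrupted=True)` returns `False` and leaves the process unlocked, which mirrors the behaviour the commit aims for: a timed-out dump no longer leaves the target PID locked.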