dcache-bulk: aborted request gets stuck in the STARTED state
Motivation:

On FNAL production we have found that file permission or existence errors on
paths/targets at the root of the request leave the request stuck in the
STARTED state.

Modification:

The initial bulk request job needs to check for request completion after
unregistering itself from the completion handler.
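
A minimal sketch of why the extra check is needed. It uses hypothetical stand-ins (`CompletionTracker`, `RequestState`, `RootJob`), not the actual dCache classes, and only assumes the behavior described above: when the root target fails before any child job is registered, the request job itself is the last one able to notice that nothing is left to run.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the completion handler: tracks jobs still in flight.
class CompletionTracker {
    private final Set<String> running = new HashSet<>();

    synchronized void register(String jobId) { running.add(jobId); }

    synchronized void finished(String jobId) { running.remove(jobId); }

    synchronized boolean isCompleted() { return running.isEmpty(); }
}

// Hypothetical stand-in for the stored request status.
class RequestState {
    String status = "STARTED";
}

// Hypothetical stand-in for the initial request job.
class RootJob {
    private final String jobId;
    private final CompletionTracker tracker;
    private final RequestState state;

    RootJob(String jobId, CompletionTracker tracker, RequestState state) {
        this.jobId = jobId;
        this.tracker = tracker;
        this.state = state;
    }

    // Simulates a root target that fails a permission or existence check,
    // so this job submits no child jobs of its own.
    void run() {
        tracker.register(jobId);
        postCompletion();
    }

    void postCompletion() {
        // Unregister first, then check: without the check below, a request
        // with no remaining jobs would keep its STARTED status.
        tracker.finished(jobId);
        if (tracker.isCompleted()) {
            state.status = "COMPLETED";
        }
    }
}

class StuckRequestSketch {
    public static void main(String[] args) {
        CompletionTracker tracker = new CompletionTracker();
        RequestState state = new RequestState();
        new RootJob("root", tracker, state).run();
        System.out.println(state.status); // prints COMPLETED
    }
}
```

If another job is still registered when the root job finishes, the check leaves the status untouched, and completion is left to whatever handles the last remaining job.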

Result:

Jobs which previously got stuck now complete; their failure information
contains the reason for premature completion.

(NOTE: the 'null' DEPTH seems to appear on FNAL production [7.2], but
I have not yet been able to reproduce it using 7.2 on the testbed.)

Target: master
Request: 8.0
Request: 7.2
Requires-notes: yes
Requires-book: no
Patch: https://rb.dcache.org/r/13442/
Acked-by: Tigran
alrossi committed Feb 15, 2022
1 parent 2f10e66 commit fb67d1e
Showing 3 changed files with 23 additions and 1 deletion.
@@ -73,6 +73,7 @@ LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
import org.dcache.services.bulk.BulkRequestNotFoundException;
import org.dcache.services.bulk.BulkRequestStatus;
import org.dcache.services.bulk.BulkRequestStatus.Status;
import org.dcache.services.bulk.BulkRequestStorageException;
import org.dcache.services.bulk.BulkServiceException;
import org.dcache.services.bulk.BulkStorageException;
import org.dcache.services.bulk.job.BulkJob;
@@ -136,6 +137,11 @@ public synchronized void abortRequestTarget(String requestId, String target,
        statistics.incrementJobsAborted();
    }

    public synchronized void abortRequest(String requestId) throws BulkRequestStorageException {
        requestStore.update(requestId, COMPLETED);
        statistics.incrementRequestsCompleted();
    }

    @Override
    public synchronized void cancelRequest(Subject subject, String requestId)
          throws BulkServiceException {
@@ -229,7 +235,6 @@ public synchronized void requestTargetCompleted(BulkJob job) throws BulkServiceE
        }
    }


    @Required
    public void setCallbackExecutorService(ExecutorService service) {
        callbackExecutorService = service;
@@ -83,6 +83,14 @@ void abortRequestTarget(String requestId, String target,
          Throwable exception)
          throws BulkServiceException;

    /**
     * Unrecoverable internal failure. Mark the request as terminated.
     *
     * @param requestId unique identifier
     */
    void abortRequest(String requestId)
          throws BulkServiceException;

    /**
     * Services request (from user) to (cancel) the request.
     *
@@ -170,6 +170,15 @@ protected void doRun() {

    protected void postCompletion() {
        completionHandler.requestProcessingFinished(key.getJobId());

        if (completionHandler.isRequestCompleted()) {
            try {
                submissionHandler.abortRequest(key.getRequestId());
            } catch (BulkServiceException e) {
                LOGGER.error("RequestJob, postCompletion() for {}: {}.", key.getRequestId(),
                      e.getMessage());
            }
        }
    }

    private void handleDirectory(String target, FileAttributes attributes)