When we create a new communicator from a resilient communicator (e.g., through split or dup), we add it to the list of tracked communicators so that it can be deleted in case of a failure. We need to decide what happens when the communicator is created successfully and the error handler can be attached, but the communicator cannot be added to the track list (adding it requires a malloc, which may fail due to insufficient memory).
I think it is valid to treat the call as failed: we remove the new communicator and return an MPI error code to the application. This would need to be documented in the specification. Allowing the call to succeed would lead to an error after a later failure, because we would not be able to remove the damaged communicator.
If we take this approach, we need to determine which MPI error code to return. I recommend MPI_ERR_OTHER or MPI_ERR_INTERN.
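To make the proposal concrete, here is a minimal sketch of the "treat it as a failed call" path. It is a model only, not the library's actual code: it compiles without an MPI installation, so `comm_t`, `RC_ERR_OTHER`, `comm_dup_stub`, `comm_free_stub`, `track_alloc`, and `resilient_comm_dup` are all hypothetical stand-ins (in a real build they would correspond to `MPI_Comm`, `MPI_ERR_OTHER`, `MPI_Comm_dup`, `MPI_Comm_free`, and `malloc`).

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins so the sketch builds without <mpi.h>; all names are hypothetical. */
typedef int comm_t;            /* stands in for MPI_Comm      */
#define COMM_NULL    (-1)      /* stands in for MPI_COMM_NULL */
#define RC_SUCCESS   0         /* stands in for MPI_SUCCESS   */
#define RC_ERR_OTHER 1         /* stands in for MPI_ERR_OTHER */

/* One node per tracked communicator, kept in a singly linked list. */
typedef struct tracked_comm {
    comm_t comm;
    struct tracked_comm *next;
} tracked_comm;

static tracked_comm *track_list = NULL;
static int live_comms = 0;     /* communicators created and not yet freed */

/* Allocator hook (assumption): lets a test simulate malloc failure. */
static void *(*track_alloc)(size_t) = malloc;
static void *failing_alloc(size_t n) { (void)n; return NULL; }

/* Stubs modelling MPI_Comm_dup and MPI_Comm_free. */
static int comm_dup_stub(comm_t parent, comm_t *out)
{
    *out = parent + 100;
    live_comms++;
    return RC_SUCCESS;
}

static void comm_free_stub(comm_t *c)
{
    *c = COMM_NULL;
    live_comms--;
}

/* Duplicate a resilient communicator and track the result.
 * If the tracking node cannot be allocated, undo the creation and
 * report the whole call as failed. */
int resilient_comm_dup(comm_t parent, comm_t *newcomm)
{
    int rc = comm_dup_stub(parent, newcomm);
    if (rc != RC_SUCCESS)
        return rc;

    tracked_comm *node = track_alloc(sizeof *node);
    if (node == NULL) {
        /* Cannot track it, so it could not be cleaned up after a failure:
         * free it now and return an error to the application. */
        comm_free_stub(newcomm);
        return RC_ERR_OTHER;
    }

    node->comm = *newcomm;
    node->next = track_list;
    track_list = node;
    return RC_SUCCESS;
}
```

With this shape the application never holds a communicator that the library is not tracking: either the new communicator is in the track list, or the caller gets an error code and a null handle. The same pattern would apply to split.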