What to do if we cannot track a communicator? #29

rfvander · 2017-03-30T21:45:03Z

When we create a new communicator from a resilient communicator (e.g. through split or dup), we add it to the list of tracked communicators so that it can be deleted in case of a failure. We need to decide what happens if the communicator is created successfully and we can attach the error handler, but we cannot add it to the tracking list (that requires a malloc, which may fail due to insufficient memory).
I think it is valid to treat the call as failed: we free the new communicator and return an MPI error code to the application. This would need to be documented in the specification. Allowing the call to succeed would lead to an error after a later failure, because we would not be able to remove the damaged communicator.
If we take this approach, we need to determine which MPI error code to return. I recommend MPI_ERR_OTHER or MPI_ERR_INTERN.
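A minimal sketch of the rollback I have in mind, not the library's actual code. To keep it buildable without an MPI installation, the MPI pieces are replaced by stand-ins: `sketch_comm`, `sketch_comm_dup`, `sketch_comm_free`, `SKETCH_SUCCESS` and `SKETCH_ERR_OTHER` are hypothetical names standing in for `MPI_Comm`, `MPI_Comm_dup`, `MPI_Comm_free`, `MPI_SUCCESS` and `MPI_ERR_OTHER`. The list layout, `resilient_comm_dup`, and the `track_alloc` hook are likewise assumptions made for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins so the sketch builds without <mpi.h>. In real code these
 * would be MPI_Comm, MPI_COMM_NULL, MPI_SUCCESS and MPI_ERR_OTHER. */
typedef int sketch_comm;
#define SKETCH_COMM_NULL (-1)
#define SKETCH_SUCCESS   0
#define SKETCH_ERR_OTHER 1

/* One entry in the (hypothetical) list of tracked communicators. */
typedef struct tracked_node {
    sketch_comm comm;
    struct tracked_node *next;
} tracked_node;

static tracked_node *tracked_head = NULL;

/* Allocator hook: defaults to malloc, can be swapped to simulate
 * an out-of-memory condition. */
static void *(*track_alloc)(size_t) = malloc;

/* Allocator that always fails, used only to exercise the error path. */
static void *failing_alloc(size_t n) { (void)n; return NULL; }

static int next_handle = 0; /* next fake communicator handle */
static int live_comms  = 0; /* number of fake communicators not yet freed */

/* Stand-in for MPI_Comm_dup: hands out a fresh handle. */
static int sketch_comm_dup(sketch_comm parent, sketch_comm *newcomm) {
    (void)parent;
    *newcomm = next_handle++;
    live_comms++;
    return SKETCH_SUCCESS;
}

/* Stand-in for MPI_Comm_free: releases the handle and nulls it. */
static int sketch_comm_free(sketch_comm *comm) {
    *comm = SKETCH_COMM_NULL;
    live_comms--;
    return SKETCH_SUCCESS;
}

/* Duplicate a communicator and add it to the tracking list.
 * If the tracking node cannot be allocated, free the new communicator
 * and report the whole call as failed. */
int resilient_comm_dup(sketch_comm parent, sketch_comm *newcomm) {
    int rc = sketch_comm_dup(parent, newcomm);
    if (rc != SKETCH_SUCCESS)
        return rc;

    tracked_node *node = track_alloc(sizeof *node);
    if (node == NULL) {
        /* Cannot track it, so do not hand it to the application. */
        sketch_comm_free(newcomm);
        return SKETCH_ERR_OTHER;
    }

    node->comm = *newcomm;
    node->next = tracked_head;
    tracked_head = node;
    return SKETCH_SUCCESS;
}
```

With `track_alloc` set to `failing_alloc`, the call returns the error code, the output handle is reset to the null value, and the tracking list is left unchanged, so the application never sees a communicator that we could not clean up later.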

rfvander added the question label Mar 30, 2017
