
Clarify completion semantics for send #63

Open
JiakunYan opened this issue Dec 19, 2023 · 1 comment

Comments

@JiakunYan
Collaborator

Roughly speaking, there are two completion semantics for a send operation:

  • case 1: the send operation is considered completed when the send buffer can be reused.
  • case 2: the send operation is considered completed when the completion of its corresponding receive no longer depends on calling LCI_progress on this side.

Currently, LCI's completion semantics lean toward case 1. LCI_sends, LCI_sendm, and LCI_sendmn do not take a completion object because their send buffers can be reused immediately. The completion semantics of LCI_sendl depend on the rendezvous protocol: case 1 completion for the "writeimm" protocol and case 2 completion for the "write" protocol.

The lack of case 2 completion semantics can cause a hang at the very end of a program: some processes send their last messages and exit, but the other processes never receive those messages.

@omor1
Member

omor1 commented Dec 27, 2023

I think the proper way to handle this is to track if any sends aren't completed at the network level and block in LCI_finalize until they are complete, calling LCI_progress as necessary.

This should be simple to track, I think: keep atomic counters for "started" and "completed" sends, and progress in a loop during finalize until they are equal. Alternatively, keep a single counter of "in progress" sends and compare it to 0, but that may add slightly more cache invalidations? Not sure how much it matters, really.

I don't think we need to explicitly expose the fact that sends aren't complete at the hardware level (due to buffering) to the user.
