-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#12652: switch all-gather to worker initiated edm termination mode #14078
base: main
Are you sure you want to change the base?
Conversation
@@ -750,7 +738,7 @@ operation::ProgramWithCallbacks all_gather_multi_core_with_workers_helper( | |||
auto &sender_edm_builder = is_buffer_in_clockwise_direction(b) ? clockwise_edm_builders.at(i) : counter_clockwise_edm_builders.at(i); | |||
log_trace(tt::LogOp, "Adding sender EDM channel"); | |||
EriscDatamoverBuilder::ChannelBufferInterface const& sender_channel_buffer_info = | |||
sender_edm_builder.add_sender_channel(sender_worker_writer_semaphore_id, clockwise_link_buffer_num_messages_to_send.at(b), sender_worker_coords); | |||
sender_edm_builder.add_sender_channel(sender_worker_writer_semaphore_id, 1, sender_worker_coords); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is changing num_eth_messages_to_forward
to 1 going to affect perf? Or is it just a semantic change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update the code for readability. This just means the channel will actually forward data - the specific value itself doesn't matter so the API should be cleaned up. Thanks for the question.
@@ -302,6 +302,7 @@ operation::ProgramWithCallbacks all_gather_multi_core_with_workers_helper( | |||
worker_defines["INTERLEAVED_MEM_LAYOUT"] = "1"; | |||
} | |||
|
|||
constexpr ttnn::ccl::EriscDataMoverTerminationMode edm_termination_mode = ttnn::ccl::EriscDataMoverTerminationMode::WORKER_INITIATED; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does all gather not use EriscDataMoverTerminationMode::MESSAGE_COUNT_REACHED
mode at all now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. Both to simplify host code and to enable migration to fabric we won't be able to use a message_count type of mode anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Just added some questions for understanding.
Hit some regressions and this work is being paused so I'll mark it draft. |
Ticket
Link to Github Issue
Problem description
This change isn't strictly required. However, making this change will simplify some follow-up changes.
What's changed
This is a purely tactical/refactoring change that will simplify some follow up all-gather changes. Namely:
Checklist