
[QUESTION] is there any restriction to use allgather with moe_expert_capacity_factor? #1277

Open
Louis-J opened this issue Nov 7, 2024 · 0 comments

Louis-J commented Nov 7, 2024

Your question

There is a check in megatron/core/transformer/transformer_config.py (around line 401):

        if self.moe_expert_capacity_factor is not None:
            if self.moe_token_dispatcher_type not in ["alltoall", "alltoall_seq"]:
                raise ValueError(
                    'moe_expert_capacity_factor only works with alltoall token dispatcher'
                )

The code that handles capacity_factor and padding in router.py does not seem to change the output tensor's dimension sizes, and I don't see any capacity_factor-specific handling in token_dispatcher.py (a sketch of my understanding is below). So why is moe_expert_capacity_factor restricted to the 'alltoall' and 'alltoall_seq' dispatchers?
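
For reference, here is a minimal sketch of what I understand capacity-based dropping in the router to do; the function name and exact tensor layout are my own illustration, not the actual Megatron-LM implementation:

    import math
    import torch

    def apply_capacity_sketch(probs: torch.Tensor, capacity_factor: float,
                              num_experts: int, topk: int) -> torch.Tensor:
        # probs: [num_tokens, num_experts], nonzero only for the selected top-k experts.
        # Illustrative sketch only, not the Megatron-LM code.
        num_tokens = probs.size(0)
        # Each expert keeps at most `capacity` tokens.
        capacity = math.ceil(capacity_factor * num_tokens * topk / num_experts)
        # 1-based position of each token in its expert's queue (cumulative count).
        selected = (probs > 0).int()
        position_in_expert = torch.cumsum(selected, dim=0) * selected
        # Zero out (i.e. drop) tokens beyond the expert's capacity; note the
        # tensor shape is unchanged, which is what prompted my question above.
        within_capacity = (position_in_expert <= capacity).to(probs.dtype)
        return probs * within_capacity

If this matches the actual behaviour, the routing output keeps the same shape regardless of the dispatcher, which is why the restriction surprises me.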

Thanks for your reply.
