## What is the feature?

### Description

The current implementation of `BaseModel` in mmengine assumes a single `inputs` parameter of type `torch.Tensor` in the `forward` method. While this works well for many scenarios, it poses challenges for models that require multiple, disparate inputs that cannot easily be concatenated into a single tensor. For example, in multi-modal learning tasks, we might need to process point cloud data `[b, n, c]` and image data `[b, c1, h, w]` simultaneously.
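A minimal sketch of the situation (the `PointImageModel` name, layer choices, and shapes below are illustrative, not from mmengine): a multi-modal model naturally wants two forward arguments, but a single-tensor `inputs` parameter leaves no slot for the second modality.

```python
import torch
from torch import nn

class PointImageModel(nn.Module):
    """Toy multi-modal model needing two disparate inputs."""

    def __init__(self):
        super().__init__()
        self.point_mlp = nn.Linear(3, 16)                  # per-point features, c = 3
        self.image_conv = nn.Conv2d(3, 16, 3, padding=1)   # image channels, c1 = 3

    def forward(self, points, images):
        # points: [b, n, c], images: [b, c1, h, w] -- neither can stand in
        # for a single `inputs: torch.Tensor` argument.
        p = self.point_mlp(points).mean(dim=1)             # [b, 16]
        i = self.image_conv(images).mean(dim=(2, 3))       # [b, 16]
        return p + i

model = PointImageModel()
out = model(torch.randn(2, 100, 3), torch.randn(2, 3, 8, 8))
print(tuple(out.shape))  # (2, 16)
```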
### Current Workarounds and Their Limitations

- **Concatenating inputs:** not suitable for inputs with different dimensions or semantic meanings.
- **Using a dictionary input:** breaks compatibility with the current `BaseModel` interface.
- **Creating a custom data structure:** also breaks compatibility and requires significant changes to existing codebases.
- **Setting inputs before calling `forward`:** requires modifying training loops and doesn't align with PyTorch's typical usage patterns.
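The first two limitations can be seen concretely in a short sketch (shapes are the illustrative ones from above):

```python
import torch

points = torch.randn(2, 100, 3)   # point cloud: [b, n, c]
images = torch.randn(2, 3, 8, 8)  # images:      [b, c1, h, w]

# Workaround 1: concatenation fails outright for tensors of different rank.
try:
    torch.cat([points, images], dim=1)
    concat_ok = True
except RuntimeError:
    concat_ok = False  # torch.cat requires matching numbers of dimensions

# Workaround 2: a dict carries both modalities, but it no longer matches
# the `inputs: torch.Tensor` annotation on BaseModel.forward.
inputs = {"points": points, "images": images}
```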
### Feature Request

We propose extending `BaseModel` to support multiple input tensors in a way that maintains backward compatibility. This could potentially be achieved by:

1. Allowing `inputs` to be a tuple or list of tensors.
2. Adding an optional parameter for additional inputs.
3. Creating a new base class specifically for multi-input models.
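Option 1 could look roughly like the following (a hypothetical sketch, not the actual mmengine API): widen the accepted type and normalize internally, so existing single-tensor callers keep working while multi-input callers pass a tuple or list.

```python
from typing import Sequence, Union

import torch
from torch import nn

class MultiInputModel(nn.Module):
    """Hypothetical sketch of option 1: `inputs` as tensor OR sequence."""

    def forward(self, inputs: Union[torch.Tensor, Sequence[torch.Tensor]]):
        if isinstance(inputs, torch.Tensor):
            inputs = (inputs,)  # backward compatible: wrap a lone tensor
        # Each modality keeps its own tensor; here we just report shapes.
        return [tuple(x.shape) for x in inputs]

model = MultiInputModel()
single = model(torch.zeros(2, 3))                                 # old call style
multi = model([torch.zeros(2, 100, 3), torch.zeros(2, 3, 8, 8)])  # new call style
print(single, multi)  # [(2, 3)] [(2, 100, 3), (2, 3, 8, 8)]
```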
### Benefits

- Improved flexibility for complex model architectures.
- Better support for multi-modal learning tasks.
- Easier integration of models with multiple input types.
- Maintains consistency with PyTorch's typical usage patterns.
Questions for Discussion
What is the best way to implement this feature while maintaining backward compatibility?
Are there any potential drawbacks or performance implications to consider?
How might this change affect other parts of the mmengine ecosystem?
We appreciate your consideration of this feature request and look forward to any feedback or discussion on this topic.
There appears to be an inconsistency between the type annotation of the inputs parameter in the BaseModel.forward() method and how it's actually used in other parts of the code.
In `BaseModel.forward()`, the `inputs` parameter is annotated as `torch.Tensor`:
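The signature in question looks roughly like this (a paraphrased sketch, not the verbatim mmengine source; see `mmengine/model/base_model.py` for the exact definition):

```python
import torch

class BaseModel:
    """Paraphrased sketch of the relevant part of mmengine's BaseModel."""

    def forward(self,
                inputs: torch.Tensor,  # annotated as a single tensor
                data_samples=None,
                mode: str = 'tensor'):
        ...
```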
Broadening the annotation (for example, to also accept a sequence or dict of tensors) would make it consistent with how `inputs` is actually used in the codebase.
### Additional Context

This issue was discovered while reviewing the `BaseModel` class implementation in `mmengine/model/base_model.py`. The inconsistency could lead to type-checking errors or unexpected behavior when using static type checkers or IDEs with type inference.
Thank you for your attention to this matter. We appreciate your work on mmengine and are happy to provide any additional information or clarification if needed.