NeuronX compiler: specify data type #2378

Open
CoolFish88 opened this issue Sep 11, 2024 · 1 comment
@CoolFish88

Description

Currently, the options for Transformers-NeuronX Engine in LMI don't include the possibility to specify the data type for compilation. It would be nice to have this parameter added to the set.

Will this change the current api? How?
Yes, a new parameter needs to be added and propagated to the Neuron compiler.

Who will benefit from this enhancement?
Everyone

@CoolFish88 CoolFish88 added the enhancement New feature or request label Sep 11, 2024
@tosterberg tosterberg self-assigned this Sep 11, 2024
@tosterberg
Contributor

Thanks @CoolFish88. The dtype is already available as a parameter for Neuron model compilation and runtime: set option.dtype=bf16 in serving.properties, or set the environment variable OPTION_DTYPE=bf16. The documentation is not clear on this, since it skips over the common options and only outlines the advanced ones. I will update the documentation accordingly. In the near term, you can see the options that are available to you here: https://github.com/deepjavalibrary/djl-serving/blob/master/engines/python/setup/djl_python/properties_manager/tnx_properties.py#L32-L35. This list will expand as the Neuron frameworks add support.
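
As a rough illustration of the two equivalent forms mentioned above, the sketch below renders a serving.properties fragment and derives the matching environment variable name. Only option.dtype=bf16 and OPTION_DTYPE=bf16 come from this thread. The helper names `to_env_var` and `render_properties` are hypothetical, and the key-to-variable rule (uppercase, dots replaced by underscores) is an assumption generalized from that single example.

```python
def to_env_var(key: str) -> str:
    """Map a serving.properties key to its environment variable form.

    Assumption: the rule is inferred from the one pair given in the thread
    (option.dtype -> OPTION_DTYPE); it is not taken from the LMI source.
    """
    return key.replace(".", "_").upper()


def render_properties(options: dict[str, str]) -> str:
    """Render options as serving.properties lines, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in options.items()) + "\n"


# The only setting confirmed in the thread; any other key would be a guess.
options = {"option.dtype": "bf16"}

properties_text = render_properties(options)
env_vars = {to_env_var(key): value for key, value in options.items()}

print(properties_text, end="")  # contents for serving.properties
print(env_vars)                 # equivalent environment variables
```

Either form should be enough on its own; which one is more convenient depends on whether the deployment is configured through a serving.properties file or through container environment variables.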
