Currently, the options for Transformers-NeuronX Engine in LMI don't include the possibility to specify the data type for compilation. It would be nice to have this parameter added to the set.
Will this change the current api? How?
Yes, a new parameter needs to be added and propagated to the Neuron compiler.
Who will benefit from this enhancement?
Everyone
References
list reference and related literature
list known implementations
Thanks @CoolFish88. Dtype is already available as a parameter for Neuron model compilation and runtime: set option.dtype=bf16 in serving.properties, or set the environment variable OPTION_DTYPE=bf16. The documentation is not clear on this, since it skips over the common options and only outlines the advanced ones. I will update the documentation accordingly. In the near term, you can see the options available to you here: https://github.com/deepjavalibrary/djl-serving/blob/master/engines/python/setup/djl_python/properties_manager/tnx_properties.py#L32-L35. This list will expand as the Neuron frameworks add support for more options.
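To illustrate how the two forms above relate, here is a minimal sketch that turns a serving.properties line into the matching environment-variable form. The helper `property_to_env` is hypothetical and not part of DJL Serving. It only mirrors the naming pattern visible in the pair option.dtype=bf16 / OPTION_DTYPE=bf16, and it assumes other `option.*` keys follow the same pattern.

```python
def property_to_env(line: str) -> str:
    """Convert a 'key=value' properties line to ENV_STYLE form.

    Hypothetical helper for illustration only: it replaces dots with
    underscores and upper-cases the key, as in
    option.dtype=bf16 -> OPTION_DTYPE=bf16.
    """
    key, sep, value = line.strip().partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got: {line!r}")
    env_key = key.strip().replace(".", "_").upper()
    return f"{env_key}={value.strip()}"


if __name__ == "__main__":
    print(property_to_env("option.dtype=bf16"))
```

Either form should be enough on its own; check the linked tnx_properties.py for the dtype values that are actually accepted.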