Compilation fails for both Mistral-7B-Instruct-v0.2 and intfloat/e5-mistral-7b-instruct.
It fails only with tp_degree=1; compilation succeeds for 2 <= tp_degree <= numOfCores().
The transformers-neuronx documentation says the trivial case tp_degree=1 is supported, so I'd like to understand why this fails:
Currently, the Neuron runtime supports tensor-parallelism degrees 1, 2, 8, and 32 on Trn1 and supports tensor-parallelism degrees 1, 2, 4, 8, and 24 on Inf2.
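For reference, the documented degrees quoted above can be written as a small lookup so a chosen tp_degree can be checked before compiling. This is only a sketch: the table is copied from the documentation sentence above, and the names SUPPORTED_TP_DEGREES and is_documented_tp_degree are hypothetical helpers, not part of transformers-neuronx. Note that the report above says degrees from 2 up to the core count worked in practice, so the documented list is not the same as what actually compiles.

```python
# Documented tensor-parallelism degrees, copied from the documentation quote above.
# Hypothetical helper for illustration; not an API of transformers-neuronx.
SUPPORTED_TP_DEGREES = {
    "trn1": (1, 2, 8, 32),
    "inf2": (1, 2, 4, 8, 24),
}


def is_documented_tp_degree(instance_family: str, tp_degree: int) -> bool:
    """Return True if tp_degree is listed as supported for the instance family."""
    family = instance_family.lower()
    if family not in SUPPORTED_TP_DEGREES:
        raise ValueError(f"unknown instance family: {instance_family!r}")
    return tp_degree in SUPPORTED_TP_DEGREES[family]
```

By this table, is_documented_tp_degree("trn1", 1) is True, which is why the failure with tp_degree=1 on a trn1 target is unexpected.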
2024-10-31 20:41:40.000807: 2106605 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:05.000363: 2107021 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:05.000424: 2107022 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:05.000519: 2107023 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:05.000537: 2107024 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:05.000609: 2107022 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:05.000607: 2107022 ERROR ||NEURON_CC_WRAPPER||: Got a cached failed neff at /var/tmp/neuron-compile-cache/neuronxcc-2.15.141.0+d3cfc8ca/MODULE_559cd11e4fcf4be622be+4497a662/model.neff. Will skip compilation, please set --retry_failed_compilation for recompilation:
Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/ubuntu/neuroncc_compile_workdir/f84e5f80-8279-4c58-80c2-a1186641dfbf/model.MODULE_559cd11e4fcf4be622be+4497a662.hlo_module.pb', '--output', '/tmp/ubuntu/neuroncc_compile_workdir/f84e5f80-8279-4c58-80c2-a1186641dfbf/model.MODULE_559cd11e4fcf4be622be+4497a662.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-10-31T18:44:12Z [NLA001] Unhandled exception with message: === BIR verification failed ===
Reason: Invalid access of 2 partitions starting at partition 34
Instruction: I-29351-1_TSPAddAddr
Opcode: TensorScalarPtr
Instruction Source: (I-29351-1_TSPAddAddr)
Input index: 0
Argument AP:
Access Pattern: [[1,2],[1,1],[1,1]]
Offset: 2
Memory Location: {_scatter.2175.39316}@SB<32,25224>(16x4)#Internal DebugInfo: <_scatter.2175.39316||UNDEF||[16, 1, 1]>
- Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.
.
2024-10-31 20:42:05.000599: 2107025 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:05.000688: 2107026 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:05.000747: 2107027 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:05.000869: 2107028 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:06.000071: 2107024 ERROR ||NEURON_CC_WRAPPER||: Got a cached failed neff at /var/tmp/neuron-compile-cache/neuronxcc-2.15.141.0+d3cfc8ca/MODULE_0d4c413eb11699570bf4+4497a662/model.neff. Will skip compilation, please set --retry_failed_compilation for recompilation:
Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/ubuntu/neuroncc_compile_workdir/42676261-93c2-45f4-a5c1-bf31a687e166/model.MODULE_0d4c413eb11699570bf4+4497a662.hlo_module.pb', '--output', '/tmp/ubuntu/neuroncc_compile_workdir/42676261-93c2-45f4-a5c1-bf31a687e166/model.MODULE_0d4c413eb11699570bf4+4497a662.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-10-31T18:44:16Z [NLA001] Unhandled exception with message: === BIR verification failed ===
Reason: Invalid access of 2 partitions starting at partition 98
Instruction: I-28849-1_TSPAddAddr
Opcode: TensorScalarPtr
Instruction Source: (I-28849-1_TSPAddAddr)
Input index: 0
Argument AP:
Access Pattern: [[1,2],[1,1],[1,1]]
Offset: 2
Memory Location: {_scatter.1233.39310}@SB<96,16520>(16x4)#Internal DebugInfo: <_scatter.1233.39310||UNDEF||[16, 1, 1]>
- Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.
.
2024-10-31 20:42:06.000071: 2107025 ERROR ||NEURON_CC_WRAPPER||: Got a cached failed neff at /var/tmp/neuron-compile-cache/neuronxcc-2.15.141.0+d3cfc8ca/MODULE_d3007680bbf6cce0a595+4497a662/model.neff. Will skip compilation, please set --retry_failed_compilation for recompilation:
Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/ubuntu/neuroncc_compile_workdir/bed88bad-46a5-484b-a976-d7393924cbc0/model.MODULE_d3007680bbf6cce0a595+4497a662.hlo_module.pb', '--output', '/tmp/ubuntu/neuroncc_compile_workdir/bed88bad-46a5-484b-a976-d7393924cbc0/model.MODULE_d3007680bbf6cce0a595+4497a662.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-10-31T18:44:17Z [NLA001] Unhandled exception with message: === BIR verification failed ===
Reason: Invalid access of 2 partitions starting at partition 98
Instruction: I-29289-1_TSPAddAddr
Opcode: TensorScalarPtr
Instruction Source: (I-29289-1_TSPAddAddr)
Input index: 0
Argument AP:
Access Pattern: [[1,2],[1,1],[1,1]]
Offset: 2
Memory Location: {_scatter.1861.39470}@SB<96,16520>(16x4)#Internal DebugInfo: <_scatter.1861.39470||UNDEF||[16, 1, 1]>
- Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.
.
2024-10-31 20:42:06.000071: 2107023 ERROR ||NEURON_CC_WRAPPER||: Got a cached failed neff at /var/tmp/neuron-compile-cache/neuronxcc-2.15.141.0+d3cfc8ca/MODULE_d178bf000d9abf4dd214+4497a662/model.neff. Will skip compilation, please set --retry_failed_compilation for recompilation:
Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/ubuntu/neuroncc_compile_workdir/7aa8b9c9-3fd7-4e20-b9a4-de7371fae903/model.MODULE_d178bf000d9abf4dd214+4497a662.hlo_module.pb', '--output', '/tmp/ubuntu/neuroncc_compile_workdir/7aa8b9c9-3fd7-4e20-b9a4-de7371fae903/model.MODULE_d178bf000d9abf4dd214+4497a662.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-10-31T18:44:15Z [NLA001] Unhandled exception with message: === BIR verification failed ===
Reason: Invalid access of 2 partitions starting at partition 34
Instruction: I-28091-1_TSPAddAddr
Opcode: TensorScalarPtr
Instruction Source: (I-28091-1_TSPAddAddr)
Input index: 0
Argument AP:
Access Pattern: [[1,2],[1,1],[1,1]]
Offset: 2
Memory Location: {_scatter.448.38758}@SB<32,25840>(16x4)#Internal DebugInfo: <_scatter.448.38758||UNDEF||[16, 1, 1]>
- Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.
.
2024-10-31 20:42:06.000073: 2107024 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:06.000073: 2107025 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:06.000073: 2107023 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:06.000108: 2107029 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:06.000248: 2107021 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.15.141.0+d3cfc8ca/MODULE_69e0b8755c46e7a87e83+4497a662/model.neff. Exiting with a successfully compiled graph.
2024-10-31 20:42:06.000253: 2107021 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:06.000256: 2107026 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.15.141.0+d3cfc8ca/MODULE_e74e812529bec5753a4f+4497a662/model.neff. Exiting with a successfully compiled graph.
2024-10-31 20:42:06.000261: 2107026 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:06.000264: 2107027 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.15.141.0+d3cfc8ca/MODULE_166f1194574259e8acf2+4497a662/model.neff. Exiting with a successfully compiled graph.
2024-10-31 20:42:06.000269: 2107027 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:06.000669: 2107028 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.15.141.0+d3cfc8ca/MODULE_0e03025bf8624037ea45+4497a662/model.neff. Exiting with a successfully compiled graph.
2024-10-31 20:42:06.000675: 2107028 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:06.000680: 2107029 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.15.141.0+d3cfc8ca/MODULE_aa51fc17d48f04a07d55+4497a662/model.neff. Exiting with a successfully compiled graph.
2024-10-31 20:42:06.000698: 2107029 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:06.000984: 2107030 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:07.000840: 2107030 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.15.141.0+d3cfc8ca/MODULE_67975c331a8b20e8cf83+4497a662/model.neff. Exiting with a successfully compiled graph.
2024-10-31 20:42:07.000862: 2107030 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
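The errors above come from cached failed NEFFs, so the compiler was not actually re-run in this session. A possible next step, sketched below, is to force recompilation and turn on the debug variables that the error message names. The sketch assumes that --retry_failed_compilation is passed through the NEURON_CC_FLAGS environment variable and that the two XLA debug variables are enabled by setting them to "1"; the helper name neuron_debug_env is hypothetical. The variables would need to be set before the Neuron libraries are imported or the compile script is launched.

```python
import os


def neuron_debug_env(base=None):
    """Return a copy of the environment with recompilation and XLA debug enabled.

    Assumption: the compiler wrapper reads extra flags from NEURON_CC_FLAGS.
    """
    env = dict(os.environ if base is None else base)

    # Append the flag named in the error message, keeping any existing flags.
    flags = env.get("NEURON_CC_FLAGS", "").split()
    if "--retry_failed_compilation" not in flags:
        flags.append("--retry_failed_compilation")
    env["NEURON_CC_FLAGS"] = " ".join(flags)

    # Debug variables suggested by the compiler error message.
    env["XLA_IR_DEBUG"] = "1"
    env["XLA_HLO_DEBUG"] = "1"
    return env


# Usage (before importing any Neuron or XLA modules):
#   os.environ.update(neuron_debug_env())
```

The function returns a new dictionary rather than mutating os.environ, so the same settings can also be passed as env= to subprocess.run when the compile step runs as a separate process.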
weiliw-amz changed the title from "intfloat/e5-mistral-7b-instruct Compilation Failure with tp_degree=1" to "Mistral-7B Compilation Failure with tp_degree=1" on Oct 31, 2024.
Thank you for reporting this issue, @weiliw-amz. Our team is looking into it and will let you know if there are any updates or if we need any further information from you.