Mistral-7B Compilation Failure with tp_degree=1 #1023

Open
weiliw-amz opened this issue Oct 31, 2024 · 2 comments
Labels
bug Something isn't working

Comments

weiliw-amz commented Oct 31, 2024

Compilation fails for both Mistral-7B-Instruct-v0.2 and intfloat/e5-mistral-7b-instruct.

It fails only with tp_degree=1; compilation succeeds for 2 <= tp_degree <= numOfCores().

The transformers-neuronx documentation says the trivial case tp_degree=1 is supported, so I'd like to understand why this fails:

Currently, the Neuron runtime supports tensor-parallelism degrees 1, 2, 8, and 32 on Trn1 and supports tensor-parallelism degrees 1, 2, 4, 8, and 24 on Inf2.
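
For reference, the documented degrees in the sentence quoted above can be written as a small lookup. This is only a sketch: the table is copied from that sentence, and the names `SUPPORTED_TP_DEGREES` and `is_documented_tp_degree` are made up for illustration and are not part of any Neuron package.

```python
# Tensor-parallelism degrees copied from the documentation sentence quoted
# above. Illustrative table only; this is not an official Neuron API.
SUPPORTED_TP_DEGREES = {
    "trn1": (1, 2, 8, 32),
    "inf2": (1, 2, 4, 8, 24),
}


def is_documented_tp_degree(instance_family: str, tp_degree: int) -> bool:
    """Return True if the quoted documentation lists tp_degree for the family."""
    return tp_degree in SUPPORTED_TP_DEGREES[instance_family.lower()]


if __name__ == "__main__":
    # tp_degree=1 is listed for both families, which is why the failure
    # reported here is unexpected.
    for family in SUPPORTED_TP_DEGREES:
        print(family, is_documented_tp_degree(family, 1))
```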

Versions:

aws-neuronx-runtime-discovery 2.9
libneuronxla                  2.0.2335
neuronx-cc                    2.15.141.0+d3cfc8ca
neuronx-distributed           0.9.0
optimum-neuron                0.0.25
torch-neuronx                 2.1.2.2.3.1
transformers-neuronx          0.12.313

Download model:

mkdir /home/ubuntu/mistral_7b
cd /home/ubuntu/mistral_7b
git clone https://huggingface.co/intfloat/e5-mistral-7b-instruct

Compile commands:

from optimum.neuron import NeuronModelForCausalLM

# Local checkout of the model cloned in the previous step.
model_folder = "/home/ubuntu/mistral_7b"
model_task = 'text-generation'
model_name = 'e5-mistral-7b-instruct'
model_path = f"{model_folder}/{model_name}"

# num_cores=1 is the failing case; larger values compile successfully.
n_cores = 1
compiler_args = {"num_cores": n_cores, "auto_cast_type": 'fp16', "task": model_task}
input_shapes = {"batch_size": 1, "sequence_length": 2048}

# Export (compile) the model for Neuron, then save the compiled artifacts.
model = NeuronModelForCausalLM.from_pretrained(model_path, export=True, **compiler_args, **input_shapes)
model.save_pretrained(f"{model_path}-Neuron-{n_cores}")
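
To confirm which core counts compile, the export call above could be wrapped in a small sweep. The following is a minimal sketch, not a tested reproduction: `sweep_core_counts` and `compile_with_optimum` are hypothetical helper names, and it assumes a compilation failure surfaces as a Python exception from `from_pretrained`.

```python
from typing import Callable, Dict, Iterable


def sweep_core_counts(
    compile_fn: Callable[[int], object],
    core_counts: Iterable[int],
) -> Dict[int, str]:
    """Call compile_fn(n) for each n and record "ok" or the error text."""
    results: Dict[int, str] = {}
    for n in core_counts:
        try:
            compile_fn(n)
            results[n] = "ok"
        except Exception as exc:  # assumption: compile failures raise
            results[n] = f"failed: {type(exc).__name__}: {exc}"
    return results


def compile_with_optimum(n_cores: int):
    """Hypothetical wrapper around the export call from the report above."""
    # Imported lazily so the sweep helper can be used without a Neuron install.
    from optimum.neuron import NeuronModelForCausalLM

    model_path = "/home/ubuntu/mistral_7b/e5-mistral-7b-instruct"
    return NeuronModelForCausalLM.from_pretrained(
        model_path,
        export=True,
        num_cores=n_cores,
        auto_cast_type="fp16",
        task="text-generation",
        batch_size=1,
        sequence_length=2048,
    )
```

On a Neuron instance, `sweep_core_counts(compile_with_optimum, [1, 2])` would then report one entry per core count. Passing the compile step in as a function keeps the sweep itself independent of the Neuron packages.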

Error log:

2024-10-31 20:41:40.000807:  2106605  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:05.000363:  2107021  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:05.000424:  2107022  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:05.000519:  2107023  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:05.000537:  2107024  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:05.000609:  2107022  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:05.000607:  2107022  ERROR ||NEURON_CC_WRAPPER||: Got a cached failed neff at /var/tmp/neuron-compile-cache/neuronxcc-2.15.141.0+d3cfc8ca/MODULE_559cd11e4fcf4be622be+4497a662/model.neff. Will skip compilation, please set --retry_failed_compilation for recompilation: 
 Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/ubuntu/neuroncc_compile_workdir/f84e5f80-8279-4c58-80c2-a1186641dfbf/model.MODULE_559cd11e4fcf4be622be+4497a662.hlo_module.pb', '--output', '/tmp/ubuntu/neuroncc_compile_workdir/f84e5f80-8279-4c58-80c2-a1186641dfbf/model.MODULE_559cd11e4fcf4be622be+4497a662.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-10-31T18:44:12Z [NLA001]  Unhandled exception with message: === BIR verification failed ===
Reason: Invalid access of 2 partitions starting at partition 34
Instruction: I-29351-1_TSPAddAddr
Opcode: TensorScalarPtr
Instruction Source: (I-29351-1_TSPAddAddr)
Input index: 0
Argument AP:
Access Pattern: [[1,2],[1,1],[1,1]]
Offset: 2
Memory Location: {_scatter.2175.39316}@SB<32,25224>(16x4)#Internal DebugInfo: <_scatter.2175.39316||UNDEF||[16, 1, 1]>
 - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.
.
2024-10-31 20:42:05.000599:  2107025  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:05.000688:  2107026  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:05.000747:  2107027  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:05.000869:  2107028  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:06.000071:  2107024  ERROR ||NEURON_CC_WRAPPER||: Got a cached failed neff at /var/tmp/neuron-compile-cache/neuronxcc-2.15.141.0+d3cfc8ca/MODULE_0d4c413eb11699570bf4+4497a662/model.neff. Will skip compilation, please set --retry_failed_compilation for recompilation: 
 Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/ubuntu/neuroncc_compile_workdir/42676261-93c2-45f4-a5c1-bf31a687e166/model.MODULE_0d4c413eb11699570bf4+4497a662.hlo_module.pb', '--output', '/tmp/ubuntu/neuroncc_compile_workdir/42676261-93c2-45f4-a5c1-bf31a687e166/model.MODULE_0d4c413eb11699570bf4+4497a662.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-10-31T18:44:16Z [NLA001]  Unhandled exception with message: === BIR verification failed ===
Reason: Invalid access of 2 partitions starting at partition 98
Instruction: I-28849-1_TSPAddAddr
Opcode: TensorScalarPtr
Instruction Source: (I-28849-1_TSPAddAddr)
Input index: 0
Argument AP:
Access Pattern: [[1,2],[1,1],[1,1]]
Offset: 2
Memory Location: {_scatter.1233.39310}@SB<96,16520>(16x4)#Internal DebugInfo: <_scatter.1233.39310||UNDEF||[16, 1, 1]>
 - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.
.
2024-10-31 20:42:06.000071:  2107025  ERROR ||NEURON_CC_WRAPPER||: Got a cached failed neff at /var/tmp/neuron-compile-cache/neuronxcc-2.15.141.0+d3cfc8ca/MODULE_d3007680bbf6cce0a595+4497a662/model.neff. Will skip compilation, please set --retry_failed_compilation for recompilation: 
 Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/ubuntu/neuroncc_compile_workdir/bed88bad-46a5-484b-a976-d7393924cbc0/model.MODULE_d3007680bbf6cce0a595+4497a662.hlo_module.pb', '--output', '/tmp/ubuntu/neuroncc_compile_workdir/bed88bad-46a5-484b-a976-d7393924cbc0/model.MODULE_d3007680bbf6cce0a595+4497a662.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-10-31T18:44:17Z [NLA001]  Unhandled exception with message: === BIR verification failed ===
Reason: Invalid access of 2 partitions starting at partition 98
Instruction: I-29289-1_TSPAddAddr
Opcode: TensorScalarPtr
Instruction Source: (I-29289-1_TSPAddAddr)
Input index: 0
Argument AP:
Access Pattern: [[1,2],[1,1],[1,1]]
Offset: 2
Memory Location: {_scatter.1861.39470}@SB<96,16520>(16x4)#Internal DebugInfo: <_scatter.1861.39470||UNDEF||[16, 1, 1]>
 - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.
.
2024-10-31 20:42:06.000071:  2107023  ERROR ||NEURON_CC_WRAPPER||: Got a cached failed neff at /var/tmp/neuron-compile-cache/neuronxcc-2.15.141.0+d3cfc8ca/MODULE_d178bf000d9abf4dd214+4497a662/model.neff. Will skip compilation, please set --retry_failed_compilation for recompilation: 
 Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/ubuntu/neuroncc_compile_workdir/7aa8b9c9-3fd7-4e20-b9a4-de7371fae903/model.MODULE_d178bf000d9abf4dd214+4497a662.hlo_module.pb', '--output', '/tmp/ubuntu/neuroncc_compile_workdir/7aa8b9c9-3fd7-4e20-b9a4-de7371fae903/model.MODULE_d178bf000d9abf4dd214+4497a662.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-10-31T18:44:15Z [NLA001]  Unhandled exception with message: === BIR verification failed ===
Reason: Invalid access of 2 partitions starting at partition 34
Instruction: I-28091-1_TSPAddAddr
Opcode: TensorScalarPtr
Instruction Source: (I-28091-1_TSPAddAddr)
Input index: 0
Argument AP:
Access Pattern: [[1,2],[1,1],[1,1]]
Offset: 2
Memory Location: {_scatter.448.38758}@SB<32,25840>(16x4)#Internal DebugInfo: <_scatter.448.38758||UNDEF||[16, 1, 1]>
 - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.
.
2024-10-31 20:42:06.000073:  2107024  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:06.000073:  2107025  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:06.000073:  2107023  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:06.000108:  2107029  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:06.000248:  2107021  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.15.141.0+d3cfc8ca/MODULE_69e0b8755c46e7a87e83+4497a662/model.neff. Exiting with a successfully compiled graph.
2024-10-31 20:42:06.000253:  2107021  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:06.000256:  2107026  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.15.141.0+d3cfc8ca/MODULE_e74e812529bec5753a4f+4497a662/model.neff. Exiting with a successfully compiled graph.
2024-10-31 20:42:06.000261:  2107026  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:06.000264:  2107027  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.15.141.0+d3cfc8ca/MODULE_166f1194574259e8acf2+4497a662/model.neff. Exiting with a successfully compiled graph.
2024-10-31 20:42:06.000269:  2107027  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:06.000669:  2107028  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.15.141.0+d3cfc8ca/MODULE_0e03025bf8624037ea45+4497a662/model.neff. Exiting with a successfully compiled graph.
2024-10-31 20:42:06.000675:  2107028  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:06.000680:  2107029  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.15.141.0+d3cfc8ca/MODULE_aa51fc17d48f04a07d55+4497a662/model.neff. Exiting with a successfully compiled graph.
2024-10-31 20:42:06.000698:  2107029  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:06.000984:  2107030  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-31 20:42:07.000840:  2107030  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.15.141.0+d3cfc8ca/MODULE_67975c331a8b20e8cf83+4497a662/model.neff. Exiting with a successfully compiled graph.
2024-10-31 20:42:07.000862:  2107030  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
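
The log above is long, so a short parser can help pull out the failing modules. This is a rough sketch that assumes the wrapper output keeps the exact wording shown in this log; `summarize_failures` is a hypothetical helper, not part of the Neuron SDK.

```python
import re
from typing import Dict, List

# Patterns matched against the wrapper output shown in the log above.
CACHED_FAILURE = re.compile(
    r"cached failed neff at \S+/(MODULE_[0-9a-f]+\+[0-9a-f]+)/model\.neff"
)
REASON = re.compile(r"^Reason: (.+)$")
# Anchored on "Instruction: " so the "Instruction Source:" line is skipped.
INSTRUCTION = re.compile(r"^Instruction: (\S+)$")


def summarize_failures(log_text: str) -> List[Dict[str, str]]:
    """Return one dict per cached failed module with its reason and instruction."""
    failures: List[Dict[str, str]] = []
    current = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        match = CACHED_FAILURE.search(line)
        if match:
            # A new failure record starts at each "cached failed neff" line.
            current = {"module": match.group(1)}
            failures.append(current)
            continue
        if current is None:
            continue
        match = REASON.match(line)
        if match:
            current.setdefault("reason", match.group(1))
            continue
        match = INSTRUCTION.match(line)
        if match:
            current.setdefault("instruction", match.group(1))
    return failures
```

Applied to the log above, this should yield four records, all with the opcode TensorScalarPtr in the surrounding text and reasons pointing at partition 34 or 98. Every one of them is a cached failure, so the compiler was not rerun in this session; the wrapper message says to set `--retry_failed_compilation` to force recompilation. How that flag is passed (for example through the `NEURON_CC_FLAGS` environment variable) is an assumption to check against the Neuron documentation.
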
weiliw-amz changed the title from "intfloat/e5-mistral-7b-instruct Compilation Failure with tp_degree=1" to "Mistral-7B Compilation Failure with tp_degree=1" on Oct 31, 2024
@delongmeng-aws

Thank you for reporting this issue, @weiliw-amz. Our team is looking into it and will let you know if there is any update or if we need further information from you.

@delongmeng-aws

Hi @weiliw-amz, we were able to reproduce the issue and are looking further into the root cause and a potential fix.

aws-taylor added the bug label on Nov 8, 2024
3 participants