Llama 2 7b chat model output quality is low #2093

ghost · 2024-06-21T08:53:23Z

I have a finetuned llama 2 7B chat model which I am deploying to an endpoint using DJL container. After deploying when I tested the model, the model output quality has degraded (The output seems to be echoing same answer for some questions asked).

Before using DJL container, I was using TGI container and the model was working absolutely fine.
I understand there could be difference in the way of inferencing for both these containers but is there a way of overriding the inference code.
Following is the sample prompt that I am using to prompt the model:
"[INST] <>
Respond only with the answer and do not provide any explanation or additional text. If you don't know the answer to a question, please answer with 'I dont know'.Answer should be as short as possible.
<>
Below context is text extracted from a medical document. Answer the question asked based on the context given.
Context: {text}
Question: {question} [/INST]"

The model is finetuned on the above mentioned prompt so we need to inference in such a way that it comprehends this format of the prompt and gives the answer.

Any resources/suggestions would be really helpful.

lanking520 · 2024-06-25T17:23:25Z

Could you provide your deployment config? Trying to help here. Logs will also help

ghost · 2024-06-27T14:35:34Z

I had used a serving.properties file which has the following configurations
1 engine=MPI
2 option.task=text-generation
3 option.trust_remote_code=true
4 option.tensor_parallel_degree=1
5 option.model_id={{model_id}}
6 option.dtype=fp16
7 option.tgi_compat=true
8 option.rolling_batch=lmi-dist

My endpoint config is very simple:
{
"VariantName": "variant1",
"ModelName": model_name,
"InstanceType": "ml.g5.24xlarge",
"InitialInstanceCount": 1,
"ModelDataDownloadTimeoutInSeconds": 3600,
"ContainerStartupHealthCheckTimeoutInSeconds": 3600,
}
Also please note here, I am not facing any errors while deploying, the deployment is successful but the output formats are different.
Expected output according to the DJL documentation for TGI compatible output feature:
[
{
"generated_text": "Deep Learning is a really cool field"
}
]

What I am getting:
{
"generated_text": "Deep Learning is a really cool field"
}

ALso the quality of output degraded significantly with DJL container as compared to TGI container

lanking520 · 2024-06-28T07:05:30Z

could you share a sample prompt you use and parameters? And exepcted output if possbile?

ghost · 2024-06-28T09:11:46Z

I have mentioned the sample prompt in the issue description. Mentioning below again for reference:
"""[INST] <>
Respond only with the answer and do not provide any explanation or additional text. If you don't know the answer to a question, please answer with 'I dont know'.Answer should be as short as possible.
<>
Below context is text extracted from a medical document. Answer the question asked based on the context given.
Context: {text}
Question: {question} [/INST]"""

Expected output
if question is: What is patient name ?
Model response : [{'generated_text : 'John H'}]

I am using a fine tuned model which is trained on the above mentioned format of prompt and answer

ghost added the bug Something isn't working label Jun 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Llama 2 7b chat model output quality is low #2093

Llama 2 7b chat model output quality is low #2093

ghost commented Jun 21, 2024

lanking520 commented Jun 25, 2024

ghost commented Jun 27, 2024

lanking520 commented Jun 28, 2024

ghost commented Jun 28, 2024

Llama 2 7b chat model output quality is low #2093

Llama 2 7b chat model output quality is low #2093

Comments

ghost commented Jun 21, 2024

lanking520 commented Jun 25, 2024

ghost commented Jun 27, 2024

lanking520 commented Jun 28, 2024

ghost commented Jun 28, 2024