
Llama 2 7b chat model output quality is low #2093

Open
ghost opened this issue Jun 21, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@ghost

ghost commented Jun 21, 2024

I have a fine-tuned Llama 2 7B chat model that I am deploying to an endpoint using the DJL container. After deploying, when I tested the model, the output quality had degraded (the model seems to echo the same answer for some of the questions asked).

Before the DJL container I was using the TGI container, and the model was working absolutely fine.
I understand the two containers may run inference differently, but is there a way to override the inference code?
Following is the sample prompt that I am using to prompt the model:
"[INST] <>
Respond only with the answer and do not provide any explanation or additional text. If you don't know the answer to a question, please answer with 'I dont know'.Answer should be as short as possible.
<>
Below context is text extracted from a medical document. Answer the question asked based on the context given.
Context: {text}
Question: {question} [/INST]"

The model is fine-tuned on the above-mentioned prompt format, so we need to run inference in a way that the model receives this prompt format and returns the answer.

Any resources/suggestions would be really helpful.
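
On the question of overriding the inference code: DJL Serving's Python engine can pick up a custom entrypoint (commonly a model.py with a handle() function) shipped alongside the model artifacts. Below is a minimal sketch of such a handler, assuming the djl_python Input/Output contract and a plain transformers-based load; the loading details and parameter names are illustrative, not the LMI container's actual default handler.

```python
# model.py -- hypothetical custom entrypoint for DJL Serving's Python engine.
# A sketch only: it assumes the djl_python Input/Output handler contract and
# loads the model with transformers instead of using the LMI built-in handler.
import torch
from djl_python import Input, Output
from transformers import AutoModelForCausalLM, AutoTokenizer

model = None
tokenizer = None


def load_model(properties):
    """Load the fine-tuned Llama 2 7B chat model once, on the first request."""
    global model, tokenizer
    model_id = properties.get("model_id")
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )


def handle(inputs: Input) -> Output:
    if model is None:
        load_model(inputs.get_properties())
    if inputs.is_empty():
        # Warm-up / ping request from the serving runtime.
        return None

    payload = inputs.get_as_json()
    # The prompt is expected to already be in the fine-tuned
    # "[INST] <<SYS>> ... <</SYS>> ... [/INST]" format.
    prompt = payload["inputs"]
    params = payload.get("parameters", {})

    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    output_ids = model.generate(
        input_ids,
        max_new_tokens=params.get("max_new_tokens", 64),
        do_sample=params.get("do_sample", False),
    )
    # Decode only the newly generated tokens and mirror TGI's output shape.
    answer = tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
    return Output().add_as_json([{"generated_text": answer}])
```

The built-in LMI handler additionally implements rolling batching and streaming, so a custom handler like this trades those features for full control over prompt handling.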

@ghost ghost added the bug Something isn't working label Jun 21, 2024
@lanking520
Contributor

Could you provide your deployment config? Trying to help here. Logs would also help.

@ghost
Author

ghost commented Jun 27, 2024

I had used a serving.properties file which has the following configurations:
engine=MPI
option.task=text-generation
option.trust_remote_code=true
option.tensor_parallel_degree=1
option.model_id={{model_id}}
option.dtype=fp16
option.tgi_compat=true
option.rolling_batch=lmi-dist

My endpoint config is very simple:
{
"VariantName": "variant1",
"ModelName": model_name,
"InstanceType": "ml.g5.24xlarge",
"InitialInstanceCount": 1,
"ModelDataDownloadTimeoutInSeconds": 3600,
"ContainerStartupHealthCheckTimeoutInSeconds": 3600,
}
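
For reference, a sketch of how a variant config like this is typically passed to SageMaker with boto3; the model, endpoint, and config names here are placeholders, and the model itself (DJL/LMI container image plus model data) is assumed to have been registered already via create_model:

```python
import boto3

sm = boto3.client("sagemaker")

# Placeholder names; model_name refers to a model already created with sm.create_model(...).
model_name = "llama2-djl-model"
endpoint_config_name = "llama2-djl-endpoint-config"
endpoint_name = "llama2-djl-endpoint"

sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "variant1",
            "ModelName": model_name,
            "InstanceType": "ml.g5.24xlarge",
            "InitialInstanceCount": 1,
            "ModelDataDownloadTimeoutInSeconds": 3600,
            "ContainerStartupHealthCheckTimeoutInSeconds": 3600,
        }
    ],
)
sm.create_endpoint(EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name)
```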
Also, please note that I am not facing any errors while deploying; the deployment is successful, but the output format is different.
Expected output according to the DJL documentation for the TGI-compatible output feature:
[
{
"generated_text": "Deep Learning is a really cool field"
}
]

What I am getting:
{
"generated_text": "Deep Learning is a really cool field"
}
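
Until the shape difference is sorted out, one client-side workaround (just a sketch, not a container fix) is to normalize both response shapes before using the result:

```python
import json


def extract_generated_text(response_body: bytes) -> str:
    """Accept both the TGI-compatible list shape and the plain dict shape."""
    result = json.loads(response_body)
    if isinstance(result, list):
        return result[0]["generated_text"]   # [{"generated_text": "..."}]
    return result["generated_text"]          # {"generated_text": "..."}
```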

Also, the quality of the output degraded significantly with the DJL container compared to the TGI container.

@lanking520
Contributor

Could you share a sample prompt you use and the parameters? And expected output if possible?

@ghost
Author

ghost commented Jun 28, 2024

I have mentioned the sample prompt in the issue description. Repeating it below for reference:
"""[INST] <>
Respond only with the answer and do not provide any explanation or additional text. If you don't know the answer to a question, please answer with 'I dont know'.Answer should be as short as possible.
<>
Below context is text extracted from a medical document. Answer the question asked based on the context given.
Context: {text}
Question: {question} [/INST]"""

Expected output:
If the question is: What is patient name?
Model response: [{'generated_text': 'John H'}]

I am using a fine-tuned model which is trained on the above-mentioned format of prompt and answer.
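
For reference, a sketch of how the prompt and generation parameters would be sent to the endpoint; the "inputs"/"parameters" payload schema follows the LMI/TGI convention, and the endpoint name, parameter values, and example context/question are placeholders:

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

text = "..."                          # extracted document text (placeholder)
question = "What is patient name?"    # example question (placeholder)

prompt = (
    "[INST] <<SYS>>\n"
    "Respond only with the answer and do not provide any explanation or additional text. "
    "If you don't know the answer to a question, please answer with 'I dont know'. "
    "Answer should be as short as possible.\n"
    "<</SYS>>\n"
    "Below context is text extracted from a medical document. "
    "Answer the question asked based on the context given.\n"
    f"Context: {text}\n"
    f"Question: {question} [/INST]"
)

payload = {
    "inputs": prompt,
    # Greedy decoding and a small token budget suit the short-answer use case.
    "parameters": {"max_new_tokens": 64, "do_sample": False},
}

response = runtime.invoke_endpoint(
    EndpointName="llama2-djl-endpoint",   # placeholder endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(json.loads(response["Body"].read()))
```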
