how to evaluate AWQ? #1980

Open
chunniunai220ml opened this issue Aug 14, 2024 · 7 comments

@chunniunai220ml

https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md#examples

how to set eval_func?

https://github.com/intel/neural-compressor/blob/master/examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/weight_only/run_clm_no_trainer.py

It seems there is no AWQ quantization here, just RTN and GPTQ. Also, as the readme.md says, weight-only quantization is fake quantization, so why save the qmodel (user_model.save(args.output_dir))?
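For what it's worth, here is a minimal sketch of how an eval_func plugs into the 2.x quantization.fit API that the linked doc describes; my_benchmark, user_model, and calib_dataloader are hypothetical placeholders, and the op_type_dict keys should be checked against that doc:

```python
from neural_compressor import PostTrainingQuantConfig, quantization

def eval_func(model):
    # Called by the tuner on each candidate model; must return a single
    # float (higher is better) for the accuracy criterion to compare.
    return my_benchmark(model)  # hypothetical evaluation helper

conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        ".*": {  # apply to every matched op
            "weight": {
                "bits": 4,
                "group_size": 128,
                "scheme": "asym",
                "algorithm": "AWQ",
            },
        },
    },
)
q_model = quantization.fit(
    user_model,                         # hypothetical FP32 model
    conf,
    calib_dataloader=calib_dataloader,  # hypothetical calibration data
    eval_func=eval_func,
)
```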

Kaihui-intel self-assigned this Aug 15, 2024
@Kaihui-intel
Contributor

Hello @chunniunai220ml, thanks for your interest in Intel(R) Neural Compressor. The document https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md#examples describes the 2.x API; the corresponding 2.x example is https://github.com/intel/neural-compressor/tree/master/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm

@chunniunai220ml
Author

Thanks for your reply. I followed the 2.x example link; my bash script is as follows:
```bash
python -u run_clm_no_trainer.py \
    --model $model_path \
    --dataset ${DATASET_NAME} \
    --approach weight-only \
    --output_dir ${tuned_checkpoint} \
    --quantize \
    --batch_size ${batch_size} \
    --woq_algo AWQ \
    --calib_iters 128 \
    --woq_group_size 128 \
    --woq_bits 4 \
    --tasks hellaswag \
    --accuracy
```
At https://github.com/intel/neural-compressor/blob/master/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm/run_clm_no_trainer.py#L355, it seems to evaluate the original model instead of the qmodel.
If I want to evaluate the qmodel, can I just modify #L355 as follows?
```python
q_model.eval()
eval_args = LMEvalParser(
    model="hf",
    user_model=q_model,  # instead of user_model
    tokenizer=tokenizer,
    batch_size=args.batch_size,
    tasks=args.tasks,
)
```
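(For completeness, a sketch of how the harness would then be invoked, assuming evaluate comes from the same lm_eval module the script already imports and returns an lm-eval style results dict:)

```python
# Run the requested tasks against the model wrapped in eval_args above.
results = evaluate(eval_args)
# Assumption: metrics are keyed by task name, lm-eval style.
print(results["results"]["hellaswag"])
```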

As the readme.md says, weight-only quantization is based on fake quantization, so why save the qmodel in #L338? I think the qmodel weights are not stored as INT4.
Also, run_clm_no_trainer.py only supports CPU; where is the multi-GPU support code?

@Kaihui-intel
Contributor

Kaihui-intel commented Aug 15, 2024

Sure. The q_model needs to be exported as a compressed model: https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md#export-compressed-model

You can refer to https://github.com/intel/intel-extension-for-transformers/tree/v1.5/examples/huggingface/pytorch/text-generation/quantization (v1.5) to quantize an INT4 model; it has this compressed-model export integrated. It also includes GPU scripts.

The 3.x API is coming soon; stay tuned.
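For reference, a rough sketch of that export step against the 2.x quantized model object; the method name comes from the linked doc section, but the keyword arguments here are assumptions and should be verified there:

```python
import torch

# Assumed signature: pack the fake-quantized INT4 weights into real
# low-bit storage (int32-packed tensors) before saving to disk.
compressed_model = q_model.export_compressed_model(
    compression_dtype=torch.int32,  # assumption: packing container dtype
    scale_dtype=torch.float32,      # assumption: dtype of per-group scales
)
torch.save(compressed_model.state_dict(), "compressed_model.pt")
```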

@chunniunai220ml
Author

chunniunai220ml commented Aug 15, 2024

> Sure. The q_model needs to be exported as a compressed model: https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md#export-compressed-model
>
> You can refer to https://github.com/intel/intel-extension-for-transformers/tree/v1.5/examples/huggingface/pytorch/text-generation/quantization (v1.5) to quantize an INT4 model; it has this compressed-model export integrated. It also includes GPU scripts.
>
> The 3.x API is coming soon; stay tuned.

Does it work well on an NVIDIA V100? The readme.md seems to describe only the Intel GPU installation.

Besides, when running on CPU, it is strange that the process always gets killed for no apparent reason after processing several blocks.

@Kaihui-intel
Contributor

Kaihui-intel commented Aug 16, 2024

I suggest you try the 3.x API; its q_model is the exported compressed model.

We will soon update the 3.x example, which supports auto-device detection: https://github.com/intel/neural-compressor/tree/kaihui/woq_3x_eg
But we haven't tested the performance on NVIDIA GPUs.

On the dev branch: https://github.com/intel/neural-compressor/tree/kaihui/woq_3x_eg/examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/weight_only
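For orientation, a sketch of what the 3.x flow looks like, assuming it follows the prepare/calibrate/convert pattern with an AWQConfig as in the 3.x weight-only examples (field and helper names may differ slightly; calib_dataloader is a hypothetical placeholder):

```python
from neural_compressor.torch.quantization import AWQConfig, prepare, convert

quant_config = AWQConfig(bits=4, group_size=128)  # assumption: field names
model = prepare(model, quant_config)  # hook up calibration observers

# Hypothetical calibration loop: a few forward passes with real batches.
for batch in calib_dataloader:
    model(**batch)

q_model = convert(model)  # fold AWQ scales and produce the quantized model
```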

@chunniunai220ml
Author

chunniunai220ml commented Aug 16, 2024

> I suggest you try the 3.x API; its q_model is the exported compressed model.
>
> We will soon update the 3.x example, which supports auto-device detection: https://github.com/intel/neural-compressor/tree/kaihui/woq_3x_eg But we haven't tested the performance on NVIDIA GPUs.
>
> On the dev branch: https://github.com/intel/neural-compressor/tree/kaihui/woq_3x_eg/examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/weight_only

I checked out the kaihui/woq_3x_eg branch and ran:
```bash
CUDA_VISIBLE_DEVICES="2" python run_clm_no_trainer.py \
    --model $model_path \
    --woq_algo AWQ \
    --woq_bits 4 \
    --woq_group_size 128 \
    --calib_iters 128 \
    --woq_scheme asym \
    --quantize \
    --batch_size 1 \
    --tasks wikitext \
    --accuracy
```
The model is loaded with AutoModelForCausalLM.from_pretrained(device='cuda'), but in neural-compressor/neural_compressor/torch/algorithms/weight_only/awq.py, line 240, in block_calibration, the inputs to model(*args, **kwargs) are on CPU, so this error is reported:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
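(A generic PyTorch workaround sketch, not the project's official fix: move the cached calibration inputs onto the model's device just before the failing forward call in block_calibration:)

```python
import torch

# Hypothetical patch around the failing model(*args, **kwargs) call.
device = next(model.parameters()).device
args = tuple(a.to(device) if isinstance(a, torch.Tensor) else a for a in args)
kwargs = {k: v.to(device) if isinstance(v, torch.Tensor) else v
          for k, v in kwargs.items()}
model(*args, **kwargs)
```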

There is also another bug in eval:

```
from intel_extension_for_transformers.transformers.llm.evaluation.lm_eval import evaluate, LMEvalParser
  File "/*/anaconda3/lib/python3.11/site-packages/intel_extension_for_transformers/transformers/__init__.py", line 19, in <module>
    from .config import (
  File "/8/anaconda3/lib/python3.11/site-packages/intel_extension_for_transformers/transformers/config.py", line 21, in <module>
    from neural_compressor.conf.config import (
ModuleNotFoundError: No module named 'neural_compressor.conf'
```

Also, how do I load saved_results/quantmodel.pt to evaluate it?
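(Assuming the example saved the whole module via torch.save, a plain torch.load should restore it for evaluation; if only a state_dict was saved, it would have to be loaded into a freshly quantized model instead:)

```python
import torch

# Assumption: quantmodel.pt holds the full serialized module, not a state_dict.
q_model = torch.load("saved_results/quantmodel.pt")
q_model.eval()
```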

@pengxin99
Contributor

Hi @chunniunai220ml, trying an older version, such as 2.6, may solve this issue:
ModuleNotFoundError: No module named 'neural_compressor.conf'
