how to evaluate AWQ? #1980

Open
chunniunai220ml opened this issue Aug 14, 2024 · 7 comments

@chunniunai220ml

https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md#examples

how to set eval_func?

https://github.com/intel/neural-compressor/blob/master/examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/weight_only/run_clm_no_trainer.py

It seems there is no AWQ quantization here, just RTN and GPTQ. Also, as the readme.md says, weight-only quantization is fake quantization, so why save the qmodel (user_model.save(args.output_dir))?
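For what it's worth, here is a minimal sketch of how an eval_func plugs into the 2.x quantization.fit API that the linked doc describes; my_benchmark, user_model, and calib_dataloader are hypothetical placeholders, and the op_type_dict keys should be checked against that doc:

```python
from neural_compressor import PostTrainingQuantConfig, quantization

def eval_func(model):
    # Called by the tuner on each candidate model; must return a single
    # float (higher is better) for the accuracy criterion to compare.
    return my_benchmark(model)  # hypothetical evaluation helper

conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        ".*": {  # apply to every matched op
            "weight": {
                "bits": 4,
                "group_size": 128,
                "scheme": "asym",
                "algorithm": "AWQ",
            },
        },
    },
)
q_model = quantization.fit(
    user_model,                         # hypothetical FP32 model
    conf,
    calib_dataloader=calib_dataloader,  # hypothetical calibration data
    eval_func=eval_func,
)
```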

Kaihui-intel self-assigned this Aug 15, 2024
@Kaihui-intel
Contributor

Hello @chunniunai220ml, thanks for your interest in Intel(R) Neural Compressor. The document https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md#examples describes the 2.x API; the corresponding 2.x example is https://github.com/intel/neural-compressor/tree/master/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm

@chunniunai220ml
Author

Thanks for your reply. I followed the 2.x example link; my bash script is as follows:
```bash
python -u run_clm_no_trainer.py \
    --model $model_path \
    --dataset ${DATASET_NAME} \
    --approach weight-only \
    --output_dir ${tuned_checkpoint} \
    --quantize \
    --batch_size ${batch_size} \
    --woq_algo AWQ \
    --calib_iters 128 \
    --woq_group_size 128 \
    --woq_bits 4 \
    --tasks hellaswag \
    --accuracy
```
At https://github.com/intel/neural-compressor/blob/master/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm/run_clm_no_trainer.py#L355, it seems to evaluate the original model instead of the qmodel.
If I want to evaluate the qmodel, can I just modify #L355 as follows?
```python
q_model.eval()
eval_args = LMEvalParser(
    model="hf",
    user_model=q_model,  # instead of user_model
    tokenizer=tokenizer,
    batch_size=args.batch_size,
    tasks=args.tasks,
)
```
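(For completeness, a sketch of how the harness would then be invoked, assuming evaluate comes from the same lm_eval module the script already imports and returns an lm-eval style results dict:)

```python
# Run the requested tasks against the model wrapped in eval_args above.
results = evaluate(eval_args)
# Assumption: metrics are keyed by task name, lm-eval style.
print(results["results"]["hellaswag"])
```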

As the readme.md says, weight-only quantization is based on fake quantization, so why save the qmodel in #L338? I think the qmodel weights are not stored as INT4.
Also, run_clm_no_trainer.py only supports CPU; where is the multi-GPU support code?

@Kaihui-intel
Contributor

Kaihui-intel commented Aug 15, 2024

Sure. The q_model needs to be exported as a compressed model: https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md#export-compressed-model

You can refer to https://github.com/intel/intel-extension-for-transformers/tree/v1.5/examples/huggingface/pytorch/text-generation/quantization (v1.5) to quantize an INT4 model; it has this compressed-model export integrated. It also includes GPU scripts.

The 3.x API is coming soon; stay tuned.
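For reference, a rough sketch of that export step against the 2.x quantized model object; the method name comes from the linked doc section, but the keyword arguments here are assumptions and should be verified there:

```python
import torch

# Assumed signature: pack the fake-quantized INT4 weights into real
# low-bit storage (int32-packed tensors) before saving to disk.
compressed_model = q_model.export_compressed_model(
    compression_dtype=torch.int32,  # assumption: packing container dtype
    scale_dtype=torch.float32,      # assumption: dtype of per-group scales
)
torch.save(compressed_model.state_dict(), "compressed_model.pt")
```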

@chunniunai220ml
Author

chunniunai220ml commented Aug 15, 2024

> Sure. The q_model needs to be exported as a compressed model: https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md#export-compressed-model
>
> You can refer to https://github.com/intel/intel-extension-for-transformers/tree/v1.5/examples/huggingface/pytorch/text-generation/quantization (v1.5) to quantize an INT4 model; it has this compressed-model export integrated. It also includes GPU scripts.
>
> The 3.x API is coming soon; stay tuned.

Does it work well on an NVIDIA V100? The readme.md seems to describe only the Intel GPU installation.

Besides, when running on CPU, it is strange that the process always gets killed for no apparent reason after processing several blocks.

@Kaihui-intel
Contributor

Kaihui-intel commented Aug 16, 2024

I suggest you try the 3.x API; its q_model is the exported compressed model.

We will soon update the 3.x example, which supports auto-device detection: https://github.com/intel/neural-compressor/tree/kaihui/woq_3x_eg
But we haven't tested the performance on NVIDIA GPUs.

On the dev branch: https://github.com/intel/neural-compressor/tree/kaihui/woq_3x_eg/examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/weight_only
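For orientation, a sketch of what the 3.x flow looks like, assuming it follows the prepare/calibrate/convert pattern with an AWQConfig as in the 3.x weight-only examples (field and helper names may differ slightly; calib_dataloader is a hypothetical placeholder):

```python
from neural_compressor.torch.quantization import AWQConfig, prepare, convert

quant_config = AWQConfig(bits=4, group_size=128)  # assumption: field names
model = prepare(model, quant_config)  # hook up calibration observers

# Hypothetical calibration loop: a few forward passes with real batches.
for batch in calib_dataloader:
    model(**batch)

q_model = convert(model)  # fold AWQ scales and produce the quantized model
```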

@chunniunai220ml
Author

chunniunai220ml commented Aug 16, 2024

> I suggest you try the 3.x API; its q_model is the exported compressed model.
>
> We will soon update the 3.x example, which supports auto-device detection: https://github.com/intel/neural-compressor/tree/kaihui/woq_3x_eg But we haven't tested the performance on NVIDIA GPUs.
>
> On the dev branch: https://github.com/intel/neural-compressor/tree/kaihui/woq_3x_eg/examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/weight_only

I checked out the kaihui/woq_3x_eg branch and ran:
```bash
CUDA_VISIBLE_DEVICES="2" python run_clm_no_trainer.py \
    --model $model_path \
    --woq_algo AWQ \
    --woq_bits 4 \
    --woq_group_size 128 \
    --calib_iters 128 \
    --woq_scheme asym \
    --quantize \
    --batch_size 1 \
    --tasks wikitext \
    --accuracy
```
The model is loaded with AutoModelForCausalLM.from_pretrained(device='cuda'), but in neural-compressor/neural_compressor/torch/algorithms/weight_only/awq.py, line 240, in block_calibration, the inputs to model(*args, **kwargs) are on CPU, so this error is reported:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
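(A generic PyTorch workaround sketch, not the project's official fix: move the cached calibration inputs onto the model's device just before the failing forward call in block_calibration:)

```python
import torch

# Hypothetical patch around the failing model(*args, **kwargs) call.
device = next(model.parameters()).device
args = tuple(a.to(device) if isinstance(a, torch.Tensor) else a for a in args)
kwargs = {k: v.to(device) if isinstance(v, torch.Tensor) else v
          for k, v in kwargs.items()}
model(*args, **kwargs)
```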

There is also another bug in eval:

```
from intel_extension_for_transformers.transformers.llm.evaluation.lm_eval import evaluate, LMEvalParser
  File "/*/anaconda3/lib/python3.11/site-packages/intel_extension_for_transformers/transformers/__init__.py", line 19, in <module>
    from .config import (
  File "/8/anaconda3/lib/python3.11/site-packages/intel_extension_for_transformers/transformers/config.py", line 21, in <module>
    from neural_compressor.conf.config import (
ModuleNotFoundError: No module named 'neural_compressor.conf'
```

Also, how do I load saved_results/quantmodel.pt to evaluate it?
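(Assuming the example saved the whole module via torch.save, a plain torch.load should restore it for evaluation; if only a state_dict was saved, it would have to be loaded into a freshly quantized model instead:)

```python
import torch

# Assumption: quantmodel.pt holds the full serialized module, not a state_dict.
q_model = torch.load("saved_results/quantmodel.pt")
q_model.eval()
```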

@pengxin99
Contributor

Hi @chunniunai220ml, trying an older version, such as 2.6, may solve this issue:
ModuleNotFoundError: No module named 'neural_compressor.conf'
