Quantize tensor is not enough #4

dreambear1234 · 2020-01-14T09:44:07Z

Great job, you give the best quantization accuracy as I know.
I'm very interested in your paper and code, but I have some issue about this paper.
As far as my knowledge.
For a full quantized model, all tensor&weight&bias should be quantized.

from your code, I get some question.

bias is not quantized earlier. use scale_bias = scale_input * scale_weight maybe useful(some difference between float and int8 exist), but if bias value bigger than scale_bias*128(for int8 mode), this error will be very large. Don't you care about it ?
input/output tensor quantize is not enough in your model.
Did you use "QuantAct" as input/output quantize?
from https://github.com/amirgholami/ZeroQ/blob/master/utils/quantize_model.py#L46
only tensor after relu/relu6 are quantized.
for mobilenet-v2 network,
a. you lose quantization for first input quantization;
b. In every Bottleneck, the third convolution has no relu/relu6, quantization is lost;
c. all bypass add structure has no quantization. so add cannot run as a quantize operation.
BatchNorm should be fused with convolution ?
In most cnn engine, batchnorm is fused in convolution or fully_connect before inference, so it is necessary to fuse batchnorm with conv/fc before quantization. I have no idea whether final accuracy will increase or decrease after fuse.
Act_max and Act_min values ?
According to my test(for model mobilenet-v2), after https://github.com/amirgholami/ZeroQ/blob/master/uniform_test.py#L89
all Act_max is 6, all Act_min is 0, it is just the range of relu6, for most other method who calculate act_min&act_max value, 0 & 6 is the most common case, I didn't find any difference from result.

Thank you for your reply.

yaohuicai · 2020-01-18T02:20:21Z

Hi,

Thanks for your interest in our work and your detailed questions. It should be noted that currently we are not doing fully quantized model since a fully quantized model needs to match specific hardware details which may vary significantly across CPUs, GPUs or FPGAs. As a general quantization method, we believe model size is important to the memory bottleneck and multiplications are the main part of the computation, so we tried to compress all weights in a neural network and make all the multiplications performed in low-precision. Please see the detailed answers below:

Typically the models that we studied do not use bias. Leaving bias not quantized will not cause floating-point multiplications as there are ReLUs after convolutions where bias is merged into the activations.
(a) The input images are ranged from 0-255 which are already quantized versions, so there is no need to quantize them. (b) After adding the result of the third convolution to the result from the other branch, the final results will be passed to ReLU6 and then quantized after ReLU6. Therefore, it only incurs floating-point addition. (c) The same reason as (b).
Here we did not fuse batch norm and convolutional layer as the batch normalization is essentially a linear transformation, which can be fused into the scaling factor in the quantization operation and will not hurt the accuracy.
We also observed the same phenomenon which we believe is normal.

jakc4103 · 2020-02-06T10:57:54Z

(b) After adding the result of the third convolution to the result from the other branch, the final results will be passed to ReLU6 and then quantized after ReLU6

I think there is no ReLU6 append after elementwise addition in mobilenetv2. Quantization might be lost as @dreambear1234 said.

liming312 · 2020-06-26T19:23:21Z

Any more comments?

Amin-Azar · 2021-03-09T09:08:35Z

Thanks for the nice paper. I am assuming this is not the complete code of the paper? All of the things @dreambear1234 said are still valid and a concern.

@yaohuicai, how are you doing multiplications in low-precision? Based on this code ( https://github.com/amirgholami/ZeroQ/blob/master/classification/utils/quantize_model.py#L45 ) , only acts after ReLU is quantized and you lose quantization on Conv and FC layers. I guess your answer to Q3 is simply not correct. We have tried it and fused BN is affecting the accuracy in quantized models considerably.
Please update us on whether there is an updated code available, otherwise, I believe this code won't give a fair comparison against other methods. We are planning to move only the distillation code to the distiller framework, have the proper quantization on activations, do batch norm folding, etc.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Quantize tensor is not enough #4

Quantize tensor is not enough #4

dreambear1234 commented Jan 14, 2020 •

edited

Loading

yaohuicai commented Jan 18, 2020

jakc4103 commented Feb 6, 2020

liming312 commented Jun 26, 2020

Amin-Azar commented Mar 9, 2021 •

edited

Loading

Quantize tensor is not enough #4

Quantize tensor is not enough #4

Comments

dreambear1234 commented Jan 14, 2020 • edited Loading

yaohuicai commented Jan 18, 2020

jakc4103 commented Feb 6, 2020

liming312 commented Jun 26, 2020

Amin-Azar commented Mar 9, 2021 • edited Loading

dreambear1234 commented Jan 14, 2020 •

edited

Loading

Amin-Azar commented Mar 9, 2021 •

edited

Loading