feat: add brevitas channel-wise support #807
base: main
Conversation
Force-pushed from 7848d8e to 30a1791
        if isinstance(dummy_input, torch.Tensor):
            dummy_input = dummy_input.to("cpu")
        else:
            dummy_input = tuple(elt.to("cpu") for elt in dummy_input)

        # Export to ONNX
        torch.onnx.export(
            torch_module.to("cpu"),
            dummy_input,
            str(output_onnx_file_path),
            opset_version=OPSET_VERSION_FOR_ONNX_EXPORT,
        )
We need the module and the inputs to be on CPU for the exporter to work properly
class CommonIntWeightPerChannelQuant(Int8WeightPerTensorFloat):
    """CommonIntWeightPerChannelQuant."""

    scaling_per_output_channel = True
The per-channel quantizer from Brevitas
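For context, a minimal sketch of what this flag changes, assuming Brevitas' quant_weight() API; the toy QuantConv2d below is hypothetical and only there to inspect the resulting scale shape:

import brevitas.nn as qnn
from brevitas.quant import Int8WeightPerTensorFloat


class CommonIntWeightPerChannelQuant(Int8WeightPerTensorFloat):
    """Per-channel variant of Brevitas' default per-tensor weight quantizer."""

    scaling_per_output_channel = True


# Toy layer (hypothetical, for illustration only)
conv = qnn.QuantConv2d(
    in_channels=3,
    out_channels=8,
    kernel_size=3,
    weight_quant=CommonIntWeightPerChannelQuant,
    weight_bit_width=4,
)

# With scaling_per_output_channel=True, the weight scale holds one value per
# output channel, broadcastable against the (out, in, kH, kW) weight tensor
print(conv.quant_weight().scale.shape)  # expected: torch.Size([8, 1, 1, 1])

With the base Int8WeightPerTensorFloat, the same call would return a single scalar scale, which is what the per-tensor code path in the diffs below checks for.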
        Returns:
            Neural network prediction
        """
        x = self.pool1(self.relu1(self.conv1(x)))
        x = self.pool2(self.relu2(self.conv2(x)))
        x = torch.flatten(x, 1)
        x = self.relu3(self.fc1(x))
        x = self.relu4(self.fc2(x))
        x = self.fc3(x)
        return x


# pylint: disable-next=too-many-instance-attributes
class QuantLeNet(FloatLeNet):
    """Quantized LeNet with per-channel quantization."""

    def __init__(
        self,
        weight_bit_width=4,
        act_bit_width=4,
        acc_bit_width=32,
        weight_quant=CommonIntAccumulatorAwareWeightQuant,
    ):
        super().__init__()

        self.conv1 = qnn.QuantConv2d(
            bias=False,
            in_channels=1,
            out_channels=6,
            kernel_size=5,
            stride=1,
            padding=0,
            input_bit_width=act_bit_width,
            input_quant=CommonUintActQuant,
            weight_accumulator_bit_width=acc_bit_width,
            weight_bit_width=weight_bit_width,
            weight_restrict_scaling_type=RestrictValueType.LOG_FP,
            weight_quant=weight_quant,
        )
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.relu1 = qnn.QuantReLU(
            inplace=True, act_quant=CommonUintActQuant, bit_width=act_bit_width
        )

        self.conv2 = qnn.QuantConv2d(
            bias=False,
            in_channels=6,
            out_channels=16,
            kernel_size=5,
            stride=1,
            padding=0,
            input_bit_width=act_bit_width,
            input_quant=CommonUintActQuant,
            weight_accumulator_bit_width=acc_bit_width,
            weight_bit_width=weight_bit_width,
            weight_restrict_scaling_type=RestrictValueType.LOG_FP,
            weight_quant=weight_quant,
        )
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.relu2 = qnn.QuantReLU(
            inplace=True, act_quant=CommonUintActQuant, bit_width=act_bit_width
        )

        self.fc1 = qnn.QuantLinear(
            400,
            120,
            bias=True,
            input_bit_width=act_bit_width,
            input_quant=CommonUintActQuant,
            weight_accumulator_bit_width=acc_bit_width,
            weight_bit_width=weight_bit_width,
            weight_restrict_scaling_type=RestrictValueType.LOG_FP,
            weight_quant=weight_quant,
        )
        self.relu3 = qnn.QuantReLU(act_quant=CommonUintActQuant, bit_width=act_bit_width)
        self.fc2 = qnn.QuantLinear(
            120,
            84,
            bias=True,
            input_bit_width=act_bit_width,
            input_quant=CommonUintActQuant,
            weight_accumulator_bit_width=acc_bit_width,
            weight_bit_width=weight_bit_width,
            weight_restrict_scaling_type=RestrictValueType.LOG_FP,
            weight_quant=weight_quant,
        )
        self.relu4 = qnn.QuantReLU(act_quant=CommonUintActQuant, bit_width=act_bit_width)
        self.fc3 = qnn.QuantLinear(
            84,
            10,
            bias=True,
            input_bit_width=act_bit_width,
            input_quant=CommonUintActQuant,
            weight_accumulator_bit_width=acc_bit_width,
            weight_bit_width=weight_bit_width,
            weight_restrict_scaling_type=RestrictValueType.LOG_FP,
            weight_quant=weight_quant,
        )

        self.apply(weight_init)
A LeNet provided by a user
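As a usage sketch only (the 32x32 grayscale input shape and the calibration data are assumptions derived from fc1 expecting 400 = 16 * 5 * 5 features), this is how such a user model would typically go through Concrete ML's Brevitas QAT compilation path:

import torch
from concrete.ml.torch.compile import compile_brevitas_qat_model

# Hypothetical calibration set: 100 grayscale 32x32 images
torch_inputset = torch.randn(100, 1, 32, 32)

model = QuantLeNet(weight_bit_width=4, act_bit_width=4)

# Compile the Brevitas QAT model; the per-channel weight scales must survive this step
quantized_module = compile_brevitas_qat_model(model, torch_inputset)

# The returned quantized module is called on numpy inputs, as in the test further below
out_qm = quantized_module(torch_inputset[:1].numpy())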
Thanks a lot for tackling this.
My main comment is: how do we make sure per-channel quantization is indeed activated and used in Concrete ML?
Currently we only test that a per-channel Brevitas quantization can be compiled in Concrete ML, but are we sure we use all the scales properly?
I am thinking: maybe have a simple model with a single conv and make sure that the number of scales equals the number of channels?
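Something along those lines, as a rough sketch (the single-conv model, bit-widths and shapes are all made up for illustration; the Concrete ML-side scale check is left as a comment since the internal attribute path is not shown in this PR):

import torch
import brevitas.nn as qnn
from torch import nn

from concrete.ml.torch.compile import compile_brevitas_qat_model


class SingleConv(nn.Module):
    """Hypothetical single-conv model for checking per-channel scales."""

    def __init__(self, out_channels=8):
        super().__init__()
        self.quant_inp = qnn.QuantIdentity(bit_width=4, return_quant_tensor=True)
        self.conv = qnn.QuantConv2d(
            in_channels=1,
            out_channels=out_channels,
            kernel_size=3,
            weight_quant=CommonIntWeightPerChannelQuant,
            weight_bit_width=4,
        )

    def forward(self, x):
        return self.conv(self.quant_inp(x))


model = SingleConv(out_channels=8)

# Brevitas level: one weight scale per output channel
assert model.conv.quant_weight().scale.numel() == 8

# Concrete ML level: compile, then check that the conv's weight quantizer in the
# resulting quantized module also carries 8 scales (attribute path to be filled in)
quantized_module = compile_brevitas_qat_model(model, torch.randn(50, 1, 8, 8))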
if q_input2.quantizer.scale.shape == tuple():
    m_matmul = q_input1.quantizer.scale * q_input2.quantizer.scale
else:
    # TODO: add assert on shapes
TODO to remove or to convert into a FIXME.
    m_matmul = q_input1.quantizer.scale * q_input2.quantizer.scale
else:
    # TODO: add assert on shapes
    weight_quant_scale = numpy.transpose(q_input2.quantizer.scale, axes=(1, 0))
Are these axes 1, 0 always going to be true?
agree, might be worth a comment if that's the case
yes always true, the assert will catch any issue
Which assert? 🤔
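For reference, the shape reasoning behind axes=(1, 0), as a small numpy sketch (the (out_features, 1) scale shape is an assumption about what a per-channel QuantLinear exports):

import numpy

batch, out_features = 4, 5

input_scale = numpy.float64(0.02)                   # per-tensor input scale
weight_scale = numpy.full((out_features, 1), 0.01)  # one scale per output channel

# The Gemm output has shape (batch, out_features), so the per-channel weight scale
# must broadcast along the last axis: (out_features, 1) -> (1, out_features)
weight_quant_scale = numpy.transpose(weight_scale, axes=(1, 0))
m_matmul = input_scale * weight_quant_scale

dequantized = m_matmul * numpy.ones((batch, out_features))
assert dequantized.shape == (batch, out_features)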
if q_weights.quantizer.scale.shape == tuple():
    m_matmul = q_input.quantizer.scale * q_weights.quantizer.scale
else:
    # TODO: add assert on shapes
TODO to remove or to convert into a FIXME + issue.
    # TODO: add assert on shapes
    weight_quant_scale = numpy.transpose(
        q_weights.quantizer.scale,
        axes=(1, 0, 2, 3),
Same remark as for the (1, 0) axes above.
Same answer: the assert above (to be added) will catch any issue, but the scale should always have this shape for Conv2d.
I'll add some errors for the Conv1d and Conv3d cases.
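Same reasoning for the conv case, as a numpy sketch (assuming the per-channel scale has shape (out_channels, 1, 1, 1), matching the Conv2d weight layout):

import numpy

n, out_channels, h, w = 2, 16, 10, 10

input_scale = numpy.float64(0.03)                         # per-tensor input scale
weight_scale = numpy.full((out_channels, 1, 1, 1), 0.01)  # one scale per output channel

# Conv2d outputs are laid out as (N, C, H, W): the per-channel scale has to sit on
# axis 1, so (out_channels, 1, 1, 1) is transposed to (1, out_channels, 1, 1)
weight_quant_scale = numpy.transpose(weight_scale, axes=(1, 0, 2, 3))
m_matmul = input_scale * weight_quant_scale

dequantized = m_matmul * numpy.ones((n, out_channels, h, w))
assert dequantized.shape == (n, out_channels, h, w)

A Conv1d or Conv3d weight scale would have 3 or 5 dimensions instead, which is why explicit errors for those cases make sense.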
tests/torch/test_brevitas_qat.py (outdated)
out_qm = quantized_module(images.detach().numpy())
mse = ((out - out_qm) ** 2).mean()
# Arbitrary threshold to check that the predictions are relatively similar
assert mse < 1e-4
Probably better to use our tools like check_float_array_equal, which can provide rtol or atol, or check_r2_score or similar.
If not, then it is probably better to make this MSE check a fixture so that we can re-use it elsewhere and not forget about this arbitrary value here.
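check_float_array_equal and check_r2_score are test helpers from this repository whose exact signatures are not shown here; as a stand-in, the intent is roughly the following, with numpy.testing.assert_allclose playing their role and an arbitrary placeholder tolerance:

import numpy

out = numpy.array([0.1, 0.2, 0.3])        # float model predictions (placeholder values)
out_qm = numpy.array([0.1, 0.205, 0.295]) # quantized module predictions (placeholder values)

# Explicit tolerances instead of a hand-picked MSE bound
numpy.testing.assert_allclose(out_qm, out, rtol=0, atol=1e-2)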
Yeah, true, check_float_array_equal would probably be better indeed. I'll change that.
Removing my request as I will be off for some time. Good luck with the PR!
Force-pushed from a9ac260 to fad224d
Force-pushed from fad224d to d645e82
Coverage passed ✅