This repository contains supplementary code for the paper MUSCO: Multi-Stage COmpression of neural networks. It demonstrates how a neural network with convolutional and fully connected layers can be compressed using iterative tensor decomposition of weight tensors.
numpy
scipy
scikit-tensor-py3
tensorly-musco
absl-py
tqdm
tensorflow-gpu (TensorRT support)
pip install musco-tf
from musco.tf import CompressorVBMF, Optimizer
model = load_model("model.h5")
compressor = CompressorVBMF(model)
while True:
model = compressor.compress_iteration(number=5)
# Fine-tune compressed model.
# Compressor decomposes 5 layers on each iteration
# and returns compressed model. You have to fine-tune
# model after each iteration to restore accuracy.
# Compressor automatically selects the best parameters
# for decomposition on each iteration.
# You can freeze and quantize model after compression.
optimizer = Optimizer(precision="FP16", max_batch_size=16)
optimizer.freeze(model)
optimizer.optimize("frozen.pb")
If you used our research, we kindly ask you to cite the corresponding paper.
@article{gusak2019one,
title={MUSCO: Multi-Stage Compression of neural networks},
author={Gusak, Julia and Kholiavchenko, Maksym and Ponomarev, Evgeny and Markeeva, Larisa and Oseledets, Ivan and Cichocki, Andrzej},
journal={arXiv preprint arXiv:1903.09973},
year={2019}
}
Project is distributed under Apache License 2.0.