Official TensorFlow implementation of AICT: An Adaptive Image Compression Transformer (accepted at IEEE ICIP 2023).
Swin Transformer · ConvNeXt · Adaptive Resolution · Neural Codecs · Image Compression · TensorFlow
Please do not hesitate to open an issue to report any problem you find in this repository. You can also email me with questions or comments.
- This repository is built on the official TensorFlow implementation of Channel-Wise Autoregressive Entropy Models for Learned Image Compression; this baseline is referred to as Conv-ChARM.
- We provide lightweight versions of the models by removing the latent residual prediction (LRP) transform and slicing latent means and scales, as done in the TensorFlow reimplementation of SwinT-ChARM from the original paper Transformer-Based Transform Coding.
- Refer to the ResizeCompression GitHub repo, the official implementation of the paper Estimating the Resize Parameter in End-to-end Learned Image Compression.
- Refer to the TensorFlow Compression (TFC) library to build your own ML models with end-to-end optimized data compression built in.
- Refer to the API documentation for a complete description of the classes and functions of the TensorFlow Compression (TFC) library; a minimal usage sketch follows this list.
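As a quick orientation, here is a minimal sketch (not this repository's code) of how a TFC entropy model quantizes latents and estimates their bit cost. The layer names come from the public TFC API; the channel count and tensor shapes are illustrative assumptions.

```python
import tensorflow as tf
import tensorflow_compression as tfc

# Fully factorized prior over 192 latent channels (channel count is illustrative).
prior = tfc.NoisyDeepFactorized(batch_shape=(192,))

# Entropy model coding the last 3 dimensions (H, W, C) of each latent tensor.
entropy_model = tfc.ContinuousBatchedEntropyModel(
    prior, coding_rank=3, compression=False)

y = tf.random.normal((1, 16, 16, 192))          # illustrative latent tensor
y_hat, bits = entropy_model(y, training=True)   # noisy-quantized latents + bit estimate
print(float(tf.reduce_sum(bits)))               # total estimated bits for the batch
```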
Python >= 3.6
tensorflow_compression
tensorflow_datasets
tensorflow_addons
einops
All packages used in this repository are listed in requirements.txt. To install them, run:
pip install -r requirements.txt
aict-main
│
├── conv-charm.py # Conv-ChARM Model
├── swint-charm.py # SwinT-ChARM Model
├── ict.py # ICT Model
├── aict.py # AICT Model
│
├── layers/
│   ├── convNext.py # ConvNeXt block layers
│   ├── swins/ # Swin Transformer block layers
│   └── scaleAdaptation.py # Scale Adaptation module layers
│
├── utils.py # Utility functions
├── config.py # Architecture configurations
├── requirements.txt # Requirements
└── figures/ # Overall model diagram
Every model can be trained and evaluated individually using:
python aict.py train
python aict.py evaluate --test_dir [-I] --tfci_output_dir [-O] --png_output_dir [-P] --results_file [-R]
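For reference, below is a hedged sketch of what the evaluation step does per image, assuming the trained model exposes compress()/decompress() methods in the style of the TFC example models. The method names and return structure are assumptions, not this repository's exact API.

```python
import tensorflow as tf

def evaluate_image(model, png_path):
    """Compress one PNG, reconstruct it, and report rate and distortion."""
    x = tf.image.decode_png(tf.io.read_file(png_path), channels=3)
    compressed = model.compress(x)          # assumed: (bitstring, *side_info)
    x_hat = model.decompress(*compressed)   # assumed: uint8 reconstruction
    num_pixels = float(tf.reduce_prod(tf.shape(x)[:2]))
    bpp = len(compressed[0].numpy()) * 8 / num_pixels   # bits per pixel
    psnr = tf.image.psnr(x, x_hat, max_val=255)         # distortion in dB
    return bpp, float(psnr)
```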
Table 1. BD-rate↓ (PSNR) performance of BPG (4:4:4), Conv-ChARM, SwinT-ChARM, ICT, and AICT compared to VTM-18.0 on the four considered datasets.
| Image Codec | Kodak | Tecnick | JPEG-AI | CLIC21 | Average |
|---|---|---|---|---|---|
| BPG (4:4:4) | 22.28% | 28.02% | 28.37% | 28.02% | 26.67% |
| Conv-ChARM | 2.58% | 3.72% | 9.66% | 2.14% | 4.53% |
| SwinT-ChARM | -1.92% | -2.50% | 2.91% | -3.22% | -1.18% |
| ICT (ours) | -5.10% | -5.91% | -1.14% | -6.44% | -4.65% |
| AICT (ours) | -5.09% | -5.99% | -2.03% | -7.33% | -5.11% |
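The numbers above follow the standard Bjøntegaard delta-rate metric: fit a cubic polynomial of log-rate as a function of PSNR for each codec, integrate both fits over the overlapping quality range, and report the average bitrate difference in percent. A self-contained sketch of that computation is shown below; the function name and the rate/PSNR inputs are illustrative assumptions.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average % bitrate difference of a test codec over an anchor (Bjøntegaard)."""
    # Fit cubic polynomials of log-rate as a function of PSNR.
    p_anchor = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)
    p_test = np.polyfit(psnr_test, np.log(rate_test), 3)
    # Integrate both fits over the overlapping PSNR range.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_anchor, int_test = np.polyint(p_anchor), np.polyint(p_test)
    avg_anchor = (np.polyval(int_anchor, hi) - np.polyval(int_anchor, lo)) / (hi - lo)
    avg_test = (np.polyval(int_test, hi) - np.polyval(int_test, lo)) / (hi - lo)
    # Negative values mean the test codec saves bitrate at equal quality.
    return (np.exp(avg_test - avg_anchor) - 1) * 100
```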
If you use this library for research purposes, please cite:
@INPROCEEDINGS{10222799,
author={Ghorbel, Ahmed and Hamidouche, Wassim and Morin, Luce},
booktitle={2023 IEEE International Conference on Image Processing (ICIP)},
title={AICT: An Adaptive Image Compression Transformer},
year={2023},
volume={},
number={},
pages={126-130},
keywords={Video coding;Adaptation models;Visualization;Image coding;Codecs;Transform coding;Benchmark testing;Neural Image Compression;Adaptive Resolution;Spatio-Channel Entropy Modeling;Self-attention;Transformer},
doi={10.1109/ICIP49359.2023.10222799}
}
This project is licensed under the MIT License. See LICENSE for more details.