Code repository for the Rakuten Data Challenge : Multimodal Product Classification and Retrieval.
Team Transformer's solution : Deep Multi-level Boosted Fusion Learning Framework for Multi-modal Product Classification
Paper Link : https://sigir-ecom.github.io/ecom20DCPapers/SIGIR_eCom20_DC_paper_8.pdf
Data challenge link : https://sigir-ecom.github.io/data-task.html
In this paper, we present our approach for the 'Multimodal Product Classification' task, part of the 2020 SIGIR Workshop on eCommerce (ECOM20). The objective of this task is to build and submit systems that classify previously unseen products into their corresponding product type codes. We propose a deep Multi-Modal Multi-level Boosted Fusion Learning Framework that categorizes large-scale multi-modal (text and image) product data into product type codes. Our final methodology achieved a macro F1-score of 91.94 on the phase 1 test dataset, the top-scoring submission, and placed third on the phase 2 scoreboard with a macro F1-score of 90.53.
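Since the results above are reported as macro F1-scores, a minimal sketch of that metric may help when reading them. It assumes the usual definition (the unweighted mean of per-class F1); the function name `macro_f1` and the toy labels are hypothetical, and this is not the challenge's official evaluation script.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example with hypothetical product type codes.
y_true = ["a", "a", "b", "c"]
y_pred = ["a", "b", "b", "c"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

Because every class contributes equally to the average, rare product type codes weigh as much as frequent ones.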
- SEResnext50_train_predict.ipynb : Fine-tune the pre-trained SE-ResNeXt50 model on Rakuten images
- camembert_train_predict.ipynb : Fine-tune the pre-trained CamemBERT model on French text; custom CamemBERT model with vector output (used later for feature fusion)
- flaubert_train_predict.ipynb : Fine-tune the pre-trained FlauBERT model on French text; custom FlauBERT model with vector output (used later for feature fusion)
- multi-modal_concatenate_fusion.ipynb : Concatenate the extracted features and train a neural network module on top
- Boosted Late-Fusion.ipynb : Train a LightGBM model with class probabilities as input
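As a rough illustration of the late-fusion step listed above, the sketch below builds the input matrix a boosted model would consume by placing each base model's class probabilities side by side. It is a minimal sketch, not the notebook's code: the random logits stand in for real model outputs, and the names `softmax`, `build_late_fusion_features`, `image_probs`, `camembert_probs` and `flaubert_probs`, as well as the toy sizes, are assumptions made for this example.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def build_late_fusion_features(prob_matrices):
    """Concatenate per-model class probabilities into one feature matrix.

    Each input has shape (n_samples, n_classes); the result has shape
    (n_samples, n_models * n_classes).
    """
    n_samples = prob_matrices[0].shape[0]
    for probs in prob_matrices:
        if probs.shape[0] != n_samples:
            raise ValueError("all models must score the same samples")
    return np.hstack(prob_matrices)


# Stand-in outputs for the three fine-tuned models (hypothetical toy sizes).
rng = np.random.default_rng(0)
n_samples, n_classes = 6, 4
image_probs = softmax(rng.normal(size=(n_samples, n_classes)))
camembert_probs = softmax(rng.normal(size=(n_samples, n_classes)))
flaubert_probs = softmax(rng.normal(size=(n_samples, n_classes)))

fusion_input = build_late_fusion_features(
    [image_probs, camembert_probs, flaubert_probs]
)
print(fusion_input.shape)  # (6, 12)
```

A gradient-boosted classifier such as `lightgbm.LGBMClassifier` could then be fit on `fusion_input` together with the true product type codes. The same concatenation applies to the extracted feature vectors in the concatenate-fusion notebook, with a neural network module trained on top instead.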
Multi-modal Joint Representation Learning
Late Fusion Model