Code for the paper "Automatic Generation of Interactive Nonlinear Video for Online Apparel Shopping Navigation".
Note that using interactive video for shopping navigation is a new research topic, and our solution is by no means perfect. Please refer to our failure cases and limitations before using this repo.
- [2022-10-27] Add example files in examples/.
- [2022-10-25] Update README to introduce our work.
- [2022-10-19] Create the project and upload the main code of this paper.
We present an automatic generation pipeline of interactive nonlinear video for online apparel shopping navigation. Our approach is inspired by Google’s "Messy Middle" theory, which suggests that people cycle between two mental tasks, exploration and evaluation, before purchasing. Given a set of apparel product presentation videos, our navigation UI organizes these videos for product exploration and automatically generates interactive videos for product evaluation. To support these automatic methods, we propose a video clustering similarity (CSIM) and a camera movement similarity (MSIM), as well as a comparative video generation algorithm for product recommendation, presentation, and comparison. To evaluate our pipeline’s effectiveness, we conducted several user studies. The results show that our pipeline helps users complete the consumption process more efficiently and makes it easier for them to understand and choose products.
Prior work has studied the behavioral logic of consumers and proposed a "messy middle" theory, which notes that consumers often wander between two states, exploration and evaluation, when shopping online. Consumers explore their options and expand their consideration sets; then, either sequentially or simultaneously, they evaluate the options and narrow down their choices. Existing online shopping methods require constantly switching pages to view and compare products, which reduces exploration and evaluation efficiency and increases the time customers need to make a decision. To shorten the time between product exploration and decision-making, we propose an automatic approach that generates nonlinear videos organized into two levels, coarse-level exploration and fine-level evaluation, to support online clothing shopping navigation. Our approach automatically generates interactive nonlinear videos for product presentation and comparison based on consumers’ interactions.
pip install -r requirements.txt
- [Attribute and Category] Put in libs/Fashion/checkpoints/, to predict the attributes and categories of apparel products in videos.
- [Unet] Put in libs/unet_segmentation/, to segment apparel products in videos.
- [Detail Classification] Put in libs/Detail/models/, to classify detail shots.
- [Landmark] Put in libs/Landmark/models/, to predict clothing landmarks in full shots.
- [MaskROI] Put in libs/maskRoi/weights/, to predict the ROI of the person in each frame.
- [AlphaPose] Put in libs/AlphaPose/models/, to detect the keypoints of people in frames.
- [ML Models] Put in model/, to obtain additional features such as the view and direction of the person in a frame.
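Before running the examples, it may help to confirm that all pretrained models are in place. The snippet below is a small illustrative sketch (not part of the repo) that only checks whether the directories listed above exist:

```python
# Illustrative sketch: verify that the pretrained model directories listed
# above exist before running the examples.
from pathlib import Path

REQUIRED_DIRS = [
    "libs/Fashion/checkpoints",   # attribute and category prediction
    "libs/unet_segmentation",     # U-Net apparel segmentation
    "libs/Detail/models",         # detail-shot classification
    "libs/Landmark/models",       # clothing landmark prediction
    "libs/maskRoi/weights",       # human ROI prediction
    "libs/AlphaPose/models",      # human keypoint detection
    "model",                      # additional ML models (view, direction, ...)
]

missing = [d for d in REQUIRED_DIRS if not Path(d).is_dir()]
if missing:
    print("Missing model directories:", ", ".join(missing))
else:
    print("All model directories found.")
```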
In the video association algorithm, we sample the input product video at fixed frame intervals.
To recommend similar products (from among many product videos), you can use this command:
python examples/example_recommend.py
It may take a long time to extract features and build the product graph. After calculating all features of all videos, we save them in a .pickle file. If there are many video nodes, the graph will also take a lot of space to save.
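As a rough illustration of this caching behaviour, the sketch below (hypothetical code, not the repo's actual API; `extract_fn` stands in for the pipeline's feature extraction) computes per-video features once and stores them in a .pickle file so later runs can reuse them:

```python
# Hypothetical sketch of the feature-caching step described above: compute
# per-video features once and persist them to a .pickle file so rebuilding
# the product graph can skip re-extraction.
import os
import pickle

def load_or_compute_features(video_paths, extract_fn, cache_path="features.pickle"):
    """extract_fn: callable mapping a video path to its feature representation."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    features = {path: extract_fn(path) for path in video_paths}
    with open(cache_path, "wb") as f:
        pickle.dump(features, f)
    return features

# Example usage with a dummy extractor (replace with the pipeline's own features):
feats = load_or_compute_features(["a.mp4", "b.mp4"], extract_fn=lambda p: [0.0, 1.0])
```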
In the shot association algorithm, we automatically attach detail shots to the full shot. When consumers evaluate the product, they can click the area of interest in the video to obtain more targeted information. The algorithm can be divided into video shot classification, detail shot classification, and keypoint detection.
To associate close-up shots with the full shot, you can use this command:
python examples/example_single_presentation.py
This example only generates the keypoint positions in the full shot and the classification results of the close-ups. These results are saved as a .yaml file. Through our player, this .yaml file can be played as an interactive video.
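For intuition, the sketch below shows one way such a result file could be written with PyYAML. The keys (`full_shot`, `keypoints`, `detail_shot`) and file names are purely illustrative; the actual schema is defined by example_single_presentation.py and our player:

```python
# Illustrative only: the exact .yaml schema is defined by the repo's player.
# This sketch shows the general idea of storing full-shot keypoint positions
# together with the close-up shots they link to.
import yaml  # pip install pyyaml

result = {
    "full_shot": "videos/product_full.mp4",
    "keypoints": [
        {"name": "collar", "x": 0.48, "y": 0.22, "detail_shot": "videos/collar_closeup.mp4"},
        {"name": "sleeve", "x": 0.31, "y": 0.55, "detail_shot": "videos/sleeve_closeup.mp4"},
    ],
}

with open("presentation.yaml", "w") as f:
    yaml.safe_dump(result, f, sort_keys=False)
```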
The comparison algorithm we propose is designed to help consumers compare multiple products of interest simultaneously. When comparing products, consumers can select the view (such as whole, medium, close-up) and direction (such as left, right, front, back) of the clothing in the video. Based on shots filtered by these tags, we train a neural network to generate a comparative video according to the optical flow features of camera movement. The comparative video generation algorithm can be divided into shot label extraction, optical flow feature calculation, and shot sequence generation.
To generate a comparative shot sequence, you can use these commands:
cd examples/example_compare/
jupyter notebook
We provide four example notebooks in this folder:
- [example_raft.ipynb]: An example of using RAFT to extract optical flow features.
- [example_optical_flow.ipynb]: An example of using OpenCV to extract optical flow features.
- [example_train_editor.ipynb]: An example of training our DCDP-CNN.
- [example_cut_editor.ipynb]: An example of using the DCDP-CNN to generate shot sequences.
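As a rough sketch of the optical flow feature step, the snippet below computes dense Farneback flow with OpenCV and summarizes each shot by its mean motion vector. This is an illustrative assumption, not necessarily the feature definition used in the notebooks above:

```python
# Minimal sketch: dense Farneback optical flow per sampled frame pair,
# summarized as a mean (dx, dy) camera-movement feature for the whole shot.
import cv2
import numpy as np

def shot_motion_feature(video_path, step=5):
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        raise ValueError(f"Cannot read {video_path}")
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    motions = []
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        idx += 1
        if idx % step:
            continue  # sample one frame every `step` frames
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        motions.append(flow.reshape(-1, 2).mean(axis=0))  # mean (dx, dy)
        prev_gray = gray
    cap.release()
    return np.mean(motions, axis=0) if motions else np.zeros(2)
```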
We also provide many tools and examples to help you use individual components of our method, such as AlphaPose, YOLO, clothing segmentation, and clothing landmark detection. In examples/ and tools/, we hope you will find useful code for your work.