| PREGO paper [CVPR 2024] | TI-PREGO paper [arXiv]
This repo hosts the official PyTorch implementations of the IEEE/CVF Computer Vision and Pattern Recognition (CVPR) '24 paper PREGO: online mistake detection in PRocedural EGOcentric videos and of the follow-up paper TI-PREGO: Chain of Thought and In-Context Learning for Online Mistake Detection in PRocedural EGOcentric Videos.
PREGO is the first online one-class classification model for mistake detection in procedural egocentric videos. It uses an online action recognition component to model current actions and a symbolic reasoning module to predict next actions, and it detects a mistake when the recognized current action does not match the expected one. We evaluate PREGO on two adapted datasets, Assembly101-O and Epic-tent-O, for online benchmarking of procedural mistake detection.
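The comparison described above can be illustrated with a minimal sketch. It is not the repository's implementation: the function name `is_mistake` and the step labels are hypothetical, and it assumes the reasoning module returns one or more candidate next steps.

```python
from typing import Iterable


def is_mistake(recognized_step: int, anticipated_steps: Iterable[int]) -> bool:
    """Flag a mistake when the recognized step is not among the anticipated ones.

    Hypothetical helper for illustration; the actual decision rule in the
    repository may differ.
    """
    return recognized_step not in set(anticipated_steps)


# Hypothetical labels: the reasoning module expects step 39 next.
anticipated = [39]
print(is_mistake(39, anticipated))  # recognized step matches the expectation
print(is_mistake(85, anticipated))  # recognized step deviates from the expectation
```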
[2024-12-01] Uploaded the recognition branch.
[2024-11-12] Uploaded the script for the prediction aggregation strategy described in [TI-PREGO].
[2024-11-12] Uploaded the TSN features for Assembly101-O and Epic-tent-O [GDrive].
[2024-11-04] Published the follow-up paper [TI-PREGO].
[2024-06-20] Presented PREGO at #CVPR2024.
[2024-06-16] Uploaded the anticipation branch.
The TSN features of the Assembly101-O and Epic-tent-O datasets can be downloaded here: [GDrive]. The folder follows the structure described in MiniROAD:
PREGO
|
|__________ Assembly101-O
|           |
|           |__________ rgb_anet_resnet50
|           |           |
|           |           |_________nusar-2021_action_both_9011-b06b_9011_user_id_2021-02-01_154253.npy
|           |           |_________...
|           |__________ rgb_as_flow
|           |           |
|           |           |_________nusar-2021_action_both_9011-b06b_9011_user_id_2021-02-01_154253.npy
|           |           |_________...
|           |__________ target_perframe
|                       |
|                       |_________nusar-2021_action_both_9011-b06b_9011_user_id_2021-02-01_154253.npy
|                       |_________...
|__________ Epic-tent-O
            |
            |__________ rgb_anet_resnet50
            |           |
            |           |_________annotations_1.npy
            |           |_________...
            |__________ rgb_as_flow
            |           |
            |           |_________annotations_1.npy
            |           |_________...
            |__________ target_perframe
                        |
                        |_________annotations_1.npy
                        |_________...
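After downloading, it can help to check that every video has a file in all three subfolders. The following is a small sketch that relies only on the folder names shown above; the function name `find_incomplete_videos` and the example path are hypothetical, and it only looks at file names, not at the array contents.

```python
from pathlib import Path

# Subfolder names taken from the layout above.
SUBFOLDERS = ("rgb_anet_resnet50", "rgb_as_flow", "target_perframe")


def find_incomplete_videos(dataset_root):
    """Return {video_name: [subfolders with no .npy file for that video]}."""
    root = Path(dataset_root)
    names_per_folder = {}
    for sub in SUBFOLDERS:
        folder = root / sub
        # A missing subfolder is treated as containing no files.
        names_per_folder[sub] = (
            {p.stem for p in folder.glob("*.npy")} if folder.is_dir() else set()
        )
    all_names = set().union(*names_per_folder.values())
    incomplete = {}
    for name in sorted(all_names):
        absent = [sub for sub in SUBFOLDERS if name not in names_per_folder[sub]]
        if absent:
            incomplete[name] = absent
    return incomplete


# Usage (the path is an assumption, adjust it to where you placed the folder):
# print(find_incomplete_videos("PREGO/Assembly101-O"))
```

An empty dictionary means that every video found in one subfolder is also present in the other two.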
To run our anticipation step with LLAMA, you must first be granted access to the models by Meta here.
Place them wherever you like, and remember to update the paths where necessary, for example in step_anticipation/scripts/anticipation.sh.
You can create either a conda or a virtualenv environment, as you prefer:
# conda
conda create -n prego python=3.10
conda activate prego
# virtualenv
python3.10 -m venv .venv
source .venv/bin/activate
Then, install the requirements
pip install -r requirements.txt
Install unsloth following the instructions here.
For more details regarding the Step Recognition branch, you can refer to the official implementation of MiniROAD here.
For example, to run the training on Assembly101-O, use the command
python step_recognition/main.py --config step_recognition/configs/miniroad_assembly101-O.yaml
This saves the checkpoints in the folder step_recognition/checkpoint/miniROAD/Assembly101-O.
You can then evaluate a checkpoint with the command below, which saves the frame-by-frame predictions as a JSON file in the folder output_miniROAD:
python step_recognition/main.py --config step_recognition/configs/miniroad_assembly101-O.yaml --eval <checkpoint_path>
The utils/aggregate.py script aggregates the predictions and the ground truth data and saves the result to a JSON file.
To run it, pass the JSON file created in the Step Recognition section as input:
python utils/aggregate.py <input_path> <output_path>
- <input_path>: Path to the input JSON file containing the data.
- <output_path>: Path to save the aggregated JSON file.
Example:
python utils/aggregate.py data/input.json data/output/aggregated_data.json
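To give an intuition of what aggregating frame-level output can look like, here is a sketch of one simple strategy: collapsing runs of identical consecutive labels into a single step. This is an assumption made for illustration, not a description of utils/aggregate.py, whose strategy is the one described in TI-PREGO and may differ. The function names and the assumed input layout (video name mapped to "pred" and "gt" lists) are hypothetical.

```python
from itertools import groupby


def collapse_runs(frame_labels):
    """Collapse consecutive duplicates, e.g. [3, 3, 3, 7, 7, 3] -> [3, 7, 3]."""
    return [label for label, _ in groupby(frame_labels)]


def aggregate(per_video):
    """Apply collapse_runs to the 'pred' and 'gt' lists of every video.

    Assumes per_video maps a video name to {"pred": [...], "gt": [...]}.
    """
    return {
        video: {
            "pred": collapse_runs(entry["pred"]),
            "gt": collapse_runs(entry["gt"]),
        }
        for video, entry in per_video.items()
    }


# Hypothetical frame-level labels for a single video.
frames = {"video_a": {"pred": [39, 39, 37, 37, 37, 74], "gt": [37, 37, 80, 80, 80, 80]}}
print(aggregate(frames))
```

Run-length collapsing keeps the order of the steps while discarding how long each one lasted, which is the form a step-level sequence model expects.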
The following steps prepare the data for the Step Anticipation branch.
Step Recognition predictions:
- place the predictions (after aggregation) of the Step Recognizer in the step_anticipation/data/predictions folder
- the file should have the following structure:
{
"nusar-2021_action_both_9044-a08_9044_user_id_2021-02-05_154403": {
"pred": [
39,
37,
74,
39,
37
],
"gt": [
37,
80,
39,
29,
85
]
},
...
}
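Before launching the anticipation step, the predictions file can be checked against the structure shown above. The snippet below is a minimal sketch using only the standard library; `validate_predictions` is a hypothetical helper, not part of the repository, and it checks only what the example shows: a mapping from video names to "pred" and "gt" lists of integers.

```python
import json


def _is_int_list(value):
    """True if value is a list of integers (booleans are rejected)."""
    return isinstance(value, list) and all(
        isinstance(x, int) and not isinstance(x, bool) for x in value
    )


def validate_predictions(data):
    """Raise ValueError if data does not follow the structure shown above."""
    if not isinstance(data, dict):
        raise ValueError("top level must map video names to entries")
    for video, entry in data.items():
        if not isinstance(entry, dict):
            raise ValueError(f"{video}: entry must be an object")
        for key in ("pred", "gt"):
            if not _is_int_list(entry.get(key)):
                raise ValueError(f"{video}: '{key}' must be a list of integers")
    return data


# Small inline example with a hypothetical video name.
example = json.loads('{"video_a": {"pred": [39, 37, 74], "gt": [37, 80, 39]}}')
validate_predictions(example)
```

To check a real file, load it with `json.load` and pass the result to `validate_predictions`.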
Context prompt:
- step_anticipation/data/context_prompt/assembly_context_prompt_train.json and step_anticipation/data/context_prompt/epictents_context_prompt_train.json contain the context to be used for the In-context learning prompt.
- step_anticipation/data/context_prompt/context_prompt.json contains the strings to fill the context prompt.
The following parameters can be set in the step_anticipation/scripts/anticipation.sh script.
- ckpt_dir=/path/to//llama/llama-2-7b
- tokenizer_path=/path/to/tokenizer/llama/tokenizer.model
- max_seq_len=2048: Maximum sequence length for input text
- max_batch_size: Maximum batch size for generating sequences
- temperature: Temperature value for controlling randomness in generation
- max_gen_len: Maximum length of the generated text sequence.
- num_samples: How many generations per each input context
- use_gt: Select if gt or predictions from Step Recognizer are used as input context
- dataset: Select the dataset to use. ['assembly', 'epictent']
- type_prompt: Select which type of context to be passed. ['num', 'alpha', 'emoji']
- toy_class_context: For the assembly dataset only. If True, the input context has all the examples from the same class of toys
- recognition_model: If not use_gt, select which Step Recognizer predictions to use. ['miniROAD', 'OadTR']
- prompt_context: Select how the prompt context is structured. ['default', 'unreferenced','elaborate','no-context']
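Since several parameters accept only a fixed set of values, a quick sanity check before a long run can catch typos. The sketch below encodes the choices listed above; `check_options` and the dictionary-based interface are hypothetical and not part of the repository, which reads these values from the shell script.

```python
# Allowed values copied from the parameter list above.
ALLOWED = {
    "dataset": {"assembly", "epictent"},
    "type_prompt": {"num", "alpha", "emoji"},
    "recognition_model": {"miniROAD", "OadTR"},
    "prompt_context": {"default", "unreferenced", "elaborate", "no-context"},
}


def check_options(options):
    """Return a list of problems found in options; an empty list means OK."""
    problems = []
    for name, allowed in ALLOWED.items():
        # recognition_model is only relevant when use_gt is not set.
        if name == "recognition_model" and options.get("use_gt"):
            continue
        if name in options and options[name] not in allowed:
            problems.append(f"{name}={options[name]!r} not in {sorted(allowed)}")
    if options.get("toy_class_context") and options.get("dataset") != "assembly":
        problems.append("toy_class_context applies to the assembly dataset only")
    return problems


# Hypothetical configuration, for illustration only.
print(check_options({"dataset": "assembly", "type_prompt": "emoji", "use_gt": True}))
```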
cd step_anticipation
./scripts/anticipation.sh
If you find our code or paper to be helpful, please consider citing:
@InProceedings{Flaborea_2024_CVPR,
author = {Flaborea, Alessandro and di Melendugno, Guido Maria D'Amely and Plini, Leonardo and Scofano, Luca and De Matteis, Edoardo and Furnari, Antonino and Farinella, Giovanni Maria and Galasso, Fabio},
title = {PREGO: Online Mistake Detection in PRocedural EGOcentric Videos},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {18483-18492}
}
@misc{plini2024tipregochainthoughtincontext,
title={TI-PREGO: Chain of Thought and In-Context Learning for Online Mistake Detection in PRocedural EGOcentric Videos},
author={Leonardo Plini and Luca Scofano and Edoardo De Matteis and Guido Maria D'Amely di Melendugno and Alessandro Flaborea and Andrea Sanchietti and Giovanni Maria Farinella and Fabio Galasso and Antonino Furnari},
year={2024},
eprint={2411.02570},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2411.02570},
}