- A model that predicts gestures from speech
- This repository is based on text2gesture
- original paper
See "Download raw data" in "Speech_driven_gesture_generation_with_autoencoder" repository
See "Split dataset" in "Speech_driven_gesture_generation_with_autoencoder"
python create_vector.py DATA_DIR
- The dataset is created by splitting both speech and motion into blocks of 64 frames each
- Shape
- Speech: (number of blocks, 26, 64)
- Motion: (number of blocks, 192, 64)
- The mean and standard deviation parameters obtained when standardizing the training data are located in ./norm/.
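The blocking and standardization described above can be sketched as follows. This is a minimal illustration, not the code in create_vector.py: the input layout (frames, features), the choice to drop trailing frames that do not fill a block, and the per-feature statistics are all assumptions, and the function names split_into_blocks and standardize are hypothetical.

```python
import numpy as np

FRAMES_PER_BLOCK = 64  # block length stated in this README


def split_into_blocks(sequence, frames_per_block=FRAMES_PER_BLOCK):
    """Turn a (num_frames, num_features) array into
    (num_blocks, num_features, frames_per_block).

    Assumption: trailing frames that do not fill a whole block are dropped.
    """
    sequence = np.asarray(sequence)
    num_frames, num_features = sequence.shape
    num_blocks = num_frames // frames_per_block
    trimmed = sequence[: num_blocks * frames_per_block]
    blocks = trimmed.reshape(num_blocks, frames_per_block, num_features)
    # Move the feature axis before the time axis.
    return blocks.transpose(0, 2, 1)


def standardize(blocks, eps=1e-8):
    """Standardize each feature over all blocks and frames (an assumption).

    Returns the standardized data together with the mean and standard
    deviation, which are the parameters needed later to undo the scaling.
    """
    mean = blocks.mean(axis=(0, 2), keepdims=True)
    std = blocks.std(axis=(0, 2), keepdims=True)
    return (blocks - mean) / (std + eps), mean, std


# Random stand-ins for real features: 200 frames of speech and motion.
rng = np.random.default_rng(0)
speech_blocks = split_into_blocks(rng.normal(size=(200, 26)))
motion_blocks = split_into_blocks(rng.normal(size=(200, 192)))
speech_norm, speech_mean, speech_std = standardize(speech_blocks)

print(speech_blocks.shape)  # (3, 26, 64)
print(motion_blocks.shape)  # (3, 192, 64)
```

With 200 input frames, only three full blocks of 64 frames fit, so the last 8 frames are discarded in this sketch.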
python train.py [--batch_size] [--epochs] [--lr] [--weight_decay] [--embedding_dimension]
[--outdir_path] [--device] [--gpu_num] [--speech_path] [--pose_path] [--generator]
[--gan] [--discriminator] [--lambda_d]
- See "Usage" in "text2gesture" for details.
python predict.py [--modelpath] [--inputpath] [--outpath]
- --modelpath specifies the folder where the generator model is located
- The model is output by train.py and is located in ./out/datetime/generator_datetime_weights.pth
python reshape-predict.py [--denorm] [--denormpath] [--datatype] [--npypath] [--outpath]
- To undo the normalization of the data, set --denorm to 1. In this case, --denormpath and --datatype should also be set (--datatype defaults to train). --denormpath and --datatype specify the directory where the mean and standard deviation parameters obtained when standardizing the training data are located (same as the /norm/ output path in chapter 3).
- --npypath specifies the folder where the test data is located
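Undoing the normalization amounts to inverting the standardization with the stored mean and standard deviation. The sketch below shows only that arithmetic on a toy array; it does not reproduce reshape-predict.py, and the function name denormalize, the array shapes, and the parameter values are hypothetical. How the script actually loads the parameters from the directory given by --denormpath is not shown here.

```python
import numpy as np


def denormalize(normalized, mean, std):
    """Invert standardization: x = x_norm * std + mean."""
    return np.asarray(normalized) * np.asarray(std) + np.asarray(mean)


# Toy parameters for two features, shaped to broadcast over
# (num_blocks, num_features, num_frames).
mean = np.array([1.0, -2.0]).reshape(1, 2, 1)
std = np.array([0.5, 4.0]).reshape(1, 2, 1)

original = np.array([[[1.5, 0.5], [2.0, -6.0]]])  # shape (1, 2, 2)
normalized = (original - mean) / std
restored = denormalize(normalized, mean, std)

print(np.allclose(restored, original))  # True
```

The parameters must be the ones computed on the training data. Statistics taken from any other split will not reproduce the original scale.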