Human Pose Estimation for multi-view Human Action Recognition
The dependencies are listed in requirements.txt. You can install them with the following command:
pip install -r requirements.txt
This project uses the NTU RGB+D dataset, which is divided into 3 parts:
- RGB videos
- Depth videos
- Skeleton data
We use the skeleton data for this project. The skeleton data comes as .skeleton files, each containing the 3D coordinates of 25 body joints of a person for every frame of a video. The skeletons are extracted from the .skeleton files and stored as .npy files; the code for this can be found here: Skeleton Data Extraction.
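The linked Skeleton Data Extraction code is the reference implementation. As a rough illustration only, the sketch below parses one .skeleton file into a (T, 25, 3) array, assuming the standard NTU RGB+D text layout (a frame count, then per frame a body count, per body an info line, a joint count, and one line per joint whose first three values are x, y, z) and keeping only the first tracked body:

```python
import numpy as np

def skeleton_to_npy(path):
    """Parse an NTU RGB+D .skeleton file into a (T, 25, 3) array (first body only).

    A minimal sketch assuming the standard NTU text layout; the repo's linked
    Skeleton Data Extraction code is the reference implementation.
    """
    with open(path) as f:
        frame_count = int(f.readline())
        frames = []
        for _ in range(frame_count):
            body_count = int(f.readline())
            joints = np.zeros((25, 3), dtype=np.float32)
            for b in range(body_count):
                f.readline()                     # body info line (tracking id, flags, ...)
                joint_count = int(f.readline())  # normally 25
                for j in range(joint_count):
                    values = f.readline().split()
                    if b == 0:                   # keep only the first tracked body
                        joints[j] = [float(v) for v in values[:3]]  # x, y, z camera coordinates
            frames.append(joints)
    return np.stack(frames)                      # shape (T, 25, 3)

# Example:
# np.save("dataset/S001C001P001R001A001.skeleton.npy",
#         skeleton_to_npy("S001C001P001R001A001.skeleton"))
```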
Create a folder called dataset in the root folder of the project. The folder structure should be as follows:
HPE-for-HAR
├── dataset
│   ├── S001C001P001R001A001.skeleton.npy
│   ├── ...
├── remaining files
Use the code from the link above to extract the skeleton data from the .skeleton files into the dataset folder.
The extracted skeleton data is stored as NumPy arrays of shape (T, 25, 3), where T is the number of frames in the video and each frame holds the 3D coordinates of 25 joints. Storing the data as .npy files reduces loading time compared to re-parsing the raw .skeleton files.
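As a quick sanity check, a stored sample can be loaded and inspected like this (the file name is just the example from the folder structure above):

```python
import numpy as np

# Load one extracted sample from the dataset folder.
sample = np.load("dataset/S001C001P001R001A001.skeleton.npy")
print(sample.shape)  # (T, 25, 3): T frames, 25 joints, (x, y, z) per joint
```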
- The skeleton data can be augmented by occluding joints of the skeleton (a sketch is given at the end of this section). The code for this can be found here: Skeleton Data Augmentation.
- The training code uses the PyTorch framework.
- To start training, run the following command:
python main.py --dataset ./dataset
- Other arguments can be found in the main.py file.
- The trained models are stored in the ./output folder.
- The hyperparameters of the model can be changed in the ./config/model.json file.
- To augment the data, pass the --occlude argument to the training script.
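For illustration, the snippet below sketches one way joint occlusion could be implemented, by zeroing out a random subset of joints. The function name and parameters are placeholders; which joints are hidden, how many, and whether this varies per frame is defined by the linked Skeleton Data Augmentation code, not by this example.

```python
import numpy as np

def occlude_joints(skeleton, num_occluded=5, rng=None):
    """Zero out a random subset of joints in every frame of a (T, 25, 3) skeleton.

    A minimal sketch of occlusion-based augmentation; the repo's Skeleton Data
    Augmentation code defines the actual strategy used during training.
    """
    rng = rng if rng is not None else np.random.default_rng()
    occluded = skeleton.copy()
    joint_ids = rng.choice(skeleton.shape[1], size=num_occluded, replace=False)
    occluded[:, joint_ids, :] = 0.0  # masked joints contribute no coordinates
    return occluded
```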