English | 简体中文
NEWS: The branch bevdet_vendor-ros2 is a ROS 2 version based on the branch one. It organizes the TensorRT inference code of BEVDet into a ROS 2 package named bevdet_vendor. After compilation it produces the bevdet_vendor library, which users can call from their own ROS 2 nodes to run image inference. For an example of how to use the bevdet_vendor library, refer to the Autoware BEVDet inference node autoware_tensorrt_bevdet.
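For orientation, below is a minimal sketch of a custom ROS 2 node that could call the library. The bevdet_vendor API shown in comments (header name, class, and inference call) is assumed here purely for illustration; the actual interface should be taken from autoware_tensorrt_bevdet.

```cpp
// Minimal ROS 2 node sketch. The bevdet_vendor calls below are hypothetical
// placeholders; consult autoware_tensorrt_bevdet for the real interface.
#include <memory>
#include <rclcpp/rclcpp.hpp>
#include <sensor_msgs/msg/image.hpp>
// #include <bevdet/bevdet.h>  // hypothetical bevdet_vendor header

class BEVDetNode : public rclcpp::Node
{
public:
  BEVDetNode() : Node("bevdet_node")
  {
    // A real node would subscribe to one topic per camera; a single topic
    // is used here for brevity.
    sub_ = create_subscription<sensor_msgs::msg::Image>(
      "~/input/image", rclcpp::SensorDataQoS(),
      [this](sensor_msgs::msg::Image::ConstSharedPtr msg) { onImage(msg); });
    // bevdet_ = std::make_shared<BEVDet>(config_yaml, ...);  // hypothetical ctor
  }

private:
  void onImage(sensor_msgs::msg::Image::ConstSharedPtr msg)
  {
    // Gather the camera images, upload them to device memory, then call the
    // inference entry point exposed by the bevdet_vendor library, e.g.:
    // bevdet_->DoInfer(cam_data, boxes, inference_time);  // hypothetical call
    (void)msg;
  }

  rclcpp::Subscription<sensor_msgs::msg::Image>::SharedPtr sub_;
};

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<BEVDetNode>());
  rclcpp::shutdown();
  return 0;
}
```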
This project is a TensorRT implementation of BEVDet inference, written in C++. It can be tested on the nuScenes dataset and also ships a single test sample. BEVDet is a multi-camera 3D object detection model in bird's-eye view; for details, refer to BEVDet. The script for exporting the ONNX model is in this repository.
This project implements the following:
- TensorRT plugins: AlignBEV_plugin, Preprocess_plugin, BEVPool_plugin, GatherBEV_plugin (see the registration sketch after this list)
- Long-term model
- BEV-Depth model
- On an NVIDIA A4000, the BEVDet-r50-lt-depth model runs 6.24x faster with TRT FP16 than with PyTorch FP32
- On a Jetson AGX Orin, FP16 model inference takes about 27 ms, achieving real-time performance
- A dataloader for the nuScenes dataset that can be used for testing on the dataset
- Fine-tuning of the model to fix its sensitivity to input resize sampling, which degraded mAP and NDS
- An attempt at INT8 quantization
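Because the engine depends on these custom plugins, their creators must be visible to TensorRT's plugin registry before deserialization. Below is a minimal sketch of that loading step, assuming the plugins register themselves via REGISTER_TENSORRT_PLUGIN in their sources; the `model.engine` file name is illustrative.

```cpp
// Sketch: deserializing an engine that uses custom plugins (TensorRT 8.5).
// Assumes AlignBEV_plugin, Preprocess_plugin, BEVPool_plugin, and
// GatherBEV_plugin are registered at load time via REGISTER_TENSORRT_PLUGIN.
#include <NvInfer.h>
#include <fstream>
#include <iostream>
#include <memory>
#include <vector>

class Logger : public nvinfer1::ILogger
{
  void log(Severity severity, const char * msg) noexcept override
  {
    if (severity <= Severity::kWARNING) std::cerr << msg << std::endl;
  }
};

int main()
{
  Logger logger;
  std::ifstream file("model.engine", std::ios::binary);  // illustrative path
  std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                         std::istreambuf_iterator<char>());

  auto runtime = std::unique_ptr<nvinfer1::IRuntime>(
    nvinfer1::createInferRuntime(logger));
  // Deserialization fails with a "plugin not found" error if the custom
  // plugin creators were not registered with the global registry first.
  auto engine = std::unique_ptr<nvinfer1::ICudaEngine>(
    runtime->deserializeCudaEngine(blob.data(), blob.size()));
  return engine ? 0 : 1;
}
```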
The features of this project are as follows:
- A CUDA kernel that fuses resize, crop, and normalization for preprocessing
- Two interpolation methods in the preprocessing CUDA kernel: nearest-neighbor and bicubic (a simplified sketch follows this list)
- Alignment of adjacent-frame BEV features, implemented in C++ and CUDA
- Multi-threaded, multi-stream nvJPEG decoding
- Scale-NMS
- Removal of the preprocessing module from the BEV encoder
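As a rough illustration of the fused preprocessing idea, here is a simplified nearest-neighbor kernel. The memory layout, parameter names, and resize-then-crop convention are assumptions for the sketch, not the project's actual signature; the repository's kernel additionally implements bicubic interpolation.

```cuda
// Simplified sketch of a fused resize + crop + normalize kernel using
// nearest-neighbor interpolation. Reads an HWC uchar3 image, writes a
// normalized CHW float tensor in a single pass.
__global__ void preprocess_nearest_kernel(
    const uchar3 * __restrict__ src, float * __restrict__ dst,
    int src_w, int src_h,        // input image size
    int dst_w, int dst_h,        // network input size
    float scale,                 // resize factor applied before cropping
    int crop_x, int crop_y,      // top-left corner of the crop in the resized image
    float3 mean, float3 stdinv)  // per-channel normalization (1/std)
{
  int x = blockIdx.x * blockDim.x + threadIdx.x;
  int y = blockIdx.y * blockDim.y + threadIdx.y;
  if (x >= dst_w || y >= dst_h) return;

  // Map the output pixel back through crop and resize to a source pixel.
  int sx = min(src_w - 1, (int)((x + crop_x) / scale));
  int sy = min(src_h - 1, (int)((y + crop_y) / scale));
  uchar3 p = src[sy * src_w + sx];

  // Normalize and write the three channel planes.
  int area = dst_w * dst_h;
  dst[0 * area + y * dst_w + x] = ((float)p.x - mean.x) * stdinv.x;
  dst[1 * area + y * dst_w + x] = ((float)p.y - mean.y) * stdinv.y;
  dst[2 * area + y * dst_w + x] = ((float)p.z - mean.z) * stdinv.z;
}
```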
The following parts remain to be implemented:
- INT8 quantization
- Integrating the bevpool and adjacent-frame BEV feature alignment components into the engine as plugins
- Exception handling
All times are in milliseconds (ms), and nearest interpolation is used by default.
| Configuration | TRT-Engine | Postprocess | Mean Total |
|---|---|---|---|
| NVIDIA A4000 PyTorch FP32 | — | — | 86.24 |
| NVIDIA A4000 FP16 | 11.38 | 0.53 | 11.91 |
| Jetson AGX Orin FP16 | 26.60 | 0.99 | 27.60 |
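For reference, a common way to obtain such millisecond timings is to bracket the engine execution with CUDA events. The sketch below assumes a TensorRT 8.5 execution context and preallocated bindings set up elsewhere.

```cpp
#include <NvInfer.h>
#include <cuda_runtime.h>

// Returns the engine execution time in milliseconds, measured with CUDA
// events (the unit used in the table above). `context`, `buffers`, and
// `stream` are assumed to be created and populated by the caller.
float timed_enqueue(nvinfer1::IExecutionContext & context,
                    void * const * buffers, cudaStream_t stream)
{
  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);

  cudaEventRecord(start, stream);
  context.enqueueV2(buffers, stream, nullptr);  // TensorRT 8.5 async execution
  cudaEventRecord(stop, stream);
  cudaEventSynchronize(stop);

  float ms = 0.f;
  cudaEventElapsedTime(&ms, start, stop);
  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  return ms;
}
```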
The project provides a test sample and can also run inference on the nuScenes dataset. When testing on the nuScenes dataset, you need the data_infos folder provided by this project. The data folder should have the following structure:
```
└── data
    ├── nuscenes
        ├── data_infos
            ├── samples_infos
                ├── sample0000.yaml
                ├── sample0001.yaml
                ├── ...
            ├── samples_info.yaml
            ├── time_sequence.yaml
        ├── samples
        ├── sweeps
        ├── ...
```
The data_infos folder can be downloaded from Google Drive or Baidu Netdisk.
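Once downloaded, the per-sample yaml files can be read with yaml-cpp (already a dependency, see below). The sketch uses placeholder key names, since the actual schema should be inspected in the downloaded files.

```cpp
#include <yaml-cpp/yaml.h>
#include <iostream>
#include <string>

// Sketch of reading one sample description with yaml-cpp. The keys "cams"
// and "data_path" are hypothetical placeholders, not the real schema.
int main()
{
  YAML::Node sample = YAML::LoadFile(
    "data/nuscenes/data_infos/samples_infos/sample0000.yaml");
  for (const auto & cam : sample["cams"]) {                 // hypothetical key
    std::cout << cam.second["data_path"].as<std::string>()  // hypothetical key
              << std::endl;
  }
  return 0;
}
```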
For desktop or server:
- CUDA 11.8
- cuDNN 8.6.0
- TensorRT 8.5.2.2
- yaml-cpp
- Eigen3
- libjpeg
For Jetson AGX Orin:
- Jetpack 5.1.1
- CUDA 11.4.315
- cuDNN 8.6.0
- TensorRT 8.5.2.2
- yaml-cpp
- Eigen3
- libjpeg
Build the project and use the export tool to convert the ONNX file into a TRT engine:
```shell
mkdir build && cd build
cmake .. && make
./export model.onnx model.engine
```
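For context, the export step corresponds roughly to the standard TensorRT 8.5 ONNX build flow sketched below. This is an outline under assumed defaults (FP16 enabled, no error handling), not the repository's actual export tool.

```cpp
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <memory>

class Logger : public nvinfer1::ILogger
{
  void log(Severity severity, const char * msg) noexcept override
  {
    if (severity <= Severity::kWARNING) std::cerr << msg << std::endl;
  }
};

// Usage: ./export model.onnx model.engine
int main(int argc, char ** argv)
{
  if (argc < 3) return 1;
  Logger logger;

  // Parse the ONNX file into an explicit-batch network definition.
  auto builder = std::unique_ptr<nvinfer1::IBuilder>(
    nvinfer1::createInferBuilder(logger));
  auto network = std::unique_ptr<nvinfer1::INetworkDefinition>(
    builder->createNetworkV2(1U << static_cast<uint32_t>(
      nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH)));
  auto parser = std::unique_ptr<nvonnxparser::IParser>(
    nvonnxparser::createParser(*network, logger));
  parser->parseFromFile(
    argv[1], static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

  // Enable FP16 to match the timings reported above, then build and save.
  auto config = std::unique_ptr<nvinfer1::IBuilderConfig>(
    builder->createBuilderConfig());
  config->setFlag(nvinfer1::BuilderFlag::kFP16);

  auto serialized = std::unique_ptr<nvinfer1::IHostMemory>(
    builder->buildSerializedNetwork(*network, *config));
  std::ofstream out(argv[2], std::ios::binary);
  out.write(static_cast<const char *>(serialized->data()), serialized->size());
  return 0;
}
```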
Inference:
```shell
./bevdemo ../configure.yaml
```