autowarefoundation/bevdet_vendor
BEVDet implemented by TensorRT, C++

English | 简体中文

NEWS: The bevdet_vendor-ros2 branch is a ROS 2 version based on the one branch. It organizes the TensorRT inference code of BEVDet into a ROS 2 package called bevdet_vendor. After compilation, it produces the bevdet_vendor library, and users can create their own ROS 2 node that calls the bevdet_vendor library for image inference. For an example of how to use the library, refer to the Autoware BEVDet inference node autoware_tensorrt_bevdet.

This project is a TensorRT implementation of BEVDet inference, written in C++. It can be tested on the nuScenes dataset, and a single test sample is also provided. BEVDet is a multi-camera 3D object detection model in bird's-eye view. For more details about BEVDet, please refer to BEVDet. The script for exporting the ONNX model is in this repository.


This project implements the following:

  • TensorRT plugins: AlignBEV_plugin, Preprocess_plugin, BEVPool_plugin, GatherBEV_plugin
  • Long-term model
  • BEV-Depth model
  • On the NVIDIA A4000, the BEVDet-r50-lt-depth model runs 6.24x faster with TRT FP16 than with PyTorch FP32
  • On the Jetson AGX Orin, the FP16 model inference time is around 27 ms, achieving real-time performance
  • A data loader for the nuScenes dataset, which can be used to test on the dataset
  • Fine-tuning of the model to address its sensitivity to the input resize sampling method, which otherwise degrades mAP and NDS
  • An attempt at INT8 quantization
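The long-term model fuses BEV features from previous frames, which requires warping the previous frame's BEV grid into the current ego frame (the AlignBEV step). A minimal CPU sketch of such an alignment, assuming a 2D rigid transform in grid cells and bilinear sampling (the `Pose2D` struct and function name are illustrative, not the repository's API):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Rigid transform mapping current-frame BEV cells into the previous frame's
// grid, expressed in cell units (illustrative assumption).
struct Pose2D { float cos_t, sin_t, tx, ty; };

// Warp a [C x H x W] BEV feature map from the previous ego frame into the
// current one using bilinear sampling; out-of-grid cells stay zero.
std::vector<float> align_bev(const std::vector<float>& prev, int C, int H, int W,
                             const Pose2D& cur_from_prev) {
    std::vector<float> out(prev.size(), 0.0f);
    for (int y = 0; y < H; ++y) {
        for (int x = 0; x < W; ++x) {
            // Map the current cell back into the previous frame's grid.
            float px =  cur_from_prev.cos_t * x + cur_from_prev.sin_t * y + cur_from_prev.tx;
            float py = -cur_from_prev.sin_t * x + cur_from_prev.cos_t * y + cur_from_prev.ty;
            int x0 = (int)std::floor(px), y0 = (int)std::floor(py);
            float fx = px - x0, fy = py - y0;
            if (x0 < 0 || y0 < 0 || x0 + 1 >= W || y0 + 1 >= H) continue;
            for (int c = 0; c < C; ++c) {
                const float* p = &prev[(std::size_t)c * H * W];
                float v = (1 - fx) * (1 - fy) * p[y0 * W + x0] +
                          fx * (1 - fy) * p[y0 * W + x0 + 1] +
                          (1 - fx) * fy * p[(y0 + 1) * W + x0] +
                          fx * fy * p[(y0 + 1) * W + x0 + 1];
                out[(std::size_t)c * H * W + (std::size_t)y * W + x] = v;
            }
        }
    }
    return out;
}
```

In the actual project this runs as a CUDA kernel (one thread per output cell), which is what makes per-frame alignment cheap enough for real-time use.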

The features of this project are as follows:

  • A CUDA kernel that fuses resize, crop, and normalization for preprocessing
  • The preprocessing CUDA kernel supports two interpolation methods: nearest-neighbor and bicubic
  • Alignment of adjacent-frame BEV features, implemented in C++ and CUDA
  • Multi-threaded, multi-stream nvJPEG decoding
  • Scale-NMS
  • Removal of the preprocessing module in the BEV encoder
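The fused preprocessing step above can be sketched on the CPU as a single pass over the output: each output pixel looks up its nearest-neighbor source pixel through the combined resize+crop mapping and is normalized in place. The mean/std values and CHW float layout are assumptions for illustration, not the repository's exact configuration:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// CPU reference of the fused preprocessing kernel: nearest-neighbor resize,
// crop, and per-channel normalization in one pass (no intermediate buffers).
// Input is HWC uint8 RGB; output is CHW float.
std::vector<float> preprocess(const std::vector<uint8_t>& img,
                              int src_h, int src_w,
                              int dst_h, int dst_w,        // virtual resized size
                              int crop_top, int crop_left, // crop origin in resized image
                              int out_h, int out_w,
                              const float mean[3], const float stdv[3]) {
    std::vector<float> out((std::size_t)3 * out_h * out_w);
    float sy = (float)src_h / dst_h, sx = (float)src_w / dst_w;
    for (int y = 0; y < out_h; ++y) {
        for (int x = 0; x < out_w; ++x) {
            // Position in the (virtual) resized image, then back to the source.
            int ry = y + crop_top, rx = x + crop_left;
            int srcy = std::min((int)(ry * sy), src_h - 1);
            int srcx = std::min((int)(rx * sx), src_w - 1);
            for (int c = 0; c < 3; ++c) {
                uint8_t v = img[((std::size_t)srcy * src_w + srcx) * 3 + c];
                out[((std::size_t)c * out_h + y) * out_w + x] = (v - mean[c]) / stdv[c];
            }
        }
    }
    return out;
}
```

Fusing the three steps avoids writing the resized and cropped intermediates to memory, which is the main win when the same logic runs as one CUDA kernel.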

The following parts need to be implemented:

  • INT8 quantization
  • Integrate the bevpool and adjacent frame BEV feature alignment components into the engine as plugins
  • Exception handling

Results && Speed

Inference Speed

All time units are in milliseconds (ms), and Nearest interpolation is used by default.

| Device          | Engine       | TRT-Engine | Postprocess | Mean total |
|-----------------|--------------|------------|-------------|------------|
| NVIDIA A4000    | PyTorch FP32 | –          | –           | 86.24      |
| NVIDIA A4000    | FP16         | 11.38      | 0.53        | 11.91      |
| Jetson AGX Orin | FP16         | 26.60      | 0.99        | 27.60      |

DataSet

The project provides a test sample and also supports inference on the nuScenes dataset. When testing on nuScenes, you need the data_infos folder provided by this project. The data folder should have the following structure:

└── data
    ├── nuscenes
        ├── data_infos
            ├── samples_infos
                ├── sample0000.yaml
                ├── sample0001.yaml
                ├── ...
            ├── samples_info.yaml
            ├── time_sequence.yaml
        ├── samples
        ├── sweeps
        ├── ...

The data_infos folder can be downloaded from Google Drive or Baidu Netdisk.

Environment

For desktop or server:

  • CUDA 11.8
  • cuDNN 8.6.0
  • TensorRT 8.5.2.2
  • yaml-cpp
  • Eigen3
  • libjpeg

For Jetson AGX Orin:

  • Jetpack 5.1.1
  • CUDA 11.4.315
  • cuDNN 8.6.0
  • TensorRT 8.5.2.2
  • yaml-cpp
  • Eigen3
  • libjpeg

Compile && Run

Build the project, then use the export tool to create the TensorRT engine from the ONNX file:

```shell
mkdir build && cd build
cmake .. && make
./export model.onnx model.engine
```

Inference

```shell
./bevdemo ../configure.yaml
```
