SparseEnd2End: Obstacle 3D Detection and Tracking Architecture Based on Vision Transformer

👋 Hi, I’m ThomasVonWu. I'd like to introduce a simple and practical TensorRT-based deployment repository that uses an end-to-end perception paradigm with a sparse transformer to sense 3D obstacles. The repository has no complex dependencies for Training | Inference | Deployment (you don't need to install MMDetection3d, mmcv, mmcv-full, mmdeploy, etc.), so it is easy to install on a local workstation or a supercomputing GPU cluster. It will also provide x86 (NVIDIA RTX Series GPU) | ARM (NVIDIA ORIN) deployment solutions, so you can deploy your e2e model onboard with this repository.
👀 I guess you are interested in:

how to define a PyTorch custom operator, DeformableAttentionAggr, and register the related ONNX node.
how to build the custom operator plugin DeformableAttentionAggr for a TensorRT engine with Makefile or CMake.
how to convert an ONNX file containing the custom operator to a TensorRT engine, so that the operator becomes part of the whole engine.
how to validate the consistency of inference results: PyTorch vs. ONNX Runtime vs. TensorRT.
how to convert a PyTorch model with a temporal fusion transformer head to ONNX.
how to accurately locate the TensorRT layer where overflow occurs when using FP16 quantization for model parameters.
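As an illustration of the consistency check mentioned in the list above, here is a minimal sketch in plain Python. It is not the repository's validation tool: the function names, the tolerances (`atol`, `min_cosine`) and the sample values are all assumptions made for this example. It compares two flattened output tensors by maximum absolute error and cosine similarity.

```python
import math
from typing import Dict, Sequence


def compare_outputs(reference: Sequence[float], candidate: Sequence[float]) -> Dict[str, float]:
    """Return the max absolute error and cosine similarity of two flattened outputs."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    max_abs_err = max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)
    dot = sum(r * c for r, c in zip(reference, candidate))
    norm_ref = math.sqrt(sum(r * r for r in reference))
    norm_cand = math.sqrt(sum(c * c for c in candidate))
    if norm_ref == 0.0 and norm_cand == 0.0:
        cosine = 1.0  # two all-zero outputs are treated as identical
    elif norm_ref == 0.0 or norm_cand == 0.0:
        cosine = 0.0  # only one output is all-zero
    else:
        cosine = dot / (norm_ref * norm_cand)
    return {"max_abs_err": max_abs_err, "cosine": cosine}


def is_consistent(reference: Sequence[float], candidate: Sequence[float],
                  atol: float = 1e-3, min_cosine: float = 0.999) -> bool:
    """Both thresholds are assumed defaults, not values taken from the repository."""
    stats = compare_outputs(reference, candidate)
    return stats["max_abs_err"] <= atol and stats["cosine"] >= min_cosine


# Hypothetical flattened outputs standing in for the three backends.
pytorch_out = [0.10, -0.25, 0.70, 1.50]
onnxrt_out = [0.10, -0.25, 0.70, 1.50]
tensorrt_out = [0.1001, -0.2499, 0.7002, 1.4998]

print("PyTorch vs. ONNX Runtime:", compare_outputs(pytorch_out, onnxrt_out))
print("PyTorch vs. TensorRT:", compare_outputs(pytorch_out, tensorrt_out))
```

Checking both metrics is a deliberate choice: maximum absolute error catches a single badly wrong element, while cosine similarity catches a globally scaled or shifted output. Suitable tolerances depend on the precision mode being compared.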

Algorithm Architecture

The algorithm framework of Sparse4D follows an encoder-decoder structure. The inputs consist mainly of three components: multi-view images, newly initialized instances, and instances propagated from the previous frame. The outputs are the refined instances (3D anchor boxes and their corresponding features), which serve as the perception results for the current frame. Additionally, a subset of these refined instances is selected and propagated to the next frame.
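The per-frame data flow described above can be sketched in a few lines of Python. This is only a toy illustration, not the Sparse4D implementation: the `Instance` layout, the `step` function, the confidence-based selection rule, and `toy_refine` (which stands in for the real decoder and ignores the images entirely) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Instance:
    anchor: List[float]      # 3D anchor box parameters (layout is an assumption)
    feature: List[float]     # instance feature vector
    confidence: float = 0.0  # score assigned during refinement


RefineFn = Callable[[List[Instance]], List[Instance]]


def step(new_instances: List[Instance], propagated: List[Instance],
         refine_fn: RefineFn, num_propagate: int) -> Tuple[List[Instance], List[Instance]]:
    """Process one frame: refine all instances, then pick a subset to carry forward."""
    candidates = propagated + new_instances
    refined = refine_fn(candidates)  # perception result for the current frame
    ranked = sorted(refined, key=lambda inst: inst.confidence, reverse=True)
    return refined, ranked[:num_propagate]  # the subset is propagated to the next frame


def toy_refine(instances: List[Instance]) -> List[Instance]:
    """Stand-in for the decoder: the confidence is just the clipped feature sum."""
    return [Instance(i.anchor, i.feature, min(1.0, max(0.0, sum(i.feature))))
            for i in instances]


# Frame 0 has no history, so only newly initialized instances are used.
frame0_new = [Instance([0.0, 0.0, 0.0], [0.2]),
              Instance([1.0, 0.0, 0.0], [0.9]),
              Instance([2.0, 0.0, 0.0], [0.5])]
outputs0, carried0 = step(frame0_new, [], toy_refine, num_propagate=2)

# Frame 1 mixes the instances carried over from frame 0 with one new instance.
frame1_new = [Instance([3.0, 0.0, 0.0], [0.7])]
outputs1, carried1 = step(frame1_new, carried0, toy_refine, num_propagate=2)

print([inst.anchor for inst in carried1])
```

The sketch only shows how instances flow between frames; feature extraction from the multi-view images and the actual refinement layers are omitted.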

nuScenes Benchmark

Results on Validation Split: `ThomasVonWu/SparseEnd2End` vs. `HorizonRobotics/Sparse4D`

These training reproduction experiments were conducted using 4 NVIDIA H20 GPUs with 96 GB memory.

model	repository	backbone	pretrain	img size	Epoch	Training	FPS	NDS	mAP	AMOTA	AMOTP	IDS	config	ckpt	log	GPU
Sparse4Dv3	HorizonRobotics/Sparse4D	Resnet50	ImageNet	256x704	100	22H	19.8	0.5637	0.4646	0.477	1.167	456	-	-	-	RTX3090
Sparse4Dv3	ThomasVonWu/SparseEnd2End	Resnet50	ImageNet	256x704	150	77.5H	-	0.5623	0.4645	0.457	1.196	541	cfg	ckpt	log	H20

SparseEnd2End Deployment Experiments Results

Model	ImgSize	Backbone	Framework	Precision	mAP	NDS	FPS	GPU	config	ckpt	onnx	engine
Sparse4Dv3	256x704	Resnet50	PyTorch	FP32	0.4645	0.5623	15.8	RTX 3090	config	ckpt	--	--
Sparse4Dv3	256x704	Resnet50	TensorRT	FP32	wait	wait	wait	RTX 3090	config	ckpt	onnx	engine
Sparse4Dv3	256x704	Resnet50	TensorRT	FP16	wait	wait	wait	RTX 3090	config	ckpt	wait	wait
Sparse4Dv3	256x704	Resnet50	TensorRT	INT8+FP16	wait	wait	wait	RTX 3090	config	ckpt	wait	wait
Sparse4Dv3	256x704	Resnet50	TensorRT	FP32	wait	wait	wait	NVIDIA ORIN	config	ckpt	wait	wait
Sparse4Dv3	256x704	Resnet50	TensorRT	FP16	wait	wait	wait	NVIDIA ORIN	config	ckpt	wait	wait
Sparse4Dv3	256x704	Resnet50	TensorRT	INT8+FP16	wait	wait	wait	NVIDIA ORIN	config	ckpt	wait	wait
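The FP16 rows in the table above are where the overflow question from the interest list arises. One simple way to narrow down the offending layer is to dump per-layer outputs from an FP32 run and flag the first layer whose values fall outside the finite FP16 range (largest finite value 65504). The sketch below assumes such a dump already exists as a name-to-values mapping in execution order; the layer names and values are hypothetical, and how the dump is produced is not covered here.

```python
import math
from typing import Dict, Optional, Sequence

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def exceeds_fp16(values: Sequence[float]) -> bool:
    """True if any value is non-finite or too large in magnitude for FP16."""
    return any((not math.isfinite(v)) or abs(v) > FP16_MAX for v in values)


def first_overflowing_layer(layer_outputs: Dict[str, Sequence[float]]) -> Optional[str]:
    """Return the first layer (in insertion order) whose outputs exceed the FP16 range."""
    for name, values in layer_outputs.items():
        if exceeds_fp16(values):
            return name
    return None


# Hypothetical FP32 per-layer dump, listed in execution order.
fp32_dump = {
    "backbone.conv1": [0.5, -3.2, 12.0],
    "head.attention.matmul": [1200.0, 70000.0, -15.0],  # 70000 > 65504
    "head.refine.linear": [float("inf"), 1.0],
}

print(first_overflowing_layer(fp32_dump))
```

Scanning in execution order matters because an overflow usually contaminates every layer after it, so only the first hit is informative.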

News

24 Sep, 2024: I released the complete deployment solution for SparseEnd2End.
25 Aug, 2024: I released the repository SparseEnd2End. The complete deployment solution will be released as soon as possible. Please stay tuned!

Tasklist

Introduction

SparseEnd2End is a Sparse-Centric paradigm for end-to-end autonomous driving perception.

Quick Start

Citation

If you find SparseEnd2End useful in your research or applications, please consider giving me a star 🌟

🏷 ChangeLog

08/25/2024: [v1.0.0] This repository now supports Training | Inference on NuscenesDataset. It includes data dumping to JSON, Training | Inference log caching, TensorBoard hooking, and so on.
11/14/2024: [v2.0.0] Reproduced the training results of HorizonRobotics/Sparse4D with FP32.
