[3DV 2022] Visual Localization via Few-Shot Scene Region Classification

*Siyan Dong, *Shuzhe Wang, Yixin Zhuang, Juho Kannala, Marc Pollefeys, Baoquan Chen

* Equal Contribution | Video | Poster

In this paper, we propose a scene region classification approach that achieves fast and effective scene memorization from few-shot images for scene-coordinate-based visual localization. Our insight is to leverage a) a pre-learned feature extractor, b) a scene region classifier, and c) a meta-learning strategy to accelerate training while mitigating overfitting. We evaluate our method on both indoor and outdoor benchmarks. The experiments validate the effectiveness of our method in the few-shot setting, and the training time is reduced to only a few minutes.

The training pipeline: a hierarchical partition tree is built to divide the scene into regions, and then a neural network is trained to map input image pixels to region labels. The network is designed to leverage both scene-agnostic priors (i.e., the feature extractor SuperPoint) and scene-specific memorization (i.e., the classifier, which consists of hierarchical classification networks).
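The partition step can be illustrated with a minimal sketch. It assumes the tree is built by recursively clustering 3D scene points with k-means, which this README does not confirm; `kmeans`, `hierarchical_partition`, `leaf_ids`, and the parameters `branching` and `depth` are hypothetical names for illustration and are not the interface of partition.py.

```python
import numpy as np

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns one integer cluster id per point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Move each center to the mean of its members (unchanged if empty).
        for c in range(k):
            members = points[assign == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return assign

def hierarchical_partition(points, branching=4, depth=3):
    """Label each 3D point with one branch index per tree level.

    Returns an (N, depth) integer array; row i is the path from the
    root to the leaf region that contains point i.
    """
    labels = np.zeros((len(points), depth), dtype=np.int64)

    def split(indices, level):
        if level == depth or len(indices) == 0:
            return
        if len(indices) < branching:
            # Too few points to cluster: give each point its own branch.
            assign = np.arange(len(indices))
        else:
            assign = kmeans(points[indices], branching, seed=level)
        labels[indices, level] = assign
        for c in range(branching):
            split(indices[assign == c], level + 1)

    split(np.arange(len(points)), 0)
    return labels

def leaf_ids(labels, branching):
    """Flatten a per-level path into a single leaf region index."""
    depth = labels.shape[1]
    weights = branching ** np.arange(depth - 1, -1, -1)
    return labels @ weights
```

With `branching=4` and `depth=3` this toy tree has at most 4 × 4 × 4 = 64 leaf regions. A hierarchical classifier would then predict one branch index per level rather than a single flat label.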

The camera pose estimation pipeline: given a query image, the trained network infers correspondences between image pixels and scene regions. Since each scene region corresponds to a set of scene coordinates, 2D-3D correspondences are built between image pixels and scene coordinates. The camera pose is then solved by a PnP algorithm with RANSAC.
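A minimal sketch of the correspondence-building step described above follows. It assumes the network output has already been reduced to one region id per pixel and that each region stores its scene coordinates as an array; `build_correspondences`, `region_map`, and `region_coords` are hypothetical names, not the data structures used by eval.py.

```python
import numpy as np

def build_correspondences(region_map, region_coords):
    """Pair each labelled pixel with every scene coordinate of its region.

    region_map:    (H, W) integer array of region ids, -1 = no prediction.
    region_coords: dict mapping region id -> (K, 3) array of scene coordinates.
    Returns (M, 2) pixel positions as (x, y) and the matching (M, 3) points.
    """
    pixels, points = [], []
    ys, xs = np.nonzero(region_map >= 0)
    for x, y in zip(xs, ys):
        coords = region_coords.get(int(region_map[y, x]))
        if coords is None:
            continue
        # One-to-many: the pixel is repeated once per candidate 3D point.
        pixels.append(np.tile([x, y], (len(coords), 1)))
        points.append(coords)
    if not pixels:
        return np.empty((0, 2)), np.empty((0, 3))
    return (np.vstack(pixels).astype(np.float64),
            np.vstack(points).astype(np.float64))
```

Because one pixel maps to several candidate coordinates, many of the resulting 2D-3D pairs are wrong by construction, which is why a robust estimator such as PnP inside RANSAC is needed to select a consistent subset.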

Results

We report our camera pose accuracy on the 7-Scenes and Cambridge Landmarks datasets. Note that for both few-shot and original (full training set) memorization, we use the same network capacity (∼40 MB).

Median Error (cm / deg)	Chess	Fire	Heads	Office	Pumpkin	RedKitchen	Stairs
Few-Shot Training	4 / 1.23	4 / 1.53	2 / 1.56	5 / 1.47	7 / 1.75	6 / 1.93	5 / 1.47
Original Training	3 / 0.79	3 / 1.03	2 / 0.98	4 / 0.94	4 / 1.10	6 / 1.39	4 / 1.12

Median Error (cm / deg)	Great Court	King's College	Old Hospital	Shop Facade	St Mary's Church
Few-Shot Training	81 / 0.47	39 / 0.69	38 / 0.54	19 / 0.99	31 / 1.03
Original Training	39 / 0.21	21 / 0.37	23 / 0.37	5 / 0.26	14 / 0.41
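
The tables above report median translation and rotation errors. The README does not spell out how they are computed, so the following is a sketch of the usual convention: Euclidean distance between camera positions, and the angle of the relative rotation. It assumes positions are given in metres; `pose_errors` and `median_errors` are hypothetical helper names.

```python
import numpy as np

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Translation error in cm and rotation error in degrees for one pose.

    Assumes t_est and t_gt are camera positions in metres and that
    R_est and R_gt are 3x3 rotation matrices.
    """
    trans_cm = 100.0 * np.linalg.norm(t_est - t_gt)
    # Angle of the relative rotation R_gt^T R_est, from its trace.
    cos_angle = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    rot_deg = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return trans_cm, rot_deg

def median_errors(estimates, ground_truth):
    """Median (cm, deg) over paired lists of (R, t) tuples."""
    errs = np.array([
        pose_errors(R_e, t_e, R_g, t_g)
        for (R_e, t_e), (R_g, t_g) in zip(estimates, ground_truth)
    ])
    return np.median(errs[:, 0]), np.median(errs[:, 1])
```

The clip guards against values slightly outside [-1, 1] caused by floating-point rounding, which would otherwise make `arccos` return NaN.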

Setup

We recommend using a conda environment:

1. Install Anaconda or Miniconda.
2. Create the environment: `conda env create -f environment.yml`
3. Activate the environment: `conda activate SRC`
4. To run the evaluation script, you will need to build the Cython module:
```
cd ./pnpransac
python setup.py build_ext --inplace
```

Data

You can download the 7-Scenes and Cambridge Landmarks datasets from their official websites for training and evaluation. We also provide the additional information needed for few-shot training here; the depth maps we used for the Cambridge dataset are from DSAC++. Note that you can directly use the provided .label_n*.png and _leaf_coords_n*.npy files to reproduce our results. If you choose to run partition.py to create your own region labels, please replace the provided _leaf_coords_n*.npy with the generated one.

Downloading the 12-Scenes dataset for pretraining is optional.

Model

You can download the pretrained models here. The optimizers are also included in the models.

7-Scenes

Training

Hierarchical Partition (Optional)

python partition.py --data_path Path/to/download/dataset --dataset 7S --scene chess --training_info train_fewshot.txt --n_class 64 --label_validation_test True

Scene Memorization

python train.py --data_path Path/to/download/dataset --dataset 7S --scene chess --training_info train_fewshot.txt --n_class 64 --train_id ???

Test

python eval.py --data_path Path/to/download/dataset --dataset 7S --scene chess --test_info test.txt --n_class 64 --checkpoint checkpoints/???
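
To evaluate several scenes in one go, the command above can be generated per scene. The sketch below only builds the command strings. The lowercase scene folder names other than `chess` are an assumption based on the scene names in the results table, and the checkpoint path is left to the caller because the README does not specify it; `eval_command` is a hypothetical helper.

```python
import shlex

# Assumed folder names; only "chess" appears in the commands in this README.
SEVEN_SCENES = ["chess", "fire", "heads", "office", "pumpkin", "redkitchen", "stairs"]

def eval_command(data_path, scene, checkpoint, n_class=64):
    """Build the eval.py command line for one 7-Scenes scene."""
    args = [
        "python", "eval.py",
        "--data_path", data_path,
        "--dataset", "7S",
        "--scene", scene,
        "--test_info", "test.txt",
        "--n_class", str(n_class),
        "--checkpoint", checkpoint,
    ]
    # Quote each argument so paths with spaces survive a shell.
    return " ".join(shlex.quote(a) for a in args)
```

Calling `eval_command` once per entry of `SEVEN_SCENES`, with the checkpoint produced by the corresponding training run, yields one command per scene that can be printed or passed to a shell.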

Cambridge Landmarks

Training

Hierarchical Partition (Optional)

python partition.py --data_path Path/to/download/dataset --dataset Cambridge --scene GreatCourt --training_info train_fewshot.txt --n_class 100 --use_gpu False

Scene Memorization

python train.py --data_path Path/to/download/dataset --dataset Cambridge --scene GreatCourt --training_info train_fewshot.txt --n_class 100 --train_id ???

Test

python eval.py --data_path Path/to/download/dataset --dataset Cambridge --scene GreatCourt --test_info test_Cambridge.txt --n_class 100 --checkpoint checkpoints/???

Pre-Training on 12-Scenes Dataset

python pretrain_12S.py --data_path Path/to/download/dataset --training_info train_20f.txt --n_class 64 --train_id ???

python pretrain_12S.py --data_path Path/to/download/dataset --training_info train_20f.txt --n_class 100 --train_id ???

Acknowledgements

We appreciate the previous open-source repositories DSAC++ and HSCNet.

Citation

If you find our work helpful in your research, please consider citing:

@INPROCEEDINGS{10044413,
  author={Dong, Siyan and Wang, Shuzhe and Zhuang, Yixin and Kannala, Juho and Pollefeys, Marc and Chen, Baoquan},
  booktitle={2022 International Conference on 3D Vision (3DV)}, 
  title={Visual Localization via Few-Shot Scene Region Classification}, 
  year={2022},
  volume={},
  number={},
  pages={393-402},
  doi={10.1109/3DV57658.2022.00051}}
