Generating real-world training datasets

Provided files (DEF-Scan)

We provide the following two sets of files:

1. The intermediate files used within the processing pipeline, including:
   - the original STL files from the ABC dataset used for 3D printing,
   - the original OBJ and YML files from the ABC dataset used for 3D feature annotation,
   - the raw scans in DirectX .x format,
   - converted .hdf5 and .ply files,
   - .mlp files containing alignment information obtained in MeshLab,
   - image-based renderings of each model in PNG format,
   - 3D snapshots of each model in HTML format,
   - additional diagnostic information, e.g., alignment histograms and 3D alignment snapshots.
   All intermediate files are available as separate zip archives.
2. The final aligned, annotated training, validation, and testing datasets (DEF-Scan):
   - In image-based format: images_align4mm_fullmesh_whole.tar.gz (753 MB)
     - 981 training, 479 validation, 468 testing instances (depth images)
   - In point-based format: points_align4mm_partmesh_whole.tar.gz (6.2 GB)
     - 15574 training, 4119 validation, 9770 testing instances (point patches)
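The final datasets are distributed as .tar.gz archives. The following is a minimal sketch for unpacking one of them with Python's standard `tarfile` module. The archive's internal layout is not documented here, so the sketch only returns member names and makes no assumptions about the file structure. The helper name `unpack_archive` is hypothetical and is not part of the provided scripts.

```python
import tarfile
from pathlib import Path


def unpack_archive(archive: str, out_dir: str) -> list[str]:
    """Extract a .tar.gz archive into out_dir and return the member names (hypothetical helper)."""
    dest = Path(out_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:gz") as tar:
        members = tar.getmembers()
        # Refuse members whose paths would escape the destination directory.
        for member in members:
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        # Use the 'data' extraction filter where this Python version provides it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(path=dest, members=members, **kwargs)
    return [m.name for m in members]
```

For example, `unpack_archive("images_align4mm_fullmesh_whole.tar.gz", "def_scan_images/")` extracts the image-based dataset and returns the names of its members.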

Obtaining the datasets

Warning: we strongly recommend using the final aligned, annotated shapes provided above as the training/evaluation dataset. Producing this real-world dataset from scratch is cumbersome, and a re-created dataset may fail to reproduce the results of our work.

The list below outlines the construction sequence for the dataset of real-world scans that we followed during our work.

1. Select shapes for printing. The original STL models used for 3D printing are available within the zipped data.
2. Manufacture the shape using a 3D printer. The original manufactured plastic models are stored at Skoltech; a photo of these models can be found in the paper.
3. Scan the shape using the structured-light scanning device. For each shape, we performed 24 scans, positioning the shape first in its normal orientation and then changing its orientation once by 90 degrees, to obtain a 360-degree scan. The raw results of this scanning for all STL models are available within the zipped data.
4. We convert the DirectX .x files using the utility script convert_x_to_hdf5.py as follows:

python3 convert_x_to_hdf5.py -i file.x -o output_dir/ --hdf5 --verbose

This produces a single .hdf5 output file in the RangeVisionIO schema. We additionally export .ply files for semi-automatic alignment in MeshLab by running:

python3 convert_x_to_hdf5.py -i file.x -o output_dir/ --ply --verbose
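Each scan is converted twice, once with `--hdf5` and once with `--ply`, so the step is easy to script over many files. The following is a minimal sketch that builds exactly the two command lines shown above for every .x file under a directory. It assumes the raw scans sit somewhere below a single `scan_dir` and that all outputs go into one `out_dir`; that directory layout and the helper names are hypothetical. By default the sketch only prints the commands; pass `dry_run=False` to execute them with `subprocess.run`.

```python
import subprocess
from pathlib import Path


def convert_cmd(x_file: Path, out_dir: Path, fmt: str) -> list[str]:
    """Build the convert_x_to_hdf5.py command for one .x scan (fmt: 'hdf5' or 'ply')."""
    if fmt not in ("hdf5", "ply"):
        raise ValueError(f"unsupported format: {fmt}")
    return ["python3", "convert_x_to_hdf5.py",
            "-i", str(x_file), "-o", f"{out_dir}/", f"--{fmt}", "--verbose"]


def convert_all(scan_dir: str, out_dir: str, dry_run: bool = True) -> list[list[str]]:
    """Convert every .x file under scan_dir to .hdf5 and .ply (hypothetical batch helper)."""
    cmds = []
    for x_file in sorted(Path(scan_dir).rglob("*.x")):
        for fmt in ("hdf5", "ply"):
            cmd = convert_cmd(x_file, Path(out_dir), fmt)
            cmds.append(cmd)
            if dry_run:
                print(" ".join(cmd))  # show what would be run
            else:
                subprocess.run(cmd, check=True)  # stop on the first failing conversion
    return cmds
```

Doing a dry run first shows which files will be picked up before any conversion starts.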

All .hdf5 and .ply conversion results are part of the raw scanned data available within the zipped data.
5. We semi-automatically examine the raw scans to identify flawed scans or prints, and align the PLY scans to their respective STL shapes. Scans that fail to align tightly are located in the bad_scan/ subfolder of the respective archives, shapes with printing flaws in the bad_print/ subfolder, and correctly aligned shapes in the aligned/ subfolder.
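The alignment step sorts shapes into the bad_scan/, bad_print/, and aligned/ subfolders mentioned above. The following is a small sketch for checking how many shapes ended up in each category of an extracted archive. It assumes that the three subfolders sit at the top level of the extracted archive and that each immediate entry inside them corresponds to one shape. Both are assumptions about the layout, and the function names are hypothetical.

```python
from pathlib import Path

# Outcome subfolders named in the alignment step; assumed to sit at the
# top level of an extracted archive (hypothetical layout).
OUTCOMES = ("aligned", "bad_scan", "bad_print")


def shapes_by_outcome(archive_root: str) -> dict[str, list[str]]:
    """Map each alignment outcome to the sorted names of the entries in its subfolder.

    Each immediate child of an outcome subfolder is assumed to be one shape;
    a missing subfolder yields an empty list.
    """
    root = Path(archive_root)
    result: dict[str, list[str]] = {}
    for outcome in OUTCOMES:
        folder = root / outcome
        if folder.is_dir():
            result[outcome] = sorted(child.name for child in folder.iterdir())
        else:
            result[outcome] = []
    return result


def summarize(archive_root: str) -> str:
    """One-line summary such as 'aligned: 2, bad_scan: 1, bad_print: 0'."""
    groups = shapes_by_outcome(archive_root)
    return ", ".join(f"{name}: {len(groups[name])}" for name in OUTCOMES)
```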
6. We prepare the real scans by exporting them to a format similar to our real-world data (the ViewIO schema):

python3 prepare_real_scans.py -i input-dir -o output-dir --verbose --debug

7. We annotate the scans in the image-based format using:

# Thresholds are in mm: --max_point_mesh_distance 4.0 (4 mm), --max_distance_to_feature 10.0 (10 mm).
# --full_mesh uses all features of the mesh in annotation.
python3 prepare_real_images_dataset.py \
  -i input_dir/ -o output.hdf5 \
  --verbose --debug \
  --max_point_mesh_distance 4.0 \
  --max_distance_to_feature 10.0 \
  --full_mesh

8. We annotate the scans in the point-based format using:

# Thresholds are in mm: --max_point_mesh_distance 4.0 (4 mm), --max_distance_to_feature 10.0 (10 mm).
python3 prepare_real_points_dataset.py \
  -i input_dir/ -o output.hdf5 \
  --verbose --debug \
  --max_point_mesh_distance 4.0 \
  --max_distance_to_feature 10.0
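Both annotation commands above use the same two thresholds, given in millimetres. The following sketch illustrates one plausible reading of them. It is an assumption, not the actual logic of prepare_real_images_dataset.py or prepare_real_points_dataset.py. Under this reading, scan points farther than `max_point_mesh_distance` from the aligned mesh are dropped. Each remaining point is labelled with its distance to the nearest sharp feature, capped at `max_distance_to_feature`. To keep the example self-contained, the mesh and the feature curves are approximated by dense point samples. A brute-force nearest-neighbour search stands in for whatever spatial index the real scripts use. All names here are hypothetical.

```python
import math
from typing import Sequence

# Hypothetical point type: (x, y, z) coordinates in millimetres.
Point = tuple[float, float, float]


def nearest_distance(p: Point, samples: Sequence[Point]) -> float:
    """Brute-force Euclidean distance from p to the closest sample point."""
    return min(math.dist(p, q) for q in samples)


def annotate(points: Sequence[Point],
             mesh_samples: Sequence[Point],
             feature_samples: Sequence[Point],
             max_point_mesh_distance: float = 4.0,
             max_distance_to_feature: float = 10.0) -> tuple[list[Point], list[float]]:
    """Keep points close to the mesh and label each with its clipped distance to the features.

    Assumed semantics (not taken from the scripts): a point farther than
    max_point_mesh_distance from every mesh sample is dropped; otherwise its
    distance to the nearest feature sample is stored, capped at
    max_distance_to_feature.
    """
    kept: list[Point] = []
    distances: list[float] = []
    for p in points:
        if nearest_distance(p, mesh_samples) > max_point_mesh_distance:
            continue  # too far from the aligned mesh to be annotated reliably
        kept.append(p)
        distances.append(min(nearest_distance(p, feature_samples), max_distance_to_feature))
    return kept, distances
```

As an example, take mesh samples at x = 0, 10, and 20 mm on the x-axis and a single feature sample at the origin. The point (0, 0, 5) is dropped because it lies 5 mm from the mesh. The point (20, 0, 0) is kept, and its 20 mm feature distance is capped at 10 mm.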
