360MVSNet: Deep Multi-view Stereo Network with 360° Images for Indoor Scene Reconstruction

Ching-Ya Chiu       Yu-Ting Wu       I-Chao Shen       Yung-Yu Chuang


Comparison of reconstruction results from our method, COLMAP, and MVSNet. (a) Visualization of the camera distributions: in this scene, COLMAP and MVSNet use 300 perspective cameras with a 70° field of view (red points), while our method uses only 25 360° images (blue points). (b), (c), and (d) show the reconstruction results of our method, COLMAP, and MVSNet, respectively. The completeness/overall-quality scores are shown beneath each method's result (lower is better). Our method substantially improves reconstruction completeness while requiring 12× fewer images (25 vs. 300).

Publication and downloads

Ching-Ya Chiu, Yu-Ting Wu, I-Chao Shen, Yung-Yu Chuang, 360MVSNet: Deep Multi-view Stereo Network with 360° Images for Indoor Scene Reconstruction, WACV, 2023.

Paper: [PDF, 8.7MB]
Supplemental Materials: [PDF, 19.4MB]

Abstract

Recent multi-view stereo methods have achieved promising results with the advancement of deep learning techniques. Despite this progress, reconstructing large indoor environments with regular images still requires capturing many images with sufficient visual overlap because of their limited field of view, which is quite labor-intensive. 360° images cover a much larger field of view than regular images and can greatly facilitate the capture process. In this paper, we present 360MVSNet, the first deep learning network for multi-view stereo with 360° images. Our method combines uncertainty estimation with a spherical sweeping module for 360° images captured from multiple viewpoints to construct multi-scale cost volumes. By regressing the volumes in a coarse-to-fine manner, high-resolution depth maps can be obtained. Furthermore, we construct EQMVS, a large-scale synthetic dataset consisting of over 50K pairs of RGB and depth maps in equirectangular projection. Experimental results demonstrate that our method reconstructs large synthetic and real-world indoor scenes with significantly better completeness than previous traditional and learning-based methods, while saving both time and effort during data acquisition.
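The spherical sweeping module mentioned above is the 360° analogue of the plane sweep used in MVSNet-style networks: instead of warping source views onto fronto-parallel planes, each hypothesized depth defines a sphere of that radius around the reference camera, and source views are warped onto those spheres. Below is a minimal NumPy sketch of this idea for illustration only; the function names, the equirectangular coordinate convention, and the nearest-neighbor sampling on raw RGB are our assumptions, whereas the actual network operates on learned feature maps with differentiable sampling.

import numpy as np

def equirect_to_dirs(height, width):
    """Unit ray directions for every pixel of an equirectangular image.
    Longitude spans [-pi, pi) across the width, latitude [pi/2, -pi/2]
    down the height (a common convention; the paper's may differ)."""
    u = (np.arange(width) + 0.5) / width           # horizontal coordinate in [0, 1)
    v = (np.arange(height) + 0.5) / height         # vertical coordinate in [0, 1)
    lon = (u - 0.5) * 2.0 * np.pi                  # longitude in [-pi, pi)
    lat = (0.5 - v) * np.pi                        # latitude in [pi/2, -pi/2]
    lon, lat = np.meshgrid(lon, lat)               # both (H, W)
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1)            # (H, W, 3)

def spherical_sweep(src_img, R, t, depths):
    """Warp a source 360° image into the reference view once per
    hypothesized sphere radius. R (3x3) and t (3,) map reference-camera
    coordinates to source-camera coordinates."""
    height, width = src_img.shape[:2]
    dirs = equirect_to_dirs(height, width)         # reference ray directions
    warped = []
    for d in depths:
        pts = dirs * d                             # points on sphere of radius d
        pts_src = pts @ R.T + t                    # transform into source frame
        norm = np.linalg.norm(pts_src, axis=-1, keepdims=True)
        dirs_src = pts_src / np.maximum(norm, 1e-8)
        lon = np.arctan2(dirs_src[..., 0], dirs_src[..., 2])
        lat = np.arcsin(np.clip(dirs_src[..., 1], -1.0, 1.0))
        px = ((lon / (2.0 * np.pi) + 0.5) * width).astype(int) % width
        py = ((0.5 - lat / np.pi) * height).astype(int).clip(0, height - 1)
        warped.append(src_img[py, px])             # nearest-neighbor sample
    return np.stack(warped)                        # (D, H, W, C)

In a full pipeline, the stack of warped features across depth hypotheses would be compared against the reference features to build the multi-scale cost volumes that the network regresses in a coarse-to-fine manner.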

BibTeX
@inproceedings{chiu360mvsnet,
  author    = {Ching-Ya Chiu and Yu-Ting Wu and I-Chao Shen and Yung-Yu Chuang},
  title     = {360MVSNet: Deep Multi-view Stereo Network with 360° Images for Indoor Scene Reconstruction},
  booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year      = {2023}
}