Scalable Neural Indoor Scene Rendering

SIGGRAPH 2022 (Journal track)

Xiuchao Wu¹* Jiamin Xu¹* Zihan Zhu¹ Hujun Bao¹ Qixing Huang² James Tompkin³ Weiwei Xu¹

*joint first authors

¹Zhejiang University ²University of Texas at Austin ³Brown University

Abstract

We propose a scalable neural scene reconstruction and rendering method to support distributed training and interactive rendering of large indoor scenes. Our representation is based on tiles and a separation of view-independent appearance (diffuse color and shading) from view-dependent appearance (specular highlights, reflections), each of which is predicted by lower-capacity MLPs. After assigning MLPs per tile, our scheme allows tile MLPs to be trained in parallel and still represent complex reflections through a two-pass training strategy. This is enabled by a background sampling strategy that can augment tile information from a proxy global mesh geometry and tolerate typical errors from reconstructed proxy geometry. Further, we design a two-MLP based representation at each tile to leverage the phenomenon that view-dependent surface effects can be attributed to a reflected virtual light at the total ray distance to the source. This lets us handle sparse samplings of the input scene where reflection highlights do not always appear consistently in input images. We show interactive free-viewpoint rendering results from five scenes, one of which covers an area of more than 100 m². Experimental results show that our method produces higher-quality renderings than a single large-capacity MLP and other recent baseline methods.

Rendering Results

LivingRoom2

Coffee Shop

LivingRoom1

Bar

Sofa

Method

We create tiles over the volumetric scene and optimize per-tile MLPs. Each tile has two MLPs: 1) the surface MLP, which encodes density and view-independent color and is later stored in an octree for fast rendering; 2) the reflection MLP, which encodes view-dependent effects such as highlights using virtual points underneath the surface at the ray distance of the reflected light. The color outputs from both paths are combined in the final rendering.
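To make the virtual-point idea concrete, here is a minimal sketch, not the authors' implementation. It assumes the virtual point lies on the camera ray at the total ray distance (camera to surface plus surface to reflected source), and that the two color outputs are combined by a clamped sum. The function names `virtual_point` and `combine_colors`, and the additive combination, are illustrative assumptions.

```python
import numpy as np

def virtual_point(origin, direction, total_distance):
    """Place a point along the camera ray at the total ray distance.

    Assumption: total_distance = (camera -> surface) + (surface -> reflected
    source), so the point ends up beyond (underneath) the visible surface.
    """
    d = direction / np.linalg.norm(direction)  # normalize the ray direction
    return origin + total_distance * d

def combine_colors(surface_rgb, reflection_rgb):
    """Assumed combination rule: clamped sum of the two color outputs."""
    return np.clip(np.asarray(surface_rgb) + np.asarray(reflection_rgb), 0.0, 1.0)

# Hypothetical example values.
origin = np.array([0.0, 0.0, 0.0])
direction = np.array([0.0, 0.0, 2.0])   # unnormalized on purpose
t_surface = 2.0                         # camera -> surface distance
t_reflected = 3.0                       # surface -> reflected source distance

p = virtual_point(origin, direction, t_surface + t_reflected)
rgb = combine_colors([0.5, 0.5, 0.5], [0.7, 0.1, 0.0])
```

In this example the virtual point sits at depth 5 along the ray, which is behind the surface at depth 2. This matches the intuition of a mirror image located behind a reflective plane.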

Explainer Video

Distributed Training

Our scheme allows tile MLPs to be trained in parallel.
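The sketch below illustrates why per-tile models parallelize well; it is not the authors' training code. It assumes a uniform grid of cubic tiles and uses a placeholder `train_tile` job that only counts samples. The two-pass strategy and background sampling described in the abstract are omitted. All function names and the `tile_size` parameter are hypothetical.

```python
import math
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def tile_index(point, scene_min, tile_size):
    """Map a 3D point to the integer index of its (assumed cubic) tile."""
    return tuple(int(math.floor((p - m) / tile_size))
                 for p, m in zip(point, scene_min))

def group_by_tile(points, scene_min, tile_size):
    """Bucket sample points by the tile that contains them."""
    tiles = defaultdict(list)
    for pt in points:
        tiles[tile_index(pt, scene_min, tile_size)].append(pt)
    return dict(tiles)

def train_tile(item):
    """Placeholder job: a real one would optimize the tile's two MLPs."""
    idx, pts = item
    return idx, {"num_samples": len(pts)}

def train_all_tiles(tiles, max_workers=4):
    """Run the independent per-tile jobs concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(train_tile, tiles.items()))

# Hypothetical sample points in a scene whose minimum corner is the origin.
points = [(0.1, 0.1, 0.1), (0.9, 0.2, 0.3), (1.5, 0.1, 0.1)]
tiles = group_by_tile(points, scene_min=(0.0, 0.0, 0.0), tile_size=1.0)
results = train_all_tiles(tiles)
```

Because each job touches only its own tile's samples, the same structure could be spread across processes or machines; a thread pool is used here only to keep the sketch short.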

Interactive Rendering

The rendering time for a frame of resolution $1280 \times 720$ is 50ms on average.
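For reference, the reported average frame time converts to a frame rate and pixel throughput as follows (simple arithmetic on the figures above, not an additional measurement):

$$\frac{1000\ \text{ms/s}}{50\ \text{ms/frame}} = 20\ \text{frames/s}, \qquad \frac{1280 \times 720\ \text{pixels}}{0.05\ \text{s}} = 18{,}432{,}000\ \text{pixels/s}.$$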

Comparisons

Compared with the baselines, our method renders specular reflections better and is more temporally coherent.

Extrapolation

We test the case when novel viewpoints are far from captured views.

Simple Editing

Full Video

BibTeX

  
    @article{wu2022snisr,
      title={Scalable Neural Indoor Scene Rendering},
      author={Wu, Xiuchao and Xu, Jiamin and Zhu, Zihan and Bao, Hujun and Huang, Qixing and Tompkin, James and Xu, Weiwei},
      journal={ACM Transactions on Graphics (TOG)},
      year={2022}
    }

Acknowledgements

Supported by Information Technology Center and State Key Lab of CAD&CG, Zhejiang University.