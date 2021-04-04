



View synthesis is a computer vision (CV) technique that reconstructs a 3D scene representation from observed images so that the scene can be rendered from new, unobserved viewpoints. Recently, neural volumetric representations have driven great progress in this area.

Neural Radiance Fields (NeRF) can render photorealistic novel views with fine geometric details and realistic view-dependent appearance. NeRF represents the scene as a continuous volumetric function, parameterized by a multilayer perceptron (MLP). This function maps a continuous 3D position (and viewing direction) to the volume density and the view-dependent radiance at that position.
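The volume rendering step turns these per-point predictions into a pixel color. It can be sketched with NeRF's standard quadrature: each sample along a ray gets an opacity alpha = 1 - exp(-sigma * delta), weighted by the transmittance accumulated in front of it. The minimal pure-Python sketch below assumes a hypothetical `toy_field` in place of the trained MLP, and the ray bounds and sample count are arbitrary choices.

```python
import math
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]
# A radiance field maps (position, view direction) -> (density, RGB color).
Field = Callable[[Vec3, Vec3], Tuple[float, Vec3]]


def render_ray(field: Field, origin: Vec3, direction: Vec3,
               near: float, far: float, n_samples: int) -> Vec3:
    """Alpha-composite n_samples evenly spaced points along one ray (NeRF-style quadrature)."""
    delta = (far - near) / n_samples   # spacing between consecutive samples
    transmittance = 1.0                # fraction of light not yet absorbed
    rgb = [0.0, 0.0, 0.0]
    for i in range(n_samples):
        t = near + (i + 0.5) * delta   # midpoint of the i-th segment
        point = tuple(o + t * d for o, d in zip(origin, direction))
        sigma, color = field(point, direction)   # one MLP query per sample
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        weight = transmittance * alpha
        for c in range(3):
            rgb[c] += weight * color[c]
        transmittance *= 1.0 - alpha
    return tuple(rgb)


def toy_field(point: Vec3, direction: Vec3) -> Tuple[float, Vec3]:
    """Hypothetical stand-in for a trained MLP: a dense red sphere of radius 1 at the origin."""
    inside = sum(p * p for p in point) < 1.0
    return (50.0 if inside else 0.0), (1.0, 0.0, 0.0)


if __name__ == "__main__":
    color = render_ray(toy_field, (0.0, 0.0, -3.0), (0.0, 0.0, 1.0),
                       near=0.0, far=6.0, n_samples=128)
    print(color)  # close to (1.0, 0.0, 0.0): the sphere absorbs almost all of the ray
```

Note the cost structure: every sample along every ray requires a full network query, which is what makes NeRF rendering expensive.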

However, NeRF rendering is slow and computationally intensive. Rendering a view requires querying the MLP hundreds of times per ray, which rules out interactive view synthesis. It also means a reconstructed 3D model cannot be viewed in a standard web browser.
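A back-of-the-envelope count of MLP evaluations per frame shows what "slow" means here. The settings below are assumptions chosen only for illustration: an 800 × 800 image, one ray per pixel, and 256 MLP queries per ray (in the "hundreds per ray" range).

```python
def mlp_queries_per_frame(width: int, height: int, queries_per_ray: int) -> int:
    """Number of MLP evaluations needed to render one frame with one ray per pixel."""
    return width * height * queries_per_ray


if __name__ == "__main__":
    # Assumed settings for illustration only: 800x800 pixels, 256 MLP queries per ray.
    per_frame = mlp_queries_per_frame(800, 800, 256)
    print(f"{per_frame:,} MLP queries per frame")                   # 163,840,000
    print(f"{per_frame * 30:,} MLP queries per second at 30 fps")   # 4,915,200,000
```

Even under these rough assumptions, a real-time frame rate would demand billions of network evaluations per second.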

Google researchers have addressed the problem of rendering trained NeRFs in real time while preserving their ability to represent fine geometric details and compelling view-dependent effects.

Their approach accelerates NeRF rendering by three orders of magnitude, rendering a frame in 12 ms on a single GPU. The trained NeRF is precomputed ("baked") and stored in a sparse 3D voxel grid data structure called a Sparse Neural Radiance Grid (SNeRG). Each active voxel in the SNeRG stores an opacity, a diffuse color, and a learned feature vector that encodes view-dependent effects.
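The core idea of a sparse grid is to store data only for occupied voxels. It can be sketched with a dictionary keyed by integer voxel coordinates. This is a simplification for illustration, not SNeRG's actual storage layout. The per-voxel fields follow the description above (opacity, diffuse color, feature vector). The 4-channel feature size, the scene bounds, and the class and method names are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Index = Tuple[int, int, int]
Point = Tuple[float, float, float]


@dataclass
class Voxel:
    """Contents of one active voxel (field layout assumed for illustration)."""
    opacity: float                                # volume density at this voxel
    diffuse: Tuple[float, float, float]           # view-independent RGB color
    features: Tuple[float, float, float, float]   # learned view-dependence features (4 assumed)


class SparseGrid:
    """Hypothetical sparse voxel grid: only active (non-empty) voxels are stored."""

    def __init__(self, resolution: int, scene_min: float = -1.0, scene_max: float = 1.0):
        self.resolution = resolution
        self.scene_min = scene_min
        self.voxel_size = (scene_max - scene_min) / resolution
        self.voxels: Dict[Index, Voxel] = {}

    def to_index(self, point: Point) -> Index:
        """Map a continuous 3D point to the integer coordinates of the voxel containing it."""
        return tuple(int((p - self.scene_min) // self.voxel_size) for p in point)

    def insert(self, point: Point, voxel: Voxel) -> None:
        self.voxels[self.to_index(point)] = voxel

    def lookup(self, point: Point) -> Optional[Voxel]:
        """Return the voxel containing `point`, or None if that region is empty."""
        return self.voxels.get(self.to_index(point))


if __name__ == "__main__":
    grid = SparseGrid(resolution=128)
    grid.insert((0.0, 0.0, 0.0), Voxel(5.0, (0.8, 0.2, 0.1), (0.0, 0.1, 0.0, 0.3)))
    print(grid.lookup((0.001, 0.001, 0.001)) is not None)  # True: same voxel
    print(grid.lookup((0.5, 0.5, 0.5)))                     # None: empty space stores nothing
    print(len(grid.voxels), "stored vs", 128 ** 3, "in a dense grid")
```

Empty space costs nothing in memory, and a ray marcher can skip it cheaply. Both the storage cost and the rendering time therefore scale with how sparse the scene is.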

To render this representation, the diffuse colors and feature vectors are first accumulated along each ray. The accumulated feature vector is then passed, together with the viewing direction, through a lightweight MLP. The MLP produces a view-dependent residual, which is added to the accumulated diffuse color.
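The deferred shading step described above can be sketched as follows. Diffuse color and features are alpha-composited along the ray just as color is in NeRF. Only the accumulated result then goes through a small MLP together with the viewing direction. The network here is hypothetical: two hidden layers of 16 units with random weights, standing in for a trained one. The 4-channel feature size and the final clamping are also assumptions. The sample tuples would come from looking up the sparse grid along the ray.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = List[float]
Layer = Tuple[List[Vec], Vec]


def dense(x: Sequence[float], layer: Layer) -> Vec:
    """Fully connected layer: y = W x + b."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vec) -> Vec:
    return [max(0.0, v) for v in x]


def random_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Random weights; a trained model would load learned values instead."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class TinyMLP:
    """Hypothetical view-dependence network: (accumulated features, view dir) -> RGB residual."""

    def __init__(self, n_features: int = 4, hidden: int = 16, seed: int = 0):
        rng = random.Random(seed)
        self.n_features = n_features
        self.layers = [random_layer(n_features + 3, hidden, rng),
                       random_layer(hidden, hidden, rng)]
        self.output = random_layer(hidden, 3, rng)

    def __call__(self, features: Sequence[float], view_dir: Sequence[float]) -> Vec:
        h = list(features) + list(view_dir)
        for layer in self.layers:
            h = relu(dense(h, layer))
        return dense(h, self.output)


def render_deferred(samples, delta: float, view_dir, mlp: TinyMLP) -> Vec:
    """Render one pixel from (density, diffuse_rgb, features) samples along its ray."""
    transmittance = 1.0
    diffuse_acc = [0.0, 0.0, 0.0]
    feature_acc = [0.0] * mlp.n_features
    for sigma, diffuse, features in samples:
        alpha = 1.0 - math.exp(-sigma * delta)   # same compositing weights as NeRF
        weight = transmittance * alpha
        diffuse_acc = [a + weight * d for a, d in zip(diffuse_acc, diffuse)]
        feature_acc = [a + weight * f for a, f in zip(feature_acc, features)]
        transmittance *= 1.0 - alpha
    # Deferred step: the MLP runs once per pixel on the accumulated features,
    # instead of once per sample as in the original NeRF.
    residual = mlp(feature_acc, view_dir)
    rgb = [d + r for d, r in zip(diffuse_acc, residual)]
    return [min(1.0, max(0.0, c)) for c in rgb]  # clamping is an assumption of this sketch


if __name__ == "__main__":
    ray_samples = [
        (0.0, (0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)),     # empty space
        (30.0, (0.6, 0.3, 0.2), (0.5, -0.2, 0.1, 0.3)),   # surface voxel
    ]
    print(render_deferred(ray_samples, delta=0.1, view_dir=(0.0, 0.0, 1.0), mlp=TinyMLP()))
```

The network sees only one accumulated feature vector per pixel, so its cost does not grow with the number of samples taken along the ray.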

Key modifications to NeRF

Recent work has identified discretized volumetric representations as one of the most promising ways to make NeRF more efficient. The researchers extend this approach with a deferred neural rendering technique for modeling view-dependent effects. This makes it possible to visualize trained NeRF models in real time on commodity hardware with minimal loss in quality.

The team makes two changes to NeRF that allow it to be baked efficiently into this sparse voxel representation.

First, they designed a “deferred” NeRF architecture. The original NeRF architecture represents view-dependent effects with an MLP that runs once per 3D sample, whereas the modified architecture represents them with an MLP that runs only once per pixel.

Second, they regularize NeRF's predicted opacity field during training to encourage sparsity. The rendering time and storage required for a volumetric representation depend heavily on how sparse the opacity is in a given scene. The regularizer therefore penalizes predicted density, making the NeRF opacity field sparser and reducing both the storage cost and the rendering time of the resulting SNeRG.
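A density penalty of this kind can be sketched as an extra term in the training loss. The sketch assumes a Cauchy-style penalty, log(1 + sigma² / c), summed over sampled densities. The scale `c`, the weight `lam`, the activity threshold, and the function names are illustrative assumptions rather than values from the paper. The penalty is zero only where the density is zero. The photometric loss keeps density on real surfaces, so the regularizer mainly clears out faint density in empty space.

```python
import math
from typing import Iterable


def sparsity_penalty(densities: Iterable[float], c: float = 0.5, lam: float = 1e-4) -> float:
    """Cauchy-style penalty on predicted densities (form and constants assumed).

    log(1 + sigma^2 / c) is 0 at sigma = 0 and grows only logarithmically for large
    sigma, so it pushes small stray densities toward zero without heavily punishing
    the dense surfaces that the reconstruction loss needs.
    """
    return lam * sum(math.log1p(s * s / c) for s in densities)


def total_loss(photometric_loss: float, densities: Iterable[float]) -> float:
    """Training objective sketch: reconstruction error plus the sparsity regularizer."""
    return photometric_loss + sparsity_penalty(densities)


def count_active(densities: Iterable[float], threshold: float = 0.01) -> int:
    """Hypothetical pruning rule: voxels above `threshold` would need to be stored."""
    return sum(1 for s in densities if s > threshold)


if __name__ == "__main__":
    hazy = [0.3] * 90 + [40.0] * 10    # hypothetical: faint haze plus a surface
    clean = [0.0] * 90 + [40.0] * 10   # same surface with empty space cleared
    print(sparsity_penalty(hazy) > sparsity_penalty(clean))  # True
    print(count_active(hazy), count_active(clean))          # 100 10
```

In this toy example, clearing the haze lowers the penalty and cuts the number of voxels that would need to be stored from 100 to 10.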

The team demonstrated that the proposed method speeds up NeRF rendering enough to render frames in real time. It retains NeRF's ability to represent fine geometric details and compelling view-dependent effects. The representation is also compact, requiring less than 90 MB per scene on average.
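Simple arithmetic shows why sparsity matters for storage. The numbers below are hypothetical and chosen only to show orders of magnitude. They assume a 1000³ grid, 8 bytes per voxel (for example, one byte each for opacity, three diffuse channels, and four feature channels), and 1% of voxels occupied. Indexing overhead and image compression are ignored.

```python
def grid_storage_mb(num_voxels: int, bytes_per_voxel: int) -> float:
    """Raw storage in megabytes (10^6 bytes), ignoring indexing overhead and compression."""
    return num_voxels * bytes_per_voxel / 1e6


if __name__ == "__main__":
    resolution = 1000        # hypothetical grid resolution per axis
    bytes_per_voxel = 8      # hypothetical: 1 opacity + 3 diffuse + 4 feature bytes
    occupied_percent = 1     # hypothetical share of non-empty voxels

    dense_voxels = resolution ** 3
    active_voxels = dense_voxels * occupied_percent // 100
    print(grid_storage_mb(dense_voxels, bytes_per_voxel))   # 8000.0 (MB, dense grid)
    print(grid_storage_mb(active_voxels, bytes_per_voxel))  # 80.0 (MB, active voxels only)
```

Under these made-up assumptions, storing only active voxels shrinks the grid by a factor of 100. This is why a sparser opacity field translates directly into a smaller download.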

Figure 1: Comparison of NeRF and SNeRG ray marching procedures. Source: https://arxiv.org/pdf/2103.14645.pdf

The researchers compared the proposed approach with state-of-the-art techniques for accelerating NeRF on three criteria: rendering speed, storage cost, and rendering quality. In their evaluation, they found the following:

Removing the view-dependence MLP had only a minor effect on runtime performance. Removing the sparsity loss increased memory usage. Replacing the proposed “deferred” rendering with NeRF's original per-sample view dependence led to very long render times.

The team reports that, after fine-tuning, the rendering quality of the SNeRG model is competitive with that of the underlying neural model. Storage ablation studies confirmed that the compressed SNeRG representation is small enough to load quickly in a web page. It also renders at 30 frames per second or more on a laptop GPU.

Figure 2: Visualization of the sparsity loss and visibility culling. Source: https://arxiv.org/pdf/2103.14645.pdf

Figure 3: Impact of fine-tuning (FT) the view-dependent appearance network. Source: https://arxiv.org/pdf/2103.14645.pdf

The team hopes their approach will help drive the adoption of neural scene representations across a wide range of vision and graphics applications.

