We introduce ROGR, a novel approach that reconstructs a relightable 3D model of an object captured from multiple views, driven by a generative relighting model that simulates the effects of placing the object under novel environment illuminations. Our method samples the object's appearance under multiple lighting environments, creating a dataset that is used to train a lighting-conditioned Neural Radiance Field (NeRF) that outputs the object's appearance under any input environmental lighting. The lighting-conditioned NeRF uses a novel dual-branch architecture that encodes general lighting effects and specularities separately.
The optimized lighting-conditioned NeRF enables efficient feed-forward relighting under arbitrary environment maps without requiring per-illumination optimization or light transport simulation. We evaluate our approach on the established TensoIR and Stanford-ORB datasets, where it improves upon the state of the art on most metrics, and we showcase our approach on real-world object captures.
Our multi-view relighting diffusion model takes as input N posed images of an object under a consistent but unknown illumination, each represented by its camera raymap and source pixel values, along with a per-image environment map rotated into the camera's frame. The diffusion model generates images of the same object from the same poses, but lit by the input environment map. To generate our multi-illumination dataset, we repeat this relighting process M times with M different environment maps.
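The dataset-generation loop above can be sketched as follows. This is a minimal illustration, not the actual pipeline: `relight_fn` is a hypothetical stand-in for the multi-view relighting diffusion model, and the array shapes are assumptions.

```python
import numpy as np

def generate_multi_illumination_dataset(images, raymaps, env_maps, relight_fn):
    """Relight N posed views under each of M target environment maps.

    images:   (N, H, W, 3) source pixel values under the unknown illumination
    raymaps:  (N, H, W, 6) per-pixel camera ray origins and directions
    env_maps: list of M environment maps (each already rotated per camera pose)
    relight_fn: stand-in for the diffusion model; conditions on all N views
                jointly and returns (N, H, W, 3) relit images
    """
    dataset = []
    for env in env_maps:  # one pass per target illumination
        relit = relight_fn(images, raymaps, env)
        dataset.append({"env_map": env, "images": relit})
    return dataset
```

The resulting M sets of N consistently relit views form the supervision for fitting the lighting-conditioned NeRF.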
We train the NeRF on our generated multi-illumination dataset using a combination of two lighting conditioning signals. The general lighting encoding f_general encodes the full environment map into a single embedding, obtained with a transformer encoder applied to the entire sphere of incident radiance. The specular encoding f_specular consists of the environment map value, along with prefiltered versions of the environment map, queried at the reflection direction ω_r, i.e., the direction of the camera ray reflected about the surface normal vector. Together, these two conditioning signals provide the NeRF with all the information necessary to relight diffuse materials as well as shiny ones that exhibit strong reflections.
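The specular conditioning can be sketched in a few lines. This is an illustrative assumption about the interface, not our exact implementation: `env_map_lookup(direction, level)` is a hypothetical helper that returns the (prefiltered) incident radiance along a direction, with `level` selecting the blur level.

```python
import numpy as np

def reflection_direction(view_dir, normal):
    """Reflect the camera ray direction about the surface normal:
    omega_r = d - 2 (d . n) n, with d and n normalized."""
    d = view_dir / np.linalg.norm(view_dir, axis=-1, keepdims=True)
    n = normal / np.linalg.norm(normal, axis=-1, keepdims=True)
    return d - 2.0 * np.sum(d * n, axis=-1, keepdims=True) * n

def specular_encoding(env_map_lookup, view_dir, normal, blur_levels):
    """Build f_specular by querying the environment map and its
    prefiltered versions at the reflection direction omega_r."""
    omega_r = reflection_direction(view_dir, normal)
    feats = [env_map_lookup(omega_r, level) for level in blur_levels]
    return np.concatenate(feats, axis=-1)
```

Sharper blur levels dominate for mirror-like materials, while heavily prefiltered queries approximate the response of rougher surfaces.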
Neural Gaffer 1-view diffusion
Our 16-view diffusion
Ground truth
General and Specular Conditioning
Only Specular Conditioning
Only General Conditioning
Check out the following works, which also introduce relighting diffusion models.
DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation (single image relighting with radiance cues)
Neural Gaffer: Relighting Any Object via Diffusion (single-image relighting; needs to be re-optimized for novel lighting)
A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis (single-image relighting; needs to be re-optimized for novel lighting)
IllumiNeRF: 3D Relighting Without Inverse Rendering (single-image relighting with radiance cues; needs to be re-optimized for novel lighting)
RelitLRM: Generative Relightable Radiance for Large Reconstruction Models (directly generate relightable NeRF from sparse views and the target illumination, but does not guarantee consistent geometry across environment maps.)
DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models (video diffusion model for relighting, but lacks 3D consistency and has slow inference)
We would like to thank Xiaoming Zhao, Rundi Wu, Songyou Peng, Ruiqi Gao, Ben Poole, Aleksander Holynski, Jason Zhang, Jonathan T. Barron, Stan Szymanowicz, Hadi Alzayer, Alex Trevithick, and Jiahui Lei for their valuable contributions. We also extend our gratitude to Shlomi Fruchter, Kevin Murphy, Mohammad Babaeizadeh, Han Zhang, and Amir Hertz for training the base text-to-image latent diffusion model.
@inproceedings{tang2025rogr,
author = {Jiapeng Tang and Matthew Levine and Dor Verbin and Stephan J. Garbin and Matthias Niessner and Ricardo Martin-Brualla and Pratul P. Srinivasan and Philipp Henzler},
title = {{ROGR: Relightable 3D Objects using Generative Relighting}},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
}