Diffusion-Based Material Regularization for Physics-Based Inverse Rendering
ECCV 2026
- Jingwang Ling, University of Illinois Urbana-Champaign
- Lifan Wu, NVIDIA
- Feng Xu, Tsinghua University
- Shuang Zhao (corresponding author), University of Illinois Urbana-Champaign
Qualitative comparison of relighting on the Stanford-ORB dataset between our method, Neural-PBIR, and MaterialFusion. With our new material clustering regularizer, we avoid baked-in shadows while accurately modeling spatially varying materials (top). Our method is also more robust to strong highlights on glossy metallic surfaces, producing more accurate reflections (bottom).
Abstract
Reconstructing physics-based 3D assets—geometry, materials, and illumination—from multi-view images is a core problem in computer graphics and vision, and a prerequisite for realistic relighting and editing. Physics-based inverse rendering offers an accurate image-formation model, but is severely underconstrained: without strong priors, illumination is baked into materials, and reconstructions generalize poorly to novel views and lighting. Data-driven diffusion models, in contrast, predict visually plausible materials, yet their predictions rarely satisfy the rendering equation and are not directly usable for physics-based rendering. We bridge these two paradigms rather than replacing either. Our key idea is to treat the predictions of a state-of-the-art diffusion model not as target material values but as a similarity kernel for optimization: we introduce a regularization loss that penalizes deviations in the optimized material over surface regions where the diffusion predictions are near-constant, while leaving the optimization free to match the input images. Built on this regularizer, our end-to-end pipeline jointly reconstructs geometry, materials, and illumination, yielding high-quality assets that drop into standard rendering pipelines and relight faithfully. On the Synthetic4Relight, Stanford-ORB, and DTC-Synthetic datasets, our method significantly outperforms state-of-the-art baselines in both reconstruction accuracy and relighting quality.
Method Overview
From N multi-view images under unknown illumination, we (1) predict per-view intrinsic G-buffers (albedo, roughness, metallic, normal) with a conditional diffusion model; (2) reconstruct a voxel-grid SDF by neural volume rendering, supervised by the predicted normals; and (3) jointly optimize shape, spatially varying material, and an environment map by differentiable rendering, minimizing the photometric loss and our material clustering regularizer. Rather than fitting the diffusion predictions directly, our regularizer uses them as a joint bilateral filtering guide: it penalizes the rendered G-buffer for deviating from its diffusion-guided filtered version, enforcing material similarity within regions the diffusion model deems uniform. The result is a renderer-ready PBR asset that relights faithfully under novel illumination.
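To make the regularizer concrete, here is a minimal NumPy sketch of a joint-bilateral-style clustering loss. It is an illustration under stated assumptions, not the implementation used in the paper: it operates on a single image-space G-buffer, uses Gaussian spatial and guide kernels, and takes the mean squared deviation as the penalty. The names `joint_bilateral_filter` and `clustering_regularizer` and the parameters `radius`, `sigma_space`, and `sigma_guide` are hypothetical.

```python
import numpy as np

def joint_bilateral_filter(target, guide, radius=2, sigma_space=2.0, sigma_guide=0.1):
    """Smooth `target` (H, W, C) using weights computed from `guide` (H, W, D).

    Assumed kernel form: Gaussian in pixel distance times Gaussian in guide
    difference. The guide only decides which pixels are averaged together;
    its values are never copied into the output.
    """
    h, w, _ = target.shape
    pad = ((radius, radius), (radius, radius), (0, 0))
    pad_t = np.pad(target, pad, mode="edge")
    pad_g = np.pad(guide, pad, mode="edge")
    num = np.zeros_like(target, dtype=np.float64)
    den = np.zeros((h, w, 1), dtype=np.float64)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ys, xs = radius + dy, radius + dx
            nb_t = pad_t[ys:ys + h, xs:xs + w]  # neighbor values of the target
            nb_g = pad_g[ys:ys + h, xs:xs + w]  # neighbor values of the guide
            w_space = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_space ** 2))
            g_diff = np.sum((nb_g - guide) ** 2, axis=-1, keepdims=True)
            w_guide = np.exp(-g_diff / (2.0 * sigma_guide ** 2))
            weight = w_space * w_guide
            num += weight * nb_t
            den += weight
    return num / den  # den >= 1 because the center pixel always has weight 1

def clustering_regularizer(rendered, guide, **filter_kwargs):
    """Mean squared deviation of the rendered G-buffer from its guided-filtered version."""
    filtered = joint_bilateral_filter(rendered, guide, **filter_kwargs)
    return float(np.mean((rendered - filtered) ** 2))

# Toy example: the guide (standing in for a diffusion prediction) has two uniform regions.
guide = np.zeros((8, 8, 1))
guide[:, 4:] = 1.0

# The optimized material is constant per region but with values unlike the guide.
rendered = np.where(guide > 0.5, 0.3, 0.7) * np.ones((8, 8, 3))
loss_clean = clustering_regularizer(rendered, guide)

# A darkened patch inside one region mimics a baked-in shadow.
shadowed = rendered.copy()
shadowed[2:4, 0:2] *= 0.5
loss_shadowed = clustering_regularizer(shadowed, guide)
```

In this toy setup the piecewise-constant material incurs essentially no penalty even though its values differ from the guide, while the darkened patch raises the loss. That is the sense in which the guide acts as a similarity kernel rather than a target. Whether gradients flow through the filtered buffer during optimization is not specified here and is left open in the sketch.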