Yingshu Chen,
Huajian Huang†,
Tuan-Anh Vu,
Ka-Chun Shum,
Sai-Kit Yeung
The Hong Kong University of Science and Technology
†Corresponding author
Abstract: Creating large-scale virtual urban scenes in varied styles is inherently challenging. To facilitate prototyping for virtual production and bypass the need for complex material and lighting setups, we introduce StyleCity, the first vision-and-text-driven texture stylization system for large-scale urban scenes. Taking an image and text as references, StyleCity stylizes a 3D textured mesh of a large-scale urban scene in a semantics-aware fashion and generates a harmonious omnidirectional sky background. To achieve this, we propose to stylize a neural texture field by transferring 2D vision-and-text priors to 3D, both globally and locally. During 3D stylization, we progressively scale the planned training views of the input 3D scene at different levels to preserve high-quality scene content. We then optimize the scene style globally by adapting the scale of the style image to the scale of the training views. Moreover, we enhance local semantic consistency with a semantics-aware style loss, which is crucial for photorealistic stylization. Besides texture stylization, we further adopt a generative diffusion model to synthesize a style-consistent omnidirectional sky image, which offers a more immersive atmosphere and assists the semantic stylization process. The stylized neural texture field can be baked into an arbitrary-resolution texture, enabling seamless integration into conventional rendering pipelines and significantly easing virtual production prototyping. Extensive experiments demonstrate the superiority of our stylized scenes in qualitative and quantitative performance and in user preference.
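To give a rough feel for what a semantics-aware style loss can look like, here is a minimal, dependency-free sketch. It is not the StyleCity implementation (that code has not been released yet). It assumes a common Gram-matrix formulation of style loss and simply computes it per semantic class, so that, for example, sky features are only compared with sky features. The function names `gram` and `semantic_style_loss`, the toy feature vectors, and the rule of skipping classes absent from either image are all assumptions made for illustration.

```python
# Illustrative sketch only: NOT the StyleCity implementation.
# Features are plain Python lists so the example runs without dependencies.

def gram(vectors):
    """Mean outer product (Gram matrix) of a list of C-dimensional feature vectors."""
    c = len(vectors[0])
    n = len(vectors)
    return [[sum(v[i] * v[j] for v in vectors) / n for j in range(c)]
            for i in range(c)]


def semantic_style_loss(render_feats, render_labels, style_feats, style_labels):
    """Mean squared Gram difference, accumulated separately per semantic class.

    Classes missing from either image are skipped (an assumption of this sketch).
    """
    shared = set(render_labels) & set(style_labels)
    loss = 0.0
    for cls in sorted(shared):
        # Keep only the features whose pixel belongs to the current class.
        g_r = gram([f for f, l in zip(render_feats, render_labels) if l == cls])
        g_s = gram([f for f, l in zip(style_feats, style_labels) if l == cls])
        c = len(g_r)
        loss += sum((g_r[i][j] - g_s[i][j]) ** 2
                    for i in range(c) for j in range(c)) / (c * c)
    return loss


# Toy data: one 2-D feature vector per pixel, with a semantic label per pixel.
render_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
render_labels = ["sky", "building", "building"]
style_feats = [[1.0, 0.0], [0.0, 2.0]]
style_labels = ["sky", "building"]

loss = semantic_style_loss(render_feats, render_labels, style_feats, style_labels)
print(loss)
```

In this toy case the sky regions have identical statistics and contribute nothing, so the whole loss comes from the building class. In a real pipeline the features would come from a pretrained image encoder and the labels from a segmentation model, and the loss would be minimized with respect to the texture parameters.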
- ⏳ Coming soon: StyleCity source code and samples
- ⏳ Coming soon: Segmentation model and 2D-to-3D segmentation script
- ⏳ Coming soon: Omnidirectional sky synthesis source code
- Project page (Full materials: paper, poster, video, demo, etc.)
- Data collector (Google Tiles API)
If you find our tool or work useful in your research, please consider citing:
@inproceedings{chen2024stylecity,
  title={StyleCity: Large-Scale 3D Urban Scenes Stylization},
  author={Chen, Yingshu and Huang, Huajian and Vu, Tuan-Anh and Shum, Ka Chun and Yeung, Sai-Kit},
  booktitle={Proceedings of the European Conference on Computer Vision},
  year={2024}
}
We are very grateful for the source code and outstanding contributions of DPST, SyncDiffusion, Mask2Former, and PyTorch3D.