Text-to-Scene Generation

Project Overview

We aim to develop AI-assisted tools that generate 2D/3D visual data, including images, 3D models, and scenes, from textual descriptions. Key technologies to explore include cross-modal matching and feature modeling across text, images, and 3D geometry, as well as semantics-constrained generative models built on GANs and diffusion models.
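
As a rough illustration of what "semantics-constrained generation" means in practice, the following minimal PyTorch sketch conditions a GAN generator on a sentence embedding: the embedding is compressed into a conditioning vector, concatenated with a noise vector, and decoded into an image. All layer sizes, class names, and dimensions here are hypothetical and do not reproduce any model from the papers listed below.

# Minimal, illustrative sketch of a text-conditioned GAN generator.
# NOT the architecture from any publication below; all sizes are hypothetical.
import torch
import torch.nn as nn

class TextConditionedGenerator(nn.Module):
    """Maps a noise vector plus a sentence embedding to a small RGB image."""
    def __init__(self, noise_dim=100, text_dim=256, cond_dim=128):
        super().__init__()
        # Compress the sentence embedding into a conditioning vector.
        self.condition = nn.Sequential(
            nn.Linear(text_dim, cond_dim),
            nn.LeakyReLU(0.2, inplace=True),
        )
        # Project noise + condition to a 4x4 feature map, then upsample to 64x64.
        self.fc = nn.Linear(noise_dim + cond_dim, 512 * 4 * 4)
        self.upsample = nn.Sequential(
            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1),  # 4x4 -> 8x8
            nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),  # 8x8 -> 16x16
            nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),   # 16x16 -> 32x32
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),     # 32x32 -> 64x64
            nn.Tanh(),  # pixel values in [-1, 1]
        )

    def forward(self, noise, text_embedding):
        cond = self.condition(text_embedding)
        x = self.fc(torch.cat([noise, cond], dim=1)).view(-1, 512, 4, 4)
        return self.upsample(x)

# Usage: one batch of fake images from random noise and dummy text embeddings.
g = TextConditionedGenerator()
fake = g(torch.randn(8, 100), torch.randn(8, 256))
print(fake.shape)  # torch.Size([8, 3, 64, 64])

In a real system the dummy text embeddings would come from a pretrained text encoder, and the generator would be trained adversarially against a discriminator that also sees the text.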

Publications

"Semantics-enhanced Adversarial Nets for Text-to-Image Synthesis,"

H. Tan, X. Liu, X. Li, Y. Zhang, and B. Yin,

International Conference on Computer Vision (ICCV), pp. 10501-10510, 2019.

[Paper]



"KT-GAN: Knowledge-Transfer Generative Adversarial Network for Text-to-Image Synthesis,"

H. Tan, X. Liu, M. Liu, B. Yin, and X. Li,

IEEE Transactions on Image Processing (TIP), Vol. 30, pp. 1275-1290, 2021.

[Paper]



"Cross-modal Semantic Matching Generative Adversarial Networks for Text-to-Image Synthesis,"

H. Tan, X. Liu, B. Yin, and X. Li,

IEEE Transactions on Multimedia (TMM), Vol. 24, pp. 832-845, 2022.

[Paper]


"DR-GAN: Distribution Regularization for Text-to-Image Generation,"

H. Tan, X. Liu, B. Yin, and X. Li,

IEEE Transactions on Neural Networks and Learning Systems (TNNLS), DOI: 10.1109/TNNLS.2022.3165573, 2022.

[Paper]


"LE-GAN: Layout-enhanced Adversarial Network for Text-to-Image Synthesis,"

H. Tan, B. Yin, and X. Li,

Submitted to IEEE Transactions on Multimedia (under revision), 2023.

