Xun Huang's Academic Homepage

I was previously a Research Scientist at Adobe Research, an Adjunct Professor at CMU, and a Research Scientist at NVIDIA. I obtained my PhD in Computer Science from Cornell in 2020, advised by Professor Serge Belongie.

I invented architectures and algorithms that have enabled autoregressive real-time video generation, including Self Forcing and Autoregressive Diffusion Transformers (CausVid). Previously, I developed one of the first public text-to-image demos (GauGAN2), as well as NVIDIA's first text-to-image and text-to-3D foundation models. My research has been cited over 17,000 times as of Dec 2025.

I have been working on multimodal "Generative AI" for 10 years. During my PhD, I invented Adaptive Instance Normalization (AdaIN) and was the first to demonstrate its effectiveness in generative neural networks. AdaIN became a foundational component of StyleGAN and played a key role in the first working diffusion model. Variants of AdaIN are now used in nearly all diffusion models. My PhD research was supported by the Adobe Research Fellowship (2019), the Snap Research Fellowship (2019), and the NVIDIA Graduate Fellowship (2018).

Selected/Recent Publications

MotionStream: Real-Time Video Generation with Interactive Motion Controls

arXiv 2025

Joonghyuk Shin, Zhengqi Li, Richard Zhang, Jun-Yan Zhu, Jaesik Park, Eli Shechtman, Xun Huang

[arXiv] [Project]

Learning an Image Editing Model without Image Editing Pairs

arXiv 2025

Nupur Kumari, Sheng-Yu Wang, Nanxuan Zhao, Yotam Nitzan, Yuheng Li, Krishna Kumar Singh, Richard Zhang, Eli Shechtman, Jun-Yan Zhu, Xun Huang

[arXiv] [Project]

Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion

NeurIPS 2025 (spotlight)

Xun Huang, Zhengqi Li, Guande He, Mingyuan Zhou, Eli Shechtman

[arXiv] [Project] [Code]

Long-Context State-Space Video World Models

ICCV 2025

Ryan Po, Yotam Nitzan, Richard Zhang, Berlin Chen, Tri Dao, Eli Shechtman, Gordon Wetzstein, Xun Huang

[arXiv] [Project]

X-Fusion: Introducing New Modality to Frozen Large Language Models

ICCV 2025 (Best Paper @ CVPR 2025 T4V Workshop)

Sicheng Mo, Thao Nguyen, Xun Huang, Siddharth Srinivasan Iyer, Yijun Li, Yuchen Liu, Abhishek Tandon, Eli Shechtman, Krishna Kumar Singh, Yong Jae Lee, Bolei Zhou, Yuheng Li

[arXiv] [Project]

From Slow Bidirectional to Fast Autoregressive Video Diffusion Models

CVPR 2025

Tianwei Yin*, Qiang Zhang*, Richard Zhang, William T. Freeman, Fredo Durand, Eli Shechtman, Xun Huang

[arXiv] [Project] [Code]

Magic3D: High-Resolution Text-to-3D Content Creation

CVPR 2023 (Highlight)

Chen-Hsuan Lin*, Jun Gao*, Luming Tang*, Towaki Takikawa*, Xiaohui Zeng*, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, Tsung-Yi Lin

[arXiv] [Project] [Video]

eDiff-I: Text-to-Image Diffusion Models with Ensemble of Expert Denoisers

arXiv 2022

Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Qinsheng Zhang, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, Tero Karras, Ming-Yu Liu

[arXiv] [Project] [Video]

Multimodal Conditional Image Synthesis with Product-of-Experts GANs

ECCV 2022

Xun Huang, Arun Mallya, Ting-Chun Wang, Ming-Yu Liu

[arXiv] [Project] [Video] [Two Minute Papers]

PointFlow: 3D Point Cloud Generation with Continuous Normalizing Flows

ICCV 2019 (Oral)

Guandao Yang*, Xun Huang*, Zekun Hao, Ming-Yu Liu, Serge Belongie, Bharath Hariharan

[arXiv] [Code] [Video]

Multimodal Unsupervised Image-to-Image Translation

ECCV 2018

Xun Huang, Ming-Yu Liu, Serge Belongie, Jan Kautz

[arXiv] [Code] [Video]

Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization

ICCV 2017 (Oral)

Xun Huang, Serge Belongie

[arXiv] [Code]

Stacked Generative Adversarial Networks

CVPR 2017

Xun Huang, Yixuan Li, Omid Poursaeed, John Hopcroft, Serge Belongie

[arXiv] [Code]

* indicates equal contribution.
See Google Scholar for the full list of publications.

Xun Huang (/shuun hwang/)
