
Artificial intelligence & robotics

Fan Bao

He developed Vidu, a large-scale video generation model with long duration, high consistency, and high dynamism.

Year Honored
2023

Organization
ShengShu

Region
China

Hails From
China

Fan Bao has produced several internationally recognized works on diffusion models, including Analytic-DPM, U-ViT, UniDiffuser, and Vidu.

Of these, the work Fan is proudest of to date is Vidu, a large-scale video generation model with long duration, high consistency, and high dynamism. It can produce 1080p videos of up to 16 seconds in a single generation. Vidu brings together his earlier efforts on diffusion models, including his long-term work on fundamental theory, network architecture, and probabilistic modeling.

Analytic-DPM is an elegant, training-free inference framework that estimates the analytic forms of the optimal reverse variance and the corresponding KL divergence using the Monte Carlo method and a pretrained score-based model. The method proposed in Fan’s paper was adopted as a key technology in OpenAI's large-scale text-to-image generation system, DALL·E 2.
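To make the idea concrete, here is a minimal sketch of the kind of computation involved, not the authors' implementation. It assumes the DDPM sampling setting and the closed-form variance from the Analytic-DPM paper, in which the only unknown is the average squared norm of the score per dimension; that quantity is estimated by Monte Carlo. The function names, the noise schedule, and the toy "pretrained" score (the exact score of standard Gaussian data) are all illustrative assumptions.

```python
import math
import random


def estimate_gamma(score_fn, sample_x_n, dim, num_samples, rng):
    """Monte Carlo estimate of E[||score(x_n)||^2 / dim] over noisy samples x_n."""
    total = 0.0
    for _ in range(num_samples):
        x = sample_x_n(rng)
        s = score_fn(x)
        total += sum(v * v for v in s) / dim
    return total / num_samples


def optimal_reverse_variance(alphas, n, gamma_n):
    """Closed-form reverse variance at step n (1-indexed) in the DDPM setting.

    alphas[i] is alpha_{i+1} = 1 - beta_{i+1}; gamma_n is the Monte Carlo
    estimate returned by estimate_gamma for the same step.
    """
    alpha_n = alphas[n - 1]
    beta_n = 1.0 - alpha_n
    beta_bar_n = 1.0 - math.prod(alphas[:n])
    beta_bar_prev = 1.0 - math.prod(alphas[: n - 1])
    # DDPM choice of lambda_n^2: the posterior variance beta_tilde_n.
    lam2 = beta_bar_prev / beta_bar_n * beta_n
    coeff = (
        math.sqrt(beta_bar_n / alpha_n)
        - math.sqrt(max(beta_bar_prev - lam2, 0.0))
    ) ** 2
    return lam2 + coeff * (1.0 - beta_bar_n * gamma_n)


# Toy setup (assumption): data is standard Gaussian, so every noisy marginal is
# also standard Gaussian and the exact score is simply -x.
dim = 8
betas = [0.0001 + i * (0.02 - 0.0001) / 9 for i in range(10)]
alphas = [1.0 - b for b in betas]
n = 5

rng = random.Random(0)
gamma_hat = estimate_gamma(
    score_fn=lambda x: [-v for v in x],
    sample_x_n=lambda r: [r.gauss(0.0, 1.0) for _ in range(dim)],
    dim=dim,
    num_samples=20000,
    rng=rng,
)
sigma2 = optimal_reverse_variance(alphas, n, gamma_hat)
print(f"gamma estimate: {gamma_hat:.4f}, reverse variance: {sigma2:.6f}")
```

In this toy case the true reverse variance is known to equal the step's beta, which gives a simple sanity check on the estimate. With a real pretrained model, `score_fn` would be the network's score (or rescaled noise) prediction and `sample_x_n` would draw noised training data.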

On the architecture side, Fan proposed U-ViT, which combines diffusion models with the Transformer. U-ViT is the first diffusion transformer architecture and achieved state-of-the-art (SoTA) results in image generation, laying the architectural foundation for multimodal diffusion models. UniDiffuser, built on U-ViT, unifies the learning of marginal, conditional, and joint distributions into a single paradigm: predicting the noise in perturbed data. With minimal modifications to the original diffusion model, UniDiffuser can perform image, text, text-to-image, image-to-text, and image-text pair generation within a single model.
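One way to picture how a single noise-prediction model covers all five tasks is that each modality gets its own noise level, and the task only decides how those levels are set. The sketch below follows the scheme described in the UniDiffuser paper: a clean condition has timestep 0, an ignored modality is fully noised at the maximum timestep, and joint generation uses the same timestep for both. The function name and task labels are hypothetical, chosen only for illustration.

```python
def modality_timesteps(task, t, max_t):
    """Return (t_image, t_text), the noise levels fed to one shared model.

    t is the current sampling timestep; max_t is the fully noised timestep.
    """
    table = {
        "joint": (t, t),              # image-text pair: both denoised together
        "text_to_image": (t, 0),      # text is a clean condition
        "image_to_text": (0, t),      # image is a clean condition
        "image": (t, max_t),          # text fully noised, so it carries no information
        "text": (max_t, t),           # image fully noised, so it carries no information
    }
    if task not in table:
        raise ValueError(f"unknown task: {task}")
    return table[task]


for task in ("joint", "text_to_image", "image_to_text", "image", "text"):
    print(task, modality_timesteps(task, t=500, max_t=1000))
```

Because the task is encoded entirely in the pair of timesteps, the network itself needs no task-specific heads, which is consistent with the "minimal modifications" point above.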

Fan’s goal is to develop a truly deployable, general-purpose multimodal large model that understands diverse input modalities in a unified way and can therefore flexibly carry out a range of controllable generation tasks. He is currently co-founder and CTO of ShengShu, a company that specializes in the industrialization of multimodal large models.