Video generation models as world simulators

We explore large-scale training of generative models on video data. Specifically, we train text-conditional diffusion models jointly on videos and images of variable durations, resolutions, and aspect ratios. We leverage a transformer architecture that operates on spacetime patches of video and image latent codes. Our largest model, Sora, is capable of generating a minute of high-fidelity video. Our results suggest that scaling video generation models is a promising path towards building general-purpose simulators of the physical world.
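The report does not include implementation details, but the core idea of operating on spacetime patches can be illustrated concretely. Below is a minimal sketch, assuming a video latent stored as a tensor of shape (T, H, W, C); the function name `extract_spacetime_patches` and the patch sizes are hypothetical choices for illustration, not Sora's actual values.

```python
import torch

def extract_spacetime_patches(latent, pt=2, ph=2, pw=2):
    """Split a video latent of shape (T, H, W, C) into a sequence of
    spacetime patch tokens of shape (N, pt*ph*pw*C).

    The patch sizes (pt, ph, pw) are illustrative; the report does not
    specify the values Sora uses.
    """
    T, H, W, C = latent.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    # Group the latent so each (pt, ph, pw) block becomes one token.
    x = latent.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    x = x.permute(0, 2, 4, 1, 3, 5, 6)        # (T/pt, H/ph, W/pw, pt, ph, pw, C)
    tokens = x.reshape(-1, pt * ph * pw * C)  # (N, patch_dim)
    return tokens

# A 16-frame latent at 32x32 spatial resolution with 4 channels yields
# 2048 tokens of dimension 32. Because the output is just a token
# sequence, its length varies with the input's duration, resolution,
# and aspect ratio, which is what lets a single transformer train on
# videos and images of different sizes.
latent = torch.randn(16, 32, 32, 4)
tokens = extract_spacetime_patches(latent)
print(tokens.shape)  # torch.Size([2048, 32])
```

The key design point this sketch captures is that spacetime patches play the role that text tokens play in language models: the transformer consumes a variable-length sequence, so heterogeneous visual data needs no resizing or cropping to a fixed shape.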