
This new open-source AI, CogVideoX, could change the way we create videos forever



Researchers at Tsinghua University and Zhipu AI have released CogVideoX, an open-source text-to-video model that threatens to disrupt an AI landscape dominated by startups like Runway, Luma AI, and Pika Labs. This breakthrough, detailed in a recent arXiv paper, puts advanced video generation capabilities in the hands of developers worldwide.

CogVideoX generates high-quality, coherent videos of up to six seconds in length from text prompts. According to the researchers’ benchmarks, the model outperforms well-known competitors such as VideoCrafter-2.0 and OpenSora in several metrics.

The crown jewel of the project, CogVideoX-5B, has 5 billion parameters and produces video at 720 × 480 resolution and 8 frames per second. While those specifications fall short of the latest proprietary systems, the real innovation of CogVideoX lies in its open-source nature.

How open-source models level the playing field

By making its code and model weights publicly available, the Tsinghua team has democratized a technology that was previously exclusive to well-funded tech companies. This move could accelerate progress in AI-generated video by harnessing the collective power of the global developer community.
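
For developers who want to experiment with the release, the checkpoints are published on Hugging Face under the THUDM organization and are supported by the open-source diffusers library. The snippet below is a minimal sketch assuming that integration, a recent diffusers version, and a CUDA-capable GPU; the repository name and sampler settings shown here should be verified against the official CogVideoX repository.

```python
# Minimal sketch: generating a clip with the open-sourced CogVideoX-5B weights
# through the Hugging Face diffusers integration. Assumes a diffusers release
# with CogVideoX support, the "THUDM/CogVideoX-5b" checkpoint, and a CUDA GPU;
# check names and settings against the official repository before relying on them.
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

prompt = "A panda strumming a guitar beside a campfire at dusk, cinematic lighting"
result = pipe(
    prompt=prompt,
    num_inference_steps=50,
    guidance_scale=6.0,
)

# The pipeline returns a batch of frame sequences; export the first one at the
# model's native 8 frames per second.
export_to_video(result.frames[0], "cogvideox_sample.mp4", fps=8)
```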

The researchers achieved CogVideoX’s impressive performance through several technical innovations. They implemented a 3D Variational Autoencoder (VAE) to efficiently compress videos and developed an “Expert Transformer” to improve text-video alignment.

“To improve the matching between videos and texts, we propose an expert transformer with an expert-adaptive LayerNorm to facilitate the fusion between the two modalities,” the paper states. This advancement enables a more sophisticated interpretation of text prompts and more accurate video generation.
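
The quoted passage describes the idea at a high level. To make it concrete, the sketch below shows one common way an adaptive LayerNorm can fuse two modalities: a shared conditioning vector predicts separate scale and shift parameters for the text tokens and the video tokens before they are processed jointly. The class and variable names are hypothetical, written in PyTorch purely for illustration, and do not mirror the authors' implementation.

```python
# Hypothetical illustration of an "expert-adaptive LayerNorm": a shared
# conditioning vector produces per-modality scale/shift parameters for the
# text and video token streams before they are fused. Names are invented
# for this sketch; this is not the authors' code.
import torch
import torch.nn as nn


class ExpertAdaptiveLayerNorm(nn.Module):
    def __init__(self, hidden_dim: int, cond_dim: int):
        super().__init__()
        # Normalize without learned affine terms; the affine part is
        # predicted from the conditioning vector instead.
        self.norm = nn.LayerNorm(hidden_dim, elementwise_affine=False)
        # One "expert" modulation head per modality.
        self.text_mod = nn.Linear(cond_dim, 2 * hidden_dim)
        self.video_mod = nn.Linear(cond_dim, 2 * hidden_dim)

    def forward(self, text_tokens, video_tokens, cond):
        # cond: (batch, cond_dim), e.g. a diffusion timestep embedding.
        t_scale, t_shift = self.text_mod(cond).chunk(2, dim=-1)
        v_scale, v_shift = self.video_mod(cond).chunk(2, dim=-1)
        text = self.norm(text_tokens) * (1 + t_scale.unsqueeze(1)) + t_shift.unsqueeze(1)
        video = self.norm(video_tokens) * (1 + v_scale.unsqueeze(1)) + v_shift.unsqueeze(1)
        # Concatenate along the sequence dimension so later attention layers
        # can mix information across both modalities.
        return torch.cat([text, video], dim=1)


# Tiny smoke test with random tensors.
layer = ExpertAdaptiveLayerNorm(hidden_dim=64, cond_dim=32)
text = torch.randn(2, 16, 64)    # (batch, text_tokens, hidden)
video = torch.randn(2, 128, 64)  # (batch, video_patches, hidden)
cond = torch.randn(2, 32)
print(layer(text, video, cond).shape)  # torch.Size([2, 144, 64])
```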

The release of CogVideoX represents a significant shift in the AI landscape. Smaller companies and individual developers now have access to capabilities that were previously out of reach due to limited resources. This leveling of the playing field could unleash a wave of innovation in industries ranging from advertising and entertainment to education and scientific visualization.

The double-edged sword: Balancing innovation and ethical concerns in AI video creation

However, the widespread availability of such powerful technology is not without risks. The potential for misuse in creating deepfakes or misleading content is a real concern that the AI community must grapple with. The researchers acknowledge these ethical implications and call for responsible use of the technology.

As AI-generated videos become more accessible and sophisticated, we are entering new territory in digital content creation. The release of CogVideoX could mark a turning point, shifting the balance of power away from the larger players in the space and toward a more distributed, open-source model of AI development.

The true impact of this democratization remains to be seen. Will it usher in a new era of creativity and innovation, or exacerbate existing problems related to misinformation and digital manipulation? As the technology continues to evolve, policymakers and ethicists will need to work closely with the AI community to establish guidelines for responsible development and use.

What is certain, however, is that with the launch of CogVideoX, the future of AI-generated video is no longer confined to Silicon Valley labs. It is in the hands of developers around the world, for better or for worse.
