Veo (text-to-video model)

Veo
Developer(s)	Google DeepMind
Initial release	2024
Type	Text-to-video model
Website	deepmind.google/models/veo/

Veo is a text-to-video model developed by Google DeepMind. Like all text-to-video models, it uses generative artificial intelligence to generate video from text prompts written by the user.

Development

In May 2024, Google announced Veo, a multimodal video generation model, at Google I/O 2024.^[1] Google claimed that it could generate 1080p videos more than a minute long.^[1] In December 2024, Google released Veo 2, available via VideoFX. It supports 4K resolution video generation and has an improved understanding of physics.^[2] In April 2025, Google announced that Veo 2 had become available to advanced users in the Gemini app.^[3] In May 2025, Google released Veo 3, which generates not only video but also synchronized audio (including dialogue, sound effects, and ambient noise) to match the visuals.^[4]^[5] Google also announced Flow, a video-creation tool powered by Veo and Imagen.^[6]

A key innovation of the May 2025 release of Veo 3 was that it generated music and voice matched to the video.^[7] Google DeepMind CEO Demis Hassabis described the release as the moment when AI video generation left the era of the silent film.^[7]

Reactions

A reporter for Gizmodo reacted to the release of Veo 3 by observing that users directed the model to generate low-quality content, such as man-on-the-street interviews or haul videos of people unboxing products.^[8] Another media commentator reported that the tool tended to repeat the same joke in response to different prompts.^[9]

Commentators speculated that Google had trained the service on YouTube videos^[7] or Reddit posts.^[9] Google itself had not stated the source of its training content.^[7]