Veo is a text-to-video model developed by Google DeepMind. Like all text-to-video models, it uses generative artificial intelligence to produce video from text prompts written by the user.
In May 2024, Google announced Veo, a multimodal video generation model, at Google I/O 2024.[1] Google claimed that it could generate 1080p videos more than a minute long.[1] In December 2024, Google released Veo 2, available via VideoFX. It supports 4K resolution video generation and has an improved understanding of physics.[2] In April 2025, Google announced that Veo 2 had become available to Gemini Advanced users in the Gemini app.[3] In May 2025, Google released Veo 3, which generates not only video but also synchronized audio, including dialogue, sound effects, and ambient noise, to match the visuals.[4][5] Google also announced Flow, a video-creation tool powered by Veo and Imagen.[6]
A key innovation of the May 2025 release of Veo 3 was its ability to generate music and voice matched to the video.[7] Google DeepMind CEO Demis Hassabis described the release as the moment when AI video generation left the era of the silent film.[7]
A reporter for Gizmodo reacted to the release of Veo 3 by observing that users directed the model to generate low-quality content, such as man-on-the-street interviews or haul videos of people unboxing products.[8] Another media commentator reported that the tool tended to repeat the same joke in response to different prompts.[9]
Commentators speculated that Google had trained the service on YouTube videos[7] or Reddit posts.[9] Google itself had not stated the source of its training content.[7]