The Download: impressive new AI capabilities
OpenAI teases an amazing new generative video model called Sora
OpenAI has built a striking new generative video model called Sora that can take a short text description and turn it into a detailed, high-definition film clip up to a minute long. It’s seriously impressive-looking.
Based on four sample videos that OpenAI shared with MIT Technology Review, the firm has pushed the envelope of what’s possible with text-to-video generation (a hot new research direction that we flagged as a trend to watch in 2024).
It’s hard to know exactly how impressive a step this is until we get more information from OpenAI, and we may be waiting a while. The company currently has no plans to release Sora to the public, though it hopes to in the future. For now, mindful of the potential for misuse, OpenAI will be doing extensive safety testing. Read the full story, and check out some of the videos!
—Will Douglas Heaven
Google’s new version of Gemini can handle far bigger amounts of data
The news: Google DeepMind has launched the next generation of its powerful artificial-intelligence model Gemini, which has an enhanced ability to work with large amounts of video, text, and images.
For example: In one demonstration video shown by Google, the model was fed the 402-page transcript of the Apollo moon landing mission. Gemini was then shown a hand-drawn sketch of a boot and asked to identify the moment in the transcript that the drawing represents. The model was also able to identify moments of humor.
What it means: These sorts of AI capabilities are very impressive, Oren Etzioni, former technical director of the Allen Institute for Artificial Intelligence, told us. However, he offered one major caveat: “Never trust an AI demo.” Read the full story.