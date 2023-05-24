



Google just merged DeepMind and Google Brain into one big AI team, and on Wednesday, the new Google DeepMind shared details about how one of its visual language models (VLMs) is being used to generate descriptions for YouTube Shorts, which can help with discoverability.

Shorts are created in just a few minutes and often lack descriptions and helpful titles, which makes them harder to find through search, DeepMind said in the post. Flamingo can generate those descriptions by analyzing the first frame of a video and explaining what's going on. (DeepMind cites the example of a dog balancing a stack of crackers on its head.) The text descriptions are stored as metadata to help categorize videos better and to surface search results that match viewers' queries.

Colin Murdoch, chief business officer of Google DeepMind, told The Verge that this solves a real problem. With Shorts, creators sometimes don't add metadata because the process of making a video is more streamlined than it is for long-form videos. Todd Sherman, director of product management for Shorts, added that because Shorts are mostly watched in a feed where people simply swipe to the next video instead of actively browsing for them, there isn't as much incentive to add metadata.

Sherman says the Flamingo model's ability to understand these videos and provide descriptive text is invaluable for Google's systems that already look for this metadata. It lets those systems understand the videos more effectively, so they can match them to users when they search.

The generated descriptions are not user-facing. Sherman said this is metadata that lives behind the scenes: Google does not show it to creators, but it puts a lot of effort into making sure it is accurate. As for how Google ensures accuracy, Sherman said all of the descriptive text will align with the company's responsibility standards. He added that it is very unlikely a generated description would somehow frame a video in a bad light, and that this is not an outcome Google anticipates at all.

Flamingo is already applying auto-generated descriptions to new Shorts uploads

According to DeepMind spokesperson Duncan Smith, Flamingo is already applying auto-generated descriptions to new Shorts uploads, as well as to a large corpus of existing videos, including the most-viewed ones.

I had to ask whether Flamingo will be applied to long-form YouTube videos in the future. Sherman said that could well happen, though he thinks it's probably less necessary. He points out that for long-form videos, creators may spend hours on pre-production, shooting, and editing, so adding metadata is a relatively small part of the overall process. And because people often choose long-form videos based on their titles and thumbnails, creators have an incentive to add metadata that helps with discoverability.

So I think the answer is that we'll have to wait and see. But given Google's big push to incorporate AI into nearly everything it offers, applying something like Flamingo to long-form YouTube videos doesn't seem out of the question, and it could have a big impact on YouTube search in the future.

