Since we first announced captions in Google Video and YouTube, we’ve introduced multiple caption tracks, improved search functionality and even automatic translation. Each of these features has had great personal significance to me, not only because I helped to design them, but also because I’m deaf. Today, I’m in Washington, D.C. to announce what I consider the most important and exciting milestone yet: machine-generated automatic captions.
Since the original launch of captions in our products, we’ve been happy to see the number of captioned videos on our services grow into the hundreds of thousands. This suggests that more and more people are becoming aware of how useful captions can be. As we’ve explained in the past, captions not only help the deaf and hearing impaired, but with machine translation, they also enable people around the world to access video content in any of 51 languages. Captions can also improve search and even enable users to jump to the exact parts of the videos they’re looking for.
However, like everything YouTube does, captions face a tremendous challenge of scale. Every minute, 20 hours of video are uploaded. How can we expect every video owner to spend the time and effort necessary to add captions to their videos? Even with all of the captioning support already available on YouTube, the majority of user-generated video content online is still inaccessible to people like me.
To help address this challenge, we’ve combined Google’s automatic speech recognition (ASR) technology with the YouTube caption system to offer automatic captions, or auto-caps for short. Auto-caps use the same voice recognition algorithms used in Google Voice to automatically generate captions for video. The captions will not always be perfect (check out the video below for an amusing example), but even when they’re off, they can still be helpful, and the technology will continue to improve with time.
In addition to automatic captions, we’re also launching automatic caption timing, or auto-timing, to make it significantly easier to create captions manually. With auto-timing, you no longer need to have special expertise to create your own captions in YouTube. All you need to do is create a simple text file with all the words in the video and we’ll use Google’s ASR technology to figure out when the words are spoken and create captions for your video. This should significantly lower the barriers for video owners who want to add captions, but who don’t have the time or resources to create professional caption tracks.
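To make the idea of auto-timing a little more concrete, here is a minimal sketch in Python of the final step only: turning words that already have timestamps into caption cues. It is an illustration, not how YouTube's system is implemented. The speech recognition and alignment are not shown, the `TimedWord` type and `words_to_srt` function are hypothetical names, the sample timings are invented, and SubRip (SRT) is used simply because it is a common plain-text caption format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimedWord:
    """One transcript word with the (assumed) times an aligner found for it."""
    text: str
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[TimedWord], max_words: int = 7,
                 max_gap: float = 1.0) -> str:
    """Group timed words into caption cues and render them as SRT text."""
    cues: List[List[TimedWord]] = []
    for word in words:
        current = cues[-1] if cues else None
        # Start a new cue when the current one is full or after a long pause.
        if (current is None or len(current) >= max_words
                or word.start - current[-1].end > max_gap):
            cues.append([word])
        else:
            current.append(word)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        line = " ".join(w.text for w in cue)
        span = f"{srt_timestamp(cue[0].start)} --> {srt_timestamp(cue[-1].end)}"
        blocks.append(f"{index}\n{span}\n{line}")
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Invented timings, standing in for what an aligner might return.
    sample = [
        TimedWord("Captions", 0.0, 0.5),
        TimedWord("help", 0.55, 0.8),
        TimedWord("everyone", 0.85, 1.4),
        TimedWord("Try", 3.0, 3.2),
        TimedWord("them", 3.25, 3.5),
    ]
    print(words_to_srt(sample))
```

In this sketch, a pause longer than `max_gap` seconds or a cue reaching `max_words` words starts a new caption, which keeps each caption short enough to read. Both thresholds are arbitrary choices for illustration.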
To learn more about how to use auto-caps and auto-timing, check out this short video and our help center article.
You should see both features available in English by the end of the week. For our initial launch, auto-caps are only visible on a handful of partner channels (list below*). Because auto-caps are not perfect, we want to make sure we get feedback from both viewers and video owners before we roll them out more broadly. Auto-timing, on the other hand, is rolling out globally for all English-language videos on YouTube. We hope to expand these features for other channels and languages in the future. Please send us your feedback to help make that happen.
Today I’m more hopeful than ever that we’ll achieve our long-term goal of making videos universally accessible. Even with its flaws, I see the addition of automatic captioning as a huge step forward.
* Partners for the initial launch of auto-caps: UC Berkeley, Stanford, MIT, Yale, UCLA, Duke, UCTV, Columbia, PBS, National Geographic, Demand Media, UNSW and most Google & YouTube channels.
Posted by Ken Harrenstien, Software Engineer