Speech-to-text (STT) technology is in its moment of combustion, moving from early adopters to widespread, mainstream use. Many high-quality products are out there already. As it becomes the new normal, will we make room for the Deaf, or repeat history?
The value of this precise moment is that you have the opportunity to witness or participate in what will become the new normal for these technologies.
Here’s the deal: Google Meet now offers free, live, automatic closed captioning (CC). This is a great win against the language barrier between sign and speech, and it comes on top of an already great improvement: Google’s YouTube has had auto CC for …a year? More, maybe. Otter.ai is an excellent live STT/CC tool, and the free version gives 600 minutes of transcription: good, but not perfect. Zoom, similarly, hopes to make money from CC technology.
All of these are pretty damn good, maybe 90-95% accurate, but the remaining 5-10% is still a problem.
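To put that 5-10% in perspective, here's a rough back-of-the-envelope sketch in Python. It rests on two assumptions of mine, not anything these companies publish: that "accuracy" means one minus the word error rate (wrong, missing, or extra words divided by the words actually spoken), and that people talk at roughly 150 words per minute. The function names are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (minus the current one) and the first j caption words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # caption dropped a spoken word
            insertion = curr[j - 1] + 1  # caption added a word nobody said
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def errors_per_minute(accuracy: float, words_per_minute: int = 150) -> float:
    """Expected wrong words per minute; 150 wpm is an assumed speaking rate."""
    return (1.0 - accuracy) * words_per_minute


if __name__ == "__main__":
    spoken = "the cat sat on the mat"
    caption = "the cat sat on a mat"
    print(f"WER: {word_error_rate(spoken, caption):.2f}")
    for accuracy in (0.95, 0.90):
        print(f"{accuracy:.0%} accurate: about {errors_per_minute(accuracy):.1f} wrong words per minute")
```

Under those assumptions, "90-95% accurate" works out to somewhere between 7 and 15 wrong words every minute, which is plenty to derail a conversation if captions are your only way in.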
Now, there’s a product I recently discovered and am obsessed with: Descript. It’s a video editor that transcribes your footage and then lets you edit the text, which simultaneously edits the video. (It does an amazing job! I LOVE IT. No more “um”s and “uh”s. Delete them.) This is revolutionary for editing high-volume video content. Descript’s market is the amateur videographer, vlogger, and influencer: the 21st-century gossiper. The product can also render a CC file, making accessibility literally the click of a button. While it’s still in early adoption, the free version is a good deal, but you’ll need to pay if you come to rely on it.
As the market evolves beyond early adopters, will we see more closed captioning, or will it be put behind a paywall, as Zoom has done? Even if it is free and accessible, will it simply be too inconvenient to add to our videos? It’s easy to hate on Google, but I’m seeing a pattern: Google is both a trendsetter and on the right side of equity. I’m a supporter. This is a key moment in technology. Keep your eyes open and your words transcribed.