Convert WebVTT to a Transcript using Python


YouTube showing subtitles.

I want to convert YouTube's auto-generated subtitles into a plain transcript. Why is this so hard? This blog post gives a more detailed explanation than my answer to this Stack Overflow question.

Here's what the subtitles look like when you view a video (shown above). And here's what the WebVTT source behind those subtitles looks like:

00:00:00.930 --> 00:00:03.080 align:start position:0%
and<00:00:01.230><c> now</c><00:00:01.439><c> can</c><00:00:01.709><c> we</c><00:00:01.800><c>…
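To make the problem concrete, here is a minimal sketch of one way to strip that markup with Python's standard `re` module. It is not necessarily the approach the full post takes. The function name `vtt_to_transcript`, the two regular expressions, and the rule of dropping consecutive duplicate lines are all assumptions made for illustration; the sample input reuses only the part of the cue that is visible above.

```python
import re

# Assumed pattern for a cue timing line, e.g.
# "00:00:00.930 --> 00:00:03.080 align:start position:0%"
TIMING_LINE = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}:\d{2}\.\d{3}")

# Assumed pattern for inline tags such as <00:00:01.230>, <c> and </c>
INLINE_TAG = re.compile(r"<[^>]+>")


def vtt_to_transcript(vtt_text: str) -> str:
    """Return the spoken text of a WebVTT document as a single string."""
    lines = []
    for raw in vtt_text.splitlines():
        line = raw.strip()
        # Skip blank lines, the file header, and cue timing lines.
        if not line or line.startswith("WEBVTT") or TIMING_LINE.match(line):
            continue
        # Remove inline timestamps and <c> styling tags.
        text = INLINE_TAG.sub("", line).strip()
        # Assumption: a line identical to the previous one is a repeat, so drop it.
        if text and (not lines or lines[-1] != text):
            lines.append(text)
    return " ".join(lines)


# Hypothetical input built from the visible part of the cue shown above.
sample = """WEBVTT

00:00:00.930 --> 00:00:03.080 align:start position:0%
and<00:00:01.230><c> now</c><00:00:01.439><c> can</c><00:00:01.709><c> we</c>
"""

print(vtt_to_transcript(sample))
```

Dropping only consecutive duplicates is a deliberately simple choice. Real files may contain other header lines or repeat text in ways this sketch does not handle.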
