There’s a lot of interesting content produced on video these days. Invariably, though, when we post a video, some comments appear lamenting that video isn’t the most efficient way to disseminate technical information. We have mixed feelings. Some things, a screencast for example, benefit from being seen. Some people like the human connection of seeing an instructor interact with a class instead of just reading. But we will admit that sometimes a video takes longer to watch, especially if it is full of pauses. Unsilence is a tool from [lagmoellertim] that can fix that. The command line tool takes a video and shortens the parts that are silent. You can also use it as a Python library if you want to build your own tools using the technique.
If you’ve ever taken a class online, you have probably sped up a video so you can get through class faster. This works to a point, but removing or speeding up silent gaps means you don’t have to “listen faster.” Of course, you could still speed up the video, too.
The tool can tell silent content from audible content and can perform several operations on each. By default, it speeds up silent parts by a factor of 6. You can change the speed of either kind of content, of course. You can also change the volume of each, presumably so you can mute the silent parts. Watching the silent parts race by is disconcerting at first, but after a while you realize that, in many cases, it helps you follow what’s going on.
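To make the effect on running time concrete, here is a minimal Python sketch. It does not use the Unsilence API: the `Segment` type and the `processed_duration` function are hypothetical names made up for illustration, and the list of segments is invented data. It only shows how playing silent stretches at 6 times normal speed shrinks the total length.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A stretch of video in seconds, labeled silent or audible (hypothetical type)."""
    duration: float
    silent: bool


def processed_duration(segments, audible_speed=1.0, silent_speed=6.0):
    """Return the total running time once each segment plays at its own speed."""
    total = 0.0
    for seg in segments:
        speed = silent_speed if seg.silent else audible_speed
        total += seg.duration / speed
    return total


# Invented example: 50 s of speech with two 6 s pauses.
segments = [
    Segment(20.0, silent=False),
    Segment(6.0, silent=True),
    Segment(30.0, silent=False),
    Segment(6.0, silent=True),
]

original = sum(seg.duration for seg in segments)
shortened = processed_duration(segments)
print(f"{original:.0f} s -> {shortened:.0f} s")  # prints "62 s -> 52 s"
```

Lowering `silent_speed` to 2 or 3 in this sketch keeps more of each pause at the cost of a smaller saving.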
As an example, an MIT Python lecture (see videos below) clocks in at 9:45, but after processing takes under 8 minutes. Saving not quite two minutes might not sound like a lot, but for such a short clip it works out to almost 19%. For an hour-long lecture, the same ratio would add up to nearly 12 minutes. Of course, a lot will depend on the style of the speaker and the video. Some videos may save more time; others less.
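The arithmetic behind those figures can be checked with a few lines of Python. This is only a rough sketch: the 19% is the rounded figure quoted above, the helper name `to_seconds` is made up for illustration, and the last step assumes the clip was processed at the default silent speed of 6, which the example does not state.

```python
def to_seconds(mmss):
    """Convert a "minutes:seconds" string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


original = to_seconds("9:45")        # 585 s, the length quoted above
saving_fraction = 0.19               # "almost 19%", a rounded figure

saved = original * saving_fraction   # roughly 111 s, "not quite two minutes"
hour_saved = 60 * saving_fraction    # minutes saved on a 60-minute lecture

# Assumption: silence played at 6x, so each silent second saves 5/6 of a second.
silent_speed = 6.0
silent_estimate = saved / (1 - 1 / silent_speed)

print(f"saved about {saved:.0f} s of {original} s")
print(f"an hour-long lecture would save about {hour_saved:.1f} minutes")
print(f"implied silent time: about {silent_estimate:.0f} s")
```

Under that assumption, a little under a quarter of the original clip would have been classified as silent.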
Unfortunately, you do need the video file locally, so if you want to apply this to a YouTube video, you’ll need a way to download it first. That’s relatively easy to do, but it kills the immediacy of just watching a video in your browser.
Now if we could just skip the commercials. Then again, some of our favorite videos have no words, but they do sometimes have music, and that would prevent the tool from working.
Not that convincing, imho. Cannot drop repetitions that are mandatory to catch live audience attention on a topic, and doesn’t address the issue of “I know I’ve heard this bearded guy talking about that interesting topic in his lab” into something we can search for.
Good point. I never considered it but “beard guy with those ring toy things” isn’t likely to find the videos.
Now I’m trying to figure if it’d be scarier for that to work…
1. Typo in name
2. “for such a short clip it works out to almost 19%” – it’s a percentage, the length of the video does not matter.
Comparative data between lectures of the same professor or subjects or .. would be interesting.
sox can detect and remove silence in audio. Not sure about the video libraries.
A percentage wouldn’t change if you assume silence is evenly distributed in all clips, but I don’t think that’s a true assumption. I think a longer video is more likely to have pauses. At least when I lecture that’s true. If I’m talking all day there’s going to be more time when I’m not talking for a variety of reasons. I might be writing on the whiteboard. I might be setting up the next set of PowerPoint charts. So I don’t have anything to prove that longer videos would have more dead time in them, but my experience suggests that is probably true. Do you have any proof that silence is evenly distributed in lecture videos?
“Saving not quite two minutes might not sound like a lot, but for such a short clip it works out to almost 19%”
Saving two minutes would work out to be a different percent of a different-length clip.
I can see the potential, but it seems not at all ready for real use. In the example, the pauses that are cut are reduced to nearly zero, leaving no pause at all, and leaving unnatural messes at almost every cut that make it difficult to accurately identify the words immediately on either side of the cut. I think it would be much more intelligible if only pauses over a certain length were reduced, AND they were reduced only to that minimum length. Also, since this is video, not just audio, rather than doing simple cuts – called jump cuts in the editing world because of their jarring nature – the video could be resampled at a higher rate to leave the pause at the desired minimum length without a sudden visual jump.
After all, the objective usually isn’t to hear something as quickly as possible, but to understand it as quickly as possible.
Absolutely not a fan of jump cuts and the end of didactic pauses. I’d like to mention TED as a source of high quality reference material. My first impulse is to blame advertising for this deterioration of prosody.
Well, you can control the speed of the silent parts which I think by default is 6X but you could change it to 2 or 3X and get less of a jumpy effect although you would also get less of a speed up.
There is a similar project from last year made by a YouTuber called carykh: https://github.com/carykh/jumpcutter
https://www.youtube.com/watch?v=DQ8orIurGxw
Wondering if he’s doing Hanoi’s tower?
Neat idea. Who else watches YouTube at 2X playback speed?
When I can, some I have to knock back to 1.75 to make the words out though.
It would be nice to see something like this that also increases their cadence and removes ums and arrrrhrrhrhrhrrs.
They should be fairly easy to detect, you just need the right training set.
“Uh…. yeah,.. umm, (something mumbled) training set.
That ought to do it, eh?”
B^)
Marco Arment built a realllllly nice “smart speed” option into his iOS podcast app, Overcast. It averages a 1.2x (for the content I listen to) speed up in audio content. My only qualm with it is that certain interstitial music can sound really weird, but other than that it is very naturalistic and does not sound clipped at all. He has talked about his algorithm a bit in his podcasts, but I’d love to see it applied to video.
This sounds like a nice tool but I don’t think it’s going to make people who prefer text suddenly like videos as is sort of implied by the article. I think that has more to do with a preferred learning style which is hard coded in the brain. Some of us just don’t take in information as easily from a lecture as we do from reading a book. Others are the opposite.
this so much. a video absolutely is NOT a replacement for a textual walkthrough with supplementary static art or short-videos demonstrating tasks-in-motion
Pause is often used in teaching as a time to think about what you just heard because it was something important. When you read, you can insert additional pauses anywhere you like. For me, speeding up video has no value.
Real life doesn’t have a pause button, but youtube etc does.
If only the indication of where the pause was supposed to be was still there.
But I know when I need to stop and when I don’t, do you have to buy your TV dinners with a chewing and swallowing schedule?
Well it would be good…. if the output video was not full of atrocious audio artifacts, some of them stupidly loud compared to the rest of the audio, making it unusable!
It’s almost like the processing function didn’t decode and reencode the video, but, for the sped-up parts, just brute-force cut out every nth audio and video frame with no regard of the audio compression algorithm. But then you’d have to wonder why the video isn’t glitching out as well.
It’s certainly … not great … for a demonstration of “save time on spoken words” if, two seconds in, it makes you rip your headphones off your ears in a panic.
When most people speak, the pauses are there for a reason. To let you digest the information you were just given. I am a big fan of speeding up videos. I do it all the time. This preserves the pauses. They get smaller, but they are still there. It would drive me bat shit crazy to have to listen to too much video with all the pauses snipped out. What might be interesting though would be to use this to figure out when to turn on and off the audio track and perhaps save some bandwidth.
Or you can play it picture-in-picture at the same time.
You can do this on audio files with Audacity. It has a filter that can remove silences longer than a configurable length. Very useful for podcasts, DnD sessions and other unscripted content. With audio you’ll never know it’s there, it makes stuff so much better to listen to, makes the talkers seem more attentive, more on-point.
Back in my day – the dark ages of 10-15 years ago – we had this crazy thing called “the written word.” With a skill called “skimming” you could quickly learn if a unit of information was relevant in the context of what you wanted to learn, and rapidly pick out the important parts. But it gets better! When it was digitized and indexed, you could easily search for key terms and concepts, increasing the odds that the bit of information you were looking at contained the information you were interested in! If there was a lot of fluff in the text, you could frequently find condensed summaries that referenced the original source. On the internet they called them “Blogs.” Some of these are even still around. I recommend checking out “hackaday.com”
For Android there’s a YouTube frontend app called NewPipe on F-Droid which has fast forward during silence baked in. It works on the fly and can be easily toggled on and off. Works better than the demo above.
https://newpipe.schabi.org/
After reading this article, I stumbled across this: https://www.youtube.com/watch?v=r-rSOSx--0w
This can do it in real time on YouTube videos, can also add a general speed factor, and does not result in the weird audio artifacts present in the video linked in the article.