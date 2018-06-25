Nvidia is back at it again with another awesome demo of applied machine learning: artificially transforming standard video into slow motion – they’re so good at showing off what AI can do that anyone would think they were trying to sell hardware for it.
Though most modern phones and cameras have an option to record in slow motion, it often comes at the expense of resolution, and always at the expense of storage space. For really high frame rates you’ll need a specialist camera, and you often don’t know that you should be filming in slow motion until after an event has occurred. Wouldn’t it be nice if we could just convert standard video to slow motion after it was recorded?
That’s just what Nvidia has done, all nicely documented in a paper. At its heart, the algorithm must take two frames and artificially create one or more frames in between. This is not a hand-crafted algorithm that interpolates frames; it is a fully fledged deep-learning system. The Convolutional Neural Network (CNN) was trained on over a thousand videos – roughly 300k individual frames.
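For contrast with a learned model, the simplest hand-written way to create a frame in between two others is a linear cross-fade: each pixel of the new frame is a weighted average of the same pixel in the two source frames. A minimal sketch, assuming grayscale frames stored as flat lists of pixel values (the function name `crossfade` is hypothetical, and this is a naive baseline, not Nvidia’s method):

```python
from typing import List

Frame = List[float]  # hypothetical representation: a grayscale frame as a flat list of intensities


def crossfade(frame0: Frame, frame1: Frame, t: float) -> Frame:
    """Naive interpolation: blend two frames with weight t in [0, 1].

    t = 0 returns frame0 and t = 1 returns frame1. Motion is not modelled,
    so a moving object appears as two semi-transparent copies ("ghosting")
    rather than as one object at an intermediate position.
    """
    if len(frame0) != len(frame1):
        raise ValueError("frames must have the same number of pixels")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be between 0 and 1")
    return [(1.0 - t) * p0 + t * p1 for p0, p1 in zip(frame0, frame1)]


# A bright pixel moving from position 1 to position 3 in a 5-pixel "frame"
before = [0.0, 1.0, 0.0, 0.0, 0.0]
after = [0.0, 0.0, 0.0, 1.0, 0.0]
print(crossfade(before, after, 0.5))  # [0.0, 0.5, 0.0, 0.5, 0.0]
```

The halfway frame holds two half-bright copies instead of one bright pixel at position 2. Placing the object correctly requires estimating its motion, which is the part a trained network takes on.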
Since none of the parameters of the CNN are time-dependent, it’s possible to generate as many intermediate frames as required, something which sets this solution apart from previous approaches. In some of the shots in their demo video, 30fps video is converted to 240fps; this requires the creation of 7 additional frames for every pair of consecutive frames.
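Because the model can be queried at any time t between two frames, the number of in-between frames is just a matter of how many values of t you ask for. A minimal sketch of that bookkeeping, assuming the target frame rate is an integer multiple of the source rate (the function name is hypothetical, and the per-frame synthesis itself is not shown):

```python
from fractions import Fraction
from typing import List


def intermediate_times(source_fps: int, target_fps: int) -> List[Fraction]:
    """Normalized timestamps t in (0, 1) of the frames to synthesize
    between each pair of consecutive source frames.

    Assumes target_fps is an integer multiple of source_fps.
    """
    if target_fps % source_fps != 0:
        raise ValueError("target_fps must be an integer multiple of source_fps")
    factor = target_fps // source_fps  # output frames per source-frame interval
    # factor - 1 new frames, evenly spaced between t = 0 (first frame) and t = 1 (second frame)
    return [Fraction(i, factor) for i in range(1, factor)]


times = intermediate_times(30, 240)
print(len(times))               # 7
print([str(t) for t in times])  # ['1/8', '1/4', '3/8', '1/2', '5/8', '3/4', '7/8']
```

Each t, together with the two source frames, would then be handed to the interpolator; going from 30fps to 60fps needs only t = 1/2.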
The video after the break is seriously impressive, though if you look carefully you can see the odd imperfection, like the hockey player’s skate or the dancer’s arm. Deep learning is as much an art as a science, and if you understood all of the research paper then you’re doing pretty darn well. For the rest of us, get up to speed by wrapping your head around neural networks and trying out the simplest TensorFlow example.
15 thoughts on “Nvidia Transforms Standard Video Into Slow Motion Using AI”
This is cool for making demos, but not for anything where slow motion actually matters. It can’t add information that isn’t there, so for answering questions like “Who won the race?” or “Was the bat grounded before the wicket was broken?”, this is literally just making it up. It won’t stop people trying to use it to “prove” their team should have won, though.
Slow motion is useful for dramatic effect. And in those cases, accuracy is not terribly important.
That’s right, you KNOW that this sort of technology WILL be used for determining who won the race.
Sorry to be the “slippery slope” guy, but it’s bad enough that CG is at the point where it takes serious analysis to determine whether or not something is real, and this just adds to the problem. Congratulations, Nvidia, we can officially no longer believe anything we see.
The information is there to some extent, though. Things like an object’s velocity and acceleration can be determined, and those determine which event happens first.
For it to even look right, most of this physics “knowledge” must be there.
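To make that point concrete: if an object’s position is sampled in three consecutive frames and its acceleration is assumed constant, a quadratic fit gives an estimated time at which it crosses a line, even when that happens between frames. A minimal sketch (the function name and sample positions are hypothetical), which only yields an estimate under that constant-acceleration assumption:

```python
import math


def crossing_time(x0: float, x1: float, x2: float, line: float) -> float:
    """Estimate when an object crosses `line`, in units of frames.

    x0, x1, x2 are the object's positions in frames 0, 1 and 2.
    Assumes constant acceleration, i.e. x(t) = a*t**2 + b*t + c.
    Returns the earliest crossing time t >= 0.
    """
    # Fit the quadratic exactly through (0, x0), (1, x1), (2, x2)
    c = x0
    a = (x2 - 2 * x1 + x0) / 2
    b = x1 - x0 - a

    if a == 0:
        # Constant velocity: solve b*t + c = line
        if b == 0:
            raise ValueError("object is not moving")
        t = (line - c) / b
        if t < 0:
            raise ValueError("line is never reached")
        return t

    # Solve a*t^2 + b*t + (c - line) = 0 and keep the earliest non-negative root
    disc = b * b - 4 * a * (c - line)
    if disc < 0:
        raise ValueError("line is never reached")
    root = math.sqrt(disc)
    candidates = [(-b - root) / (2 * a), (-b + root) / (2 * a)]
    future = [t for t in candidates if t >= 0]
    if not future:
        raise ValueError("line is never reached")
    return min(future)


# Positions of 0, 10 and 22 px in three frames; finish line at 15 px
print(round(crossing_time(0.0, 10.0, 22.0, 15.0), 3))  # 1.437
```

The estimate lands between frames 1 and 2, but it is only as good as the constant-acceleration assumption. That is the objection raised above: the in-between value is inferred, not observed.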
Sure, if the *only* reason you can think of for doing slomo is formal scientific research.
I don’t know if what the Slow Mo Guys are doing counts as anywhere near that.
Also, if you speed it back up, you can use it to get better high-frame-rate results than the current kinda-shitty algorithms that modern TVs use to interpolate broadcast content up to 100Hz.
Is “Deep Learning” like going to university and then working in industry for a while, or more like standardised extrapolation using a cost-effective set of ‘rules’? The special effect is fun, but the marketing speak is atrocious.
I think this is deeply disturbing.
Let me explain. We’re seeing this kind of approach already applied in photography: the tiny sensors in smartphones aren’t up to the task of taking good photos. Plus, the photographers are folks like you and me, not highly trained experts. So, taken at face value, the raw results would be horribly mushy. You can’t sell a premium smartphone like that.
But those smartphones have processing power…
That’s what all that “…taken with an iPhone 6” business is about. There are already a lot of canned models generated with machine learning in those things.
Half in jest, I tell my friends that the smartphone doesn’t actually need a camera sensor at all: just its geocoordinates and current attitude (accelerometer + compass), a bit of Google Street View, and a dash of video surveillance data (enough cameras with open ports, ask Shodan), and Da Goog knows what the scene should look like.
Now to the disturbing part: those things aren’t showing us “what’s there”, but “what the model ‘thinks’ is there”, without us suck^H^H^H^H end-users having much insight into, let alone control over, that model (heck, the device manufacturer itself probably doesn’t know exactly what’s in those models).
How is that going to influence our perception of the world?
I don’t like that idea very much.
You don’t see what’s actually there either. If you did optical illusions wouldn’t work.
And even what you see changes what you hear (“The McGurk Effect”). https://www.youtube.com/watch?v=G-lN8vWm3m0
That’s right, and I thought of that too.
Actually, it is a good analogy for what is happening. Our vision system, from the several layers in the retina through to the visual cortex, is already a complex system with tons of quirks that help us understand what we “see”, and it has its fair share of quirky, sometimes funny failure modes.
The difference is that natural vision “is there”: it can be (and is) investigated, and we have a solid body of research on it.
“We” (that is, our consciousness, our judgement) have co-evolved with it for… uh… ages.
These new “perception filters”, if you allow the term, are emerging so quickly that we’re bound to see “interesting” effects.
“Interesting” in the sense of the (reportedly apocryphal) Chinese curse.
Yes. It’s another example of “Humans need not apply.”
That’s exactly how a normal brain works.
We don’t see (perceive) what is actually there, just a rough draft, or model, of reality. The brain then completes the image with details and ‘stuff’ drawn from our own individual past experiences of similar circumstances. This happens at all time scales, from milliseconds up to long-term memory and history.
Right now, we are on the path to outsourcing our brains’ main job to an AI, controlled by just a few.
It won’t happen overnight, but it’s just a matter of time.
This will be disruptive in many, many ways.
Your concerns are fully justified.
Welcome to THE HIVE!!!
:o)
Push all the Jupiter footage through it!! :D
There is already software that can do this, e.g. http://slowmovideo.granjow.net/
So the main difference is that this is now done with machine learning. Would have been nice for them to have a comparison against existing state-of-the-art interpolation programs.
The biggest gotcha is that to get good results, you need source video shot with a global-shutter camera and a short exposure time. A typical mobile phone video has way too much motion blur and distortion to yield anything except a blurry object moving across the screen.
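The reasoning here comes down to simple arithmetic. An object smears across roughly (speed × exposure time) pixels. With a rolling shutter, the last sensor row is read out later than the first, so a sideways-moving object is skewed by roughly (speed × readout time). A back-of-the-envelope sketch; the speed, exposure and readout values are hypothetical, not measurements of any particular phone:

```python
def blur_length_px(speed_px_per_s: float, exposure_s: float) -> float:
    """Approximate motion-blur streak length: distance moved while the shutter is open."""
    return speed_px_per_s * exposure_s


def rolling_shutter_skew_px(speed_px_per_s: float, readout_s: float) -> float:
    """Approximate horizontal offset between the first and last rows read out
    by a rolling shutter, for an object moving sideways at constant speed."""
    return speed_px_per_s * readout_s


speed = 2000.0  # hypothetical: an object crossing a 1920-px-wide frame in about a second
print(round(blur_length_px(speed, 1 / 30), 1))           # 66.7 px with a 1/30 s exposure
print(round(blur_length_px(speed, 1 / 2000), 1))         # 1.0 px with a 1/2000 s exposure
print(round(rolling_shutter_skew_px(speed, 0.02), 1))    # 40.0 px with a hypothetical 20 ms readout
```

A streak tens of pixels long gives an interpolator no sharp edges to track, whereas a global shutter with a short exposure keeps both numbers near zero.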
Ah, the actual paper has a comparison.