AI Makes Linux Do What You Mean, Not What You Say

We are always envious of the Star Trek Enterprise computers. You can just sort of ask them a hazy question and they will — usually — figure out what you want. Even the automatic doors seemed to know the difference between someone walking into a turbolift versus someone being thrown into the door during a fight. [River] decided to try his new API keys for the private beta of an AI service to generate Linux commands based on a description. How does it work? Watch the video below and find out.

Some examples work fairly well. In response to “email the Rickroll video to Jeff Bezos,” the system produced a curl command and an e-mail to what we assume is the right place. “Find all files in the current directory bigger than 1 GB” works, too.

Continue reading “AI Makes Linux Do What You Mean, Not What You Say”

AI Upscaling And The Future Of Content Delivery

The rumor mill has recently been buzzing about Nintendo’s plans to introduce a new version of their extremely popular Switch console in time for the holidays. A faster CPU, more RAM, and an improved OLED display are all pretty much a given, as you’d expect for a mid-generation refresh. Those upgraded specifications will almost certainly come with an inflated price tag as well, but given the incredible demand for the current Switch, a $50 or even $100 bump is unlikely to dissuade many prospective buyers.

But according to a report from Bloomberg, the new Switch might have a bit more going on under the hood than you’d expect from the technologically conservative Nintendo. Their sources claim the new system will utilize an NVIDIA chipset capable of Deep Learning Super Sampling (DLSS), a feature which is currently only available on high-end GeForce RTX 20 and GeForce RTX 30 series GPUs. The technology, which has already been employed by several notable PC games over the last few years, uses machine learning to upscale rendered images in real-time. So rather than tasking the GPU with producing a native 4K image, the engine can render the game at a lower resolution and have DLSS make up the difference.

The current model Nintendo Switch

The implications of this technology, especially on computationally limited devices, is immense. For the Switch, which doubles as a battery powered handheld when removed from its dock, the use of DLSS could allow it to produce visuals similar to the far larger and more expensive Xbox and PlayStation systems it’s in competition with. If Nintendo and NVIDIA can prove DLSS to be viable on something as small as the Switch, we’ll likely see the technology come to future smartphones and tablets to make up for their relatively limited GPUs.

But why stop there? If artificial intelligence systems like DLSS can scale up a video game, it stands to reason the same techniques could be applied to other forms of content. Rather than saturating your Internet connection with a 16K video stream, will TVs of the future simply make the best of what they have using a machine learning algorithm trained on popular shows and movies?

Continue reading “AI Upscaling And The Future Of Content Delivery”

sample of automatically generated comics

Read Your Movies As Automatically Generated Comic Books

A research paper from Dalian University of Technology in China and City University of Hong Kong (direct PDF link) outlines a system that automatically generates comic books from videos. But how can an algorithm boil down video scenes to appropriately reflect the gravity of the scene in a still image? This impressive feat is accomplished by saving two still images per second, then segments the frames into scenes through analysis of region-of-interest and importance ranking.

movie to comic book pipeline diagram

For its next trick, speech for each scene is processed by combining subtitle information with the audio track of the video. The audio is analyzed for emotion to determine the appropriate speech bubble type and size of the subtitle text. Frames are even analyzed to establish which person is speaking for proper placement of the bubbles. It can then create layouts of the keyframes, determining panel sizes for each page based on the region-of-interest analysis.

The process is completed by stylizing the keyframes with flat color through quantization, for that classic cel shading look, and then populating the layouts with each frame and word balloon.

The team conducted a study with 40 users, pitting their results against previous techniques which require more human intervention and still besting them in every measure. Like any great superhero, the team still sees room for improvement. In the future, they would like to improve the accuracy of keyframe selection and propose using a neural network to do so.

Thanks to [Qes] for the tip!

Twitter: It’s Not The Algorithm’s Fault. It’s Much Worse.

Maybe you heard about the anger surrounding Twitter’s automatic cropping of images. When users submit pictures that are too tall or too wide for the layout, Twitter automatically crops them to roughly a square. Instead of just picking, say, the largest square that’s closest to the center of the image, they use some “algorithm”, likely a neural network, trained to find people’s faces and make sure they’re cropped in.

The problem is that when a too-tall or too-wide image includes two or more people, and they’ve got different colored skin, the crop picks the lighter face. That’s really offensive, and something’s clearly wrong, but what?

A neural network is really just a mathematical equation, with the input variables being in these cases convolutions over the pixels in the image, and training them essentially consists in picking the values for all the coefficients. You do this by applying inputs, seeing how wrong the outputs are, and updating the coefficients to make the answer a little more right. Do this a bazillion times, with a big enough model and dataset, and you can make a machine recognize different breeds of cat.

What went wrong at Twitter? Right now it’s speculation, but my money says it lies with either the training dataset or the coefficient-update step. The problem of including people of all races in the training dataset is so blatantly obvious that we hope that’s not the problem; although getting a representative dataset is hard, it’s known to be hard, and they should be on top of that.

Which means that the issue might be coefficient fitting, and this is where math and culture collide. Imagine that your algorithm just misclassified a cat as an “airplane” or as a “lion”. You need to modify the coefficients so that they move the answer away from this result a bit, and more toward “cat”. Do you move them equally from “airplane” and “lion” or is “airplane” somehow more wrong? To capture this notion of different wrongnesses, you use a loss function that can numerically encapsulate just exactly what it is you want the network to learn, and then you take bigger or smaller steps in the right direction depending on how bad the result was.

Let that sink in for a second. You need a mathematical equation that summarizes what you want the network to learn. (But not how you want it to learn it. That’s the revolutionary quality of applied neural networks.)

Now imagine, as happened to Google, your algorithm fits “gorilla” to the image of a black person. That’s wrong, but it’s categorically differently wrong from simply fitting “airplane” to the same person. How do you write the loss function that incorporates some penalty for racially offensive results? Ideally, you would want them to never happen, so you could imagine trying to identify all possible insults and assigning those outcomes an infinitely large loss. Which is essentially what Google did — their “workaround” was to stop classifying “gorilla” entirely because the loss incurred by misclassifying a person as a gorilla was so large.

This is a fundamental problem with neural networks — they’re only as good as the data and the loss function. These days, the data has become less of a problem, but getting the loss right is a multi-level game, as these neural network trainwrecks demonstrate. And it’s not as easy as writing an equation that isn’t “racist”, whatever that would mean. The loss function is being asked to encapsulate human sensitivities, navigate around them and quantify them, and eventually weigh the slight risk of making a particularly offensive misclassification against not recognizing certain animals at all.

I’m not sure this problem is solvable, even with tremendously large datasets. (There are mathematical proofs that with infinitely large datasets the model will classify everything correctly, so you needn’t worry. But how close are we to infinity? Are asymptotic proofs relevant?)

Anyway, this problem is bigger than algorithms, or even their writers, being “racist”. It may be a fundamental problem of machine learning, and we’re definitely going to see further permutations of the Twitter fiasco in the future as machine classification is being increasingly asked to respect human dignity.

Boost Your Animation To 60 FPS Using AI

The uses of artificial intelligence and machine learning continue to expand, with one of the more recent implementations being video processing. A new method can “fill in” frames to smooth out the appearance of the video, which [LegoEddy] was able to use this in one of his animated LEGO movies with some astonishing results.

His original animation of LEGO figures and sets was created at 15 frames per second. As an animator, he notes that it’s orders of magnitude more difficult to get more frames than this with traditional methods, at least in his studio. This is where the artificial intelligence comes in. The program is able to interpolate between frames and create more frames to fill the spaces between the original. This allowed [LegoEddy] to increase his frame rate from 15 fps to 60 fps without having to actually create the additional frames.

While we’ve seen AI create art before, the improvement on traditionally produced video is a dramatic advancement. Especially since the AI is aware of depth and preserves information about the distance of objects from the camera. The software is also free, runs on any computer with an appropriate graphics card, and is available on GitHub.

Continue reading “Boost Your Animation To 60 FPS Using AI”

Let This Crying Detecting Classifier Offer Some Much Needed Reprieve

Baby monitors are cool, but [Ish Ot Jr.] wanted his to only transmit sounds that required immediate attention and filter any non-emergency background noise. Posed with this problem, he made a baby monitor that would only send alerts when his baby was crying.

For his project, [Ish] used an Arduino Nano 33 BLE Sense due to its built-in microphone, sizeable RAM for storing large chunks of data, and it’s BLE capabilities for later connecting with an app. He began his project by collecting background noise using Edge Impulse Studio’s data acquisition functionality. [Ish] really emphasized that Edge Impulse was really doing all the work for him. He really just needed to collect some test data and that was mostly it on his part. The work needed to run and test the Neural Network was taken care of by Edge Impulse. Sounds handy, if you don’t mind offloading your data to the cloud.

[Ish] ended up with an 86.3% accurate classifier which he thought was good enough for a first pass at things. To make his prototype a bit more “finished”, he added some status LEDs, providing some immediate visual feedback of his classifier and to notify the caregiver. Eventually, he wants to add some BLE support and push notifications, alerting him whenever his baby needs attention.

We’ve seen a couple of baby monitor projects on Hackaday over the years. [Ish’s] project will most certainly be a nice addition to the list.

Facial Detection With Pi + MATLAB

[Monica] wanted to try a bit of facial detection with her Raspberry Pi and she found some pretty handy packages in MATLAB to help her do just that. The packages are based on the Viola-Jones algorithm which was the first real-time object detection framework for facial detection.

She had to download MATLAB’s Raspbian image to allow the Pi to interpret MATLAB commands over a custom server. That setup is mostly pretty easy and she does a good job walking you through the setup on her project page.

With that, now she can control the Pi in MATLAB: configure the camera, toggle GPIO, etc. The real fun comes with the facial detection program. In addition to opening up a live video feed of the Pi camera, the program outputs pixel data. [Monica] was mostly just testing the stock capabilities, but wants to try detecting other objects next. We’ll see what cool modifications she’s able to come up with.

If MATLAB doesn’t quite fit your taste, we have a slew of facial detection projects on Hackaday.