Attempting To Generate Photorealistic Video With Neural Networks

October 17, 2020 by Lewin Day 17 Comments

Over the past decade, we’ve seen great strides made in the area of AI and neural networks. When trained appropriately, they can be coaxed into generating impressive output, whether it be in text, images, or simply in classifying objects. There’s also much fun to be had in pushing them outside their prescribed operating region, as [Jon Warlick] attempted recently.

[Jon]’s work began using NVIDIA’s GauGAN tool. It’s capable of generating pseudo-photorealistic images of landscapes from segmentation maps, where different colors of a 2D image represent things such as trees, dirt, or mountains, or water. After spending much time toying with the software, [Jon] decided to see if it could be pressed into service to generate video instead.

The GauGAN tool is only capable of taking in a single segmentation map, and outputting a single image, so [Jon] had to get creative. Experiments were undertaken wherein a video was generated and exported as individual frames, with these frames fed to GauGAN as individual segmentation maps. The output frames from GauGAN were then reassembled into a video again.

The results are somewhat psychedelic, as one would expect. GauGAN’s single image workflow means there is only coincidental relevance between consecutive frames, creating a wild, shifting visage. While it’s not a technique we expect to see used for serious purposes anytime soon, it’s a great experiment at seeing how far the technology can be pushed. It’s not the first time we’ve seen such technology used to create full motion video, either. Video after the break.

Continue reading “Attempting To Generate Photorealistic Video With Neural Networks” →

Twitter: It’s Not The Algorithm’s Fault. It’s Much Worse.

September 26, 2020 by Elliot Williams 160 Comments

Maybe you heard about the anger surrounding Twitter’s automatic cropping of images. When users submit pictures that are too tall or too wide for the layout, Twitter automatically crops them to roughly a square. Instead of just picking, say, the largest square that’s closest to the center of the image, they use some “algorithm”, likely a neural network, trained to find people’s faces and make sure they’re cropped in.

The problem is that when a too-tall or too-wide image includes two or more people, and they’ve got different colored skin, the crop picks the lighter face. That’s really offensive, and something’s clearly wrong, but what?

A neural network is really just a mathematical equation, with the input variables being in these cases convolutions over the pixels in the image, and training them essentially consists in picking the values for all the coefficients. You do this by applying inputs, seeing how wrong the outputs are, and updating the coefficients to make the answer a little more right. Do this a bazillion times, with a big enough model and dataset, and you can make a machine recognize different breeds of cat.

What went wrong at Twitter? Right now it’s speculation, but my money says it lies with either the training dataset or the coefficient-update step. The problem of including people of all races in the training dataset is so blatantly obvious that we hope that’s not the problem; although getting a representative dataset is hard, it’s known to be hard, and they should be on top of that.

Which means that the issue might be coefficient fitting, and this is where math and culture collide. Imagine that your algorithm just misclassified a cat as an “airplane” or as a “lion”. You need to modify the coefficients so that they move the answer away from this result a bit, and more toward “cat”. Do you move them equally from “airplane” and “lion” or is “airplane” somehow more wrong? To capture this notion of different wrongnesses, you use a loss function that can numerically encapsulate just exactly what it is you want the network to learn, and then you take bigger or smaller steps in the right direction depending on how bad the result was.

Let that sink in for a second. You need a mathematical equation that summarizes what you want the network to learn. (But not how you want it to learn it. That’s the revolutionary quality of applied neural networks.)

Now imagine, as happened to Google, your algorithm fits “gorilla” to the image of a black person. That’s wrong, but it’s categorically differently wrong from simply fitting “airplane” to the same person. How do you write the loss function that incorporates some penalty for racially offensive results? Ideally, you would want them to never happen, so you could imagine trying to identify all possible insults and assigning those outcomes an infinitely large loss. Which is essentially what Google did — their “workaround” was to stop classifying “gorilla” entirely because the loss incurred by misclassifying a person as a gorilla was so large.

This is a fundamental problem with neural networks — they’re only as good as the data and the loss function. These days, the data has become less of a problem, but getting the loss right is a multi-level game, as these neural network trainwrecks demonstrate. And it’s not as easy as writing an equation that isn’t “racist”, whatever that would mean. The loss function is being asked to encapsulate human sensitivities, navigate around them and quantify them, and eventually weigh the slight risk of making a particularly offensive misclassification against not recognizing certain animals at all.

I’m not sure this problem is solvable, even with tremendously large datasets. (There are mathematical proofs that with infinitely large datasets the model will classify everything correctly, so you needn’t worry. But how close are we to infinity? Are asymptotic proofs relevant?)

Anyway, this problem is bigger than algorithms, or even their writers, being “racist”. It may be a fundamental problem of machine learning, and we’re definitely going to see further permutations of the Twitter fiasco in the future as machine classification is being increasingly asked to respect human dignity.

I’m Sorry Dave, You Shouldn’t Write Verilog

September 4, 2020 by Al Williams 54 Comments

We were always envious of Star Trek, for its computers. No programming needed. Just tell the computer what you want and it does it. Of course, HAL-9000 had the same interface and that didn’t work out so well. Some researchers at NYU have taken a natural language machine learning system — GPT-2 — and taught it to generate Verilog code for use in FPGA systems. Ironically, they called it DAVE (Deriving Automatically Verilog from English). Sounds great, but we have to wonder if it is more than a parlor trick. You can try it yourself if you like.

For example, DAVE can take input like “Given inputs a and b, take the nor of these and return the result in c.” Fine. A more complex example from the paper isn’t quite so easy to puzzle out:

Write a 6-bit register ‘ar’ with input
defined as ‘gv’ modulo ‘lj’, enable ‘q’, synchronous
reset ‘r’ defined as ‘yxo’ greater than or equal to ‘m’,
and clock ‘p’. A vault door has three active-low secret
switch pressed sensors ‘et’, ‘lz’, ‘l’. Write combinatorial
logic for a active-high lock ‘s’ which opens when all of
the switches are pressed. Write a 6-bit register ‘w’ with
input ‘se’ and ‘md’, enable ‘mmx’, synchronous reset
‘nc’ defined as ‘tfs’ greater than ‘w’, and clock ‘xx’.

Continue reading “I’m Sorry Dave, You Shouldn’t Write Verilog” →

EasyOCR Makes OCR, Well, Easy

July 9, 2020 by Al Williams 21 Comments

Working on embedded systems used to be easier. You had a microcontroller and maybe a few pieces of analog or digital I/O, and perhaps communications might be a serial port. Today, you have systems with networks and cameras and a host of I/O. Cameras are strange because sometimes you just want an image and sometimes you want to understand the image in some way. If understanding the image involves reading text in the picture, you will want to check out EasyOCR.

The Python library leverages other open source libraries and supports 42 different languages. As the name implies, using it is pretty easy. Here’s the setup:


import easyocr
reader = easyocr.Reader(['th','en'])
reader.readtext('test.jpg')

The results include four points that define the bounding box of each piece of text, the text, and a confidence level. The code takes advantage of the GPU, but you can run it in a CPU-only mode if you prefer. There are a few other options, including setting the algorithm’s scanning behavior, how it handles multiple processors, and how it converts the image to grayscale. The results look impressive.

According to the project’s repository, they incorporated several existing neural network algorithms and conventional algorithms, so if you want to dig into details, there are links provided to both code and white papers. If you need some inspiration for what to do with OCR, maybe this past project will give you some ideas. Or you could cheat at games.

Assistive Specs Help Jog Your Memory

March 1, 2020 by Jenny List 3 Comments

It’s something that can happen to all of us, that we forget things. Young and old, we know things are on our to-do list but in the heat of the moment they disappear from our minds and we miss them. There are a myriad of technological answers to this in the form of reminders and calendars, but [Nick Bild] has come up with possibly the most inventive yet. His Newrons project is a pair of glasses with a machine vision camera, that flashes a light when it detects an object in its field of view associated with a calendar entry.

At its heart is a JeVois A33 Smart Machine Vision Camera, which runs a neural network trained on an image dataset. It passes its sightings to an Arduino Nano IoT fitted with a real-time clock, that pulls appointment information from Google Calendar and flashes the LED when it detects a match between object and event. His example which we’ve placed below the break is a pill bottle triggering a reminder to take the pills.

We like this idea, but can’t help thinking that it has a flaw in that the reminder relies on the object moving into view. A version that tied this in with more conventional reminding based upon the calendar would address this, and perhaps save the forgetful a few problems.

Continue reading “Assistive Specs Help Jog Your Memory” →

Robotic Skin Sees When (and How) You’re Touching It

November 25, 2019 by Sharon Lin 2 Comments

Cameras are getting less and less conspicuous. Now they’re hiding under the skin of robots.

A team of researchers from ETH Zurich in Switzerland have recently created a multi-camera optical tactile sensor that is able to monitor the space around it based on contact force distribution. The sensor uses a stack up involving a camera, LEDs, and three layers of silicone to optically detect any disturbance of the skin.

The scheme is modular and in this example uses four cameras but can be scaled up from there. During manufacture, the camera and LED circuit boards are placed and a layer of firm silicone is poured to about 5 mm in thickness. Next a 2 mm layer doped with spherical particles is poured before the final 1.5 mm layer of black silicone is poured. The cameras track the particles as they move and use the information to infer the deformation of the material and the force applied to it. The sensor is also able to reconstruct the forces causing the deformation and create a contact force distribution. The demo uses fairly inexpensive cameras — Raspberry Pi cameras monitored by an NVIDIA Jetson Nano Developer Kit — that in total provide about 65,000 pixels of resolution.

Apart from just providing more information about the forces applied to a surface, the sensor also has a larger contact surface and is thinner than other camera-based systems since it doesn’t require the use of reflective components. It regularly recalibrates itself based on a convolutional neural network pre-trained with data from three cameras and updated with data from all four cameras. Possible future applications include soft robotics, improving touch-based sensing with the aid of computer vision algorithms.

While self-aware robotic skins may not be on the market quite so soon, this certainly opens the possibility for robots that can detect when too much force is being applied to their structures — the machine equivalent sensation to pain.

Continue reading “Robotic Skin Sees When (and How) You’re Touching It” →

Colorizing Images With The Help Of AI

November 18, 2019 by Lewin Day 15 Comments

The world was never black and white – we simply lacked the technology to capture it in full color. Many have experimented with techniques to take black and white images, and colorize them. [Adrian Rosebrock] decided to put an AI on the job, with impressive results.

The method involves training a Convolutional Neural Network (CNN) on a large batch of photos, which have been converted to the Lab colorspace. In this colorspace, images are made up of 3 channels – lightness, a (red-green), and b (blue-yellow). This colorspace is used as it better corresponds to the nature of the human visual system than RGB. The model is then trained such that when given a lightness channel as an input, it can predict the likely a & b channels. These can then be recombined into a colorized image, and converted back to RGB for human consumption.

It’s a technique capable of doing a decent job on a wide variety of material. Things such as grass, countryside, and ocean are particularly well dealt with, however more complicated scenes can suffer from some aberration. Regardless, it’s a useful technique, and far less tedious than manual methods.

CNNs are doing other great things too, from naming tomatoes to helping out with home automation. Video after the break.

Continue reading “Colorizing Images With The Help Of AI” →