Netflix has recently announced that they now stream 4K content using optimized shot-based encoding. When I read that headline I thought to myself: “Well, that’s great! Sounds good, but… what exactly does that mean? And what’s shot-based encoding anyway?”
[Ottverse] has an interesting series in progress to demystify video compression. The latest installment promises to explain discrete cosine transforms as though you were five years old.
We’ll be honest. At five, we probably didn’t know how to interpret this sentence:
…the Discrete Cosine Transform takes a set of N correlated (similar) data-points and returns N de-correlated (dis-similar) data-points (coefficients) in such a way that the energy is compacted in only a few of the coefficients M where M << N.
Still, the explanation is pretty clear and we really liked the analogy with the spheres and the stars in a constellation.
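If the jargon still feels dense, a few lines of Python make the energy-compaction idea concrete. This is our own minimal sketch using SciPy, not code from the [Ottverse] article: a smooth (correlated) signal goes in, and nearly all of its energy lands in the first handful of DCT coefficients.

```python
import numpy as np
from scipy.fft import dct, idct

# N correlated samples: a smooth ramp with a gentle wiggle
N = 64
x = np.linspace(0, 1, N)
signal = x + 0.1 * np.sin(2 * np.pi * 3 * x)

# Forward DCT: still N coefficients, but the energy is compacted up front
coeffs = dct(signal, norm="ortho")

# Keep only M << N coefficients, zero the rest
M = 8
truncated = np.zeros_like(coeffs)
truncated[:M] = coeffs[:M]

# Inverse DCT reconstructs the signal almost perfectly from M terms
approx = idct(truncated, norm="ortho")
print(f"energy in first {M} of {N} coefficients: "
      f"{np.sum(coeffs[:M]**2) / np.sum(coeffs**2):.2%}")
print(f"max reconstruction error: {np.max(np.abs(signal - approx)):.5f}")
```

Throwing away those small coefficients is exactly where lossy codecs like JPEG get their savings.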
It’s true what they say — you never know what you can do until you try. Russell Kirsch, who developed the first digital image scanner and subsequently invented the pixel, was a firm believer in this axiom. And if Russell had never tried to get a picture of his three-month-old son into a computer back in 1957, you might be reading Hackaday in print right now. Russell’s work laid the foundation for the algorithms and storage methods that make digital imaging what it is today.
Russell A. Kirsch was born June 20, 1929 in New York City, the son of Russian and Hungarian immigrants. He got quite an education, beginning at the Bronx High School of Science. He then earned a bachelor’s degree in Electrical Engineering from NYU, a Master of Science from Harvard, and later attended American University and MIT.
In 1951, Russell went to work for the National Bureau of Standards, now known as the National Institute of Standards and Technology (NIST). He spent nearly 50 years at NIST, starting out with one of the first programmable computers in America, the SEAC (Standards Eastern Automatic Computer). This room-sized computer, built in 1950, was developed as an interim solution for the Census Bureau to do research (PDF).
Like the other computers of its time, SEAC spoke the language of punch cards, mercury memory, and wire storage. Russell Kirsch and his team were tasked with finding a way to feed pictorial data into the machine without any prior processing. Since the computer was supposed to be temporary, its use wasn’t as tightly controlled as that of other machines. Although it ran 24/7 and got plenty of use, SEAC was more accessible than its peers, which left room for bleeding-edge experimentation. NIST ended up keeping SEAC around for the next thirteen years, until 1963.
The Original Pixel Pusher
The term ‘pixel’ is a shortened portmanteau of picture element. Technically speaking, pixels are the unit of length for digital imaging. Pixels are building blocks for anything that can be displayed on a computer screen, so they’re kind of the first addressable blinkenlights.
As the drum slowly rotated, a photomultiplier moved back and forth, scanning the image through a square viewing hole in the wall of a box. The tube digitized the picture by transmitting ones and zeros to SEAC that described what it saw through the hole: 1 for white, 0 for black. The resulting digital image of Walden, Kirsch’s infant son, is 76 x 76 pixels, which was the maximum SEAC could handle.
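For fun, here’s a rough modern re-creation of that one-bit scan, with Pillow and NumPy standing in for the drum and the photomultiplier; the input filename is of course hypothetical.

```python
import numpy as np
from PIL import Image

# Re-create SEAC's one-bit scan: resize to the 76 x 76 maximum,
# then threshold every sample to a single bit (1 = white, 0 = black).
img = Image.open("walden.jpg").convert("L")  # hypothetical input photo
img = img.resize((76, 76))

bits = (np.asarray(img) > 127).astype(np.uint8)

# Dump the "scan" as text, dark bits drawn as '#'
for row in bits:
    print("".join("#" if b == 0 else " " for b in row))
```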
In the video below, Russell discusses the idea and demonstrates that variable pixels make a better image, carrying more information than square pixels do while using significantly fewer pixels overall. It takes some finagling, as pixel pairs of triangles and rectangles must be carefully chosen, rotated, and mixed together to best represent the image, but the image quality is definitely worth the effort. Following that is a video of Russell discussing SEAC’s hardware.
Russell retired from NIST in 2001 and moved to Portland, Oregon. As of 2012, he could be found in the occasional coffeehouse, discussing technology with anyone he could engage. Unfortunately, Russell developed Alzheimer’s and died from complications on August 11, 2020. He was 91 years old.
Cameras are getting smarter and more capable than ever, able to run embedded machine vision algorithms and pull off tricks far beyond what a serial camera and microcontroller board could manage, and the upcoming Vizy aims to be smarter and easier to use still. Vizy is the work of Charmed Labs, the same folks behind the Pixy and Pixy 2 cameras, so this isn’t their first foray into accessible machine vision. Vizy’s main goal is to make object detection and classification easy, with thoughtful hardware features and a browser-based interface.
The usual way to do machine vision is to get a USB camera and run something like OpenCV on a desktop machine to handle the processing. But Vizy leverages a Raspberry Pi 4 to provide a tightly-integrated unit in a small package with a variety of ready-to-run applications. For example, the “Birdfeeder” application comes ready to take snapshots of and identify common species of bird, while also identifying party-crashers like squirrels.
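For contrast, here’s roughly what that “usual way” looks like: a few lines of Python pulling frames from a USB camera and running one of the Haar-cascade detectors that ship with OpenCV. This is a generic sketch, not anything from Vizy’s codebase.

```python
import cv2

# Grab frames from the first USB camera and run a stock face detector
cap = cv2.VideoCapture(0)
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.3, 5):
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("detections", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break

cap.release()
cv2.destroyAllWindows()
```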
The demonstration video on their page shows off using the built-in high-current I/O header to control a sprinkler, repelling non-bird intruders with a splash of water while uploading pictures and video clips. The hardware design also looks well thought out; not only is there a safe shutdown and low-power mode for the Raspberry Pi-based hardware, but the lens can be swapped and the camera unit itself even contains an electrically-switched IR filter.
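Vizy has its own API for that high-current header, which we haven’t seen yet, but on a stock Raspberry Pi the squirrel-splashing logic would be only a few lines with RPi.GPIO; the pin choice and relay wiring here are pure assumptions.

```python
import time
import RPi.GPIO as GPIO

SPRINKLER_PIN = 17  # assumed BCM pin driving a relay or MOSFET

GPIO.setmode(GPIO.BCM)
GPIO.setup(SPRINKLER_PIN, GPIO.OUT, initial=GPIO.LOW)

def repel_intruder(seconds=2.0):
    """Give the squirrel a brief splash."""
    GPIO.output(SPRINKLER_PIN, GPIO.HIGH)
    time.sleep(seconds)
    GPIO.output(SPRINKLER_PIN, GPIO.LOW)

try:
    repel_intruder()
finally:
    GPIO.cleanup()
```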
Vizy has a Kickstarter campaign planned, but like many others, Charmed Labs is still adjusting to the changes the COVID-19 pandemic has brought. You can sign up to be notified when Vizy launches; we know we’ll be keen for a closer look once it does. Easier machine vision is always a good thing, because it helps free people to focus on clever ideas like machine vision-based tool alignment.
Video conferencing is nothing new, but recent world events have made it far more mainstream than it ever was before. Luckily, web camera technology is mature, and most software can also share your screen. But what about your paper documents? It turns out that [John Nelson] can show you how to spend $5 on an old laptop camera module and put your documents center stage in your next Zoom, Skype, or other video conference.
This is especially good for things that would be hard to draw in real time during a conference, like a quick sketch, a schematic, or, as you can see in the post and the video demo below, chemical molecule diagrams.
Stuck at home in self-quarantine, artist and filmmaker [Kira Bursky] had fewer options than normal for her latest film project. While a normal weekend film sprint would have involved collaborating with actors, set designers, and cinematographers in a frenzied attempt to finish in less than 48 hours, she instead chose to indulge in her curiosity for projection mapping, a technique that involves projecting visuals onto three-dimensional or flat surfaces.
In order for the images to map properly onto a surface, that surface first has to be measured, so the projection software can transform the flat image and produce the illusion of the light wrapping around the object. The technique is done in layers, in software similar to Photoshop, making it easier for the designer to organize the different interacting components of their animation.
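For a flat surface, the core of that transform is just a perspective warp (a homography). Here’s a minimal OpenCV sketch, our own illustration rather than Lightform’s pipeline; the filename and corner coordinates stand in for real calibration data.

```python
import cv2
import numpy as np

# Warp a flat animation frame so it lands on the quadrilateral the
# camera saw. File name and corner points are made-up placeholders.
art = cv2.imread("animation_frame.png")
h, w = art.shape[:2]

src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])                # image corners
dst = np.float32([[120, 80], [980, 140], [940, 660], [90, 600]])  # surface corners

H = cv2.getPerspectiveTransform(src, dst)
mapped = cv2.warpPerspective(art, H, (1280, 720))  # projector resolution

cv2.imwrite("projector_output.png", mapped)
```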
[Kira] used a tool called Lightform to design her projections, which relies on a camera to calibrate the location of the surface and a projector to display the visuals. Her animated figures are drawn with loose lines and characterized by their slow gradients and ethereal movements. In the background of her film, a rhythmic sound plays while she brings the figures closer to view. Their outlines come into greater focus until the figures transform into her physical body, which also dances with the meandering lights.
There’s a lot of interesting content produced on video these days. Invariably, though, when we post a video some comments will appear lamenting that video isn’t the most efficient way to disseminate technical information. We have mixed feelings. Some things benefit from being seen, for example a screencast. Some people like the human connection of seeing an instructor interact with a class instead of just reading. But we will admit that sometimes a video takes longer to watch, especially if it is full of pauses. Unsilence, a tool from [labmoellertim], can fix that. The command-line tool takes a video and strips out the parts that are silent. You can also use it as a Python library if you want to build your own tools around the technique.
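Using it as a library looks something like the sketch below; this follows the usage shown in the project’s README, so check there for the current method names and defaults.

```python
from unsilence import Unsilence

# Hedged sketch based on the README: detect the silent stretches,
# then render a cut where silence is fast-forwarded instead of played.
u = Unsilence("lecture.mp4")  # hypothetical input file
u.detect_silence()

# Speech at normal speed, silent gaps at 8x
u.render_media("lecture_short.mp4", audible_speed=1, silent_speed=8)
```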
If you’ve ever taken a class online, it isn’t uncommon to speed up a video so you can get through class faster. This works to a point, but removing or speeding up silent gaps means you don’t have to “listen faster.” Of course, you could still speed up the video, too.