Using OpenCV To Catch A Hungry Thief

Rory, the star of the show

[Scott] has a neat little closet in his carport that acts as a shelter and rest area for their outdoor cat, Rory. She has a bed and food and water, so when she’s outside on an adventure she has a place to eat and drink and nap in case her humans aren’t available to let her back in. However, [Scott] recently noticed that they seemed to be going through a lot of food, and they couldn’t figure out where it was going. Kitty wasn’t growing a potbelly, so something else was eating the food.

So [Scott] rolled up his sleeves and hacked together an OpenCV project with a FLIR Boson to try and catch the thief. To reduce the amount of footage to go through, the system would only capture video when it detected movement or a large change in the scene. It would then take snapshots, timestamp them, and optionally record a feed of the video. [Scott] originally started writing the system in Python, but it couldn’t keep up and dropped frames when motion was detected. Eventually, he rewrote the prototype in C++, which of course resulted in much better performance!
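
The motion-gated capture idea can be sketched with plain frame differencing. This is a minimal illustration and not [Scott]’s code: the function names, both thresholds, and the tiny list-based frames are assumptions made for the example. It sticks to the Python standard library so it runs anywhere; a real build would run the same comparison on camera frames, for instance with OpenCV’s absdiff.

```python
def motion_detected(prev_frame, frame, pixel_delta=25, changed_fraction=0.02):
    """Return True when enough pixels differ between two grayscale frames.

    Frames are equal-sized lists of rows of 0-255 intensities. Both
    thresholds are illustrative guesses, not values from the project.
    """
    total = 0
    changed = 0
    for prev_row, row in zip(prev_frame, frame):
        for before, after in zip(prev_row, row):
            total += 1
            if abs(after - before) > pixel_delta:
                changed += 1
    return total > 0 and changed / total >= changed_fraction


def frames_worth_saving(frames, **thresholds):
    """Yield (index, frame) only for frames that differ from the previous one."""
    previous = None
    for index, frame in enumerate(frames):
        if previous is not None and motion_detected(previous, frame, **thresholds):
            yield index, frame
        previous = frame


# A 4x4 synthetic scene: static, static, then a bright blob appears and stays.
empty = [[10] * 4 for _ in range(4)]
blob = [row[:] for row in empty]
blob[1][1] = blob[1][2] = 200

saved = [index for index, _ in frames_worth_saving([empty, empty, blob, blob])]
```

Only the frame where the blob first appears is kept, which is the point of the approach: hours of an empty closet produce nothing to review.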

Giving People An Owl-like Visual Field Via VR Feels Surprisingly Natural

We love hearing about a good experiment, and here’s a pretty neat one: researchers used a VR headset, an off-the-shelf VR360 camera, and some custom software to glue them together. The result? Owl-Vision squashes a full 360° of un-distorted horizontal visual perception into 90° of neck travel to either side. One can see all around oneself, without needing to physically turn one’s head any further than is natural.
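
The core remapping can be sketched in a few lines. This assumes a constant gain, so that 90° of head yaw to either side maps linearly onto 180° of view to either side; the researchers’ software may well use a different curve, and the function and parameter names here are made up for illustration.

```python
def owl_view_yaw(head_yaw_deg, neck_limit_deg=90.0, view_limit_deg=180.0):
    """Map physical head yaw (degrees) to a virtual view yaw (degrees).

    Assumes a linear mapping with constant gain; this is an
    illustrative guess, not the published method.
    """
    # Clamp to the comfortable range of neck travel.
    clamped = max(-neck_limit_deg, min(neck_limit_deg, head_yaw_deg))
    gain = view_limit_deg / neck_limit_deg  # 2.0 with the defaults
    return clamped * gain


straight_ahead = owl_view_yaw(0.0)
over_the_shoulder = owl_view_yaw(45.0)
directly_behind = owl_view_yaw(90.0)
```

With the default limits, turning the head halfway to the shoulder already shows what is directly to one’s side, and a full 90° turn shows what is directly behind.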

It’s still a work in progress, and the paper is not currently available for free, but the demonstration video at that link (also embedded below) gives a solid overview of what’s going on.

EMO: Alibaba’s Diffusion Model-Based Talking Portrait Generator

Alibaba’s EMO (or Emote Portrait Alive) framework is a recent entry in a series of attempts to generate a talking head using existing audio (spoken word or vocal audio) and a reference portrait image as inputs. At its core it uses a diffusion model that is trained on 250 hours of video footage and over 150 million images. But unlike previous attempts, it adds what the researchers call a speed controller and a face region controller. These serve to stabilize the generated frames, along with an additional module to stop the diffusion model from outputting frames that feature a result too distinct from the reference image used as input.

In the related paper by [Linrui Tian] and colleagues a number of comparisons are shown between EMO and other frameworks, claiming significant improvements over these. A number of examples of talking and singing heads generated using this framework are provided by the researchers, which give some idea of what are probably the ‘best case’ outputs. With some examples, like [Leslie Cheung Kwok Wing] singing ‘Unconditional’, big glitches are obvious and there’s a definite mismatch between the vocal track and facial motions. Despite this, it’s quite impressive, especially with fairly realistic movement of the head including blinking of the eyes.

Meanwhile some seem extremely impressed, such as in a recent video by [Matthew Berman] on EMO where he states that Alibaba releasing this framework to the public might be ‘too dangerous’. The level-headed folks over at PetaPixel however also note the obvious visual imperfections that are a dead give-away for this kind of generative technology. Much like other diffusion model-based generators, it would seem that EMO is still very much stuck in the uncanny valley, with no clear path to becoming a real human yet.

A Wireless Monitor Without Breaking The Bank

The quality of available video production equipment has increased hugely as digital video and then high-definition equipment have entered the market. But there are still some components which are expensive, one of which is a decent quality HD wireless monitor. Along comes [FuzzyLogic] with a solution, in the form of an external monitor for a laptop, driven by a wireless HDMI extender.

In one sense this project involves plugging in a series of components and simply using them for their intended purpose. It’s more than that, though: it includes some rather useful 3D printed parts that make for a truly portable wireless monitor, and it saves the rest of us the gamble of buying a wireless HDMI extender without knowing whether it would deliver.

He initially tried an HDMI-to-USB dongle and a streaming Raspberry Pi, but the latency was far too high to be useful. The extender does have a small delay, but not so bad as to be unusable. The whole setup, including the monitor, can be powered from a large USB power bank, answering one of our questions. All the files can be downloaded from Printables should you wish to follow the same path, and meanwhile there’s a video with the details below the break.

What If The Matrix Was Made In The 1950s?

We’ve noticed a recent YouTube trend of producing trailers for shows and movies as if they were produced in the 1950s, even when they weren’t. The results are impressive and, as you might expect, leverage AI generation tools. While we enjoy watching them, we were especially interested in [Patrick Gibney’s] peek behind the curtain of how he makes them, as you can see below. If you want to see an example of the result first, check out the second video, showing a 1950s-era The Matrix.

Of course, you could do some of it yourself, but if you want the full AI experience, [Patrick] suggests using ChatGPT to produce a script, though he admits that if he did that, he would tweak the results. Other AI tools create the pictures used and the announcer-style narration. Another tool produces cinematographic shots that include the motion of the “actors” and other things in the scene. More tools create the background music.

Close-up of the mod installed into the HDMI switch, tapping the IR receiver

Interfacing A Cheap HDMI Switch With Home Assistant

You know the feeling of having just created a perfect setup for your hacker lab? Sometimes, there’s just this missing piece in the puzzle that requires you to do a small hack, and those are the most tempting. [maxime borges] has such a perfect setup that involves an HDMI 4:2 switch, and he brings us a write-up on integrating that HDMI switch into Home Assistant through emulating an infrared receiver’s signals.

Overview picture of the HDMI switch, with the mod installed

The HDMI switch is equipped with an infrared sensor as the only means of controlling it, so naturally, that was the path chosen for interfacing the ESP32 put inside the switch. Fortunately, Home Assistant provides the means to both receive and output IR signals, so after capturing all the codes produced by the IR remote, parsing their meaning, then turning them into a Home Assistant configuration, [maxime] got HDMI input switching to happen from the comfort of his phone.
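
As a rough idea of what “parsing their meaning” can involve, here is a sketch that splits a captured frame into address and command. It assumes the remote speaks the standard NEC protocol, which the write-up summary above does not state, and the captured bytes and button names are hypothetical placeholders rather than codes from [maxime]’s remote.

```python
def parse_nec_frame(frame_bytes):
    """Split a standard NEC frame into (address, command).

    frame_bytes holds the four bytes in transmission order:
    address, inverted address, command, inverted command.
    Raises ValueError if the frame is malformed.
    """
    if len(frame_bytes) != 4:
        raise ValueError("NEC frame must be exactly 4 bytes")
    address, address_inv, command, command_inv = frame_bytes
    # Each value is followed by its bitwise inverse as a sanity check.
    if address ^ address_inv != 0xFF or command ^ command_inv != 0xFF:
        raise ValueError("inverted bytes do not match")
    return address, command


# Hypothetical captures, invented for this example only.
captured = {
    "input_1": bytes([0x80, 0x7F, 0x01, 0xFE]),
    "input_2": bytes([0x80, 0x7F, 0x02, 0xFD]),
}

decoded = {name: parse_nec_frame(raw) for name, raw in captured.items()}
```

Once every button is reduced to an address and command pair like this, the table maps fairly directly onto whatever configuration format the automation platform expects.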

We get the Home Assistant config snippets right there in the blog post, so if you’ve been looking for an HDMI switch for your hacker lair, now you have one model to look out for in particular. Of course, you could roll your own HDMI switch, and if you’re looking for references, we’ve covered a good few hacks doing that as part of building a KVM.

Unraveling The Secrets Of Apple’s Mysterious Fisheye Format

Apple has developed a proprietary — even mysterious — “fisheye” projection format used for their immersive videos, such as those played back by the Apple Vision Pro. What’s the mystery? The fact that they stream their immersive content in this format but have provided no elaboration, no details, and no method for anyone else to produce or play back this format. It’s a completely undocumented format and Apple’s silence is deafening when it comes to requests for, well, anything to do with it whatsoever.

Those details will probably be forthcoming eventually, but [Mike Swanson] isn’t satisfied to wait. He’s done his own digging into the format and while he hasn’t figured it out completely, he has learned quite a bit and written it all up in a blog post. Apple’s immersive videos have a lot in common with VR180 type videos, but under the hood there is more going on. Apple’s stream is DRM-protected, but there’s an unencrypted intro clip with logo that is streamed in the clear, and that’s what [Mike] has been focusing on.

Most “fisheye” formats are mapped onto square frames in a way similar to what’s seen here, but this is not what Apple is doing.

[Mike] has been able to determine that the format definitely differs from existing fisheye formats recorded by immersive cameras. First of all, the content is rotated 45 degrees. This spreads the horizon of the video across the diagonal, maximizing the number of pixels available in that direction (a trick that calls to mind the tilted heads in home video recorders, which increase the area of tape they can “see” beyond the physical width of the tape itself). Doing this also spreads the center-vertical axis of the content across the other diagonal, with the same effect.
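
The arithmetic behind the rotation trick is simple enough to check. The frame size below is a hypothetical placeholder, not Apple’s actual resolution; the point is only that the diagonal of a square frame is √2 times its side.

```python
import math


def diagonal_gain(width, height):
    """Ratio of a frame's diagonal length to its width."""
    return math.hypot(width, height) / width


side = 4096  # hypothetical square frame size, chosen only for illustration
horizon_pixels = math.hypot(side, side)  # length available along the diagonal
gain = diagonal_gain(side, side)         # sqrt(2) for any square frame
```

For a square frame, laying the horizon along the diagonal gives it roughly 41% more pixels than laying it along an edge.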

There’s more to it than just a 45-degree rotation, however. The rest most closely resembles radial stretching, a form of disc-to-square mapping. It’s close, but [Mike] can’t quite find a complete match for what exactly Apple is doing. Probably we’ll all learn more soon, but for now Apple isn’t saying much.
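
For reference, one common formulation of radial stretching between a unit disc and a square can be sketched as follows. This is the generic textbook-style mapping, not Apple’s (which, as noted, [Mike] could not fully match), and the function names are made up for this example.

```python
import math


def disc_to_square(u, v):
    """Map a point in the unit disc to the square [-1, 1] x [-1, 1].

    Each point is pushed outward along its own ray from the center,
    so the circle's edge lands on the square's edge.
    """
    r = math.hypot(u, v)
    if r == 0.0:
        return 0.0, 0.0
    if abs(u) >= abs(v):
        sign = math.copysign(1.0, u)
        return sign * r, sign * r * v / u
    sign = math.copysign(1.0, v)
    return sign * r * u / v, sign * r


def square_to_disc(x, y):
    """Inverse of disc_to_square: pull a square point back into the disc."""
    r = math.hypot(x, y)
    if r == 0.0:
        return 0.0, 0.0
    if abs(x) >= abs(y):
        sign = math.copysign(1.0, x)
        return sign * x * x / r, sign * x * y / r
    sign = math.copysign(1.0, y)
    return sign * x * y / r, sign * y * y / r


# A point on the disc's rim at 45 degrees lands on the square's corner.
corner = disc_to_square(math.cos(math.pi / 4), math.sin(math.pi / 4))
# Mapping out and back should return the starting point.
round_trip = square_to_disc(*disc_to_square(0.3, -0.5))
```

Points on the axes stay where they are, while points near the diagonals are stretched the most, which is why the corners of the square end up holding content from the rim of the disc.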

Videos like VR180 videos and Apple’s immersive format display stereoscopic video that allows a user to look around naturally in a scene. But really delivering a deeper sense of presence and depth takes light fields.