We love hearing about a good experiment, and here’s a pretty neat one: researchers used a VR headset, an off-the-shelf VR360 camera, and some custom software to glue them together. The result? Owl-Vision squashes a full 360° of un-distorted horizontal visual perception into 90° of neck travel to either side. One can see all around oneself, without needing to physically turn one’s head any further than is natural.
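The remapping described above can be pictured as a gain applied to head yaw: 90° of neck travel to either side has to cover 180° of view to either side, which works out to a gain of 2. The sketch below only illustrates that arithmetic and is not the researchers' software; the function name, the clamping, and the assumption of a purely linear mapping are ours.

```python
def virtual_yaw(head_yaw_deg: float,
                max_head_yaw_deg: float = 90.0,
                max_view_yaw_deg: float = 180.0) -> float:
    """Map physical head yaw to an amplified view yaw (linear gain, assumed)."""
    # Clamp to the comfortable range of neck travel.
    clamped = max(-max_head_yaw_deg, min(max_head_yaw_deg, head_yaw_deg))
    # 180 / 90 gives a gain of 2 with the default values.
    gain = max_view_yaw_deg / max_head_yaw_deg
    return clamped * gain


if __name__ == "__main__":
    for yaw in (-90, -45, 0, 45, 90):
        print(f"head {yaw:+4d} deg -> view {virtual_yaw(yaw):+6.1f} deg")
```

With these defaults, turning the head 45° to the right would point the view 90° to the right, and the full 90° of neck travel would reach directly behind.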
It’s still a work in progress, and the paper isn’t currently available for free, but the demonstration video at that link (also embedded below) gives a solid overview of what’s going on.
Alibaba’s EMO (or Emote Portrait Alive) framework is a recent entry in a series of attempts to generate a talking head using existing audio (spoken word or vocal audio) and a reference portrait image as inputs. At its core it uses a diffusion model that is trained on 250 hours of video footage and over 150 million images. But unlike previous attempts, it adds what the researchers call a speed controller and a face region controller. These serve to stabilize the generated frames, along with an additional module to stop the diffusion model from outputting frames that feature a result too distinct from the reference image used as input.
The related paper by [Linrui Tian] and colleagues shows a number of comparisons between EMO and other frameworks, claiming significant improvements over these. The researchers also provide a number of examples of talking and singing heads generated using this framework, which gives some idea of what are probably the ‘best case’ outputs. In some examples, like [Leslie Cheung Kwok Wing] singing ‘Unconditional’, big glitches are obvious and there’s a definite mismatch between the vocal track and facial motions. Despite this, it’s quite impressive, especially with the fairly realistic movement of the head, including blinking of the eyes.
Meanwhile, some seem extremely impressed, such as [Matthew Berman], who states in a recent video on EMO that Alibaba releasing this framework to the public might be ‘too dangerous’. The level-headed folks over at PetaPixel, however, also note the obvious visual imperfections that are a dead give-away for this kind of generative technology. Much like other diffusion model-based generators, it would seem that EMO is still very much stuck in the uncanny valley, with no clear path to becoming a real human yet.
The quality of available video production equipment has increased hugely as digital video and then high-definition equipment have entered the market. But there are still some components which are expensive, one of which is a decent quality HD wireless monitor. Along comes [FuzzyLogic] with a solution, in the form of an external monitor for a laptop, driven by a wireless HDMI extender.
In one sense this project involves plugging in a series of components and simply using them for their intended purpose. It’s more than that, however: it involves some rather useful 3D printed parts to make a truly portable wireless monitor, and it saves the rest of us the gamble of buying a wireless HDMI extender without knowing whether it would deliver.
He initially tried an HDMI-to-USB dongle and a streaming Raspberry Pi, but the latency was far too high to be useful. The extender does have a small delay, but not so bad as to be unusable. The whole setup, including the monitor, can be powered from a large USB power bank, answering one of our questions. All the files can be downloaded from Printables should you wish to follow the same path, and meanwhile there’s a video with the details below the break.
We’ve noticed a recent YouTube trend of producing trailers for shows and movies as if they were produced in the 1950s, even when they weren’t. The results are impressive and, as you might expect, leverage AI generation tools. While we enjoy watching them, we were especially interested in [Patrick Gibney’s] peek behind the curtain of how he makes them, as you can see below. If you want to see an example of the result first, check out the second video, showing a 1950s-era The Matrix.
Of course, you could do some of it yourself, but if you want the full AI experience, [Patrick] suggests using ChatGPT to produce a script, though he admits that if he did that, he would tweak the results. Other AI tools create the pictures used and the announcer-style narration. Another tool produces cinematographic shots that include the motion of the “actors” and other things in the scene. More tools create the background music.
You know the feeling of having just created a perfect setup for your hacker lab? Sometimes, there’s just this missing piece in the puzzle that requires you to do a small hack, and those are the most tempting. [maxime borges] has such a perfect setup that involves a HDMI 4:2 switch, and he brings us a write-up on integrating that HDMI switch into Home Assistant through emulating an infrared receiver’s signals.
The HDMI switch is equipped with an infrared sensor as the only means of controlling it, so naturally, that was the path chosen for interfacing the ESP32 put inside the switch. Fortunately, Home Assistant provides the means to both receive and output IR signals, so after capturing all the codes produced by the IR remote, parsing their meaning, then turning them into a Home Assistant configuration, [maxime] got HDMI input switching to happen from the comfort of his phone.
We get the Home Assistant config snippets right there in the blog post — if you’ve been looking for a HDMI switch for your hacker lair, now you have one model to look out for in particular. Of course, you could roll your own HDMI switch, and if you’re looking for references, we’ve covered a good few hacks doing that as part of building a KVM.
Apple has developed a proprietary — even mysterious — “fisheye” projection format used for their immersive videos, such as those played back by the Apple Vision Pro. What’s the mystery? The fact that they stream their immersive content in this format but have provided no elaboration, no details, and no method for anyone else to produce or play back this format. It’s a completely undocumented format and Apple’s silence is deafening when it comes to requests for, well, anything to do with it whatsoever.
Those details will probably be forthcoming eventually, but [Mike Swanson] isn’t satisfied to wait. He’s done his own digging into the format and while he hasn’t figured it out completely, he has learned quite a bit and written it all up in a blog post. Apple’s immersive videos have a lot in common with VR180 type videos, but under the hood there is more going on. Apple’s stream is DRM-protected, but there’s an intro clip with a logo that is streamed unencrypted, and that’s what [Mike] has been focusing on.
Most “fisheye” formats are mapped onto square frames in a way similar to what’s seen here, but this is not what Apple is doing.
[Mike] has been able to determine that the format definitely differs from existing fisheye formats recorded by immersive cameras. First of all, the content is rotated 45 degrees. This spreads the horizon of the video across the diagonal, maximizing the number of pixels available in that direction (a trick that calls to mind the heads in home video recorders being tilted to increase the area of tape they can “see” beyond the physical width of the tape itself). Doing this also spreads the center-vertical axis of the content across the other diagonal, with the same effect.
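To put a number on the diagonal trick: the diagonal of a square frame is √2 (about 1.41) times as long as its side, so a horizon laid along the diagonal gets roughly 41% more pixels than one laid along a row. The short sketch below just does that arithmetic. The 4096-pixel frame size is a made-up example value, not a figure from [Mike]’s write-up.

```python
import math


def diagonal_gain(side_px: int) -> tuple[float, float]:
    """Return (diagonal length in pixels, diagonal/side ratio) for a square frame."""
    diagonal_px = side_px * math.sqrt(2)
    return diagonal_px, diagonal_px / side_px


if __name__ == "__main__":
    side = 4096  # hypothetical frame size, for illustration only
    diagonal, ratio = diagonal_gain(side)
    print(f"side: {side} px, diagonal: {diagonal:.0f} px, ratio: {ratio:.3f}")
```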
There’s more to it than just a 45-degree rotation, however. The rest most closely resembles radial stretching, a form of disc-to-square mapping. It’s close, but [Mike] can’t quite find a complete match for what exactly Apple is doing. Probably we’ll all learn more soon, but for now Apple isn’t saying much.
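For reference, here is a minimal sketch of radial stretching as it is commonly described in the disc-to-square mapping literature: every point keeps its direction from the center, and its distance is scaled so that the unit disc fills the square [-1, 1] × [-1, 1]. This is the textbook mapping only, not Apple’s actual projection, which by [Mike]’s account does not match it exactly. The function names are our own.

```python
import math


def square_to_disc(x: float, y: float) -> tuple[float, float]:
    """Radially squeeze a point of the square [-1, 1]^2 into the unit disc."""
    norm = math.hypot(x, y)
    if norm == 0.0:
        return 0.0, 0.0
    if x * x >= y * y:
        s = math.copysign(1.0, x)
        return s * x * x / norm, s * x * y / norm
    s = math.copysign(1.0, y)
    return s * x * y / norm, s * y * y / norm


def disc_to_square(u: float, v: float) -> tuple[float, float]:
    """Inverse mapping: radially stretch a point of the unit disc to the square."""
    r = math.hypot(u, v)
    if r == 0.0:
        return 0.0, 0.0
    if u * u >= v * v:
        s = math.copysign(1.0, u)
        return s * r, s * r * v / u
    s = math.copysign(1.0, v)
    return s * r * u / v, s * r


if __name__ == "__main__":
    # A corner of the square should land on the rim of the disc.
    u, v = square_to_disc(1.0, 1.0)
    print(f"corner -> ({u:.4f}, {v:.4f}), radius {math.hypot(u, v):.4f}")
    # Round trip through both mappings.
    print(disc_to_square(*square_to_disc(-1.0, 0.5)))
```

Because the stretch happens purely along the radius, points near the square’s corners are moved the most, while points on the horizontal and vertical axes are not moved at all.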
Formats like VR180 and Apple’s immersive video display stereoscopic video that allows a user to look around naturally in a scene. But to really deliver a deeper sense of presence and depth takes light fields.
The T-800, also known as the Terminator, was like some kind of non-giving up robot guy. The robot assassin viewed the world through a tinted view with lines of code scrolling by all the while. It was cinematic shorthand to tell the audience they were looking through the eyes of a machine. Now, a YouTuber called [Open Source] has analyzed that code.
The video highlights one interesting find concerning the graphics seen in the T-800’s vision. They appear to match the output of various code listings and articles in Nibble Magazine, specifically its September 1984 issue. One example spotted was a compass rose, spawned from an Apple Basic listing. It was a basic quiz to help teach children to understand the compass. Another graphic appears to be cribbed from the MacPaint Patterns section of the same issue.
The weird thing is that the original film came out in October 1984, just a month after that article would have hit the newsstands. It suggests that perhaps someone involved with the movie was also involved with Nibble Magazine or had access to an early copy, or that the examples in the magazine were just rehashed from some other earlier source.
Code that regularly flickers on the left of the T-800’s vision is just 6502 machine code. It’s apparently just a random hexdump from an Apple II’s memory. At other times, there’s also 6502 assembly code on screen, with various programmer comments still intact. There’s even some code cribbed from the Apple II DOS 3.3 RAM Disk driver.
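To show how a hexdump and an assembly listing relate, here is a small sketch of a 6502 disassembler that handles only a handful of opcodes. The byte sequence is a made-up example, not code from the film or from Nibble Magazine, and the function and table names are our own. The opcode values themselves are standard 6502 encodings.

```python
# Tiny subset of the 6502 instruction set: opcode -> (mnemonic, mode, size in bytes)
OPCODES = {
    0xA9: ("LDA", "imm", 2),  # load accumulator, immediate
    0xA2: ("LDX", "imm", 2),  # load X register, immediate
    0x8D: ("STA", "abs", 3),  # store accumulator, absolute address
    0x20: ("JSR", "abs", 3),  # jump to subroutine
    0x4C: ("JMP", "abs", 3),  # jump
    0xE8: ("INX", "imp", 1),  # increment X
    0xEA: ("NOP", "imp", 1),  # no operation
    0x60: ("RTS", "imp", 1),  # return from subroutine
}


def disassemble(data: bytes, origin: int = 0x0300) -> list[str]:
    """Turn raw bytes into 'ADDR: MNEMONIC operand' lines (subset only)."""
    lines = []
    i = 0
    while i < len(data):
        op = data[i]
        mnemonic, mode, size = OPCODES.get(op, ("", "raw", 1))
        operand = data[i + 1:i + size]
        if mode == "raw" or len(operand) < size - 1:
            # Unknown opcode or truncated instruction: emit the raw byte.
            text, size = f".byte ${op:02X}", 1
        elif mode == "imm":
            text = f"{mnemonic} #${operand[0]:02X}"
        elif mode == "abs":
            # Absolute addresses are stored little-endian (low byte first).
            text = f"{mnemonic} ${operand[1]:02X}{operand[0]:02X}"
        else:
            text = mnemonic
        lines.append(f"{origin + i:04X}: {text}")
        i += size
    return lines


if __name__ == "__main__":
    # Illustrative bytes: load a character, call a routine at $FDED, return.
    for line in disassemble(bytes.fromhex("A9 C1 20 ED FD 60")):
        print(line)
```

Run on the six example bytes, this prints three instructions: `LDA #$C1`, `JSR $FDED`, and `RTS`. A random dump of memory, as seen in the film, would mostly decode to nonsense or `.byte` lines unless it happened to start on an instruction boundary.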
It’s neat to see someone actually track down the background of these classic graphics. Hacking and computers are usually portrayed in a fairly unrealistic way in movies, and it’s no different in The Terminator (1984). Still, that doesn’t mean the movies aren’t fun!