AI Image Generation Meets Virtual Dress Up

Image generators have really taken off thanks to machine learning, and all kinds of new ideas have been turned on in people’s heads as a result. OOTDiffusion is one such project, its job being to allow virtual try-ons of clothing by combining a picture of a person and an item of clothing, and doing so in a coherent way.

A model sporting a 2021 Remoticon shirt.

When it comes to AI image generators, maintaining consistency of a particular subject in a picture while changing or combining other parts of the image isn’t a trivial task. (If you’re unfamiliar with the basics of how diffusion-type AI image generators work, we have you covered.)

Virtual try-on of clothing is not a new idea, but it’s also far from being a completely solved problem. It’s easy to feed a system high-quality images of people and clothing and ask it to combine them, but the outputs rarely emerge with all their limbs intact, figuratively speaking.

OOTDiffusion addresses the two big challenges in this area: making sure the outputs look natural and realistic, and preserving as much of the garment’s appearance and qualities as possible in the process.

It seems to to a very good job, and you can try it for yourself in the online demo. Check out the research paper for more details, and the GitHub repository provides all the code if you’d like to get a little more hands-on.

Your Text Needs More JPEG

We’ve all been victims of bad memes on the Internet, but they’re not all just bad jokes gone wrong. Some are simply bad as a result of being copies-of-copies, as each reposter adds another layer of compression to an already lossy image format like JPEG. Compression can certainly be a benefit in areas like images and videos, but [Michal] had a bit of a fever dream imagining this process applied to text. Rather than let the idea escape, he built the Lossifizer to add JPEG-like compression to text.

JPEG compression uses a system similar to the fast Fourier transform (FFT) called the discrete cosine transform (DCT) to reduce the amount of data in an image by essentially removing some frequency information. The data lost is often not noticeable to the human eye, at least until it gets out of hand. [Michal]’s system performs the same transform on text instead, with a slider to control the “amount of JPEG” in the output text. The code for this script uses a “perceptual” character map, clustering similarly-looking and similarly-sounding characters next to each other, resembling “leet speak” from days of yore, although at high enough compression this quickly gets out of hand.

One of the quirks that [Michal] discovered is that certain AI chat bots have a much less difficult time interpreting this JPEG-ified text than a human probably would have, which provides a bit of insight into how some of these algorithms might be functioning under the hood. For some more insight into how JPEG actually works on images, we posted about a deep dive into the image format a while back.

Making Floating Point Calculations Less Cursed When Accuracy Matters

Inverting the earlier exponentiation to reduce floating point arithmetic error. (Credit: exozy)
Inverting the earlier exponentiation to reduce floating point arithmetic error. (Credit: exozy)

An unfortunate reality of trying to represent continuous real numbers in a fixed space (e.g. with a limited number of bits) is that this comes with an inevitable loss of both precision and accuracy. Although floating point arithmetic standards – like the commonly used IEEE 754 – seek to minimize this error, it’s inevitable that across the range of a floating point variable loss of precision occurs. This is what [exozy] demonstrates, by showing just how big the error can get when performing a simple division of the exponential of an input value by the original value. This results in an amazing error of over 10%, which leads to the question of how to best fix this.

Obviously, if you have the option, you can simply increase the precision of the floating point variable, from 32-bit to 64- or even 256-bit, but this only gets you so far. The solution which [exozy] shows here involves using redundant computation by inverting the result of ex. In a demonstration using Python code (which uses IEEE 754 double precision internally), this almost eradicates the error. Other than proving that floating point arithmetic is cursed, this also raises the question of why this works.

Continue reading “Making Floating Point Calculations Less Cursed When Accuracy Matters”

This Piano Does Not Exist

A couple of decades ago one of *the* smartphone accessories to have was a Bluetooth keyboard which projected the keymap onto a table surface where letters could be typed in a virtual space. If we’re honest, we remember them as not being very good. But that hasn’t stopped the idea from resurfacing from time to time.

We’re reminded of it by [Mayuresh1611]’s paper piano, in which a virtual piano keyboard is watched over by a webcam to detect the player’s fingers such that the correct note from a range of MP3 files is delivered.

The README is frustratingly light on details other than setup, but a dive into the requirements reveals OpenCV as expected, and TensorFlow. It seems there’s a training step before a would-be virtual virtuoso can tinkle on the non-existent ivories, but the demo shows that there’s something playable in there. We like the idea, and wonder whether it could also be applied to other instruments such as percussion. A table as a drum kit would surely be just as much fun.

This certainly isn’t the first touch piano we’ve featured, but we think it may be the only one using OpenCV. A previous one used more conventional capacitive sensors.

A multifactor authentication device showing TOTP codes

An ESP32 MultiFactor TOTP Generator

MFA, or multifactor authentication, is a standard security feature these days. However, it can be a drag to constantly reach into one’s pocket, scroll to Google Authenticator (other MFA applications are available!), and find the correct TOTP code to log in to a site for a short while. [Allan Oricil] felt this pain point, so they took the problem by the horns and created a desktop MFA TOTP generator to make life just that little bit easier.

TOTP, which stands for Time-based One-Time Password, is a security measure that uses a device or application to provide unique codes that expire after a short time. Two-factor authentication requires a physical item (something you have), such as a key or swipe card, and knowledge of a fact (something you know), like a password, rather than relying on a single factor. This approach ensures a higher level of security. [Allan]’s project is a physical thing one would use with a password or key file.

Continue reading “An ESP32 MultiFactor TOTP Generator”

render of the Amiga juggler demo

The Juggler: In Rust

Back on the theme of learning to program by taking on a meaningful project — we have another raytracing demo — this time using Rust on the Raspberry Pi. [Unfastener] saw our previous article about writing a simple raytracer in spectrum BASIC and got inspired to try something similar. The plan was to recreate the famous juggler 3D demo, from the early days of 3D rendering on the Amiga.

The juggler story starts with an Amiga programmer called [Eric Graham] who created ssg, the first ray tracer application on a personal computer. A demo was shown to Commodore, who didn’t believe it was done on their platform, but a quick follow-up with the actual software used soon quelled their doubts. Once convinced, they purchased the rights to the demo for a couple of thousand dollars (in 1986 money, mind you) to use in promotional materials. [Eric] developed ssg into the popular Sculpt 3D, which became available also on Mac and Windows platforms, and kick-started a whole industry of personal 3D modelling and ray tracing.

Anyway, back to the point. [Unfastener] needed to get up the considerable Rust learning curve, and the best way to do that is to let someone else take care of some of the awkward details of dealing with GUI, and just concentrate on the application. To that end, they use the softbuffer and winit Rust crates that deal with the (important, yet frankly uninteresting) details of building frame buffers and pushing the pixels out to the window manager in a cross-platform way. Vecmath takes care of — you guessed it — the vector math. There’s no point reinventing that wheel either. Whilst [Unfastener] mentions the original Amiga demo took about an hour per frame to render, this implementation runs in real-time. To that end, the code performs a timed pre-render to determine the most acceptable resolution to get an acceptable frame rate, achieving a respectable 30 or so frames per second on a Pi 5, with the older Pis needing to drop the resolution a little. This goes to show how efficient Rust code can be and, how capable the new Pi is. How far we have come.

We saw another interesting rust-based raytracer a while back, which is kinda fun. We’ve also covered rust in other applications a few times, like inside the Linux kernel. Finally here’s our guide to getting started with rust, in case you need any more motivation to have a crack at this upcoming language.