High Quality 3D Scene Generation From 2D Source, In Realtime

Here’s some fascinating work presented at SIGGRAPH 2023: a method for radiance field rendering using a novel technique called Gaussian Splatting. What’s that mean? It means synthesizing a 3D scene from 2D images, in high quality and in real time, as the short animation above shows.

Neural Radiance Fields (NeRFs) are a method of leveraging machine learning to do, in a way, what photogrammetry does: synthesize complex scenes and views based on input images. But NeRFs work in a fraction of the time, and require only a fraction of the source material. There are different ways to go about this, and unsurprisingly there tends to be a clear speed vs. quality tradeoff. But as the video accompanying this new work seems to show, clever techniques can deliver the best of both worlds.
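
To get an intuition for the “splatting” half of the name, here’s a toy sketch (ours, not the paper’s code) that alpha-composites a few 2D Gaussians onto an image buffer, front to back. The real method optimizes millions of anisotropic 3D Gaussians from calibrated photographs, but the compositing idea is similar:

```python
# Toy illustration only: render three 2D Gaussian "splats" by
# compositing them front-to-back, tracking remaining transmittance.
import numpy as np

H = W = 128
img = np.zeros((H, W, 3))
trans = np.ones((H, W))                 # light not yet absorbed
ys, xs = np.mgrid[0:H, 0:W]

# (center_x, center_y, radius, RGB color, opacity), sorted near-to-far
splats = [(40, 40, 10, (1, 0, 0), 0.8),
          (70, 60, 18, (0, 1, 0), 0.6),
          (90, 90, 25, (0, 0, 1), 0.9)]

for cx, cy, r, color, opacity in splats:
    g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * r * r))
    alpha = opacity * g                 # per-pixel coverage of this splat
    img += (trans * alpha)[..., None] * np.asarray(color)
    trans *= 1 - alpha                  # nearer splats occlude farther ones

# img now holds the composited render (values in [0, 1])
```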

A short video summary is embedded just below the page break. Interested in deeper details? The research PDF is here. The pace of development this field has seen is nothing short of staggering, and the results are certainly higher in quality than what was state-of-the-art for NeRFs only a year ago.

Continue reading “High Quality 3D Scene Generation From 2D Source, In Realtime”

Re-Creating Pink Floyd In The Name Of Speech

For people who have lost the ability to speak, the future may include brain implants that bring that ability back. But could these brain implants also allow them to sing? Researchers believe that, all in all, it’s just another brick in the wall.

In a new study published in PLOS Biology, twenty-nine people who were already being monitored for epileptic seizures participated by way of postage-stamp-sized arrays of electrodes implanted directly on the surface of their brains. As the participants listened to Pink Floyd’s Another Brick in the Wall, Part 1, the researchers gathered data from several areas of the brain, each attuned to a different musical element such as harmony and rhythm. Then the researchers used machine learning to reconstruct, from those brainwaves, the audio the participants had heard.

First, an AI model looked at the data generated from the brains’ responses to components of the song, like the changes in rhythm, pitch, and tone. Then a second model rejiggered the piecemeal song and estimated the sounds heard by the patients. Of the seven audio samples published in the study results, we think #3 sounds the most like the song. It’s kind of creepy but ultimately very cool. What do you think?
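
For the curious, that two-stage pipeline can be pictured as a regression from neural features to a spectrogram, followed by spectrogram inversion. The sketch below is our hedged guess at the shape of such a pipeline; the file names and array shapes are invented, and the study’s actual models and features differ:

```python
# Illustrative only -- not the study's code. Stage 1: regress neural
# features onto the song's spectrogram. Stage 2: invert the predicted
# spectrogram back into audio with Griffin-Lim phase estimation.
import numpy as np
import librosa
from sklearn.linear_model import Ridge

neural = np.load("ecog_features.npy")   # hypothetical: (time, electrodes)
spec = np.load("song_spectrogram.npy")  # hypothetical: (time, freq_bins)

split = int(0.8 * len(neural))
model = Ridge(alpha=1.0).fit(neural[:split], spec[:split])
pred = model.predict(neural[split:])    # reconstructed spectrogram frames

# Griffin-Lim needs a non-negative (freq, time) magnitude spectrogram
audio = librosa.griffinlim(np.maximum(pred, 0).T)
```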

Continue reading “Re-Creating Pink Floyd In The Name Of Speech”

A Hacker-Friendly Software Package For Your Next AI Project

If you’re interested in using Large Language Models (LLMs) in a project, but aren’t plugged directly into the fast-developing world of artificial intelligence (AI), knowing what tool or software to use can be daunting. Luckily, [Max Woolf] created simpleaichat, which comes complete with examples, documentation, and minimal code complexity.

As [Max] puts it, the main motivations behind the project are to provide useful tools while making it easier for non-engineers to peer through the breathless hyperbole and see just how AI-based apps actually work. This project was directly inspired by [Max]’s own real-world software experiences in this area, particularly his frustrations with popular and much-hyped frameworks in which “Hello World” feels a lot more like Hell World.

simpleaichat is a Python package that provides easy and powerful ways to interface with the API of OpenAI, makers of ChatGPT. Now, it is true that OpenAI’s models are not open source and access is not free, but they are easily one of the most capable and cost-effective services of their kind.
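
Getting started really is simple. Here’s a minimal sketch based on the project’s documented usage (check the README for current details); it assumes an OPENAI_API_KEY environment variable is set:

```python
# console=False makes AIChat programmatic instead of launching an
# interactive terminal session; the API key is read from the environment.
from simpleaichat import AIChat

ai = AIChat(console=False, system="You are a terse technical assistant.")
print(ai("Explain what a large language model is in one sentence."))
```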

Prefer something a little more open, and a lot more private? There’s always the option to run an LLM locally on your own machine, possibly with the help of a tool like text-generation-webui or gpt4all. A locally-run LLM won’t match the quality of OpenAI’s offerings, but it can still do the job. It’s also possible to give these local LLMs an interface that mimics OpenAI’s API, so there are loads of possibilities.
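
That mimicry means swapping backends can be as small as changing one URL. A hedged sketch using the official openai client; the port and model name below are assumptions, so use whatever your local server reports:

```python
# Point the official OpenAI client at a local, OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="not-needed")
reply = client.chat.completions.create(
    model="local-model",  # placeholder; local servers often ignore this
    messages=[{"role": "user", "content": "Hello from my own hardware!"}],
)
print(reply.choices[0].message.content)
```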

Are you getting ideas yet? Share them in the comments, or keep them to yourselves and submit a tip once your project is off the ground!

AI Assistant Translates Your Every Request For The Command Line

If you don’t live on the command line, it can be easy to forget the exact syntax of commands. It often leaves you running to the “/?” or “--help” switches, or else a quick Google search to find the proper incantations. Shell-AI is a machine-learning assistant that could change all that by helping you find the proper command for the job, right on the command line!

Shell-AI accepts natural-language inputs — simply type in “shai” followed by what you’re trying to do. It will then take in your request, run it through an OpenAI language model like GPT-3.5-Turbo, and then present you with three (or more) potential commands. You can then select which command to use and get on with your day.
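
The core trick is a careful prompt. This toy sketch (ours, not Shell-AI’s actual code) shows one way that request-to-suggestions round trip could look:

```python
# Toy version of the idea: ask a chat model for N candidate shell
# commands and split its reply into one suggestion per line.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def suggest_commands(request: str, n: int = 3) -> list[str]:
    prompt = (f"Suggest {n} single-line shell commands to accomplish: "
              f"{request}. Reply with one command per line, no prose.")
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content.strip().splitlines()

for cmd in suggest_commands("show only image files in this directory"):
    print(cmd)
```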

As demonstrated, it’s more than capable of handling requests like “download a random image” or “show only image files ls.” And, hilariously, it responds to the request “do something crazy” with just one suggestion: “rm -rf”. That seems rather fitting.

We wouldn’t blindly follow any commands coming out of a large language model, of course. But, if you know what you’re doing, it could prove a useful little tool to ease your regular duties on the command line.

Text Compression Gets Weirdly Efficient With LLMs

It used to be that memory and storage space were such precious and limited resources that handling nontrivial amounts of text was a serious problem. Text compression was a highly practical application of computing power.

Today it might be a solved problem, but that doesn’t mean it doesn’t attract new or unusual solutions. [Fabrice Bellard] released ts_zip, which uses Large Language Models (LLMs) to attain text compression ratios higher than any other tool can offer.

LLMs are the technology behind natural language AIs, and applying them in this way seems effective: a good language model is a good predictor of what comes next, and well-predicted text takes very few bits to encode. The tradeoff? Unlike with typical compression tools, lossless decompression isn’t exactly guaranteed when an LLM is involved. Lossy compression methods can still be quite useful, though. JPEG, for example, discards image data that isn’t readily perceived by humans to make a smaller file, but that approach isn’t usually applied to text. If you absolutely require lossless compression, [Fabrice] has that covered with NNCP, a neural-network powered lossless data compressor.
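
To see why prediction helps, here’s a deliberately crude sketch of the principle (not ts_zip’s implementation, which pairs a real LLM with an arithmetic coder): rank each next character by how strongly a model predicts it, and the resulting stream of mostly-tiny numbers compresses extremely well:

```python
# Toy demo: an order-3 character model stands in for the LLM. Both
# compressor and decompressor must share the exact same model, which
# is also why bit-exact model reproducibility matters for losslessness.
import zlib
from collections import defaultdict, Counter

ORDER = 3  # characters of context; a real LLM uses far longer contexts

def build_model(text):
    model = defaultdict(Counter)
    for i in range(len(text) - ORDER):
        model[text[i:i + ORDER]][text[i + ORDER]] += 1
    return model

def rank_stream(text, model):
    out = bytearray()
    for i in range(ORDER, len(text)):
        ctx, ch = text[i - ORDER:i], text[i]
        ranked = [c for c, _ in model[ctx].most_common()]
        out.append(min(ranked.index(ch) if ch in ranked else 255, 255))
    return bytes(out)

text = open("sample.txt").read()   # any text file (name is hypothetical)
model = build_model(text)          # shared model stands in for the LLM
print(len(text), "chars ->", len(zlib.compress(rank_stream(text, model))), "bytes")
```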

Do neural networks and LLMs sound far too serious and complicated for your text compression needs? As long as you don’t mind a mild amount of definitely noticeable data loss, check out [Greg Kennedy]’s Lossy Text Compression, which simply, brilliantly, and amusingly uses a thesaurus instead of some fancy algorithms. Yep, it just swaps longer words for shorter ones. Perhaps not the best solution for every need, but between that and [Fabrice]’s work, we’re confident there’s something for everyone who craves some novelty with their text compression.
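
In the same spirit, here’s a tiny homage to the thesaurus trick (the three-entry word list is obviously ours, not [Greg]’s):

```python
# Lossy "compression" by synonym substitution; capitalization is lost,
# which is part of the fun.
THESAURUS = {"automobile": "car", "utilize": "use", "purchase": "buy"}

def lossy_compress(text: str) -> str:
    return " ".join(THESAURUS.get(w.lower(), w) for w in text.split())

print(lossy_compress("Utilize the automobile to purchase groceries"))
# -> "use the car to buy groceries"
```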

[Photo by Matthew Henry from Burst]

2023 Hackaday Prize: Two Bee Or More Bee Swarm Detection

In the bustling world of bees, swarming is the ultimate game of real estate shuffle. When a hive gets too crowded or craves a change of scenery, the colony swarms, sending out scouts in search of a new hive. [Captain Flatus O’Flaherty] is a beekeeper trying to capture more native honey bees, and a custom LoRa-enabled catch hive helps him do that.

A catch hive, perched high and mighty, lures in scouting bees as a potential new home. If selected, a swarm of over a thousand bees can move in, and that’s where [Flatus]’s detector comes in. Many catch hives are scattered around, and manually checking them is difficult. While the breath of one bee is hard to see, a thousand bees produce enough CO2 to be detected by a sensor. A custom solar-powered PCB with a +30 dB LoRa radio measures CO2 and reports back. The PCB contains an ESP32 D4 and a 1-watt Ebyte E22-400M30S LoRa module. If the CO2 levels are still elevated at nightfall, [Flatus] can be pretty confident a swarm has moved in.
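
The occupancy test itself can be as simple as a threshold rule. Here’s a hedged sketch of the kind of logic implied above; the baseline and delta figures are our assumptions, not [Flatus]’s numbers:

```python
# Flag a catch hive as occupied when CO2 stays elevated after sunset.
import statistics

AMBIENT_PPM = 420   # rough outdoor CO2 baseline (assumption)
SWARM_DELTA = 180   # sustained rise we treat as "bees inside" (assumption)

def swarm_probably_moved_in(ppm_after_sunset: list[float]) -> bool:
    if len(ppm_after_sunset) < 5:
        return False            # not enough samples to judge
    return statistics.median(ppm_after_sunset) > AMBIENT_PPM + SWARM_DELTA

print(swarm_probably_moved_in([640, 655, 630, 660, 645]))  # True
```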

He massaged the collected data into a dataset suitable for training an XGBoost model. Fed weather data and other conditions, the model tries to predict when a swarm is more or less likely to happen. Apis mellifera (the local honeybee around [Flatus]) loves sun-kissed, warm, humid afternoons with little wind.
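
The training step might look something like the sketch below; the column names and CSV layout are invented for illustration:

```python
# Hypothetical sketch: train an XGBoost classifier on weather features
# against "did a swarm arrive" labels derived from the CO2 detector.
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split

df = pd.read_csv("catch_hive_log.csv")     # hypothetical dataset
X = df[["temp_c", "humidity_pct", "wind_kph", "hours_of_sun"]]
y = df["swarm_moved_in"]                   # 0/1 from the CO2 detector

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2)
model = xgb.XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_tr, y_tr)
print("held-out accuracy:", model.score(X_te, y_te))
```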

We’ve seen beehive monitors before and love exploring what the data could be used for—video after the break.

Continue reading “2023 Hackaday Prize: Two Bee Or More Bee Swarm Detection”

RoboAgent Gets Its MT-ACT Together

Researchers at Carnegie Mellon University have shared a pre-print paper on generalized robot training within a small “practical data budget.” The team developed MT-ACT (Multi-Task Action Chunking Transformer), a system that breaks movement tasks into 12 “skills” (e.g., pick, place, slide, wipe) that can be combined to create new and complex trajectories in at least somewhat novel scenarios. The authors write:

Trained merely on 7500 trajectories, we are demonstrating a universal RoboAgent that can exhibit a diverse set of 12 non-trivial manipulation skills (beyond picking/pushing, including articulated object manipulation and object re-orientation) across 38 tasks and can generalize them to 100s of diverse unseen scenarios (involving unseen objects, unseen tasks, and to completely unseen kitchens). RoboAgent can also evolve its capabilities with new experiences.
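
The “action chunking” in the name is the interesting mechanical bit: the policy predicts a short chunk of future actions at every step, and overlapping predictions for the same timestep get blended. Here’s a hedged sketch of that ensembling loop in the style of ACT-family policies; the stub stands in for the transformer, which in reality is conditioned on camera images:

```python
# Temporal ensembling over overlapping action chunks, ACT-style.
import numpy as np

H = 4      # chunk length: actions predicted per policy query
m = 0.5    # ensembling temperature (illustrative value)

def policy_stub(t):
    # stand-in for the transformer: H future 1-DoF actions
    return np.array([[np.sin(0.1 * (t + k))] for k in range(H)])

pending = {}   # timestep -> list of predictions covering it
for t in range(20):
    chunk = policy_stub(t)
    for k in range(H):
        pending.setdefault(t + k, []).append(chunk[k])
    preds = pending.pop(t)                    # every chunk covering step t
    w = np.exp(-m * np.arange(len(preds)))    # older predictions weigh more
    action = np.average(preds, axis=0, weights=w)
    print(t, action)                          # send to the robot instead
```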

Continue reading “RoboAgent Gets Its MT-ACT Together”