AI Binoculars Know More About Birds Than You

2024 is the year of adding Artificial Intelligence to everything. Now, even a pleasant walk in the woods is getting a dose of AI: optics manufacturer Swarovski has announced the AX Visio, a binocular set with an AI bird identification feature. Not sure if that is a lesser or greater scaup on your pond? These binoculars will tell you, for the low, low price of $4799.

While digital cameras built into binoculars have been around for a while, adding AI is new. That’s a cool thing, but a bit of digging into the specs reveals that there is a much cheaper way to do it.

  1. Buy a cheap digital camera, like the Kodak Pixpro AZ255, which has a higher resolution and longer zoom than these binoculars.
  2. Transfer the image to your cell phone with an $11 memory card reader.
  3. Run the free Cornell Merlin ID app to identify the bird.
  4. Send the $4500 you just saved to us, or your favorite charity.

These ludicrously overpriced binoculars use the same Cornell Merlin ID system that you can use for free in the Merlin app, which has the added advantage of being able to ID birds from their songs. This is helpful because birds are tricky creatures that will try to hide from the hideously overpriced gadget you just bought.

[Via DigitalCameraWorld]

This Week In Security: AI Is Terrible, Ransomware Wrenches, And Airdrop

So first off, go take a look at this curl bug report. It’s an 8.6-severity security problem, a buffer overflow in WebSockets. Potentially a really bad one. But it’s bogus. Yes, a strcpy call can be dangerous if there aren’t proper length checks, but this code has pretty robust length checks. There just doesn’t seem to be a vulnerability here.

OK, so let’s jump to the punch line. This is a bug report that was generated with one of the Large Language Models (LLMs), like Google Bard or ChatGPT. And it shouldn’t be a surprise. There are some big bug bounties being paid out, so naturally people are trying to leverage AI to score those bounties. But as [Daniel Stenberg] points out, LLMs are not actually AI, and the I in LLM stands for intelligence.

There have always been vulnerability reports of dubious quality, sent by people who either don’t understand how vulnerability research works, or are willing to waste maintainer time by sending in raw vulnerability scanner output without putting in any real effort. What LLMs add is an illusion of competence that takes a maintainer longer to wade through before realizing the claim is bogus. [Daniel] is more charitable than I might be, suggesting that LLMs may help with communicating real issues across language barriers. But still, this suggests the long-term solution may be “simply” detecting LLM-generated reports and marking them as spam. Continue reading “This Week In Security: AI Is Terrible, Ransomware Wrenches, And Airdrop”

Adding AI To NPCs Is Easy, Doing It Well Is Hard

Adding natural language interfaces to software is easier than ever, and that led [creikey] to prototype a game that hinges on communicating with NPCs. The prototype went through multiple iterations, during which he mainly discovered things that did not work well. Ultimately it led to [creikey] settling on a western-themed game called Dante’s Cowboy, which he hopes to release as an experiment. He begins talking about the game around the 4:43 mark in the video, which directly precedes a recording of a presentation he gives as an indie developer.

Games typically revolve around the player manipulating entities in an environment in order to make things happen. This interaction drives engagement and interesting decisions. But while adding natural language AI to NPCs makes them easy to talk with, talking by itself is a shallow interaction. Convincing NPCs to do things? That’s complex and far more difficult to implement. [creikey] realized the limitations large language models (LLMs) had and worked to overcome them to make a unique game experience.

The challenges boil down to figuring out how to drive meaningful interaction, aligning AI behavior with the gameplay context, and managing API costs. In his words, “it’s been a learning experience to figure out where [natural language AI] even belongs in a game, if it belongs at all.”
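
To make that concrete, one common pattern for bridging talk and action is to have the model tag its free-form dialogue with a verb from a fixed vocabulary the game engine already understands, then parse the reply like any other input. The sketch below is an illustration of that idea, not [creikey]’s implementation; it assumes an OpenAI-compatible completion endpoint running locally (a llamafile in server mode, for instance), and the action names are made up for the example:

curl -s http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a cowboy NPC. End every reply with exactly one line: ACTION: FOLLOW, ACTION: TRADE, or ACTION: NONE."},
      {"role": "user", "content": "Will you help me find the sheriff?"}
    ],
    "temperature": 0
  }' | grep -o 'ACTION: [A-Z]*'

The dialogue stays open-ended, but the game only ever acts on the ACTION line, which keeps the model from promising things the engine can’t actually do.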

We’ve previously seen ChatGPT used to grant NPCs the ability to communicate naturally, which makes for a fascinating tech demo but, gameplay-wise, can boil down to a complicated alternative to pressing a button. As [creikey] discovered, adding this technology into games in a way that feels meaningful takes a new kind of work.

Continue reading “Adding AI To NPCs Is Easy, Doing It Well Is Hard”

AI Pet Door Rejects Dead Mice

If you have a pet with a little access door to the outside world, and that pet happens to be a cat, you’re likely on the receiving end of all kinds of lifeless little lagniappes. Don’t worry, it’s CES season out in Las Vegas, and a company called Flappie has the solution: an AI-powered cat door that rejects dead mice and other would-be offerings.

Image by Nathan Ingraham via Engadget

It works about like you might expect — there’s a motion sensor and a night-vision camera on the exterior side of the door. Using Flappie’s “unique and proprietary” dataset, the door distinguishes between Tom and Jerry and keeps out unwanted guests with more than 90% accuracy. To do this, Flappie collected video of a lot of cats and prey in a variety of lighting conditions. There’s even a chip detection system that will reject all other cats.

Thankfully, it’s not all automation. The prey detection system can be turned off entirely, and there are manual switches on the inside for locking and unlocking the door at will. You don’t even have to hook it up to the Internet, it seems.

Americans will have to wait a while, as the company is rolling out the door in Switzerland and Germany first. No word on when the US launch will take place, but interested parties can expect to pay around $399.

Of course, this problem can be solved without AI as long as you’re willing to review the situation and unlock the door yourself.

Using Local AI On The Command Line To Rename Images (And More)

We all have a folder full of images whose filenames resemble line noise. How about renaming those images with the help of a local LLM (large language model) executable on the command line? All that and more is showcased on [Justine Tunney]’s bash one-liners for LLMs, a showcase aimed at giving folks ideas and guidance on using a local (and private) LLM to do actual, useful work.

This is built on the recent llamafile project, which turns LLMs into single-file executables. That not only makes them more portable and easier to distribute, but the executables are perfectly capable of being called from the command line and writing to standard output like any other UNIX tool. It’s also simpler to version control the embedded LLM weights (and therefore their behavior) when it’s all part of the same file.

One such tool (the multi-modal LLaVA) is capable of interpreting image content. As an example, we can point it to a local image of the Jolly Wrencher logo using the following command:

llava-v1.5-7b-q4-main.llamafile --image logo.jpg --temp 0 -e -p '### User: The image has...\n### Assistant:'

Which produces the following response:

The image has a black background with a white skull and crossbones symbol.

With a different prompt (“What do you see?” instead of “The image has…”) the LLM even picks out the wrenches, but one can already see that the right pieces exist to do some useful work.
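
For instance, swapping in that prompt looks like this (the same invocation as above, only the prompt text changed):

llava-v1.5-7b-q4-main.llamafile --image logo.jpg --temp 0 -e -p '### User: What do you see?\n### Assistant:'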

Check out [Justine]’s rename-pictures.sh script, which cleverly evaluates image filenames. If an image’s given filename already looks like readable English (also a job for a local LLM) the image is left alone. Otherwise, the picture is fed to an LLM whose output guides the generation of a new short and descriptive English filename in lowercase, with underscores for spaces.
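
In rough shell terms, the heart of such a script could look something like the simplified sketch below. This is not [Justine]’s actual code; it assumes the same llamafile from above sits in the current directory, uses a shorter prompt of our own, and leans on the --silent-prompt flag to keep the prompt itself out of the output:

for f in *.jpg; do
  # Ask the model for a short description of the image.
  desc=$(./llava-v1.5-7b-q4-main.llamafile --image "$f" --temp 0 -e \
    -p '### User: Describe this image in a few words.\n### Assistant:' \
    --silent-prompt 2>/dev/null)
  # Lowercase the description and squeeze everything else into underscores.
  name=$(printf '%s' "$desc" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '_' | sed 's/^_*//;s/_*$//')
  # Rename, refusing to clobber an existing file.
  [ -n "$name" ] && mv -n -- "$f" "$name.jpg"
done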

What about the fact that LLM output isn’t entirely predictable? That’s easy to deal with. [Justine] suggests always calling these tools with the --temp 0 parameter. Setting the temperature to zero makes the model deterministic, ensuring that the same input always yields the same output.

There are more neat examples on the Bash One-Liners for LLMs page that demonstrate different ways to use a local LLM living in a single-file executable, so be sure to give it a look and see if you get any new ideas. After all, we have previously shown how automating tasks is almost always worth the time invested.

A Ham Radio Answering Machine

For those who grew up with a cell phone in their hand, it might be difficult to imagine a time when the phone wasn’t fully integrated with voicemail. It sounds like a fantastical past, yet at one point a separate machine needed to be attached to the phone to record messages if no one was home to answer. Not only that, but a third component, a cassette tape, was generally needed to store the messages. In many ways we live in a much simpler world now, but in the amateur radio world one group is looking to bring this esoteric technology to the airwaves, and [saveitforparts] is demonstrating one as part of a beta test.

The device is called the Boondock Echo, and while at its core it’s an ESP32, there’s a lot going on behind the scenes. It has an audio interface capable of connecting to a radio with the correct patch cable; in this case, a simple Baofeng handheld. On its own, the answering machine can record any sounds that come in, but with a network connection the recordings are also analyzed by an AI that can transcribe what it hears, listen for specific call signs, and take actions such as sending an email when it hears one of those triggers. Boondock also plans for the device to be able to respond, but [saveitforparts] was not able to get this working during the beta test.

While an answering machine might seem like a step backwards technologically, an answering machine like this, especially when paired with Google Voice-like capabilities from an AI, has a lot of promise for ham radio operators. Even during this test, [saveitforparts] lost a radio and a kind stranger keyed it up when it was found, which was recorded by the Boondock Echo and used to eventually recover the radio. Certainly there are plenty of other applications as well, such as using AI instead of something like an Arduino to do Morse decoding.
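
To get a feel for the transcribe-and-alert idea without any special hardware, the same flow can be approximated on a PC. This back-of-the-envelope sketch is not the Boondock Echo’s actual pipeline; it assumes OpenAI’s whisper command-line tool and a working mail command are installed, that a transmission has already been captured to recording.wav, and KD9XYZ stands in for whatever call sign you care about:

# Transcribe the captured transmission; whisper writes recording.txt alongside it.
whisper recording.wav --model base --output_format txt

# If the call sign shows up in the transcript, fire off an email alert.
grep -qi 'KD9XYZ' recording.txt && \
  echo 'Heard KD9XYZ on the air' | mail -s 'Radio alert' you@example.com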

Continue reading “A Ham Radio Answering Machine”

Making Visual Anagrams, With Help From Machine Learning

[Daniel Geng] and others have an interesting system of generating multi-view optical illusions, or visual anagrams. Such images have more than one “correct” view and visual interpretation.

What’s more, there are quite a few different methods on display: 90 degree flips and other (orthogonal) image rotations, color inversions, jigsaw permutations, and more. The project page has a generous number of examples, so go check them out!

The team’s method uses pre-trained diffusion models, more commonly known as the secret sauce inside image-generating AIs, to evaluate how the image should change for each view and combine those signals in a way that steers the model toward generating an image that reads well from every view. While conceptually straightforward, this process wasn’t really something that could work before diffusion models driven by modern machine learning techniques.

The visual_anagrams GitHub repository has the code, and the research paper covers implementation details and limitations, along with guidance on obtaining good results. Image generation is just one rapidly evolving corner of machine learning these days, and it’s always interesting to see unusual applications like this one.