AI Helps Make Web Scraping Faster And Easier

Web scraping is usually only a first step towards extracting meaningful data. Once you’ve got everything pulled down, you’ve still got to process it into something useful. Here to assist with that is Scrapegraph-ai, a Python tool that promises to automate the process using a selection of large language models (LLMs).

Scrapegraph-ai accepts a URL along with a prompt, a plain-English instruction on what to do with the data. Examples include summarizing content, describing images, and more. In other words, gathering the data and analyzing or formatting it can now be done in a single step.

The project is pretty flexible in terms of the AI back-end. It can work with locally installed models (via ollama) or with API keys for hosted services like OpenAI. If you have an OpenAI API key, there’s an online demo that shows off the capabilities pretty effectively. Otherwise, local installation is only a few commands away.
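For the curious, the basic usage pattern boils down to a few lines. Treat this as a sketch rather than gospel: the SmartScraperGraph class and config layout follow the project’s documentation, but the model name and options here are stand-ins that shift between versions.

```python
# A minimal sketch of Scrapegraph-ai usage, based on the pattern in the
# project's README. Exact config keys and model names vary by version,
# so treat this as illustrative rather than canonical.
from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
    "llm": {
        # Point at a locally running ollama model; swap in an OpenAI
        # model name and API key to use the hosted service instead.
        "model": "ollama/llama3",
        "base_url": "http://localhost:11434",
    },
}

smart_scraper = SmartScraperGraph(
    prompt="List every article title with a one-sentence summary.",
    source="https://hackaday.com/",  # the URL to scrape
    config=graph_config,
)

result = smart_scraper.run()  # fetch, parse, and let the LLM do the rest
print(result)
```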

This isn’t the first time we have seen the flexibility of AI tools like large language models leveraged to ease the notoriously fiddly task of web scraping, and it’s great to see the results have only gotten better.

Train A GPT-2 LLM, Using Only Pure C Code

[Andrej Karpathy] recently released llm.c, a project that focuses on LLM training in pure C, once again showing that working with these tools isn’t necessarily reliant on sprawling development environments. GPT-2 may be older, but it’s perfectly relevant: it’s the granddaddy of modern LLMs (large language models), with a clear lineage down to today’s offerings.

LLMs are fantastically good at communicating despite not actually knowing what they are saying, and training them usually relies on the PyTorch deep learning library, driven from Python. llm.c takes a simpler approach by implementing the neural network training algorithm for GPT-2 directly. The result is highly focused and surprisingly short: about a thousand lines of C in a single file. It’s an elegant implementation that accomplishes the same thing as the bigger, clunkier toolchains. It can run entirely on a CPU, or it can take advantage of GPU acceleration where available.
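To get a feel for what “implementing the training algorithm directly” means, here’s a deliberately tiny sketch (in Python with numpy for readability; this is not a sample of the actual C code): a bigram next-character model trained with a hand-written forward pass, backward pass, and parameter update, no framework in sight. llm.c does the same job for the full GPT-2 architecture, attention and AdamW and all, in C.

```python
# Toy illustration of framework-free training: a bigram next-character
# model with hand-written forward and backward passes. Conceptually the
# same loop llm.c runs for GPT-2, minus the transformer architecture.
import numpy as np

text = "hello world, hello hackaday"
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
V = len(chars)

xs = np.array([stoi[c] for c in text[:-1]])  # each character...
ys = np.array([stoi[c] for c in text[1:]])   # ...predicts the next one

rng = np.random.default_rng(42)
W = rng.normal(0, 0.1, size=(V, V))  # the model's only parameters

for step in range(200):
    # Forward pass: row xs[i] of W holds the logits for position i
    logits = W[xs]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)             # softmax
    loss = -np.log(probs[np.arange(len(ys)), ys]).mean()  # cross-entropy

    # Backward pass: gradient of cross-entropy w.r.t. the logits...
    dlogits = probs.copy()
    dlogits[np.arange(len(ys)), ys] -= 1
    dlogits /= len(ys)
    # ...scattered back onto the rows of W that were actually used
    dW = np.zeros_like(W)
    np.add.at(dW, xs, dlogits)

    # Parameter update (plain SGD here; llm.c uses AdamW)
    W -= 1.0 * dW

print(f"loss after training: {loss:.3f}")
```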

This isn’t the first time [Andrej Karpathy] has bent his considerable skills and understanding towards boiling down these sorts of concepts into bare-bones implementations. We previously covered a project of his that is the “hello world” of GPT, a tiny model that predicts the next bit in a given sequence and offers low-level insight into just how GPT (generative pre-trained transformer) models work.

Australian Library Uses Chatbot To Imitate Veteran With Predictable Results

The educational sector is usually the first to decry large language models and AI, due to worries about cheating. The State Library of Queensland, however, has embraced the technology in controversial fashion. In the lead-up to Anzac Day, the war remembrance holiday observed in Australia and New Zealand, the library released a chatbot named Charlie, intended to imitate a World War One veteran. It went as well as you’d expect.


Twitter users immediately chimed in with dismay at the very concept. Others showed how easy it was to “jailbreak” the AI, convincing Charlie he was actually supposed to teach Python, imitate Frasier Crane, or explain laws like Elle from Legally Blonde. One person figured out how to get Charlie to spit out his initial instructions; these were patched later in the day to try and stop some of the shenanigans.

From those instructions, it’s clear that this was supposed to be educational rather than some sort of macabre experiment. However, Charlie didn’t do a great job here, either. As with any large language model, Charlie had no sense of objective truth. He routinely spat out incorrect facts about the war, and regularly contradicted himself.

Generally, any plan that includes the words “impersonate a veteran” is a foolhardy one at best. Throwing a machine-generated portrait and a largely uncontrolled AI into the mix didn’t help things. Regardless, the State Library has left the “Virtual Veterans” experience up at the time of writing.

The problem with AI is that it’s not a magic box that gets things right all the time. It never has been. As long as organizations keep putting AI to use in ways like this, the same story will keep playing out.

AI System Drops A Dime On Noisy Neighbors

“There goes the neighborhood” isn’t a phrase to be thrown about lightly, but when they build a police station next door to your house, you know things are about to get noisy. Just how bad it’ll be is perhaps a bit subjective, with pleas for relief likely to fall on deaf ears unless you’ve got firm documentation like that provided by this automated noise detection system.

OK, let’s face it: even with objective proof, there’s likely nothing [Christopher Cooper] can do about the new crop of sirens going off in his neighborhood. Emergencies require a speedy response, after all, and sirens are perhaps just the price we pay to live close to each other. That doesn’t mean there’s no reason to monitor the neighborhood noise, though, so [Christopher] got to work. The system uses an Arduino BLE Sense module to detect neighborhood noises and Edge Impulse to classify the sounds. An ESP32 does most of the heavy lifting, including running the UI on a nice little TFT touchscreen.

When a siren-like sound is detected, the sensor records the event and tries to classify the type of siren — fire, police, or ambulance. You can also manually classify sounds the system fails to understand, and export a summary of events to an SD card. If your neighborhood noise problems tend more to barking dogs or early-morning leaf blowers, no problem — you can easily train different models.
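The classification itself is handled by the Edge Impulse model running on the microcontroller, but the bookkeeping around it is simple enough to sketch. Here’s a conceptual version in Python (the real project runs C++ on the ESP32, and every name below is a hypothetical stand-in, not the project’s actual code):

```python
# Conceptual sketch of the detect/classify/log loop. The real project
# runs Edge Impulse's generated C++ on the microcontroller; all names
# here are hypothetical stand-ins for illustration only.
import csv
import time
from collections import Counter

CONFIDENCE_THRESHOLD = 0.8  # only log classifications we're fairly sure about
events = []

def handle_classification(scores: dict[str, float]) -> None:
    """Record an event when the classifier is confident about a siren type."""
    label, confidence = max(scores.items(), key=lambda kv: kv[1])
    if label != "background" and confidence >= CONFIDENCE_THRESHOLD:
        events.append({"timestamp": time.time(), "label": label,
                       "confidence": round(confidence, 2)})

def export_summary(path: str) -> None:
    """Dump per-class event counts, like the project's SD-card summary."""
    counts = Counter(e["label"] for e in events)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["siren_type", "count"])
        writer.writerows(counts.items())

# Example: a few classifier outputs as they might arrive from the model
handle_classification({"police": 0.91, "fire": 0.05,
                       "ambulance": 0.02, "background": 0.02})
handle_classification({"background": 0.97, "police": 0.01,
                       "fire": 0.01, "ambulance": 0.01})
export_summary("siren_summary.csv")
```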

While we can’t say that this will help keep the peace in his neighborhood, we really like the way this one came out. We’ve seen the BLE Sense and Edge Impulse team up before, too, for everything from tuning a bike suspension to calming a nervous dog.

AI + LEGO = A Brickton Of Ideas

What if there was some magic device that could somehow scan all your LEGO and tell you what you can make with it? It’s a childhood dream come true, right? Well, that device is in your pocket. Just dump out your LEGO stash on the carpet, spread it out so there’s only one layer, scan it with your phone, and after a short wait, you get a list of all the fun things you can make. With building instructions. And oh yeah, it shows you where each brick is in the pile.

We are talking about the BrickIt app, which is available for Android and iOS. Check it out in the short demo after the break. Having personally tried the app, we can say it does what it claims, and it is in fact quite cool.

As much as it may pain you to pick up all those bricks when you’re finished, the app really does work better against a neutral background like light-colored carpet. In an attempt to keep the bricks corralled, we tried a wooden tray, and it didn’t work as well as it probably could have: it didn’t hold that many bricks, and they couldn’t be spread out very far.

The only real downside is that the free version limits your results, and the app constantly reminds you of what you’re missing out on in the paid version. But it’s still really, really cool, so check it out.

We don’t have to tell you how versatile LEGO is. But have you seen this keyboard stand, or this PCB vise?



AI Camera Only Takes Nudes

One of the cringier aspects of AI as we know it today has been the proliferation of deepfake technology to make nude photos of anyone you want. What if you took away the abstraction and put the faker and subject in the same space? That’s the question the NUCA camera was designed to explore. [via 404 Media]

[Mathias Vef] and [Benedikt Groß] designed the NUCA camera “with the intention of critiquing the current trajectory of AI image generation.” The camera itself is a fairly unassuming device, a 3D-printed digital camera (19.5 × 6 × 1.5 cm) with a 37 mm lens. When the shutter button is pressed, a nude image of the subject is generated.

The final image is generated from a mixture of the picture taken of the subject, pose data, and facial landmarks. The photo is run through a classifier that identifies features such as age, gender, and body type, which are then used to generate a text prompt for Stable Diffusion. The subject’s original face is then stitched onto the generated image and aligned with the estimated pose. Many of the sample images on the project’s website show the bias toward certain beauty ideals baked into AI training datasets.

Looking for more ways to use AI with cameras? How about this one that uses GPS to imagine a scene instead? Prefer to keep AI out of your endeavors to invade personal space? How about building your own TSA body scanner?



Hackaday Links: April 21, 2024

Do humanoid robots dream of electric retirement? Who knows, but maybe we can ask Boston Dynamics’ Atlas HD, which was officially retired this week. The humanoid robot, notable for its warehouse parkour and sweet dance moves, never went into production, at least not as far as we know. Atlas always seemed like it was intended to be an R&D platform, to see what was possible for a humanoid robot, and in that way it had a heck of a career. But it’s probably a good thing that fleets of Atlas robots aren’t wandering around shop floors or serving drinks, especially given the number of hydraulic blowouts the robot suffered. That also seems to be one of the lessons Boston Dynamics learned, since Atlas’ younger, nimbler replacement is said to be all-electric. From the thumbnail, the new kid already seems pretty scarred and battered, so here’s hoping we get to see some all-electric robot fails soon.
