Segment Anything, recently released by Facebook Research, tackles something that most people who have dabbled in computer vision have found daunting: reliably figuring out which pixels in an image belong to an object. Making that easier is the goal of the Segment Anything Model (SAM), just released under the Apache 2.0 license.
The results look fantastic, and there’s an interactive demo available where you can play with the different ways SAM works. One can pick out objects by pointing and clicking on an image, or images can be automatically segmented. It’s frankly very impressive to see SAM make masking out the different objects in an image look so effortless. What makes this possible is machine learning: the model behind the system has been trained on a huge dataset of high-quality images and masks, making it very effective at what it does.
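The released code also includes a Python package, so you can do the same kind of point-prompting from your own scripts. Here’s a minimal sketch patterned on the project’s README; the checkpoint filename and click coordinates are illustrative:

```python
# A minimal sketch of point-prompted segmentation with the segment-anything package.
# The checkpoint file and pixel coordinates below are illustrative.
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)                      # embed the image once

# a single foreground click at pixel (500, 375); label 1 means "this is the object"
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,                      # return a few candidate masks
)
best = masks[np.argmax(scores)]                 # boolean mask of the clicked object
```

The fully automatic mode works through the companion SamAutomaticMaskGenerator class, which effectively sweeps a grid of prompts across the whole image.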
Someone wants to learn about Arduino programming. Do you suggest they blink an LED first? Or should they go straight for a 3D laser scanner with galvos, a time-of-flight sensor, and multiple networking options? Most of us need to start with the blinking light and move forward from there. So what if you want to learn about the latest wave of GPT — generative pre-trained transformer — programs? Do you start with a language model that looks at thousands of possible tokens in large contexts? Or should you start with something simple? We think you should start simple, and [Andrej Karpathy] agrees. He has a workbook that makes a tiny GPT that can predict the next bit in a sequence. It isn’t any more practical than a blinking LED, but it is a manageable place to start.
The simple example starts with a vocabulary of two. In other words, the only tokens are 0 and 1. It also uses a context size of 3, so it will look at 3 bits and use those to infer the 4th bit. To further simplify things, the examples assume you will always get a fixed-size sequence of tokens, in this case, eight tokens. Then it builds a little from there.
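To give a feel for how little machinery that takes, here is a minimal sketch (not [Andrej Karpathy]’s actual workbook) of an attention-based next-bit predictor with the same two-token vocabulary, 3-bit context, and fixed 8-token sequences; the layer sizes and training loop are our own assumptions:

```python
# A toy attention-based next-bit predictor: two-token vocabulary, 3-bit context,
# 8-bit training sequences. Layer sizes and the training loop are illustrative.
import torch
import torch.nn as nn

VOCAB, CONTEXT, SEQ_LEN = 2, 3, 8

class TinyNextBit(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, dim)      # token embeddings for 0 and 1
        self.pos = nn.Embedding(CONTEXT, dim)      # positional embeddings
        self.attn = nn.MultiheadAttention(dim, num_heads=2, batch_first=True)
        self.head = nn.Linear(dim, VOCAB)          # logits over {0, 1}

    def forward(self, x):                          # x: (batch, CONTEXT) of bits
        h = self.embed(x) + self.pos(torch.arange(CONTEXT))
        h, _ = self.attn(h, h, h)                  # self-attention over the context
        return self.head(h[:, -1])                 # predict the 4th bit

# toy data: random 8-bit sequences, sliding a 3-bit window across each one
seqs = torch.randint(0, VOCAB, (64, SEQ_LEN))
ctx = torch.cat([seqs[:, i:i + CONTEXT] for i in range(SEQ_LEN - CONTEXT)])
tgt = torch.cat([seqs[:, i + CONTEXT] for i in range(SEQ_LEN - CONTEXT)])

model = TinyNextBit()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    loss = nn.functional.cross_entropy(model(ctx), tgt)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Feed it sequences with some structure and it will learn to beat a coin flip; feed it truly random bits and it can’t, which is a lesson in itself.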
[BioBootloader] combined Python and a hefty dose of AI for a fascinating proof of concept: self-healing Python scripts. He shows things working in a video, embedded below the break, but we’ll also describe what happens right here.
The demo Python script is a simple calculator that works from the command line, and [BioBootloader] introduces a few bugs to it. He misspells a variable used as a return value, and deletes the subtract_numbers(a, b) function entirely. Running this script by itself simply crashes, but using Wolverine on it has a very different outcome.
Wolverine is a wrapper that runs the buggy script, captures any error messages, then sends those errors to GPT-4 to ask it what it thinks went wrong with the code. In the demo, GPT-4 correctly identifies the two bugs (even though only one of them directly led to the crash), but that’s not all! Wolverine actually applies the proposed changes to the buggy script and re-runs it. This time around there is still an error… because GPT-4’s previous changes included an out-of-scope return statement. No problem, because Wolverine once again consults GPT-4, creates and formats a change, applies it, and re-runs the modified script. This time the script runs successfully, and Wolverine’s work is done.
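The loop itself is simple enough to sketch out. What follows is not the actual Wolverine source, just an illustration of the run, capture, ask, patch cycle; the prompt, the whole-file rewrite, and the retry limit are our own simplifications:

```python
# A minimal sketch of the run / capture / ask / patch loop, not the real Wolverine.
# Uses the OpenAI Python client of the time and assumes an OPENAI_API_KEY
# environment variable; the prompt and "rewrite the whole file" approach are
# simplifications for illustration.
import subprocess
import sys
import openai

PROMPT = ("You are given a Python script and the traceback it produced. "
          "Reply with the corrected script and nothing else.")

def run_script(path):
    return subprocess.run([sys.executable, path], capture_output=True, text=True)

def heal(path, max_attempts=3):
    for _ in range(max_attempts):
        result = run_script(path)
        if result.returncode == 0:
            print(result.stdout)
            return True                     # the script finally ran cleanly
        with open(path) as f:
            source = f.read()
        reply = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": PROMPT},
                {"role": "user", "content": source + "\n\n" + result.stderr},
            ],
        )
        with open(path, "w") as f:          # apply the proposed fix and try again
            f.write(reply.choices[0].message.content)
    return False
```

The real tool is more careful about it, asking GPT-4 for formatted changes and applying them to the script rather than overwriting the file wholesale.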
LLMs (Large Language Models) like GPT-4 are “programmed” in natural language, and these instructions are referred to as prompts. A large chunk of what Wolverine does is thanks to a carefully-written prompt, and you can read it here to gain some insight into the process. Don’t forget to watch the video demonstration just below if you want to see it all in action.
[James Turk] has a novel approach to the problem of scraping web content in a structured way without needing to write the kind of page-specific code web scrapers usually have to deal with. How? Just enlist the help of a natural language AI. Scrapeghost relies on OpenAI’s GPT API to parse a web page’s content, pull out and classify any salient bits, and format it in a useful way.
What makes Scrapeghost different is how the data gets organized: when instantiating scrapeghost, one defines the data one wishes to extract.
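A sketch along the lines of the project’s documentation (the field names and URL here are illustrative) looks like this:

```python
# A sketch of a scrapeghost schema, patterned on the project's documentation;
# the fields and the example URL are illustrative.
from scrapeghost import SchemaScraper

scrape_legislators = SchemaScraper(
    schema={
        "name": "string",
        "url": "url",
        "district": "string",
        "party": "string",
        "photo_url": "url",
        "offices": [{"name": "string", "address": "string", "phone": "string"}],
    }
)

# calling the scraper on a page returns the extracted, structured data
result = scrape_legislators("https://example.gov/legislator/123")
print(result.data)
```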
The kicker is that this format is entirely up to you! The GPT models are very, very good at processing natural language, and scrapeghost uses GPT to process the scraped data and find (using the example above) whatever looks like a name, district, party, photo, and office address and format it exactly as requested.
It’s an experimental tool and you’ll need an API key from OpenAI to use it, but it has useful features and is certainly a novel approach. There’s a tutorial and even a command-line interface, so check it out.
By now we’ve all seen articles where the entire copy has been written by ChatGPT. It’s essentially a trope of its own at this point, so we will start out by assuring you that this article is being written by a human. AI tools do seem poised to be extremely disruptive to certain industries, but this doesn’t necessarily have to be a bad thing as long as they continue to be viewed as tools rather than direct replacements. ChatGPT can be used to assist in plenty of tasks, and can help augment processes like programming (rather than becoming the programmer itself), and this article shows a few examples of what it might be used for.
While it can write some programs on its own, in some cases quite capably, for specialized or complex tasks it might not be quite up to the challenge yet. It will often appear extremely confident in its solutions even when it’s providing poor or false information, but that doesn’t mean it can’t or shouldn’t be used at all.
The article goes over a few of the ways it can function more as an assistant than a programmer, including generating filler content for something like an SQL database, converting data from one format to another, converting programs from one language to another, and even helping with a program’s debugging process.
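If you’d rather script that kind of request than paste it into the chat window, a minimal sketch using the OpenAI Python client of the time might look like this; the model name, prompt, and sample CSV are our own, not from the article:

```python
# A sketch of scripting the "convert data from one format to another" idea;
# the prompt, model name, and sample CSV are illustrative.
import openai

csv_snippet = "name,pin,mode\nled,13,OUTPUT\nbutton,7,INPUT_PULLUP"
reply = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": "Convert this CSV to a JSON array of objects:\n" + csv_snippet,
    }],
)
print(reply.choices[0].message.content)   # the converted JSON, ready to eyeball
```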
Some other things that ChatGPT can be used for that we’ve been able to come up with include asking for recommendations for libraries we didn’t know existed, as well as asking for music recommendations to play in the background while working. Tools like these are extremely impressive, and while they likely aren’t taking over anyone’s job right now, that might not always be the case.
You might have heard about LLaMa or maybe you haven’t. Either way, what’s the big deal? It’s just some AI thing. In a nutshell, LLaMa is important because it allows you to run large language models (LLMs) like GPT-3 on commodity hardware. In many ways, this is a bit like Stable Diffusion, which similarly allowed normal folks to run image generation models on their own hardware with access to the underlying source code. We’ve discussed why Stable Diffusion matters and even talked about how it works.
LLaMa is a family of transformer language models from Facebook/Meta research, ranging from 7 billion to 65 billion parameters and trained on publicly available datasets. Their research paper showed that the 13B version outperformed GPT-3 on most benchmarks, and LLaMa-65B is right up there with the best of them. LLaMa was unique in that inference could be run on a single GPU, thanks to some optimizations made to the transformer itself and the model being about 10x smaller. While Meta recommended that users have at least 10 GB of VRAM to run inference on the larger models, that’s a huge step down from the 80 GB A100 cards that often run these models.
While this was an important step forward for the research community, it became a huge one for the hacker community when [Georgi Gerganov] rolled in. He released llama.cpp on GitHub, which runs inference on a LLaMa model with 4-bit quantization. His code was focused on running LLaMa-7B on your MacBook, but we’ve seen versions running on smartphones and Raspberry Pis. There’s even a version written in Rust! A rough rule of thumb is that anything with more than 4 GB of RAM can run LLaMa. Model weights are available through Meta under some rather strict terms, but they’ve been leaked online and can even be found in a pull request on the GitHub repo itself.
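For a sense of how approachable this has become, here’s a minimal sketch using the llama-cpp-python bindings that wrap llama.cpp; the model path, prompt, and parameters are illustrative and assume you’ve already converted and quantized the LLaMa-7B weights:

```python
# A sketch using the llama-cpp-python bindings around llama.cpp; the model path,
# prompt, and parameters are illustrative and assume 4-bit quantized 7B weights.
from llama_cpp import Llama

llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin", n_ctx=512)
out = llm("Q: What is the capital of France? A:", max_tokens=32, stop=["Q:", "\n"])
print(out["choices"][0]["text"].strip())
```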
In the old days of the Internet, FTP was sufficient for downloading the occasional file. But with the widespread use of computer audio and video, it was easy to swamp an FTP server so — eventually — BitTorrent was born. The idea was that you would download bits and pieces of a file from different places and, in turn, serve up the bits and pieces you already have to anyone who needs them. Now Petals wants to use this same method with language models. These AI language models are all the rage, but they take significant computer resources. The idea behind Petals is like BitTorrent: you handle a small part of the model (about 8 gigabytes, which is small compared to the 352 gigabytes required), and other people host the other parts.
Of course, if you are privacy-minded, that means that some amount of your data is going out to the public, but for your latest chatbot experiments, that might not be a big problem. You can install Petals in an Anaconda environment or run a Docker image if you don’t want to set up anything. If you just want to access the distributed network’s chatbot based on BLOOMZ-176B, you can do that online.
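If you’d rather call the swarm from your own code than use the web chatbot, the Petals README of the time offers a transformers-style interface. A minimal sketch (the model name and prompt are illustrative) looks like this:

```python
# A sketch patterned on the Petals README of the time: the model loads its small
# local slice and forwards the remaining layers to other peers in the swarm.
from transformers import BloomTokenizerFast
from petals import DistributedBloomForCausalLM

MODEL_NAME = "bigscience/bloom-petals"      # or the BLOOMZ variant mentioned above
tokenizer = BloomTokenizerFast.from_pretrained(MODEL_NAME)
model = DistributedBloomForCausalLM.from_pretrained(MODEL_NAME)

inputs = tokenizer("A robot may not injure", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0]))
```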