More Details On Why DeepSeek Is A Big Deal

The DeepSeek large language models (LLMs) have been making headlines lately, and for more than one reason. IEEE Spectrum has an article that sums everything up very nicely.

We shared the way DeepSeek made a splash when it came onto the AI scene not long ago, and this is a good opportunity to go into a few more details of why this has been such a big deal.

For one thing, DeepSeek (there are actually two flavors, -V3 and -R1, more on them in a moment) punches well above its weight. DeepSeek is the product of an innovative development process, and freely available to use or modify. It is also indirectly highlighting the way companies in this space like to label their LLM offerings as “open” or “free”, but stop well short of actually making them open source.

The DeepSeek-V3 LLM was developed in China and reportedly cost less than 6 million USD to train. This was possible thanks in part to DualPipe, a highly optimized and scalable pipeline-parallel training method developed to work within the limitations imposed by export restrictions on Nvidia hardware. Details are in the technical paper for DeepSeek-V3.
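DualPipe’s specifics are in that paper, but the general problem it attacks is easy to sketch: naive pipeline parallelism leaves accelerators idle while stages wait on one another. Here’s a minimal back-of-the-envelope sketch in Python, using the textbook pipeline-bubble formula rather than anything from DeepSeek’s actual implementation, to show why smarter scheduling pays off:

```python
# Rough illustration of why pipeline scheduling matters (not DeepSeek's code).
# With p pipeline stages and m micro-batches, a naive schedule leaves each
# device idle for a "bubble" fraction of roughly (p - 1) / (m + p - 1).

def bubble_fraction(stages: int, micro_batches: int) -> float:
    """Idle fraction of a naive pipeline-parallel training step."""
    return (stages - 1) / (micro_batches + stages - 1)

for m in (4, 16, 64):
    print(f"8 stages, {m:3d} micro-batches: "
          f"{bubble_fraction(8, m):.0%} of device time wasted")
```

DualPipe goes much further than juggling micro-batch counts, overlapping computation and communication from both ends of the pipeline, but the sketch shows the idle time any such schedule is fighting.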

There’s also DeepSeek-R1, a chain-of-thought “reasoning” model which handily provides its thought process enclosed within easily-parsed <think> and </think> pseudo-tags that are included in its responses. A model like this takes an iterative step-by-step approach to formulating responses, and benefits from prompts that provide a clear goal the LLM can aim for. The way DeepSeek-R1 was created was itself novel. Its training started with supervised fine-tuning (SFT), a human-led, intensive process, as a “cold start” that eventually handed off to a more automated reinforcement learning (RL) process with a rules-based reward system. The result avoided problems that come from relying too much on RL, while minimizing the human effort of SFT. Technical details on the process of training DeepSeek-R1 are here.
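Since the chain of thought arrives wrapped in those pseudo-tags, separating the reasoning from the final answer takes only a few lines. A minimal Python sketch (the response string here is made up for illustration):

```python
import re

# Minimal sketch: split a DeepSeek-R1 style response into its chain-of-thought
# and its final answer using the <think>...</think> pseudo-tags.
def split_reasoning(response: str) -> tuple[str, str]:
    match = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    thoughts = match.group(1).strip() if match else ""
    answer = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()
    return thoughts, answer

raw = "<think>6 * 7 = 42, so the answer is 42.</think>The answer is 42."
thoughts, answer = split_reasoning(raw)
print("Reasoning:", thoughts)
print("Answer:", answer)
```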

DeepSeek-V3 and -R1 are freely available in the sense that one can access the full-powered models online or via an app, or download distilled models for local use on more limited hardware. It is free and open as in accessible, but not open source, because not everything needed to replicate the work is actually released. As with most LLMs, the training data and actual training code used are not available.
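The distilled weights run under the usual local-model runners. As a hedged sketch, here is what querying one through Ollama’s HTTP API might look like in Python; the model tag is an assumption, so substitute whatever your runner actually provides:

```python
# Hedged sketch: querying a locally-running distilled model through Ollama's
# HTTP API. Assumes Ollama is running locally and a distilled tag such as
# "deepseek-r1:1.5b" has already been pulled (adjust to your setup).
import json
import urllib.request

payload = json.dumps({
    "model": "deepseek-r1:1.5b",   # assumption: substitute your local tag
    "prompt": "List three uses for a 555 timer.",
    "stream": False,
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```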

What has been released, and is making waves of its own, is the technical detail of how the researchers produced what they did, and that means there are efforts underway to make an actually open source version. Keep an eye out for Open-R1!

Modern AI On Vintage Hardware: Llama 2 Runs On Windows 98

[EXO Labs] demonstrated something pretty striking: a modified version of Llama 2 (a large language model) that runs on Windows 98. Why? Because when it comes to personal computing, if something can run on Windows 98, it can run on anything. More to the point: if something can run on Windows 98, then it’s something whose use no tech company can control, no matter how large or influential they may be. More on that in a minute.

Ever wanted to run a local LLM on 25-year-old hardware? No? Well, now you can, and at a respectable speed, too!

What’s it like to run an LLM on Windows 98? Aside from struggles like finding compatible peripherals (back to PS/2 hardware!), transferring the required files (FTP over Ethernet to the rescue), and compiling the software (some porting required), it works better than one might expect.

A Windows 98 machine with a Pentium II processor and 128 MB of RAM generates a speedy 39.31 tokens per second with a 260K-parameter Llama 2 model. A much larger 15M-parameter model generates 1.03 tokens per second. Slow, but it works. Going even larger will also work, just ever slower. There’s a video on X that shows it all in action.
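Those parameter counts also explain how this fits in 128 MB of RAM at all. Assuming plain float32 weights (four bytes each), a quick sanity check:

```python
# Back-of-the-envelope check of why these models fit on a 128 MB machine:
# at 4 bytes per float32 weight, parameter count maps almost directly to RAM.
for params in (260_000, 15_000_000):
    print(f"{params:>12,} params ~ {params * 4 / 2**20:6.1f} MB of weights")
```

Both fit comfortably; a multi-billion-parameter model, by contrast, would need gigabytes of weights before generating a single token.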

It’s true that modern LLMs have billions of parameters, so these models are tiny in comparison. But that doesn’t mean they can’t be useful. Models can be shockingly small and still be perfectly coherent and deliver surprisingly strong performance if their training and “job” is narrow enough, and the tools to do that for oneself are all on GitHub, as the toy example below illustrates.
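For a feel of just how small “small but coherent in a narrow domain” can get, here’s a toy illustration, nothing to do with EXO’s actual code: a character-bigram model built from a few lines of pure Python already babbles in the style of its tiny corpus.

```python
# Toy illustration: a character-bigram "model" is just a table of
# next-character counts, a few KB at most, yet it generates text that
# stays on-topic for its narrow training corpus.
import random
from collections import defaultdict

corpus = "the cat sat on the mat. the dog sat on the log. " * 50
counts = defaultdict(lambda: defaultdict(int))
for a, b in zip(corpus, corpus[1:]):
    counts[a][b] += 1

def generate(seed: str, length: int = 60) -> str:
    out = seed
    for _ in range(length):
        successors = counts[out[-1]]
        chars, weights = zip(*successors.items())
        out += random.choices(chars, weights=weights)[0]
    return out

print(generate("t"))
```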

This is a good time to mention that this particular project (and its ongoing efforts) is part of a set of twelve projects by EXO Labs focused on ensuring that things like AI models can be run anywhere, by anyone, independent of tech giants aiming to hold all the strings.

And hey, if local AI on the command line is up your alley, did you know LLMs already exist as single-file, multi-platform, command-line executables?


Hackaday Links: December 15, 2024

It looks like we won’t have Cruise to kick around in this space anymore with the news that General Motors is pulling the plug on its woe-beset robotaxi project. Cruise, which GM acquired in 2016, fielded autonomous vehicles in various test markets, but the fleet racked up enough high-profile mishaps (first item) for California regulators to shut down test programs in the state last year. The inevitable layoffs ensued, and GM is now killing off its efforts to build robotaxis to concentrate on incorporating the Cruise technology into its “Super Cruise” suite of driver-assistance features for its full line of cars and trucks. We feel like this might be a tacit admission that surmounting the problems of fully autonomous driving is just too hard a nut to crack profitably with current technology, since Super Cruise uses eye-tracking cameras to make sure the driver is paying attention to the road ahead when automation features are engaged. Basically, GM is admitting there still needs to be meat in the seat, at least for now.

Continue reading “Hackaday Links: December 15, 2024”

An Animated Walkthrough Of How Large Language Models Work

If you wonder how Large Language Models (LLMs) work and aren’t afraid of getting a bit technical, don’t miss [Brendan Bycroft]’s LLM Visualization. It is an interactive, animated, step-by-step walkthrough of a GPT large language model, complete with a 3D block diagram of everything going on under the hood. Check it out!

nano-gpt has only around 85,000 parameters, but the operating principles are all the same as for larger models.

The demonstration walks through a simple task and shows every step. The task is this: using the nano-gpt model, take a sequence of six letters and put them into alphabetical order.

A GPT model is a highly complex prediction engine, so the whole process begins with tokenizing the input (breaking up words and assigning numerical values to the chunks) and ends with choosing an appropriate output from a list of probabilities. There are of course many more steps in between, and different ways to adjust the model’s behavior. All of these are made quite clear by [Brendan]’s process breakdown.
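Those bookend steps are easy to mimic in a few lines of Python. The vocabulary and logit values below are made up for illustration (the demo’s actual vocabulary may differ), but the shape of the pipeline, token IDs in and probabilities out, is the same:

```python
# Sketch of the first and last steps the visualization walks through:
# tokenize the input, then turn the model's raw output scores (logits)
# into probabilities and pick a token. The "model" here is a stub.
import math
import random

VOCAB = list("ABCDEF")                 # illustrative six-letter vocabulary
tok_to_id = {t: i for i, t in enumerate(VOCAB)}

def tokenize(text: str) -> list[int]:
    return [tok_to_id[ch] for ch in text]

def softmax(logits: list[float]) -> list[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

tokens = tokenize("CBAFED")
fake_logits = [2.5, 0.1, -1.0, 0.3, 0.0, -0.5]   # stand-in for model output
probs = softmax(fake_logits)
next_id = random.choices(range(len(VOCAB)), weights=probs)[0]
print("input ids:", tokens, "-> next token:", VOCAB[next_id])
```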

We’ve previously covered how LLMs work, explained without math, which eschews gritty technical details in favor of focusing on functionality, but it’s also nice to see an approach like this one, which embraces the technical elements of exactly what is going on.

We’ve also seen a much higher-level peek at how a modern AI model like Anthropic’s Claude works when it processes requests, extracting human-understandable concepts that illustrate what’s going on under the hood.

Playing Chess Against LLMs And The Mystery Of Instruct Models

At first glance, trying to play chess against a large language model (LLM) seems like a daft idea, as its weighted nodes have, at most, been trained on some chess-adjacent texts. It has no concept of board state, stratagems, or even what a ‘rook’ or ‘knight’ piece is. This daftness is indeed demonstrated by [Dynomight] in a recent blog post (Substack version), where the Stockfish chess engine is pitted against a range of LLMs, from a small Llama model to GPT-3.5. Although the outcomes (see featured image) are largely as you’d expect, there is one surprise: the gpt-3.5-turbo-instruct model, which seems quite capable of giving Stockfish a run for its money, albeit on Stockfish’s lower settings.

Each model was given the same query, telling it to be a chess grandmaster, to use standard notation, and to choose its next move. The stark difference between the instruct model and the others calls for investigation. OpenAI describes the instruct model as an ‘InstructGPT 3.5 class model’, which leads us to this page on OpenAI’s site and an associated 2022 paper that describes how InstructGPT is effectively the standard GPT LLM model heavily fine-tuned using human feedback.
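The plumbing for such an experiment is straightforward. A hedged Python sketch (not [Dynomight]’s actual code) shows the shape of it: keep the authoritative board state in the python-chess library, feed the move history to the model, and reject anything illegal. The query_llm function is a hypothetical stand-in for whatever API you’d actually call:

```python
# Hedged sketch of the experiment's plumbing. Requires the python-chess
# package (pip install python-chess); query_llm is a hypothetical stub.
import chess

def query_llm(prompt: str) -> str:
    # Stand-in: wire up your actual LLM API here.
    return "e4"

board = chess.Board()
history: list[str] = []

prompt = (
    "You are a chess grandmaster. Moves so far (standard notation): "
    + (" ".join(history) or "(game start)")
    + ". Reply with only your next move."
)
reply = query_llm(prompt).strip()
try:
    board.push(board.parse_san(reply))   # raises ValueError if illegal
    history.append(reply)
    print("Accepted move:", reply)
except ValueError:
    print("Illegal or unparseable move:", reply)
```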

Continue reading “Playing Chess Against LLMs And The Mystery Of Instruct Models”

This Week In Security: Linux VMs, Real AI CVEs, And Backscatter TOR DoS

Steve Ballmer famously called Linux “viral”, with some not-entirely coherent complaints about the OS. In a hilarious instance of life imitating art, Windows machines are now getting attacked through malicious Linux VM images distributed through phishing emails.

This approach seems to be intended to fool any anti-malware software that may be running. The VM includes the chisel tool, described as “a fast TCP/UDP tunnel, transported over HTTP, secured via SSH”. Now that’s an interesting protocol stack. It’s an obvious advantage for an attacker to have a Linux VM right on a target network. As this sort of attack does require hardware virtualization support, it might be worth disabling the virtualization extensions in the BIOS if they aren’t needed on a particular machine.

AI Finds Real CVE

We’ve talked about some rather unfortunate uses of AI, where aspiring security researchers asked an LLM to find vulnerabilities in a project like curl, and then completely wasted a maintainer’s time with the resulting bogus reports. We happened to interview Daniel Stenberg on FLOSS Weekly this week, and after he recounted this story, we mused that there might be a real opportunity to use LLMs to find vulnerabilities, when used as a way to direct fuzzing and combined with a good test suite.
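What separates that approach from bogus-report spam is verification: the model’s guesses only matter if they reproduce against the real code. A minimal Python sketch of the idea (not Big Sleep itself; ask_llm_for_inputs and the target path are hypothetical stand-ins):

```python
# Hedged sketch: let an LLM propose suspicious inputs, then confirm each
# candidate against the real program, so only reproducible crashes ever
# reach a human reviewer.
import subprocess

def ask_llm_for_inputs(source_snippet: str) -> list[bytes]:
    # Stand-in: a real system would prompt an LLM with the code under test.
    return [b"A" * 1024, b"\x00" * 16, b"-1"]

def crashes(binary: str, data: bytes) -> bool:
    proc = subprocess.run([binary], input=data,
                          capture_output=True, timeout=10)
    return proc.returncode < 0   # killed by a signal, e.g. SIGSEGV

# "./target_under_test" is a placeholder path for illustration.
for data in ask_llm_for_inputs("...code under test..."):
    if crashes("./target_under_test", data):
        print("Verified crash with input:", data[:32])
```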

And now, we have Google Project Zero bringing news of their Big Sleep LLM project finding a real-world vulnerability in SQLite. This tool was previously called Project Naptime, and while it’s not strictly a fuzzer, it does share some similarities. The main similarity is that both tools take their educated guesses and run that data through the real program code, to positively verify that there is a problem. With this proof of concept demonstrated, it’s sure to be replicated. It seems inevitable that someone will next try to get an LLM to not only find the vulnerability, but also find an appropriate fix.

Continue reading “This Week In Security: Linux VMs, Real AI CVEs, And Backscatter TOR DoS”

This Week In Security: Playing Tag, Hacking Cameras, And More

Wired has a fascinating story this week about the lengths Sophos has gone to over the last five years to track down a group of malicious but clever security researchers who were continually discovering vulnerabilities and then using those findings to attack real-world targets. Sophos believes this adversary to be overlapping Chinese groups known as APT31, APT41, and Volt Typhoon.

The story is actually refreshing in its honesty, with Sophos freely admitting that their products, and security products from multiple other vendors, have been caught in the crosshairs of these attacks. And indeed, we’ve covered stories about these vulnerabilities over the past weeks and months right here in this column. The sneaky truth is that many of these security products actually have pretty severe security problems.

The issues at Sophos started with an infection of an informational computer at a subsidiary office. Sophos believes this was an information-gathering exercise that was a precursor to the widespread campaign. That campaign used multiple 0-days to crack “tens of thousands of firewalls around the world”. Sophos rolled out fixes for those 0-days, and included just a bit of extra logging as an undocumented feature. That logging paid off, as Sophos’ team of researchers soon identified an early signal among the telemetry. This wasn’t merely the first device to be attacked, but was actually a test device used to develop the attack. The game was on.

Continue reading “This Week In Security: Playing Tag, Hacking Cameras, And More”