The DeepSeek large language models (LLMs) have been making headlines lately, and for more than one reason. IEEE Spectrum has an article that sums everything up very nicely.
We shared the way DeepSeek made a splash when it came onto the AI scene not long ago, and this is a good opportunity to go into a few more details of why this has been such a big deal.
For one thing, DeepSeek (there are actually two flavors, -V3 and -R1, more on them in a moment) punches well above its weight. DeepSeek is the product of an innovative development process, and is freely available to use or modify. It is also indirectly highlighting the way companies in this space like to label their LLM offerings as “open” or “free”, but stop well short of actually making them open source.
The DeepSeek-V3 LLM was developed in China and reportedly cost less than 6 million USD to train. This was possible thanks to DualPipe, a highly optimized and scalable training method developed to work within the limitations imposed by export restrictions on Nvidia hardware. Details are in the technical paper for DeepSeek-V3.
There’s also DeepSeek-R1, a chain-of-thought “reasoning” model which handily provides its thought process enclosed within easily-parsed <think> and </think> pseudo-tags that are included in its responses. A model like this takes an iterative, step-by-step approach to formulating responses, and benefits from prompts that provide a clear goal the LLM can aim for. The way DeepSeek-R1 was created was itself novel. Its training started with supervised fine-tuning (SFT), a human-led, labor-intensive process used as a “cold start”, which eventually handed off to a more automated reinforcement learning (RL) process with a rules-based reward system. The result avoided the problems that come from relying too heavily on RL, while minimizing the human effort of SFT. Technical details on the process of training DeepSeek-R1 are here.
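Since those pseudo-tags are plain text in the response, separating the reasoning from the final answer takes only a few lines. Here’s a minimal Python sketch; the function name and sample response are our own illustration, not anything from DeepSeek’s documentation:

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Separate the <think>...</think> reasoning from the final answer."""
    match = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    if not match:
        return "", response.strip()          # no reasoning block present
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()  # everything after </think>
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>The user wants a greeting; keep it short.</think>Hello there!"
)
print(reasoning)  # The user wants a greeting; keep it short.
print(answer)     # Hello there!
```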
DeepSeek-V3 and -R1 are freely available in the sense that one can access the full-powered models online or via an app, or download distilled models for local use on more limited hardware. They are free and open as in accessible, but not open source, because not everything needed to replicate the work is actually released. As with most LLMs, the training data and the actual training code used are not available.
What has been released, and is making waves of its own, are the technical details of how the researchers produced what they did, and that means there are efforts to make a truly open source version. Keep an eye out for Open-R1!
I see a FLOSSAIatHome project in the making.
Listening to Glenn Beck, he tried DeepSeek. He started asking questions where some of the responses could touch on China’s questionable history. Glenn took screenshots the whole time.
At first, the response was correct. Then, he said the screen went blank, and showed a different response that cast a better light on China.
When Glenn asked DeepSeek about this, it lied, saying it did not display such info.
An AI that will lie to you. No thanks.
You realize that all of the major Western AIs have been “safety aligned” too, yeah?
AIs all lie when that is a valid path to the goal. We did train them on human data, which is packed full of lies and deceit.
If it was the online one then duh. It’s going to their servers in China; they would basically be legally forced to do that.
Try running it locally. So far I haven’t had issues with the distilled versions, at least. A full model might be more censored, but running it locally would take away some of the ways they could keep it neutered, and it could still be tuned to repair that.
This also ignores how US models have been censored in stupid ways as well.
Oh yes, China’s “questionable” history. Read a history book sometime.
It’s idiotic to be looking to a free LLM app from ANYWHERE as a trustworthy source of scholarship on politics or history. Listening to Glenn Beck isn’t such a good idea, either.
Actually the censorship is done online. That is why you can actually get a glimpse of the answer before it immediately censors itself.
If you run the model locally, it is not censored.
The main problem here is that you’re listening to Glenn Beck.
Instead of listening to social media influencers, try a few LLMs yourself. Llama and ChatGPT do the exact same thing, along with most other pre-trained LLMs.
If you want uncensored, train it yourself.
Glenn Beck, the alt right conspiracy nut?
Let’s be honest, the “sin” of DeepSeek is being made in China, so haters gonna hate. Sad situation. In the meanwhile, DS has already helped me put together some nice trading bots, in almost 1/4 of the time it took me with ChatGPT and half the time using Claude.
I’d like to see the term “freeware” popularized for models where the weights are available but the training data isn’t.
Because the “open source” models are only as open as the .exe files of freeware Windows and DOS applications.
Agreed. Didn’t understand that about DeepSeek until I read this article. I was surprised that such a cutting-edge technology would be released as open source.
TheftWare would be more truthful.
All these models are trained using datasets gathered without consent.
They need to be nuked from orbit, because fruit of the poisoned tree will always be poisonous.
I’m curious. How do you propose to teach a mind about the nature of our world by using entirely IP-free content? Do you think it would be possible to build a completely original simulacrum of the world and human history and society without using anybody’s IP and then use that to teach a child? No, that’s insane, it would take a hundred thousand years and in the end it wouldn’t actually resemble the real world, so it would fail. Would you insist that you can’t ever read a book to a learning being because that is “stealing” some person’s IP because those words are entering and integrating into their psyche?
Why do you think it’s possible to do this to teach a machine?
Companies pull “It’s my IP” all the time.
So I can’t say I feel bad for not allowing them to profit off things that aren’t theirs.
Btw the book analogy is flawed. A more accurate one would be if I took a book, read it, then sold a summarized version as my own work… Which yeah pretty much stealing.
I’m pretty sure there are a lot of works out there by people that read and collated multiple books to try and present accurate information.
By your definition that is all theft.
The thing is… We use AI as a tool to better our position in society, i.e. get better jobs, more money, more power, more respect, etc.
We don’t use our children for that.
The difference is in the intention of the training.
We train AIs with the intention to gain an advantage over other people.
We train children to make sure that they can survive in our society and have long and prosperous lives.
Quite a difference in intention.
Agree to disagree on children. Just watch gymnastics or any other sports where parents “force/coerce” their children to compete. It is the hubris or ego that drives a lot of people. AI, just like kids, will feed into that.
Look what my kid accomplished thanks to me.
Look what my AI model is able to do thanks to me.
It’s the same compulsion.
It’s not about entirely IP-free content. It’s about using data that consumers were specifically told would be kept private, and that local laws would be adhered to, and then inside China just using the data anyway, because they really do not care one single bit about foreign laws on data, IP, or privacy.
They just “kowtow” and play along when we ask them to.
How on earth do you think a nation can rise up so rapidly without ignoring all laws and stealing every bit of IP going?
And I’m not suggesting that all Western image recognition tech wasn’t trained on all those “free photo storage” accounts with Google. Of course it wasn’t…
Disagree. Humans read books (from libraries), learn, and use the information.
Are you suggesting that I should pay royalties to Kernighan and Ritchie every time I write a line of code?
What is not OK, is regurgitating the book (or whatever) more or less verbatim with no attribution. All these ML machines should have hypertext output linking to the original sources, then royalty payments can be easily calculated and paid.
Intellectual property isn’t real. You do not get to tell other people what they can and cannot do with the information they have accessible to them, even if you published it. Copyright is inherently regressive and anti-competitive and the Chinese are rightfully ignoring it.
Sure it is real. If I never share my knowledge, then I have my IP and you never will.
And, this is where AI is taking us. People currently share knowledge online as “loss leaders” to get you to their website, substack, etc. where they can earn money from Ads, products, subscriptions, tips, etc.
But, if the AI systems scoop that info up and present it to others, allowing them to skip coming to your site, then eventually people will stop sharing their information and we’ll go back to a siloed world.
The data sets are all published openly and freely online for anyone to download for whatever use they want.
I’d say that publishing your writing on the Internet constitutes consent to others downloading it.
“publishing your writing on the Internet constitutes consent to others downloading it.” is not relevant. The copyright issue arises when someone tries to use/sell the same writing. This is a separate, and further, act.
Whether or not this is what the LLM models are doing is the central legal question. The current consensus is that LLM datasets are transformative enough to avoid copyright. But the output of LLMs, being machine generated, is not itself copyrightable.
Not saying any of this is correct, moral, or permanent. Just that this is the state of play.
Yeah people have forgotten what open source means again. I guess that in hindsight it was too much subtlety to entrust to the public.
The source code is literally published on GitHub for anyone to download and train. What more would you like to have them include? The hundreds of terabytes training data would be pretty hard to host anywhere and I doubt anyone would be interested in downloading their data set anyway. If you’re going to train it yourself anyway, why would you use their censored data set instead of your own?
And/but FWIW, they are working on better documenting their training data and methods, over and above the already most-open-ever-in-AI whitepapers they’ve written.
I think it’s reasonable, given Deepseek’s actions, to take their openness goals at face value. They are putting their money where their mouths are.
(Cough, “Open” AI. Heck, even cough, Meta, although they’re a world better than OAI.)
LM Studio or Ollama. Get yourself a mini deepseek running in an afternoon. Go go go.
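If you’d rather script it than chat with it, Ollama also listens on a local HTTP endpoint. Here’s a minimal Python sketch, assuming the Ollama server is running and you’ve already pulled a distilled model; the deepseek-r1:7b tag is just an example, use whatever fits your hardware:

```python
import json
import urllib.request

# Assumes the Ollama server is running locally (default port 11434) and that
# a distilled model has been pulled, e.g. with `ollama pull deepseek-r1:7b`.
payload = json.dumps({
    "model": "deepseek-r1:7b",   # example tag; pick one that fits your RAM/VRAM
    "prompt": "Explain mixture-of-experts in two sentences.",
    "stream": False,
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```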
Of course, that’s after a billion or two of investment in hardware and R&D.
Yeah, people who parrot that price point just seem dumb to me at this point. That is like parroting the “peak wattage” rating of a cheap amplifier whose maker refuses to publish RMS figures.
True, they also claim that DeepSeek pre-training only took 2 months of GPU time, but they are forgetting that human evolution began some 4 billion years ago.
They conveniently didn’t include that time to impress lay people.
50k+ Nvidia processors in a multimillion dollar data center. Granted, they have built their own data center to do AI work. That gives them incredible flexibility.
When writing software on your laptop, make sure to factor in the time and money that went into developing the hardware and software you’re running! I wonder how many billions of R&D went into the Dell laptop I’m writing this comment on.
This comment cost nothing to write, after a billion or two investment in hardware and R&D!
This is the difference between fixed cost and marginal cost, though. Yes, they spent money and time on R&D, and yes, this wasn’t their first model. They have a string of white papers for the last two years documenting their previous efforts.
But the point is that this model costs so much less to train than Open AI is claiming. And part of this reason is certainly the “moat” logic — Open AI wants you to believe that the fixed startup costs are so high that you don’t enter “their” market.
But now that Deepseek has made their methods, etc public, it would probably only cost you six to ten million to train a similar model. This is a provocation to enter the market and take away some of Open AI’s lunch money, and makes the whole “Stargate” thing look ridiculous, IMO.
But the real point is that the model runs so much more efficiently that they can sell time on their model for something like 30x less than OpenAI does. And OpenAI is reportedly running losses at these prices. I suspect the real difference is larger.
Deepseek trains on 5-10x less compute hours, and runs on 20-30x less. This is mostly the mixture-of-experts thing and their clever reinforcement learning training setup. It’s more efficient, and it was made more efficient by humans who thought hard about the problems rather than just spending money like it’s raining from the sky. (Which, to be fair, it was for Open AI.)
This space will see more shakeups in the next few years. Efficiency gains will be made. Better algorithms will be found. Billions of dollars spent last year will be millions in a year, and hundreds of thousands in a couple years. But by then the additional value of training a model from scratch will be essentially zero anyway, because we will have so many to choose from.
So the point about how much it cost to train is misleading, but the point about the long-run impact of the cheapening and democratization of AI models is not made strongly enough, IMO. We’ll see!
where are we on the bell curve atm? can’t wait till we are over the top already.
I did not consent to be scraped, I guess that’s rape.
The other day I saw it being referred to as ChatCCP.
It fails nearly every security test
https://www.google.com/search?q=deepseek+failed+security&oq=deepseek+failed+security
The crappy website demo does. That’s like saying algebra fails security tests because CoolMathGames.com got hacked.
So do most readers of hackaday, after a few beers.
I think we see the future with this model. I tried for fun to have it make an Arduino project. It reasoned really well, and wrote the code somewhat correctly. I compared it to ChatGPT (both using the free chat online). I’d say DeepSeek was ahead by, well, how to measure it, but, yeah, four times better? It was still unusable though, due to the “server is busy” errors, which could make each question take 20 minutes to get answered. It will not replace a programmer (yet). You still need to know how to program, and understand hardware, to use it, but to be honest, it was almost scary how close it got to actually making something work with just a few tweaks. Give it a few years, and we’ll live in a different era.
Plus programming is just fun. Having someone, or something do it for me is, well, boring. The fun part is seeing your code work and do something. A program creating a program … yuck… Plus having to define the ‘specs’ … that’s documentation ….
Counter-point: Intellisense and tab-complete in code editors made coding fun for me. Having an AI copilot guess what I’m trying to do and fill in the boilerplate reduces the annoyance of coding.
Unless typing “for (i=0; i < array.Length; i++)” for the millionth time is what you consider fun. In which case, turn off the copilot and have fun.
Ctrl-C, Ctrl-V works fine. And if it’s the millionth time, maybe a function is in the cards instead of retyping every time :) . I use Geany for my work, or Notepad++ at the workplace, whether C/C++ or Python, or….
Co-pilot? Not on my machines… never, ever. At least in Linux we can pick and choose what we want on our machines….
I have R1 running locally (ollama). It has its issues and should not be given agency, but put it under the control of 700 lines of python code and it can write entire books for you. It just finished one on bare metal GPU programming. 3460 A4 pages! The one on “Low latency programming under the hood” turned out amazing, particularly when the folks on X were trolling each other about it, as the book didn’t exist until I decided to generate it as a meta-troll. :-))
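For anyone curious what that kind of harness looks like, the control code doesn’t need anywhere near 700 lines to demonstrate the idea. Below is a toy outline-then-chapters loop against a local Ollama server; the model tag, prompts, and crude outline parsing are all illustrative assumptions, not the commenter’s actual setup:

```python
import json
import urllib.request

MODEL = "deepseek-r1:7b"  # illustrative tag; substitute whatever model you have pulled

def generate(prompt: str) -> str:
    """One blocking completion from a local Ollama server."""
    payload = json.dumps({"model": MODEL, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request("http://localhost:11434/api/generate", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

topic = "bare metal GPU programming"
outline = generate(f"Write a numbered chapter outline for a book on {topic}.")

chapters = []
for line in outline.splitlines():
    line = line.strip()
    if line and line[0].isdigit():  # crude: treat numbered lines as chapter titles
        # Note: R1-style models embed <think>...</think> blocks in their output,
        # which a real harness would strip before assembling the book.
        chapters.append(generate(
            f"Write the chapter '{line}' for a book on {topic}. Be thorough."))

print("\n\n".join(chapters))
```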