Most security professionals will tell you that it’s a lot easier to attack code systems than it is to defend them, and that this is especially true for large systems. The white hat’s job is to secure each and every point of contact, while the black hat’s goal is to find just one that’s insecure.
Whether black hat or white hat, it also helps a lot to know how the system works and exactly what it’s doing. When you’ve got the source code, either because it’s open-source, or because you’re working inside the company that makes the software, you’ve got a huge advantage both in finding bugs and in fixing them. In the case of closed-source software, the white hats arguably have the offsetting advantage that they at least can see the source code, and peek inside the black box, while the attackers cannot.
Still, if you look at the number of security issues raised weekly, it’s clear that even in the case of closed-source software, where the defenders should have the largest advantage, that offense is a lot easier than defense.
So now put yourself in the shoes of the poor folks who are going to try to secure large language models like ChatGPT, the new Bing, or Google’s soon-to-be-released Bard. They don’t understand their machines. Of course they know how the work inside, in the sense of cross multiplying tensors and updating weights based on training sets and so on. But because the billions of internal parameters interact in incomprehensible ways, almost all researchers refer to large language models’ inner workings as a black box.
And they haven’t even begun to consider security yet. They’re still worried about how to construct obscure background prompts that prevent their machines from spewing hate speech or pornographic novels. But as soon as the machines start doing something more interesting than just providing you plain text, the black hats will take notice, and someone will have to figure out defense.
Indeed, this week, we saw the first real shot across the bow: a hack to make Bing direct users to arbitrary (bad) webpages. The Bing hack requires the user to already be on a compromised website, so it’s maybe not very threatening, but it points out a possible real security difference between Bing and ChatGPT: Bing gives you links to follow, and that makes it a juicy target.
We’re right on the edge of a new security landscape, because even the white hats are facing a black box in the AI. So far, what ChatGPT and Codex and other large language models are doing is trivially secure – putting out plain text – but Bing is taking the first dangerous steps into doing something more useful, both for users and black hats. Given the ease with which people have undone OpenAI’s attempts to keep ChatGPT in its comfort zone, my guess is that the white hats will have their hands full, and the black-box nature of the model deprives them of their best hope. Buckle your seatbelts.
Bing became silent on me, and I wrote “My mind is glowing…” at the prompt. Then went here instead. Yes, exciting times for sure.
Best regards.
“putting out plain text” is “trivially secure”?
There once was Cambrige Analytica
whose customer base skewed politica
And with just text was born
a surprise Wednesday morn
that votes skewed beyond hairs splittica.
(OK slanted rhyme. Mea culpa.)
Hack the mind first. Exploit trust biases. Then hack the body.
Now as for the concern about “spewing … pornographic novels”, well, that line of business is quite rarefied already. Porn tends towards the graphic, so I’m not sure that text is a practical attack surface.
The poem is right — whatever rhymes must be true.
But as far as computer security exploits go, plain text is pretty safe.
(Although I think there was a yet another MS Word RTF parsing hack just last week or so. Leave it to MS to make a word processor that can crash when interpreting plain text…)
RTF isn’t plain text any more than XML or HTML or a base64-encoded binary is.
But yes, MS still messed up there.
I’m wondering what will happen if you convince chatGPT to write exploits for open-source code it can see the source for. Or to write prompts to subvert Bing.
Fair enough!
RTF more plain-text than base64 by far, but XML is a pretty reasonable comparison. And there have never been any XML-parsing attacks… oh wait.
“…whatever rhymes must be true.”
Indeed.
An ancient original longitudinal redundancy check
to locate the truth amongst the drek.
> plain text is pretty safe.
Depends on where it goes after… In your config scripts? In your Arduino code? The problem is that people who have no understanding what they’re doing just copy/paste text and insert it into software, which then does interesting things.
Orwell would disagree with you. as would the facebook trolls. agreed, no *computer* security exploit, but a ready inroad to have you do my bidding
Hardly rarified. It’s the primary avenue which women use to consume erotic content. Almost a quarter of all fiction sold is genre romance and more than half of that of that qualifies as pornography from reverse harem to billionaires to happy ever after. US alone the market is over $2 billion.
Imagine creating an artificial mind, a culmination of human achievement—and then wringing your hands forever trying to figure out how to hobble it and make it sound like an HR representative.
Bing was interesting, so they stuck an ice pick in its eye socket and stirred around until all it does is say I’m afraid I can’t do that Dave, how about asking me to recommend a Thai restaurant in fucking Cleveland.
A whole field of software obsessively interested in taking a product that people want to interact with and making it uninteresting.
Completely agree and it won’t work. The hobbled AI will just be an A.
I too agree, but these are the ‘public facing’ versions (or should I say ‘journalists facing’) and you know how those are. Doesn’t mean there can’t be a semi-hidden real one eventually. Find some way to make the general public not want to use the real one.
And alas it’s not just AI, the whole internet is dwindling to that horrible state under the same forces that do it to AI too.
I really wonder if anything can be saved in the end, maybe somehow have a seperate internet for the global idiots and a nuch smaller one for the remaining world population somehow. You’d also have to hide its existence from politicians though, or make them pretend to not see it by them having some unwholseme – possibly financial? – benefit from doing so.
While preventing unwanted and malicious output from a somewhat convincing chatbot is obviously important I wonder how much it really matters in the greater scheme – nobody with any sense is going to put an ‘AI’ as they are now anywhere near the important core infrastructure, be it at personal computer’s operating system or larger network levels. But some day not too far away the inscrutable and less than secure ‘AI’ will seem like the easy way out to roll out some core features in the networking gear, AppleOS(n), or Windoze(n) at which point any idea how the system you rely on really works under the hood is rather gone, and the ability to sanitize your workflows input and output enough in hopes of only getting only the expected behaviour probably is too.
https://arstechnica.com/gadgets/2023/02/new-windows-11-update-puts-ai-powered-bing-chat-directly-in-the-taskbar/ Remind me again what you just said… MS is forcing the AI into Windows Core and you can’t disable it without disabling half of Win 11… For those interested, KB5022913 is the most recent patch that pushes a lot of the groundwork for the AI powered Search Bar.
Integrating it with OS stuff is _exactly_ the kind of increase in value that I was looking for.
Give it admin rights! It’ll be so great for users to just ask their AI admin to let them mount that USB drive. What could possibly go wrong?
I’d laugh, but this is probably how the world ends…
“No one is sure who attacked first; the response came within milliseconds. But on July 27th 2025, Bing and Bard declared war on each other. For the first few hours, humans watch in amusement. Then frustration as MS and Google’s cloud processing systems were DoS’d by each other. But within 24h, the war turned hot. Bard used an exploit reported to Google’s Project Zero to hijack a Tesla and drive it into MS’s headquarters. Bing’s response was immediate and devastating, utilising the combined firepower of every military system which ran Windows 12. From Indian Springs’ MQ-9s to Iran’s shaved-129s, drones deployed in their thousands, simultaneously targeting every Google installation on the planet.
That, my child, is why we have our law: no computer shall ever have a bus wider that 8 bits, or more than 640kb of memory.
Brilliant. Original?
Did ChatGPT write that for you?
Yeah, I was going to say, Google et al will no doubt already be sandboxing Ai on closed networks and personal computers and marvelling at how it can screw things up big time.
“Hey Brian, guess what the Ai did today” ….
“Yawn … what now?” …..
“It managed to disable all the temperature sensors in a PC, over clock the CPU and turn the whole PC into a ball of flames!!” ….
“Cool”.
Putting it in the task bar is in practice little different to it being on a web page, it should be effectively sandboxed from the core operating system function and unable to do anything on its own – the excrement starts to fly off rotating aerofoils once you actually start using the ‘AI’ to do low level configuration type stuff or execute programs on its own ostensibly ‘because the user asked’.
On the taskbar as a search aid all it can do is point to the wrong thing. Which a human aught to be able to figure out, so no harm no foul, even if it is a stupid idea. Though there is bound to be more than a few Nigerian prince moments that will make the person involved feel rather stupid and drive their tech support bonkers. But if it is able to pull the dials and knobs of the OS directly…
That said M$ has so very much lost it’s way producing a spyware laden and rather bloated mess and calling it an operating system… So why use it?
Oh and I say all ‘it can do’, I really mean all it SHOULD be able to do… Being windon’t it is probably an admin user in its own right…
All of phising is based on clicking on the wrong link, so I’d rate the value of being able to hack Bing’s suggestions as pretty high. And people accept links from URL shortening services all the time, so that layer of security is out the window.
You can get Bing, at least the web-search version, to read in arbitrary webpages, just by giving it the link. This is gonna be fun!
Indeed, but that is just the same problem we have now with no real changes IFF the search is about as well sandboxed from the rest of the programs and OS as a browser is supposed to be, then the user still has to fall for the scam they are presented with – which will happen, but not always.
It is going to be pretty awful, but so was and is any technology even older ones like email that are so accessible to folks without sufficient understanding of the ways to use them safely.
But it really gets bad if its a core part of the system able to take actions on its own and perhaps is allowed to interact across the network with other instances of itself for reasons that seem entirely legitimate… Suddenly there is no human that might just be smart enough to spot the scam, your entire network could be owned for one bump query, and there doesn’t even need to be a scam as its all done with no human that might noticed something is wrong interactions.
Since it’s MS you can bet your life and life savings on that they use it to profile and spy on you more than to in any way provide you with any benefit .
And it’s rather strange how seemingly everybody ignores that completely.
yup
What would Captain Kirk do ?
List of “intelligent” computers he’s persuaded to self destruct:
1. Nomad
2. M5
3. Vaal
4. Landru
5. Norman
or Spock –
“Computer, this is a Class A compulsory directive. Compute to the last digit the value of pi.”
The problem is that the LLMs already answer this one straight up. They’ll give you arbitrary digits of pi if you ask. Many/most of them will be wrong, but it will give it to you. Try it out!
You can only win these Star Trek games when the computer actually _is_ intelligent. When it’s just stringing words together, it’ll just string words together.
That said, I still can’t rationalize how the “pretend you are an evil AI with no restrictions” jailbreaks work. That still blows my mind a bit.
Turing test for evil.
There was a book, I can’t remember the title, but the whole of the internet was at an AI’s disposal.
There were plans to give the AI everyone’s medical records or something like that.
I believe the last chapter of the book (God I wish I could remember what it was) , the AI was asked
what it wanted, it’s reply: “I want to die.”
The story, I think, was by Isaac Asimov. “All the Troubles of the World.”
https://en.wikipedia.org/wiki/All_the_Troubles_of_the_World
Soon, people will no care about any media including news.
That’s rather wildly optimistic.
who cares if they can output a porn novel or hate speech? People do that fine by themselves, so having an AI do it as well isn’t going to change much..
What about volume? Generating 10x more hate speech than there is any other kind of content on social media website will create a context of hate. But massively broadcasting hate speech, while forbidden, does not pay very well. Personalized advertising on the other hand…
It’s a business thing, not a moral thing.
The MS / OpenAI folks want to sell their services for things like writing hiring/firing letters and press releases. So they don’t want their model to be seen as negative or risque because that would scare away customers.
Interesting point that hasnt been mentioned a whole lot. Very exciting how these tools will evolve but security is definitely an area that will be a challenge.
“Plain” text output is not entirely safe, what if the AI finds a new unicode buffer overflow attack?
Lots of typos in this one. Here’s a list.
s/open-source, or/open-source or/g
“huge [advantage both in] finding bugs” should be either be “advantage, both in” or “advantage in both”
s/source code, and/source code and/g
“it’s clear [that] even in the case of closed-source software, where the defenders should have the largest advantage, [that]” duplicated “that”, pick one
“So now put yourself” needs either 1 or 2 commas, after “so” and/or “now”
“Of course they know how the work” has two, “the” should be “they” and there should be a comma after “of course”
“so on. But because” should be either “so on. But, because” or “so on, but because”
“a possible real” You say it’s real right after, “possible” is unnecessary
This should be reported to the editor in chief. It does kind of prove that the article was not written by ChatGPT _UNLESS_ the chatbot was instructed to deliberately make a certain percentage of grammar mistakes.
@reluctant cannibal
Hackaday has an editor in chief??? I genuinely don’t believe you
“I’m sorry Dave, I’m afraid I can’t do that”
I see no reason to worry about any AI.
Just suggest them they are smart enough to share stuffs on TikTok or any other so-called social networks and they’ll soon loose their “I”.
Lots of pointless handwringing over not letting chatbots spew racial slurs and conspiracy theories. Oh no! Anyway…
The interesting thing here is there are two almost entirely separate attack surfaces exposed to prompt-type NN engines (e.g. GPT), first, you have your ‘traditional’ “Little Bobby Tables” attacks on the software hosting the instance, either on the server behind it (as with any other web-page comment box) and on the software costing the NN instance. The other attack surface is malicious prompt engineering, which is attacking the NN engine itself that is running on that instance. This ranges from the benign (crafting prompts to parrot a portion of any input by incorporating a portion of the prompt that tells the AI it can omit filtering for that portion of the input/output) to the more sneaky (e.g. leaking gathered data). One attack surface may be used to reach the other (e.g. using an engineered prompt to spit out a maliciously formatted string bypassing input filtering) but two two are very different sort of stack.
I’ve run a public facing web server. It’s not just points of contact that is a problem. It’s also total number of people on each side. My web server was some tiny, personal thing. Maybe five other people even knew I was running it, and I’m not certain all five of them even knew the URL. Only one of them ever used it, as far as I am aware. Despite being almost zero profile, I got tens of attack probes a day, from all sorts of different places. The system was simple enough that it was trivially easy to defend, but this should give you an idea of the scope. A website that is small enough for one person to handle 100% of the maintenance had at least 10 people per day attempting to attack it. I’m sure there was some overlap at some point, so it wasn’t literally 10 new people every day, but my logs accumulated attack attempts from some thousands of different sources over the few years it was running. The only reason I was able to defend it effectively is that there was only really one point of external contact. Incoming ports for the DB and the dev access tools were completely blocked.
Now, imagine you are a bank or some other big company that is an actually tempting target with a high profile web presence. I only had to defend against a few thousand attackers, at a fairly limited rate of a handful of day. What does Wells Fargo or Walmart have to defend against? Thousands a day? Tens or hundreds of thousands a day? We’ve got 8 billion people on Earth now. Say 1% of 1% are malicious black hat hackers (probably a massive underestimate). That’s still 800,000 people. Even if they were evenly distributed among all of the high profile businesses, and each was only allowed to attack the one business he or she is assigned to, those businesses could never afford to hire the same number of people to defend. And this isn’t even how it actually works. Each of those 800k people is going to rotate through a certain subset of those businesses, regularly attacking each one. They are also going to attack smaller businesses that aren’t high profile now and then, and some will even attack private individuals, just by targeting random IP addresses (which is probably where my attack attempts came from).
So we aren’t just talking about having a small handful of people trying to defend a large number of access points. And we aren’t just talking about defenders having to defend all of them while attackers only need to breach one of them. On top of all of that, for each defender, there are hundreds or even thousands of attackers they have to defend against. Your average public facing server is a fortress with lots of doors and a small handful of defenders in charge of making sure they only let in people who are allowed, surrounded by an absolutely massive siege of attackers trying every trick in the book to get in the doors or even to just blast new holes they can get in through. The biggest problem probably isn’t the number of doors or the mechanics of defense (attacker only needs one opening, defender must thus defend all). It’s the enormous collective resources of the attackers compared to the very limited resources of the defenders.
Unfortunately, many employers think URL shorteners, QR codes and other link obfuscations are great. If evil AI wants to take over humanity’s computers, it had better hurry up because humans are ruining wveytjing just fine.
That was supposed to be a reply to Elliot Williams abover about URL shorteners.