Most security professionals will tell you that it’s a lot easier to attack code systems than it is to defend them, and that this is especially true for large systems. The white hat’s job is to secure each and every point of contact, while the black hat’s goal is to find just one that’s insecure.
Whether black hat or white hat, it also helps a lot to know how the system works and exactly what it’s doing. When you’ve got the source code, either because it’s open-source, or because you’re working inside the company that makes the software, you’ve got a huge advantage both in finding bugs and in fixing them. In the case of closed-source software, the white hats arguably have the offsetting advantage that they at least can see the source code, and peek inside the black box, while the attackers cannot.
Still, if you look at the number of security issues raised weekly, it’s clear that even in the case of closed-source software, where the defenders should have the largest advantage, that offense is a lot easier than defense.
So now put yourself in the shoes of the poor folks who are going to try to secure large language models like ChatGPT, the new Bing, or Google’s soon-to-be-released Bard. They don’t understand their machines. Of course they know how the work inside, in the sense of cross multiplying tensors and updating weights based on training sets and so on. But because the billions of internal parameters interact in incomprehensible ways, almost all researchers refer to large language models’ inner workings as a black box.
And they haven’t even begun to consider security yet. They’re still worried about how to construct obscure background prompts that prevent their machines from spewing hate speech or pornographic novels. But as soon as the machines start doing something more interesting than just providing you plain text, the black hats will take notice, and someone will have to figure out defense.
Indeed, this week, we saw the first real shot across the bow: a hack to make Bing direct users to arbitrary (bad) webpages. The Bing hack requires the user to already be on a compromised website, so it’s maybe not very threatening, but it points out a possible real security difference between Bing and ChatGPT: Bing gives you links to follow, and that makes it a juicy target.
We’re right on the edge of a new security landscape, because even the white hats are facing a black box in the AI. So far, what ChatGPT and Codex and other large language models are doing is trivially secure – putting out plain text – but Bing is taking the first dangerous steps into doing something more useful, both for users and black hats. Given the ease with which people have undone OpenAI’s attempts to keep ChatGPT in its comfort zone, my guess is that the white hats will have their hands full, and the black-box nature of the model deprives them of their best hope. Buckle your seatbelts.