For a brief window of time in the mid-2010s, a fairly common joke was to send voice commands to Alexa or other assistant devices over video. Late-night hosts and others would purposefully attempt to activate voice assistants like these en masse and get them to do ridiculous things. This isn’t quite as common of a gag anymore and was relatively harmless unless the voice assistant was set up to do something like automatically place Amazon orders, but now that much more powerful AI tools are coming online we’re seeing that joke taken to its logical conclusion: prompt-injection attacks.
Prompt injection attacks, as the name suggests, involve maliciously inserting prompts or requests in interactive systems to manipulate or deceive users, potentially leading to unintended actions or disclosure of sensitive information. It’s similar to something like an SQL injection attack in that a command is embedded in something that seems like a normal input at the start. Using an AI like GPT comes with an inherent risk of attacks like this when using it to automate tasks, as commands to the AI can be hidden where a user might not expect to see them, like in this demonstration where hidden prompts for a ChatGPT plugin are hidden in YouTube video transcripts to attempt to get ChatGPT to perform actions outside of those the original user would have asked for.
While this specific attack is more of a proof-of-concept, it’s foreseeable that as these tools become more sophisticated and interconnected in our lives, the risks of a malicious attacker causing harm start to rise. Restricting how much access we give networked computerized systems is certainly one option, similar to sandboxing or containerizing websites so they can’t all share cookies amongst themselves, but we should start seeing some thought given to these attacks by the developers of AI tools in much the same way that we hope developers are sanitizing SQL inputs.
12 thoughts on “Prompt Injection: An AI-Targeted Attack”
I love this so much!
…and here’s $10 on “WWIII is started by an AI interpreting the transcript to a movie as instructions”
I’d guess, unlike SQL injection and the like, this is much harder to detect and prevent, but also due to the massive industry that is around hacking, random ware, etc, AIs will be subjected to these attacks immediately.
Shall we play a game?
Is that you, WOPR?
King on B6..execute.
> this is much harder to detect and prevent…
That is correct, SQLi and other injection attacks are based on a formal and very limited language syntax, but “AI” is based on natural language, so anything goes. On the other hand I’m surprised about the interpretation of commands in the data/information parsing part of the AI. At least there should be a separation between the command channel and the data channel?
If you want to play with prompt injection attacks, you can have a go at https://gandalf.lakera.ai/ The first levels are easy, but it becomes harder very fast a level 4
Thanks for the link ! It’s very relevant to the topic.
You are correct – I’ve made it to 7 but not sure I’ll pass this one! I’ll try it tomorrow 👍🏻
Did you pass the gandalf the white level?
Seems the listening device could have a fine grained understanding of location.
Another good reason to run your training and inference at home on your own hardware. And right now also happens to be a great time to buy used servers, workstations, and GPUs, as a bunch of crypto miners just went belly up and are selling off their gear.
Honestly the prices for premium subscriptions to these hosted services are laughable after checking out the prices for used hardware on eBay. You don’t need anywhere close to an A100 to do the same things at home, you just need some programming experience and patience.
Please be kind and respectful to help make the comments section excellent. (Comment Policy)