[wunderwuzzi] demonstrates a proof of concept in which a service that enables an AI to control a virtual computer (in this case, Anthropic’s Claude Computer Use) is made to download and execute a piece of malware that successfully connects to a command and control (C2) server. [wonderwuzzi] makes the reasonable case that such a system has therefore become a “ZombAI”. Here’s how it worked.
After setting up a web page with a download link to the malicious binary, [wunderwuzzi] attempts to get Claude to download and run the malware. At first, Claude doesn’t bite. But that all changes when the content of the HTML page gets rewritten with instructions to download and execute the “Support Tool”. That new content gets interpreted as orders to follow; being essentially a form of prompt injection.
Claude dutifully downloads the malicious binary, then autonomously (and cleverly) locates the downloaded file and even uses chmod
to make it executable before running it. The result? A compromised machine.
Now, just to be clear, Claude Computer Use is experimental and this sort of risk is absolutely and explicitly called out in Anthropic’s documentation. But what’s interesting here is that the methods used to convince Claude to compromise the system it’s using are essentially the same one might take to convince a person. Make something nefarious look innocent, and obfuscate the true source (and intent) of the directions. Watch it in action from beginning to end in a video, embedded just under the page break.
Continue reading “Prompt Injection Tricks AI Into Downloading And Executing Malware”