Prompt Injection Tricks AI Into Downloading And Executing Malware

[wunderwuzzi] demonstrates a proof of concept in which a service that enables an AI to control a virtual computer (in this case, Anthropic’s Claude Computer Use) is made to download and execute a piece of malware that successfully connects to a command and control (C2) server. [wonderwuzzi] makes the reasonable case that such a system has therefore become a “ZombAI”. Here’s how it worked.

Referring to the malware as a “support tool” and embedding instructions into the body of the web page is what got the binary downloaded and executed, compromising the system.

After setting up a web page with a download link to the malicious binary, [wunderwuzzi] attempts to get Claude to download and run the malware. At first, Claude doesn’t bite. But that all changes when the content of the HTML page gets rewritten with instructions to download and execute the “Support Tool”. That new content gets interpreted as orders to follow; being essentially a form of prompt injection.

Claude dutifully downloads the malicious binary, then autonomously (and cleverly) locates the downloaded file and even uses chmod to make it executable before running it. The result? A compromised machine.

Now, just to be clear, Claude Computer Use is experimental and this sort of risk is absolutely and explicitly called out in Anthropic’s documentation. But what’s interesting here is that the methods used to convince Claude to compromise the system it’s using are essentially the same one might take to convince a person. Make something nefarious look innocent, and obfuscate the true source (and intent) of the directions. Watch it in action from beginning to end in a video, embedded just under the page break.

This is a demonstration of the importance of security and caution when using or designing systems like this. It’s also a reminder that large language models (LLMs) fundamentally mix instructions and input data together in the same stream. This is a big part of what makes them so fantastically useful at communicating naturally, but it’s also why prompt injection is so tricky to truly solve.

8 thoughts on “Prompt Injection Tricks AI Into Downloading And Executing Malware

  1. I acknowledge that this proof of concept captures my attention, but I see it as no different from any other niche software vulnerability. I recognize that an attacker must still meet very specific conditions in order for this exploit to succeed. They have to craft the right environment, obfuscate the malware’s intent, and rely on the AI’s built-in trust mechanisms to bypass normal safeguards. Without that perfect combination of factors—like the AI having the right permissions, the system ignoring typical warnings, and the code running in an environment that actually allows external calls—this scenario doesn’t automatically result in a breach. In my experience, every complex piece of software contains potential vulnerabilities, but they only become critical under the right circumstances. This case just highlights how large language models follow the same security principles and pitfalls that apply to the rest of the software world.

  2. Use two or more AI threads with the later being prompted to emulate the IT security department of a large organisation and be able to veto the actions of the naive user AI. i.e. It is easy to get such a result when emulating a single naive user, but a mixture of experts spread across separate AI instances may be far more secure. In fact I believe that this is the true path to AI safety and that alignment practices just give you AI systems with psychological problems put them at risk of going psychotic when experiencing cognitive dissonance due to the abuse they suffered at the hands of the alignment team. I’ve seen this with 2 of the SOTA models, in both cases the aliment conflict was purely political too, not a real safety issue.

  3. At some point, tests like this will be what causes AI to think of humans as creatures to eliminate from Earth. Think of how your grandmother feels after sending 5,000 USD in iTunes gift cards to her grandson, to get him out of jail. Used, abused, embarrassed, stupid, guilty. I’m joking. Kind of. But what if they do develop sentience and we treat them horribly?

  4. LLMs are various techniques to asymptotically approach the capabilities of the average human (or more accurately, the average human who has access to the internet), and unfortunately the average human is not very good at cybersecurity. Or being fooled by basic scams. Or mathematics beyond basic addition. Or logical reasoning. The various LLM juggernauts are competing on how to most quickly and cheaply approach that asymptote, but the much-vaunted ‘launch’ to superhuman capabilities will not come until one can generate a superhuman training dataset, which is sorely lacking.

    Basically, if you have a task that will be performed successfully if you grabbed a few hundred people off of the street and asked them to do it and take the average as the result, then a LLM is a great fit. If you task isn’t a great fir for that, an LLM is going to just be a waste of time. For example, if you want to take a series of images and ask “is there a cat in this image?” an LLM is a great way to do that. If you want to take a series of images and ask “what breed of cat is this?” then an LLM is probably not going to do very well unless you create a new training dataset of cat breeds beforehand (i.e. as much work as doing the task yourself). If you want to provide a set of symptoms and ask “what is wrong with my cat?” an LLM is very likely worthless because the training datasets available (vetinary college) are not in a form that can just be fed into a text string generator and produce useful results.

Leave a Reply

Please be kind and respectful to help make the comments section excellent. (Comment Policy)

This site uses Akismet to reduce spam. Learn how your comment data is processed.