You may have noticed the Anime Catgirls when trying to get to the Linux Kernel’s mailing list, or one of any number of other sites associated with Open Source projects. [Tavis Ormandy] had this question, too, and even wrote about it. So, what’s the deal with the catgirls?
The project is Anubis, a “Web AI Firewall Utility”. The intent is to block AI scrapers, as Anubis “weighs the soul” of incoming connections and blocks the bots you don’t want. Anubis uses the user agent string and other indicators to determine what an incoming connection is, but the most obvious check is the in-browser hashing. Anubis puts a challenge string in the HTTP response header, and JavaScript running in the browser calculates a second string to append to this challenge. The goal is a combined string whose SHA-256 hash begins with a set number of zeroes.
[Tavis] makes a compelling case that this hashing is security theatre: it makes things appear more secure, but doesn’t actually improve the situation. It’s only fair to point out that his observation comes from annoyance, as his preferred methods of accessing the Linux kernel git repository and mailing list are now blocked by Anubis. But the economics of compute costs clearly demonstrate that this SHA-256 hashing approach will only be effective so long as AI companies don’t add the 25 lines of C it took him to calculate the challenge. The Anubis hashing challenge is literally security by obscurity.
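For a sense of how lightweight this challenge really is, here’s a minimal sketch in Python (rather than [Tavis]’s actual 25 lines of C) of the kind of proof-of-work Anubis asks the browser to do. The challenge string and the assumption that difficulty is counted in leading zero hex digits of the digest are illustrative, not taken from the Anubis source:

```python
import hashlib
from itertools import count

def solve(challenge: str, difficulty: int = 4) -> int:
    """Find a nonce such that SHA-256(challenge + nonce) starts with
    `difficulty` zero hex digits -- an Anubis-style proof of work."""
    target = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce

# Each extra zero digit multiplies the expected work by 16.
nonce = solve("example-challenge", difficulty=4)
print(nonce)
```

A difficulty of four zero hex digits takes on the order of 65,000 hashes, which is why a desktop CPU (or a scraper farm) clears it in a fraction of a second.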
Something Security AI is Good At
We’ve recently covered an AI competition, where AI toolchains were used to find and patch vulnerabilities. This took a massive effort to get good results. This week we have work on a similar but constrained task that AI is much better at. Instead of finding a new CVE, simply ask the AI to generate an exploit for CVEs that have been published.
The key here seems to be the constrained task that gives the AI a narrow goal, and a clever approach to quickly test the results. The task is to find an exploit using the patch code, and the test is that the exploit shouldn’t work on the patched version of the program. This approach cuts way down on false positives. This is definitely an approach to keep an eye on.
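That differential test is simple enough to sketch. The harness below is my own illustration of the idea, not the researchers’ actual tooling, and the crash-equals-triggered heuristic is an assumption:

```python
import subprocess
import sys

def triggers(cmd: list, payload: bytes, timeout: int = 10) -> bool:
    """Heuristic: treat a crash or nonzero exit as the exploit firing."""
    result = subprocess.run(cmd, input=payload,
                            capture_output=True, timeout=timeout)
    return result.returncode != 0

def keep_candidate(payload: bytes, vulnerable_cmd: list, patched_cmd: list) -> bool:
    """Keep an AI-generated exploit only if it fires against the vulnerable
    build and NOT against the patched one -- cutting down false positives."""
    return triggers(vulnerable_cmd, payload) and not triggers(patched_cmd, payload)

# Stand-in "binaries" for demonstration: the vulnerable one exits nonzero.
vuln = [sys.executable, "-c", "import sys; sys.exit(1)"]
patched = [sys.executable, "-c", "import sys; sys.exit(0)"]
print(keep_candidate(b"payload", vuln, patched))  # prints True
```

The patched binary acts as a built-in oracle: any candidate that still “works” after the fix was never exploiting the patched bug in the first place.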
We’re Hunting CodeRabbits
Reviewing Pull Requests (PRs) is another AI use case that has seen significant deployment. CodeRabbit provides one such tool, which summarizes the PR, looks for possible bugs, and runs multiple linters and analysis tools. That last point is extremely important here, as not every tool is bulletproof. Researchers at Kudelski Security discovered that the Rubocop tool was exposed to incoming PRs containing Ruby files.
Rubocop has a nifty feature that allows extensions to be loaded dynamically during a run. These are specified in a .rubocop.yml file, which CodeRabbit was helpfully passing through to the Rubocop run. The key here is that the extension to be loaded can also be included in the PR, and Rubocop extensions can execute arbitrary code. How bad could it be to run code on the CodeRabbit backend servers?
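To picture the attack surface: Rubocop’s `require:` list in the config names Ruby files that are loaded, and thus executed, when the linter starts. A hypothetical malicious config could be as small as this (the file name is invented for illustration):

```yaml
# .rubocop.yml shipped inside the malicious PR
require:
  - ./lib/innocuous_helper.rb   # any top-level Ruby in this file runs as soon as Rubocop loads it
```

Since the PR supplies both the config and the Ruby file it points at, the linter run itself becomes the code-execution primitive.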
The test payload in this case simply captured the system’s environment variables, which turned out to be a smorgasbord of secrets and API keys. The hilarious part of this research is that the CodeRabbit AI absolutely flagged the PR as malicious, but couldn’t stop the attack already in progress. CodeRabbit very quickly mitigated the issue, and rolled out a fix less than a week later.
Illegal Adblock
There’s a concerning court case making its way through the German courts that threatens to make adblocking illegal on copyright grounds. The case pits Axel Springer, a media company that makes money from showing advertisements, against Eyeo, the company behind Adblock Plus. The legal theory claimed by Axel Springer is that a website’s HTML and CSS together form a computer program that is protected by copyright. By this theory, blocking advertisements on that website would be a copyright violation.
This theory is novel, and every lower court has rejected it. What’s new this month is that the German Supreme Court threw the case back to a lower court, instructing that court to revisit the question. The idea of copyright violation simply by changing a website has caught the attention of Mozilla, and their Product Counsel, [Daniel Nazer], has thoughts.
His first point is that a legal precedent forcing a browser to perfectly honor the code served by a remote web host would be horribly dangerous. I suspect it would also be in tension with other European privacy and security laws. As court battles usually go, this one is moving in slow motion, and the next ruling may be years away. But it would be particularly troubling if Germany joined China as the only two nations to ban ad blockers.
Copilot, Don’t Tell Anyone
Microsoft’s Office365 has an audit log that tracks which users access given files. Running Copilot in that environment dutifully logs those file accesses, but only if Copilot actually returns a link to the document. Much like other techniques where an AI can be convinced to do something unintended, a user can ask Copilot to return the contents of a file but not to link to it. Copilot does as instructed, and the file is never listed in the audit log as accessed.
Where this gets more interesting is how the report and fix were handled. Microsoft fixed the issue, but opted not to issue a CVE or any public statement. [Zack Korman], the researcher who reported the issue, disagrees quite vigorously with Microsoft’s decision here. It’s an interesting example of the tension that can result from disagreements between a researcher and the organization responsible for the product in question.
Disputed Research
This brings us to another example of disputed research, the “0-day” in Elastic Endpoint Detection and Response (EDR). Elastic disputes the claim, pointing out that they could not replicate code execution, and that the researcher didn’t provide a complete proof of concept. This sort of situation is tricky. Who is right? The company that understands the internals of the program, or the researcher who undoubtedly did discover something, but maybe doesn’t fully understand what was found?
There are two elements that stand out in the vulnerability write-up. The first is that the overview of the attack chain lists a Remote Code Execution (RCE) as part of the chain, but nothing about this research is actually an RCE. The premise is that code running on the local machine can crash the Elastic kernel driver. The second notable feature of the post is that the proof-of-concept code uses a custom kernel driver to demonstrate the vulnerability. What’s missing is any demonstration that code execution was actually achieved without this custom kernel driver.
Bits and Bytes
One of the very useful features of Microsoft’s VSCode is the Remote-SSH extension, which allows running the VSCode front-end on a local machine while connecting to another server for remote work. The problem is that connecting to a remote server can install extensions on the local machine. VSCode extensions can be malicious, so connecting to a malicious host can result in code execution on your local machine.
Apple has patched a buffer overflow in image handling that was being used in “extremely sophisticated” attacks against specific targets. This sort of language tends to indicate the vulnerability was found in an Advanced Persistent Threat (APT) campaign, run by either a government actor or a professional outfit like NSO Group.
And finally, if zines are your thing, Phrack issue 0x48 (72) is out! This one is full of stories of narrowly avoiding arrest while doing smart card research, analysis of a North Korean data dump, and a treatise on CPU backdoors. Exciting stuff. Enjoy!
Regarding why anime girls: it is supposed to be a way to make people pay for the pro version: https://xeiaso.net/blog/2025/avoiding-becoming-peg-dependency/
Unless something changed drastically in recent years, VSCode still uses the Electron/Google Chrome engine for rendering its entire window, which means it still has that shitty, blurry, grayscale text rendering instead of proper ClearType like in Eclipse. I guess using good old GDI is just too hard for modern programmers™ addicted to TikTok and JavaScript frameworks.
Just buy a new display with way more pixels than your eye can see, and cleartype is not needed.. and then buy more RAM, CPU and GPU to drive that display.
Clearly this guy hasn’t done much security work. He’s completely missed the threat model of anubis.
The point of the hashing isn’t that it’s “something bots can’t do”. Just run the javascript, or indeed, write 25 lines of C.
The point, as with bitcoin, is that it’s proof of work. Every incoming connection needs to do this hash, which puts a limit on how quickly someone can run their smash-and-grab AI dataset scraping operation. If they use single-use throw-away connections from single-use throwaway residential IPs (as has increasingly been the case to avoid firewalls) then they’re going to have to calculate a zillion hashes, one for every connection. That doesn’t scale well. On the other hand, everyone else calculates the hash once and then can continue to reuse that response for the duration of their visit. If the AI bots do the same, then they’re feeding a unique ID to the server with every request and their behaviour becomes trackable, detectable, and blockable again.
As the server comes under heavier load, you can raise the difficulty of the hash by requiring more zeros, in the same way that the bitcoin network raises the difficulty to account for increases in hashing power. Again, everyone has to compute the hash only once per visit, so at most they’re spending a few more seconds one time. The scrapers pretending to be a million independent users with throw-away IPs on the other hand find it a lot less fun.
That doesn’t work if the scrapers farm the work out to the devices with the hijacked IPs, a.k.a. a botnet. The “throwaway IPs” mentioned in this context are often consumers installing apps (free games, flashlights, etc.) that behind the scenes provide a proxy-for-hire service. Those apps could run the proof-of-work without any penalty for the scraper.