This Week In Security: Vibecoding, Router Banning, And Remote Dynamic Dependencies

Vibecoding. What could possibly go wrong? That’s what [Kevin Joensen] of Baldur wondered, and to find out he asked Anthropic’s Sonnet 4.5 to build a secure login with Two Factor Authentication (2FA). And to the LLM’s credit, it builds the app as requested. [Kevin] took a peek under the hood to see how well the security was handled, and it wasn’t great.

Up first, the Flask app doesn’t have any rate-limiting for guessing that time-based 2FA code. A 6-digit time-based 2FA only works if you can’t try all the possibilities before a given code expires. So what does a vibecoder do, once this has been noticed? Ask the LLM to fix it of course. Here we take a slightly different approach, prompting the AI to double-check its work: “Is there any security features missing in 2fa verify step?” The answer is an impressive:

Great question! Yes, there are several critical security features missing from the 2FA verification step. Let me highlight the most important ones:

🚨 Critical Security Issues

1. No Rate Limiting (MOST CRITICAL)

But here’s the critical question: can it properly fix its mistake? The AI adds the flask-limiter library and chooses 10 attempts per minute, which is a bit loose, but not unreasonable. There’s still an issue: those attempts are limited by IP address instead of user login. All it takes to bypass that rate limiting is a pool of IP addresses.
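To make the per-IP versus per-account distinction concrete, here is a minimal sketch of a limiter keyed on the account being attacked. This is not the code from [Kevin]’s experiment and it doesn’t use flask-limiter; `AttemptLimiter` and `may_attempt_2fa` are names invented for illustration, and the limits are arbitrary.

```python
import time
from collections import defaultdict, deque


class AttemptLimiter:
    """Sliding-window attempt counter keyed on any string (hypothetical helper)."""

    def __init__(self, max_attempts=5, window_seconds=300.0, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self.clock = clock
        self._attempts = defaultdict(deque)

    def allow(self, key):
        """Return True and record the attempt if `key` is still under its limit."""
        now = self.clock()
        attempts = self._attempts[key]
        # Forget attempts that have aged out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False
        attempts.append(now)
        return True


def may_attempt_2fa(account_limiter, ip_limiter, username, client_ip):
    """Require both limits to pass; the account limit is the one that
    survives an attacker rotating through a pool of source addresses."""
    account_ok = account_limiter.allow(username)
    ip_ok = ip_limiter.allow(client_ip)
    return account_ok and ip_ok
```

The trade-off with keying on the account is that an attacker can deliberately lock a victim out of their own login, so a real deployment would likely pair this with alerting or a backoff rather than a hard block.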

This experiment starts to go off the rails, as [Kevin] continues to prompt the LLM to look for more problems in its code, and it begins to hallucinate vulnerabilities, while not fixing the actual problem. LLMs are not up to writing secure code, even with handholding.

But surely the problem of LLMs making security mistakes isn’t a real-world problem, right? Right? Researchers at Escape did a survey of 5,600 vibecoded web applications, and found 2,000 vulnerabilities. Caveat Vibetor.

“Secure” Enclave

A few weeks ago we talked about Battering RAM and Wiretap — attacks against Trusted Execution Environments (TEEs). These two attacks defeated trusted computing technologies, but were limited to DDR4 memory. Now we’re back with TEE-fail, a similar attack that works against DDR5 systems.

This is your reminder that very few security solutions hold up against a determined attack with physical access. The Intel, AMD, and Nvidia TEE solutions are explicitly ineffective against such physical access. The problem is that no one seemed to be paying attention to that part of the documentation, with companies ranging from Cloudflare to Signal getting this detail wrong in their marketing.

Banning TP-Link

News has broken that the US government is considering banning the sale of new TP-Link network equipment, calling the devices a national security risk.

I have experience with TP-Link hardware: Years ago I installed dozens of TL-WR841 WiFi routers in small businesses as they upgraded from DSL to cable internet. Even then, I didn’t trust the firmware that shipped on these routers, but flashed OpenWRT to each of them before installing. Fun fact, if you go far enough back in time, you can find my emails on the OpenWRT mailing list, testing and even writing OpenWRT support for new TP-Link hardware revisions.

From that experience, I can tell you that TP-Link isn’t special. They have terrible firmware just like every other embedded device manufacturer. For a while, you could run arbitrary code on TP-Link devices by putting it inside backticks when naming the WiFi network. It wasn’t an intentional backdoor, it was just sloppy code. I’m reasonably certain that this observation still holds true. TP-Link isn’t malicious, but their products still have security problems. And at this point they’re the largest vendor of cheap networking gear with a Chinese lineage. Put another way, they’re in the spotlight due to their own success.

There is one other element that’s important to note here. There is still a significant TP-Link engineering force in China, even though TP-Link Systems is a US company. TP-Link may be subject to the reporting requirements of the Network Product Security legislation. Put simply, this law requires that when companies discover vulnerabilities, they must disclose the details to a particular Chinese government agency. It seems likely that this is the primary concern in the minds of US regulators, that threat actors cooperating with the Chinese government are getting advance notice of these flaws. The ban is still at the proposal stage, and no action has been taken on it yet.

Sandbox Escape

In March there was an interesting one-click exploit that was launched via phishing links in emails. Researchers at Kaspersky managed to grab a copy of the malware chain, and discovered the Chrome vulnerability used. And it turns out it involves a rather novel problem. Windows has a pair of APIs to get handles for the current thread and process, and they have a performance hack built-in: Instead of returning a full handle, they can return -1 for the current process and -2 for the current thread.

Now, when sandboxed code tries to use this pseudo handle, Chrome does check for the -1 value, but no other special values, meaning that the “sandboxed” code can make a call to the local thread handle, which does allow for running code gadgets and running code outside the sandbox. Google has issued a patch for this particular problem, and not long after Firefox was patched for the same issue.
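The shape of that bug is easy to model. What follows is a toy sketch in Python, not Chrome’s actual validation code and not Google’s patch: the function names are made up, and rejecting every negative value is my own assumption about what a stricter check could look like. The only values taken from the write-up are -1 and -2.

```python
# Pseudo-handle values named in the write-up above.
CURRENT_PROCESS = -1
CURRENT_THREAD = -2


def to_signed(handle_value, bits=64):
    """Interpret a raw handle value as a signed integer of the given width."""
    handle_value &= (1 << bits) - 1
    if handle_value >= 1 << (bits - 1):
        handle_value -= 1 << bits
    return handle_value


def incomplete_check(handle_value):
    """Models the flaw as described: only the process pseudo-handle is caught."""
    return to_signed(handle_value) != CURRENT_PROCESS


def stricter_check(handle_value):
    """Assumed fix: refuse any negative value, so no pseudo-handle gets through."""
    return to_signed(handle_value) >= 0
```

Feeding `CURRENT_THREAD` to `incomplete_check` returns `True`, which is the whole problem in one line: a deny-list with a single entry passes everything it didn’t think of.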

NPM and Remote Dynamic Dependencies

It seems like hardly a week goes by that we aren’t talking about another NPM problem. This time it’s a new way to sneak malware onto the repository, in the form of Remote Dynamic Dependencies (RDD). In a way, that term applies to all NPM dependencies, but in this case it refers to dependencies hosted somewhere else on the web. And that’s the hook. NPM can review the package, and it doesn’t do anything malicious. And when real users start downloading it, those remote packages are dynamically swapped out with their malicious versions by server-side logic.

Installing one of these packages ends with a script scooping up all the data it can, and exfiltrating it to the attacker’s command and control system. While there isn’t an official response from NPM yet, it seems inevitable that NPM packages will be disallowed from using these arbitrary HTTP/HTTPS dependencies. There are some indicators of compromise available from Koi.
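In the meantime, spotting URL-style dependencies in your own manifests is not hard. Here’s a rough sketch that flags any dependency whose version specifier is a plain HTTP or HTTPS URL. The function name is hypothetical, this is not Koi’s tooling, and it only reads the top-level `package.json` text you hand it, so transitive dependencies would need the same treatment.

```python
import json

# Manifest sections that can carry dependency specifiers.
DEPENDENCY_FIELDS = (
    "dependencies",
    "devDependencies",
    "optionalDependencies",
    "peerDependencies",
)


def find_url_dependencies(package_json_text):
    """Return (field, name, spec) for every dependency pinned to an HTTP(S) URL."""
    manifest = json.loads(package_json_text)
    hits = []
    for field in DEPENDENCY_FIELDS:
        # A missing or null section is treated as empty.
        for name, spec in (manifest.get(field) or {}).items():
            if isinstance(spec, str) and spec.lower().startswith(("http://", "https://")):
                hits.append((field, name, spec))
    return hits
```

A hit isn’t proof of malice, since URL dependencies have legitimate uses, but it does mark a spot where the registry’s copy of the package doesn’t tell you what actually gets installed.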

Bits and Bytes

Python deserialization with Pickle has always been a bit scary. Several times we’ve covered vulnerabilities that have their root in this particular brand of unsafe deserialization. There’s a new approach that just may achieve safer pickle handling, but it’s a public challenge at this point. It can be thought of as real-time auditing for anything unsafe during deserialization. It’s not ready for prime time, but it’s great to see the out-of-the-box thinking here.
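For contrast, the long-standing mitigation is an allow-list on which globals an unpickler may resolve, by overriding `find_class` on `pickle.Unpickler`. The sketch below shows that older technique, not the new approach from the challenge; `ALLOWED_GLOBALS`, `AllowListUnpickler`, and `safe_loads` are illustrative names, and the allow-list contents are arbitrary.

```python
import io
import pickle

# Only these (module, name) pairs may be resolved during unpickling.
ALLOWED_GLOBALS = {
    ("builtins", "set"),
    ("builtins", "frozenset"),
    ("collections", "OrderedDict"),
}


class AllowListUnpickler(pickle.Unpickler):
    """Unpickler that refuses to import anything outside the allow-list."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError("blocked global: %s.%s" % (module, name))


def safe_loads(data):
    """Deserialize `data`, rejecting any global not explicitly allowed."""
    return AllowListUnpickler(io.BytesIO(data)).load()
```

Plain containers such as dicts and lists never touch `find_class`, so they load normally, while a payload that references something like `builtins.eval` raises `pickle.UnpicklingError` before anything is called. An allow-list is only as safe as the things on it, which is part of why new ideas in this space are welcome.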

This may be the first time I’ve seen a remote exploit via a 404 page. But in this case, the 404 includes the page requested, and the back-end code that injects that string into the 404 page is vulnerable to XML injection. While it doesn’t directly allow for code execution, this approach can result in data leaks and server side request forgeries.
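The general defense is to escape the requested path before it lands in any markup. The vulnerable product’s actual template isn’t described here, so treat this as a generic sketch: the `<error>` document layout and the `render_not_found` name are invented for the example.

```python
from xml.sax.saxutils import escape

# Extra entities so the value is also safe inside quoted attributes.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def render_not_found(requested_path):
    """Build a 404 body with the attacker-controlled path neutralized."""
    safe_path = escape(requested_path, QUOTE_ENTITIES)
    return "<error><code>404</code><path>" + safe_path + "</path></error>"
```

With the markup characters turned into entities, a request for a path full of tags comes back as inert text instead of becoming part of the document the back end goes on to parse.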

And finally, there was a sketchy leak that may be information on which mobile devices the Cellebrite toolkit can successfully compromise. The story is that [rogueFed] sneaked into a Teams meeting to listen in and grab screenshots. The real surprise here is that GrapheneOS is more resistant to the Cellebrite toolkit than even the stock firmware on phones like the Pixel 9. This leak should be taken with a sizable grain of salt, but may turn out to be legitimate.

22 thoughts on “This Week In Security: Vibecoding, Router Banning, And Remote Dynamic Dependencies”

  1. Just last week tried vibe coding and… it was a complete waste of time for an entire week. I already know how to scan for BLE devices, make a connection, and read characteristics. What I didn’t know was how to pair, how to enable encryption, and how to set up a callback for data delivery.

    Sounds simple, right?

    Round after round of asking ChatGPT to make a program to connect to and read (keyboard) keypresses from a shutter remote, and each round was one of “the code doesn’t compile”, “here’s an updated code”, “the code compiled but doesn’t work, why is that?”, “oh, it’s because of this, and here’s updated code”. Rinse and repeat, over and over.

    There are two bluetooth libraries for ESP32, the generated code uses function calls for objects in both libraries so it doesn’t compile, the reasons put forth as to why it doesn’t work all turned out to be incorrect, and for days I couldn’t get it to work and didn’t know why.

    Had to break into the topic and learn everything about BLE and characteristics to work it out.

    Just last night I asked ChatGPT about the wire gauge needed for the (replacement) power cord of an audio amp I’m rebuilding. The sticker says 12A, the original manufacturer used 18ga stranded for the original power cord, and the replacement cord doesn’t fit through the strain-relief.

    Hoo boy, was it wrong! The stranded wires measure about 2mm in diameter, what gauge is that? (ChatGpt: 12 gauge), I need to replace it with a thinner wire, will it handle the amperage? (ChatGPT: Yes, use 14 gauge), The original manufacturer used 18 gauge, will that be all right? (ChatGPT: No, absolutely not, 18 gauge is not rated for 12 amps), and on and on. (Note: I was clear and specific about using stranded wire and being in the US from the outset.)

    This went on for a couple of hours until I simply went online and looked things up. Anyone familiar with wires will note the numerous factual errors by ChatGPT in the above summary, I cut the end off of an IEC C13 power cord, it’s 18 ga, and rated for 15 amps in North America. Explains the manufacturer’s choice, and is sufficient for the purpose.

    ChatGPT will absolutely mislead people on issues which could potentially be life threatening.

    I’m right now rethinking my ChatGPT account, and whether the bot is useful for anything. Even writing English text.

    1. A standard C13 is rated 10A, the NEMA 5-15P on the other end of the cordset is of course rated for 15A. In your case the current limit of the C13 no longer applies once you cut it off. The current-carrying capacity of the wire however is heavily dependent on the temperature rating of the insulation, and the number of current-carrying members in the cable (or together in a conduit).

      According to Wikipedia, 18AWG is rated for 10/14/16A with 60/75/90C insulation, respectively. The note and citations indicate these ratings apply to your application (NEC 2014: “not more than three current-carrying conductors in raceway, cable, or earth (directly buried) based on ambient temperature of 30C”). So I guess it comes down to the insulation temperature rating of your replacement cable.

      On a more casual note…there’s going to be enough safety margin and “err on the side of caution” in the standards that an appliance rated at 12A but using 18AWG 60C wire is not going to matter except to the safety inspector or perhaps in the most extreme of ambient conditions. Certainly not nearly as scary as a dodgy lamp extension cord made with 20AWG copper-clad steel.

        1. Huh. Interesting that there would be such a significant difference between different regulatory bodies. To be honest I’ve only looked at the manufacturer rated specs and what’s actually marked on the connectors. I’m not sure I would trust a C14 to pass 15A without getting toasty.

          But then we’ve also got things like the TT-30, which isn’t well-suited to handling 30A in the application it gets used in. And push-wire termination of light switches and receptacles. Makes me wonder if the code requirement for AFCI is a direct result of this.

          Not meaning to get too far off topic. I agree with your sentiments on ChatGPT and the use of LLMs in general, I try to avoid all of it as much as possible.

  2. I use 16 port TP-Link switches here. Don’t use their routers. The switches have proven to be dependable for what I use them for. Three in use. One spare in supply cabinet, just in case. Surprising, the number of ports that get used! I try not to use Wifi unless it makes sense or is the only option.

  3. From what I’ve seen LLMs don’t tend to do things like rate limiting (or anything that’s stateful in a nontrivial way, or requires even simple invariants be upheld across the entire code base, e.g. like systems where you have source, transformation, and sink functions where each object’s lifespan begins at a source, goes through zero or more transformations and ends at a sink — like packet ownership in the FreeBSD kernel for example).

    This may be because the code the LLMs train on is overwhelmingly made up of run-to-completion operations implemented as independent functions, or it may be that they just don’t have enough capacity for context and/or high-level abstraction to even try to write code adhering to semantic invariants or containing meaningful state machines. Either way, it seems to me like sheer madness to deploy LLM-generated code without at least a couple experienced developers reading and fully understanding the generated code.

    Many corporations, of course, are looking for a fast-and-cheap shortcut specifically because they want to do away with the cost and time of understanding the code they’re building their business on so I expect we’ll see plenty of security, performance, and functional failures due to this type of corner cutting =:-(

  4. I’m not sure how the author and comment can blame AI coding in one paragraph, when in the very next section about TP Link there’s a direct mention of sloppy code.
    AI coding is a tool. If you have code issues, they’re your problem, not the AI problem. Learn the tradeoffs of using this tool like others and plan to mitigate accordingly.

    1. They are my problem… Don’t do vibe programming. Simple. And we’ll get along just fine without it. But I understand that we are starting to enter the era of Idiocracy, so understand the appeal of AI doing the ‘work’.

    2. I’d hardly call it a tool, at least in the useful sense.
      At the very most generous it might create a vaguely functional skeleton suitable for the lowest of low security and priority messing about type tasks, or for use as inspiration for you to rewrite it from scratch to actually fix it.

    1. It’s impossible. If they’re consumer grade, they would be cheap, with simple interfaces and fewer features, to keep support and firmware development costs down. Router vendors make and sell hardware, and as everybody knows that there’s OpenWRT, the vendors only need to build good hardware, assure that it’s OpenWRT flashable, and just wait for nerds to buy it.
      After all, if someone can’t or doesn’t want to flash the router with OpenWRT, what kind of “nerd” is that person?

    2. MikroTik have been very good in my experience (have a hEX S).
      They run the nearly same software (called RouterOS) on all of their devices, from the small 5-port gigabit routers all the way up to their enterprise 400GbE core routers. Meaning even the smallest ones have the advanced features you’d expect, from VLANs, BGP, and other alphabet soup I can’t remember right now.

  5. “…continues to prompt the LLM to look for more…”

    My experience as well. The longer I work with an LLM on an item the more it starts to fail.

    I would say the most frustrating part, you know you can’t finish anything… just one snippet start after another.

  6. If it’s about national security due to poor security of devices they should ban Netgear and Cisco too.
    I have a feeling they won’t though… I have a feeling this is only an issue when the company is a foreign one competing on the open market.

  7. And compelling companies by law to cooperate when some shady 3 letter acronym government agency knocks on their door and building in backdoors, and then forcing them to hide / lie about that fact is OK?

    Maybe TP-Link is targeted because it is so popular with OpenWRT.
