This Week In Security: Unicode Strikes, NPM Again, And First Steps To PS5 Crack

Maybe we really were better off with ASCII. Back in my day, we had space for 256 characters, didn’t even use 128 of them, and we took what we got. Unicode opened up computers to the languages of the world, but it also opened an invisible backdoor, using a technique similar to last week’s Trojan Source story. While Trojan Source used Unicode’s bidirectional (right-to-left) control characters to reorder benign-looking code, this hack from Certitude uses Unicode characters that appear to be whitespace, but are recognized as valid variable names.

const { timeout,ㅤ} = req.query;
Is actually:
const { timeout,\u3164} = req.query;

The extra comma might give you a clue that something is up, but unless you’re very familiar with a language, you might dismiss it as a syntax quirk and move on. The invisible character is now a variable holding an attacker-supplied query parameter, and using the same trick again slips that variable into a list of commands to run, making a hard-to-spot backdoor.
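To see why this parses at all, here is a minimal sketch of the pattern in JavaScript. It writes the invisible character as the escape \u3164 so that it shows up on the page. The function name buildCommands, the query object, and the example commands are made up for illustration and are not Certitude’s exact code. Nothing is executed here; the function only returns the list it would have run.

```javascript
// Hypothetical handler logic, loosely modeled on the article's example.
function buildCommands(query) {
  // "\u3164" is the escaped form of the invisible HANGUL FILLER character.
  // Written unescaped, this line looks like `const { timeout, } = query;`.
  const { timeout, \u3164 } = query;

  const commands = [
    "ping -c 1 example.com",
    "curl -s http://example.com/",
    \u3164, // unescaped, this entry looks like a trailing comma and a blank
  ];

  // Drop the hidden entry when the attacker did not supply it.
  return commands.filter((command) => command !== undefined);
}
```

Called with an ordinary query object, the function returns the two visible commands. Called with an extra property whose key is the invisible character, a third, attacker-chosen entry appears at the end of the list.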

The second trick is to use “confusable” characters like ǃ, U+01C3. It looks like a normal exclamation mark, so you wouldn’t bat an eye at if(environmentǃ=ENV_PROD){, but in this case, environmentǃ is a new variable. Anything in this development-only block of code is actually always enabled — imagine the chaos that could cause.
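A rough sketch of the difference, again with the confusable written as the escape \u01C3 so it is visible. The function names are hypothetical, and the explicit let declaration is an addition so the sketch also runs in strict mode; in sloppy-mode code the assignment would quietly create the variable by itself.

```javascript
const ENV_PROD = "PROD";

// What the reader thinks the code says: a real "!=" comparison.
function devBlockRunsIntended(environment) {
  let ran = false;
  if (environment != ENV_PROD) {
    ran = true;
  }
  return ran;
}

// What the code actually says: an assignment to a different variable,
// whose name ends in U+01C3 and merely looks like "environment!".
function devBlockRunsConfusable(environment) {
  let environment\u01C3; // hypothetical declaration, see lead-in
  let ran = false;
  if (environment\u01C3 = ENV_PROD) {
    // ENV_PROD is a non-empty string, so this branch is always taken.
    ran = true;
  }
  return ran;
}
```

With environment set to the production value, the intended version skips the block and the confusable version runs it anyway.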

Neither of these is a ground-breaking vulnerability, but they are definitely techniques to be wary of. The authors suggest that a project could mitigate these Unicode techniques by simply restricting its source code to ASCII characters only. It’s not a good solution, but it’s a solution.
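The ASCII-only rule is easy enough to check mechanically. What follows is a minimal sketch of such a check, not the authors’ tooling: the function name findNonAscii and the shape of its report are assumptions made for this example. It flags anything other than tab and printable ASCII, which means it will also complain about legitimate non-English strings and comments.

```javascript
// Report every character outside tab and printable ASCII in a source string.
function findNonAscii(source) {
  const findings = [];
  const lines = source.split(/\r?\n/);

  lines.forEach((text, index) => {
    let column = 0;
    // for...of walks the string by code point, not by UTF-16 unit.
    for (const ch of text) {
      column += 1;
      const cp = ch.codePointAt(0);
      const allowed = cp === 0x09 || (cp >= 0x20 && cp <= 0x7e);
      if (!allowed) {
        findings.push({
          line: index + 1, // 1-based line number
          column, // 1-based column, counted in code points
          codePoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
        });
      }
    }
  });

  return findings;
}
```

Run over the two snippets from this story, it reports one U+3164 and one U+01C3, and it stays silent on plain ASCII source. A real project would likely want an allow-list for strings and comments rather than a blanket ban.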

More REvil Arrests

Apparently making yourself an enemy of the whole Western world is a good way to get arrested, as REvil members are continuing to learn. Operation GoldDust has netted seven arrests this year, the most recent in Romania. This is the same law enforcement effort that has resulted in the No More Ransom project.

Breaking the PS5

We haven’t heard anything from Fail0verflow for a while, but they’re back with new work targeting the PS5. They’ve found the root encryption keys for the system. This isn’t quite as big a deal as it originally seemed, as the signing key would still be needed to run custom software on the device. What this should allow is decrypting the device firmware, and then looking for bugs in the bootloader and firmware, potentially leading to a PS5 jailbreak in the future. If you’ve been hoping for a homebrew scene for the PS5, your time may be coming.

NPM Again

Last week, the coa and rc packages were temporarily updated to versions containing malicious code. The timing and the nearly identical added code indicate that the same individual or group was behind both compromises. While the malware seemed to be non-functional on some systems, any system where these malicious versions were deployed should be assumed compromised. At a combined 20 million weekly downloads for these two packages, there are sure to be many compromises, even given the short time the malicious packages were available on the 4th. NPM was hosting the malicious version of coa for one hour and twelve minutes. The rc package pushed the malicious update a couple of hours later, and it’s unclear how long that version was available.

The malicious code was run using a preinstall script, which seems to be the common vector for these hacks. There have been suggestions that install scripts should be disabled by default. While that would prevent these very simple attacks, it wouldn’t actually protect against the underlying problem. Supply chain attacks are a growing problem, but they seem to be particularly problematic in the world of full-stack JavaScript. If the popularity of node.js and npm is to continue, we will need a better solution to this pernicious problem.
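For anyone who wants to at least know which dependencies run code at install time, a small check over a parsed package.json is a start. This is only a sketch: findInstallScripts is a hypothetical helper, the manifest in the usage note is invented, and it looks solely at the preinstall, install, and postinstall lifecycle scripts.

```javascript
// npm lifecycle scripts that run automatically during `npm install`.
const INSTALL_HOOKS = ["preinstall", "install", "postinstall"];

// Given a parsed package.json object, list any install-time scripts it declares.
function findInstallScripts(manifest) {
  const scripts = (manifest && manifest.scripts) || {};
  return INSTALL_HOOKS
    .filter((hook) => typeof scripts[hook] === "string")
    .map((hook) => ({ hook, command: scripts[hook] }));
}
```

Given a made-up manifest such as { name: "example", scripts: { preinstall: "node setup.js", test: "node test.js" } }, the helper returns a single entry for the preinstall hook and ignores the test script. Separately, npm’s --ignore-scripts flag skips these hooks for a given install, at the cost of breaking packages that genuinely need them.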

Palo Alto and Disclosure

Researchers at Randori have discovered a pair of vulnerabilities in Palo Alto firewalls, which chained together can result in full device compromise with no prior authorization required. The attacks are an HTTP-request-smuggling vulnerability that leads to a buffer overflow. The overflow is normally not exploitable, but the request-smuggling allows an attacker to reach the vulnerable code. The flaws were fixed in version 8.1.17, and versions 9.0+ were never vulnerable. An in-depth analysis is due in December, but there’s another interesting angle to this story. Randori’s researchers found the bugs in November 2020, and didn’t disclose them until September 2021 — nearly a year later.

What did they do during that time? Apparently they used this and other 0-day vulnerabilities to perform red-team penetration tests for their clients. The motivation seems to be that a real attack is likely to use 0-days, and to really test a company’s defense-in-depth, unknown attacks have to be part of the equation. What do you think? Good idea or unethical?

33 thoughts on “This Week In Security: Unicode Strikes, NPM Again, And First Steps To PS5 Crack”

  1. Umm … “Apparently making yourself an enemy of the whole Western world is a good way to get arrested”

    Romania is a member of NATO/EU (West), South Korea is US ally, Ukraine and Kuwait too. What am I missing here?

    If this was supposed to be a jab at Russia/China/N. Korea/Iran, it didn’t quite pan out. Cybercriminals are everywhere.

    1. You missed the point and then lashed out at something that wasn’t there?
      Hint: international criminals are targeted after attacking “western” infrastructure, the latest arrest was in Romania.
      Hint 2: that means Romanian authorities helped.

    2. Not a jab at those three. I’ve even commented in the past, how odd it is that “Russian” criminals keep getting arrested in Ukraine. REvil hit targets in the West, and so put a big target on their own heads.

  2. Python, npm etc. have *TERRIBLE* packaging- and dependency-management capabilities. For example, for Python, the documentation says “Comparison of project names is case insensitive and treats arbitrarily-long runs of underscores, hyphens, and/or periods as equal”, i.e. “Cool-Stuff”, “cool.stuff”, “COOL_STUFF” and “CoOl__-.-__sTuFF” are *ALL EQUAL*, they are all treated as the same package! That’s just absolutely effing mind-bogglingly stupid!

    Anyone with even a tiny bit of security-consciousness should understand that there should be absolutely *zero* ambiguity when it comes to package naming. I haven’t read what the npm issue is about, but my hunch says it’s related to naming/lookup of packages and someone managed to slip a package pretending to be another package in there.

    1. “Comparison of project names is case insensitive and treats arbitrarily-long runs of underscores, hyphens, and/or periods as equal”

      That’s actually a pretty helpful mitigation against what you’re thinking about – similarly named packages being uploaded that do nefarious things, which is known as typo-squatting. Doesn’t stop people from doing stuff like uploading a ‘request’ library (thinking of the prolifically used ‘requests’ package in python) but does reduce the space of potential typo squatting names a little.

      But no, this post is referring to when real packages get compromised and code is slipped in that runs _during install_. That is a different attack vector, one that targets developers’ machines rather than sneaking code into libraries, and it works because the package systems allow packages to run arbitrary code during install. It potentially has more of an impact, as compromised development machines are more likely to be on an organisation’s internal network, have lots of other code bases on them, have credentials stored in plain text, etc.

      Surprised that pypi and npm don’t allow developers to sign their own packages, in the same way that lots of Linux package managers do (dnf/yum/rpm, apt). This post from last year shows the immature state most package managers for programming languages are in, when it comes to package signing!:

      https://blog.tidelift.com/the-state-of-package-signing-across-package-managers

    1. For programming I can honestly agree that ASCII only for valid code is a decent idea. Making comments within the code however can be nice if one can write with all of Unicode. A similar story for entering strings.

      But partly it is Unicode that is the problem, having so many characters that look identical is somewhat of a bad idea…. Be it spoofing web addresses, or a slew of other issues.

      Though, as already stated, I personally wouldn’t mind if the compiler errored out when a non-ASCII character was used within the actual code itself, with exceptions for comments and strings. (The ends of these are fairly obvious in any decent programming environment.)

      1. Why should programmers be forced to use Latin for their variable names etc? “It’s too hard” isn’t a good reason.

        In any case, Unicode is a disaster and needs to be replaced. It doesn’t work well for East Asian languages and it has numerous problems like this. Even simple stuff like counting the number of characters in a string is difficult.

        Unicode should be replaced by something that firstly uses better encoding to avoid all the Chinese/Japanese/Korean and similar issues, and secondly comes with reference implementations of common functions and libraries of code that handle it while avoiding these kinds of issues.

        The second part is really important. It was expecting too much for programmers to handle Unicode by themselves, not least because it requires quite a lot of knowledge about how other languages work.

      1. That isn’t great for programmers who speak languages other than English though. ASCII doesn’t even cover many European languages. Despite being the American Standard Code for Information Interchange, it doesn’t even cover all the languages used in America.

        A better solution in cases like this is to detect known problematic Unicode characters, and to use a font that replaces invisible ones with a box so they aren’t invisible.

        1. And what about all the identical in look but not in Unicode duplication.

          Restricting to ASCII (or any other more restricted zero duplication set) makes sense – you can always comment in your native language freely – which you should be doing anyway as you don’t want a variable that is a paragraph long…

    2. agreed, I thought “by simply restricting their source code to containing only ASCII characters” was what everyone did – I certainly have it turned on in my editor.
      And I’d never pass an arbitrary unicode string to anything without cleaning it..

      Then again, I do everything in english, so who needs unicode anyway…

      1. If making a more multi lingual application/program then it can be useful to have unicode when entering in the contents of a string, or when making the occasional comment.

        Though, personally I tend to write my code in English as well, despite speaking Swedish normally. Though, å ä ö are part of the common 8-bit extensions of ASCII (such as Latin-1), but one might want to support more than just one’s own native language and it is here that unicode becomes useful.

        Now, an actual application/program shouldn’t hardcode the text regardless, but rather have an external language file with the requisite content. (Though, a lot of programmers quickly underestimate the challenge of properly supporting a variety of languages.)

  3. Unicode didn’t do this. Javascript did it by being so brain dead that it can pull in code from somewhere and execute as if it were in the source.

    JavaScript is the COBOL of the future. The next generation of programmers will wonder how today’s programmers managed to do anything given all of JavaScript’s flaws. By then it will be in so many things that you can’t just kill it anymore and we (programmers) will be supporting this generation’s extensive use of the world’s worst programming language for decades to come.

    1. So then rather than complain, offer a solution! What would be your answer to the ‘perfect’ language for web development? I certainly don’t know. I was really keen on Java Applets when that technology came out. I really liked the idea of developing real apps that you could embed in a browser… Seemed like the perfect answer to deploying certain types of applications. Then security issues popped up that killed off that avenue. Now we have JavaScript. Good or bad, it is what we have to work with at this time.

      I don’t care for unicode code either, just caused more headaches, but that is just me. Stick to just plain ascii set and we are all happy programmers :) .

    2. I think every language has some way to do a system() call. It’s always been dangerous, but there will always be the need to spin up child processes. This isn’t JavaScript being dumb.

    3. I’m no expert on COBOL but I understood it to be rather solid really, just nobody uses it anymore as all the C likes and python have taken over everything!

      What failings does it really have beyond being something nobody knows how to use anymore?

      1. Cobol defaults to fixed point math, so they don’t have funny issues associated with floating point representation.
        There are certain values that cannot be represented exactly in floating point binary representation. The last thing you want is floating point truncations when you are calculating other people’s money.

  4. IMHO, compilers should reject non-ASCII characters in code while permitting them in data.
    e.g. foo(“blah Unicode blah”); should be fine.
    while foofoo(“blah blah blah”) should be rejected by the compiler.

    This doesn’t provide complete safety, because it would not protect against things like
    system(“”); or exec(“”);, but it would at least provide some protection
    against the examples given.

    It might also be good for compilers to have a warning mode for unicode characters in string literals.

    Perhaps we could develop something like good old “lint(1)” that scans source files for unicode in
    literals and could even be somewhat smart about how dangerous they are, reporting as such.

    1. I would imagine it’s possible that allowing Unicode in comments or data might still allow for invisible hacks, assuming you can do things like change the print direction and have it overwrite the code on the same line. Obviously, this would only happen when displayed with certain renderers, but that might be enough.

    1. You know the story about the infinite number of monkeys? You just give an infinite number of monkeys an infinite number of keyboards, and they’ll eventually type out all the code you’ll ever need. The trick is knowing when they’ve done so, because all their code is unknown. But nothing keeps you from trying to compile and use it. You might think I’m making an analogy here.

    1. Do you know more about the identity / nationality / employers of these people than we do?

      We’ve reported that they were arrested in the Ukraine, Romania, and recently S. Korea and the Philippines. I don’t believe that it will end up being a small number of people by the time this all gets mopped up.

      Russian citizens have also been charged, but they are not being extradited, and Russia was not among the 17 countries that helped Interpol crack the ring. One could ask oneself why.

      1. Both observations are probably valid. As [charles] points out, it’s not accurate to blindly assume that every ransomware criminal is Russian, like certain politicians seem to do. But it also seems to be the case that Russia isn’t too interested in prosecuting computer criminals that attack businesses in other countries.

  5. start /B node compile.js & node compile.js

    Hmmm… what’s this do? Also, who keeps a filesystem object called “B” in their root directory? And why are we parallel-executing the start command and the NodeJS call?

    RC=0 stuartl@rikishi ~ $ man start
    No manual entry for start

    ???

  6. Unicode is deeply flawed in so many ways. It is the obvious developer-y solution to the hot mess that is code pages: Throw it all in one giant heap that’s supposed to support everything, and call it good. Turns out, no, that doesn’t work, and oh dear, endless fun with the various kinds of overlap in the encodings. Just ASCII isn’t enough (even though it was designed to do accents on paper and so latin-1 should not be needed; the computer boys broke that with the transition to glass), but Unicode is the other extreme of massive software bloat with free vulnerabilities thrown in, and still not everyone is happy. But since “everyone” agrees that it beats code pages, we’ll just have to push and pull and work harder, eh?

    I think Unicode, (either this one or a theoretical “clean” version), does not beat a (theoretical) clean unambiguous way to switch code pages and a limited bunch of well-curated purpose-tied code pages, but creating both requires rather more domain-specific knowledge than even most code page designers can muster. So we’ll continue the current hot mess and massive bloat that is supporting Unicode in software. But hey, at least we can put smiley piccies wherever we can put Unicode text!
