Punycodes Explained

When you’re restricted to ASCII, how can you represent more complex things like emojis or non-Latin characters? One answer is Punycode, which is a way to represent Unicode characters in ASCII. However, while you could technically encode the raw bits of Unicode into characters, like Base64, there’s a snag. The Domain Name System (DNS) generally requires that hostnames are case-insensitive, so whether you type in HACKADAY.com, HackADay.com, or just hackaday.com, it all goes to the same place.

[A. Costello] at the University of California, Berkley proposed the idea of Punycode in RFC 3492 in March 2003. It outlines a simple algorithm where all regular ASCII characters are pulled out and stuck on one side with a separator in between, in this case, a hyphen. Then the Unicode characters are encoded and stuck on the end of the string.

First, the numeric codepoint and position in the string are multiplied together. Then the number is encoded as a Base-36 (a-z and 0-9) variable-length integer. For example, a greeting and the Greek for thanks, “Hey, ευχαριστώ” becomes “Hey, -mxahn5algcq2″. Similarly, the beautiful city of München becomes mnchen-3ya. Continue reading “Punycodes Explained”

You Think You Can’t Be Phished?

Well, think again. At least if you are using Chrome or Firefox. Don’t believe us? Well, check out Apple new website then, at https://www.apple.com . Notice anything? If you are not using an affected browser you are just seeing a strange URL after opening the webpage, otherwise it’s pretty legit. This is a page to demonstrate a type of Unicode vulnerability in how the browser interprets and show the URL to the user. Notice the valid HTTPS. Of course the domain is not from Apple, it is actually the domain: “https://www.xn--80ak6aa92e.com/“. If you open the page, you can see the actual URL by right-clicking and select view-source.

So what’s going on? This type of phishing attack, known as IDN homograph attacks, relies on the fact that the browser, in this case Chrome or Firefox, interprets the “xn--” prefix in a URL as an ASCII compatible encoding prefix. It is called Punycode and it’s a way to represent Unicode using only the ASCII characters used in Internet host names. Imagine a sort of Base64 for domains. This allows for domains with international characters to be registered, for example, the domain “xn--s7y.co” is equivalent to “短.co”, as [Xudong Zheng] explains in his blog.

Different alphabets have different glyphs that work in this kinds of attacks. Take the Cyrillic alphabet, it contains 11 lowercase glyphs that are identical or nearly identical to Latin counterparts. These class of attacks, where an attacker replaces one letter for its counterpart is widely known and are usually mitigated by the browser:

Continue reading “You Think You Can’t Be Phished?”