Evan Brewer: Internationalized Domain Names (IDN) Practical Information

Back in December of 1996, a man by the name of Martin Dürst at the University of Zürich proposed adding internationalized characters to domain names. Much debate followed, and now almost every Top Level Domain (TLD) available supports internationalized characters. There are a few exceptions if I recall, .ru for example.

Anyhow, the system to enable these extended characters in domain names is based on Unicode Transformation Format (UTF) and “punycode”.

Punycode is a simple and efficient ASCII-Compatible Encoding (ACE) designed for use with Internationalized Domain Names. It transforms a Unicode string into a string of characters allowed in hostname labels (ASCII letters, digits, and hyphens) and back again.

In a nutshell, nobody wanted to push raw Unicode characters into the Domain Name System (DNS), so what happens is this:

Let’s say you want to register a domain with international characters called “Ásgarðr.com”. First you would need to find yourself a registrar that supports IDN, such as Moniker or GoDaddy. Next, when you register the domain, it is very important to understand that you don’t really own “Ásgarðr.com”; you own “xn--sgarr-wqa3i.com”.

Wait a minute, I just registered “Ásgarðr.com”, how does that look ANYTHING like “xn--sgarr-wqa3i.com”?

Well, here is the interesting bit. In order for the DNS servers to store a domain name with extended characters, the domain is “translated” through punycode and stored in the nameserver in this xn-- format. Those native characters you see? The translation from punycode to native characters happens in the application. You heard me right: in order for IDNs to work correctly on your system, each application must perform the punycode conversion into Unicode (native characters) itself.
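The conversion an application performs can be sketched with Python's built-in idna codec. This is only a minimal illustration, not what any particular browser does: the helper names to_ascii and to_unicode are made up for this example, and the built-in codec follows the older IDNA 2003 rules (RFC 3490).

```python
def to_ascii(hostname: str) -> str:
    """Return the ASCII (punycode) form of a hostname, as stored in DNS."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Return the native-character form an application would display."""
    return hostname.encode("ascii").decode("idna")


if __name__ == "__main__":
    print(to_ascii("ásgarðr.com"))            # xn--sgarr-wqa3i.com
    print(to_unicode("xn--sgarr-wqa3i.com"))  # ásgarðr.com
```

Plain ASCII names such as example.com pass through both helpers unchanged, so an application can run every hostname through the same code path.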

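Also interesting is the “xn--…” format. You might point out that a name with hyphens in the third and fourth positions is not something a registry will normally let you register, and you’re right. Two hyphens in a row are legal in DNS itself, but that pattern is reserved, and the “xn--” prefix is specific to IDNs.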
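To make the label format concrete, here is a small sketch that builds the encoded form by hand: the “xn--” prefix followed by the punycode of a single label. It uses Python's built-in punycode codec. The function names ace_label and native_label are hypothetical, and the sketch skips the case folding and validity checks a real IDNA implementation performs, so it expects lowercase input.

```python
ACE_PREFIX = "xn--"


def ace_label(label: str) -> str:
    """Encode one label (no dots); pure-ASCII labels pass through unchanged."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def native_label(label: str) -> str:
    """Decode one label; labels without the ACE prefix pass through unchanged."""
    if not label.lower().startswith(ACE_PREFIX):
        return label
    return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


if __name__ == "__main__":
    print(ace_label("ásgarðr"))             # xn--sgarr-wqa3i
    print(native_label("xn--sgarr-wqa3i"))  # ásgarðr
```

Notice that the ASCII letters of the label (“sgarr”) survive in front of the last hyphen, while the part after it encodes which extended characters go where.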

I run Apache, what does my “ServerName” need to look like?

Once again, remember that the punycode->native translation happens at the application layer (in the browser in this case), so this means your Apache ServerName line should have the punycode format:

xn--sgarr-wqa3i.com
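If you would rather generate those lines than convert the name by hand, something like the sketch below should do. It relies on Python's built-in idna codec; the function name vhost_name_lines is invented for this example, and the output still has to go inside whatever VirtualHost block you already have.

```python
def vhost_name_lines(hostname, aliases=()):
    """Build Apache ServerName/ServerAlias lines in punycode form."""

    def to_ascii(name):
        # The idna codec converts each dot-separated label as needed.
        return name.encode("idna").decode("ascii")

    lines = [f"ServerName {to_ascii(hostname)}"]
    if aliases:
        # ServerAlias accepts several names separated by spaces.
        lines.append("ServerAlias " + " ".join(to_ascii(a) for a in aliases))
    return lines


if __name__ == "__main__":
    for line in vhost_name_lines("ásgarðr.com", ["www.ásgarðr.com"]):
        print(line)
    # ServerName xn--sgarr-wqa3i.com
    # ServerAlias www.xn--sgarr-wqa3i.com
```

The same rule applies to ServerAlias as to ServerName: Apache only ever sees the punycode form in the Host header, so that is what it has to match.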

So should I go out and register a bunch of sweet international .COM domains?

Not so fast, I have some details on the TLDs and browser support. When I said nearly all TLDs support IDN, I didn’t say that all applications support IDN without exceptions. Apparently back in 2001 or thereabouts, there was a proof-of-concept browser attack against Microsoft.com, carried out by registering a domain using Cyrillic characters that looked nearly identical to the real Microsoft.com. The response by Internet Explorer and Firefox was to blacklist certain characters, and apparently Verisign requested that they disallow IDN for .COM domains.
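To illustrate why lookalike characters are a problem, here is a rough sketch that flags labels mixing more than one script. It uses the first word of each character's Unicode name as a stand-in for its script, which is a crude heuristic chosen for this example and not how any browser actually decides what to display. The function names are hypothetical.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Approximate the scripts used by the letters in a label."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


def looks_mixed(label: str) -> bool:
    """True if the label's letters come from more than one script."""
    return len(scripts_in(label)) > 1


if __name__ == "__main__":
    spoof = "micr\u043esoft"  # U+043E is CYRILLIC SMALL LETTER O
    print(looks_mixed("microsoft"))  # False
    print(looks_mixed(spoof))        # True
    print(looks_mixed("ásgarðr"))    # False, every letter is Latin
```

The spoofed name renders almost exactly like the real one, which is the whole trick, yet it encodes to a completely different xn-- label.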

Weak, so you’re telling me that if I race out and buy an IDN it’s not going to work right with .COM?

Yep, unless you roll with Safari. Safari is the only browser I’ve tried that displays the address-bar correctly without munging to punycode.

If you’re using Safari, give this URL a shot:

http://Ásgarðr.com

Then try it with IE or Firefox. See how it changes into punycode after it resolves, hits the page, and retrieves the HTML? Annoying, isn’t it? I’ve decided to boycott Firefox till they correct their behavior.

Moral of the story:

While IDN isn’t new, fully vetted standards for IDN implementation simply aren’t present. With application owners each deciding how to handle implementation, the methods will be as wild as the cola wars. Before you go buy a domain, try typing it into Firefox/IE. If the browser converts it to punycode in the address-bar, you know that domain isn’t going to look quite right.

Author: Evan Brewer http://el8.org


3 Comments

  1. Ismael Casimpan says:

    Just a quick correction: http://xn--sgarr-wqa3i.com/ works for me in Firefox 3.
    I don’t have a lower version of Firefox to test though.

    Nice article, written concisely. So simple to understand.

    Cheers :)

  2. Baldan says:

    Hello,
    How can I combine punycode with the Apache web server? Should I write it in the ServerAlias line in httpd.conf?

  3. Timothy (TRiG) says:

    That’s not really Firefox’s fault. It’s the fault of VeriSign (the .com registry) for having no anti-spoofing policy in place. Firefox works fine with .info domains.

    TRiG.
