Internationalized Domains – A reality?

An Internationalized Domain Name (IDN) is a domain name that contains at least one non-ASCII character. As a growing share of Internet users worldwide speak languages that do not use the Latin alphabet, IDNs give these users a way to navigate the Internet in their own language.

Internationalizing Domain Names in Applications (IDNA) is a mechanism for handling IDNs. While DNS can technically support non-ASCII characters, applications such as email clients and web browsers restrict domain names to what can be used as a hostname (ASCII letters, digits, and hyphens). Rather than redesigning the existing DNS infrastructure, it was decided that non-ASCII domain names should be converted to a suitable ASCII-based form by web browsers and other user applications; IDNA specifies how this conversion is to be done. The conversion is performed by two algorithms designed for the purpose: ToASCII and ToUnicode. For example, пустыня (the Russian word for “desert”) is equivalent to xn--m1adged4c3a in ASCII. These algorithms are not completely failsafe, which adds to the long list of difficulties associated with implementing IDNs.
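As a rough illustration of the two conversions, here is a minimal sketch using the "idna" codec that ships with Python. That codec follows the older IDNA 2003 rules, so other implementations may treat some characters differently. The helper names to_ascii and to_unicode are chosen for this example only and are not part of any library.

```python
def to_ascii(name: str) -> str:
    """Convert a (possibly non-ASCII) domain name to its ASCII form."""
    # Python's built-in "idna" codec applies ToASCII to each dot-separated label.
    return name.encode("idna").decode("ascii")


def to_unicode(name: str) -> str:
    """Convert an ASCII ("xn--...") domain name back to its Unicode form."""
    # Decoding with the same codec applies ToUnicode to each label.
    return name.encode("ascii").decode("idna")


if __name__ == "__main__":
    for label in ["пустыня", "example"]:
        encoded = to_ascii(label)
        print(label, "->", encoded, "->", to_unicode(encoded))
```

A label that is already plain ASCII, such as "example", passes through unchanged; only labels containing non-ASCII characters receive the "xn--" prefix.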

The main complication lies in displaying the special characters of an IDN. To illustrate the point, Kim Davies, writing on CircleID, shows eleven test top-level domains. The version on the left (text) is how your browser renders the label, and the version on the right (image) is how it should look.

إختبار

آزمایشی

测试

測試

испытание

परीक्षा

δοκιμή

테스트

טעסט

テスト

பரிட்சை

If some of the versions don’t match for you, you are in the majority of Internet users. The fact is that most people cannot see these labels properly and consistently. This is mainly because most Internet users do not have all the fonts required to display the non-Latin characters. When the correct font cannot be found, the browser will usually display something like the following:

Also, there are many ways of representing a word in scripts such as Devanagari and Arabic. The handling may vary across different IDNA implementations and cause inconsistent display. For instance, both of the following representations of a Devanagari word are valid; the problem is that different implementations may not treat them consistently:
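To make the "several valid representations" problem concrete, here is a small sketch using Python's unicodedata module. The Devanagari letter used is only an illustrative choice and is not necessarily the word the article showed. Nameprep, the preparation step in IDNA 2003, uses Unicode normalization form NFKC, which is what the hypothetical helper same_after_normalization applies.

```python
import unicodedata

# Two code point sequences for the same Devanagari letter "qa".
precomposed = "\u0958"        # DEVANAGARI LETTER QA (one code point)
decomposed = "\u0915\u093C"   # DEVANAGARI LETTER KA + DEVANAGARI SIGN NUKTA


def same_after_normalization(a: str, b: str) -> bool:
    """Compare two strings after applying Unicode NFKC normalization."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


if __name__ == "__main__":
    # The raw strings differ even though they are meant to look identical.
    print("raw strings equal:", precomposed == decomposed)
    print("equal after NFKC:", same_after_normalization(precomposed, decomposed))
```

An implementation that compares labels without normalizing them first would treat the two spellings as different names, which is one way the inconsistency can arise.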

Also, the right-to-left text direction of scripts like Arabic poses further difficulties when converting names into a uniform ASCII or Unicode form. However, most of the display issues are solved once font support for the relevant script is installed, and it usually ships with the operating system.
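Text direction is a property that Unicode records for every character, so software can at least detect when a label needs right-to-left handling. The sketch below, again using Python's unicodedata module, is only an illustration of that lookup; the helper name is_rtl is hypothetical and this is not the full set of bidirectional rules that IDNA defines.

```python
import unicodedata

# Unicode bidirectional classes for strong right-to-left characters:
# "R" (e.g. Hebrew letters) and "AL" (Arabic letters).
RTL_CLASSES = {"R", "AL"}


def is_rtl(label: str) -> bool:
    """Return True if the label contains any strong right-to-left character."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in label)


if __name__ == "__main__":
    for label in ["إختبار", "טעסט", "test"]:
        print(label, "right-to-left:", is_rtl(label))
```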

The most serious threat that IDNs face is the homograph attack. In an IDN homograph attack, a malicious party tries to deceive computer users about which remote system they are communicating with by exploiting the fact that many different characters have nearly (or wholly) indistinguishable glyphs. On February 7, 2005, Slashdot reported that this exploit had been disclosed at the hacker conference Shmoocon, with an example available at http://www.shmoo.com/idn/. On browsers supporting IDNA, the URL “http://www.pаypal.com/” (where the first a is replaced by a Cyrillic а) appeared to lead to paypal.com but instead led to a spoofed PayPal web site that said “Meeow.” Read more on the homograph attack at Wikipedia.
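One simple defence is to flag labels that mix characters from more than one script. The following is a minimal sketch of that idea, not a description of how any particular browser works. It approximates a character's script by the first word of its Unicode character name, which is an assumption made for brevity, and the helper names scripts_in and looks_mixed_script are hypothetical.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Return the approximate set of scripts used by the letters in a label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # First word of the Unicode name, e.g. "LATIN" or "CYRILLIC".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed_script(label: str) -> bool:
    """Flag labels whose letters come from more than one script."""
    return len(scripts_in(label)) > 1


if __name__ == "__main__":
    spoof = "p\u0430ypal"  # second letter is CYRILLIC SMALL LETTER A
    for label in [spoof, "paypal"]:
        print(repr(label), sorted(scripts_in(label)), looks_mixed_script(label))
```

A check like this does not catch a name written entirely in one look-alike script, so it can only be one part of a defence.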

Over the years, the Internet has been adopted by people from all walks of life all over the world. Multilingual support therefore helps build a stronger World Wide Web, and its content will only become richer and more compelling. IDNs are a means to that end and provide a platform for better navigation of native-language content. As research and improvements continue in this area, the dream of global IDNs is not far from reality.

source: http://blog.dntemple.com

