Registry for .ORG domain names seeks submissions from Internet leaders

Reston, VA - December 19, 2008 - .ORG, The Public Interest Registry, the premier domain where people turn to find credible information, get involved, fund causes and support advocacy, announces it is seeking Internet leaders to fill four (4) open seats on the .ORG Advisory Council.

Composed of leaders from a broad spectrum of the noncommercial organizations around the world, the .ORG Advisory Council provides a valuable resource for PIR management and its Board of Directors. The Council was created to advise on issues ranging from public policy to the introduction of new services.

Four of the 15 Advisory Council seats are to be filled by April 2009, each for a three-year term. Public Interest Registry is seeking individuals with significant Internet leadership experience within the noncommercial, nongovernmental organization (NGO), and domain name arenas who represent the broad and geographically diverse spectrum of the global Internet community. The Advisory Council is divided into working groups that contribute in four areas: Internationalized Domain Names (IDN), Policy, Domain Name Security Extensions (DNSSEC), and Outreach & Awareness.

“Each term, we look forward to nominations from industry leaders who would take a stake in our direction and growth over the next three years,” says Alexa Raad, CEO of Public Interest Registry. “With their expertise these individuals and the Advisory Council are invaluable to our organization.”

Interested individuals are encouraged to submit nominations, including self-nominations. A nomination statement of approximately 400 words should include details of the nominee’s experience with the Internet, commitment to promoting the noncommercial use of the internet, understanding of the technical or policy issues facing the .ORG registry, and perspective regarding the needs of the .ORG community. Within the statement, the nominee must also detail which of the four working groups the individual desires to participate in and contribute to. A current biography and CV are also requested.

New council members will be announced by April 2009. Nominations must be submitted by Friday, March 6, 2009, 17:00 UTC. Please submit nominations to nominations@pir.org.

For additional information on .ORG and the Advisory Council, go to http://www.pir.org.

About .ORG, The Public Interest Registry

Trusted across all ages, backgrounds and nationalities, .ORG is where people turn to find credible information, get involved, fund causes and support advocacy. .ORG, The Public Interest Registry empowers the global noncommercial community to use the Internet more effectively and, concurrently, takes a leadership position among Internet stakeholders on policy and related issues. The .ORG domain is the Internet’s third largest “generic” or non-country specific top-level domain with more than 7 million domain names registered worldwide. .ORG, The Public Interest Registry, was founded by the Internet Society in 2002. It is based in Reston, Virginia, USA.

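Details

It is generally understood that a domain name cannot use any characters other than letters, digits, and hyphens. Since around 1998, however, users in non-English-speaking regions have led a serious effort to develop Internationalized Domain Names (IDN), which allow domain names to be written in each user's native language. In March 2003 IDN was finally standardized as RFCs by the IETF (Internet Engineering Task Force), and activity around it has accelerated rapidly since.

IDN: a worldwide standard, not just Japanese domains

Japanese domain names, one form of IDN, went into service in March 2001. The launch drew a great deal of attention at the time, so many readers will remember it. However, no browser officially supported IDN yet, so explosive adoption never came. Partly because of this history, many people in Japan seem to have a negative image of IDN, or mistakenly believe that IDN is a technology unique to Japan. So let us begin with a brief history of IDN.

Because the Internet originated in the United States, the tools that ran on it gave little consideration to languages other than English. There was a time when even email could carry only single-byte characters, so messages could be written only in English or romanized text. As the Internet spread around the world, however, many tools and protocols were internationalized, and the environment for using the Internet in languages other than English improved dramatically. In that context it was natural, in a sense, that demand arose to use one's own language in domain names too. Realizing this, however, faced a major problem: DNS, which translates between domain names and IP addresses, did not anticipate any characters other than ASCII. In other words, double-byte characters such as Japanese cannot be resolved by DNS as they are. Making IDN a reality therefore first required establishing comprehensive technology and rules to solve this.

The venue for this technical research was the IDN Working Group set up within the IETF. There, compatibility and interoperability were studied while building international consensus. In March 2003 the results were published as RFCs, fixing the international standard specification. Alongside this, in June 2003 ICANN issued "IDN Guideline Ver.1.0". IDN thus came to fruition as an official worldwide standard that ICANN has also approved.

Source: http://www.n-c-c.org/modules/weblinks6/singlelink.php?lid=4234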

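We hold about 6,000 Japanese domain names that are ideal for advertising strategy, and we rent and lease them out! Forwarding service available. If you cannot find the right domain at Onamae.com (お名前.com), Muumuu Domain, or Value Domain, look here! The definitive choice for domain marketing! We stock many domains that make excellent addresses for promotional sites. First come, first served!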

http://japandomain.jp/index2.html

A quick synopsis of public comments about new top level domains.

The deadline to comment on the latest revisions of ICANN’s plan for introducing new top level domains was Monday. As is usually the case, the big companies waited until the last minute to file their comments. The general consensus is:

1. We don’t like the trademark implications
2. There’s no need to add more TLDs (save for IDNs) as they won’t actually create competition for .com
3. Don’t rush it just to meet some artificial deadline; get it right
4. Start with a phased rollout of IDNs rather than ASCII gTLDs
5. We’re worried this will become one big cluster[expletive]

My prediction is that the new gTLD process will be significantly delayed. It’s possible we’ll see some lawsuits against ICANN to delay it further.

Here are some of the comments:

Microsoft: As a practical matter, Microsoft objects to the introduction of new ASCII gTLDs for several reasons. History suggests that the introduction of new ASCII gTLDs will not result in true competition…the introduction of potentially hundreds of new ASCII gTLDs is far more likely to threaten the security and stability of the Internet as a commercial platform than to ensure it.

Time Warner: ICANN needs to examine seriously for whose benefit the proposed new gTLD round is being launched. If it is truly for the benefit of the “next billion” Internet users around the globe, then the launch should focus on IDN TLDs to serve populations that have historically been excluded from full participation in the Internet in their native tongues. If after the launch of new IDN TLDs ICANN can demonstrate that there remains a strong need for additional gTLDs, only then should it consider the launch of such extensions.

US Chamber of Commerce: The new gTLD program will introduce significant threats to businesses and consumers without clear evidence of counterbalancing benefits…ICANN has provided little persuasive evidence that establishment of additional gTLDs will provide competition against .com addresses.

Internet Commerce Association: The new gTLD process must not be used to resurrect much less validate the concept of differential pricing by registries; any exceptions to this policy must only be for a carefully circumscribed group of “closed” registries subject to strict numerical registration limits.

Bank of America: We strongly believe that ICANN is proceeding too hastily to enable the unlimited expansion of new generic Top Level Domain names…We do not believe there is significant demand from businesses or consumers for additional gTLDs to host commercial sites. The dot com gTLD is the preeminent top level domain in the world. No other commercially-oriented domain comes close to dot com in popularity, whether measured by the number of registered domain names or by the amount of user traffic.

Read all comments on ICANN’s web site.


© DomainNameWire.com 2008.


Internationalized Resource Identifiers (IRIs) are a new take on the old URI (Uniform Resource Identifier), which through RFC 3986 restricted identifiers, domain names included, to a subset of ASCII characters: mainly lower- and upper-case letters, numbers, and some punctuation. IRIs were proposed many years ago by Martin Dürst and Michel Suignard, and formalized in RFC 3987. IRIs bring Unicode to the domain name world, allowing people to register domain names in their native language rather than being forced to use English.

It was apparent long ago that spoofing attacks would be a huge deal, and we’d need a system to deal with the problem. Anti-spoofing protections are sort of built in to the specifications, primarily through Nameprep, Stringprep and Punycode. Nameprep is actually a profile of Stringprep. In other words, Stringprep defines all the nitty-gritty details available, and Nameprep selects the subset of those details that should be used when handling IDNs. Whew, let’s pause for a deep breath.

Before getting into it, a quick look at a normal IRI, or traditional URL. The first part indicates the scheme, which could be http, https, ftp, or mailto, among others. The scheme does not support IDN right now. The next part is the subdomain label. It supports Unicode and IDN, as does the next label, the domain name. Each of these labels is handled separately, meaning you can have Unicode in one label but only ASCII in the other. In that case, only the label with Unicode will be processed with Nameprep and Punycode. The next label is the TLD, or top-level domain. TLDs don’t support IDN yet; however, most browsers will parse them as if they did. The last part is the path, which can be a combination of Unicode and ASCII, and should be treated as UTF-8 in almost all situations. The path does not have the same requirements as IDN; like the scheme, it is handled completely separately.
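To make the part-by-part handling concrete, here is a minimal Python sketch. The function name iri_to_uri is made up for this example, and it relies on Python's built-in idna codec (which follows the RFC 3490 rules) rather than on any particular browser's logic, so treat it as an illustration, not a reference implementation.

```python
from urllib.parse import urlsplit, urlunsplit, quote


def iri_to_uri(iri: str) -> str:
    """Convert an IRI to an ASCII-only URI, one part at a time (illustrative)."""
    parts = urlsplit(iri)
    host = parts.hostname or ""

    # Each label is processed on its own: all-ASCII labels pass through
    # untouched, and only labels containing Unicode go through Nameprep
    # and Punycode (via the idna codec).
    labels = []
    for label in host.split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append(label.encode("idna").decode("ascii"))
    netloc = ".".join(labels)
    if parts.port is not None:
        netloc += f":{parts.port}"

    # The path is not subject to the IDN rules: percent-encode its UTF-8 bytes.
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))
```

Calling iri_to_uri on an address whose middle label is Cyrillic and whose path contains "é" leaves the "www" and "com" labels alone, turns the middle label into an xn-- label, and percent-encodes the path.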

IDN domain name, URI scheme, and path


Punycode
Punycode provides the encoding mechanism for representing a domain name with non-ASCII characters. So once you have your cool Unicode domain name like www.ҀѺѺ.com, Punycode can make it DNS-ready by converting it to all ASCII characters, so it looks like www.xn--e3aaq.com. I’m not using very good examples, because they both look bad, but Punycode in particular looks hideous. Still, it helps users spot a spoofed domain name like www.microsоft.com, which in Punycode looks like http://www.xn--microsft-sbh.com/.
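The conversion can be tried out with Python's standard codecs. This is only a sketch: the helper names to_ace and to_unicode are invented here, and the built-in idna codec implements the 2003 IDNA rules, which may differ from what a current browser does.

```python
def to_ace(domain: str) -> str:
    """Return the ASCII ("xn--") form of a domain name."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Return the Unicode form of an ASCII-encoded domain name."""
    return domain.encode("ascii").decode("idna")


# Non-ASCII characters are written as escapes so the source stays unambiguous.
spoofed = "www.micros\u043eft.com"   # U+043E is CYRILLIC SMALL LETTER O
print(to_ace(spoofed))               # www.xn--microsft-sbh.com
```

The raw punycode codec does the label encoding alone, without the xn-- prefix or the Nameprep step, which is why the idna codec is the one to use for whole domain names.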

Nameprep
Wait a second, what about Nameprep? This specification requires that normalization form KC be applied to IDNs. Normalization form KC performs a compatibility decomposition, followed by canonical composition. If that sounds confusing, read the spec and see just how confusing it can be! The reason I say Nameprep sort of provides anti-spoofing protection against homograph attacks is that the normalization reduces some characters to their compatibility equivalents. For example, the full-width Latin character ‘Ｗ’ (U+FF37) looks a lot like the ASCII ‘W’ (U+0057). By normalizing the string with form KC, the full-width character is mapped down to its ASCII equivalent. This process reduces the chance of a spoof attack working for a large set of confusables.
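The full-width example is easy to reproduce with Python's unicodedata module. The sketch below also shows why the protection is only partial: normalization form KC folds compatibility characters, but it leaves a Cyrillic lookalike alone. The helper name nfkc is just a convenience for this example.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Apply Unicode normalization form KC."""
    return unicodedata.normalize("NFKC", text)


fullwidth_w = "\uff37"   # FULLWIDTH LATIN CAPITAL LETTER W
cyrillic_a = "\u0430"    # CYRILLIC SMALL LETTER A

print(nfkc(fullwidth_w) == "W")        # True: folded to the ASCII letter
print(nfkc(cyrillic_a) == cyrillic_a)  # True: unchanged, still a lookalike of 'a'
```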

The IETF defines IDN in RFCs 3490, 3491, 3492 and 3454, and bases IDN on Unicode 3.2. This means that changes to the Unicode spec, currently at 5.1, will take a long time to reach most software that deals with IDNs. Searching for differences between Unicode 3.2 and 5.1, or the most current spec, is sure to yield some interesting test cases.
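One way to hunt for such differences is Python's stringprep module, whose table A.1 lookup reports code points that were unassigned in Unicode 3.2. The sketch below is a starting point only; the function name and the choice of U+1E9E (a character added in Unicode 5.1) as the sample are mine.

```python
import stringprep
import unicodedata


def unassigned_in_unicode_3_2(ch: str) -> bool:
    """True if the code point had no assignment in Unicode 3.2 (RFC 3454 table A.1)."""
    return stringprep.in_table_a1(ch)


sample = "\u1e9e"  # assigned in Unicode 5.1, so unknown to Unicode 3.2
print(unicodedata.name(sample))           # name from the current database
print(unassigned_in_unicode_3_2(sample))  # True
print(unassigned_in_unicode_3_2("a"))     # False
```

Looping this check over a range of code points gives a quick list of characters that IDN software built on Unicode 3.2 has never heard of.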

TLD Whitelisting
After all this, it’s not enough to protect the innocent. Some registrars have designed policies to prohibit or specially handle lookalike characters for the TLDs they represent. This is one approach, but are we now relying on this distributed network of trust and scattered policy? It seems that’s part of the strategy with several browsers.

Firefox maintains a TLD whitelist. That means Firefox will display IDN domain names in their pure Unicode form for trusted TLDs, rather than convert them to Punycode in the display and URL bar. You can get at this configuration through about:config and going to network.IDN.

Safari also maintains a whitelist of TLDs, although I don’t know how to find this information. Opera makes its whitelist configurable through the opera:config#Network|IDNAWhiteList URL in your browser. I believe .com, .net, and .org were on this list; at least they were in mine until I clicked ‘default’, which reset the list.

Internet Explorer does not implement a TLD whitelist that I’m aware of, but it does support limited mixed-scripts within domain labels.
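The whitelist idea can be sketched in a few lines of Python. Everything here is hypothetical: the TRUSTED_TLDS set is invented for the example and does not reflect any browser's real list, and real browsers apply further checks beyond the TLD.

```python
# Hypothetical whitelist; real browsers ship and maintain their own lists.
TRUSTED_TLDS = {"org", "jp", "de"}


def display_form(hostname: str) -> str:
    """Choose what to show in the address bar for a hostname (illustrative)."""
    ace = hostname.encode("idna").decode("ascii")
    tld = ace.rsplit(".", 1)[-1].lower()
    if tld in TRUSTED_TLDS:
        # Trusted registry: show the readable Unicode form.
        return ace.encode("ascii").decode("idna")
    # Untrusted registry: fall back to the raw xn-- form.
    return ace
```

With this set, a name under .org is shown in Unicode, while the spoofed Microsoft name under .com is shown in its Punycode form.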

Browser testing by the W3C verifies this, and also documents the behaviors of each browser. There are many differences across browsers of course, and Opera in particular seems to have several inconsistencies within its own operation.

IDN Testing
IDN testing and research has been going on for a while, some good resources:

Although these resources have their own IDN testing pages, I made one of my own, mainly to test some characters I was interested in, along with some from the list of characters that Stringprep prohibits.

Test Cases:
http://www.lookout.net/test-cases/idn-and-iri-spoofing-tests/

source: http://www.lookout.net/

This is an IDN response and summary for the Cairo public forum that took place on 6 November 2008.

A PDF version of the full document is available at: http://www.icann.org/en/participate/cairo-public-forum-response.pdf


CONTENTS

COMMENTS

QUESTIONS


SUBJECT AREA: IDNs and IDN ccTLDs

Giving governments control over ccTLD space may stifle competition (Respondent 1, .ng, Respondent 2, unknown affiliation)

ICANN Staff response: We are aware of the concerns that people have regarding the IDN Fast Track – where a limited number of internationalized domain names (IDNs) are approved before a full policy is developed by the country-code names supporting organization (ccNSO).

However, with respect to the fears raised about governments having some form of control over this space, we believe this stems from a misunderstanding of what ICANN is doing with regard to internationalized domain names.

First off, it should be noted that IDN applications will be accepted as part of the new gTLDs process. That means that anyone following the gTLD Applicant Guidebook requirements will be able to apply for a top-level domain in their script or language.

There are additional criteria that need to be considered for IDNs (all of which are outlined in the Applicant Guidebook). However, applications for IDNs will be accepted and will be introduced at the same time as other gTLDs.

The ccTLD Fast Track on the other hand covers a very specific type of IDN – namely, those domain names that represent the name of a country or a territory.

During the course of the policy processes that the community has gone through over the past year or more, both governments (through the Governmental Advisory Committee, or GAC) and country-code managers (through the ccNSO) expressed their concerns about people applying for new top-level domains that represent the names of their countries or their existing top-level domains.

It has long been a rule that new generic top-level domains must be made up of at least three letters. One-letter TLDs are held back for technical reasons; and two-letter TLDs are reserved for use by the countries of the world i.e. .de for Germany; .jp for Japan; .us for the United States (and are based on an international standard).

The addition of TLDs in other languages and scripts complicates this system. Firstly, in some scripts whole words can be produced using a single character. Secondly, taking Japan’s ccTLD as an example, .jp is an ASCII representation for Japan, but Japan has its own script that does not use “j” or “p”. The countries of the world are justifiably proud of their own ccTLDs - many of which represent the Internet itself to their peoples – and so they have asserted that they have a right to have their language equivalent of their ccTLD.

In the same vein, many governments are concerned that individuals or companies will register top-level domains that represent the country. To use Japan again as an example, something like “.japan” or the equivalent of .japan in Japanese script.

It is for these reasons that the ccNSO is embarking on a policy development process to decide how to resolve such applications. Since this process will take some time, and because of the significant demand that has built up for TLDs in other scripts, the ccTLD Fast Track was created to allow for the creation of IDN TLDs that both the GAC and the ccNSO could agree would not be challenged (it should be noted, incidentally, that The Fast Track is based and builds upon the current IANA practices for the delegation of ccTLDs).

This means that those IDNs that come through the Fast Track will, by design, need the endorsement (or non-objection) of the relevant public authority, which in many cases will be a government department. At the same time, it must also meet the need of that particular community and the community must demonstrate that they are ready to implement the IDN ccTLD.

That is very different from saying that governments will have controls over IDNs or even IDN ccTLDs, however. Although it is true that IDNs that denote a specific country will be unlikely to make it through the new gTLD application process (as they are likely to be considered part of the ccTLD Fast Track), the whole world of top-level domains in different scripts is open to those that wish to apply.

So while a Japanese organization will not succeed with an application for .japan, or .jp in Japanese script, it will be able to apply for something that has meaning to Japanese Internet users in their own language. So, for example, cartoons are extremely popular in Japan. If an organization felt there would be sufficient interest in a whole area of the Internet dedicated to cartoons, it could apply for .cartoon in Japanese script.

So the ccTLD Fast Track is not stifling competition at the ccTLD level any more than current practices do. At the same time, the new gTLD process will hugely increase the opportunities for competition for Internet users across the world and in their own languages by allowing IDNs.


The three-letter rule for new gTLDs does not work in some scripts where one character can represent an entity (William Tan, individual)

ICANN Staff response: Thank you for this feedback and for highlighting the disparity that can be created by applying English-language rules and assumptions onto other scripts and languages.

The example given in the public forum of “.cat” being represented by a single character in Chinese but also being represented by many more than one character in the domain name system itself (all domains in non-ASCII scripts being represented at the technical level with the ASCII prefix “xn--”) was a helpful illustration.

Please be assured that ICANN will carefully review whether and how the three-character rule can be applied with regard to IDNs. As always with IDNs, however, the fact that there are many thousands of different scripts, each with its own attributes, means the issue is likely to be complex.

If it is indeed possible to waive the three-character rule for IDNs, or certain types of IDN, without detrimental impact elsewhere, ICANN will follow that path. As it currently stands, single-character domains will not be allowed for technical reasons, and two-character domains are held back because of the traditional use of the ISO list for defining country-code TLDs. We are waiting on further public comments to guide final recommendations.


QUESTIONS

SUBJECT AREA: Applicant Guidebook
Will IDNs and gTLDs be available at the same time? (Respondent 2, unknown affiliation, Annette Muehlberg, individual)

ICANN Staff response: Staff is working as fast as possible to get both processes implemented and currently it looks like they will go live at the same time.

However, should one of the processes be delayed then this will not slow down the launch of the other process, as was suggested in earlier comments. As of today there is no specific launch date for either process.

The situation is complicated by the work being done by the IETF on an IDNA protocol standard. We sincerely hope that the IDNA protocol will be finished in time for the rollout of gTLD applications (which will include IDNs) but we are preparing to go ahead without the protocol being finalized.

If you are confused about the introduction of IDNs through the so-called Fast Track and how that relates to the new gTLD process, please see an earlier answer above for more context.

Let’s face it, playing tricks that mess with people’s perception can be fun.  With Unicode, there’s lots of fun tricks to be had.  What’s to stop someone from believing the following is what it appears to be:

www.аmazon.com

Looks like amazon.com of course, but it’s not. The first ‘a’ is the Cyrillic small letter a, not the English, or rather Latin, small letter ‘a’. They look identical, but they come from two different scripts. Confused? Good. Now hover your mouse over the link above, but don’t click it, because I don’t know where it goes and it probably isn’t nice. In your browser’s status bar you should see the Punycode-encoded version of the domain name:

http://www.xn--mazon-3ve.com/

Because DNS does not support Unicode (only a subset of ASCII characters are allowed), we have IDN (Internationalized Domain Name) standards which define how domain names with Unicode characters should be encoded.  Punycode is the name of the encoding mechanism.
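Going the other way is a handy way to unmask a lookalike: decode the xn-- label and print the official Unicode name of each character. A minimal sketch, with a function name of my own choosing and Python's built-in idna codec standing in for a browser's decoder:

```python
import unicodedata


def describe_label(ace_label: str):
    """Decode an xn-- label and list each character's code point and name."""
    text = ace_label.encode("ascii").decode("idna")
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


for code_point, name in describe_label("xn--mazon-3ve"):
    print(code_point, name)
# The first line printed is: U+0430 CYRILLIC SMALL LETTER A
```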

The above is often referred to as an IDN homograph attack. Aside from spoofing with lookalike characters from completely different alphabets, we can do a bunch of spoofing just within our own alphabet. For example, certain fonts make combinations of characters hard to tell apart. The letters ‘r’ and ‘n’ together can look like the letter ‘m’ (rn == m), zeros can look like ‘O’, and the number 1 can look like a lower-case ‘l’. So you wind up with lots of clever visual attacks:

  • www.rnu11ets.com looks a lot like www.mullets.com

Set in several different fonts, this pair becomes impossible to tell apart in some of them. The visual appearance of characters has a lot to do with the fonts used to display the glyph, not just the alphabet.

The Confusables

These types of visual attacks are attributed to what’s known as ‘the confusables’ and have been documented in Unicode’s Technical Report 36 and TR39. The confusables is a name given to characters that essentially look like each other. The Unicode Consortium has defined three main classes of confusable strings:

  1. Single-script
  2. Mixed-script
  3. Whole-script

I want to investigate each one in turn. Because I’m simplifying things here, I may not be accurate in my use of the terms script, alphabet, letter, and so on. Linguistics people get it better than I do, but for the rest of us, the term ‘script’ refers to:

A collection of letters and other written signs used to represent textual information in one or more writing systems. For example, Russian is written with a subset of the Cyrillic script; Ukrainian is written with a different subset. The Japanese writing system uses several scripts.

Single-script confusables

These occur when letters from the same alphabet, or script, are used to give the same visual appearance.  This definition should be extended to say that these occur when letters from either the same script, inherited script, or common script, are used together.   For example, the following two combinations of Latin letters look identical:

  • so̷s
  • søs

If you take these apart, there’s a big difference. While the letter ‘s’ is the same in each, the ‘o̷’ and ‘ø’ are different. The first uses the Basic Latin ‘o’ with a combining diacritical mark named COMBINING SHORT SOLIDUS OVERLAY, which belongs to the inherited script. To put it a different way, we have two atomic Unicode code points here, which together give the effect of a single character or letter. The second uses the atomic character LATIN SMALL LETTER O WITH STROKE. Let’s take these apart and look at the Unicode code point values for each.

  • so̷s == \u0073\u006F\u0337\u0073
  • søs == \u0073\u00F8\u0073

As you can see, the first ‘o̷’ is formed from two Unicode code points, U+006F and U+0337. If you copy and paste that word into a text editor that supports Unicode (e.g. Notepad) and press backspace, you’ll see the first backspace removes the combining diacritical mark, and the second removes the ‘o’. Continuing with the example, the second ‘ø’ is made of a single Unicode code point, U+00F8, part of the Latin-1 Supplement Unicode block. At a lower level, because we’re using different code points and bytes to achieve the same visual effect, we have a case of the confusables.
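The same breakdown can be produced in Python, which also shows that Unicode normalization does not rescue us here: the two spellings stay different even after NFC or NFKC. The helper name code_points is my own.

```python
import unicodedata


def code_points(text: str) -> str:
    """Render a string as a run of \\uXXXX escapes."""
    return "".join(f"\\u{ord(ch):04X}" for ch in text)


combining = "so\u0337s"   # 'o' followed by COMBINING SHORT SOLIDUS OVERLAY
precomposed = "s\u00f8s"  # LATIN SMALL LETTER O WITH STROKE

print(code_points(combining))    # \u0073\u006F\u0337\u0073
print(code_points(precomposed))  # \u0073\u00F8\u0073

# Neither normalization form maps one spelling onto the other.
for form in ("NFC", "NFKC"):
    print(form, unicodedata.normalize(form, combining) == precomposed)  # False
```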

Let’s take a closer look at what qualifies as a single-script confusable for the Latin capital letter ‘A’, taken from the confusables table at http://unicode.org/reports/tr39/data/confusables.txt.

FF21 ; 0041 ; SA # ( Ａ → A ) FULLWIDTH LATIN CAPITAL LETTER A → LATIN CAPITAL LETTER A
1D400 ; 0041 ; SA # ( 𝐀 → A ) MATHEMATICAL BOLD CAPITAL A → LATIN CAPITAL LETTER A # {nfkc:119809}
1D434 ; 0041 ; SA # ( 𝐴 → A ) MATHEMATICAL ITALIC CAPITAL A → LATIN CAPITAL LETTER A # {nfkc:119861}

Update: I just realized that some of the characters broke Wordpress so I’ve converted them all to NCR. In the above you can see three characters that all look similar to the Latin capital letter ‘A’. The first number is the code point for the confusable, the second number, 0041, is the code point for ‘A’, and the rest is descriptive text.
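Lines in this layout are straightforward to parse. The sketch below assumes only the three-field "source ; target ; type # comment" shape of the entries quoted above; the Confusable type and the function name are mine, and other versions of the data file may use different type codes.

```python
from typing import NamedTuple, Optional


class Confusable(NamedTuple):
    source: str   # the lookalike character(s)
    target: str   # the character(s) it can be confused with
    kind: str     # the type code from the third field, e.g. "SA"


def parse_confusable_line(line: str) -> Optional[Confusable]:
    """Parse one data line; return None for blank or comment-only lines."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [field.strip() for field in data.split(";")]
    source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
    target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
    return Confusable(source, target, fields[2])


entry = parse_confusable_line("1D400 ; 0041 ; SA # MATHEMATICAL BOLD CAPITAL A")
print(entry.target, entry.kind)  # A SA
```

Collecting the parsed entries into a dictionary keyed on source gives a lookup table that can flag any character with a known lookalike.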

The reason the ‘Mathematical’ characters are considered single-script confusables is because they have the common script class assigned to them.

Other scripts exist which have their own characters confusable with the Latin ‘a’, but those are considered mixed-script, which I’ll go over in another post. For now I’ll leave you with a list of test cases for single-script confusables. Some are more obvious than others, and it all depends on the font. I’ve set Lucida Sans Unicode, which is supported on most Macs and Windows machines.

  • Microsoft → Micros𝗈ft
  • Apple → Ap𝗉le
  • Google → Google
  • IBM → IBM
  • Oracle → O𝗿𝗮cle
  • Intel → Int𝗲𝗹

Mixed-script confusables

These occur when letters from one alphabet or script, are used to give the same visual appearance as letters from a completely different script.  For example, the following words contain a mix of Latin and Cyrillic letters which are indistinguishable from their counterparts:

  • Spооfing with hоmogrаphs

If you look at the letters, you’ll see that the ‘oo’ in ‘Spoofing’ is made up of two Cyrillic small letters ‘o’, and the ‘a’ in ‘homographs’ is Cyrillic as well.  Let’s take some of the words apart and look at the Unicode code point values for each.

  • Spoofing == \u0053\u0070\u006F\u006F\u0066\u0069\u006E\u0067
  • Spооfing == \u0053\u0070\u043E\u043E\u0066\u0069\u006E\u0067

The first version of ‘Spoofing’ uses all ASCII Latin letters, but the second mixes in the Cyrillic letters ‘oo’. Now if the word ‘Spoofing’ was being filtered, you could probably bypass the filter using this case of mixed-script confusables.

In fact, the confusables can be used to bypass profanity filters, ad filters, or just about any system that wants to blacklist words but still accepts Unicode.
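A filter can defend itself by checking for mixed scripts before matching words. Python's standard library does not expose the Unicode Script property, so the sketch below uses a rough stand-in: the first word of each letter's official character name. That heuristic is my assumption, not something TR39 prescribes, and it will misjudge some characters (the mathematical alphabets, for instance).

```python
import unicodedata


def scripts_used(text: str) -> set:
    """Approximate the set of scripts in a string from character names."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(text: str) -> bool:
    return len(scripts_used(text)) > 1


print(is_mixed_script("Spoofing"))              # False: all Latin
print(is_mixed_script("Sp\u043e\u043efing"))    # True: Latin plus Cyrillic
```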

As a test case, most browsers and other software shouldn’t allow the end-user to be fooled by the following IDN homograph attacks. These domain names contain mixed-script confusables, and should be represented in their lovely Punycode encoding for the user to realize they may not be what they appear to be.

www.microsоft.com is http://www.xn--microsft-sbh.com/
www.Αpple.com is http://www.xn--pple-zld.com/
www.faϲebook.com is http://www.xn--faebook-6pf.com/

I’ll take them apart another time; I’m planning to look closer at IDN, IRIs and the rules around them.

Whole-script confusables

It’s starting to make sense now. Let’s look at the Unicode TR39 definition of a whole-script confusable:

X and Y are whole-script confusables if they are mixed-script confusables, and each of them is a single script string. Example: “scope” in Latin and “ѕсоре” in Cyrillic.

If we look at the code points, we’ll see the clear difference between the two scripts being used:

  • scope == \u0073\u0063\u006F\u0070\u0065
  • ѕсоре == \u0455\u0441\u043E\u0440\u0435

The first version of ‘scope’ uses all Latin letters, but the second uses all Cyrillic letters. We call it a whole-script confusable because each word is made entirely of a single script; we’re not mixing scripts within the same string.
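The definition translates into a small check: both strings must each stay within one script, the scripts must differ, and the strings must collapse to the same "skeleton". In the sketch below the five-entry mapping table is hand-written for this one example, and the script test reuses the first word of the character name as a rough proxy; a real implementation would use the full TR39 data.

```python
import unicodedata

# Hand-written, illustrative subset of Cyrillic-to-Latin lookalikes.
CYRILLIC_TO_LATIN = {
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
}


def skeleton(text: str) -> str:
    """Replace each known lookalike with its Latin counterpart."""
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in text)


def script_names(text: str) -> set:
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in text}


def whole_script_confusable(x: str, y: str) -> bool:
    sx, sy = script_names(x), script_names(y)
    return len(sx) == 1 and len(sy) == 1 and sx != sy and skeleton(x) == skeleton(y)


print(whole_script_confusable("scope", "\u0455\u0441\u043e\u0440\u0435"))  # True
```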

The confusables can be used to bypass profanity filters, ad filters, or just about any system that wants to blacklist words but still accepts Unicode.

As a test case, most browsers and other software shouldn’t allow the end-user to be fooled by the following IDN homograph attacks. These domain names contain whole-script confusables, and should be represented in their lovely Punycode encoding for the user to realize they may not be what they appear to be.

www.аЬс.com is http://www.xn--80a8a6a.com/
www.ігѕ.com is http://www.xn--c1a2eb.com/

source: http://www.lookout.net

EURid expects to introduce IDN in September 2009. There will only be a landrush, no sunrise. This means that there’s no special protection for the owners of a registered trademark or for the owners of a registered .eu name.

Internationalised Domain Names (IDNs) are domain names that contain characters such as letters with accents and characters from non-Latin scripts such as Greek. For example, with IDN the registration of the domain name www.café.eu becomes possible, whereas today you can only register www.cafe.eu.

The implementation of IDNs under .eu will support not only all the characters used in the 23 official EU languages but also the complete alphabets of those languages. This means that characters in these alphabets that are not used in any of the official languages will also be available. This way, EURid hopes to be prepared for the accession of new countries to the EU.

source: http://www.tld.sc

ICANN is pleased to release an updated draft Implementation Plan for the IDN ccTLD fast track process [PDF, 265K]. The updated document contains clarifying information about the notion of IDN tables.
An IDN table lists all the characters that a particular TLD registry supports. If one or more of these characters is considered a variant, this is indicated next to the character, along with the character it is a variant of. The IDN table usually holds characters representing a specific language, but it can also hold characters from a specific script.
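As a rough picture of what such a table might look like in software, here is a Python sketch. The three-character table, the choice of a traditional/simplified Chinese pair as the variant, and the function names are all invented for illustration; they are not taken from any registry's actual IDN table.

```python
from typing import Dict, Optional

# Hypothetical table: each supported character maps to the character it is a
# variant of, or to None when it is not a variant of anything.
EXAMPLE_IDN_TABLE: Dict[str, Optional[str]] = {
    "\u4e2d": None,       # supported, no variant relationship
    "\u56fd": None,       # simplified form
    "\u570b": "\u56fd",   # traditional form, listed as a variant of U+56FD
}


def is_registrable(label: str, table: Dict[str, Optional[str]]) -> bool:
    """A label qualifies only if every character appears in the table."""
    return bool(label) and all(ch in table for ch in label)


def canonical_form(label: str, table: Dict[str, Optional[str]]) -> str:
    """Fold each variant character to the character it is a variant of."""
    return "".join(table[ch] or ch for ch in label)
```

Two labels that share a canonical form differ only by variants, which is the situation a registry's variant policy has to decide how to handle.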

With the updated draft document, ICANN is seeking community input about the approach to development of the IDN tables. The comment period for the Draft Implementation Plan has been extended to 7 January 2009 to allow for additional community review of the updated draft document.

Comments on the Draft Implementation Plan are welcome via email to ft-implementation@icann.org . An archive of all comments received will be publicly posted at http://forum.icann.org/lists/ft-implementation/.

Related links:

Public comment period: http://www.icann.org/en/public-comment/public-comment-200812.html#plan-idn-cctlds

Revised Fast Track Draft Implementation Plan: http://www.icann.org/en/topics/idn/fast-track/idn-cctld-implementation-plan-26nov08-en.pdf

Working Group Final Report (with public comments): http://www.icann.org/en/topics/idn/fast-track/staff-considerations-idnc-wg-final-report-23oct08-en.pdf [PDF, 269K]

Fast Track webpage: http://www.icann.org/en/topics/idn/fast-track/

In early 2009, the Internet Engineering Task Force (IETF) plans to adopt the update for internationalised domain names (IDNs) that has been under discussion since the beginning of this year. This became evident in the talks at the developers’ meeting in Minneapolis this week. One of the new entries on the list of characters allowed for domains that don’t use the ASCII character set will be the German “eszett” or “scharfes S” character (“ß”), which has been excluded from the IDN standards (RFC 3490, RFC 3492) until now. It won’t make an immediate difference for German internet users, though: for the time being, an eszett in a domain name will continue to be replaced with “ss”. Marcos Sanz, who represents the German domain registry DENIC in the IETF’s IDN working groups, said in Minneapolis that DENIC welcomes the additional possibility. He said, however, that DENIC has not yet decided how and when to make use of this new registration option. Sanz said it is important to consider that many users rely on the current mapping rules and state their contact addresses with the ß character accordingly.
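
The current mapping can be observed with Python's built-in `idna` codec, which follows the existing standard (RFC 3490) rather than the planned update. This is a small sketch; the helper name `to_ascii` and the `.example` domain names are made up for illustration.

```python
# Python's standard "idna" codec follows RFC 3490 (IDNA 2003), including
# the Nameprep step that maps "ß" to "ss" before any Punycode encoding.
def to_ascii(domain):
    """Convert a Unicode domain name to its ASCII form under IDNA 2003."""
    return domain.encode("idna").decode("ascii")


eszett_name = to_ascii("stra\u00dfe.example")   # "straße.example"
umlaut_name = to_ascii("m\u00fcnchen.example")  # "münchen.example"

print(eszett_name)  # strasse.example
print(umlaut_name)  # xn--mnchen-3ya.example
```

The eszett disappears entirely from the encoded name, whereas the umlaut survives in Punycode form; this is why a name registered with ß cannot currently be distinguished from one written with ss.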

Up until the last moment, there were discussions within the IETF about the extent to which registries should be allowed to determine their own special language characteristics and how to use them. The authors of the voluminous new series of standards on internationalised domains, to be adopted as IDNA 2008, advocated stricter rules within the actual standard documents.

On the other hand, Vinton Cerf said that “in terms of the numerous special requests we strongly depend on the registries”. The IETF’s board asked Cerf, co-author of TCP/IP and winner of the 2004 Turing Award, to lead the hot-headed working group. Cerf said that the registries know best about language-specific problems. In Minneapolis, he strongly advised holding a final consultation with representatives of the Arabic countries, whose alphabet-related problems make Germany’s single “eszett” character seem rather unimportant. A separate standard document called BIDI, for example, allows domain names to be written from right to left. The current “consultation” relates to the various numbering systems within the Arabic language community.

Apart from the Western numerals 1, 2, 3, …, which came to Europe from India via the Arabic countries, Arabic languages also use the classical Eastern Arabic numerals, also called Indic numerals. Things get complicated because the numbers four, five and six are written differently in the Eastern and Western Arabic countries. The problem was further complicated by the Unicode Consortium, the organisation that catalogues the characters of the world’s writing systems and assigns each one a code. Instead of giving separate character codes only to the three digits that differ, Unicode gave separate codes to the two complete sets of digits. As a result, the Arabic digit one corresponds to two different Unicode character codes, depending on whether the Unicode character set for Western Arabic or Eastern Arabic is used.
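
The duplication can be checked with Python's standard `unicodedata` module. Unicode names the two sets the Arabic-Indic digits (U+0660 to U+0669) and the Extended Arabic-Indic digits (U+06F0 to U+06F9); the constant names in this sketch are chosen for illustration.

```python
import unicodedata

ARABIC_INDIC_ONE = "\u0661"           # digit one in the U+0660..U+0669 set
EXTENDED_ARABIC_INDIC_ONE = "\u06f1"  # digit one in the U+06F0..U+06F9 set

# Two distinct code points, both carrying the numeric value 1.
for ch in (ARABIC_INDIC_ONE, EXTENDED_ARABIC_INDIC_ONE):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.digit(ch))
```

Both lines of output end in the value 1, although the code points and the character names differ.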

If the two character sets are adopted and used in parallel, the overlap will at best cause confusion and at worst be exploited by phishers. Arabic language experts have now been asked to help decide whether to adopt only one of the character sets, whether to restrict each domain to using one specific character set, or whether to stipulate only that digits from the two sets can’t appear right next to each other. Some observers think that the deadline of two weeks is much too short, warning that the whole standardisation process is driven far too much by Western experts. Indeed, it sometimes seems rather peculiar that Americans and Europeans should argue about the finer aspects of Persian, Urdu or Dhivehi, the language spoken in the Maldives.
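
One of the options under discussion, restricting a domain label to a single digit set, could be checked roughly as follows. This is only a sketch of that one option and not necessarily the rule the working group will settle on; the function name `mixes_digit_sets` is hypothetical.

```python
# The two Unicode digit ranges in question.
ARABIC_INDIC = {chr(cp) for cp in range(0x0660, 0x066A)}           # U+0660..U+0669
EXTENDED_ARABIC_INDIC = {chr(cp) for cp in range(0x06F0, 0x06FA)}  # U+06F0..U+06F9


def mixes_digit_sets(label):
    """Return True if the label contains digits from both sets."""
    chars = set(label)
    return bool(chars & ARABIC_INDIC) and bool(chars & EXTENDED_ARABIC_INDIC)
```

A registry applying this option would reject any label for which `mixes_digit_sets` returns True, while still accepting labels that use either set on its own.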

(Monika Ermert)

(lghp)

source: heise-online.co.uk

  • IDNs - An Overview (English)

Sun 02 Nov 2008

What it is | At this session ICANN Staff will give an introduction to internationalised domain names (IDNs). The session should cover basic concepts as well as technical rules to be followed if an IDN is considered for future New gTLD applications.

Internationalized Domain Names - A Basic Introduction (pdf)

Audiocast:
Internationalized Domain Names – An Overview

Description: At this session ICANN staff will give a presentation on internationalized domain names. The session will cover basic concepts as well as the technical rules to be followed when internationalized domain names are considered in future applications for new generic top-level domains.

Session type: Lecture
Audience: Anyone interested in learning about internationalized domain names and new generic top-level domains.

Internationalized Domain Names - A Basic Introduction (Arabic) (ar)

  • IDN Program Status Report ( by Tina Dam, Director, IDN Program )

Thu 06 Nov 2008