Our Readers Aren’t Always English Speakers

The internet is an amazing tool. It can bring together people from all walks of life, from anywhere on the planet, regardless of culture, gender, sexual orientation or physical condition. We can read about world events moments after they occur, and we can read about someone’s personal life or opinions at our leisure. The internet doesn’t care what we look like, what political views we may have, or what our biases are. It’s open for everyone.

So why do site owners continue to ignore the fact that English, while still the dominant language used on the internet, is not the only language people use to communicate?

According to Internet World Stats, 30.1% of all internet users (as of November 2007) were English speakers. Chinese was second with 14.7%, Spanish had 9.0%, and Japanese a respectable 6.9%. If we add up all the users whose first language isn’t English, we’ll see that a whopping 882,503,350 people are using the internet to collect and share information. That’s quite a chunk of the human population. So why the heck are websites designed to promote reader feedback (e.g. blogs) not properly equipped to handle people writing in a non-Roman character set?

UTF-8 Is Hardly New

With close to 70% of the internet population coming from non-English nations, one would think that websites that invite readers to comment or otherwise participate in an online community would permit other character sets. How often have we seen websites unable to handle Chinese, Japanese and Korean characters? How often do we see pages incorrectly handle Cyrillic, Greek, Turkish, Hebrew, Arabic, Baltic and Vietnamese text? Why aren’t these supported by default when we install applications like Mambo, Joomla or WordPress?

Universal character encodings have been available since 1993, with the most popular being UTF-8 and UTF-16. However, it seems that most sites are designed to handle just one or two target languages and completely forget that the internet is essentially open to anyone with the resources to get online.

UTF-8 can encode every character in use by the world’s languages, which means readers never have to guess which code page to switch their browsers to. On top of this, it’s completely backward compatible with the standard ASCII character set, which means that existing sites could (in theory) migrate their databases to UTF-8 tables without any data loss. This isn’t to say that changing a database from one code page to another is easy, because nothing could be further from the truth, but the sooner the world moves towards a single universal character standard, the sooner we can start fostering closer ties between the languages.
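To make those two properties concrete, here’s a minimal Python 3 sketch of my own (not tied to any particular blogging platform): plain ASCII text produces exactly the same bytes under UTF-8, while a single UTF-8 string happily carries several scripts at once.

```python
# A minimal sketch, standard Python 3 only.

# 1. Backward compatibility: ASCII text produces identical bytes under UTF-8.
ascii_text = "Hello, world"
assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# 2. Universality: one encoding carries Latin, katakana, Greek and Arabic at once.
mixed = "Jason / ジェイソン / Δοκιμή / مرحبا"
encoded = mixed.encode("utf-8")
assert encoded.decode("utf-8") == mixed  # round-trips without any data loss
```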

Why Does This Matter If The Target Audience Is English?

I’ve mentioned a few times that I prefer to write my name in comments as “ジェイソン (Jason)”. This gives me the opportunity to stand out from the rest of the Jasons online, as well as convey the fact that I’m one of those typical geeks who thinks seeing his name in Japanese katakana is cool. 97% of all the blogs I visit and leave comments on are clearly designed for an English audience. There is no option for a machine translation of the site. There is a lovely error thrown when you type a non-English character into the name or comment fields of most blogs. And sites that don’t throw errors when presented with a non-English character will convert that text to either question marks or square boxes.

Nothing says “duh” like ????? (Jason).
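For the curious, here’s a hypothetical Python sketch of where those question marks usually come from: somewhere along the way the comment is squeezed through a single-byte code page (Latin-1 in this made-up example) that silently substitutes a ? for every character it can’t represent, which is roughly what a non-UTF-8 form handler or database column does.

```python
# Hypothetical illustration: a site whose storage is limited to a single-byte
# code page (Latin-1 here) silently replaces anything it cannot represent.
name = "ジェイソン (Jason)"

mangled = name.encode("latin-1", errors="replace").decode("latin-1")
print(mangled)  # ????? (Jason) -- the katakana is gone, and there's no way back
```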

Most sites exist because the content creators want to share information with the world, so wouldn’t it make sense to allow the world to leave comments in whatever language they choose? If we can’t understand what is being said, we still have the option of editing or deleting the message. So why not give people the choice?

In the last few weeks I have seen far more cases of sites silently running into character encoding errors and converting parts of my comments to question marks or squares. Although English is my primary language (French was my first), it’s hardly the only one I use. When visiting one of those ever-present “blog about blogging” or “make money online” sites, I’m always astounded by how often I’m greeted with errors. Wouldn’t these two niches have the most incentive to capture the 800,000,000+ readers whose first language isn’t English?

Domains Are Multilingual, Too

With the introduction of internationalized domain names (IDNs) in 2004, I was really hoping to see some changes take place to close the language gaps between readers. Japanese, Chinese, Hebrew, Arabic and Korean sites typically have no problems accepting an English comment, even with my Japanese katakana/romaji name mixture, so it shouldn’t be too much of a stretch for the rest of us to get on board with a UTF-8 website.

Heck, I even have two of these domains myself: ジェイソンアーウィン.jp and ジェイソン.jp!
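Under the hood, an IDN is an ordinary ASCII domain in disguise: before the DNS lookup, each Unicode label is converted to its Punycode (xn--) form. Here’s a quick sketch using Python’s built-in IDNA codec, with ジェイソン.jp as the example; the xn-- label is computed at runtime rather than quoted from memory.

```python
# Convert an internationalized domain name to the ASCII form used by the DNS,
# then back again for display. Uses Python's built-in (IDNA 2003) codec.
idn = "ジェイソン.jp"

ascii_form = idn.encode("idna").decode("ascii")
print(ascii_form)                                 # something like "xn--....jp"

print(ascii_form.encode("ascii").decode("idna"))  # ジェイソン.jp again
```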

We all know that it’s only a matter of time before English is a secondary or tertiary language online. Sure, there will always be a huge English presence, and there are millions of people all over the world who have English as a second or third language. So the sooner we make our sites more friendly to the non-English crowd, the better it will be for all of us.

This doesn’t mean that we should be putting machine translation links on all of our sites, or getting someone to translate our posts into other languages for a more natural read, though. Instead, we can simply prepare ourselves for the coming change in online communication.

What do you think of the lesser role of English on the internet? Should site owners start making their sites more friendly to everyone, or will language-specific sites forever dominate the web? Have you bought your IDN, yet?

