Registrations Now Accepted For Asian Domain Names

Eric Sun was among the first to point out that as of Thursday evening, VeriSign has begun accepting Chinese, Japanese and Korean domain names. "This increases the possible characters from 37 (26 letters, 10 numerals, and hyphen) to 40,282. For more information, see this AP story." snrsamy points to the same story as featured on C|Net. jamie suggests reading the technical lowdown at VeriSign.
This discussion has been archived. No new comments can be posted.

  • To help us keep the internet English compatible.

    Come on, we invented it, we populated it, we control it, and now the Asian hordes are trying to subvert it.

    Let them make their own internet.

    Not to mention some of the domain names may belong to Al Gore :)

  • But what's the POINT?? It just totally screws up existing protocols that have been tried, tested and proven to work well. And what if we want to browse to an Asian site? Seriously, let's say you have a friend in China and he has an article up that you want to read that's in English. HTF are we supposed to get to it? Whereas these guys can view the entire Internet, they also have their own "private" section that only very few of the Western world can access. tom
  • Basically you need a pair of 8-bit bytes for one Chinese character (the first bit identifies it as Chinese and the other 7 carry the actual encoding). So each character takes two bytes.

    However, Chinese is commonly known to be more concise than English or other languages with small character sets. There are thousands of commonly used characters, each of which has the function of a word in English. Many characters have more than one meaning, and their combinations (2 characters in most cases) make new words. And don't forget the amazing flexibility of the grammar (e.g. fewer stop words like "the")! We're not even talking about ancient Chinese, which is much SHORTER.

    Give me any sentence with more than 10 English words (with no words like Yugoslavia, of course), and I guarantee I can rewrite it in Chinese in less space.

    You see, this is the basic rule of information: increase the complexity of the encoding scheme, and you get more density.

    How complex is this? Well, I have to say that 12 years of Chinese classes are a painful memory.
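    The two-byte scheme described above is how legacy Chinese encodings such as GB2312 (and Big5) actually work; a minimal sketch in Python, using only the standard library's codec support:

```python
# Legacy Chinese encodings such as GB2312 use two bytes per character,
# each byte with its high bit set, so a decoder can tell them apart
# from plain one-byte ASCII in the same stream.
text = "中文"  # "Chinese (written language)", two characters
encoded = text.encode("gb2312")

print(len(encoded))                    # 4 -- two bytes per character
print(all(b & 0x80 for b in encoded))  # True -- high bit set on every byte
print(len("abc".encode("gb2312")))     # 3 -- ASCII passes through, one byte each
```

    (The same string encoded as UTF-8 takes three bytes per character instead, so the density argument above depends on which encoding you pick.)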

  • Just wanted to add that anyone who wants to know more about the whole topic should start at this page. [linux.or.jp]

  • > How can I enter these funky characters?
    > I dunno, just a guess, but maybe someone's already thought of this? Perhaps...

    It's easy to enter kanji if you are using Internet Explorer - just visit Windows Update and download the Japanese Input Method Editor update, and you'll be able to type kanji in your browser (using romaji, I think). I don't know how you do it with Mozilla...

  • What would happen if someone said "Let's add 2 new data pins to RS232"?

    They already did [iu.edu]:

    pin 14 STD Secondary transmit data
    pin 16 SRD Secondary receive data
    (also pin 19 SRTS Secondary RTS, pin 13 SCTS Secondary CTS, etc.)

    These pins can be used to double the amount of data sent through your RS-232 cable, which would be useful if you decided to (say) switch from 8-bit characters to 16-bit characters.

    It's not an RS-232 cable unless it has all 20 wires!!! (-: (-:
  • by Rob Parkhill ( 1444 ) on Friday November 10, 2000 @07:27AM (#632551) Homepage
    Since nobody seems to want to read the article, or research any of the info, here is the quick low-down (since I have to deal with this at work right now...)

    - This solution is only for web browsers. It requires a special version of a web browser, or a plugin, to be able to use the new encoding scheme. It won't work for email, ftp, telnet, gopher, etc, unless a special version of the program is written.

    - DNS doesn't break. DNS still uses ASCII. This scheme uses RACE to encode the multi-lingual character set into ASCII. NSI will put a small prefix at the start of the domain name to identify it as multi-lingual (for example eq- would be found at the start of the domain name. The exact prefix has not yet been released to prevent squatters from snapping them up.)

    - The special browsers will detect the prefix, and translate the ASCII gibberish into the specified multi-lingual character set. The browser also does the conversion back to ASCII to allow a DNS lookup.

    - WHOIS does not/will not support this. You can only use WHOIS with the ASCII encoded gibberish.

    - This is not supported by the IETF. This is a custom solution implemented by NSI. But it looks like they are going to be WAY behind schedule in actually rolling this out.

    - They are accepting registrations right now, but none of these names will resolve for at least a month, probably much longer. In other words, the system isn't usable yet, but NSI can collect money.

    - The IETF is working on their own, probably completely incompatible system, to do the same thing.
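    The prefix mechanism in the summary above can be sketched in a few lines; this is a toy illustration, not NSI's actual code, and it assumes the "bq-" prefix that other comments report (the real prefix hadn't been announced):

```python
# Toy sketch of how a RACE-aware client could spot encoded labels:
# DNS itself still only ever sees plain ASCII labels, and the prefix
# is the sole signal that a label hides a multilingual name.
RACE_PREFIX = "bq-"

def is_race_label(label: str) -> bool:
    """True if this DNS label is flagged as a RACE-encoded name."""
    return label.lower().startswith(RACE_PREFIX)

def classify(hostname: str):
    """Split a hostname and mark which labels would need decoding."""
    return [(label, is_race_label(label)) for label in hostname.split(".")]

print(classify("bq-gcrmxyi.example.co.jp"))
# [('bq-gcrmxyi', True), ('example', False), ('co', False), ('jp', False)]
```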

  • Although according to the gov't, it was never a war at all.
  • Well, if your only connection to the Asian population is spam email, this should make your isolationism even simpler: the standard uses a standard prefix for RACE-encoded domain names; block those and you're in arrogant English/USian bliss.

    Blah. Spare us your arrogant anti-English/US attitude.

    Fact is, it is convenient to be able to block certain top-level country codes at the business gateway (or ISP) in order to cut down on spam.

    Incidentally, someone's connection to the Asian population is most likely NOT through spam, since most spam coming from Asian top-levels is actually just U.S. spam - either routed through someone else's mail system, or with spoofed headers.

  • Whether or not this is compliant with one or more RFCs, it is entirely noncompliant with most. Internationalization of the Internet is inevitable and a Good Thing(tm), but only when it takes place via the appropriate processes. As others have pointed out, internationalization was already happening, but it takes time.

    This is nothing more than an attempt by NSI to open another huge revenue stream without any consideration for the effect it will have on the Internet, or the long-term interests of the Internet community. After all, they see an untapped potential market and a chance to dominate it by jumping in before the standards are developed that would allow others to participate. Now their competitors will have to follow their lead or risk losing the market, and the standards process will have been neatly circumvented. The cost is borne by the Internet community, and the benefits are reaped by NSI.

    Why did I vote for Nader? Now I remember...
  • The > and < symbols are not part of the RACE string. I tried typing in anime () into their "Multilingual Conversion Tool" and got the following result:

    Input String
    Utf-8

    Prepared String
    Utf-8

    Registration String
    RACE
    bq--gcrmxyi

    --
    EFF Member #11254

  • The problem isn't necessarily with buffer overflows; read Bugtraq...

    There was a report a couple of weeks ago regarding a problem with internationalised IIS installations, where Unicode representations of directory traversal codes (., /, \, etc.) were being substituted after access checks had been applied...

    Now imagine domain based trust relationships - these will be implemented in numerous sub-systems (tcp wrappers, .rhosts, sendmail.cf, etc...) each of which may perform the normalisation/access checks slightly differently.

    I imagine that this will lead to numerous security issues due to slight differences in systems support for multi-byte characters.

    Another question (which I suspect will be answered in the FAQ) is do you need to register the same domain name several times to take account of the differing unicode byte widths?
  • by Megane ( 129182 ) on Friday November 10, 2000 @06:35AM (#632557)
    Okay, for the PDF-challenged: it seems not to be an RFC itself, but to be compliant with the current RFC spec, in consideration of RFC 2825, which points out that there is simply too much software out there that will break when given UTF-8 domain names.

    How it works is there is a special prefix "<rp>" (or maybe this just represents the prefix, I can't really tell from the PDF, but I didn't think < and > were valid domain name characters) that indicates a part of the domain is encoded, followed by the encoded name which only uses ASCII characters, and includes information about which character set was used (Unicode, SJIS, etc.). The algorithm is called RACE, Row-based ASCII Compatible Encoding.

    A couple of examples were given for both a domain name and a server name:

    <rp>45dfg62de34432.COM
    <rp>3df45gd345.<rp>45dfg62de34432.COM

    So I guess you can set your spam filters to block any domain starting with <rp>! :)

  • I think this is 'a bad thing'.
    I don't think standards like this scale well.
    What would happen if someone said
    "Let's add 2 new data pins to RS232"?

    I live in a country where we want 3 extra symbols to accommodate the language. They're all in Latin-1, of course. I don't even think that expanding to 8-bit Latin-1 is necessarily a good thing, let alone introducing an entirely new character encoding (16-bit) to the scheme of things.
    I don't want to be
    "f\0a\0t\0p\0h\0i\0l\0.\0o\0r\0g\0\0"

    We don't let Russian trains into central Europe (the tracks are wider), so why should we let kanji into our character sets? (Yes, I know Russian trains do come to Europe, I live at the end of one of the lines, just not central Europe!)

    Anyway, here's to 3-bit serial lines...
    (Could I patent that? I'd need to design an IC of flip-flap-flup-flipflop-catflap-flatcap-fatcat-flops first...)

    FatPhil
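    The "f\0a\0t..." worry above is easy to check: it is exactly what naive 16-bit (UCS-2/UTF-16) text looks like to software expecting one byte per character. A quick sketch in Python:

```python
# ASCII text widened to UTF-16 (little-endian, no BOM) pairs every
# letter with a zero byte -- the embedded NULs are what would break
# 8-bit-clean software that treats \0 as a string terminator.
name = "fatphil.org"
utf16 = name.encode("utf-16-le")

print(utf16)  # b'f\x00a\x00t\x00p\x00h\x00i\x00l\x00.\x00o\x00r\x00g\x00'
print(len(utf16) == 2 * len(name))  # True -- twice the storage for the same name
```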
  • by Speare ( 84249 ) on Friday November 10, 2000 @06:38AM (#632559) Homepage Journal

    Will moderators shoot down the fact that I mention Microsoft?

    Windows has had a CJK-capable kanji input scheme for years. CJK: Chinese, Japanese, Korean. Windows also has had bidi (bidirectional) support for right-left and/or top-bottom languages, including Hebrew.

    If you have the appropriate cjk-input features installed, it's just a funky keyboard shortcut to open it up to enter kanji. If not, you'll probably be limited to clicking on visible links, not entering domain names or other text by hand.

    I don't know what features Linux has to handle EFIGSS (English, French, Italian, German, Spanish, Swedish) differences, never mind bidi or kanji input.

  • What happened to "AND"? I want and back!!!
  • Man, you're pretty fucked up.
  • Um... just out of interest, how often do you go to Asian sites? An estimate will do - maybe once since you first logged on? For the vast majority of Internet users in the West this will have no effect whatsoever, because the vast majority don't speak Chinese, Japanese or the other languages which use different alphabets. The people affected will be the ones whose alphabets are being introduced, and therefore the ones who are likely to find it convenient not to have to use our system. Companies such as, say, Sony will continue to have sites which can be easily accessed by the rest of the world. A very black day for the net? Not really. More a sign that the system doesn't have to be designed by Americans for Americans.
  • by Megane ( 129182 )
    So is there an RFC on how this works?
  • AFAIK the RFC describing URLs limits the valid characters in a URL to basically lower and upper case letters and some marks (like slash, underscore, etc.) But not even european letters are allowed. If so, having a chinese domain name is fine, but you can't have a URL pointing to it. Or can you?
  • > you don't need a special keyboard

    Yeah, I'd kind of figured that, hence the reference to the fictional "UnicodeMap". I occasionally use character map programs for accents, and even know a few keyboard shortcuts for common ones. I can't imagine doing that for a whole line, let alone for a language I don't know enough of (any) to have a clue where to start looking for a character that probably can't be displayed anyway because the necessary fonts are not installed. Chinese might as well be Martian in that respect.

    I don't really think it's going to be an issue though; NonLatinAlphabet.com is almost certainly going to register their URL in the DNS-supported languages of all the countries they wish to do business in and point them to that language version of the site. Ultimately it should make it easier for users who don't have Latin keyboards to get by on the web, and this is definitely a very good thing.

    English may well be the lingua franca of the web, but why should a Chinese speaker have to get to a Chinese web site, hosted in China, that is displayed in Chinese, by entering a URL in English? All web users require some support for Latin characters, and probably always will, but as a failsafe the reverse should apply too, and we can't fall back on IP numbers because the web is supposed to be using HTTP 1.1, isn't it?

  • If I wrote "yahoo.com" in katakana (the Japanese phonetic alphabet for foreign words), would Yahoo be able to sue me?

    What about sites that want their corporate name in all these new languages (would Yahoo have to register its name under all the new languages)? Is there a market for this kind of registration?

    Capt. Ron

  • Had you ever actually considered what using the Internet must be like for non-English speaking countries? Probably something equally unpleasing to the eye.

    Seeing as the Internet is supposed to be the medium that allows a break-down of barriers between nations and a free flow of information, don't you think that it might be a good idea to include as many languages as possible rather than exclude anybody who doesn't use a language that conforms to your standards?

    I think you need to realise now that English is not the only language in the world - in fact we're in a vast minority. It's possible that at some point enough people will undertake the task of learning enough foreign languages to free up communication between ourselves, and perhaps ultimately one language will be considered the accepted standard - however, don't expect that to be English.
  • by AntiPasto ( 168263 ) on Friday November 10, 2000 @06:13AM (#632568) Journal
    Man I thought the long IP http://2034890234890294 thing was annoying... now I won't be able to make sense of *anything* in their damn spam. Oh well... another clue to hit delete.

    ----

  • It's nice to see that the global part of the Internet is still spreading...

    No, it's not. This is one of the most brain dead decisions ever made, in the name of political correctness, with complete disregard for the practical issues. The effect of this will be to reduce the global appeal of the web, not increase it. Western surfers will now effectively be cut off from many far Eastern domains. Sure, there's a reasonable workaround for entering non-ASCII domains on an ASCII keyboard, but it's too complex for the general public, and far Eastern companies are unlikely to publish the ASCII-fied domain anyway. This is a very black day for the net...

  • Characters are sorted according to the Japanese alphabet ordering (Unicode uses random ordering)

    No, the Unicode hiragana/katakana ranges are ordered in standard Japanese ordering, and the kanji in the CJK range is ordered in Chinese dictionary order (radical first, then stroke count). You do know that kanji means Chinese characters, right? It's not unreasonable to order them the Chinese way.

    In IE or Netscape, look under the encoding menu. You will find 3 choices; Shift-JIS, JIS, and EUC.

    Well, I also find Unicode (UTF-8) in IE, and both Unicode (UTF-7) and Unicode (UTF-8) in Netscape. You need to realize that Unicode is for displaying all languages, not just Japanese.

    Most Japanese experts on this subject view Unicode as an unwanted Western imposition.

    True... also known as "Not Invented Here".

  • There is a technical whitepaper on the Verisign site... just look above...
  • It's a shame that this happened now, instead of 5 years ago. I bet if I had spent the last 5 years on a net with Asian characters in their domain names, I would've learned more than a few words in the language just from exposure. (The only real way to learn a language, imo.)
  • How are you supposed to be able to type all 40,000+ new characters? Are we going back to Escape-Meta-Alt-Shift for an upper case 'Q'?

    Kierthos
  • The commies tried with pinyin, but it doesn't work very well because of the many homophones in Chinese. Hanzi are much cooler anyway, and a more compact way of writing and representing data.
  • Don't they have glyph composite symbols or something?
  • So will we have to extend ASCII to 65,536 from 256? Will legacy Japanese URLs look like "http://%0077%0077%0077.%0073%006F%006e%0075.%0063%006F.%006A%0070/"?

    And what will the new ones look like to us Americans? Ugh, I can't bear to think of it.
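    For comparison, URLs today escape the UTF-8 *bytes* of non-ASCII text with one %XX per byte rather than 16-bit code units; a quick illustration (the path here is a made-up example):

```python
from urllib.parse import quote, unquote

# Percent-encoding escapes each UTF-8 byte of a non-ASCII path as %XX.
path = "/新闻"  # "news" in Chinese; hypothetical path for illustration
escaped = quote(path)

print(escaped)                   # /%E6%96%B0%E9%97%BB -- three bytes per character
print(unquote(escaped) == path)  # True -- the escaping round-trips losslessly
```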

  • ... to see clueless news readers reading out a URL with all these characters in it ;)
  • by Yardley ( 135408 ) on Friday November 10, 2000 @06:42AM (#632578) Homepage
    This is probably an attempt to force migration over to Unicode. Anyways, why is Verisign behind this? Didn't we learn from Network Solutions that a privately-owned, commercial company is not the solution to internet domain name databases (and their "ownership")?

    How can one company be granted the monopoly rights to something so important to the world's economy and everyone on the Internet again? Should this be assigned to a not-for-profit entity under the auspices of ICANN?

    --
  • This is fundamentally a good idea for the future. It's also a prime example of the marketeers making decisions that the technology is not yet ready to support. My understanding is they're basically telling people that "we'll take your money and register your name, but if you can't use it (and you can't, for some time yet), you don't get your money back." Foo.
  • Since the majority of Chinese users input their Chinese as Big5, (e.g. www.ê.com) will not be the same as the Unicode equivalent

    I think it's probably not too difficult for the Chinese browsers to do the conversion behind the scenes. Kinda like ASCII/EBCDIC conversions; you don't need to change the keyboard to enter text of the other variety.

    Now, which one does the registrar accept, and the DNS servers cache? Read the article? From the first couple pages, it appeared that the domain name is actually not in Unicode nor Big5; it's translated to an ugly ASCII encoding.

  • Unless you want to register domain names in Klingon.

    Michael Everson of Everson Gunn Teoranta has proposed an encoding of Klingon in Plane 1 of ISO/IEC 10646-2 [dkuug.dk]; if it gets adopted, future versions of Unicode may adopt it (Everson's one of the editors and authors of Unicode 3.0).

  • Why do fuckwits hide behind AC?

    Wide gauge trains physically cannot come to _CENTRAL_ Europe, where the 6" narrower gauge is used.
    However, I can hop on a wide gauge train here in Helsinki which goes all the way to Moscow.
    You see, not all of Europe is CENTRAL Europe.
    I'm sure you'd agree that not all of America is Central America. Screw it, I don't need your agreement; your opinion is less than worthless.

    Now save me the fucking effort and go kill yourself.

    FatPhil
  • Hmmm, the Chinese on the menus at my local Chinese restaurant in Cambridge took up about 4 times the space of the English. The characters had to be twice the height, as well as wider than the Latin characters, due to resolution issues. Maybe they were just being more descriptive, but they seemed to have the same redundancy as the English, so I assumed they were in exact equivalence.

    I can't agree with your "basic rule of information". I can see nothing about it in my copy of Cover and Thomas. Kolmogorov or Chaitin have stuff to say about this kind of thing.

    FatPhil
  • Most (I would say all, but I'm not entirely certain of that) have Roman-alphabet representations, usually without accents, umlauts, or what have you. So they can represent their languages in URLs, just in a less commonly used form. German, Spanish, French, etc. often have words that, stripped of special characters, are written identically. On top of this, it's relatively easy to write special Roman-alphabet characters on a QWERTY keyboard (I managed to figure it out through trial and error), but quite difficult to type Asian characters, so Asian-character URLs will serve to make the Internet more regional.

    I have occasion to buy an international airline ticket this year, and I refuse to use priceline because they have Will Shitner doing their ads. Give me Nemoy, Stewart, Dorn, Spiner, McFadden, anyone but shitner. Blow me priceline.

    Man, you have got some real problems, don't you? Did Shatner beat you as a child or something? I mean, I'm not crazy about Troi, but it's not like I carry some kind of grudge. And you manually typed in a .sig as an anonymous coward? That's just weird.

  • So what you're saying is that it's OK for non-English-speaking people to try and use our ASCII system, but totally wrong and inappropriate for them to have their own native language system and for us to try and learn how to use that? It would seem you embrace the global village idea... providing it is English-speaking and conforms to your native character set.


    --
  • So, the era when humans could remember an easy, pronounceable name instead of an IP address is over, then?

    I guess I better start learning the numbers. . .


    ---
  • The proposal includes umlauts - it's based on a mapping to US-ASCII from any Unicode string. (Admittedly if you only wanted to represent a handful of European languages you'd come up with a different scheme, but it would obviously be less general.)

    Presumably they're pitching it at the Asian market 'cos that's where they expect to make money.

    There are apparently good reasons for not allowing 8-bit characters outside US-ASCII in domain names - it would break too much.
  • Asia Carrera, and she runs Linux.

    I think she runs Solaris now. *sigh* a pornstar after my own heart

  • As far as I know the eszett (ß) is still used, but not in the words where it can be replaced with "ss" without affecting the pronunciation.

    I.e. Gießer remains Gießer, but daß becomes dass.

  • Wouldn't it make more sense to implement umlauts like ö/ü/ä first?
    It's been there for a while, please visit www.whats.nu [whats.nu] for details.

    // Klaus
    --

  • It's possible that the menus you mention say more descriptive or fancy stuff in Chinese than in English.

    e.g. Steamed fish vs steamed red snapper in soy sauce ;).

    As for the spoken language, Chinese is actually easier for human ears in noisy channels/environments than English, because you can detect the changes in pitch, whereas in English much of the pitch component is "wasted".

    Cheerio,
    Link.
  • Hot Asian teens? I didn't know there were any. Well, maybe if you're a latent homosexual who likes flat chested, smush faced girls.

    Asia Carrera,

    and she runs Linux.
  • I think there's a company in Richmond WA that claims to make one...
  • The current memo (pre-RFC) from the Network Working Group can be found at: ftp://ftp.isi.edu/in-notes/rfc2825.txt where it clearly states that the matter of UTF-8 names is solely up to the IETF (second paragraph of the Abstract section).

    The IETF draft (clearly not an RFC) on the matter, dated 28 June 2000, can be found at: http://www.i-d-n.net/draft/draft-ietf-idn-requirements-03.txt

    The remaining questions are: a) NSI has no control over the TLD for each respective character set, so why are they offering these? b) why are they polluting the .com, .net, and .org TLDs? c) if you already own "wine.com", does this mean they're willing to give the UTF-8 translation to Joe Blow so he can hijack all your Asian clients and ruin your otherwise good name?

    Clearly this is not well thought out at all.

    Please peruse this: http://www.emarketer.com/enews/reuters/11_09_2000.rwntz-story-bcnetinterlanguagedc.html?ref=dn and come up with your own conclusions as to the real reason why. (hint: third paragraph)

  • If I remember correctly, it does NOT allow special chars in the domain names.

    Damn you're quick. Of course the whole point of this is to provide a work-around to that problem. All it does is make an ASCII representation of a different character set. These representations are flagged by having the hostname start with bq-. So if you run across a hostname that looks like bq-safjdlfaqwue72819.bq-hewaguifuifdajhks.co.jp you'll know that the hostname probably makes good sense to anyone who has a Japanese web browser. If you are in the habit of reading such pages you'll get the appropriate plugin. If you don't have the plugin, you probably couldn't read the content anyway, and believe you me, there is a LOT of content on the web that's written in a language you can't read. (I'm not saying that you're stupid or anything, I'm just making the bet that there isn't anyone here who knows every language in which material has been posted to the internet, and this includes Klingon.)
    _____________

  • I know Al Gore invented the internet in terms of convincing Congress to heavily fund the net... and some other congressman opened up the net to commercial use... but wasn't the web invented in Europe (CERN)? And aren't domain names a big part of the web? Does that mean that if we try to keep Asian hordes away from the net, the Europeans will try to keep the crass American lummoxes from using the web? ^^;; -confused
    --
    Peace,
    Lord Omlette
    ICQ# 77863057
  • by truthsearch ( 249536 ) on Friday November 10, 2000 @06:44AM (#632597) Homepage Journal
    My Chinese co-worker has informed me that to type Chinese, he sets the desired language in whatever app to Chinese and then types phonetically. The problem is that even phonetically there are many similar words, so he basically types a few English letters to verbally spell out a word, then Chinese characters appear on the screen which he must then choose. He tells me there are also special keyboards where you hold down multiple keys.
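    The choose-from-candidates flow described above can be mocked up in a few lines; the table here is a tiny hypothetical sample, not a real IME dictionary:

```python
# Toy sketch of phonetic (pinyin-style) input: the user types a
# romanization, the IME lists the characters that share that sound,
# and the user picks one -- necessary because of the many homophones.
CANDIDATES = {
    "ma": ["妈", "马", "吗"],    # same syllable, different characters/tones
    "shi": ["是", "十", "时"],
}

def candidates(phonetic: str) -> list:
    """Return the characters a user could mean by this romanization."""
    return CANDIDATES.get(phonetic, [])

print(candidates("ma"))  # ['妈', '马', '吗'] -- the user then selects one
```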
  • by BJH ( 11355 ) on Friday November 10, 2000 @06:45AM (#632598)
    Kanji are usually input under Linux with kinput2 (although Netscape has always had a few... problems... in dealing with them). Luckily, Mozilla is much better in this respect.
    Some programs, like Emacs, communicate directly with the Japanese conversion server (canna, Wnn[4|6], ATOK, etc.), but there are very few apps which can do this.

  • The web was not invented by Americans, even if you like to believe that :p
    It was invented at CERN in Switzerland... you know, that's in Europe

  • Ummm... DNS is only used in name resolution; packets are routed according to the IP address once resolved, which is totally unrelated to the domain name - that happens right now - nothing has changed.

    If anything, extending the number of TLDs will reduce latency, as it will spread the load across more servers, probably on a geographical basis!

    Feel free to troll, it's your God-given right, but do try to remember that acting both jingoistic and technically ignorant in the same mail is very unlikely to get you any respect.
  • Most CJK-capable computers use a pretty standard QWERTY layout...

  • So that they can centralise more power to themselves.

    Verisign owns Network Solutions and Thawte.

    So they own your certs (need to be renewed) and your names (refer to Network Solutions' terms and conditions).

    And there's this push for DNSSEC, which isn't that great anyway. But it'll be a convenient tool to centralise even more power.

    Open your eyes a bit and you'll see more scary stuff.

    Soon there'll be a bigger push for certificates becoming mainstream - via smartcards and other stuff. And Windows 2000 has some nice support for that... Maybe Microsoft will buy Verisign.

    What do you think?

    Have fun,
    Link.

  • There are a couple of Japanese domain names I've thought of purchasing, but would rather use the CORE registrar joker.com than register.com due to the difference in price (joker.com is around $8-11 per year, depending on the exchange rate of the Euro). I was sad to see that I'd have to use register.com and spend $20 for a Japanese domain name.

    But now what's to stop me from looking through the RFC, figuring out how to encode my domain name using RACE, and then registering it using joker.com as a domain name that begins with "bq-"?
  • by whaley ( 6071 )
    The RACE draft says that:

    - Host parts that have no international characters are not changed.
    so it should not be possible to RACE-encode a domain name in order to hijack it.

    Of course it's still possible to describe "slash dot" in Chinese and register that name :)

    See also
    http://www.i-d-n.net/draft/draft-ietf-idn-race-02.txt
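    That pass-through rule is easy to sketch; note that encode_label below is a stand-in (hex of the UTF-16 bytes), not the real RACE compression, and "bq-" is the test prefix other comments report:

```python
# Sketch of the draft rule quoted above: a label is only encoded when
# it actually contains non-ASCII characters, so an existing all-ASCII
# name like "slashdot" can never be shadowed by its own "encoding".
def needs_encoding(label: str) -> bool:
    return any(ord(ch) > 127 for ch in label)

def encode_label(label: str) -> str:
    if not needs_encoding(label):
        return label  # pure ASCII passes through unchanged
    # Stand-in for RACE: prefix plus hex of the UTF-16 code units.
    return "bq-" + label.encode("utf-16-be").hex()

print(encode_label("slashdot"))  # slashdot -- unchanged, no hijack possible
print(encode_label("日本"))       # bq-65e5672c
```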
  • I'm gonna get Släshdöt.org. It has a more "heavy metal" feel to it, like "Mötley Crüe".

    -B
  • On a Windows system at least with a Chinese Input Method Editor (who the hell thought of that TLA?), I *think* it will in fact display in native characters due to the IME taking control of the text input fields and clobbering the rest of the OS with a blunt stick.

    Have to try that one when I get to work tomorrow.

    (NJStar's not bad as IME's go - at least it's not a Microsoft product)

  • What'll stop you doing that is that the prefix will change, and your domain will be left out in the cold.

    This has already been tried - stories were doing the rounds last week of registrars doing this. When bq- changes, they'll have some very annoyed customers.

  • So what you're saying is that it's OK for non-English-speaking people to try and use our ASCII system, but totally wrong and inappropriate for them to have their own native language system and for us to try and learn how to use that?

    Yes, that's *exactly* what I'm saying. I'm not saying it because I happen to use ASCII, but because ASCII is a more natural system for computers to deal with. If Western European and American languages consisted of 30,000+ characters, and those in the East consisted of some 100 or so, I'd suggest using the Eastern system at the drop of a hat, even if it wasn't my native system. This has nothing to do with whether or not it's my native character set that's chosen, and everything to do with whether a good decision is made from a technical perspective.

  • I want to be able to register domain names in French, German and Russian too. If they are going to support all three zillion kanji and Chinese characters, they need to at least support the various Cyrillic and Eastern European Roman alphabets, and the rest of ISO Latin-1 (which covers all the major and most of the minor Western European languages). The Arabic-script alphabets (Arabic, Farsi, Urdu, etc.) and Hebrew are written right-to-left, so I suppose those won't be implemented right away, but they need to be on the drawing board.

    If all those other languages are accounted for, I view this as a good thing. If this is part of an overall shift to Unicode on the web, then all these languages are automatically supported, and I would think it an even better thing.
  • I'm not familiar with Chinese, but I am studying Japanese writing, and if Chinese has the same general system, then it may be all phonetics, in which case it will probably take the same or more room than Roman-alphabet languages to write out.

    --
    EFF Member #11254

  • Though what you say is true, it would still be interesting to see how they deal with the fact that, say, Japanese character sets provide for full-width alphanumeric characters, which, although they look the same as A,B,C,etc... except for their width, have a different encoding.
    In addition, there's the inherent difficulty in the fact that a Chinese website using a Simplified Chinese set of ideographs could hijack surfers wanting to go to a site with the same name, but with Traditional Chinese ideographs.
    In Japanese, there are hiragana, katakana, and kanji. The first two are phonetic alphabets, and the third is an ideographic alphabet based on Chinese characters. Generally, input methods convert from the first to the second, often selectively, so difficult ideographs are replaced with simpler phonetic symbols, though the meaning remains. One word could have lots of representations, and still mean (and read) the same!
    These issues should have been thought out before NSI started this idiocy.
  • or in a url (using directories)

    the domain name wouldn't work, though; they're talking about the size of the symbol set rather than the length of the domain name. I.e. you have the existing English alphabet of 26 letters + 10 digits + hyphen available for _each_ of the allowed 63 characters, and now you can also use ASCII encodings of Asian characters as well.
  • Actually, most of my spam is from Asian top-levels (mostly cn) and in some CJK encoding. (Not being able to read it, I don't know if it's _really_ US spam in a foreign language, but....)

    Furthermore, much of that spam comes through the same set of systems which never seem to do anything about it.
  • by Speare ( 84249 ) on Friday November 10, 2000 @06:47AM (#632614) Homepage Journal

    The rp is a variable. The first couple of pages note that implementation testers should assume the "RACE Prefix," or rp, is "bq-".

  • Well, I know that some of the Oriental 'alphabets' have numerous different ways to represent the same concepts, but how would using glyph composite symbols help (if I understand what you mean)?

    Just because there exists in a language two symbols 'blah' and 'thingy' so that 'blahthingy' means something else doesn't mean that this standard will adopt it. It's much more likely to use the 'common' kanji. (Ob note: There's only about 50 different Japanese characters for dragon from a quick search on lycos... or some really poor kanji writers).

    That being said, it would be impossible to set up all possible configurations where composite symbols would redirect to the 'obvious' site. (i.e. www.golddragon.com, no matter how it's spelled in kanji or whatever would not necessarily all go to the same site.) It would be a neat trick if it could, but it would require registering dozens of permutations.

    Kierthos
  • ö/ü/ä

    argh, i want [mozilla.org] it to be easier to tell urls apart from each other, not harder.

    --

  • I think the point of the original quote of 37 characters max is the 'old' number of characters in the symbol set that were allowed, not the length of the actual URL. And your article from 2600 lists a maximum URL length of 63, not 67.

    BTW, are hyphens and tildes inter-changeable? Because I've seen a lot of web-pages with tildes, and only some of them turn into hyphens when reloading.

    Kierthos
  • There was such a thing as a Chinese Typewriter. It had 300 keys and required multiple presses (Shift, Ctrl, Meta, Alt, Hyper etc. style) to generate characters.

    This is a really crap picture of one:

    http://acc6.its.brooklyn.cuny.edu/~phalsall/images/typewrit.gif

    So many keys that each one is barely distinguishable from the next (that's also poor photo quality, though)

    It fell into disuse fairly swiftly because it was slower than script.

    Our typewriters were invented so that they could be faster than script.

    They lose.

    FatPhil
  • Within a few minutes of this story being posted, most of the posts are along the following lines.

    • Why not get European hacks like umlauts working first?
      I dunno; maybe because the Japanese don't know enough German? Why should the Asians wait for Europe to get its act together before they solve the issues they face every day?
    • Great, now I have to see even more ugly spam!
      Well, if your only connection to the Asian population is spam email, this should make your isolationism even more simple: the standard uses a standard prefix for RACE-encoded domain names; block those and you're in arrogant English/USian bliss.
    • How can I enter these funky characters?
      I dunno, just a guess, but maybe someone's already thought of this? Perhaps the people who work in kanji all day know something about entering kanji, and have hardware or software solutions around. If you don't normally have to type it, I'm sure your browser will let you CLICK on encoded links just fine.

    Missed anything?

  • If it's implemented properly, surely it shouldn't matter. It's not just the size of the Unicode chars, but also big- and little-endianness. If it's implemented properly, the DNS would just determine what you're using (UCS-2BE, UCS-2LE, UCS-4BE, UCS-4LE) and convert it to its internal representation for the lookup.
  • I guess I can look at this two ways...

    1) Oh God, there's gonna be a MASSIVE amount of spam coming from domains with characters outside of the standard 37.

    2) I can block anything and everything coming from domains with characters outside of the standard 37.

    -S
  • Gad. We should just say that "bytes with the high bit set must be sent unchanged" through everything and scrap everything that does not obey this.

    This would allow all transports to ignore the character encoding, as long as the encoding only uses bytes with the high bit set for non-ASCII. It also means that case-independence of non-ASCII would be illegal, thus stopping the emergence of a dangerous (for security) mess of incompatible implementations of equality tests for URLs.

    This would allow us to use UTF-8 for the URL, for the page contents, for email, for everything, and we would not have this horrid mess of prefixes and mime types.

    Yes, some programs, routers, etc, would not pass this stuff through. Well, tough, those should be obsolete!
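
    The high-bit rule the parent proposes is exactly how UTF-8 already behaves; a quick Python sketch (the sample string is just illustrative):

```python
# UTF-8 sketch: ASCII bytes pass through unchanged, and every byte of a
# multi-byte (non-ASCII) sequence has the high bit set.
text = "naïve-例"  # illustrative mixed ASCII/non-ASCII string

for byte in text.encode("utf-8"):
    print(f"0x{byte:02x} high_bit={'set' if byte >= 0x80 else 'clear'}")

# Pure ASCII round-trips byte-for-byte:
assert "example".encode("utf-8") == b"example"
# Non-ASCII characters encode only to bytes with the high bit set:
assert all(b >= 0x80 for b in "ï例".encode("utf-8"))
```

    So any transport that passes high-bit bytes through untouched carries UTF-8 for free, which is the point being made.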

  • According to the article, they're working on a substitution scheme so ASCII-only users can still type in the URLs. Does this mean that the ASCII equivalents will be arbitrary and unintuitive? If so, that's a problem. Let me propose something slightly different:

    Unicode is not supposed to over-unify characters, so the ASCII fallback for Japanese could be the romaji transcription - and therefore registering a Japanese domain name automatically registers the romaji equivalent, except that some kanji have more than one possible romaji transcription.

    However, some kanji are unified with Chinese characters, which have a different pinyin transcription.

    Chinese is another problem. The logical ASCII equivalent is pinyin stripped of its diacritical marks. But then, many different characters may have the same transcription.

    All Cyrillic languages have an ASCII transcription scheme too, but it isn't unified. One character may be transcribed one way in Russian and another way in Bulgarian. Is there a unified transcription scheme for all Cyrillic languages, and is it truly one-to-one? I don't think so. Look at the character usually transcribed as "j" in Russian, and the one usually transcribed that way in Serbian.

    ISO-Latin-1 and -2 fallbacks: For ISO-Latin-1, the fallbacks are pretty obvious: "Champs-Élysée" ==> "Champs-Elysee" or in German "Düsseldorf" ==> "Duesseldorf", but in Czech it's a little less obvious. Does "C hacek" map to "Cz" or "Ch" or "Cs"?

    So, here is a possible solution: devise unified ASCII transcription schemes for each language, admitting whatever ambiguities exist in Japanese or similar languages. Then, when you register a non-ASCII name, you are asked on the form to fill out the transcribed ASCII name that corresponds to it, and it is also automatically registered to you.

    There is some potential for conflict here, if the ASCII transcription corresponds to an existing registered domain or, as in the case of Chinese more than one foreign name corresponds to the same transcription, but I think the problem is manageable.
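
    The "pretty obvious" ISO-Latin-1 fallback can be sketched with Unicode decomposition; the function name here is hypothetical, not any registrar's actual scheme:

```python
import unicodedata

def ascii_fallback(name: str) -> str:
    """Strip diacritics: decompose to NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

print(ascii_fallback("Champs-Élysée"))  # Champs-Elysee
print(ascii_fallback("Düsseldorf"))     # Dusseldorf - not Duesseldorf!
```

    The second example shows why one mechanical rule isn't enough: German convention wants "ue" for u-umlaut, so each language needs its own transcription table, which is exactly the parent's point.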
  • It's easy under Windows. For everything but Win2K (and ME?) you will have to download and install Global IME from MSFT. I don't know how you do this under X, or for Lynx users, in a console. I have to admit, MSFT makes it quite easy for us developers to internationalise our products.
  • by Speare ( 84249 ) on Friday November 10, 2000 @06:52AM (#632634) Homepage Journal

    So how's this gonna work for systems not set up to handle the asian character set?

    Read the links.

    The proposal implements an ASCII encoding scheme, called RACE. A certain prefix (they list the debugging prefix as "bq-") indicates a RACE-encoded domain name.

    The rest of the ASCII encoding either appears in ASCII for dumb browsers, or is converted to Unicode or Big5 or whatever character set it wants.

    For "dumb browsers" (not a flame, just an indication of character-set-awareness), you'd see some crazy domain like http://www.bq-ag0970ag00ah07h.or.jp/; for "smart browsers," it would appear in your own kanji font.
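
    For the curious: RACE never left draft status, but the ACE (ASCII-Compatible Encoding) idea it describes is exactly what was later standardized as Punycode with an "xn--" prefix in place of "bq-". Python's built-in "idna" codec shows the same dumb-browser/smart-browser split:

```python
# ACE sketch using Python's built-in IDNA codec (Punycode, "xn--" prefix;
# the draft RACE scheme discussed here used "bq-" instead).
label = "日本"                        # a non-ASCII domain label ("Japan")
ace = label.encode("idna").decode("ascii")
print(ace)                            # what a "dumb" client sees

# An encoding-aware client recognizes the prefix and decodes it back:
assert ace.startswith("xn--")
assert ace.encode("ascii").decode("idna") == label
```

    A dumb resolver just matches the ASCII string; a smart browser spots the prefix and renders the kanji.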

  • Has there been an update to the DNS RFC allowing this? If I remember correctly, it does NOT allow special chars in domain names.

    Furthermore, does this limit those domains to around 31 chars in length? (Unicode, 2 bytes per char; the DNS system allows a maximum of 63 octets per label, and that limit should be interpreted as bytes, not characters.)

    Also, doesn't it kinda suck to make large parts of the net unavailable for most?

    --paddy
    --
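
    The length question above comes down to bytes vs. characters: RFC 1035 caps each label at 63 octets of whatever encoding is actually sent. A quick sketch (the label is illustrative):

```python
# DNS limits each label to 63 octets (RFC 1035), counted after encoding.
label = "網際網路"                        # 4 characters ("Internet" in Chinese)
print(len(label))                         # character count: 4
print(len(label.encode("utf-8")))         # byte count as UTF-8: 12
print(len(label.encode("idna")))          # length of the ASCII form sent to DNS

assert len(label.encode("utf-8")) == 12   # 3 bytes per character here
assert len(label.encode("idna")) <= 63    # still fits in one label
```

    So a multi-byte encoding does shrink the effective name length, though not to a fixed 2 bytes per character.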
  • by Smuj ( 249217 ) on Friday November 10, 2000 @07:02AM (#632636)
    A few notes...

    The Internet Society [isoc.org] probably isn't too happy about this. They released a statement [isoc.org] on November 8th encouraging NSI to back off and let the IETF [ietf.org] IDN WG [i-d-n.net] do its job.

    Also, there are companies that are already currently operating in this market, including WALID [walid.com], which is taking registrations for Arabic domain names (AND RESOLVING THEM), and will soon be adding Hindi, Tamil, and two Chinese scripts before moving into other markets.
  • Because the Mediterraneans figured out that if they came up with simple symbols that represented sounds (an alphabet) that could be strung together to transcribe spoken words, instead of separate ideograms for each spoken word, you could not only learn to read and write much more easily, you could also write down other languages with the same written symbols.

    One of the major reasons this happened was that they were trading with different peoples who used ideograms instead of alphabets. Since learning one ideogrammatic written language is hard enough, and learning 5 is a single lifetime's achievement, a simpler way was found.

    The Chinese were homogeneous and didn't need to deal with anyone other than the Chinese, and hence kept their ideogrammatic written language.

    It's a simple fact that it's far easier to implement the Roman alphabet on a computer than a zillion independent symbols -- you need less RAM, simpler displays and so on.

    What the Chinese need to do is settle on a single way to transliterate spoken Chinese into the Roman alphabet (or even the Cyrillic, Hebraic or Greek if that's what they want). Ideograms are neat, but they're a pain in the ass.

    Sorry, it's not cultural imperialism, just pragmatism!
  • by FigWig ( 10981 ) on Friday November 10, 2000 @07:03AM (#632640) Homepage
    Wouldnt it make more sense to implement umlauts like ö/ü/ä first?

    I have dibs on släshdot.org!!

  • by dizee ( 143832 )
    w3m, the console web browser that can format tables, frames, etc, was written by Akinori Ito. He includes support for kanji. I know because there is a #ifdef PC_KANJI that is misplaced every time I go to download and compile it without japanese character support.

    I believe there is also a xterm counterpart for kanji.

    Mike

    "I would kill everyone in this room for a drop of sweet beer."
  • It's called evolution. Things weren't implemented properly the first time. Now we're correcting that. A lot of modern computing was invented in English-speaking countries; it's hardly any wonder our systems can't cater for the rest of the world. It seems rather unfair to put them at a disadvantage. Besides, they will eventually force a change, and we don't want incompatibilities now, do we? Personally, I can't wait for everybody to move to Unicode - it will make life as a software developer easier.
  • by jbert ( 5149 ) on Friday November 10, 2000 @07:04AM (#632643)
    Hmm. This could lead to fun. Some character sets/character encodings allow different byte sequences to map to the same character.
    (See the Unicode bugs recently in IIS, where a unicode representation of '../' is used to navigate upwards in the directories of the server to view files outside of the server root.)
    Now, does a company have to register all possible permutations of byte sequences which all map to the same character sequence? As well as doing so in .com, .net and .org.
    We'll see.
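
    The parent's concern is easy to make concrete: in Unicode the same visible character can arrive as different byte sequences, and only normalization makes them compare equal. A minimal Python sketch:

```python
import unicodedata

# "café" spelled two ways: precomposed é vs. e + combining acute accent.
precomposed = "caf\u00e9"      # é as a single code point (NFC form)
decomposed  = "cafe\u0301"     # e followed by U+0301 (NFD form)

assert precomposed != decomposed                  # naive comparison: two names!
assert unicodedata.normalize("NFC", decomposed) == precomposed
```

    A registry that compares raw bytes instead of normalized forms would treat these as two separately registrable names, which is exactly the permutation problem raised here.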
  • by tomjgroves ( 236290 ) on Friday November 10, 2000 @06:16AM (#632645)
    So how's this gonna work for systems not set up to handle the Asian character sets? Let's say I want to send to joe.bloggs@somechinesename.net from my FBSD or Linux boxes? Not too much fun, I think...
  • This would be great for China, if half (if not all) of its mail servers didn't relay spam back to the US (and therefore get blocked independently by ISPs and by the MAPS [mail-abuse.org] RSS). There's been no response out of those admins who don't have the latest software (come on! Sendmail 8.10 is free! Why are you running the broken SMI Sendmail?!?).



    --
    WolfSkunks for a better Linux Kernel
    $Stalag99{"URL"}="http://stalag99.keenspace.com";

  • by Ashran ( 107876 ) on Friday November 10, 2000 @06:17AM (#632648) Homepage
    Wouldnt it make more sense to implement umlauts like ö/ü/ä first?
    Easier to test etc..
  • by Giant Robot ( 56744 ) on Friday November 10, 2000 @06:19AM (#632650) Homepage
    How is this going to work? Since the majority of Chinese users input their Chinese as Big5, a Big5-encoded name (eg www.ê.com) will not be the same as the Unicode equivalent..
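
    The Big5-vs-Unicode mismatch is easy to demonstrate: the same character is a different byte sequence under each encoding, so the registry has to pick one canonical form. A sketch in Python (whose stdlib ships a big5 codec):

```python
# One character, two encodings: the bytes differ, so names submitted in
# Big5 must be converted to a canonical form (in practice, Unicode).
ch = "\u4e2d"                 # 中 (U+4E2D, "middle")
big5_bytes = ch.encode("big5")
utf8_bytes = ch.encode("utf-8")
print(big5_bytes)             # b'\xa4\xa4'
print(utf8_bytes)             # b'\xe4\xb8\xad'

assert big5_bytes != utf8_bytes
assert big5_bytes.decode("big5") == utf8_bytes.decode("utf-8") == ch
```

    So either the input method or the registrar's front end has to transcode Big5 input to Unicode before the lookup; comparing raw bytes would miss the match.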

  • The general FAQ [verisign-grs.com] answers how the names will appear in a web browser, but they use a GIF to show the Chinese name. So I'm still wondering how it will look to someone without an OS that displays the characters properly. Never mind that you can download extensions to display the content in the web browser; the location will be garbage, right?

    Will this be a good kick in the butt for internationalization of your OS?
  • Though what you say is true, it would still be interesting to see how they deal with the fact that, say, Japanese character sets provide for full-width alphanumeric characters, which, although they look the same as A,B,C,etc... except for their width, have a different encoding.

    True, they say that any name part consisting entirely of USASCII characters are not allowed to be encoded this way, but they would have to go out of their way if they wanted to ensure that double-wide SJIS romaji were not confusingly registered. Then again, we can already do "s1ashdot.org" with just plain ASCII.

    In addition, there's the inherent difficulty in the fact that a Chinese website using a Simplified Chinese set of ideographs could hijack surfers wanting to go to a site with the same name, but with Traditional Chinese ideographs.

    IIRC, in Unicode, Chinese and Japanese ideographs all map to the same code if they're basically the same character, with the differences considered font-specific. In the extreme case, one common radical is rendered with one less stroke in Japanese, which could have created hundreds of extra codes.

    Most simplified kanji/hanzi should be unique, but a few, at least in Japanese, use an already existing, more common character. Generally, though, this won't be a problem if Unicode is used.
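
    The full-width look-alike problem at the top of this comment can at least be detected mechanically: Unicode's compatibility normalization (NFKC) folds full-width Latin back to plain ASCII. A short sketch:

```python
import unicodedata

# Full-width "ＡＢＣ" (U+FF21..U+FF23) looks like ASCII "ABC" but encodes
# differently; NFKC folds the compatibility forms back to plain ASCII.
fullwidth = "\uff21\uff22\uff23"
assert fullwidth != "ABC"                                  # different code points
assert unicodedata.normalize("NFKC", fullwidth) == "ABC"   # equal after folding
```

    A registry could run NFKC before its all-ASCII check to reject double-wide romaji registrations, though that still wouldn't catch pure-ASCII tricks like "s1ashdot.org".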

  • by truthsearch ( 249536 ) on Friday November 10, 2000 @06:54AM (#632655) Homepage Journal
    Kind of ironic the algorithm is called RACE, isn't it? Can we filter by RACE? Can we browse domains of only a certain RACE? Can it be enhanced with RACISM, Row-based ASCII Compatible Interface for Stereotyping Mayhem?
  • Such an authoritarian title. Are you sure? It proposes ASCII encoding, not a Unicode or other mbcs usage directly.

    Also, doesn't it kinda suck to make large parts of the net unavailable for most? Don't you think the Chinese and Japanese people could say the same thing about English?

  • I'm surprised it took so long for somebody to do this. I don't relish trying to learn a whole new set of shortcuts (my grasp of the 255-odd extended ASCII set is slipping fast, never mind kanji!). I did a story about this yesterday over at http://www.t3.co.uk. It's nice to see that the global part of the Internet is still spreading...
  • It is easy, you use CXterm, a special program developed to input chinese under X. And there are a number of other programs you can use to input chinese under UNIX's console mode as well.

  • Isn't it odd that the acronym for the encoding scheme for Asian domains is called RACE? Who's in charge over there at VeriSign, the Ku Klux Klan?
  • by Fjord ( 99230 )
    I noticed a promotion for this on the Network Solutions website a week or two ago. I think that this is great, but we need TLDs in these characters as well - one with the Chinese character for commercial, one for organization, one for educational. I wonder if that new TLD system they are testing will allow these characters. For 50,000, you could register one of these Chinese TLDs and probably make a lot of money.
