April 28, 2007
g11n chapter: anything else need covering?
in case you missed it, there's a new edition of the ever popular cf wack (ColdFusion Web Application Construction Kit) on the way. and in case you don't already know, i handled the chapter on cf & globalization (g11n).

given the timezone hell i recently passed through and the recent US and Australia DST changes, i plan on beefing up the section on timezones. and in keeping w/ben's idea to slim things down, we'll be pushing most of those "boring" locale table comparisons out to on-line appendices. might also add a wee bit on using flex in g11n cf apps.
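
for anyone wondering what a DST rule change looks like from code, here's a minimal sketch using the JDK's java.time classes, which read their rules from the TZ database. the class name DstSketch and the helper firstTransition are made up for illustration, and America/New_York is just one example zone; this assumes the JDK's bundled tz data is reasonably current.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneOffsetTransition;

final class DstSketch {

    // local date of the first UTC-offset change in the given year for a zone,
    // or null when the zone has no further transitions (e.g. a fixed offset)
    static LocalDate firstTransition(String zoneId, int year) {
        Instant startOfYear = LocalDate.of(year, 1, 1)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant();
        ZoneOffsetTransition t = ZoneId.of(zoneId).getRules().nextTransition(startOfYear);
        return t == null ? null : t.getDateTimeBefore().toLocalDate();
    }

    public static void main(String[] args) {
        // US DST started on the first sunday of april in 2006
        // and on the second sunday of march from 2007 onwards
        System.out.println(firstTransition("America/New_York", 2006)); // 2006-04-02
        System.out.println(firstTransition("America/New_York", 2007)); // 2007-03-11
    }
}
```

the point is that the answer comes from the tz data rather than from anything hard-coded, so stale tz data gives stale answers.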

so before i really begin the excruciating process of revising that chapter, i'm looking for feedback on it. anything missing? anything not too clear? you can respond here or simply email me with your suggestions.

thanks.

October 24, 2005
g11n gotchas
a couple-three emails i got recently prompted me to think (again) about what globalization means to the average coldfusion developer. coincidentally mark davis, IBM's front man for g11n and president of the Unicode Consortium, is putting together a presentation for the next Unicode conference dealing with "Globalization Gotchas". i highly recommend that cf developers doing i18n/g11n work review these; it's certainly worth the effort.

among my favorites that apply in one way or another to coldfusion (i've yakked about these in various articles/books/blog entries but good stuff usually bears repeating):

  • Unicode encodes characters, not glyphs: U+0067 » g g g g g g g (one character, however many typefaces draw it)
  • Unicode does not encode characters by language: French, German, English j have the same code point even though all have different pronunciations; Chinese 大 (da) has the same code point as Japanese 大 (dai).
  • Length in bytes may not be N * length in characters
  • Not all text is correctly tagged with its charset, so character detection may be necessary. But remember, it's always a guess.
  • Use properties such as Alphabetic, not hard-coded lists: isAlphabetic(), \p{Alphabetic} in regex
  • Transliteration (Ελληνικά ↔ Ellēniká) is not the same as Translation (Ελληνικά ↔ Greek)--users of my transliteration CFC please take note
  • Unicode ≠ Globalization. Unicode provides the basis for software globalization, but there's more work to be done...
  • Don't simply concatenate strings to make messages: the order of components differs by language. Use Java MessageFormat or equivalent. (like the rbJava or javaRv CFCs)
  • Don't put any translatable strings into your code; make sure those are separated into a resource file.
  • Don't assume everyone can read the Latin alphabet. Don't assume icons and symbols mean the same around the world.
  • Tag all data explicitly. Trying to algorithmically determine character encoding and language isn't easy, and can never be exact.
  • Formatting and parsing of dates, times, numbers, currencies, ... are locale-dependent. Use globalization APIs that use appropriate data.
  • If you heuristically compute territory IDs, timezone IDs, currency IDs, etc. make sure the user can override that and pick an explicit value. (ie be automagical about locale choice, etc. but allow the user to manually pick what they want)
  • Don't assume the timezone ID is implied by the user's locale. For the best timezone information, use the TZ database; use CLDR for timezone names.
  • Java globalization support is pretty outdated: use ICU to supplement it. (cf developers should use ICU4J)
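
a rough java sketch of a few of the gotchas above: byte length vs character length, the Alphabetic property, MessageFormat instead of concatenation, and locale-dependent number formatting. it sticks to the plain JDK so it runs anywhere; the class name GotchaSketch, its method names and the sample strings are invented for illustration and aren't from the presentation. note that java's regex spells the property \p{IsAlphabetic}, and ICU4J has its own, richer takes on most of this.

```java
import java.nio.charset.StandardCharsets;
import java.text.MessageFormat;
import java.text.NumberFormat;
import java.util.Locale;

final class GotchaSketch {

    // bytes needed to store s as UTF-8; often not equal to the character count
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // number of code points in s (a surrogate pair counts once)
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // true when s is non-empty and every code point has the Unicode Alphabetic property
    static boolean isAlphabetic(String s) {
        return !s.isEmpty() && s.codePoints().allMatch(Character::isAlphabetic);
    }

    // the same check through regex, using java's spelling of the property
    static boolean isAlphabeticRegex(String s) {
        return s.matches("\\p{IsAlphabetic}+");
    }

    // fill a pattern's numbered placeholders; the pattern (normally pulled from a
    // resource bundle) decides the word order, not the calling code
    static String message(String pattern, Object... args) {
        return new MessageFormat(pattern, Locale.ENGLISH).format(args);
    }

    // format a number using the conventions of the given locale
    static String number(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        String da = "\u5927"; // the han character from the list above
        System.out.println(codePoints(da) + " char, " + utf8Bytes(da) + " bytes"); // 1 char, 3 bytes

        System.out.println(isAlphabetic("\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac")); // true
        System.out.println(isAlphabeticRegex("abc123")); // false

        // same arguments, different order, no code change
        System.out.println(message("{0} uploaded {1} files", "ann", 3));
        System.out.println(message("{1} files were uploaded by {0}", "ann", 3));

        System.out.println(number(1234567.89, Locale.US));      // 1,234,567.89
        System.out.println(number(1234567.89, Locale.GERMANY)); // 1.234.567,89
    }
}
```

in a cf app the patterns would live in a resource bundle per locale, with the locale passed in rather than hard-coded the way Locale.ENGLISH is here.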

April 28, 2005
new sun i18n content
sun has released the latest version of its eGADC Newsletter for folks "who want to know about the latest internationalization and localization developments at Sun". among the more interesting content: you can find sun's g11n site here. and if you're so inclined, you can subscribe to the newsletter here.

October 22, 2004
new i18n w3c faq
if you want to know how the W3C defines g11n, i18n, and l10n, have a look at this. it was prepared by susan k. miller over at Boeing.

but you already know all that....

May 20, 2004
my tools too
a few days ago, sean c. blogged about the tools he was using, which finally prompted me to blog this "me too tools". the g11n world is slightly different in that a "tool" is more often than not a place to find information than a chunk of software. with that in mind here's my tool list too:
  • icu4j: i literally couldn't do g11n work without this java library. while much of its pioneering i18n functionality has been absorbed into the java core, it still offers hard/impossible-to-duplicate functionality like non-gregorian calendars, holidays & super-sized collations. it is the bee's knees of i18n s/w. and of course, it's free.
  • unicode: after watching folks' codepage encoding antics in the user forums, what can i say, just use unicode ©.
  • Common Locale Data Repository: while still in beta, the CLDR is going to be the locale reference. it was thought to be so important that its maintenance was handed off to the unicode organization by the openi18n org. need to know the currency used in Thailand? short weekday names used in Turkish? writing system direction in Afghanistan? this repository is the place to look first. all the info is contained in an XML file per locale (not that i enjoy parsing XML files but i can put up with that chore for the goldmine of locale info it provides).
  • rbManager: if you do g11n work, you build resource bundles (well you should be doing this anyway). if you build resource bundles (rb), then you need a tool. i've looked at and played around with a bunch of rb tools & still haven't found anything as easy to use or as sophisticated as rbmanager, the price (free) is pretty good too. i18nEdit gets an honorable mention for its nifty unicode char picker for those days when you're too lazy to load another locale.
  • SC UniPad: need a unicode text editor that can handle inuktitut and braille at the same time? look no further than the plenty fine SC UniPad. i get a kick out of just using it. also a nice tool to double check rb files.
  • unifier: if you have to batch convert text/html docs from codepage encodings to unicode (and who doesn't) this will probably be the best 15 bucks you'll ever spend.
  • javaInetLocator: i built my geoLocator CFC around nigel wetter's javainetlocator class. if you need to know the country and locale of a user (well their IP anyway), this is probably the best non-commercial tool around (and i can say it's probably better than many commercial ones i've looked at). it's fast (i have another geoLocator tool built around db-based IP range queries and nigel's class beats the pants & socks off of it) and free.
  • iText: i've used this java library quite a bit to burn PDFs. it offers really fine control that we often need (municipal tax receipts for instance) & is a piece of cake to use.
  • cfstudio 5: what can i say, i'm old and in the way. while my colleagues laugh that i still use this "antique", i keep reminding them that muscle memory means more and more as you get older (i've literally pounded the alt f & s keys off of several keyboards over the years while i've had the same industrial-strength ms mouse for almost 10 years). and nope, no reference as i couldn't for the life of me tell you where to buy this these days. that said, i'm trying to give cfEclipse a fair trial (it would help a whole bunch though if it had better docs, hint hint spike).
  • java i18n forums: while i don't spend much time there these days, these forums are still a valuable i18n info source. if you do serious i18n work with cf, you know you have to dip down into java quite a bit and if you get stumped as much as i did, these forums are often a life saver. another good java library/info site is of course IBM's developerWorks. just a for instance, i wanted to learn how to do i18n string searches & "Efficient text searching in java" turns up (yes that article is a bit dated).
  • books-on-line (BoL): i do a lot of work with ms sql server (frankly i prefer it) and the BoL has come to be my constant companion (my cat neutron uses the pile of sql books i've bought over the years as a spot to cat nap--speaking of cats i still get a great kick out of the "my cat hates you" site). you really can't do better than this for an ms sql server reference.

May 5, 2004
the MAT cometh...
microsoft, it seems, is lending i18n app developers a helping hand, at least for apps on xp and 2003 OS's. ms has just announced a beta for MAT. what's MAT? to quote the public site "Microsoft Application Translator (MAT) provides on-the-fly translation of applications' User Interface (UI) from one language to another. Using MAT, you can run applications in your preferred language". in simple terms this means if you develop a desktop app in thai, you could translate it to arabic with MAT.

at first glance it looks like it just does text localization, which, while not the only part of i18n work, is certainly the dreariest. MAT also really won't help apps that aren't at least somewhat internationalized (at least according to the public FAQ). from the public site, i'm not really sure if it does web apps. not sure what smaller localization shops will make of this. it might lose them their marginal/low end business. is nothing safe ;-)

now if we could just get mm to provide native resourceBundle functionality....

March 12, 2004
g11n article on cfdj and Oops
there's an introductory g11n article in the latest issue of coldfusion developer's journal. might be worth a read. next article up is one on character sets & encoding, a subject i truly do not like.

and oops! in my zeal over the cookie encoding issue posted a few days ago, i failed to doublecheck whether the setEncoding function actually works with the cookie scope. it doesn't of course. sorry about that. i wonder if it should?

January 7, 2004
new multilingual web application article on sun site
ok so it is a JSP article (just pretend all the tag based code is actually cf ;-) but the article does contain a boatload of content that applies to g11n cf apps equally well. makes good reading.

November 10, 2003
it's official: canadians no longer matter ;-)
well not really but from an advertising point of view maybe they won't much longer. i often argue that "backyard globalization" is an important point to consider when developing cf applications. if you're not looking to develop fully global apps, at least consider non-english speakers in your "backyard" (let's say it's in the US). well according to this news article (yes, it's also a snazzy cf-powered site), hispanics in the US now outnumber canadians in canada. you're looking at a growing marketplace of 38.8 million people with an estimated $675 billion in annual purchasing power. something to chew over next time you're designing an application.

you can find some more interesting reading on g11n business aspects here. the article on chinese whispers is particularly cool.

ps: yes i know canada is bilingual and a very compelling case for "backyard globalization" too but i just couldn't resist ;-)