i guess we can expect to see this in JDK 1.6 update 4 (latest update is 2). i wonder if i should just pile all the CLDR vs core java locale differences (there's a lot) into a single java bug report?
- zh_SG - Chinese (Simplified), Singapore
- en_MT - English, Malta
- en_PH - English, Philippines
- en_SG - English, Singapore
- el_CY - Greek, Cyprus
- id_ID - Indonesian, Indonesia
- ga_IE - Irish, Ireland
- ms_MY - Malay, Malaysia
- mt_MT - Maltese, Malta
- pt_BR - Portuguese, Brazil
- pt_PT - Portuguese, Portugal
- es_US - Spanish, United States
hopefully this trend will continue.
beyond the new locale data, it also provides support for the Japanese Imperial Calendar, which we can tap into for date conversion and formatting simply by setting ColdFusion's locale to the new JP variant:
<cfscript>
// set appropriate locale
setLocale("ja_JP_JP");
// Japanese Imperial Calendar date format
writeoutput("#lsDateFormat(now(),"FULL")#");
</cfscript>
which should give you something like: 平成19年7月26日. how cool is that?
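for anyone curious what that looks like one layer down, here's a minimal java sketch of the same idea, assuming a JDK 6 or later runtime. the class and method names (ImperialCalendarSketch, imperialYear, fullDate) are made up for illustration, and the exact text of the "FULL" format depends on the locale data your JDK ships, so the sketch doesn't promise a particular string.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// hypothetical helper class, not part of ColdFusion or the JDK
class ImperialCalendarSketch {
    // language "ja", country "JP", variant "JP" selects the Japanese Imperial Calendar
    static final Locale JA_IMPERIAL = new Locale("ja", "JP", "JP");
    static final TimeZone TOKYO = TimeZone.getTimeZone("Asia/Tokyo");

    // build a Date from plain Gregorian fields (month is 1-based here)
    static Date gregorianDate(int year, int month, int day) {
        GregorianCalendar g = new GregorianCalendar(TOKYO, Locale.ROOT);
        g.clear();
        g.set(year, month - 1, day, 12, 0, 0);
        return g.getTime();
    }

    // year within the imperial era, e.g. 2007 falls in Heisei 19
    static int imperialYear(int year, int month, int day) {
        Calendar cal = Calendar.getInstance(TOKYO, JA_IMPERIAL);
        cal.setTime(gregorianDate(year, month, day));
        return cal.get(Calendar.YEAR);
    }

    // the "FULL" style date string for the imperial locale
    static String fullDate(int year, int month, int day) {
        DateFormat df = DateFormat.getDateInstance(DateFormat.FULL, JA_IMPERIAL);
        df.setTimeZone(TOKYO);
        return df.format(gregorianDate(year, month, day));
    }

    public static void main(String[] args) {
        System.out.println(imperialYear(2007, 7, 26));
        System.out.println(fullDate(2007, 7, 26));
    }
}
```

the conversion itself is just a matter of handing the same instant to a calendar created for the ja_JP_JP locale, which is presumably what setLocale() buys us on the ColdFusion side.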
for more details on the new i18n bits in JDK 6 see this.
core java's locale data for en_AU (Australia) and en_NZ (New Zealand) time formats is a bit off. it uses a format of H:mm:ss, where the "H" stands for the 24 hour clock, ie 5:00 PM would be formatted as 17:00. the CLDR (common locale data repository), however, states that the time format for the en_AU and en_NZ locales is h:mm:ss a (well, actually it's proposed to include the timezone, "h:mm:ss a z", see the en_AU time format entry here). most users in those locales are smart enough to get that 17:00 is 5:00 PM when your ColdFusion app outputs time values, but it would play havoc when ColdFusion tries to parse what those same folks would normally input for a time value.
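to make the parsing problem concrete, here's a small java sketch that feeds the same user input to both patterns. it sidesteps the locale data entirely by spelling the patterns out by hand with Locale.US, so it only illustrates the pattern mismatch, not what any particular JDK ships for en_AU. the class and method names are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Locale;
import java.util.TimeZone;

// hypothetical helper class for illustration only
class TimePatternSketch {
    static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    // parse input with the given pattern and return the hour of day (0-23),
    // or -1 if the input can't be parsed at all
    static int parsedHour(String pattern, String input) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
        fmt.setTimeZone(UTC);
        try {
            Calendar cal = Calendar.getInstance(UTC, Locale.US);
            cal.setTime(fmt.parse(input));
            return cal.get(Calendar.HOUR_OF_DAY);
        } catch (ParseException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        // 24 hour pattern: parsing stops after the seconds, so "PM" is silently dropped
        System.out.println(parsedHour("H:mm:ss", "5:00:00 PM"));
        // 12 hour pattern with am/pm marker: the afternoon survives
        System.out.println(parsedHour("h:mm:ss a", "5:00:00 PM"));
        // and the reverse mismatch: 24 hour input against the 12 hour pattern
        System.out.println(parsedHour("h:mm:ss a", "17:00:00"));
    }
}
```

the nasty case is the first one: nothing blows up, the parser simply never looks at the "PM", so a 5 in the afternoon quietly becomes a 5 in the morning.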
so hey en_AU and en_NZ locale people, time to start helping yourselves. Sun has accepted this as a new bug, go vote for it (you have to be a member of the Sun Developer Network to vote but these days, who isn't).
- support for Unicode 5.0
- 25% more CLDR locale data in 245 locales in ICU
- a flexible date/time format generator has been added, allowing for multiple date and time format patterns to be generated that are valid for specific locales (sounds interesting)
- under "Globalization Preferences", a new flexible container for locale data was added
- for more charset conversion bang-for-your-buck, a preview of the ICU4J implementation of the java.nio.charset.Charset API was added
addendum: apparently the nifty timezone bits proposed earlier this year didn't make it into this release. too bad, so sad, could have been very useful.
to recap:
- i18n is a zero level goal (that is the project won't leave home without it).
- it will be based on the icu4j java library, and by "based" i mean every single i18n function, except some parts of the resource bundle CFC and (probably) the Gregorian calendar, will be derived from it.
- besides the basic Gregorian calendar most ColdFusion developers are familiar with, this project will also include Buddhist, Chinese, Japanese, Islamic, and Hebrew calendars to handle that tricky calendar math.
- user centric timezone, users will see datetimes in their individual timezones--and yes, even this functionality will come out of icu4j. by divorcing this functionality from core Java, the project will be able to take advantage of icu4j's more frequent updates.
- locale based collation (sorting).
- strict use of resource bundles (rb), you will be able to l10n skin this puppy, though we haven't 100% decided on the "recommended" rb management tool yet. besides icu4j's rb manager, any ideas?
- standard localized date/numeric/currency formatting, all hail CLDR.
- the project will make use of the super cool JavaLoader in order to load the icu4j from off the server classpath (shared hosts will not be a problem). this also allows for painless updating of the icu4j jar file.
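as a rough illustration of the user centric timezone item above, here's a minimal java sketch that renders one stored instant for users in different timezones. note the swap: it uses core java's java.util.TimeZone purely so the example runs without extra jars, whereas the project plans to take this from icu4j. the class name, method names and sample zones are all made up for the example.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// hypothetical helper class, a stand-in for whatever the project ends up using
class UserTimeZoneSketch {
    // build an instant from UTC fields (month is 1-based here)
    static long utcMillis(int year, int month, int day, int hour, int minute) {
        GregorianCalendar g = new GregorianCalendar(TimeZone.getTimeZone("UTC"), Locale.ROOT);
        g.clear();
        g.set(year, month - 1, day, hour, minute, 0);
        return g.getTimeInMillis();
    }

    // format the same instant as wall-clock time in the user's own zone
    static String forUser(long epochMillis, String zoneId) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone(zoneId));
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long posted = utcMillis(2006, 5, 8, 12, 0);
        System.out.println(forUser(posted, "UTC"));
        System.out.println(forUser(posted, "Asia/Bangkok"));
        System.out.println(forUser(posted, "America/New_York"));
    }
}
```

the point of the design is that the datetime is stored once, in one zone, and only the formatting step knows who is looking at it.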
so, have we missed anything? some i18n related functionality we've overlooked? any rb management tool you particularly like? if you have any ideas please submit them here as comments or better yet via the UI preview. we'd really appreciate it. thanks.
for more information on the project see the "BoardFusion News Page" and the Project Wiki.
one of the side effects of this core java locale is that ColdFusion's old locale name Norwegian (Nynorsk) actually produces no_NO locale data. any legacy apps still using this locale identifier are probably telling people the wrong thing, for example:
writeoutput('#lsDateFormat(now(),"DDDD")#');
produces: mandag
while
setLocale('Norwegian (Nynorsk)');
writeoutput('#lsDateFormat(now(),"DDDD")#');
also produces: mandag
icu4j on the other hand produces:
måndag for nn_NO
mandag for nb_NO
it looks like ColdFusion got tripped up on the "variant instead of language" locale.
taking this a step further, doing a "FULL" date format shows up even larger differences between core java and icu4j:
core java
8. mai 2006 for no_NO
8. mai 2006 for no_NO_NY
icu4j
måndag 8. mai 2006 for nn_NO
mandag 8. mai 2006 for nb_NO
oops. to my way of thinking, a "FULL" date format should include the day name as well as the rest of the date (date in month, month and year). i really wish ColdFusion would use icu4j.
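if you want to poke at this from the java side, here's a small sketch of the locales involved. the class and method names are hypothetical, and it deliberately makes no claim about which day name your JDK will print, since that depends on the locale data it ships. the toLanguageTag() call needs java 7 or later.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// hypothetical helper class for illustration only
class NorwegianLocaleSketch {
    // core java's legacy spelling: Nynorsk is a *variant* of "no"
    static final Locale BOKMAL_LEGACY = new Locale("no", "NO");
    static final Locale NYNORSK_LEGACY = new Locale("no", "NO", "NY");
    // the CLDR/icu4j spelling: Nynorsk is its own *language*
    static final Locale NYNORSK = new Locale("nn", "NO");

    // show which part of the identifier carries the "Nynorsk" information
    static String describe(Locale l) {
        return l + " -> language=" + l.getLanguage() + ", variant=" + l.getVariant();
    }

    // full day name as the runtime's locale data sees it
    static String dayName(Locale locale, Date when) {
        return new SimpleDateFormat("EEEE", locale).format(when);
    }

    public static void main(String[] args) {
        Date now = new Date();
        for (Locale l : new Locale[] { BOKMAL_LEGACY, NYNORSK_LEGACY, NYNORSK }) {
            System.out.println(describe(l) + ", day=" + dayName(l, now));
        }
        // later JDKs map the legacy variant form onto the language form
        System.out.println(NYNORSK_LEGACY.toLanguageTag());
    }
}
```

anything that only looks at the language and country parts of no_NO_NY will see plain no_NO, which would fit the behaviour above.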
and the "A-Go-Go" reference? nothing to do with g11n or ColdFusion, just been listening to a lot of Dengue Fever lately and that song has just stuck in my head ;-)
- a complete set of POSIX-format data generated, along with a tool to generate different platform versions.
- the addition of new data to support localization of timezones
- the addition of data for UN M.49 regions, including continents and regions
- the canonicalization (data in many forms converted to a "standard" form) of the data files, including the consolidation of inherited data
- currency codes are restricted to ISO 4217 codes (historical as well)
- number and data tests to verify LDML implementations
- metadata for LDML
- mappings from language to script and territory
- various other fixes and additions of data, and extensions to the specification
for more details see the press blurb and the version information page.
as a reminder, icu4j makes use of the CLDR for its locale data. hubba hubba.
as usual, you can file any bugs you find here.
on a somewhat "lighter" note, the wikipedia has a page on the "Heavy metal umlaut". it's an interesting read on the use (gratuitous and otherwise) of the umlaut (Ä ä Ö ö Ü ü) associated with heavy metal music. and no, i'm not a heavy metal fan, though i guess the Led Zeppelin of my youth might qualify (i can only listen to I to III these days, untitled onwards kind of lost me).
- ICU main page
- library's download page
- ICU documentation page, with the icu4j API docs now here
- icu4j FAQ
- RB manager
- additional docs
on the topic of icu4j, i knocked off a couple of pages to explore its new ULocale class (after somebody asked me how many new locales there were for India and i had no idea). i was surprised by the answer.
if that doesn't surprise you, try the United Kingdom or Ethiopia.
and you can now use Java style locale identifiers like ar_AE instead of the "pretty" locale name Arabic (United Arab Emirates), so now it's that much easier to synch up your calls to core Java's ResourceBundle class from cf. and you can buy into all that locale info using the super simple setLocale() function.
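to show why matching identifiers matter on the ResourceBundle side, here's a minimal java sketch that turns an identifier like ar_AE into a Locale and lists the bundle names core java would look for, in order. the base name "messages" and the class and method names are assumptions for the example, and no actual bundle files are needed to run it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

// hypothetical helper class for illustration only
class LocaleIdSketch {
    // split a java style identifier (language_COUNTRY_VARIANT) into a Locale
    static Locale fromId(String id) {
        String[] parts = id.split("_", 3);
        String language = parts[0];
        String country = parts.length > 1 ? parts[1] : "";
        String variant = parts.length > 2 ? parts[2] : "";
        return new Locale(language, country, variant);
    }

    // bundle names ResourceBundle's default control would try, most specific first
    static List<String> lookupOrder(String baseName, String id) {
        ResourceBundle.Control control =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        List<String> names = new ArrayList<String>();
        for (Locale candidate : control.getCandidateLocales(baseName, fromId(id))) {
            names.add(control.toBundleName(baseName, candidate));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(lookupOrder("messages", "ar_AE"));
    }
}
```

with the same ar_AE string on both sides there's no lookup table needed to translate "pretty" names into something ResourceBundle understands.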
of course, as soon as i get what i've asked for after years of asking, i find some new plaything. as you might have read in this blog, icu4j's latest release (3.2) switched to the CLDR's locales, all 232 of them (with 60 more in beta). the graph below compares cf with and without icu4j.
gives you pause, which should i use for locale support? oh my. i'll be revisiting this issue again.
you can pick up the cldr here and read more about it here.
via the unicode mailing list.
in case you're interested, there's also a cldr wiki.
- better documentation for date/number format patterns (one of my favorites)
- added stuff about references/validity/etc.
- new timezone localization model
- weekend data
- added Oriya, Malayalam, Assamese, Welsh, Dzongkha, Bhutan, Khmer and Lao (woohoo se asian) locales
- added more country, language, currency, and type display name data for ar, bg, cs, el, he, hr, hu, is, mk, pl, ro, ru, sk, sl, sr, tr, uk (the arabic stuff is way cool)
read more on the cldr website. you can compare the cldr versus platform data here. and you can report bugs here.
via the unicode mailing list.

