August 8, 2007
wow, that was fast
i submitted a bug to sun on 18-may about australian and new zealand time formats being wrong compared to the CLDR (the CLDR & common experience say it should be "h:mm:ss a", ie 12 hour AM/PM format, while core java thinks it should be "H:mm:ss", ie 24hr format). according to this (might require login) it was fixed on 21-may. the funny thing is that i was only informed via the bug parade just "now" (7-aug). also funny was that it attracted only 1 vote. what, are you guys down there all asleep?

i guess we can expect to see this in JDK 1.6 update 4 (latest update is 2). i wonder if i should just pile all the CLDR vs core java locale differences (there's a lot) into a single java bug report?

July 26, 2007
scorpio's i18n changes
in case you were wondering, the main i18n changes for scorpio (coldfusion 8) really revolved around upgrading coldfusion's JDK to version 6. what did that buy us? well, core Java's first set of locales based on CLDR data:

  • zh_SG - Chinese (Simplified), Singapore
  • en_MT - English, Malta
  • en_PH - English, Philippines
  • en_SG - English, Singapore
  • el_CY - Greek, Cyprus
  • id_ID - Indonesian, Indonesia
  • ga_IE - Irish, Ireland
  • ms_MY - Malay, Malaysia
  • mt_MT - Maltese, Malta
  • pt_BR - Portuguese, Brazil
  • pt_PT - Portuguese, Portugal
  • es_US - Spanish, United States

hopefully this trend will continue.

beyond the new locale data, JDK 6 also provides support for the Japanese Imperial Calendar, which we can tap into for date conversion and formatting simply by setting coldfusion's locale to the new JP variant:

<cfscript>
// set appropriate locale
setLocale("ja_JP_JP");
// Japanese Imperial Calendar date format
writeoutput("#lsDateFormat(now(),"FULL")#");
</cfscript>

which should give you something like 平成19年7月26日. how cool is that?
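for the Java-minded, here's a minimal sketch of the same trick in plain JDK code. it assumes only that the ja_JP_JP variant selects the Japanese Imperial Calendar, which is what that variant is for; the class and method names are made up for illustration. `getCalendarType()` needs Java 8 or later, and the exact FULL pattern differs between JDK releases, so the formatted string isn't shown.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

public class ImperialCalendarDemo {
    // the ja_JP_JP variant asks core Java for the Japanese Imperial Calendar
    static final Locale JA_JP_JP = new Locale("ja", "JP", "JP");

    // returns a Calendar for the given locale, positioned at the given date
    static Calendar calendarFor(Locale locale, Date date) {
        Calendar cal = Calendar.getInstance(locale);
        cal.setTime(date);
        return cal;
    }

    public static void main(String[] args) {
        Date date = new GregorianCalendar(2007, Calendar.JULY, 26).getTime();
        Calendar cal = calendarFor(JA_JP_JP, date);
        System.out.println(cal.getCalendarType());  // japanese
        System.out.println(cal.get(Calendar.YEAR)); // 19, ie Heisei 19

        // FULL date in the imperial calendar, roughly lsDateFormat(now(),"FULL")
        DateFormat full = DateFormat.getDateInstance(DateFormat.FULL, JA_JP_JP);
        System.out.println(full.format(date));
    }
}
```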

for more details on the new i18n bits in JDK 6 see this.

May 19, 2007
God helps those who help themselves
since it looks like they'll be playing ice hockey in hell before ColdFusion makes use of the very cool icu4j library, i figure we'd better start helping core java get its locale resource act together. so let's start somewhere near my neighborhood, australia & new zealand.

core java's locale data for en_AU (Australia) and en_NZ (New Zealand) time formats is a bit off. it uses a format of H:mm:ss, where the "H" stands for the 24 hour clock, ie 5:00 PM would be formatted as 17:00. the CLDR (Common Locale Data Repository), however, states that the time format for the en_AU & en_NZ locales is h:mm:ss a (well, actually it's proposed to include the timezone, "h:mm:ss a z"; see the en_AU time format entry here). while most users in those locales are smart enough to get that 17:00 is 5:00 PM when your ColdFusion app outputs time values, the mismatch plays havoc when ColdFusion tries to parse what those same folks would normally input for a time value, such as "5:00 PM".
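to see why the pattern matters, here's a small core Java sketch (class and method names are hypothetical) that formats 17:00 with both patterns and then parses what an en_AU user would naturally type. it uses Locale.ENGLISH rather than en_AU so the AM/PM marker is predictable across JDK versions, and UTC so the result doesn't depend on the server's timezone. note that `DateFormat.parse(String)` doesn't have to consume the whole string, so "5:00:00 PM" against H:mm:ss quietly comes back as 5 in the morning instead of failing.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class TimeFormatDemo {
    static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    // Locale.ENGLISH keeps the marker as "AM"/"PM"; UTC keeps results server-independent
    static SimpleDateFormat formatter(String pattern) {
        SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.ENGLISH);
        f.setTimeZone(UTC);
        return f;
    }

    // parses input with the pattern and returns the 24-hour clock hour, or -1 on failure
    static int parsedHour(String pattern, String input) {
        try {
            Calendar cal = Calendar.getInstance(UTC);
            cal.setTime(formatter(pattern).parse(input));
            return cal.get(Calendar.HOUR_OF_DAY);
        } catch (ParseException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        Date fivePm = new Date(17L * 60 * 60 * 1000); // 17:00:00 UTC on 1 Jan 1970
        System.out.println(formatter("H:mm:ss").format(fivePm));   // 17:00:00 (core java)
        System.out.println(formatter("h:mm:ss a").format(fivePm)); // 5:00:00 PM (CLDR)

        // what an en_AU user would naturally type
        System.out.println(parsedHour("H:mm:ss", "5:00:00 PM"));   // 5, " PM" is silently ignored
        System.out.println(parsedHour("h:mm:ss a", "5:00:00 PM")); // 17
    }
}
```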

so hey en_AU and en_NZ locale people, time to start helping yourselves. Sun has accepted this as a new bug, go vote for it (you have to be a member of the Sun Developer Network to vote but these days, who isn't).

September 9, 2006
icu4j 3.6 hits alpha
the ICU project has announced the release of an alpha version of icu4j 3.6. you can grab this cool java library here. so what's new for 3.6? according to the brief release notes:
  • support for Unicode 5.0
  • 25% more CLDR locale data in 245 locales in ICU
  • a flexible date/time format generator has been added, allowing for multiple date and time format patterns to be generated that are valid for specific locales (sounds interesting)
  • under "Globalization Preferences", a new flexible container for locale data was added
  • for more charset conversion bang-for-your-buck, a preview of the ICU4J implementation of the java.nio.charset.Charset API was added

addendum: apparently the nifty timezone bits proposed earlier this year didn't make it into this release. too bad, so sad, could have been very useful.

May 15, 2006
BoardFusion's i18n bits
just in case you missed it, the BoardFusion project (BF) has released a preview of the user interface (UI). and while the project accumulated a lot of useful comments, i'm posting this to solicit some specific i18n feedback before we close the books on the UI preview. so this is kind of the "last gas for 200 km" review.

to recap:

  • i18n is a zero level goal (that is, the project won't leave home without it).
  • it will be based on the icu4j java library, and by based i mean every single i18n function, except some parts of the resource bundle CFC and (probably) the Gregorian calendar, will be derived from it.
  • besides the basic Gregorian calendar most ColdFusion developers are familiar with, this project will also include Buddhist, Chinese, Japanese, Islamic, and Hebrew calendars to handle that tricky calendar math.
  • user centric timezones: users will see datetimes in their individual timezones, and yes, even this functionality will come out of icu4j. by divorcing this functionality from core Java, the project will be able to take advantage of icu4j's more frequent updates.
  • locale based collation (sorting).
  • strict use of resource bundles (rb), you will be able to l10n skin this puppy, though we haven't 100% decided on the "recommended" rb management tool yet. besides icu4j's rb manager, any ideas?
  • standard localized date/numeric/currency formatting, all hail CLDR.
  • the project will make use of the super cool JavaLoader in order to load icu4j from outside the server classpath (so shared hosts will not be a problem). this also allows for painless updating of the icu4j jar file.
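locale based collation is the easiest item on that list to show in a few lines. the project plans to take it from icu4j, but as a stand-in here's a minimal sketch using core Java's java.text.Collator instead (the class name and word list are made up for illustration). Swedish files ö after z while German sorts it alongside o, so the same three words come out in different orders. the ö is written as a \u00f6 escape so the file compiles regardless of source encoding.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CollationDemo {
    // returns a sorted copy of words using the locale's collation rules
    static List<String> sorted(List<String> words, Locale locale) {
        String[] copy = words.toArray(new String[0]);
        Arrays.sort(copy, Collator.getInstance(locale));
        return Arrays.asList(copy);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("zebra", "\u00f6l", "apa");
        System.out.println(sorted(words, new Locale("sv", "SE"))); // [apa, zebra, öl]
        System.out.println(sorted(words, Locale.GERMAN));          // [apa, öl, zebra]
    }
}
```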

so, have we missed anything? some i18n related functionality we've overlooked? any rb management tool you particularly like? if you have any ideas please submit them here as comments, or better yet via the UI preview. we'd really appreciate it. thanks.

for more information on the project see the "BoardFusion News Page" and the Project Wiki.
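the user centric timezone item on that list boils down to storing one instant and rendering it in each user's own zone. here's a minimal sketch with core Java's java.time (Java 8 or later), swapped in purely for illustration since the project itself plans to get its timezone data from icu4j; the class name and zone IDs are just examples.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class UserTimeZones {
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");

    // renders one stored instant in a particular user's timezone
    static String forUser(Instant stored, String userZoneId) {
        return FMT.format(stored.atZone(ZoneId.of(userZoneId)));
    }

    public static void main(String[] args) {
        Instant posted = Instant.parse("2006-05-15T12:00:00Z");
        System.out.println(forUser(posted, "Australia/Sydney")); // 2006-05-15 22:00
        System.out.println(forUser(posted, "Asia/Bangkok"));     // 2006-05-15 19:00
        System.out.println(forUser(posted, "America/New_York")); // 2006-05-15 08:00
    }
}
```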

May 10, 2006
norwegian locale A-Go-Go
some recent work has me again turning over the rocks where core java locales are hiding, and once again a closer look at what crawled out reveals just how sweet icu4j's locale support really is. according to several resources, such as ethnologue and the odin archive (gotta love that name), norway has two main written languages, Bokmål and Nynorsk, with Bokmål being dominant. in core java there is one locale (well, two if you include the plain norwegian language, no) and one variant for norway: no_NO and no_NO_NY. assuming core java meant Bokmål for plain Norwegian (no and no_NO), then i suppose the variant (no_NO_NY) is for Nynorsk. huh? but i thought Nynorsk was a language? why is it a variant here? in icu4j, which uses the CLDR for its locale data, we can see two locales (four if you count the plain nb/Bokmål and nn/Nynorsk languages): nb_NO (Norwegian Bokmål) and nn_NO (Norwegian Nynorsk). neat and tidy.
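a small sketch of the identifier muddle from the core Java side (class and method names are hypothetical). since Java 7, `Locale.toLanguageTag()` special-cases the old no_NO_NY variant and reports it as nn-NO, which is core java quietly conceding that Nynorsk is a language after all. the Monday names are printed rather than checked, because they depend on which locale data your JDK ships.

```java
import java.text.DateFormatSymbols;
import java.util.Calendar;
import java.util.Locale;

public class NorwegianLocales {
    static final Locale[] LOCALES = {
        new Locale("no", "NO"),       // core java: plain Norwegian
        new Locale("no", "NO", "NY"), // core java: Nynorsk as a variant
        new Locale("nb", "NO"),       // CLDR: Norwegian Bokmal
        new Locale("nn", "NO")        // CLDR: Norwegian Nynorsk
    };

    // full weekday name for Monday from the locale's data
    static String monday(Locale locale) {
        return DateFormatSymbols.getInstance(locale).getWeekdays()[Calendar.MONDAY];
    }

    public static void main(String[] args) {
        for (Locale l : LOCALES) {
            System.out.println(l + " -> tag " + l.toLanguageTag() + ", Monday: " + monday(l));
        }
    }
}
```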

one of the side effects of this core java locale is that ColdFusion's old locale name Norwegian (Nynorsk) actually produces no_NO locale data. any legacy apps still using this locale identifier are probably telling people the wrong thing, for example:

setLocale('Norwegian (Bokmal)');
writeoutput('#lsDateFormat(now(),"DDDD")#');
produces: mandag

while
setLocale('Norwegian (Nynorsk)');
writeoutput('#lsDateFormat(now(),"DDDD")#');
also produces: mandag

icu4j on the other hand produces:
måndag for nn_NO
mandag for nb_NO

it looks like ColdFusion got tripped up on the "variant instead of language" locale.

taking this a step further, doing a "FULL" date format shows even larger differences between core java and icu4j:

core java
8. mai 2006 for no_NO
8. mai 2006 for no_NO_NY

icu4j
måndag 8. mai 2006 for nn_NO
mandag 8. mai 2006 for nb_NO

oops. to my way of thinking, a "FULL" date format should include the day name as well as the rest of the date (date in month, month and year). i really wish ColdFusion would use icu4j.
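if you want to repeat the comparison from plain Java, something like this sketch (class and method names are hypothetical) prints the FULL date for 8 May 2006 in each of the four locales. the results depend heavily on the JDK: older releases use Sun's own locale data, while JDK 9 and later default to CLDR data, so don't expect the output to match the lists above exactly.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

public class NorwegianFullDates {
    static final TimeZone OSLO = TimeZone.getTimeZone("Europe/Oslo");

    // FULL-style date for 8 May 2006 in the given locale
    static String fullDate(Locale locale) {
        Calendar cal = new GregorianCalendar(OSLO);
        cal.clear();
        cal.set(2006, Calendar.MAY, 8);
        DateFormat f = DateFormat.getDateInstance(DateFormat.FULL, locale);
        f.setTimeZone(OSLO);
        return f.format(cal.getTime());
    }

    public static void main(String[] args) {
        Locale[] locales = {
            new Locale("no", "NO"), new Locale("no", "NO", "NY"),
            new Locale("nb", "NO"), new Locale("nn", "NO")
        };
        for (Locale l : locales) {
            System.out.println(l + ": " + fullDate(l));
        }
    }
}
```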

and the "A-Go-Go" reference? nothing to do with g11n or ColdFusion, i've just been listening to a lot of Dengue Fever lately and that song is stuck in my head ;-)

June 4, 2005
eat your heart out core java
the unicode consortium has announced the release of version 1.3 of the Common Locale Data Repository (CLDR). this release pumps up the locale data from 230+ to 296 locales (96 languages and 130 territories). this release's highlights include:
  • a complete set of POSIX-format data generated, along with a tool to generate different platform versions.
  • the addition of new data to support localization of timezones
  • the addition of data for UN M.49 regions, including continents and regions
  • the canonicalization (data in many forms converted to a "standard" form) of the data files, including the consolidation of inherited data
  • currency codes are restricted to ISO 4217 codes (historical as well)
  • number and data tests to verify LDML implementations
  • metadata for LDML
  • mappings from language to script and territory
  • various other fixes and additions of data, and extensions to the specification

for more details see the press blurb and the version information page.

as a reminder, icu4j makes use of the CLDR for its locale data. hubba hubba.

April 23, 2005
cldr 1.3 goes beta
the unicode consortium has announced the release of cldr 1.3 beta version. chief among the new stuff is data to support timezone localization, data for UN M.49 regions (including continents and regions), and some number and data tests to help you verify your implementation. the only thing i'm not yet seeing is a clear/standard indication of writing system directionality. you still have to read through the data looking for "hints". not that i don't simply just love staring at pages and pages of XML goop, but i sure wish there were something i could quickly search for.
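core Java doesn't read directionality out of CLDR data either, but it does offer a coarse per-locale hint through java.awt.ComponentOrientation. here's a minimal sketch (class name hypothetical); treat the answer as a hint rather than a substitute for proper script data.

```java
import java.awt.ComponentOrientation;
import java.util.Locale;

public class Directionality {
    // true when core Java reports right-to-left orientation for the locale
    static boolean isRightToLeft(Locale locale) {
        return !ComponentOrientation.getOrientation(locale).isLeftToRight();
    }

    public static void main(String[] args) {
        String[] tags = {"ar-AE", "fa-IR", "ur-PK", "en-US", "th-TH"};
        for (String tag : tags) {
            Locale l = Locale.forLanguageTag(tag);
            System.out.println(tag + " right-to-left? " + isRightToLeft(l));
        }
    }
}
```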

as usual, you can file any bugs you find here.

on a somewhat "lighter" note, the wikipedia has a page on the "Heavy metal umlaut". it's an interesting read on the use (gratuitous and otherwise) of the umlaut (Ä ä Ö ö Ü ü) associated with heavy metal music. and no, i'm not a heavy metal fan, though i guess the led zeppelin of my youth might qualify (i can only listen to I to III these days; untitled onwards kind of lost me).

February 19, 2005
icu4j has moved
just in case you haven't been notified, the icu4j sites have moved.

on the topic of icu4j, i knocked off a couple of pages to explore its new ULocale class (after somebody asked me how many new locales there were for India and i had no idea). i was surprised by the answer.

if that doesn't surprise you, try the United Kingdom or Ethiopia.
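those pages used icu4j's ULocale. as a rough core Java counterpart, this sketch (class and method names hypothetical) lists the locales the running JDK reports for a given country code. the counts vary a lot between JDK versions, so none are shown here.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class LocalesByCountry {
    // every installed locale whose country code matches, sorted by identifier
    static List<Locale> localesFor(String countryCode) {
        return Arrays.stream(Locale.getAvailableLocales())
                .filter(l -> l.getCountry().equals(countryCode))
                .sorted(Comparator.comparing(Locale::toString))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (String cc : new String[] {"IN", "GB", "ET"}) {
            List<Locale> found = localesFor(cc);
            System.out.println(cc + ": " + found.size() + " " + found);
        }
    }
}
```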

February 7, 2005
blackstone locales
maybe i didn't look hard enough, but i haven't seen any mention of locales in any of the blogs/articles/etc. concerning the release of blackstone (now officially known as ColdFusion MX 7). ditto during the beta pr period. no idea why, but it sure seems like hiding your light under a bushel. if you're a g11n developer, Blackstone's going to be a real eye-opener. core Java's locales are now Blackstone's locales. from the measly 20-odd locales in cfmx 6.1, Blackstone gives us 130. the figure below compares locale support across different versions of cf. pretty cool, huh?

cf supported locales

and you can now use Java style locale identifiers like ar_AE instead of the "pretty" locale name Arabic (United Arab Emirates), so now it's that much easier to synch up your calls to core Java's ResourceBundle class from cf. and you can buy into all that locale info using the super simple setLocale() function.
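on the Java side, the identifier and the "pretty" name are just two views of the same Locale. here's a minimal sketch (the `fromJavaId` helper is hypothetical, not a ColdFusion or JDK function) that turns "ar_AE" back into a Locale and asks for its English display name; the resulting Locale is what you'd hand to `ResourceBundle.getBundle(baseName, locale)`.

```java
import java.util.Locale;

public class LocaleIds {
    // hypothetical helper: splits a Java-style id like "ar_AE" into a Locale
    static Locale fromJavaId(String id) {
        String[] parts = id.split("_", 3);
        String language = parts[0];
        String country = parts.length > 1 ? parts[1] : "";
        String variant = parts.length > 2 ? parts[2] : "";
        return new Locale(language, country, variant);
    }

    public static void main(String[] args) {
        Locale ae = fromJavaId("ar_AE");
        System.out.println(ae.getDisplayName(Locale.ENGLISH)); // Arabic (United Arab Emirates)
    }
}
```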

of course, as soon as i get what i've asked for after years of asking, i find some new plaything. as you might have read in this blog, icu4j's latest release (3.2) switched to the CLDR's locales, all 232 of them (with 60 more in beta). the graph below compares cf with and without icu4j.

cf w/icu4j supported locales

gives you pause, which should i use for locale support? oh my. i'll be revisiting this issue again.

November 5, 2004
cldr 1.2 released
the unicode consortium has announced the release of version 1.2 of the Common Locale Data Repository (cldr). quoting the press release, the latest version contains "232 locales, covering 72 languages and 108 territories. There are also 63 draft locales in the process of being developed, covering an additional 27 languages and 28 territories". wow.

you can pick up the cldr here and read more about it here.

via the unicode mailing list.

October 21, 2004
cldr 1.2 in beta
the latest version of the cldr (1.2) has entered beta. of particular interest are the 'interim vetting charts', which give you a sneak preview of what's been changed & what's coming for the release version. many of these are "common" changes such as localized territory names, etc., but there is some local stuff that's been "fixed".

in case you're interested, there's also a cldr wiki.

October 1, 2004
cldr 1.2 alpha
unicode has just announced the public release of the alpha version of the cldr (Common Locale Data Repository). some of the highlights include:

  • better documentation for date/number format patterns (one of my favorites)
  • added stuff about references/validity/etc.
  • new timezone localization model
  • weekend data
  • added Oriya, Malayalam, Assamese, Welsh, Dzongkha, Bhutan, Khmer and Lao (woohoo se asian) locales
  • added more country, language, currency, and type display name data for ar, bg, cs, el, he, hr, hu, is, mk, pl, ro, ru, sk, sl, sr, tr, uk (the arabic stuff is way cool)

read more on the cldr website. you can compare the cldr versus platform data here. and you can report bugs here.

via the unicode mailing list.