January 16, 2009
icu4j 4.0.1 released
the icu4j project has just released version 4.0.1. it's a regular maintenance release with the following changes (common across all flavors):
  • Unicode 5.1
  • locale data: Common Locale Data Repository (CLDR) 1.6
  • charset converter file size improvement
  • date interval formatting (note only gregorian calendar is supported in this release)
  • improved plural support

specific icu4j changes include:

  • charset
    • ISO-2022 converter
    • HZ converter
    • SCSU/BOCU-1 converter
    • charset converter callback
  • thai dictionary break iterator (yeah)
  • JDK TimeZone support (this is pretty decent as you can now share tz IDs between coldfusion/core java & icu4j)
  • locale service provider
  • more convenient formatting of year+month, day+month, and other combinations
  • simple duration formatting
i guess it's time to update the icu4j CFCs for the new formatting bits. as usual you can download the new version from here. btw you can still get a hold of the icu4j tools here.

July 11, 2008
icu4j 4.0 hits the streets
the latest version of the super cool icu4j i18n library has been released. the big changes (to me) are:
  • that it has upgraded its resource data to Unicode 5.1 and CLDR 1.6
  • added date interval formatting (ie Jan 10, 2008 to Jan 20, 2008 becomes Jan 10-20, 2008, 10:10am to 11:10am becomes 10:10-11:10am, etc.). downside is that currently it's gregorian calendar only
  • added DurationFormat so you can now format over a duration in time such as "2 days from now" or "3 hours ago".
  • added "Locale Service Provider" support for core java's new locale service--many folks just want the filthy-rich and frequently-updated locale data that icu4j has and not the whole library. i wonder if there is a way to backdoor this into coldfusion's locales?

you can grab the jar files/api docs and read more about the new stuff here.

December 13, 2007
icu4j 3.8.1 maintenance release
not much new stuff here:
  • updated to use CLDR version 1.5.1
  • updates to timezone formatting and parsing (haven't checked if the bananas tz update is included)
  • some bug fixes detailed here in the readme file

download the lib here: icu4j 3.8.1

September 15, 2007
icu4j 3.8 final released
the final version of icu4j version 3.8 has just been released. to recap what's in this release:

  • uses the latest cldr 1.5.0.1 locale data
  • the long discussed rule based timezone changes, which give us the ability to read and write timezone data in RFC2445 VTIMEZONE format as well as access to Olson timezone transitions! this is something many people have been needing for quite some time now and it's going to be very useful
  • taiwanese calendar (a flavor of gregorian calendar that numbers years since 1912AD)
  • the Indian National Calendar (more complicated flavor of the gregorian calendar, eg it's synched up with the gregorian calendar's leap years but the extra day is added to the first month, Chaitra which starts march 22 on gregorian calendar--so, yup, it's complicated)
  • charset conversion bugs were fixed and CESU-8, UTF-7 and ISCII converters have been added. also some conversion speed improvements. the UTF-7 one will be useful for email (bounce) handling
  • a new MessageFormat type for plurals was added
  • a pretty useful new DurationFormat class was added so you can format messages over a duration in time such as "2 days from now" or "3 hours ago"
  • also the MessageFormat class will now take named arguments instead of just arrays (too bad now that coldfusion 8's javacast got a shot of steroids)
  • new BIDI stuff (which i still need to investigate)

next month i'll be adding the new calendars as CFCs to the usual bits. i'll also be making some significant changes to most of the i18n formatting methods to take better advantage of the calendar, etc. keywords (en_GB@calendar=indian;currency=EUR) on the ULocale class (icu4j's super cool locale class).

unfortunately the persian calendar still appears to be in icu4c (C/C++) only.

August 9, 2007
icu4j 3.8 draft released
a draft of icu4j version 3.8 has just been released. what's so hot about this release? well a lot actually:

  • it uses the latest and greatest cldr 1.5 locale data
  • the long discussed rule based timezone changes, which give us the ability to read and write timezone data in RFC2445 VTIMEZONE format as well as access to Olson timezone transitions! this is stuff many people have been looking for and it's going to be very useful
  • taiwanese calendar (which i never knew existed, looks like a flavor of gregorian calendar that numbers years since 1912AD)
  • the Indian National Calendar (ditto though looks like a more complicated flavor of the gregorian calendar, eg it's synched up with the gregorian calendar's leap years but the extra day is added to the first month, Chaitra which starts march 22 on gregorian calendar--so, yup, it's complicated)
  • charset conversion bugs were fixed and CESU-8, UTF-7 and ISCII converters have been added. also some conversion speed improvements. i think the UTF-7 one looks pretty useful
  • a new MessageFormat type for plurals was added, looks like some eastern european languages have complicated rules for plurals
  • a new DurationFormat class so you can format messages over a duration in time such as "2 days from now" or "3 hours ago" (this one looks useful)
  • also the MessageFormat class will now take named arguments instead of just arrays (too bad now that coldfusion 8's javacast got a shot of steroids)
  • bunch of new BIDI stuff (which needs some investigating)

i'll be adding the new calendars as CFCs to the usual bits as soon as i do enough background research on them to understand any "quirks". i'll also be making some significant changes to most of the i18n formatting methods to take better advantage of the calendar, etc. keywords (en_GB@calendar=indian;currency=EUR) on the ULocale class (icu4j's super cool locale class).

looks like a persian calendar was also added but appears to be in icu4c (C/C++) only for the time being.

wow, fun times in the old town tonite (it's actually in the AM in bangkok but you get the idea).

August 1, 2007
PHP i18n
normally i would say that PHP's unicode/i18n support is fairly lame compared to coldfusion (actually i'd call it a joke but i'm not trying to be controversial here). well i stumbled on an interesting line on the ICU site concerning how PHP 6 would be using the ICU library (icu4j's sister C/C++ library). i was sort of shocked that PHP was considering this (hey PHP is lame after all), so thinking maybe this was market-speak or just plain wishful thinking, i googled it and turned up plenty of references including this article.

first this article confirms that PHP's unicode/i18n support really is lame (also see this article for a bit older take on PHP's unicode/i18n support, i especially liked the "Unicode should have been in PHP five years ago" quote). but more importantly, and what's surprising to me, is that they're actually doing something about it by adopting ICU: going from being an i18n joke to fully supporting unicode/i18n via the ICU project. i know next to nothing about the PHP world so i have no idea if this is really happening (or has already happened) or is just hot air but it looks like they're on the right track with ICU.

wonder if there's a lesson here?

May 21, 2007
party like it's 1999
there was a recent article in Time that once again reminds me that the world is a big, complex place. while at one time Ethiopia was probably best known for famine & despair and LiveAid (though i prefer to recall their great long distance runners & links to bob marley), come september 11 (yes, 9/11) they'll literally be partying like it's 1999 because in the Ethiopic calendar (also known as the Ge'ez calendar) it is 1999. september 11 marks the end of the 20th century according to their calendar. the Ethiopic calendar, which is kind of based on the julian calendar, has twelve months of 30 days each plus a "13th" month consisting of five or six epagomenal days (fancy way of saying inserting a leap day, etc to make a calendar follow the seasons or moon phases). and since i know you're dying to know, today, 21-May-2007 (gregorian calendar) is 1999 Genbot 13 in the Ethiopic calendar (of course icu4j has an Ethiopic calendar component).
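if you want to check that conversion yourself, here's a minimal sketch. it assumes icu4j.jar is on the server classpath and that your icu4j version ships com.ibm.icu.util.EthiopicCalendar. the variable name is just for illustration.

```cfml
<cfscript>
// assumption: icu4j.jar is on the classpath
ethiopic=createObject("java","com.ibm.icu.util.EthiopicCalendar").init();
// feed it the gregorian date from the post
ethiopic.setTime(createDate(2007,5,21));
// MONTH is zero-based in icu4j (same as core java), so add 1 for the month number
writeoutput("year: #ethiopic.get(ethiopic.YEAR)#<br>");
writeoutput("month: #ethiopic.get(ethiopic.MONTH)+1#<br>");
writeoutput("day of month: #ethiopic.get(ethiopic.DATE)#");
</cfscript>
```

the numbers should line up with the 1999 Genbot 13 quoted above, Genbot being the ninth month.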

and yes, even though it helps "date" me, i am still a fan of Prince's 1999.

May 19, 2007
God helps those who help themselves
since it looks like they'll be playing ice hockey in hell before ColdFusion makes use of the very cool icu4j library, i figure we better start helping core java get its locale resource act together. so let's start somewhere near my neighborhood, australia & new zealand.

core java's locale data for en_AU (Australia) and en_NZ (New Zealand) time formats is a bit off. it uses a format of H:mm:ss where the "H" stands for 24 hour clock, ie 5:00 PM would be formatted as 17:00. the CLDR (common locale data repository) however states that the time format for en_AU & en_NZ locales is h:mm:ss a (well actually it's proposed to include the timezone, "h:mm:ss a z" see the en_AU time format entry here). while most users in those locales are smart enough to get that 17:00 is 5:00 PM when your ColdFusion app outputs time values, it would play havoc when ColdFusion tries to parse what those same folks would normally input for a time value.
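to see the difference between the two patterns, here's a small sketch that talks to core java's SimpleDateFormat directly. note the assumptions: it hard-codes the two patterns and uses en_US for both formatters so the pattern is the only thing that differs, and it is not how ColdFusion's own ls parsing functions work internally.

```cfml
<cfscript>
// assumption: en_US for both formatters, only the pattern differs
enUS=createObject("java","java.util.Locale").init("en","US");
clock24=createObject("java","java.text.SimpleDateFormat").init("H:mm:ss",enUS);
clock12=createObject("java","java.text.SimpleDateFormat").init("h:mm:ss a",enUS);
fivePM=createDateTime(2007,5,19,17,0,0);
writeoutput("H:mm:ss gives #clock24.format(fivePM)#<br>");   // 17:00:00
writeoutput("h:mm:ss a gives #clock12.format(fivePM)#<br>"); // 5:00:00 PM
// parse(String) stops as soon as the pattern is satisfied,
// so the trailing " PM" is silently ignored
parsed=clock24.parse("5:00:00 PM");
writeoutput("H:mm:ss reads '5:00:00 PM' back as #clock24.format(parsed)#"); // 5:00:00, ie 5 AM
</cfscript>
```

so with the 24 hour pattern, what a user types as 5 PM quietly comes back as 5 AM, with no error to tip you off.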

so hey en_AU and en_NZ locale people, time to start helping yourselves. Sun has accepted this as a new bug, go vote for it (you have to be a member of the Sun Developer Network to vote but these days, who isn't).

May 2, 2007
icu4j 3.6.1 maintenance release
if you're using icu4j 3.6 and supporting chinese locales you should grab this upgrade. specifically if you support zh_CN, zh_TW,zh_HK,zh_MO, or zh_SG you'll probably want this upgrade as the actual locale data is picked up from the ICU locale "zh" data bundle which does not contain any regional specific data such as currency. if i understand correctly, the locale data was keyed off scripts, for instance zh_Hans_CN or zh_Hant_TW.

if you're still on older versions of icu4j, you should be ok as this is a new bug introduced in 3.6.

December 16, 2006
more timezone: timezones by country
been way too busy to blog about anything lately but this might be useful to somebody, somewhere. the super cool icu4j lib has had a method to retrieve timezones by country for a couple of versions now. it's something i wish core java had, but here's the next best thing--a csv file of icu4j's timezone data along w/country. the data consists of "full" country name (Thailand), 2-letter ISO-3166 country code (TH) and timezone ID. while the timezone ID are from icu4j, these should be ok for use w/core java. frankly, i've only had time to test a few countries worth of data, so if you find any that don't work, let me know and i'll see about fixing it.

October 7, 2006
icu4j 3.6 hits the streets
was too busy to blog this when it was actually released but icu4j version 3.6 was released on 1-Oct-2006. the release notes can be found here. note that there are two new "supplemental" jars, one for XLIFF conversion tools and another for charsets. to recap the new bits for this release:

  • supports unicode 5.0
  • common locale data repository (CLDR) 1.4
  • globalization preferences, flexible container for locale data was added
  • a preview of the flexible date/time format generator (allowing multiple date and time format patterns to be generated) was added
  • a preview of the ICU4J implementation of the java.nio.charset.Charset API was added

and as the project site notes, be careful using the preview stuff in production.

September 22, 2006
icu4j 3.6 hits beta
get it while it's hot, icu4j 3.6 has just had a beta release. see the read me for more info.

September 9, 2006
icu4j 3.6 hits alpha
the ICU project has announced the release of an alpha version of icu4j 3.6. you can grab this cool java library here. so what's new for 3.6? according to the brief release notes:
  • support for Unicode 5.0
  • 25% more CLDR locale data in 245 locales in ICU
  • a flexible date/time format generator has been added, allowing for multiple date and time format patterns to be generated that are valid for specific locales (sounds interesting)
  • under "Globalization Preferences", a new flexible container for locale data was added
  • for more charset conversion bang-for-your-buck, a preview of the ICU4J implementation of the java.nio.charset.Charset API was added

addendum: apparently the nifty timezone bits proposed earlier this year didn't make it into this release. too bad, so sad, could have been very useful.

May 15, 2006
BoardFusion's i18n bits
just in case you missed it, the BoardFusion project (BF) has released a preview of the user interface (UI). and while the project accumulated a lot of useful comments, i'm posting this to solicit some specific i18n feedback before we close the books on the UI preview. so this is kind of the "last gas for 200 km" review.

to recap:

  • i18n is a zero level goal (that is the project won't leave home without it).
  • it will be based on icu4j java library and by based i mean every single i18n function, except some parts of the resource bundle CFC and (probably) the Gregorian calendar will be derived from it.
  • besides the basic Gregorian calendar most ColdFusion developers are familiar with, this project will also include Buddhist, Chinese, Japanese, Islamic, and Hebrew calendars to handle that tricky calendar math.
  • user centric timezone, users will see datetimes in their individual timezones--and yes, even this functionality will come out of icu4j. by divorcing this functionality from core Java, the project will be able to take advantage of icu4j's more frequent updates.
  • locale based collation (sorting).
  • strict use of resource bundles (rb), you will be able to l10n skin this puppy, though we haven't 100% decided on the "recommended" rb management tool yet. besides icu4j's rb manager, any ideas?
  • standard localized date/numeric/currency formatting, all hail CLDR.
  • the project will make use of the super cool JavaLoader in order to load the icu4j from off the server classpath (shared hosts will not be a problem). this also allows for painless updating of the icu4j jar file.

so, have we missed anything? some i18n related functionality we've overlooked? any rb management tool you particularly like? if you have any ideas please submit them here as comments or better yet via the UI preview. we'd really appreciate it. thanks.

for more information on the project see the "BoardFusion News Page" and the Project Wiki.

May 10, 2006
norwegian locale A-Go-Go
some recent work has me again turning over the rocks where core java locales are hiding and once again a closer look at what crawled out reveals just how sweet icu4j's locale support really is. according to several resources, such as ethnologue and the odin archive (gotta love that name), norway has two main written languages, Bokmål and Nynorsk, with Bokmål being dominant. in core java there is one (well two if you include the plain norwegian language, no) locale and one variant for norway: no_NO and no_NO_NY. assuming core java meant Bokmål for plain Norwegian (no and no_NO), then i suppose the variant (no_NO_NY) is for Nynorsk. huh? but i thought Nynorsk was a language? why is it a variant here? in icu4j, which uses the CLDR for its locale data, we can see two locales (four if you count the plain nb/Bokmål and nn/Nynorsk languages): nb_NO (Bokmål Norwegian) and nn_NO (Nynorsk Norwegian). neat and tidy.

one of the side effects of this core java locale is that ColdFusion's old locale name Norwegian (Nynorsk) actually produces no_NO locale data. any legacy apps still using this locale identifier are probably telling people the wrong thing, for example:

setLocale('Norwegian (Bokmal)');
writeoutput('#lsDateFormat(now(),"DDDD")#');
produces: mandag

while
setLocale('Norwegian (Nynorsk)');
writeoutput('#lsDateFormat(now(),"DDDD")#');
also produces: mandag

icu4j on the other hand produces:
måndag for nn_NO
mandag for nb_NO

it looks like ColdFusion got tripped up on the "variant instead of language" locale.

taking this a step further, doing a "FULL" date format shows up even larger differences between core java and icu4j:

core java
8. mai 2006 for no_NO
8. mai 2006 for no_NO_NY

icu4j
måndag 8. mai 2006 for nn_NO
mandag 8. mai 2006 for nb_NO

oops. to my way of thinking, a "FULL" date format should include the day name as well as the rest of the date (date in month, month and year). i really wish ColdFusion would use icu4j.

and the "A-Go-Go" reference? nothing to do with g11n or ColdFusion, just been listening to a lot of Dengue Fever lately and that song has just stuck in my head ;-)

April 5, 2006
proposed timezone changes for icu4j
in response to some RFEs, IBM's Yoshito Umaoka has proposed some interesting changes to ICU4J's timezone (tz) classes including methods to list tz rules as well as handle iCalendar's VTIMEZONE. to summarize from his email:

  • com.ibm.icu.util.ZoneRule: an abstract class representing a tz transition rule. this class represents basic properties of zone rule such as raw UTC offset and DST offset and abstract methods to access onset information.
  • com.ibm.icu.util.TimeListZoneRule: a concrete class extending ZoneRule. this class represents zone transition point(s) defined by UTC millis.
  • com.ibm.icu.util.RecurrentZoneRule: a concrete class extending ZoneRule. this class represents recurrent zone transitions defined by a rule, such as first Sunday in April. the way to define recurrent rule is pretty similar to SimpleTimeZone.
  • com.ibm.icu.util.RuleBasedTimeZone: a class extending TimeZone. this class aggregates one or more ZoneRule instances. using this class and ZoneRule instances, you can create a custom TimeZone which supports any historical zone transitions.
  • com.ibm.icu.util.VTimeZone: a class extending TimeZone, wraps either RuleBasedTimeZone or OlsonTimeZone (default TimeZone implementation used by ICU4J). this class would have two constructor methods for creating a new VTimeZone instance from 1) TZID such as "America/New_York" and 2) RFC2445 VTIMEZONE component. this class also provides some method to write out underlying zone rules into VTIMEZONE format.

in addition to the new classes mentioned above, he also proposes some modifications to existing classes:

  • com.ibm.icu.util.TimeZone: an additional method - "List getZoneRules()", which returns a list of ZoneRule instances for the TimeZone. the implementation in TimeZone class just throws UnsupportedOperationException.
  • com.ibm.icu.util.SimpleTimeZone / com.ibm.icu.impl.OlsonTimeZone: overrides "List getZoneRules()" to return actual ZoneRule instances for these TimeZone implementation.

the javadocs for the proposed changes have been (temporarily) put up here. if you want to participate in the discussion regarding these changes hop on over to the ICU sourceforge site and subscribe to the mailing list.

jitter bug references: 4577, 5012

to me these seem like some decent improvements and i know several folks in the ColdFusion community are interested in timezones, especially their rules.

March 29, 2006
heads up: timezone CFC updated
well, the icu4j versions were anyway. dan switzer seems to have turned up a problem with the icu4j version that i also encountered over the weekend. the icu4j version extended the core java version by simply substituting com.ibm.icu.util.TimeZone for the core java TimeZone class. if you didn't explicitly pass in a timezone (tz), you were supposed to get the server's tz. unfortunately icu4j differs in the way this is done:
core java:
default="#tzObj.getDefault().ID#"

icu4j:
default="#variables.timeZone.getDefault().getDisplayName()#"

the tz that the core java default method was returning wasn't understood by icu4j, which didn't throw an error but silently returned the UTC tz instead. whoops.
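one way to guard against that kind of silent fallback is to check a tz ID against the list of known IDs before using it. the sketch below is an assumption-laden illustration, not the fix that went into the CFC: it uses core java's java.util.TimeZone as a stand-in (it also hands back GMT for an ID it doesn't know), and isKnownTZ is a made-up helper name.

```cfml
<cfscript>
// stand-in: core java's TimeZone, which returns GMT for unknown IDs
tz=createObject("java","java.util.TimeZone");
// hypothetical helper: true only if the JVM actually knows this tz ID
function isKnownTZ(tzID) {
   return listFind(arrayToList(tz.getAvailableIDs()),tzID) GT 0;
}
bogus=tz.getTimeZone("Not/AZone");
writeoutput("unknown ID comes back as: #bogus.getID()#<br>"); // GMT, no error thrown
writeoutput("Not/AZone known? #isKnownTZ('Not/AZone')#<br>");
writeoutput("Asia/Bangkok known? #isKnownTZ('Asia/Bangkok')#");
</cfscript>
```

the same check should carry over to com.ibm.icu.util.TimeZone, which also has a getAvailableIDs() method.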

you can pick up the new version here.

March 26, 2006
stealth seems to be icu4j's middle name
once again, icu4j has quietly slipped out another stealthy upgrade to version 3.4.4. this update fixes "crashing bugs in the data". i'm not really sure how critical this update is but better safe than sorry.


Australian DST change: a day late and a dollar short?
while i should have been more than vaguely aware of this issue, it seems even Sun was lying down on the job a bit. Australia observes DST (Daylight Saving Time or Summer Time as they say down under) just like the US and other countries. DST in Australia normally ends March 26, 2:59AM (local time). however this year, to accommodate the Commonwealth games, the DST end date was pushed back to April 2. most older JREs (like the version that coldfusion runs on, even the updated JRE that the flex/coldfusion connector "installs") still run off the older Olson data with Australian DST ending March 26. on March 25th i got an email from the Sun Developer Network pointing at this article about the issue including links to updated JREs. talk about cutting it close.

icu4j on the other hand, has had this and other updated timezone info for some time now.

March 6, 2006
"remote" classpath revisited
i seem to have gotten myself into the habit of calling spike's cool "Loading java class files from a relative path" technique as the "remote classpath" technique--i guess i can blame christian cantrell for that. in any case, this technique works very well in most cases where you don't have access to a server's classpath (most shared hosts for example). where it tends not to work is, from my experience, with java classes that don't have "blind" constructors, ie where no arguments are required to initialize that class. classes like icu4j calendars, formatters, etc. usually work just fine but classes like icu4j's ULocale or MessageFormat don't as these require something to be passed to their constructors. for these classes (which are darned important to me) something like this fails:

<cfscript>
// remote init
jarFile=jarLocation & "icu4j.jar";
URLObject = createObject('java','java.net.URL');
URLObject.init("file:" & jarFile);
URLArray = createObject("java","java.lang.reflect.Array").
newInstance(URLObject.getClass(),1);
arrayClass = createObject("java","java.lang.reflect.Array");
arrayClass.set(URLArray,0,URLObject);
loader = createObject("java","java.net.URLClassLoader");
loader.init(URLArray);
uLocale=loader.loadClass("com.ibm.icu.util.ULocale").newInstance();   
</cfscript>
<cfdump var="#uLocale#">

while i've managed to workaround this issue (ULocales are everywhere in icu4j, most classes that deal with locales have a getAvailableULocales() method) it's always kind of nagged at me. after a bit of poking and prodding i started looking into ways to get at the actual constructors for a given class:

<cfscript>
// remote init
jarFile=jarLocation & "icu4j.jar";
URLObject = createObject('java','java.net.URL');
URLObject.init("file:" & jarFile);
URLArray = createObject("java","java.lang.reflect.Array").
newInstance(URLObject.getClass(),1);
arrayClass = createObject("java","java.lang.reflect.Array");
arrayClass.set(URLArray,0,URLObject);
loader = createObject("java","java.net.URLClassLoader");
loader.init(URLArray);
uLocale=loader.loadClass("com.ibm.icu.util.ULocale"); // don't init
c=uLocale.getConstructors();
for (j=1; j LTE arrayLen(c); j=j+1) {
   params=c[j].getParameterTypes();
   for (i=1; i LTE arrayLen(params); i=i+1) {
      writeoutput("ULocale[#j#]: #i# #params[i].getName()#<br>");
   }
   writeoutput("<br>");
}   
</cfscript>

which in this case returned 3 constructors (just like the API says but not in the javadocs order):

ULocale[1]: 1 java.lang.String
ULocale[1]: 2 java.lang.String
ULocale[1]: 3 java.lang.String

ULocale[2]: 1 java.lang.String

ULocale[3]: 1 java.lang.String
ULocale[3]: 2 java.lang.String

which i can easily match to the one i want (ULocale("th_TH")):

<cfscript>
// remote init
jarFile=jarLocation & "icu4j.jar";
URLObject = createObject('java','java.net.URL');
URLObject.init("file:" & jarFile);
URLArray = createObject("java","java.lang.reflect.Array").
newInstance(URLObject.getClass(),1);
arrayClass = createObject("java","java.lang.reflect.Array");
arrayClass.set(URLArray,0,URLObject);
loader = createObject("java","java.net.URLClassLoader");
loader.init(URLArray);
uLocale=loader.loadClass("com.ibm.icu.util.ULocale");   
c=uLocale.getConstructors();
// the newInstance method wants an array
obj=listToArray("th_TH");
// we want the 2nd constructor
thaiLocale=c[2].newInstance(obj.toArray());
</cfscript>

<cfdump var="#thaiLocale#">

which indeed returns an object of com.ibm.icu.util.ULocale.

since in most cases, i only use one way to init a given class, this technique will work OK for us. my only question is will the order of constructors remain the same? can i always count on the 2nd constructor to be ULocale("th_TH")? or should i build metadata functionality to probe the constructors to see which one matches?
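as far as i can tell the javadocs don't promise any particular order for getConstructors(), so probing the parameter types looks like the safer bet. here's a minimal sketch of that idea. getStringConstructor is a made-up helper name, and to keep the example runnable without the jar it uses java.util.Locale as a stand-in, since it has the same three String-only constructors. for the real thing you'd pass in the class returned by loader.loadClass("com.ibm.icu.util.ULocale").

```cfml
<cfscript>
// hypothetical helper: find the public constructor that takes exactly
// argCount arguments, all of them java.lang.String
function getStringConstructor(theClass,argCount) {
   var ctors=theClass.getConstructors();
   var params="";
   var allStrings=true;
   var i=0;
   var j=0;
   for (j=1; j LTE arrayLen(ctors); j=j+1) {
      params=ctors[j].getParameterTypes();
      if (arrayLen(params) EQ argCount) {
         allStrings=true;
         for (i=1; i LTE arrayLen(params); i=i+1) {
            if (params[i].getName() NEQ "java.lang.String") {
               allStrings=false;
            }
         }
         if (allStrings) {
            return ctors[j];
         }
      }
   }
   return ""; // nothing matched
}
// stand-in class, swap in the remotely loaded ULocale class as needed
localeClass=createObject("java","java.lang.Class").forName("java.util.Locale");
ctor=getStringConstructor(localeClass,1);
// the newInstance method still wants a real java array
thaiLocale=ctor.newInstance(listToArray("th").toArray());
writeoutput(thaiLocale.getLanguage());
</cfscript>
```

this only distinguishes constructors by argument count and String-ness, which is enough for ULocale's three constructors but would need more work for classes with mixed argument types.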

ps: i did indeed learn my lesson, notice how i passed the coldfusion array using toArray() ;-)


MessageFormat or how not to read error messages
in an earlier post i was babbling on about how neat the com.ibm.icu.text.MessageFormat class was. i was also on about how you'd need a java wrapper class to really make use of it. i thought that because whenever i tried something like:

<cfscript>
ozLocale="en_AU@calendar=gregorian";
thisPattern="On {0,date,short} at {0,time,short}, I left {1} for the {2}. I took {3,number,currency}";
thisLocale=createObject("java","com.ibm.icu.util.ULocale").init(ozLocale);
args=arrayNew(1);
args[1]=now();
args[2]="the office";
args[3]="microbrewery";
args[4]=javacast("int",100);
mf=createObject("java","com.ibm.icu.text.MessageFormat").
init(thisPattern,thisLocale);
thisMsg=mf.format(args);
</cfscript>

<cfdump var="#thisMSG#">

coldfusion would always throw an error at the thisMsg=mf.format(args) bit along the lines of: Error casting an object of type to an incompatible type. This usually indicates a programming error in Java, although it could also mean you have tried to use a foreign object in a different way than it was designed. which for some reason made me think it was because the format() method is overloaded and i couldn't figure out the right combination of argument classes to get it to work. my knee jerk reaction to this is to build a wrapper class and move on, which i promptly did.

i was puttering around with something this weekend (a method to count business days using icu4j's Holiday class) when i actually got the overloaded method error (while trying to add my birthday as a national holiday in the US virgin islands, en_VI). re-visiting the format() method errors it finally dawned on me that the error message was perfectly accurate and the real issue (besides me being a knee jerk reactionist and thick as a brick) was with the args array. coldfusion arrays aren't exactly java Arrays (if i recall correctly they're java.util.Vectors). back in the Triassic era, christian cantrell's blog had an entry concerning this problem where he pointed out a simple solution using the inherited toArray() method. so changing thisMsg=mf.format(args) to thisMsg=mf.format(args.toArray()) made that method work plenty fine. initial benchmarks show this java-based method to be considerably faster than our in-house one, not to mention saving all the locale formatting code we had to use prior to substituting the actual data. we'll be releasing updates to our resource bundle CFCs incorporating this new method sometime this week.

the sharp-eyed among you probably noticed the peculiar way i defined the locale en_AU@calendar=gregorian. icu4j locales (ULocales to be precise) have, besides the usual language, country, variant identifiers, keywords. keywords allow you to create a locale using a specific calendar, collation or currency (see the ICU user guide for details). in practice that means you can control the way MessageFormat formats your dates and currencies without having to mess around with them prior to submitting the data to the format() method. you can use any of the seven odd calendars that icu4j knows about, for instance en_AU@calendar=buddhist would produce dates formatted using the Buddhist calendar (BE), en_AU@calendar=islamic-civil would format dates using the civil version of the Islamic calendar, etc. very cool if you ask me. this is another area where icu4j kind of glances in the rear-view mirror as it blows by core java's i18n bits ;-)
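here's a quick sketch of those keywords in action outside of MessageFormat. it assumes icu4j.jar is on the server classpath. the variable names are only for illustration and i haven't pasted any output since it depends on the day you run it.

```cfml
<cfscript>
// assumption: icu4j.jar is on the classpath
df=createObject("java","com.ibm.icu.text.DateFormat");
gregorianLocale=createObject("java","com.ibm.icu.util.ULocale").init("en_AU@calendar=gregorian");
buddhistLocale=createObject("java","com.ibm.icu.util.ULocale").init("en_AU@calendar=buddhist");
// the keyword can be read back off the locale
writeoutput("calendar keyword: #buddhistLocale.getKeywordValue('calendar')#<br>");
// same language and country, different calendar
writeoutput(df.getDateInstance(df.FULL,gregorianLocale).format(now()) & "<br>");
writeoutput(df.getDateInstance(df.FULL,buddhistLocale).format(now()));
</cfscript>
```

the two dates should differ only in the year and era, since the Buddhist calendar counts years from a different epoch.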

March 1, 2006
an unstealthy icu4j upgrade
IBM has announced a maintenance release for icu4j, version 3.4.3. among the goodies for this version are:
  • Olson 2006a time zone data (just in time to get ready for the new DST in the US)
  • corrects mistakes in the CLDR data found in icu4j 3.4.2
  • MessageFormat (like core java's but it can use icu4j's super cool ULocale class) upgraded to "@stable"
  • fixed bugs in DateFormat, SimpleDateFormat, etc.
  • and a bit more trivial (to me) but should make some folks happy this release no longer tags "@draft" APIs with "@deprecated" by default--though why they ever did that in the first place is a bit of a mystery to me

the MessageFormat class is kind of cool in that it handles compound rb strings (which i'd rather have never learned about) such as: "At {1} on {2}, there was {3} on planet {4}". in the past, we normally handled this with in-house methods which are somewhat cumbersome in that we needed to do any date/numeric/currency formatting on the substituted values for the message's placeholders (the bits in between the {}) prior to formatting the message. now using the com.ibm.icu.text.MessageFormat you could do something like:

<cfscript>
   mfObject=createobject("java","com.ibm.icu.text.MessageFormat");
   args=arrayNew(1);
   args[1]=now();
   args[2]="the office";
   args[3]="microbrewery";
   // pass in the message string and substitution arguments
   // (toArray() hands java a real array rather than a coldfusion array)
   thisMsg=mfObject.format("On {0,date,full} at {0,time,full}, I left {1} for the {2}.",args.toArray());
   writeoutput(thisMsg);
</cfscript>

which would produce something like (in the en_US locale) "On Wednesday, March 1, 2006 at 8:44:22 PM GMT+07:00, I left the office for the microbrewery.".

to explain a bit more : {0,date,full} is a placeholder that takes the first element in the args array (java arrays start at 0) and applies localized date formatting with the "full" style. {0,time,full} ditto but uses time formatting and {1} and {2} are placeholders for simple strings.

however in order to make this more flexible (ie. use locales other than the server's default), you'll have to use a simple java wrapper class--the MessageFormat format method is overloaded and coldfusion can't easily use its other "flavors" which require StringBuffer and FieldPosition classes.

February 19, 2006
BIG numbers in coldfusion
mark kruger has an interesting post on his blog concerning formatting big numbers in coldfusion. in that post's comments sean corfield points out that the real issue is the precision of the float datatype that coldfusion uses under the covers. this issue has also come up a few times on the support forums and probably the best answer is (as usual) to dip down into the java underlying coldfusion to use one of the Big math classes (java.math.BigInteger, java.math.BigDecimal) to handle the math on the special occasions that you really need that kind of precision. as sean pointed out, float is faster than BigDecimal for calculations so you should use those classes only when they are really needed.

what has this got to do with g11n? well even if you do use those Big math classes, core java's NumberFormat class doesn't understand its own BigDecimal/BigInteger classes (ie it casts everything back to double/long). so when you come to display these values you're back in the same situation that mark's post describes. what to do? use icu4j of course (everybody knew that was coming). its NumberFormat class understands BigDecimal/BigInteger plenty fine. for example:

<cfscript>
theNumber="9123456789123456789.123";
//use server default locale
nF=createObject("java","com.ibm.icu.text.NumberFormat").getInstance();
cNF=createObject("java","java.text.NumberFormat").getInstance();
bigDecimal=createObject("java","java.math.BigDecimal").init(theNumber);
formattedNumber=nf.format(bigDecimal);
coreJavaFormattedNumber=cNF.format(bigDecimal);
writeoutput("original number:=#theNumber#<br>
   big decimal representation:=#bigDecimal#<br>
   icu4j number Formatted:=#formattedNumber#<br>
   core java number Formatted:=#coreJavaFormattedNumber#"
);
</cfscript>

which outputs:

original number:=9123456789123456789.123
big decimal representation:=9123456789123456789.123
icu4j number Formatted:=9,123,456,789,123,456,789.123
core java number Formatted:=9,123,456,789,123,457,000

i really wish coldfusion would use icu4j. it would make i18n work much easier and as a side effect help w/problems like this.

February 15, 2006
another icu4j stealth upgrade
i seem to keep missing these....the super cool icu4j lib was updated 20-jan-2006 to version 3.4.2. it contains a few bug fixes (Chinese date format/calendar, currency rounding bug for de_CH locale, etc.) but the biggest deal is that this release dumps the dependency on core java timezone data. while i normally use core java's timezone classes this puppy has several methods that i find pretty cool. for instance, one of the biggest headaches w/using timezone data is that there are just so darned many timezones. filtering these down into something reasonable often results in some compromises that always leave me feeling like we're missing something. now we can do filtering that at least looks more reasonable, say like using a user's country:

<cfscript>
tz=createObject("java","com.ibm.icu.util.TimeZone");
//get TZ based on country
zones=tz.getAvailableIDs("TH");
</cfscript>

<cfdump var="#zones#">

how cool is that?

October 31, 2005
icu4j stealth upgrade
no idea why i missed this (i guess it was never publicly announced) but icu4j was recently updated to version 3.4.1 (a maintenance release from 3.4). not a whole lot of changes, i guess the most significant is a fix to the new CharsetDetector class.

anyway, grab it to keep in lock step w/IBM's ICU project.

October 24, 2005
g11n gotchas
a couple-three emails i got recently prompted me to think (again) about what globalization means to the average coldfusion developer. coincidentally mark davis, IBM's front man for g11n and president of the Unicode Consortium, is putting together a presentation for the next Unicode conference dealing with "Globalization Gotchas". i highly recommend cf developers doing i18n/g11n work to review these, it's certainly worth the effort.

among my favorites that apply in one way or another to coldfusion (i've yakked about these in various articles/books/blog entries but good stuff usually bears repeating):

  • Unicode encodes characters, not glyphs: U+0067 » ggggggg
  • Unicode does not encode characters by language: French, German, English j have the same code point even though all have different pronunciations; Chinese 大 (da) has the same code point as Japanese 大 (dai).
  • Length in bytes may not be N * length in characters
  • Not all text is correctly tagged with its charset, so character detection may be necessary. But remember, it's always a guess.
  • Use properties such as Alphabetic, not hard-coded lists: isAlphabetic(), \p{Alphabetic} in regex
  • Transliteration (Ελληνικά ↔ Ellēniká) is not the same as Translation (Ελληνικά ↔ Greek)--users of my transliteration CFC please take note
  • Unicode ≠ Globalization. Unicode provides the basis for software globalization, but there's more work to be done...
  • Don't simply concatenate strings to make messages: the order of components differs by language. Use Java MessageFormat or equivalent. (like the rbJava or javaRv CFCs)
  • Don't put any translatable strings into your code; make sure those are separated into a resource file.
  • Don't assume everyone can read the Latin alphabet. Don't assume icons and symbols mean the same around the world.
  • Tag all data explicitly. Trying to algorithmically determine character encoding and language isn't easy, and can never be exact.
  • Formatting and parsing of dates, times, numbers, currencies, ... are locale-dependent. Use globalization APIs that use appropriate data.
  • If you heuristically compute territory IDs, timezone IDs, currency IDs, etc. make sure the user can override that and pick an explicit value. (ie be automagical about locale choice, etc. but allow the user to manually pick what they want)
  • Don't assume the timezone ID is implied by the user's locale. For the best timezone information, use the TZ database; use CLDR for timezone names.
  • Java globalization support is pretty outdated: use ICU to supplement it. (cf developers should use ICU4J)
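a couple of these gotchas are easy to check for yourself. the sketch below is plain core java (the class and method names are made up for illustration) and pokes at two of them: byte length vs character length, and testing the Alphabetic property instead of a hard-coded a-z list. note that java's regex spelling of the property is \p{IsAlphabetic}:

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

class GotchaChecks {

   // number of bytes a string needs once encoded as utf-8
   static int utf8Bytes(String s) {
      return s.getBytes(StandardCharsets.UTF_8).length;
   }

   // true if every character carries the Unicode Alphabetic property
   static boolean isAllAlphabetic(String s) {
      return Pattern.matches("\\p{IsAlphabetic}+", s);
   }

   public static void main(String[] args) {
      String latin = "g";
      String han = "\u5927"; // the chinese/japanese character "da"/"dai"
      // the greek word for "greek", written with unicode escapes
      String greek = "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac";
      for (String s : new String[] { latin, han, greek }) {
         System.out.println(s.length() + " chars, " + utf8Bytes(s)
            + " utf-8 bytes, alphabetic: " + isAllAlphabetic(s));
      }
   }
}
```

the latin letter takes one byte, the han character three and each greek letter two, so no single multiplier gets you from characters to bytes.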

August 8, 2005
i18n calendars updated
i've updated the i18nCalendars CFCs to include the new coptic and ethiopic calendars added to icu4j 3.4. and that makes a total of nine calendars. i don't imagine having any use for either of these calendars right now but given the recent focus on africa and ethiopia in particular, you never know.

if you're interested in using icu4j's new acceptLanguage method, you'll need to wrap it. this method makes use of an 'out-parameter' to return a boolean as to whether the method used a fallback locale (ie. it couldn't find a suitable locale among the server's installed locales, so it returns a fallback locale instead). coldfusion won't pick up on that returned boolean array. below find some java code for this (it returns a structure with the selected locale and whether or not it was a fallback locale):

import java.util.HashMap;
import com.ibm.icu.util.ULocale;

/*
   class:      ULocaleAcceptLanguage
   version:   15-jul-2005
   author:      Paul Hastings paul@sustainableGIS.com
   notes:      simple wrapper class for ICU4J acceptLanguage
*/
public class ULocaleAcceptLanguage {

   public final static HashMap<String, String> getULocale(String httpAcceptLanguage){
      HashMap<String, String> results = new HashMap<String, String>();
      // out-parameter: icu4j sets fallback[0] to true if it had to use a fallback locale
      boolean[] fallback = new boolean[1];
      ULocale thisLocale = ULocale.acceptLanguage(httpAcceptLanguage,fallback);
      // both values go back as strings so coldfusion can read them as a simple struct
      results.put("locale",thisLocale.toString());
      results.put("fallback",String.valueOf(fallback[0]));
      return results;
   }

}

compile this and drop it in your cfinstall classes dir. you can then make use of it:

<cfscript>
   aL=createObject("java","ULocaleAcceptLanguage");
   acceptLanguageStr="en-us,th;q=0.7,ar;q=0.3";
   uL=aL.getULocale(acceptLanguageStr);
</cfscript>

<cfdump var="#uL#">
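for comparison, newer JVMs (java 8 and up) can do a rough version of this in core java via Locale.LanguageRange and Locale.lookup. the sketch below is not the icu4j method and doesn't share its locale data; the class name, the list of supported locales and the meaning of "fallback" (here simply: nothing in the header matched) are all my own assumptions:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class AcceptLanguagePicker {

   // returns the best supported locale for an Accept-Language header plus a
   // flag saying whether we had to fall back because nothing matched
   static Map<String, String> pick(String acceptLanguage, List<Locale> supported, Locale fallbackLocale) {
      // parse() sorts the ranges by their q weights, highest first
      List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
      // lookup() returns null when none of the ranges match a supported locale
      Locale match = Locale.lookup(ranges, supported);
      Map<String, String> results = new LinkedHashMap<>();
      results.put("locale", (match != null ? match : fallbackLocale).toLanguageTag());
      results.put("fallback", String.valueOf(match == null));
      return results;
   }

   public static void main(String[] args) {
      List<Locale> supported = Arrays.asList(Locale.forLanguageTag("ar"), Locale.forLanguageTag("th"));
      System.out.println(pick("en-us,th;q=0.7,ar;q=0.3", supported, Locale.US));
   }
}
```

with only arabic and thai "installed", the header above resolves to th: en-us has the highest weight but no match, so the next range down wins.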

August 1, 2005
hot hot hot: icu4j 3.4 released
version 3.4 of icu4j, the super cool i18n java library, has just been released. if you do i18n work in coldfusion or java, this is the library. you can download it from here, its readme file can be found here. and since i'm on a timezone craze this week, i also noticed that the timezone class has added generic timezones (like "Pacific Time", "United Kingdom", etc.) that should help simplify things a bit.

do yourself a favor, get this library.

July 2, 2005
get it while it's hot: icu4j 3.4 beta
IBM just announced the beta release of icu4j 3.4. some of the nifty new stuff in this release includes:
  • updated to Unicode 4.1
  • collation engine updated to UCA 4.1
  • fully conformant with CLDR 1.3
  • charset detection framework (which looks very useful)
  • message formatting apostrophe solution
  • additional usability APIs
  • new currency listing API
  • more API for accessing CLDR data
  • Coptic and Ethiopic calendars (that makes 8 icu4j calendars and Dr. Ghasem Kiani's persian calendar for a total of 9, count 'em 9, calendars)
  • more efficient data loading
you can download the beta release here. report any bugs by july 17th.

and in case you were wondering, today (2-jul-2005) is October 25, 1721 in the Coptic calendar and October 25, 7497 (Amete Alem Era) in the Ethiopic calendar system.

June 4, 2005
eat your heart out core java
the unicode consortium has announced the release of version 1.3 of the Common Locale Data Repository (CLDR). this release pumps up the locale data from 230+ to 296 locales (96 languages and 130 territories). this release's highlights include:
  • a complete set of POSIX-format data generated, along with a tool to generate different platform versions.
  • the addition of new data to support localization of timezones
  • the addition of data for UN M.49 regions, including continents and regions
  • the canonicalization (data in many forms converted to a "standard" form) of the data files, including the consolidation of inherited data
  • currency codes are restricted to ISO 4217 codes (historical as well)
  • number and data tests to verify LDML implementations
  • metadata for LDML
  • mappings from language to script and territory
  • various other fixes and additions of data, and extensions to the specification

for more details see the press blurb and the version information page.

as a reminder, icu4j makes use of the CLDR for its locale data. hubba hubba.

March 7, 2005
cultural bias, leaping leap years batman!
pretty much everybody knows what a leap year is and when one occurs. and in case you don't, coldfusion has a function isLeapYear() that will tell you if a given year is a leap year in the gregorian calendar. in fact most calendars have the concept of a leap "something". the chinese and hebrew calendars have a "leap month" but apparently no concept of a leap year (though the icu4j HebrewCalendar class API is full of references to leap years). the civil version of the islamic calendar has a "leap day" which is added to the last month of 11 out of every 30 years but again no leap year. the persian calendar does have the concept of a leap year, handled via the PersianCalendarHelper class isLeapYear method.

which brings us to the point of this blog entry, this method expects the year argument to be a persian calendar "year" (right now it's 1383 in the persian calendar). which i didn't quite grasp at first, as the other calendars (gregorian, buddhist and japanese) with leap years have an isLeapYear method that expects a gregorian year (yes, even the buddhist and japanese calendar classes expect a gregorian year, i imagine this is because these calendars extend the gregorian calendar class). and that's the way i expected the new persian calendar to behave (my own cultural bias--i use the buddhist and gregorian calendars on a daily basis). but it doesn't and why the heck would it? it is a persian calendar after all. so that got me to thinking about the other calendars and the way these "should" work and what other cultural biases have leaked into our code and test harnesses--especially the tests.

first thing i did was to rewrite the i18nIsLeapYear functions across all the calendars to expect a year argument in that calendar's system (it converts to gregorian year as needed and now automagically returns false for calendars lacking the concept of a "leap year").

then i went a hunting for any other places where my cultural bias might have leaked thru....and promptly found it in the getYear function. the getYear function takes a gregorian year value and returns the year in that calendar's system. i was doing that by creating a date:

thisDate=createDate(arguments.thisYear,1,2);

(and just in case you were wondering, the 2 for the day value is to make sure the date value created fell into that year, given that we're using UTC as the time zone standard for all the calendars). and then setting the calendar object to that date and returning the value for that calendar object's YEAR field:

tCalendar.setTime(thisDate);
return tCalendar.get(tCalendar.YEAR);

simple and worked swell for the gregorian, buddhist and japanese calendars because these calendars' years start at the same time. but after looking at the year values of formatted dates from the other calendars i realized that the getYear function was returning horrible nonsense for the other 4 calendars. without realizing it, i'd let my calendar bias creep in and assumed the calendars were all the same as far as years were concerned. gregorian 2-jan actually falls into different calendar years depending on the calendar (of course, they're different freaking calendars). and the tests were only reporting whether the getYear function "worked" by checking if the year was a positive integer, no eyeball comparisons against the year bits of the formatted date strings. there's a lesson here somewhere.
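to make the gotcha concrete, here's a sketch using core java's java.time.chrono classes (java 8 and up) rather than the icu4j calendars the CFCs ride on. the islamic calendar here is java's umm al-qura variant, so treat it as an illustration of the problem, not as what the CFCs actually return:

```java
import java.time.LocalDate;
import java.time.chrono.HijrahDate;
import java.time.chrono.ThaiBuddhistDate;
import java.time.temporal.ChronoField;

class CalendarYears {

   // year of the given gregorian date in the thai buddhist calendar
   static int buddhistYear(LocalDate d) {
      return ThaiBuddhistDate.from(d).get(ChronoField.YEAR);
   }

   // year of the given gregorian date in the (umm al-qura) islamic calendar
   static int islamicYear(LocalDate d) {
      return HijrahDate.from(d).get(ChronoField.YEAR);
   }

   public static void main(String[] args) {
      LocalDate january = LocalDate.of(2005, 1, 2);
      LocalDate march = LocalDate.of(2005, 3, 1);
      System.out.println("buddhist: " + buddhistYear(january) + " / " + buddhistYear(march));
      System.out.println("islamic:  " + islamicYear(january) + " / " + islamicYear(march));
   }
}
```

the buddhist year is 2548 for both dates because its year boundary lines up with the gregorian one. the islamic year changes from 1425 to 1426 between the two dates, so there is no single "islamic year for gregorian 2005" and picking 2-jan as the probe date quietly picks one of them for you.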

so better grab the new code and maybe give the calendars a good poking at to make sure no other cultural bias is left in it.

March 5, 2005
persianCalendar update
a few days ago Dr. Ghasem Kiani updated his persianCalendar class to be "more" icu4j like. i wrapped it up in a CFC and added it to the i18nCalendars package (which now contains 7, count 'em, 7 calendars). you can see it on its own in a simple testbed here. you can download the persian calendar class from Dr. Ghasem's sourceforge project.

note that this version of the persian calendar uses a "well-known arithmetic algorithm for calculating the leap years" rather than astronomical calculations.

i'd like to publicly thank Dr. Ghasem Kiani for his work on this project, we've been waiting quite a while for a persian calendar to round off our i18n calendars. thanks.

February 22, 2005
rokuyo
i seem to have datetime on the brain this month. one of the trickier things i've been trying to get a handle on was how to calculate japanese "rokuyo". what's "rokuyo"? well, let me tell you....

a lunar calendar was used in japan from the 14th to the 19th century. that calendar had a six day week and those six days were known as rokuyo. and like any other calendar system, each day had a name and a particular meaning (you do know that the english weekdays are named after one of the seven "planets" of ancient times?). and of course, each day had a significance:

  • sakigachi: good luck in the morning, bad luck in the afternoon
  • tomobiki: good luck all day, except at noon
  • sakimake: bad luck in the morning, good luck in the afternoon
  • butsumetsu: unlucky all day, as it is the day Buddha died
  • taian: 'the day of great peace', a good day for ceremonies
  • shakku: bad luck all day, except at noon
source

while i'd guess few people would admit to closely adhering to this system, it does invoke some strange "better safe than sorry" behaviors. for instance, some hospital patients in japan won't agree to be discharged on butsumetsu day, as it's regarded as being very unlucky. rather they'd stay the extra 24 hours to be discharged on a lucky taian day.

the calculations for determining rokuyo turn out to be surprisingly difficult. in fact, the only published code i ever saw for this was developed by Eirik Rude, a cf developer (at that time living in japan). the complexity comes from the need to calculate lunar months (remember the old japanese calendar?). since i wanted to integrate this functionality with our existing icu4j-based calendars, i poked thru the lunar calendars (chinese, islamic and hebrew) that i knew about to see if we could use any of these. of course, the old japanese lunar calendar was basically the lunisolar chinese calendar. using Eirik's basic logic and the icu4j library i was able to considerably reduce the code's complexity (the complexity's still there, but i pushed it down into the icu4j java library where smarter people than i have already dealt with it).

the rokuyo testbed is here and the i18n calendars package incorporates this new functionality (pick japanese calendar from the select). and this is a good resource if you want to read more about rokuyo.

February 20, 2005
universal time
all this poking and prodding into cf's datetime i did lately shone a bright light on the usefulness of something like icu4j's universal time class. if you have to swap back and forth between time scales (for instance some java classes require a long instead of a date type) or even if you do "simple" date manipulations (say averaging two java datetimes could cause overflow even with current dates), you've got good candidates for using universal time. to make a long story short, i built a universalTime CFC to help handle this. below is some output from this CFC (we'd normally have a testbed on our site but for some reason this class won't load via spike's remote classpath technique):
time:= {ts '2005-02-20 16:56:03'}
cf epoch:=38403.7055903 (days since 31-dec-1899)
universal time from cf time:=632,447,745,630,000,000
universal time to cf time:= 38403.7055903

coldfusion timescale:=38403.7055903 (days since 31-dec-1899)
excel timescale:=38403.7055903 (days since 31-dec-1899)
db2 timescale:=38403.7055903 (days since 31-dec-1899)
windows timescale:=6.3244774563E+017 (ticks (100 nanoseconds) since 1-jan-0001)
windowsfile timescale:=1.2753478563E+017 (ticks (100 nanoseconds) since 1-jan-1601)
mac timescale:=130697763 (second since 1-jan-2001)
oldmac timescale:=3191849763 (seconds since 1-jan-1904)
unix timescale:=1109005407 (seconds since 1-jan-1970)
java timescale:=1.109004963E+012 (milliseconds since 1-jan-1970)

the CFC will be in the usual places in a bit.
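the averaging overflow is easiest to show with a 32-bit seconds count like the unix timescale (a java long has far more headroom). this is just an illustrative sketch with made-up names, not code from the CFC:

```java
class TimeAverages {

   // naive midpoint of two 32-bit unix timestamps: the sum can exceed
   // Integer.MAX_VALUE (2,147,483,647) and wrap around to a negative number
   static int naiveMidpoint(int a, int b) {
      return (a + b) / 2;
   }

   // widening to 64 bits before adding keeps the sum in range
   static int safeMidpoint(int a, int b) {
      return (int) (((long) a + (long) b) / 2);
   }

   public static void main(String[] args) {
      int start = 1109005407;      // a february 2005 timestamp, in seconds
      int end = start + 3600;      // one hour later
      System.out.println("naive: " + naiveMidpoint(start, end));
      System.out.println("safe:  " + safeMidpoint(start, end));
   }
}
```

two perfectly ordinary 2005 timestamps already add up to more than a signed 32-bit int can hold, which is the kind of trap a wide pivot scale like universal time sidesteps.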

February 19, 2005
icu4j has moved
just in case you haven't been notified, the icu4j sites have moved.

on the topic of icu4j, i knocked off a couple of pages to explore its new ULocale class (after somebody asked me how many new locales there were for India and i had no idea). i was surprised by the answer.

if that doesn't surprise you, try the United Kingdom or Ethiopia.

February 13, 2005
new and improved i18n calendars
i've completely re-worked the individual I18N calendar CFCs. these are now consolidated into one package. most folks using these calendars (at least the ones talking to us) tend to use more than one, so this made some sense, especially as we re-worked the codebase so the 5 non-Gregorian calendars (Buddhist, Chinese, Hebrew, Islamic,Japanese) now extend the "base" Gregorian calendar CFC. our little hand waving at the OO bandwagon currently rolling around CFville. we think it actually will make some improvements in at least code maintenance. previous versions were distributed standalone, with most of the CFC code being duplicated across calendars, the major difference being which ICU4J calendar class the CFC rode. the common functions are now in the base gregorianCalendar CFC, with the other calendars extending that and initializing their own ICU4J calendar class. the codebase went from 7k lines down to 2k lines (and almost half of that being comments).

the code is also considerably improved, it's now based on ICU4J version 3.2 and its ULocale class (232 locales, 100 more than blackstone). several of the more commonly used functions have been re-written and we're seeing 3x-4x speed improvement over the older versions. frankly, i'm a bit baffled why, for instance:

following the ICU4J API and some examples, we initialized date formatting objects with the calendar class (Buddhist, Chinese, Gregorian, Hebrew, Islamic,Japanese) we were working with:

// init calendar with timezone and locale
var thisCalendar=aCalendar.init(utcTZ,thisLocale);
// return formatted date
return aDateFormat.getDateInstance(thisCalendar,tDateFormat,
thisLocale).format(dateConvert("utc2local",arguments.thisDate));

was reworked into this:

// init date formatter object with date format, locale and default calendar
var tDateFormatter=aDateFormat.getDateInstance(tDateFormat,thisLocale);
// swap calendars
tDateFormatter.setCalendar(aCalendar.init(utcTZ,thisLocale));
return tDateFormatter.format(dateConvert("utc2local",arguments.thisDate));

this builds the date formatter object with the default calendar, then we swap it to the calendar we want to use (the tDateFormatter.setCalendar bit). that sped up this function 3x-4x! while it "seems" less efficient it actually worked quite a bit faster.

you can see the testbed and download the CFC package here. any comments appreciated.

November 24, 2004
icu4j dirty secret exposed!
recent icu enhancement requests have exposed a long suspected secret of IBM's Unicode Technology group folks--they have a wicked sense of humor;-) either that or they are truly the geekiest people on the planet. for example:

and now we all know why there's no persian calendar in icu4j....those rotten klingons are blocking it.

November 23, 2004
icu4j 3.2 released
talk about fast, IBM has just announced the release of icu4j version 3.2-- it was just in alpha first week of this month. they make the rest of us look like pikers ;-) and just in case you haven't been paying attention, this is a pretty significant release:
  • icu4j locale data is now 100% built from the CLDR 1.2 data, and has data for 232 locales!
  • the user guide got a major overhaul (not that anybody reads user guides but hey, they did overhaul it)
  • Universal Timescale conversions have been added that allow you to swap between binary datetimes on different platforms
  • Accept-Language: icu4j now provides a mechanism for parsing http_accept_language vars and matching against locales--no more parsing these ourselves, and i can tell you the ones from Apple boxes used to give me the dry heaves. [oops, this didn't make it into the final release (so apple http_accept_language vars are still making me sick)]
  • RFC 3066 locale ID support has been added
  • and of course bug fixes

if you do any i18n work, you should pick up this release. you'll find it here.

November 10, 2004
icu4j 3.2 in alpha
IBM just announced that icu4j version 3.2 is now in alpha. you can read more about it and download here. you can see API changes (tool generated) between 3.0 and 3.2 here.

this is a pretty significant release. to the already nifty features it adds:

  • icu4j locale data is now completely built from the CLDR 1.2 data which includes interesting locales like en_US_POSIX English (United States, Computer), eo Esperanto, fa_AF Persian (Afghanistan), kl_GL Kalaallisut (Greenland), kw_GB Cornish (United Kingdom) and a whole bunch more. that's 230 icu4j locales vs 134 locales in core java!
  • icu4j now overloads its methods that accept locales to take either java locales or its own ULocales
  • Universal Timescale conversions
  • DateTimeFormat object initialization performance improvement!!
  • and of course bug fixes ;-)

there's also an eclipse how-to for icu4j.

all in all, it's pretty cool.

October 21, 2004
cldr 1.2 in beta
the latest version of the cldr (1.2) has entered beta. of particular interest are the 'interim vetting charts' which give you a sneak preview of what's been changed & what's coming for the release version. many of these are "common" changes such as localized territory names, etc. but there is some local stuff that's been "fixed".

in case you're interested, there's also a cldr wiki.

October 18, 2004
when a locale isn't a Locale
there was a recent discussion concerning using farsi (persian) language with cf. my first reaction was to point out that farsi locales (fa_IR iran and fa_AF afghanistan) weren't supported java locales, so that was that.

at about the same time there was an announcement on the icu4j mailing list about the next version being built on CLDR data. so i asked if that meant that we'd be able to make use of all the "new" locales in CLDR like farsi, etc. one of the icu4j guys (steven loomis) replied "yes" and further pointed out that icu4j 2.8 was already making use of icu4c's locale data. further discussion with steven helped debunk one of my long held misconceptions, that a java "locale" was a real world "Locale" (ie. the locale bundled up with all its attendant resource data such as day/month names, etc.). "Locales are just identifiers" says steven, "duh!" says i. while it's convenient to think locales == Locales, formally in java "locale" refers to the identifier and not the data.

so what? what that means, if you're using icu4j for your i18n work (and you should), is that you have access to all the nifty locales that icu4j has no matter what core java supports (or doesn't support in this case). so something like this becomes possible (and easy):

<cfscript>
fullFormat=javacast("int",0);
farsiLocale=createObject("java","java.util.Locale").init("fa","IR");
utcTZ=createObject("java","com.ibm.icu.impl.JDKTimeZone").getTimeZone("UTC");
aDateFormat = createObject("java","com.ibm.icu.text.DateFormat");
aCalendar =createObject("java","com.ibm.icu.util.GregorianCalendar").init(utcTZ,farsiLocale);
dF=aDateFormat.getDateInstance(aCalendar,fullFormat,farsiLocale);
writeoutput("#farsiLocale.getDisplayName(farsiLocale)# #dF.format(now())#<br>");
</cfscript>

which produces:

Persian (Iran) دوشنبه، ۱۸ اکتبر ۲۰۰۴

note that the core java getDisplayName method falls back on "Persian (Iran)" which while not perfect is better than nothing. icu4j 3.0 ULocale class would actually produce the correctly localized name.

the more i work with icu4j, the more impressed i am with how well-thought it is. it really is the bees' knees for i18n work.

thanks to steven for enlightening me.

July 15, 2004
icu4j news
i meant to get this public sooner but got busy. there's an issue with date formatting in the latest version (3.0) of icu4j. in version 2.8, you could normally get fully localized date formats including month/day names and localized digits (الخميس, ٢٧ جمادى الأولى, ١٤٢٥) in version 3.0 the digits aren't localized (الخميس, 27 جمادى الأولى, 1425). it seems the numberFormat class was using the default locale rather than the calendar's to format numbers. you can read more about these if you care to here:

IBM's found & fixed these, but not yet updated the jar.

you can see the bug in action here. beyond that bug, that page also shows off spike's oh so cool relative classpath technique. it's actually loading & using two different versions of icu4j, neither of which is in mx server's classpath. yeah i know, i'm easily impressed, but to my mind spike's technique is cool. it works around a whole lot of dependency issues we have had to live with.

in more icu4j news, IBM's also just announced the release of a new version of rbManager. we use this tool a lot--it's the cat's pajamas of rb tools.

June 19, 2004
icu4j 3.0 released
ibm's oh-so-cool icu4j i18n lib version 3.0 is out. you can read what's new from the press blurb. download here.

May 26, 2004
icu4j beta/collation
ibm has released another beta version of its supercool icu4j. these betas are also released as an executable JAR (i only noticed this with the first beta for 3.0), so you can jump right into testing.

while i was perusing the icu4j site i stumbled across this interesting page: collation performance comparison. wow! icu4j beats the snot out of the plain java JDK for collation over most locales (except for ja_JP and ko_KR locales, note that locales <> collation). i know that collation is of some interest to many i18n folks, so this is kind of interesting news.

May 11, 2004
oh how time flies...
IBM's mark davis has a proposal about "handling different binary formats of datetime". this is something i'd never given any thought to but one glance at table 1 (reproduced below) in the proposal makes me wonder why this hasn't come up before.

Table 1: Binary Time Scales

Source              Datatype   Unit                      Epoch
JAVA_TIME           int64      milliseconds              Jan 1, 1970
UNIX_TIME           int32      seconds                   Jan 1, 1970
ICU4C               double64   milliseconds              Jan 1, 1970
WINDOWS_FILE_TIME   int64      ticks (100 nanoseconds)   Jan 1, 1601
WINDOWS_DATE_TIME   int64      ticks (100 nanoseconds)   Jan 1, 0001
MAC_OLD_TIME        int32      seconds                   Jan 1, 1904
MAC_TIME            ?          seconds                   Jan 1, 2001
EXCEL_TIME          ?          days                      Dec 31, 1899
DB2_TIME            ?          days                      Dec 31, 1899

java and Unix while having the same epoch (origin) differ in datatype and units so they differ in accuracy and range. windows' time scales differ internally for OS vs file system (no snickering). at the current state of this proposal, he's chosen to use Windows datetime as a "universal 'pivot'". that gives a time scale range from 29,000 BC to 29,000 AD. i guess IBM really does take the long term view ;-)
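the pivot idea boils down to offset-and-scale arithmetic. here's a rough sketch of it in plain java. this is not the proposal's API: the class and method names are made up, and the epoch constants are my own calculation from the table (100 nanosecond ticks since 1-jan-0001 in the proleptic gregorian calendar):

```java
class TimeScales {

   // ticks (100 nanoseconds) between 1-jan-0001 and 1-jan-1970, the java/unix epoch
   static final long UNIX_EPOCH_TICKS = 621355968000000000L;
   // ticks (100 nanoseconds) between 1-jan-0001 and 1-jan-1601, the windows file epoch
   static final long FILE_EPOCH_TICKS = 504911232000000000L;
   static final long TICKS_PER_MILLI = 10000L;
   static final long TICKS_PER_SECOND = 10000000L;

   // java milliseconds -> pivot ticks
   static long javaToPivot(long millis) {
      return millis * TICKS_PER_MILLI + UNIX_EPOCH_TICKS;
   }

   // pivot ticks -> java milliseconds (floorDiv keeps pre-1970 values rounding down)
   static long pivotToJava(long ticks) {
      return Math.floorDiv(ticks - UNIX_EPOCH_TICKS, TICKS_PER_MILLI);
   }

   // pivot ticks -> unix seconds
   static long pivotToUnix(long ticks) {
      return Math.floorDiv(ticks - UNIX_EPOCH_TICKS, TICKS_PER_SECOND);
   }

   // pivot ticks -> windows file time ticks
   static long pivotToWindowsFile(long ticks) {
      return ticks - FILE_EPOCH_TICKS;
   }

   public static void main(String[] args) {
      long pivot = javaToPivot(System.currentTimeMillis());
      System.out.println("pivot ticks:       " + pivot);
      System.out.println("unix seconds:      " + pivotToUnix(pivot));
      System.out.println("windows file time: " + pivotToWindowsFile(pivot));
   }
}
```

going through one pivot means each timescale only needs a conversion to and from the pivot instead of one for every other scale. the 29,000 year range falls out of the datatype: a signed 64-bit count of 100 nanosecond ticks tops out at roughly 29,000 years either side of zero.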

if you want to provide feedback i guess you'll have to join the ICU mailing list.

so now you know.

January 30, 2004
ICU4J Version 2.8 released
IBM has released the latest version of its excellent ICU4J java lib. new good stuff includes:

  • historical timezones: "where daylight savings time rules or other related data have changed after the date in question". cool.
  • updated locales and more locale methods (to access stuff like paper page sizes, measurement systems, etc.). cool.
  • improved sorting (now does proper Thai Royal Dictionary order). way cool for me ;-)
  • XLIFF conversion tool (in case you're developing your own resource data)
  • a how-to for using eclipse with ICU4J
  • bug fixes, performance improvements, etc.

it's available from this page.

December 12, 2003
ICU4J RuleBasedNumberFormat spellOut
one of the other bumps we encountered during the move from cf5 to mx for our municipal info system was a c++ cfx tag that spelled out numbers for reporting, receipts, etc. that started going nutso with values greater than 65 million when we moved to mx (not to brag much but since we started working with this particular municipality its annual tax base has increased from 28 million baht to over 67 million baht). not sure if this was a side effect of the multi-byte cfxNeo.dll we introduced or that the original c++ code used datatypes that fell over after 64 million (we're still hunting for that code) but since we're porting to java based i18n functionality anyway i thought i'd see if there were any "stock" spellout methods around.

once again, ibm's icu4j comes through. its com.ibm.icu.text.RuleBasedNumberFormat class has a nifty format method with spellout rulesets for some locales (in this case we're only interested in thai but there are others available in the class). once i slapped a wrapper class around its format method it was good to go. you can see it in action on this testbed. i'll make it and the wrapper class available once i get currency formatting setup and tested as well as figure out how to add other locales' rulesets (as well as get other rulesets' data, for instance i'd really like to see arabic locales' rulesets).

one bone i have to pick w/mx's java support is the constant need to write wrapper classes to handle (dumb down) various format() methods. it makes distributing and maintaining some i18n CFCs more of a pain than need be. i was hoping some java guru might explain the whys and the wherefores, any takers?

November 9, 2003
ICU4J 2.6.1 released
IBM has done a maintenance release of ICU4J, version 2.6.1. you can pick it up here.

quoting the ICU4J site:

list of significant changes for the 2.6.1 release:

-UCA 4.0: ICU has been updated to use the latest version of UCA - 4.0.

-Thai Royal Dictionary Collation: Thai collation tailoring has been updated to reflect the Thai Royal Dictionary ordering. Changes have been made to collation code in order to properly support invalid Thai sequences.

-Collation: parser/builder bug fixes: Several bugs in collation rule parser and builder have been fixed.

-Unicode character properties data has been synched with ICU4C

-Other bug fixes: Bugs have been fixed in layout engine (jitterbug number 3041), BiDi (3174), string functions (3243) and platform support (3097).