March 11, 2006
javaRB/RBjava CFCs updated
i've added the new messageFormat method to the existing CFCs and re-arranged the versions a bit. you can download this tool here with a simple testbed here and a testbed for the new messageFormat method here.

there are now six versions of the resource bundle (rb) tool, the three major versions include:

  • coreJava: if you don't need other calendars, locales, etc. offered by IBM's ICU4J library. this version uses core Java's Locale and MessageFormat classes. It will operate on any coldfusion host that permits createObject().
  • icu4j: Requires the installation of IBM's ICU4J java library which can be obtained here. this version uses the library installed on cf's classpath. it makes use of ICU4J's ULocale, UResourceBundle, and MessageFormat classes. this allows for more locales than are supported by core Java as well as additional locale "keywords" such as calendar, currency and collation (for example, th_TH@calendar=buddhist).
  • remoteICU4J: also requires the installation of IBM's ICU4J java library. this version uses a slightly modified "remote" classpath technique for installations where you don't have access to the classpath. you will need to specify the full path to a copy of the icu4j.jar file.

In each of these versions you will find two CFCs:

  • javaRB which handles rb files that aren't on coldfusion's classpath (usually deployed on shared hosts)
  • rbJava which uses rb files that are on coldfusion's classpath, this is usually the more robust form of this tool

You will also find:

  • javaRB.cfm a simple testbed for the javaRB CFC
  • rbJava.cfm a simple testbed for the rbJava CFC
  • messageFormat.cfm a simple testbed demonstrating the messageFormat method
  • testJavaRB.properties base rb file
  • testJavaRB_en_US.properties en_US locale rb file
  • testJavaRB_th_TH.properties th_TH locale rb file

public methods in the CFCs:

  • getResourceBundle returns a structure containing all key/messages value pairs in a given resource bundle file. required argument is rbFile containing absolute path to resource bundle file. optional argument is rbLocale to indicate which locale's resource bundle to use, defaults to en_US (american english)
  • getRBKeys returns an array holding all keys in given resource bundle. required argument is rbFile containing absolute path to resource bundle file. optional argument is rbLocale to indicate which locale's resource bundle to use, defaults to en_US (american english)
  • getRBString returns string containing the text for a given key in a given resource bundle. required arguments are rbFile containing absolute path to resource bundle file and rbKey a string holding the required key. optional argument is rbLocale to indicate which locale's resource bundle to use, defaults to en_US (american english)
  • formatRBString returns string w/dynamic values substituted. performs messageFormat like operation on compound rb string: "You owe me {1}. Please pay by {2} or I will be forced to shoot you with {3} bullets." this function will replace the place holders {1}, etc. with values from the passed in array (or a single value, if that's all there are). required arguments are rbString, the string containing the placeholders, and substituteValues, either an array or a single value containing the values to be substituted. note that the values are substituted sequentially, all {1} placeholders will be substituted using the first element in substituteValues, {2} with the second, etc. DEPRECATED. only retained for backwards compatibility. please use messageFormat method instead
  • messageFormat returns string w/dynamic values substituted. performs MessageFormat operation on compound rb string. required arguments: pattern string to use as pattern for formatting, args array of "objects" to use as substitution values. optional argument is locale, java style locale ID, "th_TH", default is "en_US". for details about format options please see http://java.sun.com/j2se/1.4.2/docs/api/java/text/MessageFormat.html
  • verifyPattern verifies MessageFormat pattern. required argument is pattern a string holding the MessageFormat pattern to test. returns a boolean indicating if the pattern is ok or not
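
for the curious, here's a minimal java sketch of how a verifyPattern style check could work with core java's java.text.MessageFormat. this is an assumption about the approach, not the CFC's actual code, and the class name PatternCheck is made up for illustration. the idea is simply that the MessageFormat constructor throws an IllegalArgumentException when it can't parse a pattern:

```java
import java.text.MessageFormat;

class PatternCheck {
    // returns true when java.text.MessageFormat can parse the pattern
    static boolean verifyPattern(String pattern) {
        try {
            new MessageFormat(pattern);
            return true;
        } catch (IllegalArgumentException e) {
            // malformed pattern, for example an unmatched brace
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(verifyPattern("You owe me {0,number,currency}."));
        // the closing brace is missing here, so this one fails
        System.out.println(verifyPattern("Please pay by {0,date,short"));
    }
}
```

checking patterns up front like this keeps a bad translation in an rb file from blowing up at format time.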

In addition, the remoteICU4J CFCs also have another public method:

  • getAvailableLocales returns an array of available locales. note that this method is only supplied as a convenience

PS: i've finally added a license.

March 6, 2006
MessageFormat or how not to read error messages
in an earlier post i was babbling on about how neat the com.ibm.icu.text.MessageFormat class was. i was also on about how you'd need a java wrapper class to really make use of it. i thought that because whenever i tried something like:

<cfscript>
ozLocale="en_AU@calendar=gregorian";
thisPattern="On {0,date,short} at {0,time,short}, I left {1} for the {2}. I took {3,number,currency}";
thisLocale=createObject("java","com.ibm.icu.util.ULocale").init(ozLocale);
args=arrayNew(1);
args[1]=now();
args[2]="the office";
args[3]="microbrewery";
args[4]=javacast("int",100);
mf=createObject("java","com.ibm.icu.text.MessageFormat").
init(thisPattern,thisLocale);
thisMsg=mf.format(args);
</cfscript>

<cfdump var="#thisMSG#">

coldfusion would always throw an error at the thisMsg=mf.format(args) bit along the lines of: Error casting an object of type to an incompatible type. This usually indicates a programming error in Java, although it could also mean you have tried to use a foreign object in a different way than it was designed. which for some reason made me think it was because the format() method is overloaded and i couldn't figure out the right combination of argument classes to get it to work. my knee jerk reaction to this is to build a wrapper class and move on, which i promptly did.

i was puttering around with something this weekend (a method to count business days using icu4j's Holiday class) when i actually got the overloaded method error (while trying to add my birthday as a national holiday in the US virgin islands, en_VI). re-visiting the format() method errors it finally dawned on me that the error message was perfectly accurate and the real issue (besides me being a knee jerk reactionist and thick as a brick) was with the args array. coldfusion arrays aren't exactly java Arrays (if i recall correctly they're java.util.Vectors). back in the Triassic era, christian cantrell's blog had an entry concerning this problem where he pointed out a simple solution using the inherited toArray() method. so changing thisMsg=mf.format(args) to thisMsg=mf.format(args.toArray()) made that method work plenty fine. initial benchmarks show this java-based method to be considerably faster than our in-house one, not to mention saving all the locale formatting code we had to use prior to substituting the actual data. we'll be releasing updates to our resource bundle CFCs incorporating this new method sometime this week.
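
the same collection-versus-array issue can be sketched in plain java. this uses core java's java.text.MessageFormat rather than the icu4j class (an assumption made so the example runs without extra jars), and the class and method names are made up for illustration:

```java
import java.text.MessageFormat;
import java.util.Locale;
import java.util.Vector;

class ToArrayDemo {
    static String formatFromVector(String pattern, Vector<Object> values) {
        MessageFormat mf = new MessageFormat(pattern, Locale.US);
        // format() wants an Object[]; handing it the Vector itself
        // ends in a cast error, so convert with toArray() first
        return mf.format(values.toArray());
    }

    public static void main(String[] args) {
        Vector<Object> values = new Vector<>();
        values.add("the office");
        values.add("microbrewery");
        System.out.println(
            formatFromVector("I left {0} for the {1}.", values));
    }
}
```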

the sharp-eyed among you probably noticed the peculiar way i defined the locale en_AU@calendar=gregorian. icu4j locales (ULocales to be precise) have, besides the usual language, country, variant identifiers, keywords. keywords allow you to create a locale using a specific calendar, collation or currency (see the ICU user guide for details). in practice that means you can control the way MessageFormat formats your dates and currencies without having to mess around with them prior to submitting the data to the format() method. you can use any of the seven odd calendars that icu4j knows about, for instance en_AU@calendar=buddhist would produce dates formatted using the Buddhist calendar (BE), en_AU@calendar=islamic-civil would format dates using the civil version of the Islamic calendar, etc. very cool if you ask me. this is another area where icu4j kind of glances in the rear-view mirror as it blows by core java's i18n bits ;-)

March 1, 2006
an unstealthy icu4j upgrade
IBM has announced a maintenance release for icu4j, version 3.4.3. among the goodies for this version are:
  • Olson 2006a time zone data (just in time to get ready for the new DST in the US)
  • corrects mistakes in the CLDR data found in icu4j 3.4.2
  • MessageFormat (like core java's but it can use icu4j's super cool ULocale class) upgraded to "@stable"
  • fixed bugs in DateFormat, SimpleDateFormat, etc.
  • and a bit more trivial (to me) but should make some folks happy this release no longer tags "@draft" APIs with "@deprecated" by default--though why they ever did that in the first place is a bit of a mystery to me

the MessageFormat class is kind of cool in that it handles compound rb strings (which i'd rather have never learned about) such as: "At {1} on {2}, there was {3} on planet {4}". in the past, we normally handled this with in-house methods which are somewhat cumbersome in that we needed to do any date/numeric/currency formatting on the substituted values for the message's placeholders (the bits in between the {}) prior to formatting the message. now using the com.ibm.icu.text.MessageFormat you could do something like:

<cfscript>
   mfObject=createobject("java","com.ibm.icu.text.MessageFormat");
   args=arrayNew(1);
   args[1]=now();
   args[2]="the office";
   args[3]="microbrewery";
   // pass in the message string and substitution arguments
   // (toArray() hands java a real Object[] rather than a coldfusion array)
   thisMsg=mfObject.format("On {0,date,full} at {0,time,full}, I left {1} for the {2}.",args.toArray());
   writeoutput(thisMsg);
</cfscript>

which would produce something like (in the en_US locale) "On Wednesday, March 1, 2006 at 8:44:22 PM GMT+07:00, I left the office for the microbrewery.".

to explain a bit more : {0,date,full} is a placeholder that takes the first element in the args array (java arrays start at 0) and applies localized date formatting with the "full" style. {0,time,full} ditto but uses time formatting and {1} and {2} are placeholders for simple strings.

however in order to make this more flexible (ie. use locales other than the server's default), you'll have to use a simple java wrapper class--the MessageFormat format method is overloaded and coldfusion can't easily use its other "flavors" which require StringBuffer and FieldPosition classes.
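
a wrapper along those lines might look something like the sketch below. it's only an illustration: it uses core java's java.text.MessageFormat and Locale instead of the icu4j classes, and the MessageFormatWrapper name and its format signature are assumptions, not anything shipped with the CFCs.

```java
import java.text.MessageFormat;
import java.util.Locale;

class MessageFormatWrapper {
    // one plain method taking the pattern, the locale and the values,
    // so the caller never has to pick between overloaded format() flavors
    static String format(String pattern, Locale locale, Object[] args) {
        MessageFormat mf = new MessageFormat(pattern, locale);
        return mf.format(args);
    }

    public static void main(String[] unused) {
        Object[] args = { "the office", "microbrewery", 1234 };
        String pattern =
            "I left {0} for the {1} with {2,number,integer} baht.";
        // same pattern, different locale: only the number grouping changes
        System.out.println(format(pattern, Locale.US, args));      // 1,234
        System.out.println(format(pattern, Locale.GERMANY, args)); // 1.234
    }
}
```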

December 15, 2004
two new i18n tidbits
first, the latest version of the Unicode Standard (4.1.0) which is due out in march, 2005 is now in beta. some of the new stuff i find interesting are:
  • newly added complete scripts such as new Tai Lue script (it's used in the yunnan area of southern china and south to northern thailand) among others
  • "very significant extensions to the repertoire for the Arabic script"
  • new chars were added to support "roundtrip mapping support for HKSCS and GB 18030"
  • i also find it interesting that "106 CJK compatibility ideographs has been added to support roundtrip mapping to the DPRK standard"--you know, north korea

now, i guess i'm going to have to rework my uBlock CFC. you can read more about the new unicode beta here.

next since i'm always ragging on core java's i18n support, i thought i'd point out a nifty new tech tip at Core Java Technologies Tech Tips dealing with resource bundles. this tech tip examines when and where you should be using ListResourceBundle vs PropertyResourceBundle. we normally use PropertyResourceBundle when applications can't access the classpath (ala the javaRB CFC) and plain ResourceBundle when they can (with rbJava CFC). as an added benefit this article gets into some testing using java 5.0's (or 1.5's) new nanoTime() method (as in nanoseconds) as well as offering a link to a JavaOne presentation on how not to write a benchmark.

both are pretty good reading.

September 17, 2004
new version of rbManager
i just discovered IBM's released a new (minor version upgrade) version of its nifty rbManager tool. you can pickup version 0.7.1 here (scroll to the bottom of the page).

i'm not exactly sure what was changed but i suspect it was a few bugs we encountered with the initial 0.7 release. anyways, it's "new".

May 31, 2004
i18n good practices: resource bundles
one of the dreariest bits of i18n work is dealing with strings, especially for retro-fitting existing apps. you'll have to comb thru the existing code substituting resource bundle (rb) keys for existing strings. while regex filters, etc. help, nothing beats a pair of "mark IV eyeballs". in order to keep this task within the bounds of tolerable cruelty, there are a few simple things you might keep in mind when developing cf applications:
  • case: not every language has case, Thai for instance doesn't, so PERMISSIONS, Permissions and permissions would be represented by the same string. in languages that do have case, those kinds of case permutations are plainly cosmetic (i was going to say cosmetic nonsense but thought better). if there's a real application need for this sort of thing, say to accent some heading, it should be handled via CSS and not hardcoded. hardcoded case strings make the difficult i18n process even more so. think twice before you get carried away with case, especially if you find yourself writing complex <cfif> blocks to handle it.
  • pluralization: not every language deals with plurals the same as English, simply adding a letter ("s" for instance) hardly ever cuts it and in some instances the language structure is completely different (the English phrase "five wood blocks" becomes something like "block of wood five units" in Thai). while you can blow off quite a few CPU cycles with complicated logic to handle plurals, i contend that item(s) is just as understandable as

    <cfif someQ.recordCount GT 1>items<cfelse>item</cfif>

    and has the added benefit of i18n simplicity. otherwise you'll have to add another set of rb keys (plural forms vs singular forms) and logic to handle pluralization.

  • compound strings: compound strings are, besides being my pet peeve, strings that contain substituted values. for example, "You owe me #dollarFormat(amountDue)#. Please pay by #dateFormat(normalDueDate)# or I will be forced to shoot you with #numberFormat(budgetQ.bulletsPerDeadbeat)# bullets. Thank you." if you do much i18n research you'll often see folks recommending you avoid compound strings like the plague (for instance, the API for the messageFormat java class comes right out and says this). why? because they're hard to handle. first you have to figure out the logic and in some cases it's not going to be trivial. then you have to rework the rb string to use place holders for localization ("You owe me {1}. Please pay by {2} or I will be forced to shoot you with {3} bullets. Thank you."). finally you have to substitute the intended values at runtime--newer versions of my javaRB and RBjava CFC have methods for this. it's often much easier to simply rewrite the compound string.
  • floating prepositions: these are perhaps a form of compound string but often can't be handled like them. i sometimes encounter extremely complicated output logic/displays or HTML form elements separated by a preposition (usually "at", "by" or "in"). in its simplest form it might be "dateValue at timeValue" (which actually can be handled as a compound string) but more often than not it's much more complicated. if i can get my way, we normally send floating prepositions to the garbage dump, i mean most folks would have no problem understanding "dateValue timeValue".

i suppose many folks might find this trivial but it adds time and complexity to an already time-consuming and complicated process.

March 7, 2004
javaRB CFC updated and some milk and cookies
i was working on a project the last week where we used the javaRB cfc to handle resource bundles. after we implemented some Thai language bundles i noticed that the actual way it was finding files was not quite "standard" (ie it didn't exactly follow the java way of things). so i re-jigged the file finding logic to better mimic java's logic, it now searches for rbFile and locale, then rbFile and language, and finally just the base rbFile. you can find the updated CFC on its testbed. it will eventually bubble up on the devnet exchange.
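
that search order (rbFile and locale, then rbFile and language, then the base rbFile) can be sketched in java roughly as follows. the class and method names are made up for illustration, and the code assumes simple language_COUNTRY locale IDs such as th_TH; it is not the CFC's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

class RbFileLookup {
    // builds the candidate file names, most specific first
    static List<String> candidates(String rbFile, String localeId) {
        int dot = rbFile.lastIndexOf('.');
        int slash = Math.max(rbFile.lastIndexOf('/'),
                             rbFile.lastIndexOf('\\'));
        if (dot < slash) {
            dot = -1; // the dot belongs to a directory name, not the file
        }
        String base = dot < 0 ? rbFile : rbFile.substring(0, dot);
        String ext = dot < 0 ? "" : rbFile.substring(dot);
        String language = localeId.split("_")[0];

        List<String> names = new ArrayList<>();
        names.add(base + "_" + localeId + ext);      // rbFile + locale
        if (!language.equals(localeId)) {
            names.add(base + "_" + language + ext);  // rbFile + language
        }
        names.add(base + ext);                       // base rbFile
        return names;
    }

    public static void main(String[] args) {
        // the caller would use the first candidate that exists on disk
        for (String name : candidates("testJavaRB.properties", "th_TH")) {
            System.out.println(name);
        }
    }
}
```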

i almost never rely on cookies, so when i read something in the support forums last week it was an eye-opener (actually more like, "geez! how did i ever overlook that!"). laurent (a frenchman transplanted to the land of vegemite, can you imagine;-) made an excellent catch of this issue by reminding us that cookies are also a scope in mx just like url & form and should have their encoding set as well, for instance (something that might go in your application.cfm along with setEncoding for url and form scopes):

setEncoding("cookie","utf-8")

so now you know.

January 16, 2004
resourceBundle gotcha
continuing in the same week long obsession with resource bundles, i thought i'd point out a potential "gotcha" concerning the java flavored resourceBundles before it induced any psychotic episodes--cf folks used to dealing with structures (or those people using the cf-based UTF-8 resourceBundles) might be particularly susceptible to this.

using cf structures you could always build a key value pair like (i don't think it's such a hot idea but you could):

montyPython=structNew(); montyPython["ministry of silly walks"]="too funny for words";

as long as you referenced the montyPython structure using this sort of syntax montyPython["ministry of silly walks"] all was well with the world. you could just as easily use this style in cf-based UTF-8 resourceBundles (again not a good idea but you could):

ministry of silly walks=too funny for words

because the resourceBundle CFC (or whatever you're using but should behave similarly) would simply parse this as a list delimited with an "=", stuffing the left side into a structure as a key with the right side as that key's value.

fine and dandy but this won't cut it with java flavored resourceBundles. "why" you ask? because java resourceBundles' keys are defined (according to the java.util.Properties API) as:

"The key consists of all the characters in the line starting with the first non-whitespace character and up to, but not including, the first ASCII =, :, or whitespace character."

so "ministry of silly walks=too funny for words" would be parsed as the key "ministry" with the value "of silly walks=too funny for words" by either of the two java resourceBundle classes i've been going on about lately. and that of course might cause a bit of head scratching and finger pointing....
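
if you'd like to see the gotcha for yourself, here's a small java sketch that feeds such a line to java.util.Properties (the class and method names are made up for illustration). it also shows that escaping the spaces with a backslash keeps the whole phrase as the key, should you really insist on keys like that:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class KeyParsingDemo {
    // loads rb style text into a Properties object
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // the key stops at the first whitespace, so it is just "ministry"
        Properties plain =
            parse("ministry of silly walks=too funny for words");
        System.out.println(plain.stringPropertyNames());

        // backslash-escaped spaces stay part of the key
        Properties escaped =
            parse("ministry\\ of\\ silly\\ walks=too funny for words");
        System.out.println(escaped.stringPropertyNames());
    }
}
```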

so now you know.

January 9, 2004
cf resource bundle flavors
last week (02-jan-04) i droned on about the three types of resourceBundle (rb) methods that can be used in cfmx. this week i thought i'd flap my lips about the two flavors of resourceBundle files used with these three methods.

let's deal with the simplest one first (for use with resourceBundleCFC). it's nothing more than a utf-8 encoded text file of key/value pairs. something like the following:

englishFive=5

thaiFive = ๕ (you will need a thai or unicode font to read this)

these types of rb files can be easily created using notepad (yes notepad), dreamweaver, or any sort of text editor capable of producing utf-8 encoded files (unfortunately not cfstudio, in case you were wondering). as you can see, these are human readable. this flavor of rb file is easily and directly accessible by cf. the downside to all this goodness is that it can spiral out-of-control with large, complex rb files covering many locales (languages).

the other rb flavor is based on java style rb files (because it makes use of java resourceBundle or PropertyResourceBundle classes) and similarly consists of key/value pairs in a text file but the "value" text is ASCII escaped unicode (\uXXXX where XXXX is the unicode code point expressed as a hexadecimal value). for instance:

loatianFive=\u0ED5

bengaliFive=\u09EB

thaiFive=\u0E55

the javaRB CFC can handle this type of rb file. creating these types of files is a bit more complicated (unless you are one of those very rare individuals who have the whole of the unicode in your head) and is usually handled by external tools such as the command line native2ascii supplied with normal java installs (in the bin dir) or the nifty rbManager tool from IBM.
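
to take some of the mystery out of the escaping, here's a rough java sketch of what such a conversion boils down to. it's a simplified stand-in written for illustration (the AsciiEscape name is made up), not the native2ascii source: every char outside plain ASCII gets written as \uXXXX using its hexadecimal code point.

```java
class AsciiEscape {
    // replaces every non-ASCII char with its \uXXXX escape
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                sb.append(c); // plain ASCII passes through untouched
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // chr(3669) in cf is the same char as (char) 3669 here
        String thaiFive = String.valueOf((char) 3669);
        System.out.println(escape("thaiFive=" + thaiFive));
    }
}
```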

recent experience tells me that this might be a concept some folks will have trouble understanding so here's a snippet that actually builds and reads this flavor of rb file (it's part of the guts of an rbManager cf clone i've been building off and on):

<cfscript>
// set up some constants
thaiFive=chr(3669);
tibetianFive=chr(3877);
loatianFive=chr(3797);
tamilFive=chr(3051);
bengaliFive=chr(2539);
arabicFive=chr(1637);
malayamFive=chr(3432);

// java objects
prop=createObject("java","java.util.Properties");
fos = CreateObject("java", "java.io.FileOutputStream");
fis = CreateObject("java", "java.io.FileInputStream");

// resourceBundle
rbFile=getDirectoryFromPath(expandpath("*.*")) & "test.properties";

// build test property file (as a basis for resourceBundle)
fos.init(rbFile);
prop.setProperty("thaiFive","#thaiFive#");
prop.setProperty("loatianFive","#loatianFive#");
prop.setProperty("tibetianFive","#tibetianFive#");
prop.setProperty("tamilFive","#tamilFive#");
prop.setProperty("bengaliFive","#bengaliFive#");
prop.setProperty("arabicFive","#arabicFive#");
prop.setProperty("malayamFive","#malayamFive#");
prop.store(fos,"test: brought to you by the number five");
fos.close(); // done close output file

//get property file & dump keys
fis.init(rbFile);
prop.load(fis);
fis.close(); // done close input file
keys=prop.propertyNames();
writeoutput('<font face="Arial Unicode MS">');
while (keys.hasMoreElements()) {
thisKEY=keys.nextElement();
thisMSG=prop.getProperty(thisKey);
writeoutput("#thisKEY# = #thisMSG#<br>");
}
writeoutput("</font>");
</cfscript>

the rb file produced by this snippet would be something like (note that it's a bunch of locales jumbled together, absolutely NOT what you'd do in production but you get the idea) :

#test: brought to you by the number five
#Thu Jan 01 19:04:53 GMT+07:00 2004
malayamFive=\u0D68
loatianFive=\u0ED5
bengaliFive=\u09EB
thaiFive=\u0E55
arabicFive=\u0665
tibetianFive=\u0F25
tamilFive=\u0BEB

output would be something along these lines (again you'll need some unicode capable font to properly read these):

malayamFive = ൨
loatianFive = ໕
thaiFive = ๕
bengaliFive = ৫
arabicFive = ٥
tibetianFive = ༥
tamilFive = ௫

so now you know.

January 1, 2004
quick review of resource bundles methods for cf
first let me dispatch the notion of using cf code in lieu of resourceBundles (rb). it's a bad habit that might work with very small files for a couple of languages but will eventually break down as your g11n apps become more complex and cover more and more languages (locales). so if you're just beginning g11n work, don't start with this method no matter how tempting it looks. and if you're already using this approach, quit while you're ahead. mingling code and text like that is just a bad idea.

last year (well last week) i was mildly berated by some java folks for suggesting using either utf-8 based cf "resourceBundles" or using the PropertyResourceBundle java class instead of the more typical ResourceBundle. oh the shame, but from a cf perspective though, those java folks were just being sort of snobbish. depending on your cf app needs it seems acceptable to use either rb method. below you'll find a quick and dirty comparison between the two less "normal" methods and the more traditional java method. each has their pros and cons however for me the biggest negative associated with using the "pure" java ResourceBundle approach is its requirement that rb always be in a classpath. that's a show stopper for many shared hosts. though it won't stop me from releasing an rb CFC using that style ;-)

resourceBundle style: pros and cons

CFMX UTF-8
  pro:
  • human readable
  • easy to manage (notepad, etc.)
  • simple to implement in MX
  • quite fast
  con:
  • complex rb quickly become hard to manage
  • can't easily use standard rb tools

java ResourceBundle
  pro:
  • pure standard java rb solution
  • handles rb from standard tools
  • self determines rb for locale
  • handles complex rb quite easily
  con:
  • not human readable
  • requires rb be somewhere in classpath
  • requires createObject permission
  • some overhead in using java object

java PropertyResourceBundle
  pro:
  • rb can be anywhere
  • pure standard java rb solution
  • handles rb from standard tools
  • handles complex rb quite easily
  con:
  • not human readable
  • requires caller to determine rb from locale
  • requires createObject permission
  • some overhead in using java object

i'd appreciate any feedback on this.

December 12, 2003
another nifty resource bundle tool
if you do i18n work you should probably already know about IBM's cool resource bundle manager. while plumbing the depths of the java i18n forums, i stumbled onto another one, attesoro that looks equally functional. like IBM's tool, it's a pure java solution and produces proper java resourceBundles (ie. unicode chars are encoded using escaped ascii, \u0000 style). these are a little difficult to deal with in cf as you have to spend some resources to parse the data--i normally save resource bundles as utf-8 to get around this, it also helps with managing translations as humans can see human readable text data. in any case this looks like another decent weapon for your i18n arsenal.


java resource bundles
while i've been using UTF-8 based resource bundles for some time now, larger, more complex projects really need tools like IBM's rbManager to help manage resource bundle creation/translation. the problem with these is that their text messages are stored as ASCII escaped chars: Go=\u0E44\u0E1B (in thai, ไป). this requires quite a bit of extra cf processing to parse these types of "pure" java resource bundles (rb).

i've been trying off and on for some months now to make use of the underlying java resourceBundle classes to handle rb files but haven't had much success (mainly because java expects rb files in class paths and that's not something i can live with on some projects nor could i find a simple workaround). while staring at some limestone rocks on saturday i had a micro epiphany about java.util.PropertyResourceBundle class. this class handles rb files from an input stream (ie you can pump in the rb file content from anyplace on the server). badda bing (i actually thought that at the time ;-) here's some test code i whipped up:
<cfscript>
thisDir= GetDirectoryFromPath(expandpath("*.*"));
rbFile=thisDir & "test_th_TH.properties";
rb = createObject("java", "java.util.PropertyResourceBundle");
fis = CreateObject("java", "java.io.FileInputStream");
fis.init(rbFile);
rb.init(fis);
keys=rb.getKeys();
writeoutput("resourceBundle=#rbFile#<br>");
while (keys.hasMoreElements()) {
thisKEY=keys.nextElement();
thisMSG=rb.handleGetObject(thisKey);
writeoutput("#thisKEY#=#thisMSG#<br>");
}
</cfscript>

as you can see it's quite simple, so simple i built it into a javaRB.cfc. you can see it in action here.
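
for anybody who prefers to poke at this from the java side, here's a hedged sketch of the same idea. to keep it self-contained it reads the rb content from a string through a Reader instead of a file on disk (that substitution, and the StreamBundleDemo name, are assumptions for illustration); the point is that PropertyResourceBundle takes its content from whatever stream you hand it and decodes the \uXXXX escapes for you.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class StreamBundleDemo {
    // builds a bundle from rb text, no classpath involved
    static ResourceBundle fromText(String content) {
        try (Reader in = new StringReader(content)) {
            return new PropertyResourceBundle(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // the doubled backslashes put literal \uXXXX text in the string,
        // just like the content of an escaped rb file
        ResourceBundle rb = fromText("Go=\\u0E44\\u0E1B");
        for (String key : rb.keySet()) {
            System.out.println(key + "=" + rb.getString(key));
        }
    }
}
```

swapping the StringReader for a FileInputStream on the rb file gets you back to the cf version above.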

limestone rocks, who would have thought?