- Unicode 5.1
- locale data: Common Locale Data Repository (CLDR) 1.6
- charset converter file size improvement
- date interval formatting (note only the gregorian calendar is supported in this release)
- improved plural support
specific icu4j changes include:
- charset
  - ISO-2022 converter
  - HZ converter
  - SCSU/BOCU-1 converter
  - charset converter callback
- thai dictionary break iterator (yeah)
- JDK TimeZone support (this is pretty decent as you can now share tz IDs between coldfusion/core java & icu4j)
- locale service provider
- more convenient formatting of year+month, day+month, and other combinations
- simple duration formatting
- locales
- date and time formatting
- numerical formatting
- resource bundles
- unicode, transliteration, and character sets
right now it's all java content, but as you know from reading this blog, it's still very applicable to coldfusion. so have a look; it's worth a visit.
ps: he's promised to add flex and coldfusion content. so let's all hold his feet to the fire ;-)
i was a bit perplexed by this, mainly because we usually deal with locales whose writing systems have no concept of case. after poking around core java's String class, though, it seems that cf wasn't using the overloaded versions of the toUpperCase()/toLowerCase() methods that take a locale to handle locale-sensitive casing. it's easy enough to fix in cf (i really love how easily coldfusion lets you work around these little issues):
<cffunction name="toLowerCase" output="false" returntype="string" access="public">
<cfargument name="inString" required="true" type="string" hint="string to lower case">
<cfargument name="locale" required="false" default="en_US" type="string" hint="java style locale identifier to use to lower case input string">
<cfscript>
var thisLocale="";
var l=listFirst(arguments.locale,"_"); // language
var c=""; // country, we'll ignore variants
if (listLen(arguments.locale,"_") GT 1)
c=uCase(listGetAt(arguments.locale,2,"_"));
// build locale
thisLocale=createObject("java","java.util.Locale").init(l,c);
return arguments.inString.toLowerCase(thisLocale);
</cfscript>
</cffunction>
<cffunction name="toUpperCase" output="false" returntype="string" access="public">
<cfargument name="inString" required="true" type="string" hint="string to upper case">
<cfargument name="locale" required="false" default="en_US" type="string" hint="java style locale identifier to use to upper case input string">
<cfscript>
var thisLocale="";
var l=listFirst(arguments.locale,"_"); // language
var c=""; // country, we'll ignore variants
if (listLen(arguments.locale,"_") GT 1)
c=uCase(listGetAt(arguments.locale,2,"_"));
// build locale
thisLocale=createObject("java","java.util.Locale").init(l,c);
return arguments.inString.toUpperCase(thisLocale);
</cfscript>
</cffunction>
<cfscript>
s="#chr(105)##chr(305)##chr(223)#";
upperS=toUpperCase(s,"tr_TR");
lowerS=toLowerCase(upperS,"tr_TR");
writeoutput("input string: #s#<br> upper case: #upperS#<br>lower case: #lowerS#");
</cfscript>
notice how i didn't have to mess with the core java String class; i could just use its methods on a cf string.
even if you're not using the tr_TR locale, you should note that "ß" (small letter sharp s) is also a special case: upper-casing it actually turns it into 2 letters, "SS". i think there might be some issues with some Greek characters as well.
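for comparison, here's a minimal plain java sketch of the same thing, since cf is just calling these core java String methods under the hood. the LocaleCase class and its method names are made up for illustration, and it builds locales with Locale.forLanguageTag (java 7+) rather than the Locale constructor the UDFs above use:

```java
import java.util.Locale;

// Hypothetical helper: locale-sensitive casing via core java's String methods.
class LocaleCase {
    // Upper-case s using the casing rules of a BCP 47 language tag such as "tr-TR".
    static String upper(String s, String languageTag) {
        return s.toUpperCase(Locale.forLanguageTag(languageTag));
    }

    // Lower-case s using the casing rules of the given language tag.
    static String lower(String s, String languageTag) {
        return s.toLowerCase(Locale.forLanguageTag(languageTag));
    }

    public static void main(String[] args) {
        // same input as the cfscript test: chr(105), chr(305), chr(223)
        String s = "i\u0131\u00DF";
        String trUpper = upper(s, "tr-TR"); // dotted capital I, plain I, "SS"
        String enUpper = upper(s, "en-US"); // I, I, "SS"
        System.out.println("tr: " + trUpper + " / " + lower(trUpper, "tr-TR"));
        System.out.println("en: " + enUpper + " / " + lower(enUpper, "en-US"));
        System.out.println("length " + s.length() + " -> " + trUpper.length());
    }
}
```

note the upper-cased string is one character longer than the input because "ß" expands to "SS", so don't assume case mapping preserves string length (substring offsets, fixed-width columns, etc.).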
- that it has upgraded its resource data to Unicode 5.1 and CLDR 1.6
- added date interval formatting (e.g. Jan 10, 2008 to Jan 20, 2008 becomes Jan 10-20, 2008, and 10:10am to 11:10am becomes 10:10-11:10am, etc.). the downside is that currently it only supports the gregorian calendar
- added DurationFormat so you can now format a duration in time such as "2 days from now" or "3 hours ago".
- added "Locale Service Provider" support for core java's new locale service; many folks just want the filthy-rich and frequently-updated locale data that icu4j has, not the whole library. i wonder if there is a way to backdoor this into coldfusion's locales?
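to give a feel for what date interval formatting does, here's a hand-rolled, english-only, gregorian-only java sketch that collapses the shared fields of two dates, like the "Jan 10-20, 2008" example above. it is not icu4j's DateIntervalFormat (which drives this from CLDR data for each locale); the IntervalSketch class and its patterns are assumptions for illustration only:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical, english-only, gregorian-only sketch; not icu4j's DateIntervalFormat.
class IntervalSketch {
    private static final DateTimeFormatter MONTH_DAY = DateTimeFormatter.ofPattern("MMM d", Locale.US);
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("d", Locale.US);
    private static final DateTimeFormatter FULL = DateTimeFormatter.ofPattern("MMM d, yyyy", Locale.US);

    // Collapses whichever leading fields (year, month) the two dates share.
    static String format(LocalDate from, LocalDate to) {
        if (from.equals(to)) {
            return FULL.format(from);
        }
        if (from.getYear() == to.getYear() && from.getMonth() == to.getMonth()) {
            // same month and year: "Jan 10-20, 2008"
            return MONTH_DAY.format(from) + "-" + DAY.format(to) + ", " + to.getYear();
        }
        if (from.getYear() == to.getYear()) {
            // same year only: "Jan 10 - Feb 20, 2008"
            return MONTH_DAY.format(from) + " - " + MONTH_DAY.format(to) + ", " + to.getYear();
        }
        // nothing shared: spell out both dates
        return FULL.format(from) + " - " + FULL.format(to);
    }

    public static void main(String[] args) {
        System.out.println(format(LocalDate.of(2008, 1, 10), LocalDate.of(2008, 1, 20)));
        System.out.println(format(LocalDate.of(2008, 1, 10), LocalDate.of(2008, 2, 20)));
        System.out.println(format(LocalDate.of(2007, 12, 30), LocalDate.of(2008, 1, 2)));
    }
}
```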
you can grab the jar files/api docs and read more about the new stuff here.
how on earth do you think you can coordinate a global project by not giving folks useful info? geez.
i've been hitting the download link throughout the day, thinking maybe the mozilla folks were all east US coasters (really no idea, just a WAG) & i'd see something around noon here in bangkok. nope. nothing. bupkis. just version 2.0.0.14.
oh well. in case anybody's missed the link, go here: http://www.spreadfirefox.com/.
- uses the latest cldr 1.5.0.1 locale data
- the long discussed rule based timezone changes which gives us the ability to read and write timezone data in RFC2445 VTIMEZONE format as well as also providing access to Olson timezone transitions! this is something many people have been needing for quite some time now, this is going to be very useful
- taiwanese calendar (a flavor of gregorian calendar that numbers years since 1912AD)
- the Indian National Calendar (more complicated flavor of the gregorian calendar, eg it's synched up with the gregorian calendar's leap years but the extra day is added to the first month, Chaitra which starts march 22 on gregorian calendar--so, yup, it's complicated)
- charset conversion bugs were fixed and CESU-8, UTF-7 and ISCII converters have been added. also some conversion speed improvements. the UTF-7 one will be useful for email (bounce) handling
- a new MessageFormat type for plurals was added
- a pretty useful new DurationFormat class was added so you can format messages over a duration in time such as "2 days from now" or "3 hours ago"
- also the MessageFormat class will now take named arguments instead of just arrays (too bad now that coldfusion 8's javacast got a shot of steroids)
- new BIDI stuff (which i still need to investigate)
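for what it's worth, core java (java 8 and later) now exposes olson transitions too, through java.time.zone.ZoneRules. here's a minimal sketch (the TzTransitions class is hypothetical, and this has nothing to do with icu4j's VTIMEZONE support) that walks the 2008 US eastern daylight saving changes:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneOffsetTransition;

// Walks tz database (Olson) transitions with core java's java.time; no icu4j involved.
class TzTransitions {
    // Next offset transition after the given instant, or null if there is none.
    static ZoneOffsetTransition next(String zoneId, Instant after) {
        return ZoneId.of(zoneId).getRules().nextTransition(after);
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2008-01-01T00:00:00Z");
        for (int i = 0; i < 2; i++) {
            ZoneOffsetTransition tr = next("America/New_York", t);
            if (tr == null) {
                break; // zone has no further transitions
            }
            System.out.println(tr.getInstant() + " " + tr.getOffsetBefore() + " -> " + tr.getOffsetAfter());
            t = tr.getInstant();
        }
    }
}
```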
next month i'll be adding the new calendars as CFCs to the usual bits. i'll also be doing some significant changes to most of the i18n formatting methods to take better advantage of the calendar, etc. keywords (en_GB@calendar=indian,currency=EUR) on the ULocale class (icu4j's super cool locale class).
unfortunately the persian calendar still appears to be only in icu4c (C/C++).
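the taiwanese calendar in that list is simple enough that core java (java 8 and later) ships it as java.time.chrono.MinguoDate: year 1 is 1912 AD, so the minguo year is just the gregorian year minus 1911. a minimal sketch (class name hypothetical):

```java
import java.time.LocalDate;
import java.time.chrono.MinguoDate;
import java.time.temporal.ChronoField;

// Taiwanese (Minguo/ROC) year numbering via core java's java.time.chrono.MinguoDate (java 8+).
class MinguoDemo {
    // Minguo year-of-era for a gregorian date: 1912 AD is year 1 of the ROC era
    // (dates before 1912 fall in the BEFORE_ROC era and count backwards).
    static int minguoYear(LocalDate gregorian) {
        return MinguoDate.from(gregorian).get(ChronoField.YEAR_OF_ERA);
    }

    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2008, 1, 1);
        System.out.println(d + " is minguo year " + minguoYear(d));
        System.out.println(MinguoDate.from(d)); // the full ROC-era date
    }
}
```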
- it uses the latest and greatest cldr 1.5 locale data
- the long discussed rule based timezone changes which gives us the ability to read and write timezone data in RFC2445 VTIMEZONE format as well as also providing access to Olson timezone transitions! this is stuff many people have been looking for, this is going to be very useful
- taiwanese calendar (which i never knew existed, looks like a flavor of gregorian calendar that numbers years since 1912AD)
- the Indian National Calendar (ditto though looks like a more complicated flavor of the gregorian calendar, eg it's synched up with the gregorian calendar's leap years but the extra day is added to the first month, Chaitra which starts march 22 on gregorian calendar--so, yup, it's complicated)
- charset conversion bugs were fixed and CESU-8, UTF-7 and ISCII converters have been added. also some conversion speed improvements. i think the UTF-7 one looks pretty useful
- a new MessageFormat type for plurals was added, looks like some eastern european languages have complicated rules for plurals
- a new DurationFormat class so you can format messages over a duration in time such as "2 days from now" or "3 hours ago" (this one looks useful)
- also the MessageFormat class will now take named arguments instead of just arrays (too bad now that coldfusion 8's javacast got a shot of steroids)
- bunch of new BIDI stuff (which needs some investigating)
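to get a feel for the Indian National Calendar arithmetic, here's a hypothetical java sketch (not icu4j's IndianCalendar API) based on the usual rules: Chaitra 1 falls on march 22, or march 21 in gregorian leap years (that's where the extra day goes, giving Chaitra 31 days); the next five months have 31 days and the last six have 30; and the saka year is the gregorian year minus 78 (minus 79 before Chaitra 1). month names are transliterated one common way; spellings vary:

```java
import java.time.LocalDate;
import java.time.Year;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch of Indian National (Saka) calendar arithmetic; not icu4j's IndianCalendar.
class SakaDate {
    static final String[] MONTHS = {
        "Chaitra", "Vaishakha", "Jyeshtha", "Ashadha", "Shravana", "Bhadra",
        "Ashvin", "Kartika", "Agrahayana", "Pausha", "Magha", "Phalguna"
    };

    // Chaitra 1 is March 22, or March 21 when the gregorian year is a leap year.
    static LocalDate chaitraOne(int gregorianYear) {
        return LocalDate.of(gregorianYear, 3, Year.isLeap(gregorianYear) ? 21 : 22);
    }

    // Length of a saka month (0 = Chaitra) in the saka year that starts in gregorianYear.
    static int monthLength(int month, int gregorianYear) {
        if (month == 0) {
            return Year.isLeap(gregorianYear) ? 31 : 30; // the leap day goes to Chaitra
        }
        return month <= 5 ? 31 : 30; // Vaishakha..Bhadra: 31, Ashvin..Phalguna: 30
    }

    // Converts a gregorian date to e.g. "Pausha 11, 1929".
    static String fromGregorian(LocalDate date) {
        int g = date.getYear();
        if (date.isBefore(chaitraOne(g))) {
            g -= 1; // still in the saka year that began the previous March
        }
        long day = ChronoUnit.DAYS.between(chaitraOne(g), date); // 0-based day of saka year
        int month = 0;
        while (day >= monthLength(month, g)) {
            day -= monthLength(month, g);
            month++;
        }
        return MONTHS[month] + " " + (day + 1) + ", " + (g - 78);
    }

    public static void main(String[] args) {
        System.out.println(fromGregorian(LocalDate.of(2008, 1, 1)));
        System.out.println(fromGregorian(LocalDate.of(2008, 3, 21)));
        System.out.println(fromGregorian(LocalDate.now()));
    }
}
```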
i'll be adding the new calendars as CFCs to the usual bits as soon as i do enough background research on them to understand any "quirks". i'll also be doing some significant changes to most of the i18n formatting methods to take better advantage of the calendar, etc. keywords (en_GB@calendar=indian,currency=EUR) on the ULocale class (icu4j's super cool locale class).
looks like a persian calendar was also added, but it appears to be only in icu4c (C/C++) for the time being.
wow, fun times in the old town tonite (it's actually in the AM in bangkok but you get the idea).
i guess we can expect to see this in JDK 1.6 update 4 (latest update is 2). i wonder if i should just pile all the CLDR vs core java locale differences (there's a lot) into a single java bug report?
first this article confirms that PHP's unicode/i18n support really is lame (also see this article for a bit older take on PHP's unicode/i18n support, i especially liked the Unicode should have been in PHP five years ago quote). but more importantly, and what's surprising to me, is that they're actually doing something about it by adopting ICU. going from being an i18n joke to fully supporting unicode/i18n via the ICU project. i know next to nothing about the PHP world so i have no idea if this is really happening (or has already happened) or is just hot air but it looks like they're on the right track with ICU.
wonder if there's a lesson here?
- zh_SG - Chinese (Simplified), Singapore
- en_MT - English, Malta
- en_PH - English, Philippines
- en_SG - English, Singapore
- el_CY - Greek, Cyprus
- id_ID - Indonesian, Indonesia
- ga_IE - Irish, Ireland
- ms_MY - Malay, Malaysia
- mt_MT - Maltese, Malta
- pt_BR - Portuguese, Brazil
- pt_PT - Portuguese, Portugal
- es_US - Spanish, United States
hopefully this trend will continue.
beyond the new locale data it also provides support for the Japanese Imperial Calendar which we can tap into for date conversion and formatting simply by setting coldfusion's locale to the new JP variant:
<cfscript>
// set appropriate locale
setLocale("ja_JP_JP");
// Japanese Imperial Calendar date format
writeoutput("#lsDateFormat(now(),"FULL")#");
</cfscript>
which should give you something like 平成19年7月26日. how cool is that?
for more details on the new i18n bits in JDK 6 see this.
core java's locale data for en_AU (Australia) and en_NZ (New Zealand) time formats is a bit off. it uses a format of H:mm:ss where the "H" stands for the 24 hour clock, i.e. 5:00 PM would be formatted as 17:00. the CLDR (common locale data repository), however, states that the time format for en_AU & en_NZ locales is h:mm:ss a (well, actually it's proposed to include the timezone, "h:mm:ss a z"; see the en_AU time format entry here). while most users in those locales are smart enough to get that 17:00 is 5:00 PM when your ColdFusion app outputs time values, it plays havoc when ColdFusion tries to parse what those same folks would normally input for a time value.
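here's a small java illustration of the parsing problem. it uses explicit patterns with Locale.US rather than the JDK's actual en_AU/en_NZ locale data (which has changed since this was written), and the TimeParseDemo class is hypothetical:

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical illustration: a 24 hour "H:mm:ss" pattern rejects the 12 hour input users type.
class TimeParseDemo {
    static final DateTimeFormatter H24 = DateTimeFormatter.ofPattern("H:mm:ss", Locale.US);
    static final DateTimeFormatter H12 = DateTimeFormatter.ofPattern("h:mm:ss a", Locale.US);

    // Returns the parsed time, or null if the pattern can't parse the whole input.
    static LocalTime tryParse(String input, DateTimeFormatter f) {
        try {
            return LocalTime.parse(input, f);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("24h pattern: " + tryParse("5:00:00 PM", H24)); // fails, prints null
        System.out.println("12h pattern: " + tryParse("5:00:00 PM", H12));
    }
}
```

the 24 hour pattern simply can't consume the trailing " PM" that a 12 hour user types, so the parse fails outright.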
so hey en_AU and en_NZ locale people, time to start helping yourselves. Sun has accepted this as a new bug, go vote for it (you have to be a member of the Sun Developer Network to vote but these days, who isn't).
i just hope it works for the sake of those using it.
- native resource bundles (heck flex 2.0 got them, but frankly that's about all it got in terms of i18n)
- setTimeZone() function that might allow me to find my way out of timezone hell
- use icu4j library (used in a modular/plugin fashion, one of the really sweet things about this project is how often it's updated with new functionality and improved locale data from the CLDR). this would buy us better locale data, offer easier access to non-gregorian calendars, etc.
and that's it.
i guess you can take this posting as a stealthy compliment to the good work the CF team has done over the years to get ColdFusion to its current i18n state.
<cfscript>
// remote init
jarFile=jarLocation & "icu4j.jar";
URLObject = createObject('java','java.net.URL');
URLObject.init("file:" & jarFile);
URLArray = createObject("java","java.lang.reflect.Array").
newInstance(URLObject.getClass(),1);
arrayClass = createObject("java","java.lang.reflect.Array");
arrayClass.set(URLArray,0,URLObject);
loader = createObject("java","java.net.URLClassLoader");
loader.init(URLArray);
uLocale=loader.loadClass("com.ibm.icu.util.ULocale").newInstance();
</cfscript>
<cfdump var="#uLocale#">
while i've managed to work around this issue (ULocales are everywhere in icu4j, most classes that deal with locales have a getAvailableULocales() method) it's always kind of nagged at me. after a bit of poking and prodding i started looking into ways to get at the actual constructors for a given class:
<cfscript>
jarFile=jarLocation & "icu4j.jar";
URLObject = createObject('java','java.net.URL');
URLObject.init("file:" & jarFile);
URLArray = createObject("java","java.lang.reflect.Array").
newInstance(URLObject.getClass(),1);
arrayClass = createObject("java","java.lang.reflect.Array");
arrayClass.set(URLArray,0,URLObject);
loader = createObject("java","java.net.URLClassLoader");
loader.init(URLArray);
uLocale=loader.loadClass("com.ibm.icu.util.ULocale"); // don't init
c=uLocale.getConstructors();
for (j=1; j LTE arrayLen(c); j=j+1) {
params=c[j].getParameterTypes();
for (i=1; i LTE arrayLen(params); i=i+1) {
writeoutput("ULocale[#j#]: #i# #params[i].getName()#<br>");
}
writeoutput("<br>");
}
</cfscript>
which in this case returned 3 constructors (just like the API says but not in the javadocs order):
ULocale[1]: 1 java.lang.String
ULocale[1]: 2 java.lang.String
ULocale[1]: 3 java.lang.String
ULocale[2]: 1 java.lang.String
ULocale[3]: 1 java.lang.String
ULocale[3]: 2 java.lang.String
which i can easily match to the one i want (ULocale("th_TH")):
<cfscript>
// remote init
jarFile=jarLocation & "icu4j.jar";
URLObject = createObject('java','java.net.URL');
URLObject.init("file:" & jarFile);
URLArray = createObject("java","java.lang.reflect.Array").
newInstance(URLObject.getClass(),1);
arrayClass = createObject("java","java.lang.reflect.Array");
arrayClass.set(URLArray,0,URLObject);
loader = createObject("java","java.net.URLClassLoader");
loader.init(URLArray);
uLocale=loader.loadClass("com.ibm.icu.util.ULocale");
c=uLocale.getConstructors();
// the newInstance method wants an array
obj=listToArray("th_TH");
// we want the 2nd constructor
thaiLocale=c[2].newInstance(obj.toArray());
</cfscript>
<cfdump var="#thaiLocale#">
which indeed returns an object of com.ibm.icu.util.ULocale.
since in most cases, i only use one way to init a given class, this technique will work OK for us. my only question is will the order of constructors remain the same? can i always count on the 2nd constructor to be ULocale("th_TH")? or should i build metadata functionality to probe the constructors to see which one matches?
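as far as i know, core java doesn't promise any particular order for the array getConstructors() returns, so counting on c[2] is risky. a safer route is to ask for the constructor by its parameter types with Class.getConstructor(). here's a minimal java sketch, with java.util.Locale standing in for ULocale (it has the same (String), (String,String) and (String,String,String) constructor shapes) and a made-up helper name:

```java
import java.lang.reflect.Constructor;
import java.util.Locale;

// Picks a constructor by its parameter types instead of by its index in getConstructors().
class CtorBySignature {
    static <T> T create(Class<T> type, Class<?>[] paramTypes, Object... args) {
        try {
            Constructor<T> ctor = type.getConstructor(paramTypes); // NoSuchMethodException if no match
            return ctor.newInstance(args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("no usable constructor on " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        // the two-String constructor, i.e. new Locale("th", "TH")
        Locale thai = create(Locale.class, new Class<?>[] {String.class, String.class}, "th", "TH");
        System.out.println(thai);
    }
}
```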
ps: i did indeed learn my lesson, notice how i passed the coldfusion array using toArray() ;-)
eclipse (not cfeclipse) doesn't add a BOM to UTF-8 encoded files. why? well
- the BOM isn't actually required as part of the definition of UTF-8 (and i know of plenty of s/w that either doesn't write one out or in fact strips them from files)
- in the past (i think) the java compiler wouldn't compile a file w/a BOM & since that's what eclipse was originally meant for, NOT having a BOM makes perfect sense (from a very quick test i just ran it seems this is no longer true, at least from within eclipse)
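if you need to cope with both kinds of files yourself, note that java's UTF-8 decoder does not strip a BOM; it comes through as U+FEFF at the start of the string. a minimal sketch of checking for and skipping it (the Utf8Bom class is made up for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Detects and skips an optional UTF-8 byte order mark (EF BB BF) before decoding.
class Utf8Bom {
    private static final byte[] BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    static boolean hasBom(byte[] data) {
        return data.length >= 3 && Arrays.equals(Arrays.copyOf(data, 3), BOM);
    }

    // Java's UTF-8 decoder keeps a leading BOM as U+FEFF, so skip those 3 bytes explicitly.
    static String decode(byte[] data) {
        int offset = hasBom(data) ? 3 : 0;
        return new String(data, offset, data.length - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(hasBom(withBom) + " " + decode(withBom).length()); // BOM found, 2 chars left
    }
}
```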
so why was our cfeclipse-edited UTF-8 encoded code working? because we follow our own good i18n practices and liberally use encoding hinting starting with the cfprocessingdirective. each of our coldfusion pages starts with:
BOM or no BOM, this ensures your code will always be interpreted as UTF-8. for more good i18n practices grab a copy of the advanced coldfusion book.
see? good i18n practices really are good.
- how to make utf-8 HTML pages, which is a good read even if it does contain a bizarre note about windows notepad and the BOM.
- determining a file's encoding, most notable for its advice: basically, use a browser ;-)
- some i18n sun blogs (none of which i knew about):
- i18n G.A.L. For all things international, only some of them software...
- norbert lindenberg's blog sun's technical lead for java i18n (he doesn't like these kinds of abbreviations, which is too bad because i do)
- tim forster's blog mostly about translation tools
note that this version of the persian calendar uses a "well-known arithmetic algorithm for calculating the leap years" rather than astronomical calculations.
i'd like to publicly thank Dr. Ghasem Kiani for his work on this project, we've been waiting quite a while for a persian calendar to round off our i18n calendars. thanks.
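the post doesn't say which arithmetic rule this version uses. one widely used arithmetic rule is a 33-year cycle with 8 leap years (years 1, 5, 9, 13, 17, 22, 26 and 30 of each cycle); here's a hypothetical java sketch of that rule, not necessarily the algorithm in this project:

```java
// Hypothetical sketch of a 33-year arithmetic leap rule for the persian calendar.
class PersianLeap {
    // Leap when (25y + 11) mod 33 < 8, i.e. years 1, 5, 9, 13, 17, 22, 26, 30 of each cycle.
    static boolean isLeapYear(int persianYear) {
        // floorMod keeps the remainder non-negative for years before the epoch
        return Math.floorMod(25 * persianYear + 11, 33) < 8;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int y = 1; y <= 33; y++) {
            if (isLeapYear(y)) {
                sb.append(y).append(' ');
            }
        }
        System.out.println(sb.toString().trim()); // 1 5 9 13 17 22 26 30
    }
}
```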
that i18n guy about town, tex texin, has put together a good document concerning the use of RFC 3066 language identifiers. you might lend a hand by perusing the table for any funny business (maybe like sinhalese in thailand--but hey, what do i know).
and just when i thought i knew everything about encoding (maybe because i actually think all you really have to know is Just Use Unicode), i find out something new. while doing some research in the java i18n forums i stumbled onto a really nifty java encoding resource, part of a java and internet glossary. i especially liked the term armouring (which i had never heard used in this context before): Converting binary data into printable gibberish so that data transport systems will not corrupt it. so that's what it's called.
- newly added complete scripts such as new Tai Lue script (it's used in the yunnan area of southern china and south to northern thailand) among others
- "very significant extensions to the repertoire for the Arabic script"
- new chars were added to support "roundtrip mapping support for HKSCS and GB 18030"
- i also find it interesting that "106 CJK compatibility ideographs has been added to support roundtrip mapping to the DPRK standard"--you know, north korea
now, i guess i'm going to have to rework my uBlock CFC. you can read more about the new unicode beta here.
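for the curious, core java can do a uBlock-style lookup with Character.UnicodeBlock; which blocks it knows about depends on the unicode version of your JDK (NEW_TAI_LUE needs java 7+). a minimal sketch, not the uBlock CFC's actual code:

```java
// Looks up the unicode block of each code point with core java's Character.UnicodeBlock.
class BlockLookup {
    // Returns the block, or null if the code point isn't in any block this JDK knows about.
    static Character.UnicodeBlock blockOf(int codePoint) {
        return Character.UnicodeBlock.of(codePoint);
    }

    public static void main(String[] args) {
        String s = "\u0E01\u1980A"; // thai ko kai, a new tai lue letter, latin A
        s.codePoints().forEach(cp -> System.out.printf("U+%04X %s%n", cp, blockOf(cp)));
    }
}
```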
next, since i'm always ragging on core java's i18n support, i thought i'd point out a nifty new tech tip at Core Java Technologies Tech Tips dealing with resource bundles. this tech tip examines when and where you should be using ListResourceBundle vs PropertyResourceBundle. we normally use PropertyResourceBundle when applications can't access the classpath (ala the javaRB CFC) and plain ResourceBundle when they can (with the rbJava CFC). as an added benefit, this article gets into some testing using java 5.0's (or 1.5's) new nanoTime() method (as in nanoseconds) as well as offering a link to a JavaOne presentation on how not to write a benchmark.
both are pretty good reading.
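to make the difference concrete, here's a minimal java sketch: a ListResourceBundle compiles its key/value pairs into a class (so it has to be found on the classpath), while a PropertyResourceBundle can be built straight from a Reader, no classpath lookup needed. the class names and the welcomeMSG key are hypothetical:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ListResourceBundle;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class BundleDemo {
    // PropertyResourceBundle: built directly from a Reader, no classpath lookup involved.
    static ResourceBundle fromProperties(String text) {
        try {
            return new PropertyResourceBundle(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ResourceBundle list = new TestMsg_en_US();
        ResourceBundle props = fromProperties("welcomeMSG=Well hello there.\n");
        System.out.println(list.getString("welcomeMSG"));
        System.out.println(props.getString("welcomeMSG"));
    }
}

// ListResourceBundle: key/value pairs compiled into a class (normally found via the classpath).
class TestMsg_en_US extends ListResourceBundle {
    @Override
    protected Object[][] getContents() {
        return new Object[][] {{"welcomeMSG", "Well hello there."}};
    }
}
```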
it doesn't do much except format/convert gregorian dates to the persian calendar and back again (right now it can only parse medium/short persian date formats). still lacks calendar math, real persian date string parsing, arabic-hindic digits date formats, etc.
so what's a persian (or iranian) calendar? why, it's the formal calendar in general use in iran, also known as the solar hijri calendar and sometimes as the jalali calendar. i've also seen it described as the shamsi calendar. frankly i have no idea which is correct so i'll stick with "persian". since it's one of the few calendars designed in the era of accurate positional astronomy, it's probably the most accurate solar calendar around. you can read more here or here.
i've also been looking at this java calendar class. it has a boatload of calendars (besides persian it has mayan, nepali, hindu, coptic and believe it or not a french revolutionary calendar).
there is an on-going discussion on the unicode list about "internationalization assumption" which simplistically goes something along the lines of if latin-1 is tested ok can we assume all latin-1 languages are "a-ok"? as it turns out, "no". some of the folks participating in this discussion have pointed out that, for example, not all french chars are found in latin-1. my first thought on reading that was, "oh yeah, the euro" but as it turns out there are a couple of french chars (no idea of their frequency of use but they are used in the french words for eye, egg, beef and heart) that are not in latin-1 but are in latin-9. for example see jukka korpela's excellent latin-1/latin-9 comparison page. these chars are also found in windows 1252 code page (which i guess helps support the idea that it's actually a superset of latin-1).
the moral of the story? just use unicode
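a quick way to check this yourself is CharsetEncoder.canEncode() in core java. a minimal sketch (class name made up) testing the french words for eye, egg, beef and heart, all of which need "œ":

```java
import java.nio.charset.Charset;

// Checks whether text survives in a given charset using CharsetEncoder.canEncode.
class CharsetCheck {
    static boolean canEncode(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String[] words = {"\u0153il", "\u0153uf", "b\u0153uf", "c\u0153ur"}; // oeil, oeuf, boeuf, coeur with the oe ligature
        String[] charsets = {"ISO-8859-1", "ISO-8859-15", "windows-1252", "UTF-8"};
        for (String w : words) {
            StringBuilder line = new StringBuilder(w);
            for (String cs : charsets) {
                line.append("  ").append(cs).append('=').append(canEncode(w, cs));
            }
            System.out.println(line);
        }
    }
}
```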
i'm not exactly sure what was changed but i suspect it was a few bugs we encountered with the initial 0.7 release. anyway, it's "new".
by way of web globalization news.
tex texin has a pretty good explanation of the main issues. his article includes:
- an overview of Turkish characters and encodings,
- a brief discussion of the Turkish language problem and solutions,
- and just for fun, a brief history of the Turkish language is also included.
pretty good reading.
i urge you to double check your locale's data & report any bugs you find. i'd say this is pretty good news for i18n folks.
reported via the unicode mailing list.
- case: not every language has case; Thai, for instance, doesn't, so PERMISSIONS, Permissions and permissions would be represented by the same string. in languages that do have case, those kinds of case permutations are plainly cosmetic (i was going to say cosmetic nonsense but thought better). if there's a real application need for this sort of thing, say to accent some heading, it should be handled via CSS and not hardcoded. hardcoded case strings make the difficult i18n process even more so. think twice before you get carried away with case, especially if you find yourself writing complex <cfif> blocks to handle it.
- pluralization: not every language deals with plurals the same as English, simply adding a letter ("s" for instance) hardly ever cuts it and in some instances the language structure is completely different (the English phrase "five wood blocks" becomes something like "block of wood five units" in Thai). while you can blow off quite a few CPU cycles with complicated logic to handle plurals, i contend that item(s) is just as understandable as
<cfif someQ.recordCount GT 1>items<cfelse>item</cfif>
and has the added benefit of i18n simplicity. otherwise you'll have to add another set of rb keys (plural forms vs singular forms) and logic to handle pluralization.
- compound strings: compound strings are, besides being my pet peeve, strings that contain substituted values. for example, "You owe me #dollarFormat(amountDue)#. Please pay by #dateFormat(normalDueDate)# or I will be forced to shoot you with #numberFormat(budgetQ.bulletsPerDeadbeat)# bullets. Thank you." if you do much i18n research you'll often see folks recommending you avoid compound strings like the plague (for instance, the API docs for the messageFormat java class come right out and say this). why? because they're hard to handle. first you have to figure out the logic, and in some cases it's not going to be trivial. then you have to rework the rb string to use placeholders for localization ("You owe me {1}. Please pay by {2} or I will be forced to shoot you with {3} bullets. Thank you."). finally you have to substitute the intended values at runtime (newer versions of my javaRB and RBjava CFCs have methods for this). it's often much easier to simply rewrite the compound string.
- floating prepositions: these are perhaps a form of compound string but often can't be handled like them. i sometimes encounter extremely complicated output logic/displays or HTML form elements separated by a preposition (usually "at", "by" or "in"). in its simplest form it might be "dateValue at timeValue" (which actually can be handled as a compound string) but more often than not it's much more complicated. if i can get my way, we normally send floating prepositions to the garbage dump; i mean, most folks would have no problem understanding "dateValue timeValue".
i suppose many folks might find this trivial but it adds time and complexity to an already time-consuming and complicated process.
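for reference, core java's MessageFormat handles the substitution step for compound strings; note its placeholders are zero-based ({0}, {1}, ...) rather than the {1}, {2}, {3} numbering above. a minimal sketch, with a hypothetical helper name and Locale.US hardcoded for the example:

```java
import java.text.MessageFormat;
import java.util.Date;
import java.util.Locale;

// Fills a compound string's placeholders at runtime with core java's MessageFormat.
class CompoundString {
    static String render(String pattern, Locale locale, Object... args) {
        return new MessageFormat(pattern, locale).format(args);
    }

    public static void main(String[] args) {
        // each locale's rb string could put these placeholders in whatever order its grammar needs
        String pattern = "You owe me {0,number,currency}. Please pay by {1,date,medium}"
            + " or I will be forced to shoot you with {2,number,integer} bullets. Thank you.";
        System.out.println(render(pattern, Locale.US, 12.5, new Date(), 3));
    }
}
```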
http://www.w3.org/TR/i18n-html-tech-char/
http://www.w3.org/TR/i18n-html-tech-lang/
http://www.w3.org/TR/i18n-html-tech-bidi/
pretty good reading.
the first thing to note is that i didn't translate anything into arabic, just told the blog that it was ar_EG locale ;-) you can clearly see some of the BIDI issues with neutral text like punctuation (parentheses, for instance). it also uses a gregorian calendar rather than an islamic one (and yes, non-gregorian calendars are at the top of my to-do list for this blog).
the original code for this blog can be found on ray camden's blog.
cool.
if you read this blog with any regularity, you know what's coming ;-) another dip in the java pool under cfmx. we built a quick and dirty (but hey it works) CFC that makes use of the locale currency info contained in java.util.Currency class. you can see it in action here.
i'd appreciate any feedback. note that this shouldn't be used to replace the currency formatting/parsing functions in the i18nFunction CFC; this CFC isolates the currency info for easier, specific access.
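here's roughly the kind of info java.util.Currency exposes, as a plain java sketch (the CurrencyInfo class is hypothetical, not the CFC's code). Currency.getInstance(Locale) needs a locale with a country, since currencies belong to countries rather than languages:

```java
import java.util.Currency;
import java.util.Locale;

class CurrencyInfo {
    // Currency code, default fraction digits and locale-specific symbol for a locale's country.
    static String describe(Locale locale) {
        Currency c = Currency.getInstance(locale); // IllegalArgumentException if the locale has no country
        return c.getCurrencyCode() + " digits=" + c.getDefaultFractionDigits() + " symbol=" + c.getSymbol(locale);
    }

    public static void main(String[] args) {
        Locale[] locales = {Locale.US, Locale.JAPAN, Locale.forLanguageTag("th-TH")};
        for (Locale l : locales) {
            System.out.println(l + ": " + describe(l));
        }
    }
}
```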
they are also converting measurements into the SI (metric) system; one of my Thai neighbors laughingly asked me "when was the last time you heard an NFL linebacker referred to in kilograms and meters?" these guys are also peppering their announcing with references to that other football (soccer to us Americans) and even referring to this as "American" football. the local (Thai language) announcers are ignoring all that goop and announcing the game knowing their audience. there's a lesson here i guess.
one of the interesting things about watching sports "overseas" is that many of the NFL games we get here are raw live feeds. these are really raw, stripped down broadcasts without the special features (sideline interviews, half-time reports, etc.) you'd get from normal network broadcasts. the plus side to this is that we get to see the producer/director shots & hear live mics when they break for commercials (there are no ads permitted on our local cable TV) and during half-time. we'll see the cameras zooming in on hotties in the stands, preview in-game presentations (the replays, analysis, highlights, etc.) and hear what the announcers really think of the game, officiating, etc. (which can sometimes be exactly opposite of what they say when they're "officially live") and every once in a while hear some announcer going berserk (once heard one former QB announcer doing an expletive laden tirade at somebody over the phone). now that's good TV ;-)
- what you need to know about the bidi algorithm and inline markup (bidi is another nifty i18n abbreviation in this case for 'bidirectional')
- an associated test suite: I18N Test Suite: Inline bidi markup
- w3c working draft: Requirements for the Internationalization of Web Services. while they may indeed exist, i've never seen a webservice with anything like these; in fact, the few that i've built didn't have these either.
once again, i'd like to recommend the w3c internationalization activity website to i18n folks. well worth a bookmark.
it's fixed and you can find the testbed here. the file in the devnet gallery will be available soon; in the meantime you can find the fixed CFC here.
i guess this would all be sort of ho-hum, so i spiced up the CFC a bit by including over 2,500 locations worldwide. the access database accompanying the CFC contains names, locality, country, longitude, latitude, and raw GMT offset. the actual timezone info (as used in java) is a bit harder to come by. the next version of this CFC should hopefully have that info plus more detailed data for the US and europe.
he points out one interesting issue about PHP that i never knew because i don't use it: it doesn't natively support unicode. it's got a couple of functions to encode/decode UTF-8 but all i can say about that is "bah, humbug".
the traditional Chinese calendar is a lunisolar calendar (the same type as the Hebrew calendar). months start with a new moon, and each month is numbered according to solar events. why? to guarantee that month #11 always contains the winter solstice. how? leap months are inserted in certain years (i feel another non-gregorian calendar induced headache coming on). these leap months are numbered the same as the month they follow. which month is a leap month? that depends entirely on the movements of the sun and moon (i.e. i can't follow the math very far). the normal ERA field differs from other calendars as it holds the 60 year "cycle" number; right now we're in the 78th cycle, which began in 1984 AD. years are counted sequentially, numbering from the 61st year of the reign of Huang Di, 2637 BC, which is designated year 1 on the Chinese calendar (yes, that's right, this calendaring system is over 4,000 years old). let's look at an example:
星期三 20x78-9-13
where 20 is the year in the current cycle, 78 is the cycle for this calendar (ERA in other calendars), 9 is the month and 13 is the day.
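the cycle arithmetic is simple enough to sketch. assuming the date falls after chinese new year (so the extended year is just the gregorian year + 2637), the cycle and the year within it fall out of integer division by 60. a hypothetical java sketch, valid for positive extended years only:

```java
// Hypothetical sketch of the 60-year cycle arithmetic; year 1 of cycle 1 is 2637 BC.
class ChineseCycle {
    // Years since the calendar epoch; assumes the date is after chinese new year.
    static int extendedYear(int gregorianYear) {
        return gregorianYear + 2637;
    }

    // 1-based cycle number (the ERA field), for positive extended years.
    static int cycle(int extendedYear) {
        return (extendedYear - 1) / 60 + 1;
    }

    // 1-based year within the cycle, 1..60.
    static int yearInCycle(int extendedYear) {
        return (extendedYear - 1) % 60 + 1;
    }

    public static void main(String[] args) {
        int ext = extendedYear(2003);
        System.out.println(yearInCycle(ext) + "x" + cycle(ext)); // 20x78, as in the example date
    }
}
```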
since ICU4J's ChineseCalendar defines an additional field (for leap month) and redefines the way the ERA field (no longer AD,BC, etc.) is used, this CFC has to use a different date format class, ChineseDateFormat.
this CFC adds 4 generic functions (i forgot that some calendars need special date logic):
- isBefore compares two dates to tell if one is before the other
- isAfter compares two dates to tell if one is after the other
- getJulianDay returns the true Julian day for a given date
- getExtendedYear returns the extended year, i.e. years since calendar start (in this case, current year + 2637)
i'll retrofit these to the other non-gregorian calendars. the date logic is probably more useful to the calendars that use calendar math different from the gregorian calendar (chinese, hebrew, islamic).
and 7 functions that are specific to this calendar (though i guess some can be applied to other calendars):
- isLeapMonth determines if a given date is in a leap month
- getCycle returns the cycle for a given date
- getCycleYear returns the year in the cycle for a given date
- getMonth returns the month in the cycle year for a given date
- getDay returns the day in the month for a given date
- getDayOfYear returns the day of the cycle year for a given date
- getWeek returns the week of the cycle year for a given date
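for comparison, core java (java 8 and later) has a julian day number and date comparisons built in via java.time. a minimal sketch of the same ideas for gregorian dates (the JulianDayDemo class is hypothetical, not the CFC's code):

```java
import java.time.LocalDate;
import java.time.temporal.JulianFields;

// Julian day numbers and before/after comparisons with core java's java.time (java 8+).
class JulianDayDemo {
    // Integer Julian Day Number for a date (2000-01-01 is 2451545).
    static long julianDay(LocalDate date) {
        return date.getLong(JulianFields.JULIAN_DAY);
    }

    public static void main(String[] args) {
        LocalDate a = LocalDate.of(2000, 1, 1);
        LocalDate b = LocalDate.of(2001, 1, 1);
        System.out.println(julianDay(a) + " " + a.isBefore(b) + " " + b.isAfter(a));
        System.out.println(julianDay(b) - julianDay(a)); // days between, 2000 being a leap year
    }
}
```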
the CFC's testbed is here. posted to the devnet gallery where i guess it will become available sooner or later.
next, the astronomical calendar. this one is quite tricky; it's also somewhat in a state of flux (the ICU4J team's working on this code), but since it forms the basis of some of the existing calendars i might as well give it a shot.
astronomicalCalendarCFC determines the positions of the sun and moon, the time of sunrise and sunset, moonrise and moonset, moon phases (full, new, etc.), vernal equinox, summer solstice, etc. for the most part the CFC seems to work OK, but there are a few sticky issues, or at least things i don't quite get. the getSunrise/getSunset functions are supposed to return the GMT time of sunrise/sunset on the local date to which this calendar is currently set (i construct each astronomicalCalendar object with a location (lat-long) and then set a date). for Bangkok, where the testbed server is, the returned sunrise, etc. times seem reasonable enough. however, for sites in north america, like Philly, Scranton or Saskatoon, the sunrise/sunset times appear reversed. i can't tell (yet) whether these are the GMT times for their local sunrise/sunset or the local times for the testbed server (GMT+7), or something else entirely.
this ICU4J calendar class is sort of experimental, so the docs, etc. aren't the clearest. need more testing before this thing can be shipped to the devnet gallery.
the testbed is here. if you want to play around w/the CFC as it now stands, you can download it here.
ho hum. well at least this posting hasn't mentioned the EOLAS Patent ruckus (until now ;-)
the Japanese calendar, sometimes called the Japanese Emperor Era calendar, is identical to the Gregorian calendar except for the year and era (which is why it was so easy to turn into a CFC). each emperor's ascension to the throne begins a new era. each new era's years are numbered starting with 1 (the year of ascension). what could be simpler?
the "modern" eras:
- Meiji: January 8, 1868 AD
- Taisho: July 30, 1912 AD
- Showa: December 25, 1926 AD
- Heisei: January 8, 1989 AD (current era)
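core java (java 8 and later) also knows these eras via java.time.chrono.JapaneseDate. a minimal sketch (class name hypothetical) showing the showa/heisei boundary and Heisei 19 (2007):

```java
import java.time.LocalDate;
import java.time.chrono.JapaneseDate;
import java.time.chrono.JapaneseEra;
import java.time.temporal.ChronoField;

// Japanese emperor eras via core java's java.time.chrono.JapaneseDate (java 8+).
class JapaneseEraDemo {
    static JapaneseEra eraOf(LocalDate gregorian) {
        return JapaneseDate.from(gregorian).getEra();
    }

    // Year within the era; the year of ascension is year 1.
    static int yearOfEra(LocalDate gregorian) {
        return JapaneseDate.from(gregorian).get(ChronoField.YEAR_OF_ERA);
    }

    public static void main(String[] args) {
        LocalDate[] dates = {LocalDate.of(1989, 1, 7), LocalDate.of(1989, 1, 8), LocalDate.of(2007, 7, 26)};
        for (LocalDate d : dates) {
            System.out.println(d + " -> " + eraOf(d) + " " + yearOfEra(d));
        }
    }
}
```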
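in plain java (java 8 and later) the locale-dependent first day of the week is available from WeekFields; a minimal sketch, class name hypothetical:

```java
import java.time.DayOfWeek;
import java.time.temporal.WeekFields;
import java.util.Locale;

// First day of the week for a locale, from core java's locale data.
class WeekStart {
    static DayOfWeek firstDayOfWeek(Locale locale) {
        return WeekFields.of(locale).getFirstDayOfWeek();
    }

    public static void main(String[] args) {
        Locale[] locales = {Locale.US, Locale.FRANCE, Locale.forLanguageTag("pl-PL"), Locale.forLanguageTag("th-TH")};
        for (Locale l : locales) {
            System.out.println(l + ": " + firstDayOfWeek(l));
        }
    }
}
```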
this CFC should appear on the devnet gallery soon enough.
the Islamic calendar (also known as "Hijri" since it starts at the time of Mohammed's emigration or "hijra" to Medinah on thursday, july 15, 622 AD) is the civil calendar used by most of the Arab world and is the religious calendar of the Islamic faith. it is a strict lunar calendar, so an Islamic year of twelve lunar months does not correspond to the solar year used by the Gregorian calendar system. an Islamic year averages about 354 days, so viewed from the Gregorian calendar, each subsequent Islamic year starts about 11 days earlier.
the civil Islamic calendar uses a fixed cycle of alternating 29 and 30 day months, with a leap day added to the last month in 11 out of every 30 years (oh joy, 11 days shorter and now this--i've run out of fingers and toes). that makes the calendar predictable, so it is used as the civil calendar in a number of Arab countries. the Islamic religious calendar is based on the observation of the crescent moon. sounds simple enough. but that observation varies with where you are when you look (your geography), when you look (sunset varies by season, you know), moon orbit "eccentricities" (i'll take the astronomer's word for that), and even the weather (too cloudy and you obviously can't see the moon). all this makes it impossible to calculate in advance, so the start of a month in the religious calendar might differ from the civil calendar by up to three days. that makes knowing which calendar variant folks use very important. in any case, ICU4J shortcuts all this, for the sake of speed, by using approximations of the astronomical calculations.
the islamicCalendarCFC test bed is here. if you've looked at the other two calendar CFCs, you should notice i've tried to maintain function and argument conventions across these CFCs. the islamicCalendar CFC differs in that it has an optional boolean "useCivil" argument to tell the CFC which calendar variant to use. this CFC will bubble up in the devnet gallery soon enough.
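the civil (tabular) rules above are easy to sketch. this hypothetical java version assumes the commonly used leap-year pattern (years 2, 5, 7, 10, 13, 16, 18, 21, 24, 26 and 29 of each 30-year cycle); other tabular variants exist, and it isn't necessarily exactly what ICU4J does:

```java
// Hypothetical sketch of tabular ("civil") islamic calendar arithmetic.
class IslamicCivil {
    // 11 leap years per 30-year cycle: years 2, 5, 7, 10, 13, 16, 18, 21, 24, 26, 29.
    static boolean isLeapYear(int year) {
        return Math.floorMod(14 + 11 * year, 30) < 11;
    }

    // Odd months have 30 days, even months 29; the 12th month gains a day in leap years.
    static int monthLength(int year, int month) {
        if (month == 12 && isLeapYear(year)) {
            return 30;
        }
        return (month % 2 == 1) ? 30 : 29;
    }

    static int yearLength(int year) {
        return isLeapYear(year) ? 355 : 354;
    }

    public static void main(String[] args) {
        int leapYears = 0;
        long days = 0;
        for (int y = 1; y <= 30; y++) {
            if (isLeapYear(y)) {
                leapYears++;
            }
            days += yearLength(y);
        }
        System.out.println(leapYears + " leap years, " + days + " days per 30-year cycle");
        System.out.println("average year: " + (days / 30.0) + " days");
    }
}
```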
next up is the Japanese calendar.
so if you're dealing with small dateparts and large date differences, watch out for this.
i have to thank andrew tyrone for first finding the bug in the hebrewCalendarCFC & steven r. loomis for working with me to track down the problem with icu4j.
on the off chance somebody's wondering, a resourceBundle is a file holding text label key/value pairs separated into locale files; the reason for this is to completely separate text from code and text presentation. for instance:
testMsg_th_TH.properties (thai locale) contains welcomeMSG=สวัสดีคะ while testMsg_en_US.properties (american locale) contains welcomeMSG=Well hello there.
the application would determine which locale was required (session or application based depending on how you rolled out your application) and then load the relevant resourceBundle. the welcomeMsg text label would then show up in the proper language. simple, easy, scalable.
in any case, i've put the resource file CFC in the devnet gallery (should be available sooner or later). you can see an example and download it here if you're in a hurry to trash it.
again wandering off-topic, i've been trying to make use of native java resourceBundle (getBundle, etc.) functionality with cf, no dice so far. getBundle never seems able to find the resourceBundle. no idea if this would function any better than the way i'm doing it now but i'd sure like to find out. any ideas?
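a guess rather than a tested answer: getBundle(baseName) searches the caller's class loader, which under cf is the server's, so a .properties file sitting outside its classpath is never seen. passing your own class loader is one way around that. a minimal plain java sketch (the BundleFromDir class, the temp directory and the testMsg base name are illustrative assumptions):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.ResourceBundle;

// Loads a bundle from a directory outside the classpath by giving getBundle an explicit class loader.
class BundleFromDir {
    static ResourceBundle load(Path dir, String baseName, Locale locale) throws IOException {
        // on the default file system, an existing directory's URI ends in "/", which
        // URLClassLoader treats as a directory to search
        try (URLClassLoader loader = new URLClassLoader(new URL[] {dir.toUri().toURL()})) {
            return ResourceBundle.getBundle(baseName, locale, loader);
        }
    }

    // Writes a throwaway en_US bundle to a temp directory and reads a key back.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("rb");
            Files.write(dir.resolve("testMsg_en_US.properties"),
                "welcomeMSG=Well hello there.\n".getBytes(StandardCharsets.ISO_8859_1));
            return load(dir, "testMsg", Locale.US).getString("welcomeMSG");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```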

