Java internationalization (i18n) with proper plurals

Solution 1

Well, you already tagged the question correctly, so I assume you know a thing or two about ICU.

With ICU you have two choices for proper handling of plural forms:

  • PluralRules, which gives you the rules for a given Locale
  • PluralFormat, which uses the aforementioned rules to format messages

Which one to use? Personally, I prefer to use PluralRules directly, to select the appropriate message from the resource bundles.

ULocale uLocale = ULocale.forLanguageTag("pl-PL");
ResourceBundle resources = ResourceBundle.getBundle("path.to.messages",
                               uLocale.toLocale());
PluralRules pluralRules = PluralRules.forLocale(uLocale);

double[] numbers = { 0, 1, 1.5, 2, 2.5, 3, 4, 5, 5.5, 11, 12, 23 };
for (double number : numbers) {
  // select() returns the plural keyword for this number (one, few, many or other for Polish)
  String resourceKey = "some.message.plural_form." + pluralRules.select(number);
  // Visible fallback, printed when the key is missing from the bundle
  String message = "!" + resourceKey + "!";
  try {
    // format() is a small helper (not shown) that formats the message for the given locale
    message = format(resources.getString(resourceKey), uLocale, number);
  } catch (MissingResourceException e) {
    // Log this
  }
  System.out.println(message);
}

Of course, you (or the translator) would need to add the proper forms to the properties file. In this example, let's say:

some.message.plural_form.one=Znaleziono {0} plik
some.message.plural_form.few=Znaleziono {0} pliki
some.message.plural_form.many=Znaleziono {0} plików
some.message.plural_form.other=Znaleziono {0} pliku

For other languages (e.g. Arabic) you might also need to use the "zero" and "two" keywords; see CLDR's language plural rules for details.
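To make the keywords above more concrete, here is a rough sketch of what the Polish cardinal rules boil down to. It is a hand-written simplification of the CLDR rules, not ICU's implementation: the class and method names are made up for illustration, and it treats any double with a fractional part as "other" (CLDR actually looks at visible fraction digits, which a double cannot express). In real code you would let PluralRules.select() do this.

```java
// Hypothetical helper, for illustration only; use ICU's PluralRules in real code.
final class PolishPluralSketch {

    /** Returns the plural keyword a Polish cardinal number maps to (simplified). */
    static String category(double number) {
        // Numbers with a fractional part always take the "other" form
        if (number != Math.rint(number)) {
            return "other";
        }
        long i = (long) Math.abs(number);
        long mod10 = i % 10;
        long mod100 = i % 100;
        if (i == 1) {
            return "one";
        }
        // Ends in 2, 3 or 4, except the "teens" 12, 13 and 14
        if (mod10 >= 2 && mod10 <= 4 && !(mod100 >= 12 && mod100 <= 14)) {
            return "few";
        }
        // Every other integer: 0, 5..21, 25..31, ...
        return "many";
    }

    public static void main(String[] args) {
        double[] numbers = { 0, 1, 1.5, 2, 5, 12, 22 };
        for (double n : numbers) {
            System.out.println(n + " -> " + category(n));
        }
    }
}
```

Note how 12 and 22 land in different categories even though both end in 2. This repetition with modulo arithmetic is exactly what a fixed list of thresholds cannot express.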

Alternatively, you can use PluralFormat to select the valid form. The usual examples show direct instantiation, which doesn't make much sense in my opinion. It is easier to use it through ICU's MessageFormat:

String pattern = "Znaleziono {0,plural,one{# plik}" +
                 "few{# pliki}" +
                 "many{# plików}" +
                 "other{# pliku}}";
// Bind the format to an explicit locale instead of relying on the JVM default
MessageFormat fmt = new MessageFormat(pattern, ULocale.forLanguageTag("pl-PL"));
StringBuffer result = new StringBuffer();
FieldPosition zero = new FieldPosition(0);
// number is the value to format; the arguments must be an Object[], not a double[]
Object[] theNumber = { number };
fmt.format(theNumber, result, zero);

Of course, realistically you would not hardcode the pattern string, but place something like this in the properties file:

some.message.pattern=Found {0,plural,one{# file}other{# files}}

The only problem with this approach is that the translator must be aware of the placeholder format. Another issue, which I tried to show in the code above, is that MessageFormat's static format() method (the one that is easy to use) always formats for the default Locale. This can be a real problem in web applications, where the default Locale is typically the server's. So I had to format for a specific Locale (with floating-point numbers, mind you), and the code looks rather ugly...

I still prefer the PluralRules approach, which to me is much cleaner (although it needs the same message formatting style, only wrapped in a helper method).
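The helper method mentioned here is not shown in the answer, so the following is only a guess at its shape. To keep it dependency-free, this sketch uses the JDK's java.text.MessageFormat and java.util.Locale rather than ICU's classes; with ICU you could pass uLocale.toLocale() or construct ICU's MessageFormat with the ULocale instead. The class name is hypothetical.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Hypothetical helper class; the original answer does not show its implementation.
final class MessageHelperSketch {

    /** Formats a simple {0}-style pattern for an explicit locale, not the JVM default. */
    static String format(String pattern, Locale locale, Object... args) {
        // The instance is bound to the given locale, so numbers use that locale's separators
        MessageFormat fmt = new MessageFormat(pattern, locale);
        return fmt.format(args);
    }

    public static void main(String[] args) {
        Locale polish = Locale.forLanguageTag("pl-PL");
        // The plural form has already been chosen via the resource key; only the number is filled in
        System.out.println(format("Znaleziono {0} pliku", polish, 1.5));
        System.out.println(format("Found {0} files", Locale.ENGLISH, 1.5));
    }
}
```

Because the locale is passed in on every call, the result does not depend on the server's default Locale, which is the concern raised above for web applications.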

Solution 2

ChoiceFormat, as explained here, seems flexible enough to deal with any sort of pluralization you might throw at it.

EDIT: as Dr.Haribo pointed out in his comment, ChoiceFormat is not sufficient for Polish pluralization. A followup from the same blog suggests ICU4J, which handles more complex pluralization rules.
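A small sketch of why ChoiceFormat falls short for Polish, using only the JDK. The pattern and class name are made up for illustration; the point is that ChoiceFormat picks a string by ascending numeric thresholds, so it has no way to say "ends in 2, 3 or 4, except 12 to 14" or "has a fractional part".

```java
import java.text.ChoiceFormat;

// Hypothetical demo class, for illustration only.
final class ChoiceFormatLimitSketch {

    // Each limit applies from its value up to the next limit; the last one applies to everything above.
    static final ChoiceFormat FILES =
            new ChoiceFormat("0#plików|1#plik|2#pliki|5#plików");

    /** Returns the noun form ChoiceFormat selects for the given number. */
    static String noun(double number) {
        return FILES.format(number);
    }

    public static void main(String[] args) {
        System.out.println(noun(1));   // plik   (correct)
        System.out.println(noun(3));   // pliki  (correct)
        System.out.println(noun(5));   // plików (correct)
        System.out.println(noun(22));  // plików (wrong: Polish needs "pliki" here)
        System.out.println(noun(1.5)); // plik   (wrong: Polish needs "pliku" here)
    }
}
```

You could keep adding thresholds for 22, 25, 32, 35 and so on, but the list never ends, and fractions still fall into whichever integer range they happen to sit in.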

Author by

Dr.Haribo

Updated on June 02, 2022

Comments

  • Dr.Haribo about 2 years

    I was going to use Java's standard i18n system with the ChoiceFormat class for plurals, but then realized that it doesn't handle the complex plural rules of some languages (e.g. Polish). If it only handles languages that resemble English, then it seems a little pointless.

    What options are there to achieve correct plural forms? What are the pros and cons of using them?

  • Dr.Haribo over 11 years
    Look in the comments of the post you linked to. There's an example from Polish that shows that ChoiceFormat doesn't cut it. There's a followup post at stuartgunter.wordpress.com/2011/08/14/… that shows how to fix this using ICU4J.
  • Paweł Dyda over 11 years
    @Peter: ChoiceFormat won't let you correctly handle floating point numbers (the fraction part) or repeated rules (with modulo arithmetic). I'm sorry to say that, but ChoiceFormat is useless for Polish or similar languages (and I really know what I am talking about).
  • Peter Elliott over 11 years
    Duly noted. I am not an expert in the Polish language, and should have known this seemed too simple. I added the link to the followup post to my answer to make it clearer that ChoiceFormat alone is not enough.
  • Dr.Haribo over 11 years
    Thanks, lots of good info. No, I don't know ICU and gettext, I only read that they have better support for plural forms. I also wonder how they compare, if you have any experience with gettext. Perhaps ICU has an advantage as you are using resource bundles, which may work better with standard Java tools.
  • Paweł Dyda over 11 years
    @Dr.Haribo: This really depends on how you are going to process the translations. Depending on your Translation Memory tool (if any), gettext might be a better or worse solution. I'd consult the translation provider first.