Saturday, September 24, 2011

Arabic Support in Java

So a few months back, I started working on an Arabic version of JavaScript, which required me to figure out how to write an Arabic Java program. To my surprise, Java actually has pretty lousy support for BIDI and Arabic.

Now, I always suspected that the old Sun severely understaffed its Java UI group and I always had this odd feeling that their UI programmers never actually programmed any real applications using their UI frameworks. The fact that Java, the major enterprise programming language of the last decade, still doesn't properly support Arabic even after all these years of development just tells you everything you need to know right there. IBM/eclipse, with its SWT UI framework for Java, has had proper Arabic support for a long time now. Instead of doing everything alone, why didn't Sun get help from others when developing Java? Why? Mind you, they rely heavily on proper OS support for Arabic while Java's Swing framework is designed in such a way that they have to reimplement Arabic support from scratch. And the Unicode character encoding was specifically designed so that text processing of Unicode is easy but the handling Unicode in a GUI is the most awfully complicated piece of code possible. But still, we're at Java 7 now, and they should have been able to get this basic piece of infrastructure supported by now.

It makes me wonder which programming languages computer science students are taught in the Arab world. It can't be Java. What's the point of learning a programming language that can't properly handle the input and display of your native language? I guess it must be C# or something.

Anyway, this is what I found. I was working in Windows in Java 6, so my observations may be specific to this combination.

In AWT, I tried setting the text direction using applyComponentOrientation(), but I could never get that thing to work. The text always stayed left-to-right using AWT widgets. In both AWT and Swing, there's a setLocale() method, but I have yet to figure out whether that method actually does anything. I suspect it doesn't do anything. So basically, I don't think there's any BIDI support in AWT. This is somewhat annoying because Windows has reasonable BIDI and Arabic support, so if Java simply passed on this text orientation information from AWT to Windows, then we could simply use AWT for our UIs, and everyone would be happy. Unfortunately, it doesn't.

In Swing, the applyComponentOrientation() method did seem to properly set the text orientation, so there's some good support for bi-directional text there. Unfortunately, for Arabic text, you also need support for text numeric shaping. Although some Arab nations use western number symbols (which, as you probably know, are called Arabic numerals, which makes things nice and confusing for everybody), most Arab nations have their own number symbols. Unicode, in its infinite wisdom, decided that these symbols will actually be encoded in character streams as western digits 0-9, but the UI will be responsible for automatically substituting in different number shapes when the characters are actually displayed. The Java graphics libraries actually have some support for numeric shaping when displaying text, but it's sort of broken. Java doesn't automatically extract region and locale information from the OS, so it defaults to guessing which numeric shapes it should use. If the string you're displaying starts with numbers, there's no initial context for the numeric shaper to use in guessing which number forms to use, so it will default to Western shapes. Also, the Java numeric shaper only reshapes numbers and forgets to reshape the decimal point and thousand separator. I tried a release candidate for Java 7, and this problem was still there. Does anybody at Sun/Oracle actually use their UI framework in real applications? Anyway, even this broken support for numeric shaping isn't actually enabled in Swing, so you can't display numbers in Arabic. In the Java 7 release candidate, programmers could manually enable Arabic numeric shaping in rich text widgets (i.e. JTextArea), but it wasn't clear whether that support was also added to other widgets like JButton or JLabel, etc. In any case, given the brokenness of Java numeric shaping, that isn't exactly a big win.

So in the end, if you want to build an Arabic GUI in Java, use SWT. Eclipse now has a free GUI builder available called Window Builder, and even though I didn't actually know any SWT, I was able to throw together a SWT GUI in a couple of hours with almost no work. GUI builders are awesome. Previously, I was always concerned that I would lose some flexibility in the code I could write, but the one in Eclipse is a real dream and saves a lot of time. I really should have paid money to buy one years ago.