Babel currently provides translations for some strings and dates. Useful additions would be, for example, time, currency, addresses and personal names. I’m currently working on a new way to define languages descriptively rather than programmatically.
The idea is to create a set of ini files like those in the LaTeX repository (the keys are tentative). Why the ini syntax? Because it’s easy to create, read, edit, parse and process.
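To illustrate why the ini syntax is easy to process, here is a minimal Python sketch that reads a language description with the standard `configparser` module. The section and key names (`identification`, `name.local`, `captions`, and so on) are invented for this example and are not babel’s tentative keys.

```python
import configparser

# Hypothetical language description; section and key names are made up
# for this sketch and do not reflect babel's actual keys.
SPANISH_INI = """
[identification]
name.local = español
name.english = Spanish

[captions]
chapter = Capítulo
figure = Figura
"""

def load_language(text: str) -> dict[str, dict[str, str]]:
    """Parse an ini-style language description into nested dicts."""
    # interpolation=None so a literal '%' in a caption is not treated specially
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}

data = load_language(SPANISH_INI)
print(data["captions"]["chapter"])  # Capítulo
```

Any language with an ini parser can consume the same file, which is the main attraction of a descriptive format over TeX code.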
The main sources are the ldf files and the CLDR. However, the latter is intended for displaying plain text, while TeX is about fine typesetting, which makes things a bit harder than expected. Some additional decisions must be taken in this regard. For example, several languages have several names for a single caption, depending on the class; there should also be keys for the treatment of labels and for the order of captions and their corresponding numbers (not all languages place the number after the caption, as LaTeX assumes). And should \alph labels simply be the exemplarCharacters in CLDR? Certainly not, at least in general.
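To make the caption question concrete, a minimal Python sketch of the data such keys might carry. The table layout, the `default` fallback and the `number_first` flag are assumptions for illustration only; the class-dependent pair is the standard LaTeX one (the `article` class prints “References”, while `book` and `report` print “Bibliography”).

```python
# Hypothetical caption table for one language: a caption may have several
# names depending on the document class, with a fallback for other classes.
CAPTIONS = {
    "bibliography": {"default": "Bibliography", "article": "References"},
    "chapter": {"default": "Chapter"},
}

def caption_name(key: str, doc_class: str) -> str:
    """Return the class-specific caption name, else the default one."""
    variants = CAPTIONS[key]
    return variants.get(doc_class, variants["default"])

def numbered(name: str, number: int, number_first: bool = False) -> str:
    """Combine caption and number; number_first models languages that
    place the number before the caption (a hypothetical key)."""
    return f"{number} {name}" if number_first else f"{name} {number}"

print(caption_name("bibliography", "article"))  # References
print(numbered(caption_name("chapter", "book"), 3))  # Chapter 3
```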
More interesting are changes in, or related to, sentence structure. For example, in Basque the number precedes the name (including in chapters); in Hungarian “from (1)” is “(1)-ből”, but “from (3)” is “(3)-ból”; in Spanish an item labelled “3.º” may be referred to as “3.er ítem”; and so on.
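The Hungarian case follows vowel harmony: the suffix agrees with the last word of the number as read aloud (egy, “one”, takes -ből; három, “three”, takes -ból). A minimal Python sketch under that assumption, limited to 0 to 999; the tables and function names are hypothetical, and larger numbers (ezer, millió, …) are left out.

```python
# Hypothetical helper: choose the Hungarian elative suffix ("from") for a
# reference number by the vowel harmony of its spoken form. 0-999 only.

# Units: egy, kettő, három, négy, öt, hat, hét, nyolc, kilenc.
# True means back vowels, i.e. the suffix -ból.
UNITS_BACK = {1: False, 2: False, 3: True, 4: False, 5: False,
              6: True, 7: False, 8: True, 9: False}
# Tens: tíz, húsz, harminc, negyven, ötven, hatvan, hetven, nyolcvan, kilencven.
TENS_BACK = {1: False, 2: True, 3: True, 4: False, 5: False,
             6: True, 7: False, 8: True, 9: False}

def elative_suffix(n: int) -> str:
    if not 0 <= n < 1000:
        raise ValueError("this sketch only handles 0-999")
    if n == 0:
        back = True                       # nulla
    elif n % 10:
        back = UNITS_BACK[n % 10]         # last spoken word is the unit
    elif n % 100:
        back = TENS_BACK[(n // 10) % 10]  # last spoken word is the ten
    else:
        back = True                       # száz ("hundred")
    return "ból" if back else "ből"

def from_ref(n: int) -> str:
    """Format "from (n)" as Hungarian writes it, e.g. (1)-ből."""
    return f"({n})-{elative_suffix(n)}"
```

With this, `from_ref(1)` gives “(1)-ből” and `from_ref(3)` gives “(3)-ból”, matching the examples above. A real implementation would more likely live in the language description as a rule than in code.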
Even more interesting are right-to-left, vertical and bidi typesetting. Babel provided basic support for bidi text as part of the style for Hebrew, but it is somewhat unsatisfactory and internally replaces some hardwired commands with other hardwired commands (generic marks would be much better).
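Any generic approach to bidi text first has to find where the direction changes. A minimal Python sketch using the standard `unicodedata.bidirectional` function; it is a deliberate simplification of the Unicode Bidirectional Algorithm (UAX #9), not babel’s code, and the function names are hypothetical.

```python
import unicodedata

def strong_direction(ch: str) -> str:
    """Map a character's Unicode bidi class to 'ltr', 'rtl' or 'neutral'."""
    cls = unicodedata.bidirectional(ch)
    if cls == "L":
        return "ltr"
    if cls in ("R", "AL"):  # Hebrew letters are R, Arabic letters are AL
        return "rtl"
    return "neutral"  # spaces, punctuation and (a simplification) digits

def direction_runs(text: str) -> list[tuple[str, str]]:
    """Split text into runs of one strong direction; neutral characters
    are simply attached to the preceding run."""
    runs: list[list[str]] = []
    for ch in text:
        d = strong_direction(ch)
        if runs and (d == "neutral" or runs[-1][0] == d):
            runs[-1][1] += ch        # extend the current run
        else:
            runs.append([d, ch])     # direction changes: start a new run
    return [(d, s) for d, s in runs]
```

Attaching neutrals to the preceding run is the crudest possible rule; UAX #9 resolves them from both neighbours and the paragraph direction. Each run could then be wrapped in a generic direction mark instead of a hardwired command.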
I don’t think the mechanism for loading hyphenation patterns in luatex is satisfactory. Information is loaded three times (from language.dat.lua when the format is built, and language.dat.lua when typesetting the document), which can lead to inconsistent data. I’m currently working on a revamped loader based solely on language.dat, read at run time.
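For reference, language.dat is a simple line-based file: each entry gives a language name and its pattern file, a line starting with = declares a synonym of the entry above it, and % starts a comment. A minimal Python sketch of reading it, assuming that layout (as in TeX Live’s file); it only illustrates the format and is not the loader being written, which would run inside luatex. The sample entries are modelled on TeX Live’s.

```python
def parse_language_dat(text: str) -> tuple[dict[str, str], dict[str, str]]:
    """Return ({language: pattern_file}, {synonym: language})."""
    entries: dict[str, str] = {}
    synonyms: dict[str, str] = {}
    last = None
    for raw in text.splitlines():
        line = raw.split("%", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if line.startswith("="):
            if last is None:
                raise ValueError("synonym before any language: " + raw)
            synonyms[line[1:].strip()] = last  # alias of the previous entry
        else:
            fields = line.split()
            if len(fields) < 2:
                raise ValueError("expected 'name file': " + raw)
            last = fields[0]
            entries[last] = fields[1]
    return entries, synonyms

def pattern_file(name: str, entries: dict[str, str], synonyms: dict[str, str]) -> str:
    """Resolve a language name or synonym to its pattern file."""
    return entries[synonyms.get(name, name)]

SAMPLE = """% sample entries
english hyphen.tex
=usenglish
=american
ngerman loadhyph-de-1996.tex  % new German orthography
"""
entries, synonyms = parse_language_dat(SAMPLE)
print(pattern_file("american", entries, synonyms))  # hyphen.tex
```

Reading this single file at run time, instead of generated copies, is what keeps the data from drifting out of sync.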