Solving the world’s (localization) problems

igalia 17 views 39 slides Mar 11, 2025

About This Presentation

Efficiently localizing user interfaces is an age-old problem that has haunted
programmers since the early days of software development. Many tools and
techniques have been employed over the years for this with differing levels of
success by organizations across the world.

A few years ago, stakehold...


Slide Content

Solving the world’s
(localization) problems
Eemeli Aro Mozilla), Ujjwal Sharma Igalia

Ujjwal Sharma
from New Delhi, India
based out of A Coruña, Galiza
OSS zealot, open web maximalist
love dogs, (masochistic) videogames
work at Igalia

Eemeli Aro
●From Helsinki, Finland
●Staff Software Engineer at Mozilla
●Maintainer of messageformat and yaml on npm
●Working on localization and localization
since 2012

What's wrong with
current solutions?

A world of silos and monoliths
-L10n frameworks are primarily chosen based on their developer
front-end
-The message format/syntax is incidental
-Each framework provides all the answers
-Solution needs to get picked early (often doesn't…) and changing to
another can have a really high cost

A world of limitations
-Dynamic messages vary on many aspects of language
  -plural case
  -grammatical gender
  -personal gender
  -Vowel sounds: English a/an, French le/l'
  -Prepositions: in a car, on a bus
-Messages often vary in more than one dimension
-Variance depends on language
  -English he/she vs. Finnish hän (but oh so many suffixes)

Inflection is hard.
Even in English.

Variance is
multi-dimensional

Solving the problem for the web
-Explicitly identified as an interesting problem to solve in 2013, but no
sufficiently good format was identified then or later.
-In 2019, TC39-TG2 formed the MessageFormat Working Group (MFWG) to
define a new format that could be made available in JS via
Intl.MessageFormat.
-The WG moved under Unicode CLDR as a more appropriate host – a solution
for JS should be good for everyone else as well.
-After many meetings over five years, we think we're done.

Standards develop
very slowly.
Intl.MessageFormat was first proposed in 2013.

[Status]

Syntax
Hello, FOSDEM!

{ type: 'message',
  declarations: [],
  pattern: [ 'Hello, FOSDEM!' ] }

Placeholders
Hello, {$place}!

{ type: 'message', declarations: [], pattern: [
  'Hello, ',
  { type: 'expression',
    arg: { type: 'variable', name: 'place' } },
  '!' ] }
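
A minimal sketch of how a pattern like the one above could be turned into a string. The type shapes are a reduced subset of the data model shown on the slide, and formatPattern is a hypothetical helper, not part of any library. A missing variable is rendered as its source form, which is an assumption made for this example.

```typescript
// Reduced subset of the data model from the slide (assumed shapes).
type VariableRef = { type: "variable"; name: string };
type Expression = { type: "expression"; arg: VariableRef };
type Pattern = (string | Expression)[];
interface PatternMessage {
  type: "message";
  declarations: unknown[];
  pattern: Pattern;
}

// Hypothetical helper: resolve each pattern part and concatenate the results.
function formatPattern(msg: PatternMessage, vars: Record<string, unknown>): string {
  return msg.pattern
    .map((part) => {
      if (typeof part === "string") return part;
      const value = vars[part.arg.name];
      // Assumption: an unresolved variable is shown as {$name} instead of throwing.
      return value === undefined ? "{$" + part.arg.name + "}" : String(value);
    })
    .join("");
}

const hello: PatternMessage = {
  type: "message",
  declarations: [],
  pattern: [
    "Hello, ",
    { type: "expression", arg: { type: "variable", name: "place" } },
    "!",
  ],
};

console.log(formatPattern(hello, { place: "FOSDEM" })); // Hello, FOSDEM!
```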

Markup
Click {#link u:id=next}here{/link} to continue

interface Markup {
  type: "markup";
  kind: "open" | "standalone" | "close";
  name: string;
  options: Options;
  attributes: Attributes;
}
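
One way a caller might consume markup parts is to map them onto tags of its own. The sketch below assumes a reduced Markup shape (options and attributes are left out), and renderTags and its tagFor mapping are hypothetical names for this example. Markup names without a mapping are dropped, so only the text remains.

```typescript
// Reduced shape: the options and attributes fields from the slide are omitted.
interface Markup {
  type: "markup";
  kind: "open" | "standalone" | "close";
  name: string;
}
type Pattern = (string | Markup)[];

// Escape literal text so it cannot be mistaken for tags in the output.
function escapeText(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical renderer: tagFor maps markup names to caller-chosen tag names.
function renderTags(pattern: Pattern, tagFor: Record<string, string>): string {
  return pattern
    .map((part) => {
      if (typeof part === "string") return escapeText(part);
      const tag = tagFor[part.name];
      if (!tag) return ""; // unmapped markup contributes nothing
      if (part.kind === "open") return "<" + tag + ">";
      if (part.kind === "close") return "</" + tag + ">";
      return "<" + tag + "/>";
    })
    .join("");
}

// Data-model form of: Click {#link u:id=next}here{/link} to continue
const click: Pattern = [
  "Click ",
  { type: "markup", kind: "open", name: "link" },
  "here",
  { type: "markup", kind: "close", name: "link" },
  " to continue",
];

console.log(renderTags(click, { link: "a" })); // Click <a>here</a> to continue
```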

Expressions
Hello, {$userName}!
Total: {$sum :number style=currency currency=USD}.

interface Expression {
  type: "expression";
  arg?: Literal | VariableRef;
  function?: FunctionRef;
  attributes: Attributes;
}
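
A rough sketch of how an annotated expression such as {$sum :number style=currency currency=USD} could be resolved. It assumes simplified shapes (options as a plain record of strings, no attributes) and a hypothetical handlers registry. The :number handler here simply delegates to the standard Intl.NumberFormat and passes through only the two options used on the slide.

```typescript
// Simplified shapes: Options becomes a plain record, attributes are left out.
type Literal = { type: "literal"; value: string };
type VariableRef = { type: "variable"; name: string };
interface FunctionRef {
  type: "function";
  name: string;
  options: Record<string, string>;
}
interface Expression {
  type: "expression";
  arg?: Literal | VariableRef;
  function?: FunctionRef;
}

// Hypothetical function registry: function name -> formatter.
type Handler = (locale: string, operand: unknown, options: Record<string, string>) => string;

const handlers: Record<string, Handler> = {
  number: (locale, operand, options) =>
    new Intl.NumberFormat(locale, {
      style: options.style as Intl.NumberFormatOptions["style"],
      currency: options.currency,
    }).format(Number(operand)),
};

function formatExpression(expr: Expression, vars: Record<string, unknown>, locale: string): string {
  // Resolve the operand: a variable lookup or a literal value.
  const arg = expr.arg;
  let operand: unknown;
  if (arg && arg.type === "variable") operand = vars[arg.name];
  else if (arg) operand = arg.value;

  // No annotation: plain string conversion.
  if (!expr.function) return String(operand);

  const handler = handlers[expr.function.name];
  // Assumption: an unknown function yields a visible fallback instead of throwing.
  if (!handler) return "{:" + expr.function.name + "}";
  return handler(locale, operand, expr.function.options);
}

const total: Expression = {
  type: "expression",
  arg: { type: "variable", name: "sum" },
  function: { type: "function", name: "number", options: { style: "currency", currency: "USD" } },
};

console.log(formatExpression(total, { sum: 42.5 }, "en-US")); // $42.50
```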

Functions
Today is {$date :date style=long}

interface FunctionExpression {
  type: "expression";
  arg?: never;
  function: FunctionRef;
  attributes: Attributes;
}

Patterns
This is a pattern. It can include expressions like {$v} and
{#b}markup{/b}.

type Pattern = (string | Expression | Markup)[]

Matchers
.input {$count :number}
.match $count
one {{You have {$count} week.}}
* {{You have {$count} weeks.}}

interface SelectMessage {
  type: "select";
  declarations: Declaration[];
  selectors: VariableRef[];
  variants: Variant[];
}

interface Variant {
  keys: (Literal | CatchallKey)[];
  value: Pattern;
}
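
The matcher above can be sketched against the data model roughly as follows. This is a simplification, not the specification's selection algorithm: it picks the first variant whose keys all match, so it assumes variants are listed from most to least specific. Numeric selectors are matched by exact value or by plural category via the standard Intl.PluralRules, and the .input declaration is left out. formatSelect and keyMatches are hypothetical names.

```typescript
type VariableRef = { type: "variable"; name: string };
type Expression = { type: "expression"; arg: VariableRef };
type Pattern = (string | Expression)[];
type Literal = { type: "literal"; value: string };
type CatchallKey = { type: "*" };
interface Variant {
  keys: (Literal | CatchallKey)[];
  value: Pattern;
}
interface SelectMessage {
  type: "select";
  declarations: unknown[];
  selectors: VariableRef[];
  variants: Variant[];
}

// A key matches if it is the catch-all, the exact value, or (for numbers)
// the plural category of the value in the given locale.
function keyMatches(key: Literal | CatchallKey, value: unknown, rules: Intl.PluralRules): boolean {
  if (key.type === "*") return true;
  if (typeof value === "number") {
    return key.value === String(value) || key.value === rules.select(value);
  }
  return key.value === String(value);
}

function formatSelect(msg: SelectMessage, vars: Record<string, unknown>, locale: string): string {
  const rules = new Intl.PluralRules(locale);
  const values = msg.selectors.map((sel) => vars[sel.name]);
  // Simplification: first variant whose keys all match wins.
  let chosen: Variant | undefined;
  for (const variant of msg.variants) {
    if (variant.keys.every((key, i) => keyMatches(key, values[i], rules))) {
      chosen = variant;
      break;
    }
  }
  if (!chosen) throw new Error("No matching variant; a catch-all (*) variant is expected");
  return chosen.value
    .map((part) => (typeof part === "string" ? part : String(vars[part.arg.name])))
    .join("");
}

const countRef: Expression = { type: "expression", arg: { type: "variable", name: "count" } };
const weeks: SelectMessage = {
  type: "select",
  declarations: [], // the .input declaration is omitted in this sketch
  selectors: [{ type: "variable", name: "count" }],
  variants: [
    { keys: [{ type: "literal", value: "one" }], value: ["You have ", countRef, " week."] },
    { keys: [{ type: "*" }], value: ["You have ", countRef, " weeks."] },
  ],
};

console.log(formatSelect(weeks, { count: 1 }, "en")); // You have 1 week.
console.log(formatSelect(weeks, { count: 3 }, "en")); // You have 3 weeks.
```

Because each variant carries one key per selector, the same loop also covers messages that vary in more than one dimension, for example a gender selector combined with a count.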

What next?

The world beyond a single message
-Syntax & data model for a single message is good, but how do we put
together multiple messages?
-We need a new resource file format.
-Also, a metadata language – think JavaDoc/JSDoc for localization
  -@locale
  -@param
  -@allow-empty
  -…

A world of interoperability
-The message data model is not meant to be an abstract thing, but a tool
to be used
-This makes it possible to compare and convert messages across all
formats
  -npm: messageformat, @messageformat/fluent,
  @messageformat/icu-messageformat-1
  -Python: moz.l10n
-A better translation memory?
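
To illustrate the conversion idea, here is a deliberately tiny sketch that goes from an ICU MessageFormat 1 string to the pattern data model and back out as MF2 syntax. It does not use or reproduce the packages listed above. Both function names are hypothetical, and only simple {name} arguments are handled. Anything else (plural or select arguments, apostrophe quoting) is rejected rather than guessed at.

```typescript
type Expression = { type: "expression"; arg: { type: "variable"; name: string } };
type Pattern = (string | Expression)[];

// Hypothetical converter: MF1 source with simple {name} arguments -> pattern.
function mf1SimpleToPattern(source: string): Pattern {
  const pattern: Pattern = [];
  const re = /\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}/g;
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) {
    if (m.index > last) pattern.push(source.slice(last, m.index));
    pattern.push({ type: "expression", arg: { type: "variable", name: String(m[1]) } });
    last = m.index + m[0].length;
  }
  if (last < source.length) pattern.push(source.slice(last));

  // Leftover braces or apostrophes mean syntax this sketch does not cover.
  for (const part of pattern) {
    if (typeof part === "string" && /[{}']/.test(part)) {
      throw new Error("Unsupported MF1 syntax in: " + part);
    }
  }
  return pattern;
}

// Serialize a pattern as an MF2 simple message, escaping \ { } in text.
function patternToMf2(pattern: Pattern): string {
  return pattern
    .map((part) =>
      typeof part === "string" ? part.replace(/[\\{}]/g, "\\$&") : "{$" + part.arg.name + "}"
    )
    .join("");
}

console.log(patternToMf2(mf1SimpleToPattern("Hello, {place}!"))); // Hello, {$place}!
```

Going through the data model rather than rewriting strings directly is the point: once a message is in that shape, the same serializer works regardless of which format it came from.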

We're providing you with building blocks
-Your favourite L10n framework probably doesn't support MF2 yet.
-The tools you need to adopt MF2 probably aren't there yet.
-This is by design: We are not presuming to solve all the problems at
once, and we need your help.
-A key thought: Translatable human messages are not really that
complex, and the MF2 data model can represent all of them.
-MF2 isn't going to replace your current framework; it's trying to make it
better, and make it less of a silo.

Supporting localization in HTML
-Let's make localization declarative, and so web-native that you don't
need JavaScript to make it work.
-Declare in HTML your MF2 resources with <link rel="messages">, and
use them as <span msg="msg-id"></span>.
-This does depend on a message resource spec, and on the JS
Intl.MessageFormat spec.

<html>

You should tell us if we're wrong
-The 2.0 version of the spec is currently a Final Candidate, and it'll be
finalized within a month or so.
-If you think we're wrong about some part of this, you should tell us
ASAP, or we'll likely be stuck with our mistakes for the next decade or
three.

-Unicode MessageFormat WG
github.com/unicode-org/message-format-wg
-Unicode Inflection WG
github.com/unicode-org/inflection
-Intl.MessageFormat Proposal
github.com/tc39/proposal-intl-messageformat
-Message resources
github.com/eemeli/message-resource-wg
-JS messageformat
github.com/messageformat/messageformat/tree/main/mf2/messageformat
-Python: moz.l10n
github.com/mozilla/moz-l10n
-C & Java: ICU
icu.unicode.org