
ICU Messages, RTL, and Collation: The i18n Bugs Your English-Only Tests Miss

Priya Ramanathan

May 9, 2026


If your product “supports i18n,” you probably have JSON translation files, a CI job that fails when keys go missing, and maybe a pseudo-localization pass that stretches strings with accents. That is a respectable baseline. It is also nowhere near enough to catch the bugs that show up when real users run real locales—especially when those locales mix right-to-left scripts, different numeral systems, and collation rules your database never agreed to respect.

This article is about the class of failures that slip through English-centric test suites: ICU message shape, bidirectional layout, and collation/sorting surprises. None of these require you to speak twelve languages. They do require you to stop treating translation as a file-format problem and start treating it as a runtime contract between UI, data stores, and typography.


Why ICU messages are more than “string substitution”

Most teams graduate quickly from naive concatenation (“You have ” + count + “ items”) to some form of templating. If you are on ICU MessageFormat (directly, through FormatJS, or through an ICU-compatible message compiler plugged into another i18n library), you gain plural rules, select/gender branches, and plural offsets. You also gain ways to be wrong that unit tests written in English will never exercise.

English has two plural forms for cardinal numbers (“1 file” vs “2 files”). Polish has four CLDR categories (one, few, many, other) and Arabic has six (zero, one, two, few, many, other). A message like {count, plural, one {# file} other {# files}} looks fine until you realize that ICU’s “one” category is not the same as “equals 1” in every language: French puts 0 in “one,” Russian puts 21 and 31 there, and only an explicit =1 selector matches exactly one. Fractional quantities, ranges, and CLDR’s plural rules interact in ways that are easy to mis-map if your product copy assumes a Western European mental model.
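You can inspect these categories directly through the standard Intl.PluralRules API, which exposes CLDR data in JavaScript runtimes. In the TypeScript sketch below, pickPlural is a hypothetical, heavily simplified stand-in for how an ICU formatter chooses a branch (exact =N selectors first, then the locale’s category, then other). It is not the FormatJS API and ignores nesting, escaping, and offsets.

```typescript
// Simplified ICU-style plural selection (hypothetical helper, not the FormatJS API).
// Real MessageFormat implementations also handle nesting, escaping, and offsets.
type PluralBranches = { other: string } & Record<string, string>;

function pickPlural(locale: string, count: number, branches: PluralBranches): string {
  // 1. Exact selectors such as "=0" or "=1" win over plural categories.
  const exact = branches[`=${count}`];
  // 2. Otherwise ask the runtime's CLDR data which category this number belongs to.
  const category = new Intl.PluralRules(locale).select(count);
  const template = exact ?? branches[category] ?? branches.other;
  // 3. "#" becomes the locale-formatted number.
  return template.replace(/#/g, new Intl.NumberFormat(locale).format(count));
}

// CLDR categories are not "is the number 1":
console.log(new Intl.PluralRules("fr").select(0)); // "one" (French groups 0 with 1)
console.log(new Intl.PluralRules("ru").select(21)); // "one"
console.log(new Intl.PluralRules("pl").select(5)); // "many"
console.log(new Intl.PluralRules("ar").select(0)); // "zero"

const files: PluralBranches = { one: "# file", other: "# files" };
console.log(pickPlural("en", 0, files)); // "0 files"
console.log(pickPlural("en", 0, { "=0": "No files", ...files })); // "No files"
```

If a Polish bundle supplies only one and other, pickPlural("pl", 5, { one: "# plik", other: "# pliki" }) quietly falls back to other and returns "5 pliki" (correct Polish would be “5 plików”). Nothing throws; the copy is simply wrong.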

The failure mode in production is rarely a crash. It is awkward copy: “0 files” rendered with a singular branch, or a legal disclaimer that reads like machine output because a select branch was never translated. Worse is when you interpolate raw HTML or markdown through messages and translators break your escaping assumptions. The fix is structural: keep messages as data, forbid raw HTML in translators’ hands unless you have a controlled rich-text pipeline, and add snapshot tests that render messages under multiple synthetic locales—not just en and de, but at least one locale with complex plural rules and one RTL locale.
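One cheap guardrail is a CI check that compares each translated message with its source. This TypeScript sketch assumes a rich-text pipeline that uses tag-style placeholders, as FormatJS rich-text messages do (for example <b>bold</b>); ALLOWED_TAGS, lintTranslation, and the tag names are hypothetical. It is a regex check, not a message parser.

```typescript
// Hypothetical CI lint for translation bundles. Translators may reuse the
// rich-text placeholder tags the source message already contains, but may
// not introduce raw HTML of their own.
const ALLOWED_TAGS = new Set(["b", "i", "link"]); // assumption: your pipeline's tag names

// Matches opening and closing tags such as <b>, </link>, or <a href="...">.
const TAG_PATTERN = /<\/?([A-Za-z][\w-]*)[^>]*>/g;

function tagNames(message: string): string[] {
  return [...message.matchAll(TAG_PATTERN)].map((match) => match[1]).sort();
}

function lintTranslation(source: string, translated: string): string[] {
  const problems: string[] = [];
  const sourceTags = tagNames(source);
  const translatedTags = tagNames(translated);
  for (const name of new Set(translatedTags)) {
    if (!ALLOWED_TAGS.has(name)) problems.push(`tag <${name}> is not on the allowlist`);
  }
  // Same tags, same number of times: translators may move them, not add or drop them.
  if (sourceTags.join(",") !== translatedTags.join(",")) {
    problems.push("tag set differs from the source message");
  }
  return problems;
}

console.log(lintTranslation("Read the <link>terms</link>", "Lisez les <link>conditions</link>")); // []
```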

RTL is not “mirror the CSS and ship”

Right-to-left support is often reduced to “set dir=rtl and flip margins.” Directionality is necessary but not sufficient. Icons with implied directionality (chevrons, undo arrows, maps) may need logical properties or explicit mirroring rules. Mixed-direction text—an English product name inside an Arabic sentence—is where naive mirroring produces comical alignment and where Unicode bidirectional algorithm surprises leak into your UI.
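Setting dir correctly starts with knowing which locales need it. Rather than maintaining a list of RTL language codes, one option is to derive direction from the locale’s script. This TypeScript sketch uses the standard Intl.Locale maximize() method to fill in the likely script; the RTL_SCRIPTS table is an assumption covering common right-to-left scripts, not an exhaustive list. Some runtimes expose direction information on Intl.Locale directly, but support is uneven.

```typescript
// Scripts written right to left. Assumption: this short list covers the scripts
// your product ships; extend it (or use a maintained table) if you add more.
const RTL_SCRIPTS = new Set(["Arab", "Hebr", "Thaa", "Syrc", "Nkoo", "Adlm"]);

// Derive the base direction from the locale's script, not from a list of language codes.
// maximize() fills in likely subtags, e.g. "fa" -> "fa-Arab-IR".
function directionFor(localeTag: string): "rtl" | "ltr" {
  const script = new Intl.Locale(localeTag).maximize().script;
  return script !== undefined && RTL_SCRIPTS.has(script) ? "rtl" : "ltr";
}

// In a browser shell you would then set: document.documentElement.dir = directionFor(locale);
console.log(directionFor("fa"), directionFor("az-Latn")); // "rtl" "ltr"
```

Working from the script also handles languages written in more than one script, such as Azerbaijani in Latin versus Arabic script.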


Tables and forms are especially cruel. If your design system hard-codes “label on the left, field on the right” with physical properties, setting dir=rtl reverses the inline reading order but leaves those margins and offsets where they were, so labels and fields can end up on the wrong sides unless you adopt logical properties consistently (margin-inline-start instead of margin-left, and so on). The same goes for modals: focus order, scroll chaining, and gesture back-navigation on mobile all interact with direction in ways desktop-only QA will miss.

A pragmatic checklist: audit every absolutely positioned element, search for left/right in your component library, verify charts and timelines have a locale-aware axis, and test one mixed-script string in every navigation surface. If that sounds tedious, compare it to the cost of shipping a banking app where the confirm button swaps sides between screens.
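The search for left/right can start as a throwaway script. This TypeScript sketch is a regex pass over stylesheet text, not a CSS parser; the mapping covers only a handful of common properties (margin, padding, border, inset, text-align) and is an assumption to extend for your codebase. It builds a worklist rather than rewriting anything.

```typescript
// Physical -> logical equivalents for a few common declarations (not exhaustive).
const PHYSICAL_TO_LOGICAL: Record<string, string> = {
  "margin-left": "margin-inline-start",
  "margin-right": "margin-inline-end",
  "padding-left": "padding-inline-start",
  "padding-right": "padding-inline-end",
  "border-left": "border-inline-start",
  "border-right": "border-inline-end",
  left: "inset-inline-start",
  right: "inset-inline-end",
};

// A property name at the start of a declaration: preceded by line start, whitespace, "{" or ";".
// This skips selectors such as ".left:hover" and longer names such as "scroll-margin-left".
const PHYSICAL_PROPERTY = /(^|[\s{;])((?:margin|padding|border)-(?:left|right)|left|right)\s*:/g;
const PHYSICAL_ALIGN = /text-align\s*:\s*(left|right)\b/g;

interface Finding {
  line: number;
  found: string;
  suggestion: string;
}

function auditStylesheet(css: string): Finding[] {
  const findings: Finding[] = [];
  css.split("\n").forEach((text, index) => {
    for (const m of text.matchAll(PHYSICAL_PROPERTY)) {
      findings.push({ line: index + 1, found: m[2], suggestion: PHYSICAL_TO_LOGICAL[m[2]] });
    }
    for (const m of text.matchAll(PHYSICAL_ALIGN)) {
      const logical = m[1] === "left" ? "start" : "end";
      findings.push({ line: index + 1, found: `text-align: ${m[1]}`, suggestion: `text-align: ${logical}` });
    }
  });
  return findings;
}

console.log(auditStylesheet(".card {\n  margin-left: 8px;\n}"));
// [{ line: 2, found: "margin-left", suggestion: "margin-inline-start" }]
```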

Collation: where your database disagrees with your UI

Sorting looks universal until it is not. Accents, case folding, locale-specific rules for ordering “ä” relative to “a” (German sorts it with “a,” Swedish places it after “z”), and numeric substrings inside filenames all change ordering. If your API sorts with one collation and your client re-sorts with another, you get duplicate-detection bugs, flaky pagination, and “why is this row missing?” incidents that reproduce only for customers in specific regions.
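Intl.Collator makes these differences easy to reproduce. This TypeScript sketch sorts the same made-up strings under German and Swedish collation, under JavaScript’s default sort (UTF-16 code units), and with the collator’s numeric option for filenames. Each line prints a different order.

```typescript
// The same names under two locale collations and under plain code-unit order.
const names = ["Zebra", "\u00e4pple", "apple"]; // "\u00e4" is ä

const byLocale = (locale: string) => [...names].sort(new Intl.Collator(locale).compare);
console.log(byLocale("de")); // ["apple", "äpple", "Zebra"]: German sorts ä with a
console.log(byLocale("sv")); // ["apple", "Zebra", "äpple"]: Swedish places ä after z
console.log([...names].sort()); // ["Zebra", "apple", "äpple"]: code units, uppercase first

// Numeric substrings: opt in explicitly, and use the same option everywhere you sort.
const fileNames = ["file10.txt", "file2.txt", "file1.txt"];
console.log([...fileNames].sort(new Intl.Collator("en", { numeric: true }).compare));
// ["file1.txt", "file2.txt", "file10.txt"]
```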

Strings that look identical can differ by normalization form (NFC vs NFD): “é” can be stored as the single code point U+00E9 or as “e” followed by the combining acute accent U+0301. That is not an academic detail; it breaks uniqueness checks if one path normalizes input and another does not. For user-visible lists, decide whether locale-aware sorting is a requirement or whether you intentionally use a stable binary order for performance, and then document that choice where both frontend and backend teams will see it.
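The difference is easy to reproduce with String.prototype.normalize. In this TypeScript sketch, choosing NFC as the canonical form is an assumption (it is a common default for stored text); the important part is normalizing in one place before any equality or uniqueness check.

```typescript
// Two spellings of "José" that render identically.
const precomposed = "Jos\u00e9"; // é as one code point (U+00E9)
const decomposed = "Jose\u0301"; // e followed by COMBINING ACUTE ACCENT (U+0301)

console.log(precomposed === decomposed); // false
console.log(precomposed.length, decomposed.length); // 4 5
console.log(new Set([precomposed, decomposed]).size); // 2: a naive uniqueness check sees two users

// Normalize once, where input enters the system, then compare normalized forms.
const canonical = (s: string) => s.normalize("NFC");
console.log(canonical(precomposed) === canonical(decomposed)); // true
console.log(new Set([precomposed, decomposed].map(canonical)).size); // 1
```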

Cursor-based pagination is a common collateral victim. If your API orders by created_at and then name, but the name sort is locale-sensitive on the server and binary on the replica, pages drift. Users see duplicates or gaps when they scroll. The mitigation is boring and effective: pick one canonical ordering key for pagination—usually a stable UUID or monotonic ID—and treat locale-aware ordering as a presentation-layer concern for bounded result sets, not as the backbone of an infinite feed unless you have explicitly tested the database’s collation semantics end to end.
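A minimal sketch of that split, in TypeScript over an in-memory array: pageAfter pages by a monotonic integer id (the equivalent of WHERE id > :cursor ORDER BY id LIMIT :limit), and forDisplay applies a locale collation only to the bounded page it receives. Row, Page, and both functions are hypothetical; in production the filtering and ordering happen in the database.

```typescript
// Assumption: `id` is a monotonic integer. A UUID also works as long as every layer
// compares it the same way (as bytes, never through a locale collation).
interface Row {
  id: number;
  name: string;
}

interface Page {
  rows: Row[];
  nextCursor: number | null;
}

// Keyset pagination on the stable key only.
function pageAfter(all: readonly Row[], cursor: number | null, limit: number): Page {
  if (limit <= 0) throw new RangeError("limit must be positive");
  const remaining = all
    .filter((row) => cursor === null || row.id > cursor)
    .sort((a, b) => a.id - b.id);
  const rows = remaining.slice(0, limit);
  const nextCursor = remaining.length > limit ? rows[rows.length - 1].id : null;
  return { rows, nextCursor };
}

// Locale-aware ordering is a presentation step on an already-fetched, bounded page.
function forDisplay(page: Page, locale: string): Row[] {
  const collator = new Intl.Collator(locale);
  return [...page.rows].sort((a, b) => collator.compare(a.name, b.name));
}
```

Because the cursor never depends on how names collate, a server and a replica with different collation settings still return the same pages.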

Numbers, dates, and the illusion of “just use ISO”

Developers love ISO 8601 timestamps in APIs. Users love timestamps that match the wall clock they are looking at. The gap between those two truths generates endless tickets. Daylight saving jumps, half-hour time zones, and “all-day” events that span UTC midnight all stress naive date formatting. If you render a meeting as 2026-05-09T22:30:00Z in the body of an email without a localized equivalent, you have not internationalized the product; you have internationalized only the storage layer.
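Intl.DateTimeFormat with an explicit IANA timeZone is enough to show the gap. This TypeScript sketch renders the example instant for a recipient in New York and one in Kolkata (a half-hour offset), assuming the recipient’s time zone comes from their profile rather than from the server. The exact punctuation and spacing of formatted strings vary between ICU versions, so localFields, a hypothetical test helper, compares date fields instead of whole strings.

```typescript
// One instant, rendered on the wall clocks of two recipients.
const meeting = new Date("2026-05-09T22:30:00Z");

function wallClock(date: Date, locale: string, timeZone: string): string {
  return new Intl.DateTimeFormat(locale, {
    year: "numeric",
    month: "long",
    day: "numeric",
    hour: "numeric",
    minute: "2-digit",
    timeZone,
    timeZoneName: "short",
  }).format(date);
}

console.log(wallClock(meeting, "en-US", "America/New_York")); // evening of May 9 (EDT, UTC-4)
console.log(wallClock(meeting, "en-IN", "Asia/Kolkata")); // 4:00 on May 10 (UTC+5:30)

// For tests: compare fields rather than formatted strings.
function localFields(date: Date, timeZone: string): { day: number; hour: number; minute: number } {
  const parts = new Intl.DateTimeFormat("en-US", {
    day: "numeric",
    hour: "numeric",
    minute: "numeric",
    hour12: false,
    timeZone,
  }).formatToParts(date);
  const get = (type: string) => Number(parts.find((p) => p.type === type)?.value);
  return { day: get("day"), hour: get("hour"), minute: get("minute") };
}
```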

Currency is worse because currencies differ in minor units (the Japanese yen has none, the Kuwaiti dinar has three) and rounding rules differ. Showing more decimal places than the payment processor charges creates false expectations; showing fewer can look illegal in regulated contexts. ICU’s number skeletons help, but only if the same rounding mode is applied at checkout, in receipts, and in reporting exports. A mismatch between what the UI promises and what the ledger stores is an i18n defect even if every string is translated perfectly.
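A common way to keep checkout, receipts, and exports consistent is to store amounts as integers in the currency’s minor unit and route every division through one rounding function. In this TypeScript sketch the minor-unit count comes from Intl.NumberFormat’s resolved options; half-even rounding and basis-point rates are assumptions for illustration, and minorDigits, divRoundHalfEven, applyRateMinor, and formatMinor are hypothetical helpers.

```typescript
// CLDR knows each currency's minor-unit count: JPY 0, USD 2, KWD 3, ...
function minorDigits(currency: string): number {
  return new Intl.NumberFormat("en", { style: "currency", currency }).resolvedOptions().maximumFractionDigits ?? 2;
}

// Integer division rounded half to even, for non-negative integers.
// Assumption: your processor and ledger also round half-even; substitute their rule.
function divRoundHalfEven(numerator: number, denominator: number): number {
  const q = Math.floor(numerator / denominator);
  const r = numerator - q * denominator;
  if (2 * r > denominator) return q + 1;
  if (2 * r < denominator) return q;
  return q % 2 === 0 ? q : q + 1; // exact tie: pick the even neighbour
}

// Apply a rate in basis points (1 bp = 0.01%) to an amount in minor units, staying in integers.
function applyRateMinor(amountMinor: number, rateBps: number): number {
  return divRoundHalfEven(amountMinor * rateBps, 10000);
}

// Formatting only ever reads the stored integer.
function formatMinor(amountMinor: number, currency: string, locale: string): string {
  return new Intl.NumberFormat(locale, { style: "currency", currency }).format(amountMinor / 10 ** minorDigits(currency));
}

console.log(applyRateMinor(250, 500)); // 12 (5% of 2.50 is 0.125; the tie rounds to even)
console.log(formatMinor(1999, "USD", "en-US")); // "$19.99"
```

With Math.round, the 12.5 case would become 13. Either convention can be correct; what matters is that every surface that shows or stores the amount calls the same function.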

Telephone numbers, postal codes, and national IDs are not “strings” in the product sense; they are typed values with validation tables that change. Hard-coded regexes from Stack Overflow are a recurring source of silent exclusion. Prefer libphonenumber-style validation, keep UI masks separate from stored values, and never infer country from language choice alone—expats, travelers, and multilingual households exist.
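Separating the mask from the stored value can be as simple as stripping presentation characters at the boundary, including digits typed in other numeral systems. This TypeScript sketch is hypothetical and deliberately does not validate anything; real validation needs maintained metadata such as libphonenumber. The digit table covers only four numeral systems as an example. The last lines use likely-subtags expansion to show why a UI language is not a country.

```typescript
// Code points of "0" in a few decimal numeral systems (assumption: extend as needed).
const DIGIT_ZEROS = [0x0030, 0x0660, 0x06f0, 0x0966]; // ASCII, Arabic-Indic, Extended Arabic-Indic, Devanagari

function toAsciiDigit(ch: string): string | null {
  const code = ch.codePointAt(0);
  if (code === undefined) return null;
  for (const zero of DIGIT_ZEROS) {
    if (code >= zero && code <= zero + 9) return String(code - zero);
  }
  return null;
}

// Keep only what should be stored: an optional leading "+" and ASCII digits.
function stripPhoneMask(raw: string): string {
  let out = "";
  for (const ch of raw.trim()) {
    if (ch === "+" && out === "") {
      out = "+"; // a leading plus introduces an international number
      continue;
    }
    const digit = toAsciiDigit(ch);
    if (digit !== null) out += digit; // spaces, dashes, dots, parentheses are mask characters
  }
  return out;
}

console.log(stripPhoneMask("+1 (415) 555-0100")); // "+14155550100"

// Likely-subtags data only guesses a region from a bare language tag.
const guessedRegion = (languageTag: string) => new Intl.Locale(languageTag).maximize().region;
console.log(guessedRegion("en")); // "US", wrong for a user in Lagos, Sydney, or Dublin
console.log(guessedRegion("pt")); // "BR", wrong for a user in Lisbon
```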

A composite failure story (that still happens in 2026)

Imagine a dashboard that lists “teams” alphabetically. The product adds Turkish localization. A team named “İstanbul Ops” appears in a different position on Windows laptops than on MacBooks because case folding rules for dotted and dotless I diverge across environments. A manager assumes the list is wrong, clears “duplicates,” and merges two unrelated groups. Root causes: mixed collation defaults between Electron and the cloud database, plus a uniqueness constraint that did not normalize names before comparing them.
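The core of that story is easy to demonstrate: a “case-insensitive” key is itself locale-dependent. Under the default Unicode mapping, İ lowercases to i plus a combining dot above (U+0307); under Turkish rules, İ becomes i and I becomes dotless ı. In this TypeScript sketch, collisions is a hypothetical helper showing which names a naive deduplication would merge in each environment.

```typescript
// Two ways to build a "case-insensitive" key for a team name.
const rootKey = (name: string) => name.toLowerCase(); // default Unicode mapping
const turkishKey = (name: string) => name.toLocaleLowerCase("tr"); // Turkish dotted/dotless I rules

// Group names that collide under a given key function.
function collisions(names: string[], key: (s: string) => string): string[][] {
  const groups = new Map<string, string[]>();
  for (const name of names) {
    const k = key(name);
    groups.set(k, [...(groups.get(k) ?? []), name]);
  }
  return [...groups.values()].filter((group) => group.length > 1);
}

// "\u0130" is İ (capital I with dot above).
const teams = ["\u0130stanbul Ops", "Istanbul Ops", "istanbul ops"];
console.log(collisions(teams, rootKey)); // [["Istanbul Ops", "istanbul ops"]]
console.log(collisions(teams, turkishKey)); // [["İstanbul Ops", "istanbul ops"]]
```

Each environment "finds duplicates," but not the same ones, which is why display names make poor keys.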

The fix sequence is instructive: normalize identifiers, stop using display names as primary keys, align database collation with an explicit documented choice, and add a cross-platform test that sorts a frozen fixture list and compares hashes. None of those steps require fluent Turkish; they require treating localization as distributed systems hygiene.
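The cross-platform sort test can be small. This TypeScript sketch assumes a frozen fixture of team names and an expected Turkish order checked into the repository; fingerprint is a hypothetical helper (a 32-bit FNV-1a variant) that gives CI logs a short value to compare across Windows, macOS, Electron, and server runs. It complements comparing the arrays directly rather than replacing it.

```typescript
// Frozen fixture. "\u0130" is İ (capital I with dot above).
const FIXTURE: readonly string[] = ["\u0130stanbul Ops", "Izmir Team", "Ankara", "Istanbul Ops"];

// Expected Turkish order: dotless ı/I sorts before dotted i/İ,
// so both I-names come before İstanbul.
const EXPECTED_TR: readonly string[] = ["Ankara", "Istanbul Ops", "Izmir Team", "\u0130stanbul Ops"];

// Always pass an explicit locale; never rely on the host's default.
function sortForDisplay(names: readonly string[], locale: string): string[] {
  return [...names].sort(new Intl.Collator(locale).compare);
}

// 32-bit FNV-1a over UTF-16 code units (a variant: canonical FNV-1a hashes bytes).
function fingerprint(names: readonly string[]): string {
  const joined = names.join("\u0000");
  let hash = 0x811c9dc5;
  for (let i = 0; i < joined.length; i++) {
    hash ^= joined.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Run the same check on every platform in CI.
const sortedTr = sortForDisplay(FIXTURE, "tr");
if (fingerprint(sortedTr) !== fingerprint(EXPECTED_TR)) {
  throw new Error(`Turkish collation drifted: ${sortedTr.join(" | ")}`);
}
```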

Accessibility stacks on top—not beside

Screen readers announce direction changes when mixed inline content is authored carelessly. If your component inserts a Latin brand name into an RTL sentence without isolates, users hear jumps that sound like errors even when the screen looks fine. Unicode bidi isolates (the FSI and PDI characters, U+2068 and U+2069, or the equivalent in your framework, such as HTML’s <bdi> element) are not optional polish for “exotic” locales; they are part of accessible markup for any mixed-script product.
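Some message libraries insert isolates around interpolated values for you (Project Fluent’s useIsolating option is one example). This TypeScript sketch shows the mechanism by hand; interpolate is a hypothetical helper with a simple {name} placeholder syntax, and the Arabic template means “Welcome to {product}.”

```typescript
// FIRST STRONG ISOLATE and POP DIRECTIONAL ISOLATE: the enclosed text takes its
// direction from its first strong character and cannot reorder its neighbours.
const FSI = "\u2068";
const PDI = "\u2069";

function isolate(value: string): string {
  return FSI + value + PDI;
}

// Replace {name} placeholders, isolating every interpolated value.
// Unknown placeholders are left in place so a test can spot them.
function interpolate(template: string, args: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(args, name) ? isolate(args[name]) : match,
  );
}

// A Latin product name ending in digits inside an RTL sentence.
const welcome = interpolate("مرحبًا بك في {product}", { product: "Acme Cloud 2.0" });
console.log(JSON.stringify(welcome)); // the product name appears wrapped in U+2068 ... U+2069
```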

Similarly, alt text for charts and infographics must be localized or generated from the same data that drives the graphic. Translating only visible captions while leaving alt English guarantees an incoherent experience for blind users in non-English locales—and in some jurisdictions, that incoherence touches compliance, not just taste.
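One way to keep alt text coherent is to generate it from the chart’s own series, using the same catalog and number formatting as the rest of the UI. In this TypeScript sketch, ALT_TEMPLATES stands in for catalog entries (only English and German are shown, and their wording is illustrative); a locale without a template falls back to English text and English number formatting together, so the sentence never mixes conventions.

```typescript
interface Point {
  label: string; // assumption: labels are already localized upstream
  value: number;
}

type AltTemplate = (top: Point, bottom: Point, format: (n: number) => string) => string;

// Hypothetical catalog entries keyed by language.
const ALT_TEMPLATES: Record<string, AltTemplate> = {
  en: (top, bottom, f) =>
    `Bar chart. Highest value: ${top.label} at ${f(top.value)}. Lowest value: ${bottom.label} at ${f(bottom.value)}.`,
  de: (top, bottom, f) =>
    `Balkendiagramm. Höchster Wert: ${top.label} mit ${f(top.value)}. Niedrigster Wert: ${bottom.label} mit ${f(bottom.value)}.`,
};

function chartAltText(series: Point[], locale: string): string {
  if (series.length === 0) throw new Error("empty series");
  const lang = locale.split("-")[0];
  const known = Object.prototype.hasOwnProperty.call(ALT_TEMPLATES, lang);
  const template = known ? ALT_TEMPLATES[lang] : ALT_TEMPLATES.en;
  const numberLocale = known ? locale : "en"; // text and numbers fall back together
  const format = (n: number) => new Intl.NumberFormat(numberLocale).format(n);
  const sorted = [...series].sort((a, b) => b.value - a.value);
  return template(sorted[0], sorted[sorted.length - 1], format);
}

const revenue: Point[] = [
  { label: "Q1", value: 1234.5 },
  { label: "Q2", value: 980 },
];
console.log(chartAltText(revenue, "de")); // German sentence with "1.234,5"
```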

Translation workflow: keep context attached to keys

Even perfect runtime libraries fail when translators work from spreadsheets of bare strings. “Book” as a verb and “book” as a noun need different keys; homographs explode in gendered languages. Give translators screenshots, component names, and character limits. Store developer comments alongside message IDs. If your TMS cannot carry screenshots per key, build a thin export that bundles them anyway. The engineering cost is tiny compared to emergency releases when a button reads like a threat because context was missing.
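Context travels best when it lives next to the key in code. The id, defaultMessage, and description trio below mirrors what FormatJS-style message extraction collects; component, screenshot, and maxLength are hypothetical fields for illustration, as is the overLimit check. This TypeScript sketch gives the verb and the noun “Book” separate keys.

```typescript
interface MessageMeta {
  id: string;
  defaultMessage: string;
  description: string; // developer comment shown to translators
  component: string; // hypothetical: where the string renders
  screenshot?: string; // hypothetical: path exported alongside the key
  maxLength?: number; // hypothetical: layout limit in characters
}

// Same English word, two keys: the verb and the noun translate differently.
const catalog: MessageMeta[] = [
  {
    id: "flight.actions.book",
    defaultMessage: "Book",
    description: "Button that reserves the selected flight (verb).",
    component: "FlightCard",
    screenshot: "screens/flight-card.png",
    maxLength: 12,
  },
  {
    id: "library.itemType.book",
    defaultMessage: "Book",
    description: "Badge naming the item type in the catalogue (noun).",
    component: "ItemTypeBadge",
  },
];

// Flag translations that exceed the declared limit. Counting code points is an
// approximation; grapheme clusters and rendered width can differ.
function overLimit(meta: MessageMeta[], translations: Record<string, string>): string[] {
  const flagged: string[] = [];
  for (const m of meta) {
    const text = translations[m.id];
    if (m.maxLength !== undefined && text !== undefined && [...text].length > m.maxLength) {
      flagged.push(m.id);
    }
  }
  return flagged;
}

console.log(overLimit(catalog, { "flight.actions.book": "Jetzt reservieren" })); // ["flight.actions.book"]
```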

Testing without pretending you speak every language

You do not need fluent speakers on call to catch the first 80% of pain. You need:

Fixture locales — synthetic bundles that exercise long strings, RTL, and plural branches.
Visual regression with direction toggles — screenshot diffs for key flows in both ltr and rtl.
Property tests for formatting — feed random counts and currencies through your message layer and assert no uncaught exceptions and no raw tokens leaking.
Database collation audits — one script that prints the collation of every text column you sort on in prod.
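The property-test item can start with a seeded generator and synthetic bundles. In this TypeScript sketch, mulberry32 is a well-known small PRNG used only so failures are reproducible; the bundles hold placeholder text per plural category, so the test checks structure (every category the locale can select has a branch, no raw tokens survive rendering, currency formatting never yields NaN) rather than wording. The render function and bundle format are hypothetical stand-ins for your message layer.

```typescript
// Deterministic PRNG (mulberry32) so a failing run can be replayed from its seed.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

type Bundle = Record<string, string>; // plural category -> template containing {n}

// Synthetic fixture bundle: one placeholder branch per category.
const synthetic = (categories: string[]): Bundle =>
  Object.fromEntries(categories.map((c) => [c, `[${c}] {n}`]));

function render(locale: string, bundle: Bundle, n: number): string {
  const category = new Intl.PluralRules(locale).select(n);
  const template = bundle[category];
  if (template === undefined) throw new Error(`missing "${category}" branch`);
  return template.replace("{n}", new Intl.NumberFormat(locale).format(n));
}

const EDGE_CASES = [0, 1, 2, 3, 5, 11, 12, 21, 22, 100, 101, 1.5];
const CURRENCIES = ["USD", "EUR", "JPY", "KWD"];

function propertyCheck(locale: string, bundle: Bundle, seed: number, samples: number): string[] {
  const random = mulberry32(seed);
  const counts = [...EDGE_CASES];
  for (let i = 0; i < samples; i++) counts.push(Math.floor(random() * 1000));
  const failures: string[] = [];
  for (const n of counts) {
    try {
      const text = render(locale, bundle, n);
      if (/[{}#]/.test(text)) failures.push(`${locale} n=${n}: raw token in "${text}"`);
      const currency = CURRENCIES[Math.floor(random() * CURRENCIES.length)];
      const money = new Intl.NumberFormat(locale, { style: "currency", currency }).format(n);
      if (money.includes("NaN")) failures.push(`${locale} ${currency} n=${n}: ${money}`);
    } catch (error) {
      failures.push(`${locale} n=${n}: ${error instanceof Error ? error.message : String(error)}`);
    }
  }
  return failures;
}

console.log(propertyCheck("pl", synthetic(["one", "other"]), 42, 200).slice(0, 2));
```

A Polish bundle with only one and other fails on the edge cases (0 and 5 select many, 2 selects few), which is the kind of gap an English-only suite never reaches.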

Human review still matters for tone and legal copy, but engineering guardrails prevent the embarrassing structural defects that make a brand look careless rather than merely unpolished.

What to fix first if you are overwhelmed

Start with money and identity: currency formatting, dates, and government IDs or addresses almost always surface i18n defects first because users are motivated to report them. Then fix search and sort, because those touch databases. Finally, polish marketing pages—important for trust, but less likely to brick a workflow.

Internationalization is not a sprint at the end of a roadmap; it is a set of invariants your architecture either enforces or quietly violates. ICU, RTL, and collation are three places English-only tests lie to you. Tighten those, and the rest of your localization budget can go toward actual words—not emergency rewrites after launch.