UAX #15 Unicode NFC normalization — streaming, no_std, no_alloc.
Reads UTF-8 from an &[u8] input slice, writes NFC-normalized
UTF-8 into an &mut [u8] output slice, returns the number of
bytes written. No allocator, no std, no panics on well-formed
input.
§Algorithm (UAX #15)
- Canonical decomposition (NFD). For each input code point, expand to its fully-recursive canonical decomposition. Hangul syllables are decomposed algorithmically per the Unicode Standard §3.12; all other decompositions are looked up in [tables::DECOMP_TABLE] / [tables::DECOMP_DATA].
- Canonical reordering. Within each “combining run” (a starter followed by zero or more non-starters), sort the non-starters in stable ascending order by canonical combining class (UAX #15 §1.3).
- Canonical composition. Walk the decomposed-and-reordered sequence; for each starter, greedily compose with following non-blocked combining marks via the canonical composition table ([tables::COMP_TABLE]) plus Hangul algorithmic composition (Unicode Standard §3.12). A mark is “blocked” from a starter if any intervening mark has canonical combining class ≥ the mark’s own class (Unicode Standard D115).
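The three steps above can be illustrated with a small allocation-free sketch. It is not this module's code: the names decompose_hangul, reorder_marks and is_blocked are hypothetical, marks are modelled as (code point, combining class) pairs, and table lookups are left out entirely. The Hangul constants are the standard ones from the Unicode Standard's conjoining jamo arithmetic.

```rust
// Hypothetical helpers for illustration only; not part of this crate's API.
const S_BASE: u32 = 0xAC00; // first precomposed Hangul syllable
const L_BASE: u32 = 0x1100; // first leading consonant jamo
const V_BASE: u32 = 0x1161; // first vowel jamo
const T_BASE: u32 = 0x11A7; // one before the first trailing consonant jamo
const L_COUNT: u32 = 19;
const V_COUNT: u32 = 21;
const T_COUNT: u32 = 28;
const N_COUNT: u32 = V_COUNT * T_COUNT; // 588
const S_COUNT: u32 = L_COUNT * N_COUNT; // 11172

/// Arithmetic decomposition of a precomposed Hangul syllable into
/// (L, V, optional T). Returns None for any other code point.
fn decompose_hangul(cp: u32) -> Option<(u32, u32, Option<u32>)> {
    let s_index = cp.checked_sub(S_BASE)?;
    if s_index >= S_COUNT {
        return None;
    }
    let l = L_BASE + s_index / N_COUNT;
    let v = V_BASE + (s_index % N_COUNT) / T_COUNT;
    let t_index = s_index % T_COUNT;
    let t = if t_index == 0 { None } else { Some(T_BASE + t_index) };
    Some((l, v, t))
}

/// Stable in-place insertion sort of one run's non-starters by combining
/// class. The strict `>` keeps marks of equal class in their input order.
fn reorder_marks(marks: &mut [(u32, u8)]) {
    for i in 1..marks.len() {
        let mut j = i;
        while j > 0 && marks[j - 1].1 > marks[j].1 {
            marks.swap(j - 1, j);
            j -= 1;
        }
    }
}

/// A mark is blocked from its starter when some intervening (uncomposed)
/// mark has a combining class greater than or equal to the mark's own.
fn is_blocked(intervening: &[(u32, u8)], mark_ccc: u8) -> bool {
    intervening.iter().any(|&(_, ccc)| ccc >= mark_ccc)
}
```

Insertion sort is used here because the stable slice::sort family needs an allocator, while a combining run is short enough for a quadratic in-place sort to be acceptable. Whether the module itself sorts this way is an assumption.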
§Stream-safe bound
The stream-safe text format pins the maximum number of consecutive
non-starters at 30 (UAX #15 §13). The implementation uses a fixed
32-entry combining-run buffer (2-entry headroom) on the stack. No
allocator. Input streams that violate the stream-safe bound emit
NfcError::CombiningRunOverflow.
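A fixed-capacity run buffer of this kind might look like the sketch below. The type CombiningRun, its methods and the RunError enum are hypothetical stand-ins (RunError::Overflow plays the role of NfcError::CombiningRunOverflow); only the capacity of 32 is taken from the text above.

```rust
// Hypothetical sketch of a stack-only combining-run buffer.
const RUN_CAPACITY: usize = 32; // 30 stream-safe non-starters + 2 headroom

#[derive(Debug, PartialEq, Eq)]
enum RunError {
    /// Stand-in for NfcError::CombiningRunOverflow.
    Overflow,
}

/// Holds (code point, combining class) pairs in a fixed array; no heap use.
struct CombiningRun {
    entries: [(u32, u8); RUN_CAPACITY],
    len: usize,
}

impl CombiningRun {
    const fn new() -> Self {
        Self { entries: [(0, 0); RUN_CAPACITY], len: 0 }
    }

    /// Appends one entry, or reports overflow instead of panicking or growing.
    fn push(&mut self, cp: u32, ccc: u8) -> Result<(), RunError> {
        if self.len == RUN_CAPACITY {
            return Err(RunError::Overflow);
        }
        self.entries[self.len] = (cp, ccc);
        self.len += 1;
        Ok(())
    }

    /// The entries pushed so far, in insertion order.
    fn as_slice(&self) -> &[(u32, u8)] {
        &self.entries[..self.len]
    }

    /// Resets the buffer for the next combining run.
    fn clear(&mut self) {
        self.len = 0;
    }
}
```

Returning an error on the 33rd push keeps memory use constant regardless of input, which is the point of pinning the buffer size to the stream-safe limit.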
§UCD version pin
Tables are generated from UCD tables::UCD_VERSION (currently
15.1.0). Regenerate via python3 tools/gen_nfc_tables.py after
bumping the version pin in [crate::canonical::nfc::tables] and
the vendored data files in data/ucd/<version>/.
Enums§
Constants§
Functions§
- normalize_into: Normalize input (well-formed UTF-8) into NFC, writing to out. Returns the number of bytes written.
- quick_check: UAX #15 NFC_Quick_Check (UAX #15 §9, Detecting Normalization Forms). Walks input once; returns Yes if every code point has NFC_QC = Yes and the canonical combining classes are non-decreasing within each combining run; returns No if any code point has NFC_QC = No or any reorder is required; returns Maybe otherwise.
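The quick_check decision rule described above can be sketched as a single pass. Everything named here is an assumption made for illustration: the QuickCheck enum, the quick_check_sketch function, the &str input (the module itself reads bytes) and the props closure standing in for the generated property tables. The toy_props function covers only three combining marks and treats every other character as a starter with NFC_QC = Yes.

```rust
// Hypothetical single-pass quick check; not this module's signature.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum QuickCheck {
    Yes,
    No,
    Maybe,
}

/// `props` returns (canonical combining class, NFC_QC value) for a character.
fn quick_check_sketch<F>(input: &str, props: F) -> QuickCheck
where
    F: Fn(char) -> (u8, QuickCheck),
{
    let mut last_ccc = 0u8;
    let mut result = QuickCheck::Yes;
    for ch in input.chars() {
        let (ccc, qc) = props(ch);
        // A non-starter whose class is lower than its predecessor's
        // means the run would have to be reordered.
        if ccc != 0 && last_ccc > ccc {
            return QuickCheck::No;
        }
        match qc {
            QuickCheck::No => return QuickCheck::No,
            QuickCheck::Maybe => result = QuickCheck::Maybe,
            QuickCheck::Yes => {}
        }
        last_ccc = ccc;
    }
    result
}

/// Toy property lookup covering three marks; everything else is treated
/// as a starter with NFC_QC = Yes, which is not true of real Unicode data.
fn toy_props(ch: char) -> (u8, QuickCheck) {
    match ch {
        '\u{0301}' => (230, QuickCheck::Maybe), // COMBINING ACUTE ACCENT
        '\u{0323}' => (220, QuickCheck::Maybe), // COMBINING DOT BELOW
        '\u{0340}' => (230, QuickCheck::No),    // COMBINING GRAVE TONE MARK
        _ => (0, QuickCheck::Yes),
    }
}
```

A caller would typically skip normalization on Yes and fall back to the full normalization path on No or Maybe.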