Module nfc


UAX #15 Unicode NFC normalization — streaming, no_std, no_alloc.

Reads UTF-8 from an &[u8] input slice, writes NFC-normalized UTF-8 into an &mut [u8] output slice, returns the number of bytes written. No allocator, no std, no panics on well-formed input.

§Algorithm (UAX #15)

  1. Canonical decomposition (NFD). Expand each input code point to its full (recursively applied) canonical decomposition. Hangul syllables are decomposed algorithmically per The Unicode Standard §3.12 (Conjoining Jamo Behavior); all other decompositions are looked up in [tables::DECOMP_TABLE] / [tables::DECOMP_DATA].

  2. Canonical reordering. Within each “combining run” (a starter followed by zero or more non-starters), stably sort the non-starters into ascending order of canonical combining class (UAX #15 §1.3). Marks with equal classes keep their original relative order.

  3. Canonical composition. Walk the decomposed-and-reordered sequence; for each starter, greedily compose with following non-blocked combining marks via the canonical composition table ([tables::COMP_TABLE]) plus Hangul algorithmic composition (The Unicode Standard §3.12). A mark is “blocked” from a starter if any character between them is itself a starter or has a canonical combining class ≥ the mark’s own class (The Unicode Standard §3.11, D115).
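
The algorithmic parts of the three steps can be sketched without any tables. The block below is a minimal illustration, not this module's code: the function names (hangul_decompose, hangul_compose, reorder_marks) and the (code point, ccc) pair representation are assumptions made for the example. The Hangul constants are the standard ones from The Unicode Standard §3.12.

```rust
// Hangul constants from The Unicode Standard (Conjoining Jamo Behavior).
const S_BASE: u32 = 0xAC00;
const L_BASE: u32 = 0x1100;
const V_BASE: u32 = 0x1161;
const T_BASE: u32 = 0x11A7;
const L_COUNT: u32 = 19;
const V_COUNT: u32 = 21;
const T_COUNT: u32 = 28;
const N_COUNT: u32 = V_COUNT * T_COUNT; // 588
const S_COUNT: u32 = L_COUNT * N_COUNT; // 11172

/// Step 1 (Hangul part): split a precomposed syllable into 2 or 3 jamo.
/// Returns the jamo array and how many entries are valid; None if `cp`
/// is not a precomposed Hangul syllable. (Hypothetical helper.)
pub fn hangul_decompose(cp: u32) -> Option<([u32; 3], usize)> {
    let s_index = cp.checked_sub(S_BASE).filter(|&i| i < S_COUNT)?;
    let l = L_BASE + s_index / N_COUNT;
    let v = V_BASE + (s_index % N_COUNT) / T_COUNT;
    let t_index = s_index % T_COUNT;
    if t_index == 0 {
        Some(([l, v, 0], 2)) // LV syllable: no trailing consonant
    } else {
        Some(([l, v, T_BASE + t_index], 3)) // LVT syllable
    }
}

/// Step 3 (Hangul part): compose L+V into LV, or LV+T into LVT.
/// (Hypothetical helper.)
pub fn hangul_compose(a: u32, b: u32) -> Option<u32> {
    // Leading consonant + vowel -> LV syllable.
    if (L_BASE..L_BASE + L_COUNT).contains(&a) && (V_BASE..V_BASE + V_COUNT).contains(&b) {
        return Some(S_BASE + ((a - L_BASE) * V_COUNT + (b - V_BASE)) * T_COUNT);
    }
    // LV syllable + trailing consonant -> LVT syllable.
    if (S_BASE..S_BASE + S_COUNT).contains(&a)
        && (a - S_BASE) % T_COUNT == 0
        && (T_BASE + 1..T_BASE + T_COUNT).contains(&b)
    {
        return Some(a + (b - T_BASE));
    }
    None
}

/// Step 2: stable in-place insertion sort of (code point, ccc) pairs by ccc.
/// The slice is assumed to hold only the non-starters of one combining run.
pub fn reorder_marks(marks: &mut [(u32, u8)]) {
    for i in 1..marks.len() {
        let mut j = i;
        // Strict `>` leaves equal-class marks in their original order.
        while j > 0 && marks[j - 1].1 > marks[j].1 {
            marks.swap(j - 1, j);
            j -= 1;
        }
    }
}
```

Insertion sort is used here because runs are short and it needs no scratch space, which fits a no_alloc setting; whether the module itself sorts this way is not stated above.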

§Stream-safe bound

The Stream-Safe Text Format caps the number of consecutive non-starters at 30 (UAX #15 §13). The implementation holds each combining run in a fixed 32-entry buffer on the stack (2 entries of headroom), so no allocator is needed. Input that violates the stream-safe bound yields NfcError::CombiningRunOverflow.
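
A fixed-capacity run buffer of this kind might look like the sketch below. The type and method names (CombiningRun, RunError, push, as_slice, clear) are hypothetical and chosen for illustration; only the 32-entry capacity and the overflow condition come from the description above.

```rust
/// Hypothetical stand-in for the overflow variant named in the text.
#[derive(Debug, PartialEq, Eq)]
pub enum RunError {
    CombiningRunOverflow,
}

/// Capacity quoted in the text: 30 (stream-safe limit) plus 2 of headroom.
pub const RUN_CAP: usize = 32;

/// Fixed-capacity buffer of (code point, ccc) pairs. It lives wherever the
/// caller puts it (typically the stack) and never allocates.
pub struct CombiningRun {
    buf: [(u32, u8); RUN_CAP],
    len: usize,
}

impl CombiningRun {
    pub const fn new() -> Self {
        Self { buf: [(0, 0); RUN_CAP], len: 0 }
    }

    /// Append one entry, or report overflow instead of panicking.
    pub fn push(&mut self, cp: u32, ccc: u8) -> Result<(), RunError> {
        if self.len == RUN_CAP {
            return Err(RunError::CombiningRunOverflow);
        }
        self.buf[self.len] = (cp, ccc);
        self.len += 1;
        Ok(())
    }

    /// The entries collected so far.
    pub fn as_slice(&self) -> &[(u32, u8)] {
        &self.buf[..self.len]
    }

    /// Reset for the next combining run.
    pub fn clear(&mut self) {
        self.len = 0;
    }
}
```

Returning a Result from push keeps the overflow path explicit, so a caller can surface the error rather than index out of bounds.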

§UCD version pin

Tables are generated from the UCD version named by tables::UCD_VERSION (currently 15.1.0). To regenerate them, bump the version pin in [crate::canonical::nfc::tables], update the vendored data files in data/ucd/<version>/, then run python3 tools/gen_nfc_tables.py.

Enums§

NfcError
Streaming NFC normalizer error.
NfcQc
UAX #15 NFC_Quick_Check property values.

Constants§

UCD_VERSION

Functions§

normalize_into
Normalize input (well-formed UTF-8) into NFC, writing to out. Returns the number of bytes written.
quick_check
UAX #15 NFC_Quick_Check (UAX #15 §9, Detecting Normalization Forms). Walks input once. Returns Yes if every code point has NFC_QC = Yes and the canonical combining classes are non-decreasing within each combining run; returns No if any code point has NFC_QC = No or any reordering is required; returns Maybe otherwise.
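
The single-pass check described for quick_check follows the scan given in UAX #15. The sketch below is illustrative only: Qc, toy_props, and quick_check_sketch are hypothetical names, the input is taken as &str rather than a byte slice, and the property lookup is a three-entry toy table standing in for the generated tables.

```rust
/// Hypothetical stand-in for the NFC_Quick_Check property values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Qc {
    Yes,
    No,
    Maybe,
}

/// Toy property lookup (NOT the crate's tables): returns (ccc, NFC_QC).
pub fn toy_props(c: char) -> (u8, Qc) {
    match c {
        '\u{0301}' => (230, Qc::Maybe), // COMBINING ACUTE ACCENT
        '\u{0323}' => (220, Qc::Maybe), // COMBINING DOT BELOW
        '\u{0340}' => (230, Qc::No),    // COMBINING GRAVE TONE MARK
        _ => (0, Qc::Yes), // toy default: treat everything else as a starter
    }
}

/// One pass over the input, tracking the previous combining class.
pub fn quick_check_sketch(input: &str, props: impl Fn(char) -> (u8, Qc)) -> Qc {
    let mut last_ccc = 0u8;
    let mut result = Qc::Yes;
    for c in input.chars() {
        let (ccc, qc) = props(c);
        // A non-starter with a lower class than its predecessor means
        // canonical reordering would be needed: definitely not NFC.
        if ccc != 0 && last_ccc > ccc {
            return Qc::No;
        }
        match qc {
            Qc::No => return Qc::No,
            Qc::Maybe => result = Qc::Maybe,
            Qc::Yes => {}
        }
        last_ccc = ccc;
    }
    result
}
```

A Maybe result only says the fast path was inconclusive; a caller would typically fall back to full normalization and compare.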