CATEGORY · M · MARKS

Marks

Combining characters that hang on, sit beside, or wrap around a base letter — diacritics, Indic vowel signs, enclosing circles.

The Mark group consists of combining codepoints. They are not letters in themselves; they attach to a preceding base character to form a complete grapheme. A French é can be a single Ll codepoint (U+00E9) or a Latin e followed by U+0301 COMBINING ACUTE ACCENT — visually identical, but a different number of codepoints. Reconciling the two forms is the job of Unicode normalization, and the M category is at the centre of it.

The subcategories

Mn
Mark, nonspacing: the largest subcategory. These marks have an advance width of zero, so they overlay or hang from the base character without consuming a column. All standard Latin diacritics live here: COMBINING GRAVE ACCENT (U+0300), COMBINING ACUTE ACCENT (U+0301), COMBINING TILDE (U+0303), COMBINING DIAERESIS (U+0308), COMBINING CEDILLA (U+0327). Hebrew niqqud and Arabic harakat are also Mn. Examples: ◌̀ ◌́ ◌̃ ◌̈ ◌̧.
Mc
Mark, spacing combining: marks that do consume horizontal advance, even though they are not letters. These are chiefly the dependent vowel signs of Indic scripts, such as Devanagari आ-mātrā (U+093E), Bengali ī-kāra (U+09C0) and Tamil ai (U+0BC8). Examples: ा ि ी (each shown after a dotted-circle base).
Me
Mark, enclosing: a tiny subcategory whose marks wrap around the base. It includes COMBINING ENCLOSING CIRCLE (U+20DD), COMBINING ENCLOSING SQUARE (U+20DE) and COMBINING ENCLOSING DIAMOND (U+20DF), along with the screen (U+20E2) and keycap (U+20E3) variants. Examples: ◌⃝ ◌⃞ ◌⃟.
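
The subcategory of any codepoint can be read from the Unicode Character Database. A minimal sketch using Python's standard unicodedata module follows; the helper name mark_subcategory is made up for illustration.

```python
import unicodedata
from typing import Optional


def mark_subcategory(ch: str) -> Optional[str]:
    """Return 'Mn', 'Mc' or 'Me' for a combining mark, else None."""
    category = unicodedata.category(ch)
    return category if category.startswith("M") else None


# Acute accent, Devanagari vowel sign AA, enclosing circle, plain letter e.
for ch in ["\u0301", "\u093e", "\u20dd", "e"]:
    print(f"U+{ord(ch):04X}", mark_subcategory(ch))
```

The last line prints None for the letter e, whose category is Ll rather than any of the M subcategories.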

Combining class

Every combining mark also carries a Canonical Combining Class (CCC), an integer from 0 to 254 that controls how multiple marks on a single base are ordered. Below-mark accents have a different CCC than above-mark accents, so they can both stack on the same base without ambiguity. The normalization algorithm sorts each run of adjacent marks with nonzero CCC into ascending order (a stable sort, so marks of equal class keep their relative order) before deciding whether a precomposed form exists. This is why, in e + U+0301 (acute, CCC 230) + U+0327 (cedilla, CCC 202), the cedilla is moved ahead of the acute. NFC then composes the e and the cedilla into U+0229 (ȩ) and leaves the acute as a trailing combining mark, because no precomposed codepoint exists for e with both a cedilla and an acute.
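
The reordering can be observed directly. The sketch below uses unicodedata.combining, which returns a codepoint's CCC, and compares the typed order with the NFD and NFC results; the variable names are illustrative only.

```python
import unicodedata

ACUTE = "\u0301"    # COMBINING ACUTE ACCENT, CCC 230
CEDILLA = "\u0327"  # COMBINING CEDILLA, CCC 202


def ccc_sequence(text: str) -> list:
    """Return the Canonical Combining Class of each codepoint in text."""
    return [unicodedata.combining(ch) for ch in text]


typed = "e" + ACUTE + CEDILLA               # acute typed before cedilla
nfd = unicodedata.normalize("NFD", typed)   # canonical ordering only
nfc = unicodedata.normalize("NFC", typed)   # ordering, then composition

print(ccc_sequence(typed))                  # [0, 230, 202]
print(ccc_sequence(nfd))                    # [0, 202, 230]
print([f"U+{ord(ch):04X}" for ch in nfc])   # ['U+0229', 'U+0301']
```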

Normalization and security

Most of the bug surface around combining marks comes from systems that treat strings as sequences of codepoints rather than sequences of grapheme clusters. The string café with the precomposed é has four codepoints; with the decomposed form it has five. Naïve length checks, substring searches, password comparisons and identifier hashing all need to normalise first — usually to NFC. UAX #15 exists exactly to make this deterministic.
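
As a small illustration of the café case, the sketch below compares the two spellings before and after normalization. The helper nfc_equal is a hypothetical name, and NFC is chosen only because the text names it as the usual target.

```python
import unicodedata


def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "caf\u00e9"   # é as a single codepoint
decomposed = "cafe\u0301"   # e followed by COMBINING ACUTE ACCENT

print(len(precomposed), len(decomposed))   # 4 5
print(precomposed == decomposed)           # False
print(nfc_equal(precomposed, decomposed))  # True
```

The same normalize-then-compare step applies before hashing identifiers or checking passwords, so that both spellings map to one stored value.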

There are also security implications. A malicious user might register a username containing an inconspicuous combining mark so that it visually matches another user's. UTS #39 (Unicode Security Mechanisms) deals with this by restricting the marks permitted in identifiers, and by defining confusable-detection algorithms that fold strings through both NFD and a set of confusable mappings before comparison.
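
One way to surface suspicious marks in a username is sketched below. It is a deliberately simple, hypothetical policy and not the UTS #39 algorithm: it normalizes to NFC and reports any combining mark that remains, meaning any mark with no precomposed form on its base. The function name residual_marks is an assumption for this example.

```python
import unicodedata
from typing import List


def residual_marks(name: str) -> List[str]:
    """List the combining marks still present after NFC normalization."""
    composed = unicodedata.normalize("NFC", name)
    return [
        f"U+{ord(ch):04X}"
        for ch in composed
        if unicodedata.category(ch).startswith("M")  # Mn, Mc or Me
    ]


print(residual_marks("cafe\u0301"))   # [] because e + acute composes to é
print(residual_marks("pa\u0335ul"))   # ['U+0335'], the short stroke overlay
```

A check like this would also flag legitimate text in scripts that depend on combining marks, such as Devanagari or Tamil, so a real system would need a script-aware allow list rather than a blanket rejection.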

Rendering notes

Fonts implement combining marks through OpenType anchoring (the mark and mkmk features) or hard-coded positioning tables. When the font lacks coverage for a particular mark-on-base pair, the renderer falls back to mechanical placement, which often looks wrong — a tilde may sit too high above a q, or a cedilla may overlap with a g's descender. Designers of fonts intended for linguistic transcription invest heavily in this anchoring, which is why scholarly fonts (Charis SIL, Doulos SIL, Brill, Junicode) feel so different from system defaults when you start stacking marks.

Example characters

A selection from across Mn, Mc and Me. The combining marks are shown attached to a dotted-circle placeholder per Unicode convention.

U+0300 · Mn · ◌̀ · Combining Grave
U+0301 · Mn · ◌́ · Combining Acute
U+0302 · Mn · ◌̂ · Combining Circumflex
U+0303 · Mn · ◌̃ · Combining Tilde
U+0308 · Mn · ◌̈ · Combining Diaeresis
U+030A · Mn · ◌̊ · Combining Ring Above
U+0327 · Mn · ◌̧ · Combining Cedilla
U+0335 · Mn · ◌̵ · Combining Short Stroke Overlay
U+093E · Mc · ◌ा · Devanagari Vowel Aa
U+093F · Mc · ◌ि · Devanagari Vowel I
U+09C0 · Mc · ◌ী · Bengali Vowel Ii
U+0BC8 · Mc · ◌ை · Tamil Vowel Ai
U+20D0 · Mn · ◌⃐ · Left Harpoon Above
U+20DD · Me · ◌⃝ · Enclosing Circle
U+20DE · Me · ◌⃞ · Enclosing Square
U+20E3 · Me · ◌⃣ · Combining Enclosing Keycap
