The Symbol group is around 8,000 codepoints strong in Unicode 16.0. It is the second-largest category after Letters and contains some of the most-typed non-alphabetic characters on the planet — the dollar sign, the copyright symbol, the heart, the snowman. The four subcategories are useful for distinguishing math from currency from pictography in regex, search, and font subsetting.
The subcategories
- Sm
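- Symbol, math: mathematical operators and relational signs. PLUS + (U+002B), LESS-THAN < (U+003C), EQUALS = (U+003D), GREATER-THAN > (U+003E), TILDE ~ (U+007E), MINUS SIGN − (U+2212, distinct from ASCII hyphen-minus), MULTIPLICATION × (U+00D7), DIVISION ÷ (U+00F7), PLUS-MINUS ± (U+00B1), INFINITY ∞ (U+221E), SUMMATION ∑ (U+2211), INTEGRAL ∫ (U+222B), FOR ALL ∀ (U+2200), THERE EXISTS ∃ (U+2203), the rest of the Mathematical Operators block (U+2200–U+22FF), and the arrows with established mathematical use, such as → (U+2192) and ⇒ (U+21D2); the remainder of the Arrows block (U+2190–U+21FF) is So. In the Mathematical Alphanumeric Symbols block (U+1D400–U+1D7FF), only a handful of characters, such as the styled nabla and partial-differential signs (for example U+1D6C1 and U+1D6DB), are Sm; the styled letters and digits are Lu, Ll, and Nd.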
- Sc
- Symbol, currency: dedicated currency signs. DOLLAR $ (U+0024), POUND £ (U+00A3), YEN ¥ (U+00A5), CENT ¢ (U+00A2), EURO € (U+20AC), INDIAN RUPEE ₹ (U+20B9), KOREAN WON ₩ (U+20A9), TURKISH LIRA ₺ (U+20BA), BITCOIN ₿ (U+20BF), and the rest of the Currency Symbols block (U+20A0–U+20CF), which holds roughly thirty assigned signs. Sc has only a few dozen members in total, so many currencies have no sign of their own and are written with letters or an ISO 4217 code. The EURO SIGN (U+20AC) was added in Unicode 2.1 (1998) specifically for the new common currency.
- Sk
- Symbol, modifier: spacing modifier symbols. These look like accents but are standalone characters: GRAVE ACCENT ` (U+0060), CIRCUMFLEX ^ (U+005E), DIAERESIS ¨ (U+00A8), MACRON ¯ (U+00AF), ACUTE ´ (U+00B4). The skin-tone modifiers used by emoji (U+1F3FB–U+1F3FF) are also Sk. Distinguish Sk (standalone, has its own advance width) from Mn (combining, zero advance): U+0060 GRAVE ACCENT is Sk, while U+0300 COMBINING GRAVE ACCENT is Mn.
- So
- Symbol, other: everything else, mostly pictographic. COPYRIGHT © (U+00A9), TRADE MARK ™ (U+2122), REGISTERED ® (U+00AE), DEGREE ° (U+00B0), the Zapf dingbats ✓ ✗ ✚ ✪ ✰ (U+2700–U+27BF), arrows that aren't math arrows, weather signs, chess pieces, playing cards, alchemical symbols, and most of the emoji repertoire, including the base characters of its skin-toned, gendered, and family-joined sequences. MICRO SIGN µ (U+00B5) looks like it belongs here but is a lowercase letter (Ll).
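The four subcategories can be inspected directly from the Unicode Character Database. The following is a minimal sketch using Python's standard `unicodedata` module; the `symbol_subcategory` helper and the `SAMPLES` list are illustrative names, not part of any library, and the data reflects whichever Unicode version the running Python was built with.

```python
import unicodedata

# Illustrative sample: symbols from each subcategory plus two look-alikes
# (MICRO SIGN and MATHEMATICAL BOLD CAPITAL A) that are letters, not symbols.
SAMPLES = [
    "+", "\u2212", "\u2192", "\u2195",   # plus, minus sign, two arrows
    "$", "\u20ac",                       # dollar, euro
    "^", "\U0001F3FB",                   # circumflex, emoji skin-tone modifier
    "\u00a9", "\u2603",                  # copyright, snowman
    "\u00b5", "\U0001D400",              # micro sign, math bold capital A
]


def symbol_subcategory(ch):
    """Return 'Sm', 'Sc', 'Sk' or 'So' for a symbol, or None for anything else."""
    cat = unicodedata.category(ch)
    return cat if cat.startswith("S") else None


if __name__ == "__main__":
    for ch in SAMPLES:
        name = unicodedata.name(ch, "<unnamed>")
        print(f"U+{ord(ch):04X} {name}: {unicodedata.category(ch)}")
```

Note that Python's built-in `re` module has no `\p{Sm}`-style property escapes, so a category check like this one (or a third-party regex engine) is the usual route when filtering by subcategory.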
Math symbols vs ASCII punctuation
One of the recurring traps in plain-text data is the difference between ASCII - (U+002D HYPHEN-MINUS, category Pd) and the proper minus sign − (U+2212 MINUS SIGN, category Sm). They render almost identically in most fonts, but they are different characters with different categories. Mathematical typesetting systems (LaTeX with Unicode math, MathML, fonts with an OpenType MATH table) typically use U+2212. Programming languages and most keyboards produce U+002D, and many numeric parsers accept only that form. The same applies to multiplication × (U+00D7, Sm) vs the letter x (Ll), and division ÷ (U+00F7, Sm) vs the slash / (Po).
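One common way to deal with this when ingesting numeric text is to map U+2212 to U+002D before parsing. The sketch below assumes that only the minus sign needs normalising; `parse_number` is a hypothetical helper name, and real data may call for handling other look-alikes as well.

```python
import unicodedata

MINUS_SIGN = "\u2212"    # category Sm
HYPHEN_MINUS = "\u002d"  # category Pd


def parse_number(text):
    """Parse a decimal number, accepting either U+2212 or U+002D as the sign."""
    return float(text.strip().replace(MINUS_SIGN, HYPHEN_MINUS))


if __name__ == "__main__":
    # The two characters look alike but sit in different categories.
    print(unicodedata.category(MINUS_SIGN), unicodedata.category(HYPHEN_MINUS))
    print(parse_number("\u22123.5"))
    print(parse_number("-3.5"))
```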
Currency and the formatting tail
Currency signs are not just symbols; they participate in locale formatting. ICU's UNumberFormatter and JavaScript's Intl.NumberFormat choose currency placement (prefix or suffix), spacing, and the grouping separator from CLDR data keyed by locale and ISO 4217 currency code, not by the Sc codepoint. The euro sign is conventionally suffixed in French (10 €) and prefixed in English (€10). The currency name and the symbol shown for a given code both come from CLDR; the codepoint itself carries no locale, and one sign can serve several currencies, as $ does for many dollars and pesos.
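Because placement varies by locale, code that reads already-formatted amounts cannot assume where the sign sits. The following rough sketch locates the first Sc character in a string and reports whether it precedes or follows the digits; `currency_position` is a hypothetical helper, and the sample strings are the French and English forms mentioned above. It does no locale-aware formatting itself, which would need CLDR data from a library such as ICU.

```python
import unicodedata


def currency_position(formatted):
    """Return (sign, 'prefix' | 'suffix') for the first Sc character, or None."""
    for i, ch in enumerate(formatted):
        if unicodedata.category(ch) == "Sc":
            digit_before = any(c.isdigit() for c in formatted[:i])
            return (ch, "suffix" if digit_before else "prefix")
    return None  # no currency sign, e.g. an amount written with an ISO code


if __name__ == "__main__":
    print(currency_position("10\u00a0\u20ac"))  # French style, no-break space
    print(currency_position("\u20ac10"))        # English style
    print(currency_position("10 EUR"))          # ISO 4217 code, no Sc character
```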
Emoji and the So bucket
Most emoji are So. The few exceptions include the keycap base characters (U+0023 # is Po, U+002A * is Po, U+0030–U+0039 are Nd) which become emoji only when followed by an Emoji Variation Selector U+FE0F and a Combining Enclosing Keycap U+20E3. The Emoji property itself is independent of General Category; see the Emoji page for the cross-cutting story. Many compound emoji are ZWJ sequences whose base characters are individually So — see how emoji work for byte-level breakdowns.
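A per-codepoint category dump makes the distinction concrete. This is a small sketch using Python's standard `unicodedata` module; the `categories` helper and the two sample constants are illustrative names chosen here.

```python
import unicodedata


def categories(text):
    """List (codepoint, General Category) for every codepoint in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch)) for ch in text]


# Keycap sequence: '#', VARIATION SELECTOR-16, COMBINING ENCLOSING KEYCAP.
KEYCAP_HASH = "#\ufe0f\u20e3"
# ZWJ sequence: MAN, ZWJ, WOMAN, ZWJ, GIRL.
FAMILY = "\U0001F468\u200d\U0001F469\u200d\U0001F467"

if __name__ == "__main__":
    print(categories(KEYCAP_HASH))
    # [('U+0023', 'Po'), ('U+FE0F', 'Mn'), ('U+20E3', 'Me')]
    print(categories(FAMILY))
    # [('U+1F468', 'So'), ('U+200D', 'Cf'), ('U+1F469', 'So'),
    #  ('U+200D', 'Cf'), ('U+1F467', 'So')]
```

None of the three codepoints in the keycap sequence is a symbol, while the visible members of the family sequence are all So and are joined by a format character (Cf).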