General Punctuation is Unicode's typographic toolkit. It begins at U+2000 with a row of fixed-width spaces, runs through the proper dashes, quotation marks, dots, and reference signs that distinguish a well-set page from a teletype output, and ends in a quietly important set of invisible format characters — zero-width joiners, bidirectional overrides, and word-joining marks that shape how text is laid out and joined.

About this block

The block was present from Unicode 1.0 in 1991, but its characters trace back much further: most originate in centuries of European typography and in the late-twentieth-century technical standards (especially the typesetting language TeX, the Adobe character sets PostScript Standard Encoding and the Mac Roman charset, and DEC Multinational) that needed a place to put marks the seven- and eight-bit codepages could not hold. The Unicode Consortium gathered them into a single block here and codified their semantics in the Unicode Character Database properties — assigning each a general category (mostly Pd "dash", Pi/Pf "initial/final quote", Po "other punctuation", Zs "space separator", Cf "format").

The dashes are the most visible inhabitants. U+2010 HYPHEN is the typographic hyphen, distinct from the ASCII hyphen-minus U+002D. U+2013 EN DASH is used for ranges (1939–1945). U+2014 EM DASH is used for parenthetical breaks — like this one — and is the longest standard dash. U+2015 HORIZONTAL BAR is sometimes used for dialogue in European typography. U+2212 MINUS SIGN, the mathematical operator, actually lives one block over in Mathematical Operators — a common confusion.

The quotation marks at U+2018U+201F are the "curly" or "smart" quotes that every word processor silently substitutes for the ASCII straight quotes. U+2018 ‘ and U+2019 ’ are the single quotes (with U+2019 doubling as the modern apostrophe); U+201C “ and U+201D ” are the doubles. The dagger U+2020 † and double dagger U+2021 ‡ mark footnotes. U+2022 BULLET is the round list bullet, and U+2026 HORIZONTAL ELLIPSIS is the single-character ellipsis (preferred over three periods because it does not break across lines).

The opening row of fixed-width spaces — EN QUAD (U+2000), EM QUAD (U+2001), EN SPACE (U+2002), EM SPACE (U+2003), THREE-PER-EM SPACE (U+2004), FOUR-PER-EM SPACE (U+2005), SIX-PER-EM SPACE (U+2006), FIGURE SPACE (U+2007), PUNCTUATION SPACE (U+2008), THIN SPACE (U+2009), HAIR SPACE (U+200A) — exists for fine typographic adjustment. Most are rarely used; the practical workhorses are the THIN SPACE around French punctuation, the FIGURE SPACE for tabular numerals, the NARROW NO-BREAK SPACE (U+202F) used as a thousands separator in many locales, and the MEDIUM MATHEMATICAL SPACE (U+205F) used between math operands.

Finally, the block hosts a set of invisible format characters. ZERO WIDTH SPACE (U+200B) and WORD JOINER (U+2060) influence line-breaking; ZERO WIDTH NON-JOINER (U+200C) and ZERO WIDTH JOINER (U+200D) control whether adjacent letters form a ligature — the ZWJ is the very same character that holds modern emoji ZWJ sequences together. The bidi controls at U+202AU+202E and the isolates at U+2066U+2069 drive the Unicode bidirectional algorithm for mixing left-to-right Latin with right-to-left Arabic and Hebrew.