The Cyrillic block — U+0400 through U+04FF, 256 codepoints — encodes the script that grew out of the ninth-century mission of the Byzantine brothers Constantine (later Saint Cyril) and Methodius to the Slavs of Great Moravia. The brothers themselves devised Glagolitic, a more ornate script; the alphabet that came to bear Cyril's name was developed shortly afterward by their disciples at the Preslav Literary School in Bulgaria, drawing letter shapes directly from Greek uncials and inventing new letters for sounds Greek did not represent. From there it spread north and east with the Orthodox Church, becoming the script of Kievan Rus', of Serbia and Bulgaria, and eventually of an empire that stretched from Warsaw to Vladivostok.
About this block
The shape of modern Cyrillic owes as much to Peter the Great as to Cyril. In 1708 the tsar imposed the Гражданский шрифт ("civil script") reform, deliberately secularising the alphabet by redrawing the letters along Latin-Antiqua proportions, dropping several archaic forms, and reserving the older Church Slavonic letterforms for liturgical use. The result is the visual register Russian readers still recognise today: an alphabet that looks Western in its proportions but Eastern in its repertoire. The Soviet orthographic reform of 1917–1918 trimmed the Russian set further, removing і, ѣ (yat), and ѳ (fita), letters that had survived only as etymological spellings; the already rare ѵ (izhitsa) dropped out of use at the same time. Ukrainian retained і; Bulgarian kept ѣ in some uses until its own 1945 reform; and the dropped letters live on in Unicode as historic codepoints so digitised pre-revolutionary texts can still be encoded faithfully.
The block was added to Unicode 1.0 in 1991. Its core range, U+0401–U+045F, closely follows the layout of ISO 8859-5; KOI8-R, long the de facto Russian codepage on Unix systems, encodes the same Russian letters but in a different order. Uppercase Russian Cyrillic occupies U+0410–U+042F (А Б В Г Д Е Ж З И Й К Л М Н О П Р С Т У Ф Х Ц Ч Ш Щ Ъ Ы Ь Э Ю Я); lowercase runs U+0430–U+044F. Flanking that range, at U+0400–U+040F (uppercase) and U+0450–U+045F (lowercase), live letters needed by other Cyrillic-script languages: Ukrainian Є, І, Ї; Belarusian Ў; Serbian Ј, Љ, Њ, Ћ, Џ; Macedonian Ѕ, Ќ, Ѓ; and Russian Ё U+0401 / ё U+0451, which sits apart from the rest of the alphabet just as it did in ISO 8859-5 and other pre-Unicode codepages. Ukrainian Ґ is encoded further up, at U+0490 / U+0491. Above U+045F sit the historic letters, among them Ѣ ѣ (yat, U+0462 / U+0463), Ѳ ѳ (fita, U+0472 / U+0473), and Ѵ ѵ (izhitsa, U+0474 / U+0475). The remainder of the block holds Cyrillic letters used by Caucasian languages such as Abkhaz and Chechen, by Siberian languages such as Yakut, Buryat, and Chukchi, by Central Asian languages including Kazakh, Kyrgyz, and Tajik, and by Bashkir in the Urals.
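The layout of the basic Russian range can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the names UPPER_START, UPPER_END, CASE_OFFSET, basic_russian_upper, and lower_by_offset are illustrative and not part of any standard API.

```python
import unicodedata

UPPER_START, UPPER_END = 0x0410, 0x042F   # uppercase run U+0410..U+042F
CASE_OFFSET = 0x20                         # lowercase run starts at U+0430

def basic_russian_upper():
    """Return the 32 contiguous uppercase letters of the basic Russian range."""
    return "".join(chr(cp) for cp in range(UPPER_START, UPPER_END + 1))

def lower_by_offset(ch):
    """Lowercase a letter in U+0410..U+042F by adding the fixed offset.

    Characters outside that run are returned unchanged.
    """
    cp = ord(ch)
    if UPPER_START <= cp <= UPPER_END:
        return chr(cp + CASE_OFFSET)
    return ch

upper = basic_russian_upper()

# The fixed offset agrees with Python's own case mapping across the whole run.
offset_matches_builtin = all(lower_by_offset(c) == c.lower() for c in upper)

# U+0401 / U+0451 (the IO pair) lies outside the run, so its case offset differs.
io_offset = ord("\u0451") - ord("\u0401")
io_name = unicodedata.name("\u0401")
```

The fixed-offset trick only holds inside U+0410–U+044F; for anything else in the block, including Ё, str.lower() and str.upper() are the safer choice.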
Cyrillic is one of the scripts most actively extended by Unicode. The original 256 codepoints proved insufficient as more languages were standardised, and five supplementary blocks have been added: Cyrillic Supplement at U+0500–U+052F, Cyrillic Extended-A at U+2DE0–U+2DFF (combining letters used in Old Church Slavonic manuscripts), Cyrillic Extended-B at U+A640–U+A69F (more historic and minority-language letters), Cyrillic Extended-C at U+1C80–U+1C8F (a small set of Old Church Slavonic letterforms added in 2016), and Cyrillic Extended-D at U+1E030–U+1E08F (modifier letters for phonetic transcription, added in 2022). Together with the main block these encode more than 440 Cyrillic codepoints.
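Because the Cyrillic ranges are scattered across the codespace, a test such as "is this character Cyrillic?" cannot be a single range comparison. The sketch below shows one way to look a character up in a small table of block ranges. The table name CYRILLIC_BLOCKS and the function cyrillic_block are hypothetical; the table includes Cyrillic Extended-D (U+1E030–U+1E08F), the most recent Cyrillic block, alongside the Supplement and Extended-A, B, and C ranges.

```python
# (block name, first codepoint, last codepoint), sorted by codepoint
CYRILLIC_BLOCKS = [
    ("Cyrillic",            0x0400,  0x04FF),
    ("Cyrillic Supplement", 0x0500,  0x052F),
    ("Cyrillic Extended-C", 0x1C80,  0x1C8F),
    ("Cyrillic Extended-A", 0x2DE0,  0x2DFF),
    ("Cyrillic Extended-B", 0xA640,  0xA69F),
    ("Cyrillic Extended-D", 0x1E030, 0x1E08F),
]

def cyrillic_block(ch):
    """Return the name of the Cyrillic block containing ch, or None."""
    cp = ord(ch)
    for name, first, last in CYRILLIC_BLOCKS:
        if first <= cp <= last:
            return name
    return None

def block_sizes():
    """Return the number of codepoint slots in each block (assigned or not)."""
    return {name: last - first + 1 for name, first, last in CYRILLIC_BLOCKS}
```

Note that block membership is about ranges, not assignment: a slot inside a block may still be unassigned in a given Unicode version, so block_sizes() counts slots rather than encoded characters.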
One subtlety to know about Cyrillic in practice: many of its letters are visually indistinguishable from Latin letters in most fonts but live at entirely different codepoints. Cyrillic а U+0430, е U+0435, о U+043E, р U+0440, с U+0441, у U+0443, and х U+0445 look almost identical to Latin a, e, o, p, c, y, and x at U+0061, U+0065, U+006F, U+0070, U+0063, U+0079, and U+0078. This visual overlap is the basis of the IDN homograph attack, in which a domain like аррӏе.com (spelled entirely in Cyrillic, with palochka ӏ U+04CF standing in for Latin l) is registered to impersonate apple.com. Browsers and registrars now apply script-mixing restrictions, drawing on the confusables data in Unicode Technical Standard #39 and on registry IDN policy, to refuse mixed-script labels in most cases, and browsers show suspicious labels in their Punycode form. Unicode normalization does not help here, because it never maps a Cyrillic letter to its Latin lookalike. The underlying confusable set is structural: it cannot be removed without breaking either script.
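The two checks implied above, spotting a label that mixes scripts and mapping lookalikes back to Latin, can be sketched in Python. This is a rough illustration, not the algorithm any browser or registrar actually uses: the standard library exposes no script property, so scripts_in infers the script from the first word of each character's Unicode name (an assumption that works for Latin and Cyrillic letters but is not a general method), and CONFUSABLES holds only the seven pairs listed in the text. All function and variable names are hypothetical.

```python
import unicodedata

# Cyrillic -> Latin lookalikes, limited to the seven pairs quoted in the text.
CONFUSABLES = {
    "\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0440": "p",
    "\u0441": "c", "\u0443": "y", "\u0445": "x",
}

def scripts_in(label):
    """Approximate the set of scripts used by the letters of label.

    Heuristic: the first word of the Unicode character name
    ("LATIN", "CYRILLIC", ...) stands in for the script property.
    """
    found = set()
    for ch in label:
        if ch.isalpha():
            found.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return found

def is_mixed_script(label):
    """True if the letters of label come from more than one script."""
    return len(scripts_in(label)) > 1

def latin_skeleton(label):
    """Replace known Cyrillic lookalikes with their Latin counterparts."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in label)

# Cyrillic a, p, p and e around a Latin "l": caught by the mixed-script check.
mixed = "\u0430\u0440\u0440l\u0435"
# Cyrillic s, a, r, e only: reads as Latin "cape" but uses a single script.
whole_script = "\u0441\u0430\u0440\u0435"
```

The second sample shows why a mixed-script rule alone is not sufficient: is_mixed_script(whole_script) is False even though latin_skeleton(whole_script) collides with an ordinary Latin word, which is the case a skeleton comparison is meant to catch.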