Katakana (U+30A0–U+30FF) — UnicodeCharacter

Katakana is one of the two syllabaries, collectively called kana, that share the modern Japanese writing system with the Chinese-origin kanji. The block at U+30A0–U+30FF holds 96 code points covering the full gojuuon table, its voiced and semi-voiced derivatives, the small forms used for compound syllables and the geminate consonant, two archaic letters, and a handful of supporting symbols including the middle dot and the long-vowel mark. Katakana letters are visually angular and economical, drawn from squared-off fragments of kanji; their counterparts in Hiragana are cursive, drawn from calligraphic abbreviations of whole kanji (often, though not always, the same source characters).

About this block

The block has been part of Unicode since version 1.0, published in October 1991. Its letters follow the order of the katakana row of JIS X 0208, the foundational two-byte Japanese character set first issued in 1978, and the parallel layout with the Hiragana block at U+3040–U+309F is deliberate: each kana at offset n in the Katakana block has its phonetic match at offset n in Hiragana, so corresponding letters sit exactly 0x60 code points apart. The pairing covers U+30A1–U+30F6 (ァ to ヶ); the four letters ヷ, ヸ, ヹ, and ヺ at U+30F7–U+30FA have no single-code-point hiragana counterpart. This regularity is why software can convert between the two syllabaries with a single arithmetic adjustment, a feature Japanese IMEs lean on every day, since the same keystrokes produce hiragana that the user then converts to katakana for loanwords.

Katakana itself was developed in the early 9th century by Buddhist monks at temples like Kōfuku-ji in Nara, who needed a fast way to annotate Chinese sutras with Japanese pronunciation guides. They took small fragments of kanji (the left radical, a top stroke, a corner) and used them as phonetic shorthand. The name kata-kana literally means "fragment kana."
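The arithmetic conversion between the two syllabaries can be sketched in a few lines. Because the Hiragana block starts at U+3040 and the Katakana block at U+30A0, the adjustment is a constant 0x60. The function names and the decision to leave every other character untouched (including the prolonged sound mark and the iteration marks) are assumptions made for this illustration, not part of any standard API.

```python
# Distance between the Hiragana block (U+3040) and the Katakana block (U+30A0).
KANA_OFFSET = 0x60

# Letter ranges that have a one-to-one counterpart in the other block.
HIRAGANA_FIRST, HIRAGANA_LAST = 0x3041, 0x3096  # ぁ .. ゖ
KATAKANA_FIRST, KATAKANA_LAST = 0x30A1, 0x30F6  # ァ .. ヶ


def hiragana_to_katakana(text: str) -> str:
    """Shift each hiragana letter up by 0x60; pass everything else through."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET)
        if HIRAGANA_FIRST <= ord(ch) <= HIRAGANA_LAST
        else ch
        for ch in text
    )


def katakana_to_hiragana(text: str) -> str:
    """Shift each katakana letter down by 0x60; pass everything else through."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET)
        if KATAKANA_FIRST <= ord(ch) <= KATAKANA_LAST
        else ch
        for ch in text
    )


if __name__ == "__main__":
    print(hiragana_to_katakana("こんにちは"))  # コンニチハ
    print(katakana_to_hiragana("テレビ"))      # てれび
```

Letters outside the paired ranges, such as ヷ (U+30F7), are returned unchanged, so the conversion is not a full round trip for every katakana string.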

The standard gojuuon ("fifty sounds") is arranged in a five-vowel grid: アイウエオ (a, i, u, e, o); カキクケコ; サシスセソ; タチツテト; ナニヌネノ; ハヒフヘホ; マミムメモ; ヤユヨ; ラリルレロ; ワヲ; and the syllable-final ン. Voicing is marked with the dakuten diacritic ゛, producing ガギグゲゴ from the K-row, ザジズゼゾ from the S-row, ダヂヅデド from the T-row, and バビブベボ from the H-row. The H-row also accepts the handakuten ゜ for a semi-voiced labial: パピプペポ. Unicode encodes each voiced and semi-voiced syllable as a single precomposed code point. For cases where decomposition is needed there is a pair of combining marks (U+3099, U+309A) and a pair of spacing marks (U+309B, U+309C), all four encoded in the Hiragana block and shared by both syllabaries. The block also includes small versions of every vowel and the y-row (ァィゥェォャュョ), plus the small ッ (sokuon), which doubles the consonant of the following syllable, and a small ヮ used in certain loanword transcriptions.
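The relationship between a precomposed voiced kana and its base-plus-combining-mark form can be checked with Python's standard `unicodedata` module. This is a minimal sketch: the helper names are made up for the example, and comparing after NFC is one reasonable policy rather than the only one.

```python
import unicodedata

PRECOMPOSED_GA = "\u30AC"       # ガ as a single code point
DECOMPOSED_GA = "\u30AB\u3099"  # カ followed by the combining dakuten U+3099


def to_precomposed(text: str) -> str:
    """NFC: fold base kana + combining mark into the precomposed letter."""
    return unicodedata.normalize("NFC", text)


def to_decomposed(text: str) -> str:
    """NFD: split a precomposed voiced kana into base kana + combining mark."""
    return unicodedata.normalize("NFD", text)


def same_kana(a: str, b: str) -> bool:
    """Compare two strings after bringing both to NFC (hypothetical helper)."""
    return to_precomposed(a) == to_precomposed(b)


if __name__ == "__main__":
    print(PRECOMPOSED_GA == DECOMPOSED_GA)           # False: different code points
    print(same_kana(PRECOMPOSED_GA, DECOMPOSED_GA))  # True after normalization
    print(len(to_decomposed("ガギグゲゴ")))           # 10: five bases + five marks
```

The two sequences render identically, so text arriving from different sources is usually normalized to one form before it is compared or stored.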

Two punctuation-adjacent characters are essential. U+30FB ・ KATAKANA MIDDLE DOT separates components inside a katakana word and is the standard delimiter between transliterated given and family names of non-Japanese people (e.g. ジョン・スミス for "John Smith"). U+30FC ー KATAKANA-HIRAGANA PROLONGED SOUND MARK, colloquially the chōonpu, lengthens the preceding vowel and is what makes コーヒー (kōhī, "coffee") four morae long rather than the two its consonants suggest. Two archaic letters, ヰ wi and ヱ we, are retained at U+30F0 and U+30F1 as direct counterparts to Hiragana's ゐ and ゑ; they have been obsolete in modern Japanese since 1946 but appear in older texts, surnames, brand names, and some Ainu writing. Half-width versions of the basic katakana set live in the Halfwidth and Fullwidth Forms block at U+FF65–U+FF9F, preserved from the JIS X 0201 single-byte encoding that originally used them on space-constrained terminals and price-display hardware. The dedicated Katakana Phonetic Extensions block at U+31F0–U+31FF adds small kana for representing Ainu and certain dialect sounds (small ク, small シ, small ト, and so on).
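The effect of the middle dot and the prolonged sound mark can be made concrete with a rough sketch that splits a transliterated name and counts morae, the timing units of Japanese. The counting rule here is a simplifying assumption for illustration: every character counts as one mora except the small vowels, the small y-row, and small ヮ, which merge with the preceding kana, and the middle dot, which is not a sound. The function names are hypothetical.

```python
MIDDLE_DOT = "\u30FB"  # ・ KATAKANA MIDDLE DOT

# Small kana that merge with the preceding kana into a single mora.
# The small ッ (sokuon) is deliberately absent: it takes a full mora.
SMALL_MERGING = set("ァィゥェォャュョヮ")


def split_name(name: str) -> list[str]:
    """Split a transliterated name into its components at the middle dot."""
    return name.split(MIDDLE_DOT)


def count_morae(word: str) -> int:
    """Approximate mora count of a katakana word under the rule stated above."""
    return sum(
        1 for ch in word if ch not in SMALL_MERGING and ch != MIDDLE_DOT
    )


if __name__ == "__main__":
    print(split_name("ジョン・スミス"))  # ['ジョン', 'スミス']
    print(count_morae("コーヒー"))       # 4
    print(count_morae("コヒ"))           # 2
```

Each ー contributes a mora of its own, which is why コーヒー counts four where the bare コヒ counts two.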

In modern Japanese text, katakana plays a role roughly analogous to italics in English: it draws attention. Foreign loanwords are written in it almost universally: テレビ terebi ("television"), コンピューター konpyūtā ("computer"), パン pan ("bread," via Portuguese). Onomatopoeia and mimetic words use katakana when they describe sharp, mechanical, or non-human sounds, as in ガタガタ gatagata (rattling) and ピカピカ pikapika (sparkling). Scientific names of plants and animals, technical jargon, slang, transliterated personal names, telegraph messages, and the names of many Japanese companies (ソニー, トヨタ, パナソニック) all use katakana. Software and protocol designers will encounter katakana most often in Shift-JIS-derived legacy data, in IDN domain labels, and in any text dealing with consumer electronics. Half-width katakana, in particular, remains a long-tail compatibility hazard: its code points in the Halfwidth and Fullwidth Forms block are distinct from those in the canonical Katakana block and are related to them only by compatibility decomposition. NFC and NFD leave them untouched, so a half-width string does not compare equal to its full-width spelling unless both sides are first folded with a compatibility normalization such as NFKC.
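The half-width hazard is easy to reproduce with Python's standard `unicodedata` module. The sketch below assumes NFKC is an acceptable folding for the data at hand; note that NFKC also rewrites other compatibility characters (full-width Latin letters, for example), so it is not a katakana-only operation. The helper names are invented for the example.

```python
import unicodedata

# テレビ written with half-width code points: ﾃ ﾚ ﾋ plus the half-width dakuten.
HALFWIDTH_TEREBI = "\uFF83\uFF9A\uFF8B\uFF9E"
FULLWIDTH_TEREBI = "\u30C6\u30EC\u30D3"  # テレビ in the Katakana block


def fold_width(text: str) -> str:
    """Apply NFKC, which maps half-width katakana to the Katakana block."""
    return unicodedata.normalize("NFKC", text)


def width_insensitive_equal(a: str, b: str) -> bool:
    """Compare two strings after NFKC folding (hypothetical helper)."""
    return fold_width(a) == fold_width(b)


if __name__ == "__main__":
    print(HALFWIDTH_TEREBI == FULLWIDTH_TEREBI)                         # False
    print(unicodedata.normalize("NFC", HALFWIDTH_TEREBI) == FULLWIDTH_TEREBI)  # False
    print(width_insensitive_equal(HALFWIDTH_TEREBI, FULLWIDTH_TEREBI))  # True
```

The half-width ﾋ and its separate voicing mark fold into the single letter ビ, so the four-code-point input becomes a three-code-point result.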
