Character inspector
Paste any string and see exactly what codepoints are in it — including the invisible ones.
| # | Glyph | Codepoint | Decimal | Name / Block | Category | UTF-8 bytes |
|---|---|---|---|---|---|---|
The inspector iterates the input string by codepoint using a for…of loop, which correctly handles surrogate pairs (unlike indexing with bracket notation, which gives you 16-bit code units and breaks on supplementary-plane characters like emoji). For each codepoint it shows the glyph itself, the canonical U+XXXX notation, the decimal value, an identifying name, the general category, and the UTF-8 byte length.
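The per-codepoint walk described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the `inspect` function and its field names are hypothetical, and the name and category columns are left out.

```javascript
// Hypothetical helper: build one table row per codepoint.
function inspect(str) {
  const encoder = new TextEncoder();
  const rows = [];
  // for...of walks the string by codepoint, so a surrogate pair arrives as one item.
  for (const glyph of str) {
    const cp = glyph.codePointAt(0);
    rows.push({
      glyph,
      // Canonical U+XXXX notation: uppercase hex, padded to at least four digits.
      codepoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
      decimal: cp,
      utf8Bytes: encoder.encode(glyph).length,
    });
  }
  return rows;
}

// U+1F600 is outside the BMP: one row here, but two code units under bracket indexing.
console.log(inspect("a\u{1F600}"));
console.log("a\u{1F600}".length);
```

Indexing the same string with `str[1]` would return only the high surrogate, which is why the loop form matters for anything outside the BMP.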
The summary above the table compares four counts that often disagree:
- Graphemes: user-perceived characters, counted with Intl.Segmenter with granularity: 'grapheme'. A flag, a family emoji, and a base letter with combining marks each count as one grapheme.
- Codepoints: what [...str].length returns.
- UTF-16 code units: what .length returns; equal to the codepoint count for BMP-only strings, larger when supplementary-plane characters are present.
- UTF-8 bytes: the length of the string once encoded as UTF-8.

Names are resolved from a small built-in lookup of roughly eighty common codepoints. For everything else, the tool falls back to the Unicode block name (e.g. CJK Unified Ideographs, Emoticons, Cyrillic). Blocks partition the codepoint space into named ranges and are useful even without a per-character name. The category column uses a block-based heuristic to suggest a general category: letter, mark, digit, symbol, control, format, or separator. For precise category data you'd consult the Unicode Character Database; this tool deliberately ships a heuristic rather than a 200 KB JSON file.
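The summary counts can be computed in a few lines. The sketch below is illustrative only: `countAll` is a hypothetical name, and it assumes the fourth count is the UTF-8 byte length, matching the last column of the table.

```javascript
// Hypothetical helper returning the four summary counts for a string.
function countAll(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    graphemes: [...segmenter.segment(str)].length,   // user-perceived characters
    codepoints: [...str].length,                     // string iterator yields codepoints
    utf16Units: str.length,                          // .length counts 16-bit code units
    utf8Bytes: new TextEncoder().encode(str).length, // assumed fourth count
  };
}

// Man, ZWJ, woman, ZWJ, girl: the counts disagree on this string.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";
console.log(countAll(family));
// For plain ASCII the counts agree.
console.log(countAll("abc"));
```

Intl.Segmenter needs a reasonably recent runtime; where it is missing, the grapheme count would need a library or would have to be omitted.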
Paste a string and you immediately notice things you couldn't see before. A "smart quote" you copied from a Word document turns out to be U+201D RIGHT DOUBLE QUOTATION MARK, not the ASCII " (U+0022). A string that "should" match in your database turns out to contain a U+200B ZERO WIDTH SPACE somewhere in the middle, courtesy of a copy-paste from a tracking pixel. A name field that "should" sort correctly turns out to contain a U+00A0 NO-BREAK SPACE where the user intended a regular space. A family emoji like 👨‍👩‍👧 turns out to be five codepoints, three emoji joined by two zero-width joiners (U+200D), not a single atomic character.
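A rough sketch of how such characters can be surfaced programmatically, using a small name lookup with a block-name fallback in the spirit of the one described earlier. Everything here is an assumption for illustration: `NAMES` holds only the four codepoints named above, `BLOCKS` is a small subset of the Unicode blocks, and `describe` and `findNonAscii` are hypothetical helpers, not the tool's code.

```javascript
// Illustrative name table; the tool's own lookup is not reproduced here.
const NAMES = new Map([
  [0x00a0, "NO-BREAK SPACE"],
  [0x200b, "ZERO WIDTH SPACE"],
  [0x200d, "ZERO WIDTH JOINER"],
  [0x201d, "RIGHT DOUBLE QUOTATION MARK"],
]);

// Illustrative subset of Unicode blocks as [start, end, name].
const BLOCKS = [
  [0x0080, 0x00ff, "Latin-1 Supplement"],
  [0x0400, 0x04ff, "Cyrillic"],
  [0x2000, 0x206f, "General Punctuation"],
  [0x4e00, 0x9fff, "CJK Unified Ideographs"],
  [0x1f600, 0x1f64f, "Emoticons"],
];

// Prefer a per-character name, then fall back to the enclosing block.
function describe(cp) {
  if (NAMES.has(cp)) return NAMES.get(cp);
  const block = BLOCKS.find(([start, end]) => cp >= start && cp <= end);
  return block ? block[2] : "(unknown block)";
}

// List every non-ASCII codepoint with its position, counted in codepoints.
function findNonAscii(str) {
  const hits = [];
  let index = 0;
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    if (cp > 0x7f) {
      hits.push({
        index,
        codepoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
        label: describe(cp),
      });
    }
    index += 1;
  }
  return hits;
}

console.log(findNonAscii("user\u200Bname"));   // hidden zero-width space
console.log(findNonAscii("Ada\u00A0Lovelace")); // no-break space instead of a space
```

Flagging everything above U+007F is deliberately crude; it is enough to expose the hidden characters in otherwise ASCII data, but it would be noisy on text that is legitimately non-ASCII.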
Paste café spelled in two different ways and you can see the difference instantly:
- Precomposed (NFC): c, a, f, U+00E9 LATIN SMALL LETTER E WITH ACUTE. Four codepoints, five UTF-8 bytes.
- Decomposed (NFD): c, a, f, e, U+0301 COMBINING ACUTE ACCENT. Five codepoints, six UTF-8 bytes.

Both render identically. They will not compare equal as JavaScript strings. They will compare equal after running both through NFC with the normalizer. This is one of the most common variants of the "why doesn't equality work" bug in Unicode-aware systems, and seeing the codepoints side by side makes it obvious.
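The comparison can be reproduced directly in JavaScript. This sketch uses the built-in String.prototype.normalize as a stand-in for the normalizer; the variable names and the `utf8Length` helper are illustrative.

```javascript
const composed = "caf\u00E9";    // é as one precomposed codepoint
const decomposed = "cafe\u0301"; // e followed by COMBINING ACUTE ACCENT

// Hypothetical helper: byte length of a string once encoded as UTF-8.
const utf8Length = (s) => new TextEncoder().encode(s).length;

console.log([...composed].length, utf8Length(composed));     // 4 5
console.log([...decomposed].length, utf8Length(decomposed)); // 5 6

// Raw comparison is by code unit, so the two spellings differ.
console.log(composed === decomposed); // false

// After normalizing both sides to NFC they match.
console.log(composed.normalize("NFC") === decomposed.normalize("NFC")); // true
```

Normalizing on input, before storing or comparing, is usually simpler than normalizing at every comparison site.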