HTML entity encoder
Encode text as named, decimal, and hexadecimal HTML entities — or decode any mix of entities back to plain text.
HTML provides three syntaxes for embedding a character by reference rather than by literal byte:
- `&name;`: drawn from a fixed list defined by the HTML specification (about 2,200 entries). Familiar examples: `&amp;`, `&lt;`, `&nbsp;`, `&copy;`, `&eacute;`.
- `&#NNNN;`: where NNNN is the codepoint in decimal.
- `&#xHHHH;`: where HHHH is the codepoint in hex. The x may be uppercase or lowercase.

The encode mode produces all three forms. Named entities are emitted only for the small set of common characters where a named reference exists in HTML5 (this tool ships with about 30 of the most useful). For any character without a defined name, the named output falls back to the decimal form. The decode mode walks the input and replaces any of the three syntaxes with the corresponding character, using the same lookup table for names plus the browser's own DOM parser for completeness.
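The encode behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the `NAMED` table is a hypothetical seven-entry subset, the `encode` function and its `form` parameter are made-up names, and the choice to pass plain ASCII through unchanged is an assumption the text does not confirm.

```python
# Hypothetical subset of the name table; the tool's real table is not shown.
NAMED = {
    "&": "amp",
    "<": "lt",
    ">": "gt",
    '"': "quot",
    "\u00a0": "nbsp",
    "\u00a9": "copy",
    "\u2014": "mdash",
}


def encode(text: str, form: str = "named") -> str:
    """Encode text using the 'named', 'decimal', or 'hex' reference form.

    Assumption: ASCII characters without an entry in NAMED pass through
    unchanged; everything else becomes a character reference.
    """
    out = []
    for ch in text:  # Python iterates by codepoint, so emoji stay whole
        if ch not in NAMED and ord(ch) <= 0x7F:
            out.append(ch)
        elif form == "named" and ch in NAMED:
            out.append(f"&{NAMED[ch]};")
        elif form == "hex":
            out.append(f"&#x{ord(ch):X};")
        else:
            # Decimal form, also the fallback when no name is known.
            out.append(f"&#{ord(ch)};")
    return "".join(out)


print(encode("\u00a9 caf\u00e9"))          # named, with decimal fallback for é
print(encode("<\u00e9>", form="hex"))
print(encode("\U0001F600", form="decimal"))
```

In the first call `©` has a name in the table and `é` does not, so the output mixes `&copy;` with the decimal fallback `&#233;`.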
In a UTF-8 document with the correct `<meta charset="utf-8">` declaration, you almost never need entities for non-ASCII characters. You can write `café` and `—` directly. The exceptions are the four characters that have syntactic meaning in HTML: `&`, `<`, `>`, and, inside attribute values, `"` or `'` depending on the quoting style. Those must be escaped as `&amp;`, `&lt;`, `&gt;`, `&quot;`, and `&apos;` (HTML5 does define `&apos;`, but it has historically not been supported in older HTML versions; the numeric form `&#39;` is safer if you need to support legacy parsers).
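As a rough illustration of escaping only the syntactically significant characters, Python's standard `html.escape` does exactly this and leaves non-ASCII text alone. The wrapper names `escape_text` and `escape_attr` are hypothetical; note that `html.escape` emits the numeric form `&#x27;` for the apostrophe rather than a named reference.

```python
import html


def escape_text(text: str) -> str:
    """Escape &, <, and > for use in element content."""
    return html.escape(text, quote=False)


def escape_attr(value: str) -> str:
    """Escape &, <, >, and both quote characters for use in an attribute value."""
    return html.escape(value, quote=True)


print(escape_text("caf\u00e9 & <b>"))        # é is left as a literal character
print(escape_attr('say "hi" & it\'s'))
```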
Entities are also useful when generating HTML programmatically with a tool that can't easily emit arbitrary bytes — for example, when concatenating strings into a template that goes through an ASCII-only transport. And they show up frequently in XML and in email (HTML email tooling is notoriously lossy with non-ASCII).
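For the ASCII-only transport case, one possible approach in Python is the built-in `xmlcharrefreplace` error handler, which replaces every character the target encoding cannot represent with a decimal numeric reference. The function name `to_ascii_html` is hypothetical, and this is a sketch of the idea rather than a description of how this tool works.

```python
import html


def to_ascii_html(text: str) -> str:
    """Return an ASCII-only string that renders as `text` in HTML."""
    # Escape the syntactic characters first. Doing it afterwards would
    # turn the '&' of each numeric reference into '&amp;'.
    escaped = html.escape(text, quote=True)
    # Non-ASCII codepoints become decimal references such as &#233;.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_ascii_html("caf\u00e9 \U0001F600 <b>"))
```

The result contains only bytes below 0x80, so it survives a transport that would otherwise mangle UTF-8.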
| Name | Glyph | Codepoint | Note |
|---|---|---|---|
| `&amp;` | & | U+0026 | Ampersand; must be escaped in HTML |
| `&lt;` | < | U+003C | Less-than; must be escaped |
| `&gt;` | > | U+003E | Greater-than |
| `&quot;` | " | U+0022 | Double quote |
| `&apos;` | ' | U+0027 | Apostrophe (HTML5+) |
| `&nbsp;` | (invisible) | U+00A0 | Non-breaking space |
| `&copy;` | © | U+00A9 | Copyright sign |
| `&reg;` | ® | U+00AE | Registered sign |
| `&trade;` | ™ | U+2122 | Trade mark sign |
| `&hellip;` | … | U+2026 | Horizontal ellipsis |
| `&mdash;` | — | U+2014 | Em dash |
| `&ndash;` | – | U+2013 | En dash |
| `&lsquo;` | ‘ | U+2018 | Left single quote |
| `&rsquo;` | ’ | U+2019 | Right single quote / apostrophe |
| `&ldquo;` | “ | U+201C | Left double quote |
| `&rdquo;` | ” | U+201D | Right double quote |
| `&euro;` | € | U+20AC | Euro sign |
| `&pound;` | £ | U+00A3 | Pound sign |
| `&yen;` | ¥ | U+00A5 | Yen sign |
| `&deg;` | ° | U+00B0 | Degree sign |
| `&plusmn;` | ± | U+00B1 | Plus-minus sign |
| `&times;` | × | U+00D7 | Multiplication sign |
| `&divide;` | ÷ | U+00F7 | Division sign |
| `&sect;` | § | U+00A7 | Section sign |
| `&para;` | ¶ | U+00B6 | Pilcrow / paragraph sign |
| `&dagger;` | † | U+2020 | Dagger |
| `&Dagger;` | ‡ | U+2021 | Double dagger |
| `&bull;` | • | U+2022 | Bullet |
| `&prime;` | ′ | U+2032 | Prime (feet, minutes) |
| `&Prime;` | ″ | U+2033 | Double prime (inches, seconds) |
A numeric reference can use any codepoint in the Unicode range, including supplementary-plane characters: `&#128512;` and `&#x1F600;` both produce 😀. There is no named entity for emoji. Named entities are case-sensitive: `&Dagger;` (‡) and `&dagger;` (†) are different characters. The trailing semicolon is technically optional for a handful of legacy names in HTML5 but strongly recommended, and XML parsers require it.
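These decoding rules can be checked with Python's standard `html.unescape`, which follows the HTML5 rules for character references. It stands in here for the tool's browser-based decoder, which the text does not show; the `samples` list is only illustrative.

```python
import html

samples = [
    "&#128512;",   # decimal reference to U+1F600
    "&#x1F600;",   # hexadecimal reference to the same codepoint
    "&#X1f600;",   # the x and the hex digits may be in either case
    "&dagger;",    # U+2020
    "&Dagger;",    # U+2021: names are case-sensitive
    "&copy 2024",  # legacy name accepted without the trailing semicolon
]

decoded = {source: html.unescape(source) for source in samples}

for source, result in decoded.items():
    print(f"{source!r} -> {result!r} (first codepoint U+{ord(result[0]):04X})")
```

The three numeric spellings decode to the same emoji, while the two dagger names decode to different characters.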