
HTML entity encoder

Encode text as named, decimal, and hexadecimal HTML entities — or decode any mix of entities back to plain text.

How it works

HTML provides three syntaxes for embedding a character by reference rather than as a literal character:

Named character references — &name; — drawn from a fixed list defined by the HTML specification (about 2,200 entries). Familiar examples: &amp;, &lt;, &nbsp;, &copy;, &eacute;.
Decimal numeric character references — &#NNNN; — where NNNN is the codepoint in decimal.
Hexadecimal numeric character references — &#xHHHH; — where HHHH is the codepoint in hex. The x may be uppercase or lowercase.
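As a rough illustration of the three syntaxes, the Python sketch below builds each form for a single character. The helper name `reference_forms` is hypothetical. The named form relies on the standard library's `html.entities.codepoint2name` table, which covers only the HTML 4 names rather than the full HTML5 list.

```python
from html.entities import codepoint2name


def reference_forms(ch: str) -> dict:
    """Return the named, decimal, and hexadecimal references for one character."""
    cp = ord(ch)
    name = codepoint2name.get(cp)  # e.g. 169 -> "copy"; None if HTML 4 had no name
    return {
        "named": f"&{name};" if name else None,
        "decimal": f"&#{cp};",
        "hex": f"&#x{cp:X};",
    }


print(reference_forms("©"))
# {'named': '&copy;', 'decimal': '&#169;', 'hex': '&#xA9;'}
```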

The encode mode produces all three forms. Named entities are emitted only for the small set of common characters where a named reference exists in HTML5 (this tool ships with about 30 of the most useful). For any character without a defined name, the named output falls back to the decimal form. The decode mode walks the input and replaces any of the three syntaxes with the corresponding character, using the same lookup table for names plus the browser's own DOM parser for completeness.
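The encode-with-fallback and decode behaviour described above might look roughly like this in Python. This is a sketch, not the tool's actual source: `NAMED` is a hypothetical five-entry stand-in for the tool's table of about 30 names, `html.unescape` from the standard library stands in for the browser's DOM parser, and leaving plain ASCII untouched is an assumption.

```python
import html

# Hypothetical stand-in for the tool's lookup table (the real one has about 30 entries).
NAMED = {"&": "amp", "<": "lt", ">": "gt", "©": "copy", "€": "euro"}


def encode_named(text: str) -> str:
    """Use a named reference where the table has one; fall back to decimal otherwise."""
    out = []
    for ch in text:
        if ch in NAMED:
            out.append(f"&{NAMED[ch]};")
        elif ord(ch) > 127:
            out.append(f"&#{ord(ch)};")  # no name in the table: decimal fallback
        else:
            out.append(ch)  # assumption: plain ASCII passes through unchanged
    return "".join(out)


def decode(text: str) -> str:
    """Replace named, decimal, and hexadecimal references with their characters."""
    return html.unescape(text)


encoded = encode_named("© 2024 €5 naïve <b>")
print(encoded)          # &copy; 2024 &euro;5 na&#239;ve &lt;b&gt;
print(decode(encoded))  # © 2024 €5 naïve <b>
```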

When do you need entities?

In a UTF-8 document with the correct <meta charset="utf-8"> declaration, you almost never need entities for non-ASCII characters. You can write café and — directly. The exceptions are the four characters that have syntactic meaning in HTML: &, <, >, and, inside attribute values, " or ' depending on the quoting style. Those must be escaped as &amp;, &lt;, &gt;, &quot;, and &apos; (HTML5 does define &apos;, but HTML 4 did not, so the numeric form &#39; is safer if you need to support legacy parsers).
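For this minimal escaping, Python's standard `html.escape` is one ready-made option. It emits the numeric reference `&#x27;` for the apostrophe rather than a named one. The `attr` wrapper below is a hypothetical helper added for illustration.

```python
import html


def attr(name: str, value: str) -> str:
    """Render one double-quoted attribute with its value escaped."""
    return f'{name}="{html.escape(value, quote=True)}"'


# Text content: only &, <, and > need escaping.
print(html.escape("1 < 2 && 3 > 2", quote=False))  # 1 &lt; 2 &amp;&amp; 3 &gt; 2

# Attribute value: both quote characters are escaped as well.
print(attr("title", 'Say "hi" & \'bye\''))
# title="Say &quot;hi&quot; &amp; &#x27;bye&#x27;"
```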

Entities are also useful when generating HTML programmatically with a tool that can't easily emit arbitrary bytes — for example, when concatenating strings into a template that goes through an ASCII-only transport. And they show up frequently in XML and in email (HTML email tooling is notoriously lossy with non-ASCII).
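One way to prepare text for an ASCII-only transport, sketched in Python: escape the markup characters first, then let the codec's `xmlcharrefreplace` error handler turn every remaining non-ASCII character into a decimal reference. The function name `to_ascii_html` is hypothetical.

```python
import html


def to_ascii_html(text: str) -> str:
    """Escape markup characters, then turn each non-ASCII character into a decimal reference."""
    escaped = html.escape(text, quote=False)
    return escaped.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_ascii_html("café & crème: 5 €"))  # caf&#233; &amp; cr&#232;me: 5 &#8364;
```

The order matters: escaping after the replacement would turn the & of each new reference into &amp;.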

Common named entities

Name	Glyph	Codepoint	Note
&amp;	&	U+0026	Ampersand — must be escaped in HTML
&lt;	<	U+003C	Less-than — must be escaped
&gt;	>	U+003E	Greater-than
&quot;	"	U+0022	Double quote
&apos;	'	U+0027	Apostrophe (HTML5+)
&nbsp;	 	U+00A0	Non-breaking space
&copy;	©	U+00A9	Copyright sign
&reg;	®	U+00AE	Registered sign
&trade;	™	U+2122	Trade mark sign
&hellip;	…	U+2026	Horizontal ellipsis
&mdash;	—	U+2014	Em dash
&ndash;	–	U+2013	En dash
&lsquo;	‘	U+2018	Left single quote
&rsquo;	’	U+2019	Right single quote / apostrophe
&ldquo;	“	U+201C	Left double quote
&rdquo;	”	U+201D	Right double quote
&euro;	€	U+20AC	Euro sign
&pound;	£	U+00A3	Pound sign
&yen;	¥	U+00A5	Yen sign
&deg;	°	U+00B0	Degree sign
&plusmn;	±	U+00B1	Plus-minus sign
&times;	×	U+00D7	Multiplication sign
&divide;	÷	U+00F7	Division sign
&sect;	§	U+00A7	Section sign
&para;	¶	U+00B6	Pilcrow / paragraph sign
&dagger;	†	U+2020	Dagger
&Dagger;	‡	U+2021	Double dagger
&bull;	•	U+2022	Bullet
&prime;	′	U+2032	Prime (feet, minutes)
&Prime;	″	U+2033	Double prime (inches, seconds)
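Rows like these can be cross-checked against Python's `html.entities.html5` mapping, which holds the HTML5 named references. A small sketch follows. The helper `table_row` and the sample names are chosen here for illustration only.

```python
from html.entities import html5


def table_row(name: str) -> tuple:
    """Return (reference, glyph, codepoint label) for an HTML5 entity name such as 'copy'."""
    glyph = html5[name + ";"]  # keys in html5 include the trailing semicolon
    codepoints = " ".join(f"U+{ord(c):04X}" for c in glyph)
    return (f"&{name};", glyph, codepoints)


for name in ("amp", "copy", "euro", "dagger", "Dagger", "Prime"):
    print("\t".join(table_row(name)))
```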

Edge cases

A numeric reference can name any Unicode character, including supplementary-plane characters: &#128512; and &#x1F600; both produce 😀. There is no named entity for emoji. Named entities are case-sensitive: &Dagger; and &dagger; are different characters. The trailing semicolon is technically optional for a handful of legacy names in HTML5 but strongly recommended; XML parsers require it.
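These edge cases can be checked with Python's `html.unescape`, which follows the HTML5 decoding rules. The inputs below are illustrative and are not taken from the tool itself.

```python
import html

# Decimal and hexadecimal references both reach supplementary-plane characters.
print(html.unescape("&#128512;") == html.unescape("&#x1F600;") == "\U0001F600")  # True

# Names are case-sensitive: these decode to two different characters.
print(html.unescape("&dagger;"), html.unescape("&Dagger;"))  # † ‡

# A legacy name such as "copy" decodes without its semicolon ...
print(html.unescape("&copy 2024"))  # © 2024

# ... but a non-legacy name such as "hellip" is left untouched.
print(html.unescape("&hellip"))  # &hellip
```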

Related

HTML entities and escapes — the long-form guide
Codepoint converter — entity output for one character
Character inspector — see what's actually in a string
URL encoder — the URL analogue
— Em Dash
€ Euro Sign
General Punctuation block
