If you’re parsing text or processing old non-English websites, you’ve probably encountered strings like ä. These strings are symptoms of character encoding errors: the text was encoded with one character encoding and decoded with another. This article lists common symptoms and the characters they most likely stood for. Use it as a reference to quickly identify the encoding issue you’re facing, and press CTRL+F to search for the symptom you’re seeing.
The selection of issues is highly subjective. Send me an e-mail if you think something is missing.
Symptom | Original | Original encoding | Wrongly decoded as |
---|---|---|---|
Ä | Ä | UTF-8 | ISO-8859-1 |
Ö | Ö | UTF-8 | ISO-8859-1 |
Ãœ | Ü | UTF-8 | ISO-8859-1 |
É | É | UTF-8 | ISO-8859-1 |
À | À | UTF-8 | ISO-8859-1 |
È | È | UTF-8 | ISO-8859-1 |
Ã™ | Ù | UTF-8 | ISO-8859-1 |
Ç | Ç | UTF-8 | ISO-8859-1 |
 |  | UTF-8 | ISO-8859-1 |
Ê | Ê | UTF-8 | ISO-8859-1 |
ÃŽ | Î | UTF-8 | ISO-8859-1 |
Ô | Ô | UTF-8 | ISO-8859-1 |
Û | Û | UTF-8 | ISO-8859-1 |
Ë | Ë | UTF-8 | ISO-8859-1 |
à | Ï | UTF-8 | ISO-8859-1 |
à | Á | UTF-8 | ISO-8859-1 |
à | Í | UTF-8 | ISO-8859-1 |
Ñ | Ñ | UTF-8 | ISO-8859-1 |
Ó | Ó | UTF-8 | ISO-8859-1 |
Ú | Ú | UTF-8 | ISO-8859-1 |
ä | ä | UTF-8 | ISO-8859-1 |
ö | ö | UTF-8 | ISO-8859-1 |
ü | ü | UTF-8 | ISO-8859-1 |
ß | ß | UTF-8 | ISO-8859-1 |
é | é | UTF-8 | ISO-8859-1 |
à | à | UTF-8 | ISO-8859-1 |
è | è | UTF-8 | ISO-8859-1 |
ù | ù | UTF-8 | ISO-8859-1 |
ç | ç | UTF-8 | ISO-8859-1 |
â | â | UTF-8 | ISO-8859-1 |
ê | ê | UTF-8 | ISO-8859-1 |
î | î | UTF-8 | ISO-8859-1 |
ô | ô | UTF-8 | ISO-8859-1 |
û | û | UTF-8 | ISO-8859-1 |
ë | ë | UTF-8 | ISO-8859-1 |
ï | ï | UTF-8 | ISO-8859-1 |
á | á | UTF-8 | ISO-8859-1 |
à | í | UTF-8 | ISO-8859-1 |
ñ | ñ | UTF-8 | ISO-8859-1 |
ó | ó | UTF-8 | ISO-8859-1 |
ú | ú | UTF-8 | ISO-8859-1 |
¡ | ¡ | UTF-8 | ISO-8859-1 |
¿ | ¿ | UTF-8 | ISO-8859-1 |
’ | ’ | UTF-8 | ISO-8859-1 |
– | – | UTF-8 | ISO-8859-1 |
— | — | UTF-8 | ISO-8859-1 |
ÿþÄ | Ä | UTF-16 | ISO-8859-1 |
ÿþÖ | Ö | UTF-16 | ISO-8859-1 |
ÿþÜ | Ü | UTF-16 | ISO-8859-1 |
ÿþÉ | É | UTF-16 | ISO-8859-1 |
ÿþÀ | À | UTF-16 | ISO-8859-1 |
ÿþÈ | È | UTF-16 | ISO-8859-1 |
ÿþÙ | Ù | UTF-16 | ISO-8859-1 |
ÿþÇ | Ç | UTF-16 | ISO-8859-1 |
ÿþÂ | Â | UTF-16 | ISO-8859-1 |
ÿþÊ | Ê | UTF-16 | ISO-8859-1 |
ÿþÎ | Î | UTF-16 | ISO-8859-1 |
ÿþÔ | Ô | UTF-16 | ISO-8859-1 |
ÿþÛ | Û | UTF-16 | ISO-8859-1 |
ÿþË | Ë | UTF-16 | ISO-8859-1 |
ÿþÏ | Ï | UTF-16 | ISO-8859-1 |
ÿþÁ | Á | UTF-16 | ISO-8859-1 |
ÿþÍ | Í | UTF-16 | ISO-8859-1 |
ÿþÑ | Ñ | UTF-16 | ISO-8859-1 |
ÿþÓ | Ó | UTF-16 | ISO-8859-1 |
ÿþÚ | Ú | UTF-16 | ISO-8859-1 |
ÿþä | ä | UTF-16 | ISO-8859-1 |
ÿþö | ö | UTF-16 | ISO-8859-1 |
ÿþü | ü | UTF-16 | ISO-8859-1 |
ÿþß | ß | UTF-16 | ISO-8859-1 |
ÿþé | é | UTF-16 | ISO-8859-1 |
ÿþà | à | UTF-16 | ISO-8859-1 |
ÿþè | è | UTF-16 | ISO-8859-1 |
ÿþù | ù | UTF-16 | ISO-8859-1 |
ÿþç | ç | UTF-16 | ISO-8859-1 |
ÿþâ | â | UTF-16 | ISO-8859-1 |
ÿþê | ê | UTF-16 | ISO-8859-1 |
ÿþî | î | UTF-16 | ISO-8859-1 |
ÿþô | ô | UTF-16 | ISO-8859-1 |
ÿþû | û | UTF-16 | ISO-8859-1 |
ÿþë | ë | UTF-16 | ISO-8859-1 |
ÿþï | ï | UTF-16 | ISO-8859-1 |
ÿþá | á | UTF-16 | ISO-8859-1 |
ÿþí | í | UTF-16 | ISO-8859-1 |
ÿþñ | ñ | UTF-16 | ISO-8859-1 |
ÿþó | ó | UTF-16 | ISO-8859-1 |
ÿþú | ú | UTF-16 | ISO-8859-1 |
ÿþ¡ | ¡ | UTF-16 | ISO-8859-1 |
ÿþ¿ | ¿ | UTF-16 | ISO-8859-1 |
ÿþ | ’ | UTF-16 | ISO-8859-1 |
ÿþ | – | UTF-16 | ISO-8859-1 |
ÿþ | — | UTF-16 | ISO-8859-1 |
蓃 | Ä | UTF-8 | UTF-16 |
雃 | Ö | UTF-8 | UTF-16 |
鳃 | Ü | UTF-8 | UTF-16 |
觃 | É | UTF-8 | UTF-16 |
胃 | À | UTF-8 | UTF-16 |
裃 | È | UTF-8 | UTF-16 |
駃 | Ù | UTF-8 | UTF-16 |
蟃 | Ç | UTF-8 | UTF-16 |
苃 | Â | UTF-8 | UTF-16 |
諃 | Ê | UTF-8 | UTF-16 |
軃 | Î | UTF-8 | UTF-16 |
铃 | Ô | UTF-8 | UTF-16 |
鯃 | Û | UTF-8 | UTF-16 |
诃 | Ë | UTF-8 | UTF-16 |
迃 | Ï | UTF-8 | UTF-16 |
臃 | Á | UTF-8 | UTF-16 |
跃 | Í | UTF-8 | UTF-16 |
釃 | Ñ | UTF-8 | UTF-16 |
鏃 | Ó | UTF-8 | UTF-16 |
髃 | Ú | UTF-8 | UTF-16 |
꓃ | ä | UTF-8 | UTF-16 |
뛃 | ö | UTF-8 | UTF-16 |
볃 | ü | UTF-8 | UTF-16 |
鿃 | ß | UTF-8 | UTF-16 |
꧃ | é | UTF-8 | UTF-16 |
ꃃ | à | UTF-8 | UTF-16 |
ꣃ | è | UTF-8 | UTF-16 |
맃 | ù | UTF-8 | UTF-16 |
ꟃ | ç | UTF-8 | UTF-16 |
ꋃ | â | UTF-8 | UTF-16 |
(U+AAC3, unassigned) | ê | UTF-8 | UTF-16 |
껃 | î | UTF-8 | UTF-16 |
듃 | ô | UTF-8 | UTF-16 |
믃 | û | UTF-8 | UTF-16 |
ꯃ | ë | UTF-8 | UTF-16 |
꿃 | ï | UTF-8 | UTF-16 |
ꇃ | á | UTF-8 | UTF-16 |
귃 | í | UTF-8 | UTF-16 |
뇃 | ñ | UTF-8 | UTF-16 |
돃 | ó | UTF-8 | UTF-16 |
뫃 | ú | UTF-8 | UTF-16 |
ꇂ | ¡ | UTF-8 | UTF-16 |
뿂 | ¿ | UTF-8 | UTF-16 |
Ă„ | Ä | UTF-8 | Windows 1250 |
Ă– | Ö | UTF-8 | Windows 1250 |
Ăś | Ü | UTF-8 | Windows 1250 |
Ă‰ | É | UTF-8 | Windows 1250 |
Ă€ | À | UTF-8 | Windows 1250 |
Ă™ | Ù | UTF-8 | Windows 1250 |
Ă‡ | Ç | UTF-8 | Windows 1250 |
Ă‚ | Â | UTF-8 | Windows 1250 |
ĂŠ | Ê | UTF-8 | Windows 1250 |
ĂŽ | Î | UTF-8 | Windows 1250 |
Ă” | Ô | UTF-8 | Windows 1250 |
Ă› | Û | UTF-8 | Windows 1250 |
Ă‹ | Ë | UTF-8 | Windows 1250 |
ĂŹ | Ï | UTF-8 | Windows 1250 |
ĂŤ | Í | UTF-8 | Windows 1250 |
Ă‘ | Ñ | UTF-8 | Windows 1250 |
Ă“ | Ó | UTF-8 | Windows 1250 |
Ăš | Ú | UTF-8 | Windows 1250 |
Ă¤ | ä | UTF-8 | Windows 1250 |
Ă¶ | ö | UTF-8 | Windows 1250 |
ĂĽ | ü | UTF-8 | Windows 1250 |
Ăź | ß | UTF-8 | Windows 1250 |
Ă© | é | UTF-8 | Windows 1250 |
Ă | à | UTF-8 | Windows 1250 |
Ă¨ | è | UTF-8 | Windows 1250 |
Ăą | ù | UTF-8 | Windows 1250 |
Ă§ | ç | UTF-8 | Windows 1250 |
Ă˘ | â | UTF-8 | Windows 1250 |
ĂŞ | ê | UTF-8 | Windows 1250 |
Ă® | î | UTF-8 | Windows 1250 |
Ă´ | ô | UTF-8 | Windows 1250 |
Ă» | û | UTF-8 | Windows 1250 |
Ă« | ë | UTF-8 | Windows 1250 |
ĂŻ | ï | UTF-8 | Windows 1250 |
Ăˇ | á | UTF-8 | Windows 1250 |
Ă | í | UTF-8 | Windows 1250 |
Ă± | ñ | UTF-8 | Windows 1250 |
Ăł | ó | UTF-8 | Windows 1250 |
Ăş | ú | UTF-8 | Windows 1250 |
Âˇ | ¡ | UTF-8 | Windows 1250 |
Âż | ¿ | UTF-8 | Windows 1250 |
’ | ’ | UTF-8 | Windows 1250 |
– | – | UTF-8 | Windows 1250 |
— | — | UTF-8 | Windows 1250 |
√Ñ | Ä | UTF-8 | Mac Roman |
√ñ | Ö | UTF-8 | Mac Roman |
√ú | Ü | UTF-8 | Mac Roman |
√â | É | UTF-8 | Mac Roman |
√Ä | À | UTF-8 | Mac Roman |
√à | È | UTF-8 | Mac Roman |
√ô | Ù | UTF-8 | Mac Roman |
√á | Ç | UTF-8 | Mac Roman |
√Ç |  | UTF-8 | Mac Roman |
√ä | Ê | UTF-8 | Mac Roman |
√é | Î | UTF-8 | Mac Roman |
√î | Ô | UTF-8 | Mac Roman |
√õ | Û | UTF-8 | Mac Roman |
√ã | Ë | UTF-8 | Mac Roman |
√è | Ï | UTF-8 | Mac Roman |
√Å | Á | UTF-8 | Mac Roman |
√ç | Í | UTF-8 | Mac Roman |
√ë | Ñ | UTF-8 | Mac Roman |
√ì | Ó | UTF-8 | Mac Roman |
√ö | Ú | UTF-8 | Mac Roman |
√§ | ä | UTF-8 | Mac Roman |
√∂ | ö | UTF-8 | Mac Roman |
√º | ü | UTF-8 | Mac Roman |
√ü | ß | UTF-8 | Mac Roman |
√© | é | UTF-8 | Mac Roman |
√† | à | UTF-8 | Mac Roman |
√® | è | UTF-8 | Mac Roman |
√π | ù | UTF-8 | Mac Roman |
√ß | ç | UTF-8 | Mac Roman |
√¢ | â | UTF-8 | Mac Roman |
√™ | ê | UTF-8 | Mac Roman |
√Æ | î | UTF-8 | Mac Roman |
√¥ | ô | UTF-8 | Mac Roman |
√ª | û | UTF-8 | Mac Roman |
√´ | ë | UTF-8 | Mac Roman |
√Ø | ï | UTF-8 | Mac Roman |
√° | á | UTF-8 | Mac Roman |
√≠ | í | UTF-8 | Mac Roman |
√± | ñ | UTF-8 | Mac Roman |
√≥ | ó | UTF-8 | Mac Roman |
√∫ | ú | UTF-8 | Mac Roman |
¬° | ¡ | UTF-8 | Mac Roman |
¬ø | ¿ | UTF-8 | Mac Roman |
‚Äô | ’ | UTF-8 | Mac Roman |
‚Äì | – | UTF-8 | Mac Roman |
‚Äî | — | UTF-8 | Mac Roman |
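Once a row in the table matches, the damage can often be undone by reversing the two steps: encode the garbled text with the wrongly assumed encoding, then decode the resulting bytes with the original one. The following is only a minimal sketch. The names `PAIRS`, `repair` and `candidates` are hypothetical helpers, not part of the script that generated the table. It also tries Python's `cp1252` codec alongside `iso-8859-1`, on the assumption that symptoms containing characters such as „ or ™ were rendered as Windows-1252.

```python
# (original encoding, wrongly assumed encoding), using Python codec names.
PAIRS = [
    ("utf-8", "cp1252"),
    ("utf-8", "iso-8859-1"),
    ("utf-8", "cp1250"),
    ("utf-8", "mac_roman"),
]


def repair(symptom: str, original: str, wrong: str) -> str:
    """Undo a wrong decode: recover the bytes, then decode them correctly."""
    return symptom.encode(wrong).decode(original)


def candidates(symptom: str) -> list[tuple[str, str, str]]:
    """Return (repaired text, original, wrong) for every pair that round-trips."""
    found = []
    for original, wrong in PAIRS:
        try:
            fixed = repair(symptom, original, wrong)
        except (UnicodeEncodeError, UnicodeDecodeError):
            # The symptom cannot have been produced by this pair.
            continue
        found.append((fixed, original, wrong))
    return found


print(repair("Ã¤", "utf-8", "cp1252"))   # ä
print(candidates("√§"))                  # [('ä', 'utf-8', 'mac_roman')]
```

A pair that round-trips is only a candidate. Several pairs can succeed for the same symptom, so the repaired text still needs a plausibility check by a human or a dictionary.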
I’ve used the following Python script to create the table.
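# Each pair is (original encoding, encoding the bytes were wrongly decoded as).
pairs = [
    ("UTF-8", "ISO-8859-1"),
    ("UTF-16", "ISO-8859-1"),
    ("UTF-8", "UTF-16"),
    ("UTF-8", "Windows 1250"),
    ("UTF-8", "Mac Roman"),
]
chars = "ÄÖÜÉÀÈÙÇÂÊÎÔÛËÏÁÍÑÓÚäöüßéàèùçâêîôûëïáíñóú¡¿’–—"
for orig, wrong in pairs:
    for c in chars:
        try:
            enc = c.encode(orig).decode(wrong)
        except UnicodeDecodeError:
            # The bytes are not valid in the wrong encoding, so there is no row.
            continue
        # Print one table row, with the symptom as numeric character references.
        print("| " + "".join(f"&#{ord(d):d};" for d in enc) + f" | {c} | {orig} | {wrong} |")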
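To see what `c.encode(orig).decode(wrong)` does for a single character, it helps to look at the bytes. The sketch below is an illustration only, assuming Python 3. The comparison with `cp1252` is an addition and not part of the script above. It suggests why the ISO-8859-1 rows show visible characters such as „: ISO-8859-1 maps byte 0x84 to an invisible control character, while browsers typically display the reference `&#132;` as its Windows-1252 counterpart.

```python
c = "Ä"

# UTF-8 stores Ä as two bytes.
raw = c.encode("utf-8")
print(" ".join(f"{b:02x}" for b in raw))    # c3 84

# ISO-8859-1 maps every byte to the code point with the same number,
# so the second character is the control character U+0084.
as_latin1 = raw.decode("iso-8859-1")
print([f"U+{ord(d):04X}" for d in as_latin1])   # ['U+00C3', 'U+0084']

# Windows-1252 assigns a printable character to byte 0x84 instead.
as_cp1252 = raw.decode("cp1252")
print(as_cp1252)                            # Ä
```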