Encodes and decodes domain names to and from ASCII-compatible encoding (Punycode).
Punycode is a special encoding system used to represent Unicode characters in ASCII format. It is primarily used to encode domain names containing non-ASCII characters, such as those in different languages or with special characters, into a format compatible with the Domain Name System (DNS). Punycode allows internationalized domain names (IDNs) to be stored, transmitted, and interpreted correctly by systems that only support ASCII characters.
Punycode converts a string of Unicode characters into a string of ASCII characters that can be used in domain names. In a domain name, each label (the part between two dots) that contains non-ASCII characters is encoded separately, and the result is prefixed with xn-- so that the name stays valid and interoperable with the DNS.
Punycode works by keeping the ASCII characters of a label unchanged and in order, followed by a hyphen if there are any, and then appending a compact encoding of the non-ASCII characters written with the letters a-z and the digits 0-9. These base-36 digits record which character to insert and at which position. This allows domain names with characters like accented letters or non-Latin scripts (e.g., Chinese, Arabic) to be used in web addresses.
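The steps described above can be sketched in Python. This is a minimal, from-scratch sketch of the Punycode encoder for a single label, following the algorithm and parameters in RFC 3492; the function names punycode_encode, adapt, and encode_digit are illustrative, not part of any library. It omits the overflow checks the RFC requires for fixed-width integers, since Python integers do not overflow.

```python
# Minimal sketch of the RFC 3492 Punycode encoder for a single domain label.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def adapt(delta: int, num_points: int, first_time: bool) -> int:
    """Bias adaptation function from RFC 3492, section 6.1."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def encode_digit(d: int) -> str:
    """Map 0-25 to 'a'-'z' and 26-35 to '0'-'9'."""
    return chr(ord("a") + d) if d < 26 else chr(ord("0") + d - 26)


def punycode_encode(label: str) -> str:
    code_points = [ord(c) for c in label]
    # 1. Copy the basic (ASCII) code points unchanged, in order.
    output = [c for c in label if ord(c) < 128]
    b = h = len(output)  # h counts the code points handled so far
    if b > 0:
        output.append("-")  # delimiter between basic and encoded parts
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    # 2. Handle the remaining code points in ascending numeric order.
    while h < len(code_points):
        m = min(cp for cp in code_points if cp >= n)
        delta += (m - n) * (h + 1)  # skip the states for code points below m
        n = m
        for cp in code_points:
            if cp < n:
                delta += 1
            elif cp == n:
                # 3. Emit delta as a generalized variable-length integer.
                q, k = delta, BASE
                while True:
                    if k <= bias:
                        t = TMIN
                    elif k >= bias + TMAX:
                        t = TMAX
                    else:
                        t = k - bias
                    if q < t:
                        break
                    output.append(encode_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(encode_digit(q))
                bias = adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(output)


print(punycode_encode("münich"))  # mnich-kva
print(punycode_encode("例子"))     # fsqu00a
```

Note that the sketch produces only the encoded label; adding the xn-- prefix is a separate step done when the label is placed in a domain name.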
Consider the domain name 例子.测试, which contains Chinese characters. To encode it, each label is converted to Punycode separately: 例子 becomes fsqu00a and 测试 becomes 0zwm56d, so the full domain is written as xn--fsqu00a.xn--0zwm56d.
xn--: To signify that a label is encoded in Punycode, the encoded string is prefixed with xn--. Labels that are already pure ASCII, such as com, are left unchanged.
The domain name münich.com (which includes the German umlaut character ü) would be encoded in Punycode as xn--mnich-kva.com. The ASCII letters mnich are kept as they are, and kva after the hyphen encodes the ü together with its position in the label.
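To turn a whole domain name into its ASCII form, each label is encoded on its own and given the xn-- prefix. The sketch below relies on Python's built-in punycode codec; domain_to_ascii is a hypothetical helper, and it deliberately skips the normalization, case folding, and validity checks that a full IDNA implementation performs.

```python
ACE_PREFIX = "xn--"  # marks a label as Punycode-encoded


def domain_to_ascii(domain: str) -> str:
    """Simplified ToASCII: Punycode-encode each non-ASCII label and add xn--.

    Assumption: no normalization, case folding, or length/validity checks
    are applied, unlike a real IDNA implementation.
    """
    labels = []
    for label in domain.split("."):
        if label.isascii():
            labels.append(label)  # pure-ASCII labels pass through unchanged
        else:
            encoded = label.encode("punycode").decode("ascii")
            labels.append(ACE_PREFIX + encoded)
    return ".".join(labels)


print(domain_to_ascii("例子.测试"))   # xn--fsqu00a.xn--0zwm56d
print(domain_to_ascii("münich.com"))  # xn--mnich-kva.com
```

For production use, Python's built-in idna codec (for example "münich.com".encode("idna")) applies the older IDNA 2003 rules, and the third-party idna package implements IDNA 2008.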
Punycode encoding is most commonly used wherever an internationalized domain name has to pass through ASCII-only systems, such as DNS lookups and URLs.
Punycode decoding reverses the encoding process, converting the ASCII-compatible string back into its original Unicode form. This allows a Punycode-encoded domain name to be displayed correctly with its original non-ASCII characters.
For example, xn--fsqu00a.xn--0zwm56d is decoded back to the Unicode domain 例子.测试.
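Decoding runs the RFC 3492 algorithm in reverse: the characters before the last hyphen are copied as they are, and each remaining group of base-36 digits yields a code point together with the position at which to insert it. The following is a minimal sketch under that assumption; punycode_decode and domain_to_unicode are hypothetical names, only labels carrying the xn-- prefix are decoded, and most of the RFC's validity checks are omitted.

```python
# Minimal sketch of RFC 3492 Punycode decoding plus xn-- prefix handling.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128
ACE_PREFIX = "xn--"


def adapt(delta: int, num_points: int, first_time: bool) -> int:
    """Bias adaptation function from RFC 3492, section 6.1."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def decode_digit(ch: str) -> int:
    """Map 'a'-'z' (or 'A'-'Z') to 0-25 and '0'-'9' to 26-35."""
    if "a" <= ch <= "z":
        return ord(ch) - ord("a")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A")
    if "0" <= ch <= "9":
        return ord(ch) - ord("0") + 26
    raise ValueError(f"invalid Punycode digit: {ch!r}")


def punycode_decode(encoded: str) -> str:
    # Everything before the last hyphen is basic code points, copied as-is.
    pos = encoded.rfind("-")
    output = list(encoded[:pos]) if pos > 0 else []
    digits = encoded[pos + 1:] if pos > 0 else encoded
    n, i, bias, idx = INITIAL_N, 0, INITIAL_BIAS, 0
    while idx < len(digits):
        # Read one generalized variable-length integer and add it to i.
        old_i, w, k = i, 1, BASE
        while True:
            if idx >= len(digits):
                raise ValueError("truncated Punycode input")
            digit = decode_digit(digits[idx])
            idx += 1
            i += digit * w
            if k <= bias:
                t = TMIN
            elif k >= bias + TMAX:
                t = TMAX
            else:
                t = k - bias
            if digit < t:
                break
            w *= BASE - t
            k += BASE
        length = len(output) + 1
        bias = adapt(i - old_i, length, old_i == 0)
        n += i // length  # i combines the code point and its position
        i %= length
        output.insert(i, chr(n))
        i += 1
    return "".join(output)


def domain_to_unicode(domain: str) -> str:
    labels = []
    for label in domain.split("."):
        if label.lower().startswith(ACE_PREFIX):
            labels.append(punycode_decode(label[len(ACE_PREFIX):]))
        else:
            labels.append(label)  # not Punycode-encoded; keep unchanged
    return ".".join(labels)


print(domain_to_unicode("xn--fsqu00a.xn--0zwm56d"))  # 例子.测试
print(domain_to_unicode("xn--mnich-kva.com"))        # münich.com
```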
A decoder recognizes encoded labels by the xn-- prefix: only labels that start with xn-- are decoded, and all other labels are left unchanged.
In summary, Punycode is a crucial encoding scheme that enables non-ASCII characters in domain names and URLs, making the internet more inclusive and accessible to users around the world who use diverse writing systems.