URL encoding converts special characters in text into a URL-safe format by replacing each one with a percent sign followed by the hexadecimal value of its bytes.
URL encoding, also known as percent encoding, is a method of converting characters into a format that can be transmitted over the internet in a URL. This is necessary because URLs can only safely contain a limited set of characters. Special characters, such as spaces, punctuation, or non-ASCII characters, must be encoded to ensure that the URL is valid and functions correctly.
A URL can safely include the following characters without encoding:
- Alphanumeric characters (`A`-`Z`, `a`-`z`, `0`-`9`)
- The special characters `-`, `_`, `.`, and `~`

However, other characters like spaces, symbols such as `&`, `?`, and `#` (which have a reserved meaning inside a URL), or non-ASCII characters (like `ç` or `汉`) could be misinterpreted by web servers, browsers, or other systems when part of a URL. URL encoding ensures that such characters are replaced with a valid ASCII format.
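As a rough illustration of the distinction above, the following sketch checks whether a character belongs to the always-safe set. The helper name `is_unreserved` is hypothetical, and the set is assumed to be the "unreserved" characters of RFC 3986 (letters, digits, `-`, `_`, `.`, `~`).

```python
import string

# Assumed always-safe set: letters, digits, and - _ . ~
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-_.~")


def is_unreserved(ch: str) -> bool:
    """Return True if the single character ch never needs percent-encoding."""
    return ch in UNRESERVED


# Print each sample character with its classification.
for ch in ["a", "7", "~", " ", "&", "ç", "汉"]:
    print(repr(ch), "safe" if is_unreserved(ch) else "encode or treat as reserved")
```

Note that a reserved character such as `&` is still allowed in a URL when it is acting as a delimiter; the check only says it is not unconditionally safe.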
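URL encoding replaces each unsafe character with a `%` followed by two hexadecimal digits representing the character's byte value (its ASCII code, for ASCII characters). For example:
- A space becomes `%20`.
- `+` becomes `%2B`.
- `!` becomes `%21`.

The encoding process is straightforward: each byte of a character is converted to its two-digit hexadecimal equivalent and then prefixed with a `%` sign. Non-ASCII characters are first turned into bytes (UTF-8 in modern practice), so a single character can produce several `%XX` groups.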
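- Convert to hexadecimal: Each byte of the character is written as two hexadecimal digits.
- Prefix with `%`: The hexadecimal value is prefixed with a `%` sign to form the encoded character.

For example:
- Space (` `) has an ASCII value of 32, which is `20` in hexadecimal. It is encoded as `%20`.
- Ampersand (`&`) has an ASCII value of 38, which is `26` in hexadecimal. It is encoded as `%26`.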
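The two steps can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a library routine: the function name `percent_encode` is hypothetical, UTF-8 is assumed for non-ASCII text, and every character outside the unreserved set is encoded, including reserved delimiters.

```python
# Assumed always-safe set: letters, digits, and - _ . ~
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~"
)


def percent_encode(text: str) -> str:
    """Percent-encode every character that is not in the unreserved set."""
    out = []
    for ch in text:
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # Step 1: turn the character into bytes (UTF-8 assumed).
            # Step 2: write each byte as % plus two uppercase hex digits.
            for byte in ch.encode("utf-8"):
                out.append(f"%{byte:02X}")
    return "".join(out)


print(percent_encode(" "))            # %20
print(percent_encode("&"))            # %26
print(percent_encode("hello world"))  # hello%20world
print(percent_encode("汉"))           # %E6%B1%89
```

The last line shows why the process works on bytes rather than characters: `汉` occupies three bytes in UTF-8, so it becomes three `%XX` groups.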
If we have the following URL:
https://www.utilcrate.com/search?query=hello world&category=books
We can encode it as:
https://www.utilcrate.com/search?query=hello%20world&category=books
Notice that the space in `hello world` is encoded as `%20`. The `&` between the query parameters is left as-is because it is acting as a reserved delimiter; it needs to be encoded as `%26` only when it appears inside a parameter name or value.
- `query=hello world` becomes `query=hello%20world`.
- `&category=books` remains the same because `&` is a reserved character for separating query parameters. An `&` that is part of the data itself would be encoded as `%26`.

Here are some common characters and their percent-encoded equivalents:
Character | Encoded |
---|---|
Space | %20 |
& | %26 |
= | %3D |
? | %3F |
/ | %2F |
+ | %2B |
# | %23 |
, | %2C |
: | %3A |
; | %3B |
% | %25 |
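The table can be reproduced with Python's standard library as a quick check. This sketch uses `urllib.parse.quote` with `safe=""` so that `/` is encoded as well (by default `quote` leaves `/` untouched); the variable names are illustrative only.

```python
from urllib.parse import quote

# Characters from the table above.
chars = [" ", "&", "=", "?", "/", "+", "#", ",", ":", ";", "%"]

# safe="" asks quote() to encode everything except letters, digits and - _ . ~
table = {ch: quote(ch, safe="") for ch in chars}

for ch, encoded in table.items():
    print(f"{ch!r:>5} -> {encoded}")
```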
When URLs contain query parameters, URL encoding ensures that the data passed in the URL remains intact, without interfering with the URL structure. For example:
https://www.utilcrate.com/?search=cat&color=blue&size=large
Each of the values in the query string (`cat`, `blue`, `large`) is already URL-safe, but if any parameter contains unsafe characters (e.g., spaces or special symbols), those characters must be percent-encoded.
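To illustrate how a value with unsafe characters is handled, here is a short sketch that builds a query string with `urllib.parse.urlencode`. The value `cat & dog` is a made-up example, and `quote_via=quote` is passed so that spaces become `%20`, matching the convention used in this article.

```python
from urllib.parse import quote, urlencode

# Hypothetical parameters; "cat & dog" contains spaces and a reserved "&".
params = {"search": "cat & dog", "color": "blue", "size": "large"}

# urlencode() encodes each name and value, then joins the pairs with "&".
query = urlencode(params, quote_via=quote)
url = "https://www.utilcrate.com/?" + query

print(url)
# https://www.utilcrate.com/?search=cat%20%26%20dog&color=blue&size=large
```

The `&` inside the value is encoded as `%26`, while the `&` characters separating the parameters stay literal. Without `quote_via=quote`, `urlencode` uses form-style encoding, where a space is written as `+` instead of `%20`.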
URL decoding is the reverse process of URL encoding. It takes the percent-encoded characters and converts them back to their original form. For instance:
- `%20` is decoded back to a space character.
- `%26` is decoded back to an ampersand (`&`).

Most modern programming languages and web browsers decode URL-encoded strings automatically when necessary.
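As a small example of decoding in practice, the sketch below uses Python's `urllib.parse` module: `unquote` decodes a single string, and `parse_qs` splits and decodes a whole query string. The URL is the encoded example from earlier in this article.

```python
from urllib.parse import parse_qs, unquote, urlsplit

# Decode individual percent-encoded strings.
print(unquote("hello%20world"))  # hello world
print(unquote("%26"))            # &

# Split a full URL, then decode its query parameters in one step.
url = "https://www.utilcrate.com/search?query=hello%20world&category=books"
params = parse_qs(urlsplit(url).query)

print(params)  # {'query': ['hello world'], 'category': ['books']}
```

`parse_qs` returns a list for each key because a parameter name may legally appear more than once in a query string.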
URL encoding is used in a variety of scenarios, such as URLs whose paths contain non-ASCII characters (e.g., `/images/汉字/`).
It uses `%` followed by two hexadecimal digits to represent unsafe characters, and it also applies to reserved characters such as `&`, `=`, and `?` when they appear as data rather than as delimiters.
In summary, URL encoding is an essential technique for ensuring that URLs are safe, readable, and functional across different systems. It is widely used in web applications, APIs, and data transmission over the internet.