URL Encode & Decode

Encode URLs with percent-encoding or decode encoded URLs back to readable text instantly.

What is URL Encoding

URL encoding, also known as percent-encoding, is the mechanism by which characters that are not allowed in a URL — or that have special meaning — are represented using a percent sign followed by two hexadecimal digits. For example, a space becomes `%20` and an ampersand becomes `%26`. This allows arbitrary text to be safely transmitted as part of a URL.

URLs were originally designed to carry ASCII characters only, and even within ASCII, certain characters are reserved for special purposes. The slash separates path segments, the question mark begins a query string, the ampersand separates query parameters, and the equals sign pairs keys with values. Without encoding, these characters in user-supplied data would be interpreted as URL structure rather than as data.

URL encoding solves this by escaping reserved and unsafe characters into a form that has no structural meaning. The receiving server decodes the percent-encoded sequences back to the original characters after parsing the URL structure. Understanding when and how to apply this encoding is fundamental to building correct, secure web applications.

  • Also known as percent-encoding
  • Represents unsafe characters as %XX hexadecimal sequences
  • Required for any data that may contain reserved characters
  • Applied differently to path, query, and fragment components
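
As a minimal sketch of the mechanism described above, the following Python snippet encodes text by hand and compares the result with the standard library's `urllib.parse.quote`. The helper name `percent_encode` is hypothetical and exists only for illustration; it assumes the text is converted to UTF-8 bytes first.

```python
from urllib.parse import quote, unquote

# Characters that never need escaping (the unreserved set).
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

def percent_encode(text: str) -> str:
    """Illustrative helper: escape every UTF-8 byte outside the unreserved set."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        # Each escaped byte becomes '%' followed by two hexadecimal digits.
        out.append(ch if ch in UNRESERVED else f"%{byte:02X}")
    return "".join(out)

encoded = percent_encode("fish & chips")
print(encoded)                          # fish%20%26%20chips
print(quote("fish & chips", safe=""))   # fish%20%26%20chips
print(unquote(encoded))                 # fish & chips
```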

Reserved and Unsafe Characters

Not all characters are equal in the eyes of a URL. RFC 3986 distinguishes between reserved characters (which have structural meaning) and unreserved characters (which are safe to use as-is). Knowing the difference is essential for correct encoding.

Reserved characters include the forward slash, question mark, hash, ampersand, equals, plus, colon, semicolon, comma, and a few others. These characters delimit structural parts of the URL. When they appear in user data, they must be encoded to avoid confusing the parser. For example, a file name containing a slash must encode it as `%2F` when placed in a path segment so it is not treated as a path separator, and an ampersand inside a query value must become `%26` so it does not start a new parameter.

Unreserved characters — uppercase and lowercase letters, digits, hyphen, period, underscore, and tilde — never need encoding. They have no special meaning and are safe in any URL component. Encoding them anyway produces a valid but unnecessarily long URL.

Unsafe characters are those that have historically caused problems in URLs, such as spaces, quotes, angle brackets, curly braces, and pipe characters. These should always be encoded. Spaces in particular are problematic: they are not allowed in URLs at all and must be encoded as `%20` or, in form-encoded query strings only, as a plus sign.

  • Reserved (RFC 3986): : / ? # [ ] @ ! $ & ' ( ) * + , ; =
  • Unreserved: A-Z a-z 0-9 - . _ ~
  • Unsafe: space, ", <, >, {, }, |, \, ^, `
  • Always encode unsafe characters to avoid malformed URLs
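
The distinction can be checked with a short Python sketch. It assumes the full RFC 3986 reserved set (gen-delims plus sub-delims) and uses `urllib.parse.quote` with `safe=""` so that nothing outside the unreserved set is left alone; the constant names are illustrative.

```python
from urllib.parse import quote

# RFC 3986 reserved characters: gen-delims followed by sub-delims.
RESERVED = ":/?#[]@" + "!$&'()*+,;="
UNRESERVED_SAMPLE = "AZaz09-._~"

# Unreserved characters pass through unchanged.
print(quote(UNRESERVED_SAMPLE, safe=""))   # AZaz09-._~

# With safe="", every reserved character is escaped as %XX.
for ch in RESERVED:
    print(ch, "->", quote(ch, safe=""))

# A slash inside user data no longer looks like a path separator.
print(quote("report/2024 final", safe=""))  # report%2F2024%20final
```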

Percent-Encoding in Practice

In practice, applying percent-encoding correctly means using the right encoding function for each URL component. The path, query string, and fragment have different rules about which characters are reserved, and using the wrong function produces subtly broken URLs.

For the path component, encode using a function that preserves path delimiters (slashes) while escaping other reserved characters. Most languages provide an `encodeURIComponent`-like function for individual segments and a separate function for entire paths. Using `encodeURIComponent` on an entire path would incorrectly encode the slashes, breaking the URL structure.
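
One way to sketch the per-segment approach in Python is to escape each segment on its own and only then join with literal slashes. The function name `build_path` and the sample segments are assumptions made for illustration.

```python
from urllib.parse import quote

def build_path(segments):
    """Encode each segment separately, then join with structural slashes."""
    # safe="" makes quote() escape '/' inside a segment as %2F.
    return "/" + "/".join(quote(seg, safe="") for seg in segments)

print(build_path(["docs", "Q1/Q2 report", "final.pdf"]))
# /docs/Q1%2FQ2%20report/final.pdf
```

The slash inside `Q1/Q2 report` is data, so it is escaped, while the slashes added by `join` remain structural.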

For the query string, encode keys and values separately using `encodeURIComponent`, then join them with `=` and `&`. This preserves the structural ampersands and equals signs while escaping any reserved characters within the data itself. Form-encoded data (`application/x-www-form-urlencoded`) commonly writes spaces as `+` rather than `%20`; `encodeURIComponent` always produces `%20`, and form decoders accept both.
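
The same encode-then-join idea can be sketched in Python, with `urllib.parse.quote` standing in for `encodeURIComponent`. The helper `build_query` is hypothetical and assumes string keys and values.

```python
from urllib.parse import quote

def build_query(params):
    """Encode keys and values individually, then join with '=' and '&'."""
    pairs = []
    for key, value in params.items():
        pairs.append(f"{quote(key, safe='')}={quote(value, safe='')}")
    return "&".join(pairs)

print(build_query({"q": "cats & dogs", "sort": "a=b"}))
# q=cats%20%26%20dogs&sort=a%3Db
```

Only the `=` and `&` added by the helper survive unescaped, so a decoder can split the string without misreading the data.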

For the fragment, encoding rules are similar to the query. However, fragments are handled client-side and are not sent to the server, so encoding is mostly about ensuring the browser and JavaScript can interpret the fragment correctly.

  • Use different encoding functions for path, query, and fragment
  • Encode query keys and values separately, then join with = and &
  • Spaces in query strings may be encoded as + or %20
  • Never encode the entire URL with a single function call

Common Pitfalls and Mistakes

URL encoding is a frequent source of bugs in web applications, and certain mistakes appear again and again. Recognizing these patterns helps you avoid them.

The single most common mistake is double-encoding. This happens when data is encoded once and then passed through another function that encodes it again. The percent sign in `%20` is itself encoded as `%25`, so `%20` becomes `%2520`, which decodes to the literal text `%20` instead of a space. Double-encoding often occurs when one layer of code assumes another layer has not already encoded the data. Establish clear ownership of encoding at each layer to prevent this.
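
A small Python sketch makes the double-encoding effect visible; the sample string is arbitrary.

```python
from urllib.parse import quote, unquote

once = quote("hello world", safe="")
twice = quote(once, safe="")   # a second layer encodes the '%' again

print(once)                    # hello%20world
print(twice)                   # hello%2520world
print(unquote(twice))          # hello%20world (still encoded)
print(unquote(unquote(twice))) # hello world
```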

The opposite problem is forgetting to encode at all. User-supplied data dropped directly into a URL without encoding can break the URL structure, leak data into other parameters, or in severe cases enable injection attacks such as HTTP response splitting or open redirect. Always encode user data before placing it in a URL, no matter how benign it appears.

Another subtle issue is mixing encoding schemes. The plus sign means a space in form-encoded query strings but a literal plus in the path. Decoding a path with a form-decoder will incorrectly turn plus signs into spaces. Use the correct decoder for each component.
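
The difference between the two decoders can be illustrated in Python, where `unquote` follows plain percent-decoding and `unquote_plus` follows form-decoding rules. The input string is an arbitrary example.

```python
from urllib.parse import unquote, unquote_plus

raw = "1+1%3D2"

# Path-style decoding keeps the plus sign as a literal plus.
print(unquote(raw))       # 1+1=2

# Form-style decoding turns the plus sign into a space.
print(unquote_plus(raw))  # 1 1=2
```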

  • Double-encoding: encoding already-encoded data
  • Missing encoding: placing raw user data directly into URLs
  • Mixing form-decoding (plus = space) with path decoding
  • Encoding entire URLs instead of individual components
  • Assuming data is safe because it "looks" alphanumeric

URL Encoding in Different Languages

Most programming languages provide built-in functions for URL encoding, but the names, behaviors, and edge cases vary significantly. Understanding the differences prevents bugs when moving between languages or integrating systems written in different stacks.

In JavaScript, `encodeURIComponent` encodes everything except `A-Z a-z 0-9 - _ . ! ~ * ' ( )` and is the right choice for query parameter values. The older `escape` function is deprecated and should never be used. `encodeURI` is intended for entire URLs and leaves structural characters like slashes and colons unencoded, making it suitable only for already-formed URLs that need light cleanup.

In Python, `urllib.parse.quote` accepts a `safe` parameter specifying characters not to encode, defaulting to `/`. For query values, pass `safe=''` to encode slashes as well. `urllib.parse.urlencode` builds an entire query string from a dictionary or a sequence of key-value pairs and encodes both keys and values automatically; by default it uses `quote_plus`, so spaces become `+`.
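
The Python behavior described above can be observed directly. This sketch uses only standard-library calls; the sample values are arbitrary.

```python
from urllib.parse import quote, urlencode

value = "a/b c"

# Default safe="/" leaves the slash untouched.
print(quote(value))            # a/b%20c

# safe="" escapes the slash too, which suits query values.
print(quote(value, safe=""))   # a%2Fb%20c

# urlencode defaults to form-style output: spaces become '+'.
print(urlencode({"q": value, "lang": "en"}))
# q=a%2Fb+c&lang=en

# Passing quote_via=quote switches spaces to %20.
print(urlencode({"q": value, "lang": "en"}, quote_via=quote))
# q=a%2Fb%20c&lang=en
```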

In Go, `url.QueryEscape` encodes for query strings (spaces become `+`), while `url.PathEscape` encodes for path segments (spaces become `%20`). PHP offers `urlencode` (form-style, spaces as `+`) and `rawurlencode` (RFC 3986, spaces as `%20`). Choosing the right function for the context is more important than memorizing every detail — when in doubt, consult the documentation and test with edge cases like spaces, plus signs, and non-ASCII characters.

  • JavaScript: encodeURIComponent for query values, never escape()
  • Python: urllib.parse.quote with safe parameter
  • Go: url.QueryEscape vs url.PathEscape
  • PHP: urlencode vs rawurlencode
  • Always test with spaces, plus signs, and Unicode

Best Practices for URL Handling

Beyond knowing the encoding functions, adopting a few best practices around URL handling will make your applications more robust, secure, and maintainable over time.

Never build URLs by string concatenation alone. Use a URL builder or parser library that understands the structure of a URL — scheme, host, path, query, fragment — and handles encoding for you. This eliminates an entire class of bugs, including missing slashes, double slashes, and incorrect encoding.
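
To illustrate the difference, this Python sketch compares naive concatenation with a structured build using the standard library's `urlsplit`, `urlunsplit`, and `urlencode`. The base URL and the search term are placeholders chosen for the example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

BASE = "https://example.com/search"  # placeholder endpoint
term = "x y&z"                       # user-supplied text

# Naive concatenation: the '&' in the data splits the parameter.
naive = BASE + "?q=" + term
print(parse_qs(urlsplit(naive).query))   # {'q': ['x y']}

# Structured build: the query is encoded before being attached.
parts = urlsplit(BASE)
built = urlunsplit(parts._replace(query=urlencode({"q": term})))
print(built)                             # https://example.com/search?q=x+y%26z
print(parse_qs(urlsplit(built).query))   # {'q': ['x y&z']}
```

In the naive version part of the user's text is silently lost; the structured version round-trips the full value.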

Validate and normalize URLs before storing or processing them. Parse the URL, check that the scheme and host are as expected, and reconstruct the canonical form. This catches malformed input, prevents redirect-based attacks, and makes URL comparison reliable. Be especially cautious with user-supplied redirect targets to avoid open-redirect vulnerabilities.
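
A redirect check along these lines might look like the following Python sketch. The allowlist `ALLOWED_HOSTS` and the function `is_safe_redirect` are hypothetical, and the rules shown (HTTPS only, exact host match, single-slash relative paths) are assumptions rather than a complete defense.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"example.com", "www.example.com"}  # hypothetical allowlist

def is_safe_redirect(target: str) -> bool:
    """Accept same-site relative paths or HTTPS URLs on an allowed host."""
    try:
        parts = urlsplit(target)
    except ValueError:
        return False  # malformed input such as a broken IPv6 literal
    if not parts.scheme and not parts.netloc:
        # Relative path: require one leading slash, and reject "//" and "/\"
        # prefixes that some browsers treat as another host.
        return target.startswith("/") and not target.startswith(("//", "/\\"))
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS

print(is_safe_redirect("/account"))               # True
print(is_safe_redirect("https://example.com/x"))  # True
print(is_safe_redirect("//evil.test/x"))          # False
print(is_safe_redirect("javascript:alert(1)"))    # False
```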

Document the encoding contract at API boundaries. If your API expects query parameters to be percent-encoded in a particular way, say so explicitly. This is particularly important for non-ASCII characters, where different encodings (UTF-8 vs Latin-1) produce different percent-encoded sequences.
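
The character-encoding mismatch is easy to reproduce. In this Python sketch the same character produces different percent-encoded bytes depending on the `encoding` argument of `urllib.parse.quote`, and decoding with the wrong assumption loses the character.

```python
from urllib.parse import quote, unquote

print(quote("é"))                        # %C3%A9 (UTF-8, the default)
print(quote("é", encoding="latin-1"))    # %E9

# Decoding Latin-1 bytes as UTF-8 yields the replacement character.
print(unquote("%E9") == "\ufffd")        # True
print(unquote("%E9", encoding="latin-1"))  # é
```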

Finally, test with realistic and adversarial input. Include spaces, Unicode, reserved characters, empty strings, and very long values in your test cases. Encoding bugs often hide in edge cases that simple alphanumeric test data never exercises.

  • Use URL builder libraries instead of string concatenation
  • Validate and normalize URLs before processing
  • Be cautious with user-supplied redirect targets
  • Document encoding expectations at API boundaries
  • Test with spaces, Unicode, reserved characters, and long values