
Learn How to Handle Special Characters With URL Encoding

Overview

Recently I worked on a web-based application that deals with a large number of dynamic URLs. As the business grows, new URLs are added every day. The problem is that the context appended to the base URL cannot be predicted, because the application accepts dynamic path variables.


If you are dealing with similar problems, such as:

  • Certain special characters are not allowed in the URL entered into the address bar
  • Special Characters Not Allowed
  • Links / URLs containing special characters are not recognized
  • An HTTP 404 Not Found error when accessing a web page, even though the application is running fine on the server

then this post may be useful for you.


Objective

Convert characters that have a special meaning within URLs into a valid US-ASCII format, and explain the different programmatic approaches with examples.


Problem

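Suppose I browse to the URL http://www.example.com/product/buiscuit/ord#331. The server responds with HTTP 404 Not Found even though the resource ord#331 exists. This is because the special character # has a different meaning within a URL: it marks the start of the fragment, which the browser does not send to the server, so the request is actually made for /product/buiscuit/ord.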
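To make the problem concrete, here is a minimal sketch that parses the URL with the standard URL class available in modern browsers and Node.js. The variable names are only illustrative.

```javascript
// Parse the problematic URL with the standard URL class.
const parsed = new URL("http://www.example.com/product/buiscuit/ord#331");

// Everything after '#' is treated as the fragment, not as part of the path.
const pathSentToServer = parsed.pathname; // "/product/buiscuit/ord"
const fragmentKeptLocally = parsed.hash;  // "#331"

console.log(pathSentToServer);
console.log(fragmentKeptLocally);
```

The server only ever sees the path, so it looks for a resource named ord and cannot find it.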


Solution

URL Encoding (Also known as Percent-encoding)

URL Encoding is the practice of translating unprintable characters or characters with special meaning within URLs to a representation that is unambiguous and universally accepted by web browsers and servers.


RFC 3986 restricts the characters in a URL to a defined set of reserved and unreserved US-ASCII characters; no other characters are allowed. In practice, URLs often need to carry characters outside this set, so those characters must be converted to a valid US-ASCII format for worldwide interoperability.


How to use URL Encoding
To map the wide range of characters used worldwide, a two-step process is used:
  • First, the data is encoded according to the UTF-8 character encoding.
  • Then, only those bytes that do not correspond to characters in the unreserved set are percent-encoded as %HH, where HH is the hexadecimal value of the byte.
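
Purely to illustrate the two steps, here is a hand-written sketch. The function name percentEncode and the UNRESERVED pattern are hypothetical helpers, not part of any library. The sketch encodes every byte outside the unreserved set, so it suits a single URL component, not a whole URL.

```javascript
// Characters that RFC 3986 calls "unreserved"; they are never percent-encoded.
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;

// Hypothetical helper: percent-encodes one URL component by hand.
function percentEncode(text) {
  // Step 1: encode the string as UTF-8 bytes.
  const bytes = new TextEncoder().encode(text);
  let out = "";
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    if (byte < 0x80 && UNRESERVED.test(ch)) {
      // Unreserved ASCII characters are copied through unchanged.
      out += ch;
    } else {
      // Step 2: every other byte becomes %HH with two uppercase hex digits.
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode("ord#331"));    // ord%23331
console.log(percentEncode("caf\u00e9"));  // caf%C3%A9
```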

The good news is that you do not need to write this kind of boilerplate code yourself.


In JavaScript

The escape() function does not encode the + character, which the server side interprets as a space and which forms generate for spaces in their fields. Because of this shortcoming, and because it does not handle non-ASCII characters correctly, you should avoid escape() whenever possible; it is also deprecated. The best alternative is usually encodeURIComponent(). Among special characters, escape() will not encode: @*/+ (it also leaves - _ . unchanged).
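
The following short sketch shows both shortcomings side by side. The sample strings are arbitrary, and the non-ASCII characters are written as Unicode escapes (é and €) so the snippet does not depend on the file encoding.

```javascript
// '+' is left alone by escape(), so a server decoding form data reads it as a space.
const plusWithEscape = escape("a+b");                 // "a+b"
const plusWithComponent = encodeURIComponent("a+b");  // "a%2Bb"

// Non-ASCII input: escape() emits Latin-1 style %HH or %uXXXX, not UTF-8 bytes.
const eAcuteWithEscape = escape("\u00e9");                // "%E9"
const euroWithEscape = escape("\u20ac");                  // "%u20AC"

// encodeURIComponent() percent-encodes the UTF-8 bytes instead.
const eAcuteWithComponent = encodeURIComponent("\u00e9"); // "%C3%A9"
const euroWithComponent = encodeURIComponent("\u20ac");   // "%E2%82%AC"

console.log(plusWithEscape, plusWithComponent);
console.log(eAcuteWithEscape, eAcuteWithComponent);
console.log(euroWithEscape, euroWithComponent);
```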


encodeURI() is a bit more specialized than escape(): it encodes a complete URI rather than just a query string, which is only one part of a URL. Use it when you need to encode a string that will be used as a whole URI and the characters that give the URI its structure must remain unencoded. Note that this method does not encode the ' character, as it is a valid character within URIs. encodeURI() will not encode: ~!@#$&*()=:/,;?+'


Lastly, encodeURIComponent() should be used in most cases, namely when encoding a single component of a URI such as a path segment or a query parameter value. It encodes characters that would otherwise be recognized as URI delimiters, so that several components can be combined safely into one URI. Like encodeURI(), it does not encode the ' character, as it is a valid character within URIs. encodeURIComponent() will not encode: ~!*()'
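
As a rough sketch of the difference in practice, the snippet below builds a URL from separate components. The base URL reuses the example from the Problem section; the query parameter q and its value are made up for illustration.

```javascript
const base = "http://www.example.com/product/buiscuit";
const pathSegment = "ord#331";       // contains '#', a URI delimiter
const searchTerm = "tea & biscuits"; // hypothetical value containing '&' and spaces

// Encode each component separately, then join them with literal delimiters.
const safeUrl =
  base + "/" + encodeURIComponent(pathSegment) +
  "?q=" + encodeURIComponent(searchTerm);
// "http://www.example.com/product/buiscuit/ord%23331?q=tea%20%26%20biscuits"

// encodeURI() on the assembled string leaves '#' and '&' untouched,
// so the URL still breaks at those characters.
const stillBroken = encodeURI(base + "/" + pathSegment + "?q=" + searchTerm);
// "http://www.example.com/product/buiscuit/ord#331?q=tea%20&%20biscuits"

console.log(safeUrl);
console.log(stillBroken);
```

Encoding per component keeps the delimiters you add yourself (/, ? and =) meaningful, while any delimiter-like characters inside the data are escaped.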


For Example

<!DOCTYPE html>
<html>
<script>
  // Encode the same string with all three functions and show the results.
  function testURLEncode() {
    var uri = ":@-._~!$&'()*+,=;";

    // Deprecated; shown only for comparison.
    var res1 = escape(uri);
    document.getElementById("res1").innerHTML = "<b>escape()</b> " + res1;

    // Intended for a complete URI.
    var res2 = encodeURI(uri);
    document.getElementById("res2").innerHTML = "<b>encodeURI()</b> " + res2;

    // Intended for a single URI component.
    var res3 = encodeURIComponent(uri);
    document.getElementById("res3").innerHTML = "<b>encodeURIComponent()</b> " + res3;
  }
</script>
<body>
  <button onclick="testURLEncode()">Click Me</button>
  <p id="res1"></p>
  <p id="res2"></p>
  <p id="res3"></p>
</body>
</html>

Result : 

escape() %3A@-._%7E%21%24%26%27%28%29*+%2C%3D%3B
encodeURI() :@-._~!$&'()*+,=;
encodeURIComponent() %3A%40-._~!%24%26'()*%2B%2C%3D%3B


In Java

URLEncoder is the way to go. Keep in mind that you should encode only the individual query string parameter names and/or values, not the entire URL, and certainly not the query string parameter separator character & or the name-value separator character =. Also note that URLEncoder produces application/x-www-form-urlencoded output, so a space becomes + rather than %20.


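A string can be encoded using the static encode(String s, String enc) method of the URLEncoder class. The single-argument encode(String s) is deprecated because it relies on the platform default encoding, so always pass the encoding explicitly, preferably UTF-8. Since Java 10 there is also an encode(String s, Charset charset) overload, which does not throw a checked exception.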


For Example

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class TestURLEncode {

    public static void main(String[] args) {
        String encodedString = "";
        String uri = ":@-._~!$&'()*+,=;";
        try {
            // Always name the encoding explicitly; UTF-8 is the recommended choice.
            encodedString = URLEncoder.encode(uri, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Thrown only if the named encoding is not supported.
            e.printStackTrace();
        }
        System.out.println("Encoded string : " + encodedString);
    }
}

Result :

Encoded string : %3A%40-._%7E%21%24%26%27%28%29*%2B%2C%3D%3B


If you are interested in learning more about URL encoding, including examples in other languages, please refer to https://www.rosettacode.org/wiki/URL_encoding. The HTML URL Encoding Reference is also worth a visit to get familiar with the ASCII encoding reference.