Software creative always looking for a challenge

Posts from: March 18, 2010

Escaping HTML in Java

HTML uses some special characters to control how a page is displayed. These characters need to be escaped before being placed on a page if they are to be displayed as part of the page content rather than interpreted as markup. This is similar to the way double quote characters in a C/C++ string have to be escaped in order for the code to compile properly. Therefore, a web application needs to escape all user input before rendering it as HTML back to the user.

There are two ways to deal with this, each with its own strengths and weaknesses:

  1. Filtering – throwing away all characters that are not in the set of acceptable input characters.
  2. Escaping special characters – replacing each special character with its corresponding HTML entity (for example, < becomes &lt;).
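
As a rough illustration of the second approach, the sketch below replaces the five characters most commonly treated as special in HTML with their entities. The class name HtmlEscaper and the method name escape are made up for this example and do not come from any library; treating null as an empty string is also just an assumption made for convenience.

```java
// Minimal sketch of entity escaping. HtmlEscaper and escape are
// illustrative names, not part of any standard or third-party API.
final class HtmlEscaper {
    private HtmlEscaper() {}

    // Returns a copy of input with &, <, >, " and ' replaced by HTML entities.
    // Assumption: a null input is treated as an empty string.
    static String escape(String input) {
        if (input == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

Each character is examined exactly once, so an ampersand that is already part of the input is escaped once and the ampersands introduced by the replacements are never escaped again. Calling HtmlEscaper.escape on text that was already escaped would escape it a second time, so the call belongs at the point where the value is written into the page.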

In this short article we will cover the second way. Note, however, that effective protection against Cross-Site Scripting (XSS) vulnerabilities may require a combination of both approaches. Automated testing is key to ensuring that your application handles all input correctly and resists malicious input.
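
For completeness, the first approach (filtering) might look something like the sketch below. The class name InputFilter, the method name filter, and the particular set of allowed characters are all assumptions chosen for illustration; a real application would pick an allow-list that matches the input it actually expects.

```java
// Minimal sketch of allow-list filtering. InputFilter, filter and the
// allowed character set are illustrative assumptions, not a library API.
final class InputFilter {
    // Assumed allow-list: ASCII letters and digits plus these characters.
    private static final String ALLOWED_PUNCTUATION = " .,-_@";

    private InputFilter() {}

    // Returns a copy of input with every character outside the allow-list dropped.
    // Assumption: a null input is treated as an empty string.
    static String filter(String input) {
        if (input == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean asciiLetterOrDigit =
                    (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (asciiLetterOrDigit || ALLOWED_PUNCTUATION.indexOf(c) >= 0) {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Filtering discards information: markup characters simply disappear from the text, which is why it is usually reserved for fields with a narrow expected format, such as user names, and paired with escaping for free-form text.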