Information-flow violations are the most serious security vulnerabilities in today's Web applications. In fact, the Open Web Application Security Project (OWASP) reports that the six most common security vulnerabilities are in the area of information flow. Detecting such vulnerabilities is difficult: real-world Web applications are large and complex, so manual code inspection is often ineffective, and security testing can remain inconclusive because of insufficient coverage. We have designed and implemented Taint Analysis for Java (TAJ), a static-analysis algorithm that has been embedded in an IBM product, Rational AppScan Developer Edition. TAJ automatically detects four of the top six Web-application security vulnerabilities:
- Cross-site scripting (XSS) attacks occur whenever a Web application accepts data originating from a user and sends it to another user's browser without first validating or encoding it. For example, an attacker can embed JavaScript code into a profile on a social Web site. That code will then be executed in the browser of any other user visiting that profile. This is the most common vulnerability according to OWASP.
- Injection flaws arise when a Web application accepts input from a user and sends it to an interpreter as part of a command or query without first validating it. An attacker can trick the interpreter into executing unintended commands or changing data. The most common attack of this type is Structured Query Language injection (SQLi). This is the second most common vulnerability according to OWASP.
- Malicious-file executions happen when a Web application improperly trusts input files, or uses unverified user data in stream functions, thereby allowing hostile content to be executed on the server. This is the third most common vulnerability according to OWASP.
- Information leakage and improper error-handling attacks take place when a Web application leaks information about its own configuration, mechanisms, and internal problems. Attackers use this weakness to steal sensitive data or refine their attacks. This is the sixth most common vulnerability according to OWASP.
Each of these vulnerabilities can be seen as an integrity problem in which data coming from an untrusted source propagates, through data flow, control flow, or both, to a high-integrity sink without being properly endorsed. Endorsement can take the form of a verification performed by a validator, or a correction performed by a sanitizer.
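The distinction between a validator and a sanitizer can be illustrated with a small, self-contained Java sketch. It is not taken from TAJ or AppScan: the class `EndorsementSketch`, its methods, and the user-name pattern are hypothetical names chosen for illustration, and the "sink" is simply a method that builds the HTML a server would send to another user's browser.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; none of these names come from TAJ or AppScan.
final class EndorsementSketch {

    // Assumed policy for this sketch: letters, digits, underscore, 1 to 32 chars.
    private static final Pattern USER_NAME = Pattern.compile("[A-Za-z0-9_]{1,32}");

    // Validator: verifies the input and reports whether it is acceptable.
    static boolean isValidUserName(String input) {
        return input != null && USER_NAME.matcher(input).matches();
    }

    // Sanitizer: corrects the input by encoding HTML metacharacters.
    static String encodeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Sink stand-in: untrusted data reaches the page without endorsement.
    static String renderProfileUnsafe(String bio) {
        return "<div class=\"bio\">" + bio + "</div>";
    }

    // Same sink, but the data is sanitized before it reaches the page.
    static String renderProfileSafe(String bio) {
        return "<div class=\"bio\">" + encodeHtml(bio) + "</div>";
    }

    public static void main(String[] args) {
        String tainted = "<script>alert(1)</script>"; // plays the untrusted source
        System.out.println(renderProfileUnsafe(tainted));
        System.out.println(renderProfileSafe(tainted));
        System.out.println(isValidUserName("alice_01"));
        System.out.println(isValidUserName("alice'; DROP TABLE users;--"));
    }
}
```

In taint-analysis terms, a flow from `tainted` into `renderProfileUnsafe` would be reported, while the flow through `encodeHtml` would be considered endorsed. Which methods count as sources, sinks, validators, and sanitizers is a matter of configuration and is only assumed here.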
While there is a growing trend of static-analysis research in the area of information-flow security for Web applications, it has not been possible to apply the solutions proposed so far to industry-level Web applications. Many of the existing solutions entail complex, non-standard type systems, which are unlikely to enjoy broad adoption. Other solutions, based on program slicing, avoid overwhelming users with false positives by computing precise slices but, as a consequence, do not scale.
TAJ includes what we believe is the "right combination" of various static-analysis solutions: it is precise enough to produce a low false-positive rate, yet scalable enough to allow the analysis of large applications. TAJ models characteristics of Java Platform, Enterprise Edition (Java EE) applications typically omitted in previous work. Such characteristics, when modeled correctly, are known to limit analysis scalability. Furthermore, TAJ includes a set of techniques that allow it to run and produce useful results on extremely large applications, even when constrained to a limited time budget.
More specifically, TAJ makes the following contributions:
- Hybrid thin slicing. TAJ is based on a novel thin-slicing algorithm that combines flow-insensitive data-flow propagation through the heap with flow- and context-sensitive data-flow propagation through local variables, achieving precision and scalability when tracking tainted flows.
- A sound and effective model of the execution of Web applications. TAJ models reflective calls, tainted flows through containers, detection of taint in the internal state of objects, the JavaServer Pages (JSP), Enterprise JavaBeans (EJB), Struts and Spring frameworks, and many other challenging features that have largely been ignored in the literature, but that are essential for precise analysis of Web applications.
- A set of optimization-under-constraints techniques. When applications are extremely large and the end user still requires the analysis to terminate in a short time, TAJ supports a prioritization policy that focuses the analysis on portions of the Web application that are likely to participate in taint propagation.
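To give a feel for what tracking a tainted flow means, the sketch below runs a plain worklist reachability over a tiny, hand-built data-flow graph. It is a deliberately simplified stand-in and not TAJ's hybrid thin-slicing algorithm: the graph, the node labels, the class `TaintReachabilitySketch`, and the `budget` parameter (a crude substitute for a time budget) are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy taint reachability; hypothetical names, not TAJ's implementation.
final class TaintReachabilitySketch {

    // Returns the sinks reachable from a source without crossing a sanitizer,
    // visiting at most `budget` nodes before giving up.
    static List<String> reachableSinks(Map<String, List<String>> flows,
                                       Set<String> sources,
                                       Set<String> sanitizers,
                                       Set<String> sinks,
                                       int budget) {
        List<String> hits = new ArrayList<>();
        Set<String> seen = new HashSet<>(sources);
        Deque<String> worklist = new ArrayDeque<>(sources);
        int visited = 0;
        while (!worklist.isEmpty() && visited < budget) {
            String node = worklist.removeFirst();
            visited++;
            if (sinks.contains(node)) {
                hits.add(node);          // unendorsed data reached a sink
                continue;
            }
            if (sanitizers.contains(node)) {
                continue;                // flow is endorsed; stop propagating
            }
            for (String next : flows.getOrDefault(node, List.of())) {
                if (seen.add(next)) {
                    worklist.addLast(next);
                }
            }
        }
        return hits;
    }

    // Builds a small example graph: one flow is sanitized, the other travels
    // through a container (map.put followed by map.get) to a sink.
    static List<String> demo(int budget) {
        Map<String, List<String>> flows = Map.of(
            "request.getParameter", List.of("name", "bio"),
            "name", List.of("encodeHtml"),
            "encodeHtml", List.of("out.println#1"),
            "bio", List.of("map.put"),
            "map.put", List.of("map.get"),
            "map.get", List.of("out.println#2"));
        return reachableSinks(flows,
            Set.of("request.getParameter"),
            Set.of("encodeHtml"),
            Set.of("out.println#1", "out.println#2"),
            budget);
    }

    public static void main(String[] args) {
        System.out.println(demo(100)); // generous budget
        System.out.println(demo(3));   // budget too small to reach any sink
    }
}
```

With a generous budget, only the flow through the container is reported, because the other path is cut at the sanitizer. With a budget of three nodes the search stops before reaching any sink, which illustrates why a constrained analysis benefits from deciding carefully what to explore first.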
The results obtained by executing TAJ on production-level applications show considerable improvement over previous work.
Currently, we are extending TAJ in three directions:
- Support for new languages
- String-analysis capabilities for automatic detection and verification of validators and sanitizers
- Integration of white-box analysis (static analysis) and black-box analysis (testing of functional requirements)
TAJ is implemented on top of IBM's T.J. Watson Libraries for Analysis (WALA).
Contributors
Omer Tripp, Marco Pistoia, Stephen J. Fink, Takaaki Tateishi, Julian Dolby, Manu Sridharan and Omri Weisman.
Publications
- PLDI 2009: Omer Tripp, Marco Pistoia, Stephen J. Fink, Manu Sridharan and Omri Weisman. TAJ: Effective Taint Analysis of Web Applications. Accepted for publication in Proceedings of the ACM SIGPLAN 2009 Conference on Programming Language Design and Implementation (PLDI 2009), Dublin, Ireland, June 2009.
Patents
- Stephen Fink, Yinnon A. Haviv, Marco Pistoia, Omer Tripp and Omri Weisman. Importance-Based Call Graph Construction. Filed in the United States Patent and Trademark Office, March 2009.
- Shinya Kawanaka, Marco Pistoia, Guy Podjarny, Ory Segal, Adi Sharabani, Takaaki Tateishi and Sachiko Yoshihama. Improved Crawling of Object Model Using Transformation Graph. Filed in the United States Patent and Trademark Office, August 2008.
- Marco Pistoia, Takaaki Tateishi, Omer Tripp, and Omri Weisman. A Client-Driven Refinement-Based Static Analysis Method for Identifying Chainable Accesses to a Logical Container. Filed as Docket IL8-2008-0188 in the United States Patent and Trademark Office, June 2008.