LeakBot

Innovation Matters


An application with a memory leak will eventually terminate abnormally, after exhausting all available memory. Applications written in languages such as Java or C#, despite the automatic reclamation of memory, nonetheless may still have memory leaks. Finding the cause for leaks in such settings is an immense challenge. This is in part because the leaks can occur for a number of different reasons: e.g. forgetting to deregister as a listener, improperly implementing object equality, sizing caches to be larger than available memory. They are also difficult to diagnose because of problems of scale: if you're leaking objects, there's bound to be a lot of objects to dig through! Finally, applications these days are composed of a myriad of frameworks that no one developer entirely understands. Even for experts, manually digging through the mountain of details is error prone and time consuming. We have developed a tool that automates this process, called Leakbot. It uses graph algorithms to group the live objects of an application into equivalence classes, and pattern matching to look for trends in the constituency of each class. A leak is a class that exhibits monotonically increasing number of matches.


LeakBot regions
Figure 1. We group the live objects in an application into equivalence classes, based on how they evolve, and where they are situated in the object reference graph. For example, in today's complex applications, we find groups of data structures that evolve like a leak (growing without bound), or like a cache, or a pool.

Our group has always been driven by work with important IBM customers. These engagements were the origin of the ideas behind Leakbot. When analyzing memory leaks on these engagements, the developers typically provided us with snapshots of the Java heap. Most existing tools required us to digest a graphical view of the snapshots: a node for each object, and an edge for each reference between them. However, these graphs were very large, consisted of objects from unfamiliar frameworks, and were connected in complex ways. As the reliance on frameworks increased, we found that manually poring over the individual details of the data became more painful. Nobody had the time or expertise to dig through the mountain of details manually.

Unfortunately, simple summaries of the graph didn't help. Knowing the common data types only told us that there are lots of strings. But applications weren't just leaking strings, they were leaking whole data structures.

We next used graph algorithms to look for the big data structures. This step was beyond most tools, but alone was not enough. Sometimes, data structures were just big, not bugs. Even then, knowing that a class loader owned most of the objects didn't help. The heads of the leaking data structures usually lay somewhere between the class loader roots and the strings.

Regionskey
Figure 2. We identify three attributes of each object to define the eqiuvalence class to which an object belongs. We then match the canonical keys against a series of memory snapshots, and look for trends.

Therefore, we implemented analyses, based on scalable graph algorithms, to automate the discovery of those leaking data structures and the context in which they are situated. It was particularly important to distinguish based on context, because the leaking structures were often of a common type, from a framework the application used. For example, a single application these days uses XML documents for several purposes, but only one is problematic. Our analyses groups objects into equivalence classes based on their type and context. It is very likely that the same code logic governs whether or not the objects in a class will eventually be collected, or will live forever. The analyses then look for how the equivalence classes evolve over time. Each equivalence class is described by a pattern: a location in the graph, and a type that is leaking. We look for trends in the membership of each class by matching the patterns against memory snapshots from the application, each taken over time as the application runs.

This technology now ships as part of Rational Application Developer version 6.0, and as part of the Memory Dump Diagnostic for Java technology preview. We are currently working on enhancing the algorithms we use to group objects into equivalence classes, and on an infrastructure that allows the graph analyses to scale to graphs with hundreds of millions of nodes. We are also extending the work to new domains, such as diagnosing problems with excessive memory footprint. Our experience has shown that applications that make heavy use of frameworks very often require too much memory to accomplish their tasks. Analyzing this type of problem shares many of the difficulties of leak analysis. It is even more challenging, due to not having trends to analyze.

Related Publications  

Nick Mitchell and Gary S. Sevitsky. LeakBot: An Automated and Lightweight Tool For Diagnosing Memory Leaks in Large Java Applications. ECOOP 03 - European Conference on Object-Oriented Programming. February 2003 (2003 Pat Goldberg Memorial Best Paper Award).


News and Information
IBM Research Is Software Group's Secret Weapon, eWeek, May25, 2005.

Rate this article

Innovator's corner  

Nick MitchellNick Mitchell Researcher

What is the most exciting potential future use for the work you're doing?
To date, we have mainly been focused on the important problem of memory leak diagnosis. However, our algorithms discover patterns in graphs in a way that is largely agnostic to the semantics of the domain from which the graph originated. We are very excited about the prospects of extending the analyses to new domains. We already have several interesting prospects, some within the same field of dynamic program analysis, and some in entirely unrelated domains. Furthermore, given that our analysis is automated and highly scalable, there are interesting prospects with deploying the technology earlier in the development cycle, in order to catch problems before they reach the deployment phase.


What is the most interesting part of your research?
I love being able to work with our customers to tune and debug their large-scale applications. The challenge of returning to the lab and coming up with something that is both cool and that actually helps address the problems I found in the field is especially satisfying.


What inspired you to go into this field?
I took the introductory CS course at UC Berkeley, taught by Stuart Russell from the Abelson and Sussman book, and my physics major became a minor. Though it might just've been the way he said "lambder".

What is your favorite invention of all time?
My feet thank me for the technologies in running shoes, but my cat really likes all the inventions that enable canning. She's usually right!

Research team  

Anandi Krishnamurthy

Anandi Krishnamurthy

Satish Gupta

Satish Gupta

Indrajit Poddar

Indrajit Poddar

Gary Sevitsky

Gary Sevitsky

Giridhar Sreenivasamurthy

Giridhar Sreenivasamurthy

Related Research