Melody, an analytic tool for systems managers, provides automatic and self-adapting summarization of computer system information. Summarization rules are generated automatically by applying statistical inference to a large repository of previously collected system descriptions. The rules are then used to identify and display a succinct analysis of a system’s description, focusing on the parts of the description most relevant for problem determination and identification. The Melody tool is used around the world in IBM xSeries health centers, where support engineers perform a variety of analytical tasks, including troubleshooting, system optimization, maintenance and updates.
Melody, or “Machine Learning for Dynamic System Analysis,” is an analytical framework that gives IBM xSeries systems analysts an automatic and self-adapting summarization of computer system information. In its first application, it is helping to determine problems in the IBM xSeries health center.
The IBM xSeries health center deals with thousands of calls every day from customers who have problems with their xServers. To identify the source of a customer’s problem, xSeries analysts need to know the operational status and configuration of the machine. Support engineers rely on the Dynamic System Analysis (DSA) tool to collect a huge amount of information about the machine’s hardware and software configuration and status, which they can view via a Web interface. However, they can take into account only a tiny portion of the information that DSA collects (typically thousands of data items and tens of thousands of event log messages). As a result, they can miss important system information and waste time on irrelevant data. Melody alleviates this problem by automatically identifying and displaying the parts of the machine’s description most relevant for problem determination and identification (see Figure 1).
Figure 1. Melody learning and analysis framework.
The summarization tool automatically identifies important data items for systems analysts. This identification is performed without relying on expert knowledge of any kind. Experts usually can identify some aspects of the system that may apply to specific problems, but the complexity of systems and the ever-changing nature of hardware and software components make it impossible to encode rules manually for every kind of situation. With Melody, prior information from many systems replaces the expert: DSA collects more than 4,000 system descriptions each month. Melody uses this constantly accumulating repository of system descriptions to learn automatically what configuration elements and log messages may be either the cause or symptom of a problem. This knowledge is updated periodically, thereby keeping summarization decisions up-to-date. One challenge associated with this approach: The repository does not provide descriptions of perfectly configured systems. In fact, most systems in this repository already have been diagnosed with a problem.
To learn automatically from this large population of systems, a team in the Machine Learning group at the Haifa Lab devised ways to identify statistically valid inferences that would help the user, while dismissing statistically unimportant differences between systems. The team used techniques for estimating the “missing mass” of a probability distribution, that is, the total probability of the values that have not yet been observed. These techniques guarantee that, with high probability, the a priori chance of any single configuration alert being raised stays below a predetermined bound. This is a novel approach to summarization, because the decision about which configuration fields should appear in the summarized view is based solely on the data. By contrast, other approaches require manually encoded rules to guide summarization.
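As a rough illustration of what estimating missing mass involves, the sketch below uses the classical Good-Turing estimate: the fraction of observations whose value was seen exactly once. The article does not say which estimator Melody uses, so the choice of Good-Turing, the function name and the sample field values are all assumptions made for this example.

```python
from collections import Counter

def good_turing_missing_mass(values):
    """Estimate the probability that the next observation of a field
    is a value that does not appear in `values` (Good-Turing estimate)."""
    if not values:
        return 1.0  # nothing observed yet, so every value is unseen
    counts = Counter(values)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(values)

# Hypothetical values of two configuration fields across ten systems.
bios_versions = ["1.04"] * 6 + ["1.07"] * 3 + ["1.09"]
serial_numbers = ["SN%04d" % i for i in range(10)]

print(good_turing_missing_mass(bios_versions))   # 0.1
print(good_turing_missing_mass(serial_numbers))  # 1.0
```

A field whose estimate is close to 1 almost always shows a new value, so a new value there carries little information. A field whose estimate is close to 0 rarely does, which makes an unseen value worth a closer look.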
A machine’s serial number, for example, is different from the serial numbers of all other machines, but this difference should not be regarded as the possible source of a problem. It is less obvious whether other configuration items provide relevant information. With Melody, the process of relevance identification is entirely automatic. This means that Melody needs no manual configuration, even when the list of configuration items grows or changes, as is often the case with new DSA versions.
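The serial number example can be sketched in code. The snippet below flags a field of a new system only when its value has never been seen in the repository and the field's estimated missing mass is below a chosen bound. It is a simplified illustration rather than Melody's actual rule: the Good-Turing estimate, the `max_alert_chance` parameter, the function names and the sample data are all hypothetical.

```python
from collections import Counter

def missing_mass(values):
    """Good-Turing estimate: share of observations whose value occurs once."""
    counts = Counter(values)
    return sum(1 for c in counts.values() if c == 1) / len(values)

def fields_to_flag(repository, system, max_alert_chance=0.05):
    """Return the fields of `system` that hold a value unseen in `repository`,
    restricted to fields where unseen values are estimated to be rare."""
    flagged = []
    for field, value in system.items():
        seen = [desc[field] for desc in repository if field in desc]
        if not seen or value in seen:
            continue  # no history for this field, or the value is already known
        if missing_mass(seen) <= max_alert_chance:
            flagged.append(field)
    return flagged

# Hypothetical repository: serial numbers are all distinct, the driver version is not.
repository = [{"serial": "SN%04d" % i, "raid_driver": "4.2"} for i in range(40)]
new_system = {"serial": "SN9999", "raid_driver": "3.1"}

print(fields_to_flag(repository, new_system))  # ['raid_driver']
```

The new serial number is ignored because every serial number in the repository is unique, while the unfamiliar driver version is flagged. No field-specific rule is written by hand, so new fields in the description are handled the same way.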
In analyzing log messages to understand a system’s typical behavior, the Research team took into account the timing of messages and their different parameters. The team developed a way to group messages into patterns, together with a hybrid graphical/textual visualization for the grouped log messages, in order to provide a succinct yet effective log summarization view (see Figure 2).
Figure 2. Melody log analysis.
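One simple way to group log messages by pattern is to mask their variable parameters and collect the messages that share the resulting template, keeping the count and time span of each group. The sketch below illustrates that idea only. The masking rule (numbers and hexadecimal literals are parameters), the function names and the sample log are assumptions, and the article does not describe Melody's actual patterning method.

```python
import re
from collections import defaultdict

# Assumed masking rule: hexadecimal literals and decimal numbers are parameters.
_PARAM = re.compile(r"0x[0-9a-fA-F]+|\d+")

def template_of(message):
    """Replace every parameter in a log message with a placeholder."""
    return _PARAM.sub("<*>", message)

def group_messages(entries):
    """Group (timestamp, message) pairs by template.
    Returns {template: (count, first_timestamp, last_timestamp)}."""
    groups = defaultdict(list)
    for timestamp, message in entries:
        groups[template_of(message)].append(timestamp)
    return {t: (len(ts), min(ts), max(ts)) for t, ts in groups.items()}

# Hypothetical event log: (seconds since start, message text).
log = [
    (100, "Disk 3 read error at sector 88121"),
    (160, "Disk 3 read error at sector 88125"),
    (220, "Disk 7 read error at sector 1024"),
    (300, "Fan 2 speed below threshold"),
]

for template, (count, first, last) in group_messages(log).items():
    print(count, first, last, template)
```

Here the three disk errors collapse into a single line with a count and a time range, which is the kind of condensed view a log summary needs.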
The algorithms developed in the Melody project can be applied to any computer system. The repository may be based on a large population of systems, or on snapshots of the same system (or set of systems) at different times. In the xSeries health center, support engineers already are using Melody to guide them to potentially relevant information, hence reducing the time it takes to handle a customer’s issue and increasing the accuracy of the resolution.
As systems become more complex, they challenge any single individual to understand every situation and possibility that may contribute to problems in a system. Fortunately, complex systems also generate large amounts of data. While no individual can fully interpret this data, data-driven methods such as data mining and machine learning give systems analysts an opportunity to turn this data into usable knowledge -- automatically.
Publications
S. Sabato and S. Shalev-Shwartz. Prediction by categorical features: generalization properties and application to feature ranking. Proceedings of the Twentieth Annual Conference on Learning Theory (COLT). 2007.
S. Sabato. Melody: Reducing Warranty Costs of xServers using Machine Learning. The Fifth Proactive Problem Prediction, Avoidance and Diagnosis Conference (P3AD). April 2007.
S. Sabato, E. Yom-Tov, A. Tsherniak and S. Rosset. Analyzing system logs: A new view of what's important. Proceedings of the Second Workshop on Computer Systems with Machine Learning (SysML). 2007.
Additional Resources
The Machine Learning and Constraint Satisfaction Group (Haifa Research Lab)
MeLoDy project page (IBM)
Last updated September 10, 2007