The learning optimizer (LEO) project uses autonomic feedback loops to create a self-tuning database query optimizer. LEO self-validates and adjusts its model to improve query optimization and execution without requiring any user interaction. The goal of LEO is to reduce the total cost of owning database management systems by simplifying database administration.
Structured query language (SQL) has emerged as the industry standard for querying relational database management systems, largely because a user need only specify what data is wanted, not the details of how to access that data. A query optimizer uses a mathematical model of query execution to automatically determine the best way to access and process any given SQL query. This model is heavily dependent upon the optimizer’s estimates for the number of rows that will result at each step of the query execution plan (QEP), especially for complex queries involving many predicates and/or operations. These estimates rely upon statistics on the database and modeling assumptions that may or may not be true for a given database.
The goal of the LEO (Learning Optimizer) project is to develop an autonomic query optimizer that automatically self-validates its model without requiring any user interaction to repair incorrect statistics or cardinality estimates. By monitoring queries as they execute, the autonomic optimizer compares the optimizer’s estimates with actuals at each step in a QEP, and then computes adjustments to its estimates that may be used during future optimizations of similar queries. Moreover, estimation errors can also trigger re-optimization of a query in mid-execution. The autonomic refinement of the optimizer’s model can result in a reduction of query execution time by orders of magnitude at negligible additional run-time cost. Database users will benefit from LEO by reduced administration time, fewer problem queries, and overall improved query performance with increased robustness and predictability of query response times.
The LEO Feedback Loop
The LEO feedback loop is comprised of four steps: monitoring, analysis, feedback and feedback exploitation. The monitoring component saves at query compilation time the cardinality estimates derived by the optimizer for the best (i.e., least-cost) plan, and during query execution saves the actual cardinalities observed for that plan. The analysis component uses the monitored information to identify modeling errors and compute corrective adjustments. This analysis is a standalone process that may be run separately from the database server, possibly on another system. The feedback component modifies the catalog statistics of the database according to the learned information. The exploitation component closes the feedback loop by using the learned knowledge in the system catalog to provide adjustments to the query optimizer’s cardinality estimates.
The four components can operate independently, but form a consecutive sequence that constitutes a continuous learning mechanism by incrementally capturing plans, monitoring their execution, analyzing the monitor output, and adjusting the optimizer's model with the learned knowledge. In addition to using query feedback to reactively adjust the model for future queries, the LEO project considers progressive optimization to re-optimize a query on the fly whenever significant modeling errors are detected, as well as proactive correlation detection through sampling to bootstrap the Learning Optimizer.
The LEO project is multi-disciplinary, as it touches on database research, statistics, information theory, data mining and artificial intelligence. Major research challenges addressed in the LEO project are:
(1) ensuring stability and convergence of the autonomic system,
(2) guaranteeing consistency of the overall optimizer's model upon refinements,
(3) carrying out efficient monitoring in shared-nothing configurations.
Related Publications
Ashraf I. Aboulnaga, Peter J. Haas, Sam Lightstone, Guy M. Lohman, Volker G. Markl, Ivan Popivanov and Vijayshankar Raman. Automated Statistics Collection in DB2 Stinger. VLDB 2004. June 2004.
Ihab F. Ilyas, Volker Markl, Peter J. Haas, Paul Brown and Ashraf Aboulnaga. Automatic relationship discovery in self-managing database systems. International Conference Autonomic Computing (ICAC '04). 2004.
Ihab Ilyas, Peter J. Haas, Volker G. Markl, Paul G. Brown and Ashraf I. Aboulnaga. CORDS: Automatic discovery of correlations and soft functional dependencies. SIGMOD 2004 . ACM, February 2004.
Ihab Ilyas, Volker G. Markl, Peter J. Haas, Paul G. Brown and Ashraf I. Aboulnaga. CORDS: Automatic Generation of Correlation Statistics in DB2. VLDB 2004. June 2004.
Volker Markl, G. M. Lohman and V. Raman. LEO: An Autonomic Query Optimizer for DB2. IBM Systems Journal Special Issue on Autonomic Computing, January 2001.
Vijayshankar Raman, Volker G. Markl, David E. Simmen, Guy M. Lohman and Mir Hamid Pirahesh. Progressive Optimization in Action. VLDB 2004. June 2004.
M. Stillger, G. M. Lohman, V. Markl and M. Kandil. LEO - DB2's LEarning Optimizer. Proc. Intl. Conf. Very Large Data Bases (VLDB). 2001.
News and Information:
Automated Statistics Profiling as the first delivery of LEO technology into DB2 will be part of DB2 Stinger.
The LEO project will be presented at VLDB 2004 in Toronto in August by one talk and two demos.
"Automated Statistics Profiling" will be featured in a talk at IDUG Europe in October in Prague, as well as at the DB2 North America Tech Confernece 2004 in Las Vegas.
Rate this article






