IBM®
Skip to main content
    Country/region change    Terms of use
 
 
 
    Home    Products    Services & solutions    Support & downloads    My account    
IBM Research

Think Research


 


Featured Concept
Chip Verification on a Large Scale

By Rowan Dordick

Novel techniques for verifying the correctness of chip designs are having a major effect within IBM


In Brief:

Finding bugs in a chip's design before it is released to manufacturing can save millions of dollars in added costs. Developments in verification technology at IBM now allow functional hardware to be produced on the initial "tape-out."
Most of us take for granted that the hardware in our computers will perform flawlessly, while we are pleasantly surprised when our applications do. The well-publicized flaw in Intel's Pentium® microprocessor was a reminder, however, that verifying the correctness of a hardware design is a challenge of a very high order. In fact, it is the most time-consuming and critical aspect of the design process, as well as an active area of research.

During the past decade, a series of advances at IBM - notably at the Haifa Research Laboratory - has revolutionized the way in which chips are verified. Indeed, Haifa's transformation from a scientific center, with a relatively limited mission, to a full-fledged research laboratory dates from the early 1980s, when it took on the challenge of verifying a new chip design (see "Early Verification at Haifa").

What is verification? On one level, it is straightforward, and we do it all the time. Reviewing a document to make sure it says what we intended to say is a form of verification. So is turning on and off an appliance, such as a television, to see that it functions as advertised. In short, the task of verification is determining whether some product conforms to its specifications and expected behavior, and, if not, why.

Although the idea may seem obvious, the actual task of verifying the design of a chip - whether it is an application-specific integrated circuit (ASIC), such as a bridge between two data buses, or an entire microprocessor - would appear to verge on the impossible. And, in a sense, it is.

The inevitability of bugs

Unlike most everyday devices whose inputs and operations are effectively predefined, very large scale integration (VLSI) chips must be able to react to a constantly changing environment. Whereas a television, for example, transforms broadcast signals at assigned wavelengths into images and sound, a VLSI chip must be able to respond appropriately to an essentially infinite number of possible inputs - whether they are signals from devices attached to a data bus or a program and its data. For any given input, there is an expected (or "correct") output that results from various components in the chip functioning according to their architectural and design specifications. With the exception of very small designs, there is no way to test all possible combinations of inputs and memory values. As a result, the challenge of verifying a design raises knotty issues of planning, selection and coverage analysis.

In the traditional approach to design verification, debugging a chip is a labor-intensive process, involving hundreds of people who try to think up and run as many tests as time allows. After a certain point, they must send the design to manufacturing - a process called tape-out - and hope for the best. Typically, the best means that multiple iterations are required before even minimally functional hardware is produced.

Such a hit-or-miss approach stretches out the entire design and verification process, resulting in costly delays in reaching the market. With each tape-out taking, on average, four months to complete, the failure to debug a chip at the design stage can consume millions of dollars in added labor and lost revenue.

Haifa's revolutionary approach to verification, therefore, is not simply a technical achievement, but one that has had an immense impact on development costs. By automating much of the verification process, the new methodology has led to the astounding ability to produce functional chips on the initial tape-out (or "RIT," as it is called in IBM - for Release Instruction Tape). While that allows designers to continue debugging their designs by running tests on the hardware, debugging at the software model of the hardware design is still preferable.

Finding the bugs

The need for verification is rooted in the very nature of the design process (see "What's in a Design"). As is true with any creative effort, chip design involves the translation of an abstract idea - namely, the kinds of functions that the chip is to perform - into successively more concrete representations. The final stage is a layout showing where every transistor and interconnection is to be placed.

Because this hierarchical, top-down process takes place in multiple stages carried out by different people, opportunities for ambiguity and error are rife. Verification is the means of keeping the process on track, by ensuring that each level of implementation contains all that was specified in the previous level, and no more.

There are two basic approaches to verifying a design. The first, called simulation, runs tests - for example, what happens when an add instruction is issued to combine the contents of registers X and Y - on the software model of the design, which is coded in a hardware description language (HDL). Because the expected result can be calculated independently, a comparison with the simulated outcome provides a means of telling whether the design is correct. The second approach, called formal verification, seeks to actually prove that the design is correct.

Random, dynamic and biased

Haifa has made pioneering contributions to both kinds of verification, beginning with simulation, for which it developed an approach called biased random test program generation (biased RTPG). Unlike traditional techniques, in which test programs are written by hand and run on the HDL design model after it is complete, RTPG generates test programs automatically and executes them dynamically.

"The significant innovation is the dynamic generation," points out Raanan Gewirtzman, manager of the systems verification group at Haifa. "It means that the process of generating a test program is carried out one instruction at a time, so that at each step you know the result of executing that instruction. That allows you, in planning the next instruction, to use a resource, such as the contents of a register, that was changed by the execution of the previous instruction. You can therefore test dependencies."

During verification, the test generation process is interleaved with the instruction generation execution, which is carried out on a C program - called a reference model - that behaves in accordance with the architectural specification. And that lets one see the expected effect of an instruction. The reference model also allows one to see whether the test program is proceeding in the right direction, thereby avoiding infinite loops and other unproductive tests. When a problem is encountered, it is possible to backtrack or insert an escape sequence, imitating the actions of a human test writer, notes Gewirtzman.

"Although the tests are generated randomly, they are biased toward things that an expert verification engineer would have tested," says Moshe Levinger, who has worked on Haifa's latest test generator, called Genesys. The biases come both from a combination of "knowledge" in the tool itself and from the user directing the tool.

To ensure that the tool adequately tests the bulk of the potential problem areas in a design, a considerable effort must be devoted to planning one's biasing strategy. But there is an inherent difficulty in evaluating the effectiveness of the biased tests. "You can think of an analogy of a wall that you want to stick darts into so that they are evenly distributed," explains Shmuel Ur, who joined Haifa two years ago to help improve its test generation methodology. "The darts are like the test programs and the absence of a means of ensuring coverage is equivalent to playing blindfolded."

Haifa's simulation technology has addressed this issue by developing a variety of biasing strategies for various kinds of functional units within a chip. These strategies balance the probability of testing all the key aspects of the design.

Already recognized as a major advance in IBM's process development, the technology is being successfully used widely within the company.

Formal verification

Haifa's new simulation methods have proved remarkably successful. Yet simulation is a compromise between breadth of coverage and the resources available - time and computer power. And there are never enough. An actual chip, such as an IBM PowerPC(TM) microprocessor, runs through more cycles in a few seconds than can be simulated in software on tens of workstations running for months at a time. Besides, given the choice, one does not want to be merely confident that a design is bug-free; one wants to be certain. Such certainty is possible in some cases. Using formal verification techniques, a VLSI design can be declared free of bugs with the same degree of certitude normally accorded to a proof in mathematics or logic.

There are actually three broad types of formal verification. Whereas one type, known as theorem proving, does mimic the structure of a mathematical proof, the type that Haifa researchers have worked on is called symbolic model-checking, which uses temporal logic to express the required timing relation between events. Yet a third type, called functional equivalence checking, has been used by researchers at the Thomas J. Watson Research Center, who have developed a scheme to verify that the transistor-level design accurately reflects the design at the HDL level (see "Verity - Truth by Proof"). Given the apparent superiority of formal verification, one might wonder why simulation remains the dominant means of verification at the HDL level. The reason is that formal methods also run into the combinatorial wall - the point at which the exponential explosion of possibilities overwhelms the available techniques.

RuleBase

Haifa's formal verification methodology - called RuleBase - grew out of an approach developed by Edmund Clarke, now at Carnegie-Mellon University (CMU) in Pittsburgh, Pennsylvania, and Ken McMillan, in his CMU Ph.D. thesis, published in 1992. As enhanced and extended by Haifa's formal verification team, led by Ilan Beer, the technique became applicable in an industrial environment, both in terms of its usability and its ability to verify larger designs.

The basic idea behind RuleBase is, first, to translate the specifications into a set of rules, each of which represents a fragment of the specification. RuleBase then checks whether the rules hold for a particular environment in which the chip might be used, that is, for particular combinations of legal inputs. The RuleBase verification program assigns a "pass" or "fail" to every rule, allowing the designer or verification team to find very subtle bugs that would be extremely unlikely to be caught by simulation.

The size problem

In developing RuleBase, Haifa researchers were faced with what is known as the size problem. It arises from the fact that symbolic model-checking requires an abstract model of the chip on which to check whether the rules hold. The model is associated with a state space and its transition relation. A state represents an assignment of binary values (zero or one) to all the inputs and one-bit memory latches (or flip-flops) in the design, and the full state space includes all the states that are reachable from a given state for a possible combination of inputs.

The transition relation is needed to find all the states that can be reached in one step, or clock cycle, from a given state. To carry out symbolic model-checking, this transition relation must be calculated and stored.

Because a chip with n latches has 2n states, a large amount - and ultimately an impossibly huge amount - of memory is needed to store the states as the number of latches increases. That requirement imposes a practical limit to the size of the chips that can be formally verified. Among Haifa's major contributions to symbolic model-checking has been the development of innovative ways to reduce the number of latches in a design that need to be considered during the verification process. "By going through a reduction step before verification, designs with as many as a 1,000 latches can now be verified; three to four times more than could be handled by the original model-checking techniques," says RuleBase team member Cindy Eisner (see "Cones of Influence and Other Reduction Techniques").

Improved usability

For each rule that is checked, there are two possible results. In the case of a failure, RuleBase produces a counter-example, showing a situation in which the rule does not hold. The counter-example is shown graphically as a waveform, representing the actual signal, which has been found to occur for a particular set of inputs. For example, if the rule specifies that a certain kind of event should not occur between two other kinds of events, then the counter-example would present the waveform of the forbidden event sandwiched between the other two.

There are dangers in assuming that a pass means nothing more is to be done. "It is possible, however, that the rule is vacuously true," explains Avner Landver, a member of the RuleBase team. "That could happen in the case of a simple rule, such as 'If A occurs, then B must occur.' If A never occurs, this rule will be satisfied, but the person who wrote the rule may not have considered that possibility." To guard against such vacuous truths, RuleBase attempts to produce a so-called witness for every true rule, which shows an instance in which both A and B occur.

Perhaps the most appealing aspect of RuleBase for users is its improved formal language for coding the rules. Called Sugar, it replaces a somewhat arcane language used in the original version with one that is both intuitive and easy to understand, as well as more concise. "We can express a rule in three lines in Sugar that would take 20 lines in the original language," says Shoham Ben-David, one of Sugar's developers.

Catching bugs on the fly

Sugar has had an even more profound effect on model-checking. As is often the case, new notations can make possible entirely new insights into a phenomenon. Sugar, says Beer, was the key to making what is known as "on-the-fly" verification workable.

On-the-fly verification interleaves the verification steps with the process of building the state space, allowing one, in the initial stages, to check the rules on a much smaller number of states and thereby vastly reduce the resources needed for verification. As soon as one finds a bug, one stops, so the rest of the state space and transition relations need never be built.

However, the technique requires a different kind of algorithm for checking the rules, and that algorithm applied to only a small percentage of the rules one needed to check. "With Sugar, we were able to find a way to apply the algorithm to almost all of the rules," says Landver, "which has made model-checking a much simpler technique."

Fewer missed bugs, fewer RITs

Haifa's innovations in verification technology have transformed the design process in IBM, resulting in higher-quality designs, shorter development cycles and substantial cost savings. But the growth in the complexity of VLSI designs and the consequent challenges for those who must verify them mean that Haifa will have to continue to enhance its technology to maintain IBM's competitive advantage.

"There is still much to be done," says Aharon Aharon, manager of the VLSI design and tools department. "The development of new methods that combine simulation and formal verification techniques, and the invention of novel coverage methods and tools, are all subjects for future research." n


What's in a Design

The design of a very large scale integration (VLSI) chip - one containing 100,000 or more transistors - is a fundamentally hierarchical process that begins with a general idea of what the chip is supposed to do. Designers must then describe the functional modules that will result in a piece of hardware able to perform the specified functions.

Each stage adds detail. So, for example, the top-level specification of a microprocessor might call for the capability of adding two 64-bit numbers. At the next level, in what is known as the micro-architecture document, the entire chip is partitioned into functional modules and additional details are provided. In the case of the adder, it might be specified that four 16-bit units are to be connected in a certain fashion, but there would be no mention of the logic gates, let alone the transistors, needed to implement these functions.

These first two stages of the design process provide the basis, or directions, for the heart of the design, in which the natural language descriptions (generally English) of what the chip is to do and how it is to do it are defined in a hardware description language (HDL). The HDL design is a software representation of the hardware at a functional level. Every function that the chip can carry out when it is supplied with appropriate electrical input signals can be carried out in software, using the HDL "model" of the chip.

The HDL design is then mapped onto a transistor-level representation of the chip, which shows the internal structure of the modules - registers, logic gates, etc. - described at the HDL level. While logic synthesis tools - such as BooleDozer, developed at the Thomas J. Watson Research Center - are available to help in this process, manual, or custom, design is still needed for microprocessors to attain the highest possible circuit density and performance.

The final stage of the design process is concerned with deciding where the functional units and their individual transistors should be placed on the chip, with the goal, for example, of minimizing the total length of interconnecting wires between them. Here, too, tools are available to help with this layout problem. Once the layout is complete, the chip is ready for manufacturing. In practice, however, the various stages overlap, and changes at one level - especially those resulting from the correction of faults found by verification - can lead to revisions before the final tape-out.

Early Verification at Haifa

One tends to suppose that great successes must have been carefully planned. In the case of the verification technology developed by the Haifa Research Laboratory it was anything but.

In 1982, Haifa was still a scientific center, devoted to solving application problems of national importance in Israel. When IBM Endicott, however, hired Intel to design a chip for the 370 PC, a small machine on which programmers could write code for mainframes in a local environment, the task was assigned to Intel's design facility in Haifa.

To ensure compliance with IBM 370 architecture, Endicott wished to independently verify the design. Having heard that there was a scientific center in Haifa, the IBM manager of the project, Lance Johnson, wondered if it might be able to help.

While there was no reason to suppose that Haifa had any expertise in verification, it turned out that one of its scientists, Israel Berger, had experience in this area. Haifa turned out to be a good choice. The results were excellent, and the chip was functional on its first pass through manufacturing.

The Intel project was followed with a microcontroller designed by IBM's facility in Essonnes, France. Later, Haifa verified chips for design groups at Burlington, Vermont, and Boca Raton, Florida. But the event that marked the true ascendancy of Haifa's star was the America project, which led to the creation of the IBM RISC System/6000®.

In order to find the best techniques in every area of chip and system design, a company-wide symposium was held in Austin, Texas. After Berger presented the Haifa methodology, John Cocke, the father of RISC technology and the force behind the project, interviewed him in private. "He had questions about our coverage - how we could ensure that certain kinds of problems could be found - and I stressed to him the role of our biasing technology," recalls Berger.

With Cocke convinced of the superiority of Haifa's approach, it quickly got the nod to handle the verification. It was the first state-of-the-art microprocessor that they had verified and led to yet further development of Haifa's biased random test program generator (biased RTPG).

The results of the RS/6000 verification were extraordinary. "It was the first time that a chip of that size proved functional on the initial RIT," says Aharon Aharon, who was hired by Berger while working on his master's thesis to evaluate the verification methodology being applied to the Intel design. "Since then," adds Aharon, "most of IBM has adopted Haifa's approach."

Meanwhile, biased RTPG has continued to evolve, first into a model-based version, MBTG, which added the capability of using a single test generator to produce test programs for many different architectures, and, finally, into its current form, known as Genesys. A related tool, called Genie, is used for verifying the dependencies between processors in multiprocessor systems. And related work is being done on the even more complex task of verifying the design of entire computer systems with a tool called SysGen.


Cones of Influence and Other Reduction

The ability to verify designs with hundreds of latches is essential if model-checking is to be used to verify actual hardware - mostly ASICs, such as bus bridges, cache controllers and so on. Haifa researchers have come up with a number of techniques that reduce the resources needed to run RuleBase.

One reduces the number of latches that need to be examined by determining a "cone of influence" that contains only the latches that would be involved in reaching a particular logical output. Another finds latches that, because of their logical relationship, always have the same value. In such cases, they can be collapsed and treated as a single latch. Still another technique, called constant propagation, finds circuit elements whose value remains unchanged for a given set of inputs. For example, a two-input OR gate always has the output "1" if at least one input is "1," regardless of the other.

Such reduction techniques, which do not affect the generality of the verification process, have extended the range of model-checking to larger chips, and helped transform the technique from an academic research subject into an industrially useful approach to verification.


Verity - Truth by Proof

Haifa's verification techniques are aimed at verifying the so-called register-transfer-level (RTL) representation of a design. For many kinds of chips, that RTL description can serve as the input to a logic synthesis tool, which then automatically produces a correct transistor-level implementation. But to achieve the highest possible performance in a microprocessor, manual design is preferable for large fractions of the design. That introduces the possibility of error, so it is necessary to verify that the RTL and the transistor-level representations of the design are functionally equivalent.

Verity is a formal verification tool developed at the Thomas J. Watson Research Center by a team led by Andreas Kuehlmann. Meant for verifying large, custom CMOS designs, such as microprocessors, it uses symbolic techniques to verify the functional equivalence of the two design representations. This implicitly proves the functional equivalence of the two levels of the design for all possible input patterns.

Verity is used widely throughout IBM, and it is an essential part of the design methodology of the IBM PowerPC microprocessors and of the CMOS microprocessors for the System/390®. "One of its unique virtues," notes Kuehlmann, "is the speed at which it operates, allowing designers to use it in an interactive mode. They can try a particular circuit implementation, then press the Verity button and find out within minutes, or even seconds, whether it's equivalent with the RTL-specification."





    About IBMPrivacyContact