By Bob Payne,
Vice President of Strategic Technology
Background: The Systems-on-a-Chip Revolution
Current-generation production silicon process technology can support deployment of cost-effective designs in the multimillion-gate range. Even at a conservative 50-percent silicon utilization rate, VLSI's VSC8 0.35-micron process generation is capable of one million gates per square centimeter (gates/cm²) of silicon die area. VLSI's VSC9 0.25-micron process achieves two million gates/cm², and VLSI's VSC10 0.2-micron process increases this to 2.7 million gates/cm². We expect that our forthcoming 0.15-micron process will be capable of integrating more than five million gates/cm², or roughly 20 million transistors on a piece of silicon the size of a fingernail.
These are not inflated claims. They are based on how many logic transistors VLSI can build into a production-grade, economically optimal integrated circuit in our San Antonio, Texas production fab line. These extremely high levels of integration make it possible to reduce what used to be huge electronic systems to a single silicon chip.
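As a rough check on the density figures above, the following Python sketch tabulates the quoted gates/cm² for each process and converts gates to transistors. The factor of four transistors per gate is an assumption inferred from the pairing of five million gates with roughly 20 million transistors; the dictionary and function names are illustrative only.

```python
# Gate densities quoted in the text, in gates per square centimeter
# (already at the 50-percent utilization rate the text assumes).
GATES_PER_CM2 = {
    "VSC8 (0.35 micron)": 1_000_000,
    "VSC9 (0.25 micron)": 2_000_000,
    "VSC10 (0.2 micron)": 2_700_000,
    "0.15 micron (forthcoming)": 5_000_000,  # the text says "more than" this
}

# Assumption: four transistors per gate, inferred from the text's
# "five million gates ... roughly 20 million transistors".
TRANSISTORS_PER_GATE = 4


def estimate_transistors(process: str, die_area_cm2: float) -> int:
    """Estimate the transistor count for a die of the given area."""
    gates = GATES_PER_CM2[process] * die_area_cm2
    return int(gates * TRANSISTORS_PER_GATE)


# Print the estimate for a one-square-centimeter die on each process.
for process in GATES_PER_CM2:
    print(process, estimate_transistors(process, die_area_cm2=1.0))
```

A fingernail-sized die is taken here as one square centimeter, which is also an assumption.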
Physical size reduction on this scale also equates to radical cost reductions. Already, the equivalent of all the computing power deployed by NASA to design, build and control the Apollo moon missions can be obtained in the form of a laptop computer costing a couple of thousand dollars, retail. Imagine what happens a few years from now when the computing power of two, four or even eight such 1998-model laptops can be distilled onto a silicon chip costing between $20 and $50.
The problem we face today, however, is that while we can inexpensively manufacture almost unbelievably sophisticated ICs, the processes and tools we use to design these circuits have not kept pace with advances in manufacturing technology. "You can build it, but you can't design it!" has become a cry echoing all across the semiconductor industry. The gap between design productivity and manufacturing capability is throttling the industry's ability to deliver the full potential of advanced custom silicon technologies.
The Design Productivity Gap: Paradigm Shift Ahead!
While it is true that design productivity using traditional techniques is improving at a compound annual growth rate of roughly 25 percent, the problem is that the semiconductor industry continues to find ways to cram more transistors into smaller silicon areas at rates faster than predicted by Moore's Law. Moore's Law projects transistor counts per unit of silicon area to double every 18 months, equating to a compound annual growth rate of 59 percent. As you can see, this 59 percent growth in manufacturing potential overwhelms the 25 percent growth in design productivity.
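The growth-rate arithmetic in this paragraph can be checked with a short Python sketch. It converts the 18-month doubling period into a compound annual growth rate and then shows how quickly the gap widens. The function names are illustrative, and the starting point (manufacturing capacity and design productivity at parity in year zero) is an assumption made only for the sake of the comparison.

```python
def annual_growth_from_doubling(months_to_double: float) -> float:
    """Compound annual growth rate implied by doubling every N months."""
    return 2 ** (12.0 / months_to_double) - 1.0


def gap_after(years: float, capacity_cagr: float, productivity_cagr: float) -> float:
    """Ratio of manufacturable gates to designable gates after `years`.

    Assumes both quantities start equal (ratio 1.0) in year zero.
    """
    return ((1.0 + capacity_cagr) / (1.0 + productivity_cagr)) ** years


moore_cagr = annual_growth_from_doubling(18)   # about 0.59, as quoted in the text
design_cagr = 0.25                             # design productivity growth from the text

print(f"Moore's Law CAGR: {moore_cagr:.1%}")
for years in (1, 3, 5):
    print(f"gap after {years} year(s): {gap_after(years, moore_cagr, design_cagr):.2f}x")
```

Under these assumptions the gap slightly more than doubles every three years, which is why incremental gains in design productivity cannot catch up.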
In the early 1990s some technology analysts projected that the semiconductor industry would fall short of the 59 percent Moore's Law growth rate. Ironically, not only has the semiconductor industry maintained Moore's Law-caliber growth rates, it has exceeded them! In addition to smaller process geometries enabled through fundamental improvements in photolithography, the industry has deployed a host of techniques to boost circuit density beyond the expectations of industry roadmaps of just a few years ago. At the same time that we've been shrinking photolithography resolutions, we've added more layers of interconnect metal, improved the compactness of our circuit libraries and learned how to lay out chips more efficiently. These additional improvements in IC density have pushed the growth rate of transistor counts per unit of silicon area well beyond Moore's Law expectations.
The point here is that incremental improvements in design productivity will not close the gap. Bridging the gap will require a complete paradigm shift in the way we design and implement high-gate-count custom ICs. The concept of design paradigm, or design style, extends beyond considerations of "tools" or "methodology." Design styles combine tools, technologies and methodologies to provide a set of simplifying assumptions that make it possible to organize and perform a task, in this case chip design.
As the design productivity gap continues to widen, the economic benefit of finding ways to bridge the gap increases, creating a powerful incentive for companies to offer innovative approaches to solving the problem. Companies that can successfully align development productivity with manufacturing capabilities will prosper. System design teams that establish a productivity advantage over their competitors will be able to develop end products that enjoy overwhelming advantages in the marketplace. There is a pot of gold at the end of the paradigm shift rainbow. Likewise, there is a graveyard awaiting companies that don't adapt to successful new paradigms.
Preliminary Attempts to Bridge the Gap
The custom silicon and EDA industries clearly recognize that the productivity gap exists, and bridging it rates as the top priority for silicon makers and tool vendors alike. While this has led to much ferment within the industry, no one has yet come up with a comprehensive, fully satisfactory approach to resolving design productivity issues. Each class of players concerned with IC design proposes a different solution to the problem. Independent IP vendors say that their wares will convert chip design from stringing together individual gates and flops to a plug-and-play process based on interchangeable circuit block components. EDA companies advocate better EDA tools. Chip makers seem to have also lost sight of the fact that their customers are concerned with building complete end products that integrate hardware and software, not just chips.
In our view, these perspectives do not lead to solving the productivity gap, as they fail to address three major productivity concerns facing the systems-on-chip design community: the compounding of risks inherent in plugging multiply-sourced IP into a complex design; the difficulties of using software to model complex hardware; and the need to institute truly integrated approaches to designing both hardware and software.
Compound Risks
The multiply-sourced IP solution's main flaw is that it fails to address the compounding of risks generated by piecing together a design from a large number of blocks emanating from multiple sources. Today's independently developed blocks typically range from roughly 5,000 to 20,000 gates. By comparison, the ARM processor core, the centerpiece of many system-on-a-chip custom ICs, is only a little over 20K gates. Assuming that the average block contains roughly 10,000 gates, it takes 100 blocks to make a million-gate chip.
Integrating a building block into a design is not yet a fool-proof, risk-free process. It requires significant effort to design the interfaces between blocks. A mistake or oversight in glue logic design can, as we say in the business, ruin your whole day.
Mixing IP from multiple providers, each with different ideas about bus protocols and modeling strategies, offers a very low probability of first-pass success. The problem here is not so much flaky IP designs as the need to "hand craft" interfaces between building blocks that lack consistent standards.
Solving this problem requires minimizing the number of unproven, non-standard interfaces. This can be accomplished in several ways. First, use much higher gate-count blocks that may already integrate a number of previously separate functions. You stand a better chance of success building a million-gate chip from three or four 200,000- to 300,000-gate blocks than from one hundred 10,000-gate blocks. Second, ensure that blocks have standardized interfaces. This implies that these blocks should have something standard to interface with, like an on-chip bus. Third, and most importantly, be sure that any blocks you design with are proven in real-world silicon, both in their own right and when connected to other blocks.
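The compounding effect described in this section can be illustrated with a small Python sketch. It is a simplified model, not a measured result: it assumes one hand-crafted interface per block, assumes each interface works on the first pass independently of the others, and uses a hypothetical 99 percent per-interface success rate. The function names are illustrative.

```python
def blocks_needed(total_gates: int, gates_per_block: int) -> int:
    """Number of blocks required to reach the target gate count (rounded up)."""
    return -(-total_gates // gates_per_block)  # ceiling division


def first_pass_probability(num_interfaces: int, p_interface_ok: float) -> float:
    """Probability that every interface works, assuming independent failures."""
    return p_interface_ok ** num_interfaces


P_INTERFACE_OK = 0.99  # hypothetical per-interface success rate, not from the text

# A million-gate chip built from small blocks versus large blocks.
for gates_per_block in (10_000, 250_000):
    n = blocks_needed(1_000_000, gates_per_block)
    p = first_pass_probability(n, P_INTERFACE_OK)
    print(f"{n:3d} blocks of {gates_per_block:,} gates: first-pass probability {p:.2f}")
```

Even with an optimistic per-interface success rate, one hundred independent interfaces leave the chip more likely to fail than to succeed, while four interfaces leave the odds close to the per-interface figure.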
Software Simulation Paradox
One of the main weaknesses of current design styles appears when engineers attempt to simulate how a chip design will behave when cast into real-world silicon. The table below shows how much time each verification technology needs to reproduce one hour of real-world hardware operation. The first three rows (silicon reference design, FPGA and hardware emulator) are hardware-based verification methods; the remaining rows are software-based.
| Cycle Rate | Debug (Hrs) | Debug Time | Technology |
|---|---|---|---|
| 1 | 1 | 1 Hour | Silicon Reference Design |
| 10^-1 | 10 | ~1 Working Day | FPGA |
| 10^-2 | 100 | 4 Days | HW Emulator |
| 10^-3 | 1,000 | 1.4 Months | Throughput Model |
| 10^-4 | 10,000 | 1.2 Years | Transaction Model |
| 10^-5 | 100,000 | ~12 Years | Cycle Accurate Sim Model |
| 10^-6 | 1,000,000 | >1 Lifetime | RTL Model |
| 10^-7 | 10,000,000 | ~1 Millennium | Gate Level Model |
A prototype design implemented with field programmable gate arrays will typically run ten times as slow as the final application. This is still pretty good performance as far as simulation performance is concerned. Moving down a row of the chart, today's hardware emulators run approximately 100 times slower than the real silicon. Cycle rates slow down even further as one crosses the line from hardware-based modeling to software-based simulation verification methods.
Software-based throughput models can be constructed to evaluate application and operating system execution rate. This kind of model allows designers to tune the architecture, looking for bottlenecks where hardware blocks may be either swamped or starved with data. Transaction models simulate hardware at a bus functional execution rate, but run 10,000 times slower than real-world hardware. The cycle-accurate abstraction level is typically simulated with a C-language model that is optimized to give decent speeds, but still runs 100,000 times slower than the final application. RTL (Register Transfer Level) models are typically implemented in VHDL or Verilog and run a million times slower than the real thing, particularly when modeling a multimillion-gate design.
Gate level simulation, which forms the bedrock of the conventional ASIC design process, is at best capable of achieving a few tens of clock cycles per second. With this performance, it would take more than a thousand years to achieve the equivalent of one hour of real-time operation of a 100-Megahertz custom IC! After several months of 24-hour-a-day gate level simulation, the HW team will have accomplished less than one half second of real-world operation. Would you like to fly on an airplane whose control system hardware has been tested for the first half-second of its working life?
It is no wonder that we are finding that nearly half the deep submicron designs require iteration ("spins") despite efforts to faithfully implement the original design specifications. It is clear that current gate level and RTL level simulations, at best, only scratch the surface of real-world hardware behavior. From this analysis, it follows that some form of hardware-based modeling is required if designers want systems-on-a-chip design end products to conform to their original ambiguous specifications.
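The arithmetic behind the table and the gate-level estimate in this section can be reproduced with a short Python sketch. The slowdown factors are taken from the table; the figure of 10 simulated cycles per second is an assumption chosen as the low end of "a few tens," and the function and constant names are illustrative.

```python
SECONDS_PER_HOUR = 3600
HOURS_PER_YEAR = 24 * 365

# Slowdown relative to real silicon, as listed in the table.
SLOWDOWN = {
    "Silicon Reference Design": 1,
    "FPGA": 10,
    "HW Emulator": 100,
    "Throughput Model": 1_000,
    "Transaction Model": 10_000,
    "Cycle Accurate Sim Model": 100_000,
    "RTL Model": 1_000_000,
    "Gate Level Model": 10_000_000,
}


def wall_clock_hours(real_hours: float, slowdown: float) -> float:
    """Hours of verification needed to cover `real_hours` of real operation."""
    return real_hours * slowdown


def simulation_years(clock_hz: float, real_seconds: float,
                     sim_cycles_per_second: float) -> float:
    """Years of simulation needed to cover `real_seconds` of real operation."""
    cycles = clock_hz * real_seconds
    sim_seconds = cycles / sim_cycles_per_second
    return sim_seconds / SECONDS_PER_HOUR / HOURS_PER_YEAR


for technology, factor in SLOWDOWN.items():
    print(f"{technology}: {wall_clock_hours(1, factor):,.0f} hours per real hour")

# One hour of a 100 MHz chip at an assumed 10 simulated cycles per second.
years = simulation_years(clock_hz=100e6, real_seconds=3600, sim_cycles_per_second=10)
print(f"gate-level simulation: about {years:,.0f} years")
```

At 10 simulated cycles per second, a 100 MHz design is slowed by a factor of ten million, which matches the last row of the table.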
Hardware-Software Co-Development Challenge
Although there has been much discussion of hardware-software co-design, co-simulation and co-development, this remains a major weakness of the current system development paradigm. Currently, software development begins after hardware design. This leads to an unnecessarily extended (six months of hardware design followed by six months of software development) system development process. It also leads to suboptimal application performance when hardware development occurs without sufficient consideration for the software that will run on it. Forcing software development to wait until after hardware development concludes just doesn't cut it in today's time-to-market pressure cooker environment.
To achieve parallel co-development of hardware and software, the software team must have some kind of high-fidelity working model of application hardware on which they can execute, debug and test software at the same time hardware development proceeds. We believe that there is no substitute for hardware-based system modeling to achieve true, parallel software-hardware co-design. This statement is likely to be controversial, as there are a number of software-based simulation products that claim to support hardware-software co-development using a mixture of cycle accurate simulation models and RTL models. The problem is that these tools cannot support the multimillion-gate, 100-MHz-class systems-on-a-chip hardware enabled by current silicon manufacturing technologies. Today's software simulation tools are more attuned to designs in the 200K gates / 20 MHz clock rate class. When these tools tackle faster, more complex circuits, let alone the million-gate, 100 MHz circuits currently in the process technology sweet spot, they quickly run out of steam. The sooner the software team can begin working with a model of end product hardware that qualifies as a fully functional target system in an application software cross-development environment, the better.
Rapid Silicon Prototyping: Silicon Returns to Silicon Design
Having reviewed the current state of approaches to bridging the design productivity gap, this paper will present the main points of a new design style, developed at VLSI, called Rapid Silicon Prototyping.
Design styles build on the foundations of their predecessors. Industry has progressed from the hand-packed design style of the 1970s to the ASIC design style of the 1980s and then to the software synthesis-based design style of the 1990s. We still hand pack cell layouts and mixed signal functions to create ASIC objects. ASIC objects are processed with synthesis tools. Rapid Silicon Prototyping builds on top of today's software/synthesis-based design style, but offers a new set of simplifying assumptions that make multimillion-gate design practical.
Deconfigurable and Extensible Reference Designs
The essence of Rapid Silicon Prototyping is that the best way to design a custom chip is to start with an existing chip, in this case a deconfigurable and externally extensible reference design created specifically to serve as a development lab prototype for custom end products. By deconfigurable we mean that prototype ICs will typically contain far more features and internal blocks than most customers will require for a production-grade end product. In deconfiguring a prototype chip in the design process, engineers can "edit out" features that will not be needed in the production version of the IC.
Low-risk deconfiguration is enabled by two pillars of deconfigurable reference design architecture: populating the reference chip with proven, reusable on-chip IP building blocks, and standardized on-chip buses. Every reusable building block on the chip exists as a fully documented HDL description that can be compiled and integrated into a manufacturable IC. Not only do the building blocks exist as HDL descriptions, but their appearance on the reference IC tells designers something very important: they work in real-world silicon.
On-Chip Buses
Linking the blocks through standardized on-chip buses extends the "design reuse" concept to cover interconnect. Each block has a standardized interface that plugs into an on-chip databus, largely eliminating the time and risks involved in designing custom glue logic for each block. The buses themselves have been selected for their robustness and ability to keep the chip running even if blocks are bypassed or removed from the design. Through on-chip buses, the riskiest details of the custom IC integration process have become, in essence, "pre-sweated."
On-chip buses have another advantage. They can be extended off-chip to connect with other bus-compatible peripherals and functional blocks. The off-chip bus extensions lead to standardized bus card slots mounted on a circuit board. This makes the prototype chip extensible by enabling the development team to connect other chips (either free-standing ICs containing requisite IP or FPGAs programmed to implement an IP hardware description) to the prototype chip via board-level extensions to the reference design's on-chip buses. To create a single-chip production version of the chip- and board-level prototype, the design team runs the hardware descriptions of the prototype chip's desired building blocks and on-chip bus structures, plus the IP from the plugged-in external chips and FPGAs, through a standard EDA tool flow to compile a manufacturing netlist.
VLSI's on-chip bus strategy focuses on three open industry bus architectures: ARM's AMBA System Bus (ASB), an extended version of the ARM Peripheral Bus called the VLSI Peripheral Bus (VPB), and the industry-standard PCI bus. Here VLSI is adapting existing standards to provide a standard on-chip interconnection architecture.
Board-Level Prototyping Platforms
In addition to the prototyping reference design chip and bus slots, the prototyping board will also contain provisions for several kinds of memory (RAM, EPROM, FLASH, etc.), communications ports, infrared communications and on-chip debugging connections (JTAG, ICE, etc.). These give the development team ample resources to run application software on the prototyping platform and full debugging visibility into hardware and software operations.
The illustration above shows how the deconfiguration/extension process works. The shaded area of the left hand block diagram shows reusable blocks integrated into a reference design chip. The blocks in the clear area are blocks connected to the reference chip on an external (board-level) extension to the on-chip bus. The production design, shown in the shaded area to the right, deletes unneeded blocks from the reference prototype and integrates reused and modified blocks from the prototype chip plus additional IP blocks that were plugged into the prototyping board's bus slots.
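The deconfiguration and extension step described here amounts to simple set arithmetic over block names: start from the reference chip, delete what is not needed, and add what was plugged into the board's bus slots. The Python sketch below illustrates only that bookkeeping. The block names and the function are hypothetical and do not correspond to any actual VLSI reference design or tool.

```python
from typing import Iterable, Set


def derive_production_blocks(reference_blocks: Iterable[str],
                             unneeded: Iterable[str],
                             external_blocks: Iterable[str]) -> Set[str]:
    """Return the block list for the production chip.

    reference_blocks: blocks integrated on the reference prototype chip
    unneeded:         reference blocks to "edit out" (deconfiguration)
    external_blocks:  IP attached via board-level bus slots (extension)
    """
    reference = set(reference_blocks)
    to_remove = set(unneeded)
    unknown = to_remove - reference
    if unknown:
        # Removing a block that was never on the reference chip is a mistake.
        raise ValueError(f"not on the reference chip: {sorted(unknown)}")
    return (reference - to_remove) | set(external_blocks)


# Hypothetical block names, for illustration only.
production = derive_production_blocks(
    reference_blocks={"cpu_core", "uart", "usb", "dma"},
    unneeded={"usb"},
    external_blocks={"customer_dsp"},
)
print(sorted(production))
```

In a real flow each name would stand for an HDL description fed to the EDA tool chain; the sketch covers only the selection of blocks, not their integration.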
Enabling Hardware-Software Co-Design
Not only does this greatly accelerate and lower the risks of custom IC design, it opens the door to full hardware-software co-design. The prototyping boards are real-world models of the final application. Software developers can use the prototyping boards as target systems for cross development at the same time hardware developers work out the details of what will appear on the production chip.
All prototype reference design chips offered or planned for introduction by VLSI are built around the ARM RISC embedded processor architecture. The ARM architecture is widely supported by leading commercial embedded real-time operating systems as well as VLSI's own JumpStart™ development tool and run-time software.
The above diagram illustrates how the Rapid Silicon Prototyping process comes together in the development lab. The development team has a wide array of options to add and subtract IP from a design, test and debug it, and arrive quickly at a final design for fabbing into first silicon with high confidence that it will work as expected on the first pass. Meanwhile, application software developers can use the prototyping environment as a target system to write, debug and test software.
Rapid Silicon Prototyping Benefits and Roadmap
Rapid Silicon Prototyping shows great promise of bridging the design productivity gap that throttles progress towards true system-on-a-chip custom IC designs. It addresses the three main stumbling blocks of current IP-oriented design methodologies: the compounding risk dilemma, the software simulation paradox, and the hardware-software co-design challenge.
Rapid Silicon Prototyping provides custom IC customers with both a significantly accelerated time-to-market and the ability to tackle much higher transistor count designs with high confidence that they will emerge to manufacturing on time and on budget. In general, we believe that VLSI's initial implementation of a Rapid Silicon Prototyping design style, embodied in the VLSI Velocity™ VRSP7 development system, can cut the complete design cycle for products derived from it by 50 percent. This doubling of design productivity can be used in several ways:
- 50 percent design cycle reductions for chips integrating more than a million gates.
- Doubling the number of available gates designable in a given period of time.
- Increasing the probability of first-pass success when first silicon comes back from the fab.
The first installment of the Velocity design style, the VRSP7 prototyping custom IC, represents a straightforward integration of a high performance ARM 7 RISC core with a collection of widely used data communications interfaces, two on-chip buses and on-chip debugging ports. The complete VRSP7 board level platform includes this chip with board level bus slots, various kinds of system memory, an FPGA, and communications ports. Initial applications for this platform target systems-on-a-chip microcontrollers for computer peripherals and networking equipment.
As the Velocity capability evolves, it becomes possible to create myriad instances of development platforms targeting specific application areas and technical milestones. We envision, for example, prototyping platforms for digital wireless products that would bring together collections of proven IP building blocks used in these applications. Prototyping platforms also provide a vehicle for proving and developing products based on advancing technologies. Here we project prototyping vehicles for advanced versions of ARM processors, DSP cores, mixed RISC-DSP chips and so on. We can even create customer-specific prototyping vehicles, targeting individual customer needs and mixing VLSI and customer-owned IP on a prototyping chip.
The prototyping platform concept adds a measure of "compound interest" to VLSI's IP development programs. The more designs we prototype, the more IP we prove in the real world and can reuse in other designs. This will greatly accelerate our efforts to stock our shelves with proven, reusable IP targeting specific market areas and applications. This gives VLSI a tremendous marketplace edge, but only because it improves our customers' ability to get innovative, market-winning products out the door faster than ever before.
One thing Rapid Silicon Prototyping does not do is make the current standard cell design style obsolete. It builds on the success of current cell-based design technologies to boost development team productivity. A good cell-based design flow remains vital to the process of converting language-based descriptions of custom IC hardware into verified, efficiently laid-out, power- and timing-optimized, self-testing manufacturable silicon. Rapid Silicon Prototyping helps EDA tools operate more efficiently and increases their value by freeing them from a task that no software can be expected to perform: abstractly modeling a very complex arrangement of millions of dynamic hardware elements with 100 percent fidelity.
Conclusion: Rapid Silicon Prototyping at the Dawn of the Custom Silicon Era
In the final analysis, Rapid Silicon Prototyping promises to realize the full potential of system-on-a-chip technologies and ring in a new era for the semiconductor industry. Converting multimillion-gate custom IC design from an expensive, risky, time-consuming ordeal into an efficient, high-confidence, rapid-turnaround exercise has profound effects on the economic position of high-gate-count custom silicon. Our products should be just as easy to design as they are inexpensive to manufacture. More product developers will turn to custom solutions, and consumers will gain the benefits of new generations of faster, cheaper, and altogether niftier end products.
Bob Payne is vice president of Strategic Technology at VLSI. Bob is celebrating his 30th anniversary in the silicon business, having begun his career in 1968 designing logic ICs.