|
|
|
|
|
|
A configurable and extensible processor. |
|
System designers can optimize Xtensa for
embedded applications by 1) selecting and sizing features and 2) adding new
instructions. |
|
Provides easy customization for both hardware
and software. |
|
Process is simple, fast, and robust. |
|
|
|
|
|
|
Previously, processor designs were costly,
fixed solutions. |
|
It was not possible to modify these cores for
a particular application. |
|
Xtensa lets the system designer select and size
only the features required for a given application. |
|
The designer may define new system-specific
instructions if preexisting features don’t provide the required
functionality. |
|
|
|
|
Xtensa fits easily into the standard ASIC design
flow. |
|
Xtensa is fully synthesizable, so designers can use
popular physical-design tools during place and route. |
|
|
|
|
Recent research focuses on automatic instruction
set design, or reconfigurable/retargetable processors. |
|
These groups implement automatic instruction set
design by systematically analyzing a benchmark program to derive an
entirely new instruction set for a given microarchitecture. |
|
Tensilica focuses on how to generate a
high-performance, low-power implementation of a given microarchitecture
with application-specific extensions. |
|
|
|
|
Reconfigurable processors couple a general-purpose
compute engine with a large amount of hardware-programmable logic. (In the
extreme case, they consist entirely of hardware-programmable logic.) |
|
Compared to a non-configurable processor, a
reconfigurable processor can be an order of magnitude slower. |
|
|
|
|
A compromise by Razdan and Smith is to use a
custom-designed, high-performance processor with a small amount of
hardware-programmable logic. |
|
Compiler-generated information is used to
dynamically reconfigure the hardware-programmable logic. |
|
However, differences in operating frequency between
the programmable and non-programmable hardware require the system to be
simple or deeply pipelined. |
|
|
|
|
The Tensilica processor generator adds application-specific
functionality at the time the hardware is designed, eliminating
the need for programmable logic. |
|
However, this also prevents the designer
from modifying the extensions for
different applications. |
|
Another approach is to add
coprocessors for application-specific
functionality, but this increases communication overhead. |
|
|
|
|
|
|
Yet another approach is to modify the
processor at the RTL level. |
|
But this solution is fixed: any future
change to the extension requires modifying
the RTL again. (Tedious.) |
|
|
|
|
|
|
Tensilica uses a high-level language, TIE
(Tensilica Instruction Extension), to express processor extensions. |
|
TIE can add new functionality to the RTL description
and automatically extend the software tools.
(Allows C/C++.) |
|
No communication overhead since extensions are
integral parts of the processor. |
|
|
|
|
|
Processors used to be custom designed. |
|
Sophisticated circuit structures. |
|
Efficient (can implement translation look-aside buffers (TLBs) and
specialized RAM). |
|
High frequency (700 to 1000 MHz). |
|
However, they require a lot of development time, and
in most cases the design is not efficient. |
|
Not suited for embedded systems: |
|
Use different CAD tools than the rest of the
system. |
|
Hard to modify to better match the application. |
|
|
|
|
|
|
|
|
Arrival of Synthesizable Processors. |
|
Although they cannot match the raw frequency of custom-designed
processors, configurability and extensibility more than compensate
for the difference in maximum operating frequency. |
|
Easier to integrate into large ASICs. Matches design flow. Quickly manufactured. |
|
Enables configuration and extension by designer. |
|
|
|
|
|
|
|
Xtensa ISA (Instruction Set Architecture)
enables configurability, minimizes code size, reduces power dissipation,
and maximizes performance. |
|
Base ISA defines approximately 80 instructions
(superset of traditional 32-bit RISC instruction sets). |
|
Achieves smaller code size through
denser encoding and register windows. |
|
The compiler uses smaller instructions for the most
common operations. |
|
Register window eliminates register saves and
restores at entry and exit of subroutines. |
|
24-bit and 16-bit instruction formats. |
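
The register-window point above can be illustrated with a minimal C sketch. It is not the Xtensa mechanism itself: the sizes (64 physical registers, 16 visible, a slide of 8 per call) and every function name are assumptions made for illustration, and window overflow (the window wrapping onto live registers) is deliberately ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes, chosen for illustration only. */
#define PHYS_REGS   64   /* physical registers in the file */
#define WINDOW_SIZE 16   /* registers visible to the running routine */
#define ROTATE_BY    8   /* how far one call slides the window */

typedef struct {
    uint32_t phys[PHYS_REGS];
    unsigned base;        /* physical index of visible register 0 */
} RegFile;

/* Map a visible register number (0..15) to a physical index, with wraparound. */
unsigned phys_index(const RegFile *rf, unsigned visible)
{
    assert(visible < WINDOW_SIZE);
    return (rf->base + visible) % PHYS_REGS;
}

uint32_t reg_read(const RegFile *rf, unsigned visible)
{
    return rf->phys[phys_index(rf, visible)];
}

void reg_write(RegFile *rf, unsigned visible, uint32_t value)
{
    rf->phys[phys_index(rf, visible)] = value;
}

/* Subroutine entry: slide the window instead of storing registers to memory. */
void window_call(RegFile *rf)
{
    rf->base = (rf->base + ROTATE_BY) % PHYS_REGS;
}

/* Subroutine exit: slide the window back; the caller's registers are untouched. */
void window_return(RegFile *rf)
{
    rf->base = (rf->base + PHYS_REGS - ROTATE_BY) % PHYS_REGS;
}

/* The caller's register 2 survives a call without any save or restore. */
uint32_t window_demo(void)
{
    RegFile rf = { {0}, 0 };
    reg_write(&rf, 2, 111);          /* caller's local value */
    reg_write(&rf, 10, 5);           /* caller's register 10 carries an argument */
    window_call(&rf);
    assert(reg_read(&rf, 2) == 5);   /* callee sees the argument as its register 2 */
    reg_write(&rf, 3, 999);          /* callee scribbles on its own registers */
    window_return(&rf);
    return reg_read(&rf, 2);         /* still the caller's original value */
}
```

Calling `window_demo()` returns the caller's original value, since the callee only ever touched registers outside the caller's lower eight.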
|
|
|
|
|
|
The first Xtensa implementation uses a traditional
five-stage RISC pipeline. |
|
The processor accesses the instruction cache and tags in
the first half of the I stage and computes the
cache hit/miss signal in the second half. |
|
The instruction is decoded and the register file is
accessed in the R stage. |
|
In the E stage, the machine computes the effective address for loads and stores, executes ALU
instructions, and determines whether a conditional branch is taken. |
|
For loads, the processor accesses the data cache in
the first half of the M stage and computes the cache hit/miss signal in the second
half. |
|
Register file is updated in W stage. |
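
As a rough illustration of how instructions overlap in the I, R, E, M, and W stages described above, the following C sketch reports which stage an instruction occupies in a given cycle. It assumes an idealized pipeline in which one instruction enters the I stage per cycle and nothing ever stalls; the function names are hypothetical.

```c
/* Stage letters as named in the text. */
static const char STAGES[] = { 'I', 'R', 'E', 'M', 'W' };
enum { NUM_STAGES = 5 };

/* Stage occupied by instruction `instr` (0-based issue order) in `cycle`
   (0-based), assuming one issue per cycle and no stalls.
   Returns '-' when the instruction is not in the pipeline in that cycle. */
char stage_at(unsigned instr, unsigned cycle)
{
    if (cycle < instr)
        return '-';                    /* not fetched yet */
    unsigned s = cycle - instr;
    return s < NUM_STAGES ? STAGES[s] : '-';   /* '-' once it has left W */
}

/* Cycles needed to complete n instructions under the same assumption. */
unsigned total_cycles(unsigned n)
{
    return n == 0 ? 0 : n + NUM_STAGES - 1;
}
```

Under these assumptions, in cycle 4 the first instruction is in W while the third is in E, and ten instructions complete in fourteen cycles.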
|
|
|
|
|
|
|
|
|
|
|
|
Configuration process starts at Tensilica
website (www.tensilica.com). |
|
Generation process produces the processor’s
configured RTL description and its configured software development
tools. (ANSI C/C++ compiler,
linker, assembler, debugger, code profiler, and instruction set simulator.) |
|
After generation process, user can measure
performance or start hardware synthesis using the RTL description. |
|
The profiler can be used to identify bottlenecks
in application performance. |
|
Designer can also map the RTL description to
gate-level netlist using industry standard synthesis tools. |
|
|
|
|
|
|
Tensilica Instruction Extension is a language
that lets designers incorporate application-specific functionality in the
processor by adding new instructions. |
|
TIE lets the designer specify the mnemonic, the
encoding, and the semantics of single-cycle instructions. |
|
Designer can also use TIE to declare new
processor state (state register). |
|
Instructions with similar operands can be
grouped into classes, which can contain one or more instructions. |
|
The semantics of an instruction are described
using a subset of Verilog. |
|
|
|
|
User state registers can hold intermediate
values or control information. |
|
Declaring them makes the TIE compiler automatically
generate a library that an RTOS can use to save and restore processor
state. |
|
User state registers are accessible via predefined
instructions, like a conventional register file. |
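
To make the save/restore idea concrete, here is a minimal C sketch of what such a library might do on a context switch. It is not the library the TIE compiler generates: the state registers (`acc`, `mode`), the `mac_step` instruction, and all function names are hypothetical, and a global struct stands in for the state held inside the processor.

```c
#include <stdint.h>

/* Hypothetical user state added by an extension. */
typedef struct {
    uint32_t acc;    /* intermediate value, e.g. a running sum */
    uint32_t mode;   /* control information */
} UserState;

/* Stand-in for the live state registers inside the processor. */
UserState live_state;

/* Hypothetical custom instruction: add x to acc, or subtract it when mode bit 0 is set. */
void mac_step(uint32_t x)
{
    live_state.acc += (live_state.mode & 1u) ? (0u - x) : x;
}

/* What an RTOS would call when switching a task out and back in. */
void save_user_state(UserState *ctx)          { *ctx = live_state; }
void restore_user_state(const UserState *ctx) { live_state = *ctx; }

/* Task A is interrupted by task B, then resumes with its state intact. */
uint32_t context_switch_demo(void)
{
    UserState task_a;

    live_state.acc = 0;
    live_state.mode = 0;
    mac_step(10);                  /* task A: acc = 10 */
    save_user_state(&task_a);      /* switch away from A */

    live_state.acc = 0;
    live_state.mode = 1;
    mac_step(3);                   /* task B overwrites the live state */

    restore_user_state(&task_a);   /* switch back to A */
    mac_step(5);                   /* A continues: acc = 15 */
    return live_state.acc;
}
```

Without the save and restore calls, task A would resume with task B's accumulator and mode, which is why the extra state has to be visible to the RTOS.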
|
|
|
|
|
|
TIE is independent of the processor's pipeline. |
|
Designers do not have to implement the logic for
bypass detection or interlocks. (Done automatically by the TIE compiler.) |
|
The same TIE description can be used with
multiple Xtensa implementations. |
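
For readers unfamiliar with the logic being generated, the following C sketch shows the kind of check that bypass detection and interlocks involve. It assumes a generic five-stage pipeline in which a load result is only available after the M stage; it is not the logic the TIE compiler emits, and the `Instr` type and function names are hypothetical.

```c
#include <stdbool.h>

typedef struct {
    bool is_load;     /* result comes from the data cache, after the M stage */
    int  dest;        /* destination register, or -1 if none */
    int  src1, src2;  /* source registers, or -1 if unused */
} Instr;

/* True if instruction i reads register reg. */
static bool reads(const Instr *i, int reg)
{
    return reg >= 0 && (i->src1 == reg || i->src2 == reg);
}

/* Bypass: the consumer needs the producer's result before it reaches the
   register file, so it must be forwarded from inside the pipeline. */
bool needs_bypass(const Instr *producer, const Instr *consumer)
{
    return reads(consumer, producer->dest);
}

/* Interlock: a load result is not ready in time for the very next
   instruction, so that instruction has to wait. */
bool needs_interlock(const Instr *producer, const Instr *consumer)
{
    return producer->is_load && reads(consumer, producer->dest);
}

/* Count the instructions that must wait on the one directly before them. */
int count_stalls(const Instr *prog, int n)
{
    int stalls = 0;
    for (int i = 1; i < n; i++)
        if (needs_interlock(&prog[i - 1], &prog[i]))
            stalls++;
    return stalls;
}

int hazard_demo(void)
{
    const Instr prog[] = {
        { true,  1, 5, -1 },  /* load r1 <- [r5]                          */
        { false, 2, 1,  3 },  /* add  r2 <- r1, r3  (waits on the load)   */
        { false, 4, 2,  2 },  /* add  r4 <- r2, r2  (bypass, no waiting)  */
    };
    return count_stalls(prog, 3);
}
```

Every new instruction with register operands would need checks like these against every pipeline stage, which is the work a pipeline-independent description spares the designer.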
|
|
|
|
|
|
TIE compiler automatically extends software
tools. (Adds new instructions as intrinsics to the C and C++ compiler.) |
|
The semantics of the new instructions are also
translated into native C implementation, allowing the designer to verify
the functionality of the instructions. |
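
A sketch of how a native C implementation supports verification: the C model of a new instruction is compared against an independently written reference. The instruction here (a sum of absolute differences over four packed bytes) and all function names are hypothetical and serve only as an example; they do not come from the TIE tools.

```c
#include <stdint.h>
#include <stdlib.h>

/* C model of a hypothetical instruction: sum of absolute differences of
   four packed unsigned bytes, computed lane by lane with shifts and masks. */
uint32_t sad4_model(uint32_t a, uint32_t b)
{
    uint32_t sum = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint32_t x = (a >> (8 * lane)) & 0xFFu;
        uint32_t y = (b >> (8 * lane)) & 0xFFu;
        sum += (x > y) ? (x - y) : (y - x);
    }
    return sum;
}

/* Independent reference written on plain byte arrays. */
uint32_t sad4_reference(const uint8_t a[4], const uint8_t b[4])
{
    uint32_t sum = 0;
    for (int i = 0; i < 4; i++)
        sum += (uint32_t)abs((int)a[i] - (int)b[i]);
    return sum;
}

/* Pack four bytes into one word, element 0 in the lowest lane. */
uint32_t pack4(const uint8_t v[4])
{
    return (uint32_t)v[0] | ((uint32_t)v[1] << 8) |
           ((uint32_t)v[2] << 16) | ((uint32_t)v[3] << 24);
}

/* Compare model and reference on pseudo-random inputs; returns 1 if they
   agree on every trial. A fixed-seed generator keeps the run repeatable. */
int models_agree(unsigned trials)
{
    uint32_t seed = 12345u;
    for (unsigned t = 0; t < trials; t++) {
        uint8_t a[4], b[4];
        for (int i = 0; i < 4; i++) {
            seed = seed * 1664525u + 1013904223u;
            a[i] = (uint8_t)(seed >> 24);
            seed = seed * 1664525u + 1013904223u;
            b[i] = (uint8_t)(seed >> 24);
        }
        if (sad4_model(pack4(a), pack4(b)) != sad4_reference(a, b))
            return 0;
    }
    return 1;
}
```

Because both versions are ordinary C, a check like `models_agree(1000)` runs on the development host without an RTL simulator.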
|
|
|
|
|
|
New hardware is seamlessly integrated into the
pipeline. (No coprocessor
communication overhead.) |
|
Easier to verify due to faster simulation times
(four or five orders of magnitude faster than RTL). |
|
Hardware and software are configured together
automatically. |
|
|
|
|
|