Xtensa: A Configurable and Extensible Processor
By Suvidhean Dhirakaosal
What is Xtensa?
A configurable and extensible processor.
System designers can optimize Xtensa for embedded applications by (1) sizing and selecting features and (2) adding new instructions.
Provides easy customization for both hardware and software.
Process is simple, fast, and robust.
Why Xtensa?
Previously, processor designs were costly, fixed solutions.
It was not possible to modify these cores for a particular application.
Xtensa lets the system designer select and size only the features required for a given application.
The designer may define new system-specific instructions if preexisting features don’t provide the required functionality.
Why Xtensa? …continued
Xtensa fits easily into the standard ASIC design flow.
Xtensa is fully synthesizable, so designers can use popular physical-design tools during place and route.
Processor Development
Recent research focuses on automatic instruction set design, or reconfigurable/retargetable processors.
These groups implement automatic instruction set design by systematically analyzing a benchmark program to derive an entirely new instruction set for a given microarchitecture.
Tensilica focuses on how to generate a high-performance, low-power implementation of a given microarchitecture with application-specific extensions.
Reconfigurable processors couple a general-purpose compute engine with a large amount of hardware-programmable logic (or, in the extreme case, consist entirely of hardware-programmable logic).
Compared with a nonconfigurable processor, a reconfigurable processor can be an order of magnitude slower.
A compromise by Razdan and Smith is to use a custom-designed, high-performance processor with small amounts of hardware-programmable logic.
Compiler-generated information is used to dynamically reconfigure the hardware-programmable logic.
However, differences in operating frequency between the programmable and nonprogrammable hardware require the system to be simple or deeply pipelined.
The Tensilica processor generator adds application-specific functionality at the time the hardware is designed, eliminating the need for programmable logic.
However, this also prevents the designer from modifying the extensions for different applications.
Another approach adds coprocessors for application-specific functionality, but this increases communication overhead.
Yet another approach is to modify the processor at the RTL level.
But this solution is fixed, and any future change to the extension would require modifying the RTL again.  (Tedious.)
Tensilica uses a high-level language, TIE (Tensilica Instruction Extension), to express processor extensions.
TIE can add new functionality to the RTL description and automatically extend the software tools.  (Allows C/C++.)
No communication overhead since extensions are integral parts of the processor.
Synthesizable Processors
Processors used to be custom designed.
Sophisticated circuit structures.
Efficient (can implement translation lookaside buffers and specialized RAM).
High frequency (700 to 1000 MHz).
However, they require a lot of development time and in most cases are not designed efficiently.
Not suited for embedded systems:
Use different CAD tools than the rest of the system.
Hard to modify to better match the application.
Arrival of Synthesizable Processors.
Although they cannot match the raw frequency of custom-designed processors, their configurability and extensibility more than compensate for the difference in maximum operating frequency.
Easier to integrate into large ASICs.  Matches design flow.  Quickly manufactured.
Enables configuration and extension by designer.
Overview of Xtensa
Xtensa ISA (Instruction Set Architecture) enables configurability, minimizes code size, reduces power dissipation, and maximizes performance.
Base ISA defines approximately 80 instructions (superset of traditional 32-bit RISC instruction sets).
Achieves smaller code size through denser encoding and register windows.
The compiler uses smaller instructions for the most common operations.
Register window eliminates register saves and restores at entry and exit of subroutines.
24-bit and 16-bit instruction formats.
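To make the code-size effect of the two instruction formats concrete, here is a minimal arithmetic sketch. It compares a mix of 16-bit and 24-bit instructions against a fixed 32-bit encoding. The instruction count and the 50/50 mix are hypothetical figures chosen only for the arithmetic, and the sketch assumes both encodings need the same number of instructions, which a real program may not.

```python
def mixed_size_bytes(n_instructions, narrow_fraction):
    """Bytes for a program mixing 16-bit (narrow) and 24-bit instructions.

    narrow_fraction is the assumed share of instructions that fit the
    16-bit format (a hypothetical input, not a measured value).
    """
    narrow = round(n_instructions * narrow_fraction)
    wide = n_instructions - narrow
    return narrow * 2 + wide * 3


def fixed32_size_bytes(n_instructions):
    """Bytes for the same instruction count in a fixed 32-bit encoding."""
    return n_instructions * 4


n = 1000                            # hypothetical instruction count
mixed = mixed_size_bytes(n, 0.5)    # 500 * 2 + 500 * 3
fixed = fixed32_size_bytes(n)       # 1000 * 4
print(mixed, fixed, mixed / fixed)  # 2500 4000 0.625
```

Even with no 16-bit instructions at all, the 24-bit format alone gives 3/4 of the fixed 32-bit size under the same assumption.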
Hardware Implementation
The first Xtensa implementation uses a traditional five-stage RISC pipeline.
The processor accesses the instruction cache and tags in the first half of the I stage and computes the cache hit/miss signal in the second half.
The instruction is decoded and the register file is accessed in the R stage.
In the E stage, the machine computes the effective address for loads and stores, executes ALU instructions, and determines whether a conditional branch is taken.
For loads, the processor accesses the data cache in the first half of the M stage and computes the cache hit/miss signal in the second half.
Register file is updated in W stage.
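The five stages above (I, R, E, M, W) can be visualized with a small occupancy diagram. The sketch below is an idealized model that assumes one instruction issues per cycle with no stalls, cache misses, or branches. The function name is hypothetical and it does not model the actual Xtensa hardware.

```python
STAGES = ["I", "R", "E", "M", "W"]


def pipeline_diagram(n_instructions):
    """Return one row per instruction showing its stage in each cycle.

    Instruction i enters the I stage in cycle i and advances one stage
    per cycle. "." marks cycles in which it is not in the pipeline.
    """
    total_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = []
        for cycle in range(total_cycles):
            stage = cycle - i
            row.append(STAGES[stage] if 0 <= stage < len(STAGES) else ".")
        rows.append(row)
    return rows


for i, row in enumerate(pipeline_diagram(3)):
    print(f"instr{i}: " + " ".join(row))
# instr0: I R E M W . .
# instr1: . I R E M W .
# instr2: . . I R E M W
```

In this ideal case, n instructions finish in n + 4 cycles, so the pipeline approaches one instruction per cycle as n grows.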
Configuration
Configuration process starts at Tensilica website (www.tensilica.com).
Generation process produces the processor’s configured RTL description and its configured software development tools.  (ANSI C/C++ compiler, linker, assembler, debugger, code profiler, and instruction set simulator.)
After generation process, user can measure performance or start hardware synthesis using the RTL description.
The profiler can be used to identify bottlenecks in application performance.
Designer can also map the RTL description to gate-level netlist using industry standard synthesis tools.
Extension via TIE
Tensilica Instruction Extension is a language that lets designers incorporate application-specific functionality in the processor by adding new instructions.
TIE lets the designer specify the mnemonic, encoding, and semantics of single-cycle instructions.
Designer can also use TIE to declare new processor state (state register).
Instructions with similar operands can be grouped into classes, which can contain one or more instructions.
The semantics of an instruction are described using a subset of Verilog.
User state registers can hold intermediate values or control information.
When user state is declared, the TIE compiler automatically generates a library that an RTOS can use to save and restore processor state.
User state registers are accessible via predefined instructions, like a conventional register file.
TIE is independent of the processor's pipeline.
Designers do not have to implement the logic for bypass detection or interlocks.  (Done automatically by the TIE compiler.)
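To give a feel for the kind of interlock logic the designer is spared from writing, here is a minimal sketch of load-use hazard detection. It assumes full bypassing and that load data becomes available only after the M stage, so an instruction that reads a load's result in the very next slot stalls for one cycle. The `Instr` type, the register names, and the one-cycle penalty are illustrative assumptions, not a description of what the TIE compiler generates.

```python
from collections import namedtuple

# Hypothetical instruction record: operation, destination register, sources.
Instr = namedtuple("Instr", ["op", "dest", "srcs"])


def load_use_stalls(program):
    """Count one-cycle stalls caused by using a load result immediately.

    Under the stated assumptions, a stall is needed whenever an
    instruction reads the register written by the load directly before it.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.op == "load" and prev.dest in cur.srcs:
            stalls += 1
    return stalls


program = [
    Instr("load", "a2", ("a1",)),
    Instr("add", "a3", ("a2", "a4")),   # reads a2 right after the load
    Instr("add", "a5", ("a3", "a4")),   # ALU result is bypassed, no stall
]
print(load_use_stalls(program))  # 1
```

Reordering the program so that an independent instruction sits between the load and its first use would remove the stall in this model.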
The same TIE description can be used with multiple Xtensa implementations.
TIE compiler automatically extends software tools. (Adds new instructions as intrinsics to the C and C++ compiler.)
The semantics of the new instructions are also translated into a native C implementation, allowing the designer to verify the functionality of the instructions.
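The verification idea can be sketched as follows: a bit-level model of a new instruction is checked against a straightforward reference written in ordinary software. The example uses Python in place of the generated native C, and the instruction itself (`add4x8`, four independent 8-bit additions packed into 32 bits) is a hypothetical extension invented for illustration.

```python
import random

MASK32 = 0xFFFFFFFF


def add4x8_semantics(a, b):
    """Bit-level model: four lane-wise 8-bit adds with no carry across lanes."""
    low7 = (a & 0x7F7F7F7F) + (b & 0x7F7F7F7F)  # add the low 7 bits of each lane
    top = (a ^ b) & 0x80808080                  # top bit of each lane, added without carry-out
    return (low7 ^ top) & MASK32


def add4x8_reference(a, b):
    """Reference model: extract each byte, add, wrap to 8 bits, repack."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        lane_sum = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        result |= (lane_sum & 0xFF) << shift
    return result


# Compare the two models on random operands (fixed seed for repeatability).
rng = random.Random(0)
for _ in range(10000):
    a, b = rng.getrandbits(32), rng.getrandbits(32)
    assert add4x8_semantics(a, b) == add4x8_reference(a, b)

print(hex(add4x8_semantics(0x01FF7F80, 0x01018080)))  # 0x200ff00
```

Running the application against a software model like this is what lets functionality be checked without simulating the RTL.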
Advantages of TIE
New hardware is seamlessly integrated into the pipeline. (No coprocessor communication overhead.)
Easier to verify due to faster simulation times (four or five orders of magnitude faster than RTL).
Hardware and software are configured together automatically.
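To put "four or five orders of magnitude faster than RTL" into perspective, the short calculation below scales a simulation run time by those factors. The 10-hour RTL run is a hypothetical starting point chosen for the arithmetic, not a figure from the source.

```python
def scaled_time_seconds(rtl_seconds, orders_of_magnitude):
    """Run time after a speedup of 10**orders_of_magnitude over RTL simulation."""
    return rtl_seconds / 10 ** orders_of_magnitude


rtl_run = 10 * 3600  # hypothetical 10-hour RTL simulation, in seconds
for orders in (4, 5):
    seconds = scaled_time_seconds(rtl_run, orders)
    print(f"10^{orders} faster: {seconds:.2f} s")
# 10^4 faster: 3.60 s
# 10^5 faster: 0.36 s
```

Under that assumption, a run that would take most of a working day in RTL simulation finishes in seconds, which is what makes repeated verification during extension design practical.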