Title: ASP-DAC 1998 TUTORIAL Part 1: Embedded System Components (DRAFT)
- Rajesh K. Gupta
- University of California, Irvine.
Building Systems-On-A-Chip Using Cores
- Commodity hardware: compression, encryption, modem, signal processing, image processing
- Commodity software: encryption/decryption, device drivers, legacy code, operating/runtime system
SOC is a service mark (SM) of LSI Logic Corporation.
S-O-C Application Classes
SOC Design Problem Components
1. Design environment, co-simulation, constraint analysis.
2. HDL modeling, architectural synthesis, logic synthesis, physical synthesis.
3. Software synthesis, optimization, retargetable code generation, debugging, programming environment.
4. Test issues: test access, isolation, ATPG.
[Figure: SOC block diagram with processor, ASIC, memory, DMA, interface, and analog I/O blocks]
Processor cores introduce the software part of system design.
Co-Design Components
- Specification, Modeling and Analysis
- How to capture designer intent efficiently in a design language?
- HDL optimizations
- Constraint modeling and analysis
- System Validation
- How to use the description to build a (computational) prototype capable of running actual applications?
- Co-simulation, formal verification
- System Design and Synthesis
- Delayed partitioning of hardware and software
- Software synthesis and optimizations
- Interface design and optimizations.
Synthesis Tasks
- Operation scheduling, resource binding, control generation
- Scheduling: determines operation start times
- minimize latency
- Resource binding: resource selection, allocation
- minimize area (maximize sharing)
- Control synthesis
- data-path connectivity synthesis
- detailed resource connections
- steering logic
- connection to the interface
- control synthesis
- synthesize a controller that provides operation/resource enables, operation synchronization, and resource arbitration
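As a minimal sketch of the scheduling task, the Python below computes an ASAP (as-soon-as-possible) schedule: each operation starts as soon as all of its predecessors have finished, which gives the minimum latency when resources are unlimited. ASAP is only one simple scheduling heuristic, and the data-flow graph, operation names, and delays (2 steps per multiply, 1 per add) are hypothetical. It needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter


def asap_schedule(preds: dict[str, list[str]], delay: dict[str, int]) -> dict[str, int]:
    """Return the earliest start step of every operation (ASAP scheduling).

    preds maps each operation to the operations whose results it consumes;
    delay gives each operation's execution time in control steps.
    """
    start: dict[str, int] = {}
    # static_order() yields every operation after all of its predecessors.
    for op in TopologicalSorter(preds).static_order():
        # An operation can start once its slowest predecessor has finished.
        start[op] = max((start[p] + delay[p] for p in preds.get(op, [])), default=0)
    return start


def latency(start: dict[str, int], delay: dict[str, int]) -> int:
    """Schedule length: the step at which the last operation finishes."""
    return max(start[op] + delay[op] for op in start)


# Hypothetical data-flow graph for y = a*b + c*d
preds = {"mul1": [], "mul2": [], "add1": ["mul1", "mul2"]}
delay = {"mul1": 2, "mul2": 2, "add1": 1}
schedule = asap_schedule(preds, delay)
print(schedule, latency(schedule, delay))
```

Both multiplies start in step 0, so this schedule needs two multipliers. Delaying one of them would let a single shared multiplier do both (less area) at the cost of a longer schedule, which is the latency-versus-area tradeoff between scheduling and resource binding.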
A CAD Methodology for SW
- Automated software synthesis from specs.
- Synthesis tools generate implementation
- Global optimization of the program.
- Optimization used to achieve design goals.
- Analysis and verification tools for feedback.
- Compilation for embeddable software
- Software Optimizations
- Code compression
- Optimization for power
- Instruction-set generation
- Static memory allocation
Available Core Building Blocks
[Figure: example cores: 68030, ARM810, PPC401]
What Is A Core Cell?
- Working definition
- at least 5K gates
- pre-designed
- pre-verified
- re-usable
- Examples
- Processor: LSI Logic CW4001/4010/4100, ARM7TDMI, ARM810, NEC 85x, Motorola 680x0, IBM PPC
- DSP: TI TMS320C54x, Pine, Oak
- Encryption: PKuP, DES
- Controllers: USB, PCI, UART
- Multimedia: JPEG compression, MPEG decoder, DAC
- Networking: ATM SAR, Ethernet
Core Types
- Soft cores (code)
- HDL description
- flexible, i.e., can be changed to suit an application
- technology independent; may be resynthesized across processes
- significant IP protection risks
- Firm cores (code + structure)
- gate-level netlist to be placed and routed
- technology sampled
- Hard cores (physical)
- ready for drop-in
- include layout and timing (technology dependent)
- IP is easily protected
- mostly processors and memory
- functional test vectors or ATPG vectors available.
Core Types and Their Use
[Table: core types and their use; target technology: ASIC or FPGA]
Core Portability
- Determined by technology independence and data format.
- Technology independence is based on the type of core
- both open and proprietary data formats are currently in use:
- DEF: Design Exchange Format (Cadence)
- SPEF: Standard Parasitic Exchange Format (Cadence)
- GDSII: layout format (Cadence)
- ITL: Interpolated Table Lookup cell-level timing model (Mentor)
- LEF: Layout Exchange Format (Cadence)
- MMF: Motive Modeling Format (Viewlogic)
- NLDM: Non-Linear Delay Model (Synopsys)
- TLF: Table Lookup Format (Cadence)
- VCD: Value Change Dump (Cadence)
- WGL: Waveform Generation Language (TSSI)
Timing Information in Firm and Hard Cores
- Timing behavior can be generated from SPICE inputs
- however, this is not always possible for big cores
- static timing information is necessary
- Basic delay model
- propagation delay model from inputs to outputs
- slew model (as a function of load and input slew)
- input/output capacitances
- setup and hold constraints on inputs.
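Delay and slew models like these are often stored as tables characterized at a few points, as in the table-based formats listed under core portability (ITL, NLDM, TLF). As a minimal sketch of the idea, assuming a delay table indexed by input slew and output load, the Python below estimates a delay by bilinear interpolation between the four surrounding entries. The table values are made up, and real formats define their own table layouts and interpolation rules, which this does not reproduce.

```python
from bisect import bisect_right


def _segment(axis: list[float], x: float) -> int:
    """Index i of the table segment axis[i]..axis[i+1] used for x.

    Values outside the axis reuse the end segments (linear extrapolation).
    """
    i = bisect_right(axis, x) - 1
    return min(max(i, 0), len(axis) - 2)


def table_delay(slews: list[float], loads: list[float],
                table: list[list[float]], slew: float, load: float) -> float:
    """Bilinear interpolation of a delay table indexed by [input slew][output load]."""
    i, j = _segment(slews, slew), _segment(loads, load)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])  # fraction along slew axis
    tc = (load - loads[j]) / (loads[j + 1] - loads[j])  # fraction along load axis
    return ((1 - ts) * (1 - tc) * table[i][j]
            + (1 - ts) * tc * table[i][j + 1]
            + ts * (1 - tc) * table[i + 1][j]
            + ts * tc * table[i + 1][j + 1])


# Hypothetical 2x2 characterization: slew in ns, load in pF, delay in ns
SLEWS = [0.1, 0.5]
LOADS = [0.01, 0.10]
DELAY = [[0.20, 0.50],
         [0.30, 0.60]]
print(table_delay(SLEWS, LOADS, DELAY, 0.3, 0.055))
```

At the midpoint of both axes the estimate is the average of the four table entries (about 0.4 ns here); a slew table built the same way would give the output slew that feeds the next stage.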
Systems-On-A-Chip (SOCs)
- Two Types
- Technology-Driven
- Developed in-house, maximum leverage of technology crown jewels
- Close cooperation between module developers and system designers
- or wide-ranging cross-licensing agreements between partners
- Component-Driven
- Core cells as IP carriers
- IP encapsulated into usable products
- design reuse is critical to IP products
Component-Driven SOC
- Core supplier different from core user
- Third-party IP providers
- Significant technology packaging without importing it
- The IP provider wants to sell a product, not the technology behind the product
- Enormous technical and legal challenges
- can it be done successfully?
- who guarantees that an SOC works as required?
- who is liable in case the end product does not perform?
ASIC Cores Availability
- 3Soft: uC, DSP, LAN, SCSI, PI
- ARM: uC, uP
- Plessey: peripheral controllers, DSP
- Scenix: uC, PCI, DMA
- Western Design Center: uC
- TI: DSP
- NEC: DSP, uC
- Symbios: ARM7 TC
- VAutomation: uP, controllers
- CAST: 2910A, IDT49C410, DMAc
- One-stop shops: LSI Logic (CoreWare), IBM Microelectronics, Motorola (FlexCore), Lucent
- Digital Design Dev: MIDI
- Hitachi: MPEG, PCI, SCSI, uC
- Palmchip: MPEG, UART, ECC
- Silicon Engg.: micro VGA
- Butterfly DSP: DSP, FFT, DFT, ADSL, OFDM
- Int. Sil. Systems: ADPCM, FIR
- Analog Devices: DSP
- DSP Group: Pine, Oak
- LogicVision: BIST, JTAG
- ROHM: UART, SIO, PIO, FIFOc, Add, Mpy, ALU
- Synopsys: DesignWare, ISA, Intel uC
- Chip Express: FIFO, RAM, ROM
- VLSI Libraries: Memory, Mpy
- Eureka: PCI
- Virtual Chips: PCI, USB
- Logic Innovations: PCI, ATM
- OKI: PCI, PCMCIA, DMA, UART
- Sand: USB, PCI
- Sierra: ATM SAR, Ethernet, R3000
- Focus Semi: PLL, VCXO
- VLSI Cores: Encryption, DES
- ASIC Intl: DES
NOT EXHAUSTIVE.
FPGA/CPLD Cores Availability
- Capacity-constrained cores
- do not include wide/high-performance PCI, ATM SAR, or microprocessors
- Altera
- 8-bit 6502
- DMAC 8237
- Xilinx
- PCI
- Actel
- System Programmable Gate Array (SPGA)
- combine FPGA with customer ASIC
- ASIC examples PCI, Router, DMA controller.
Current Core Market Models
Three ways:
- 1. A design house licenses design and tools
- DSP Group (Pine and Oak cores), 3Soft, ARM (RISC)
- offering includes an HDL simulation model, tools and/or an emulator
- customer does the design and fab.
- 2. Core vendor designs and fabs ICs
- TI, Motorola, Lucent
- VLSI, SSI, Cirrus, Adaptec
- 3. Core vendor sells cores, takes customer designs, and fabs ICs
- LSI Logic, TI, Lucent
Licensable
Foundry Captive
Foundry-captive cores do not have to reveal the internal design and layout of the core. The foundry provides a bounding box.
Core Trends: 1997 Survey of Designers
[Chart: months to completion]
- 74 hardware designers.
- 26% plan to purchase a core for their next design
- 40% hard, 68% soft, 32% firm
Source: Integrated System Design
Application Needs
Source: Integrated System Design
Using Cores: PCI
- Class of interface cores such as
- USB, UART, SCSI, PCI, 1394, etc.
- Identify target technology
- ASIC, FPGA
- PCI (Peripheral Component Interconnect)
- processor-independent CPU interface to peripherals
- multi-master, peer-to-peer protocol
- synchronous, 0-33 MHz (132 MB/s)
- arbitration: central, access-oriented, hidden
- variable-length bursting on reads and writes
- (I/O, Mem) x (Read, Write) and IACK commands
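The 132 MB/s figure is peak burst bandwidth: one 32-bit (4-byte) data phase per clock at a nominal 33 MHz. A small sketch of that arithmetic follows, also applied to the 64-bit/66 MHz core option; it assumes an ideal burst, whereas sustained throughput is lower because address phases, wait states, and arbitration consume cycles.

```python
def peak_bandwidth_mb_s(clock_mhz: float, bus_width_bits: int) -> float:
    """Peak burst bandwidth in MB/s (10**6 bytes/s).

    Assumes one data transfer per clock cycle, i.e. an ideal burst with no
    address phases, wait states, or arbitration overhead.
    """
    bytes_per_transfer = bus_width_bits / 8
    return clock_mhz * bytes_per_transfer


print(peak_bandwidth_mb_s(33, 32))  # 132.0 (32-bit PCI at 33 MHz)
print(peak_bandwidth_mb_s(66, 64))  # 528.0 (64-bit PCI at 66 MHz)
```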
PCI Cores
- VHDL/Verilog synthesizable cores with options
- PCI-Host, PCI-Satellite
- 32-bit (33 MHz) or 64-bit (66 MHz)
- FIFO or register data storage
- Synchronous or Asynchronous host interface
- Core components
- Master/Target Read/Write FIFOs,
- Master/Target State Machines
- Configuration registers
- Timing requirements
- input setup time 7 ns, clock-to-output delay 11 ns
- DC specs
- input pin capacitance 10 pF, CLK pin 12 pF, IDSEL 8 pF
User Experience
- Hughes Network Systems
- DirecPC ASIC in a satellite receiver card
- 80K-gate device on a Chip Express process
- DirecPC consists of
- IDT R3041 RISC controller
- Memory, Demodulator, Error-check, PCI core
- PCI core from Virtual Chips
- 17K gates including asynchronous FIFOs
- Guesstimate: 4K extra gates due to the core (5%)
- Comments
- "Their test vectors assume you have direct access to the internal interface of the core. I looked through their test vectors and tried to do the same things using my back end."
- "They were kind of giving us a reference documentation. It wasn't turnkey."
Source: EE Times
Using Cores: DSPs
- 16-bit fixed-point processors are most commonly used.
- DSPs
- simple: Clarkspur Design CD2450 (variable data width)
- compatible: DSPGroup, TI, SGS-T 320C5x
- clone
- Options
- memory, memory controller, interrupt controller, host port, serial port
- Critical
- power consumption, as most DSP applications go into portable products
Design Using DSP Cores
- Core vendors often supply a development chip or a core version of the COTS processor
- board-level prototyping is fairly common
- followed by a single-chip solution
- To avoid board-level prototyping, a fully functional simulation model is a must, particularly for foundry-captive cores.
- Software tools provided
- assembler, linker, instruction-set simulator, debugger, (high-level language compiler?)
DSP Sample Points
- TI TEC320C52
- 16-bit fixed-point TMS320C52
- 1Kx16 data RAM, 4Kx16 program RAM
- 2 serial ports, 1 16-bit timer
- plus a 0.8-micron, 15,000-gate gate array
- Motorola 7-Day CSIC
- 8-16 MHz HC08, DMA, MMU, ...
- SGS-Thomson ST18932, ST18950
- 16-bit fixed-point DSPs, 0.5 u, 3.3 V CMOS, 80 MHz
- has no off-the-shelf DSP IC
- used in PC sound cards; the 950 has a better assembly
Not exhaustive, only a representative sample.
Third-Party DSP Cores
- DSPGroup Pine
- 16-bit fixed-point, 0.8u CMOS, 5.0/3.3 V, 40 MHz
- 36-bit ALU, 16-bit MPY, 2Kx16 RAM/ROM (program memory is outside the core)
- used in pagers and answering machines
- DSPGroup Oak
- same as Pine, plus a bit-manipulation unit
- Viterbi decoding support instructions (min, max)
- used in digital cellular telephony
- Clarkspur CD2400, CD2450
- 16-bit fixed-point
- 24-bit ALU, MPY, Acc, 2 x 256x16 data RAM (the CD2450 makes it 48 bits)
- used in fax-modems
One-Stop Shops: LSI Logic CoreWare
- Cores for building ASICs for most embedded applications
- laser printers, ATM, PDAs, set-top boxes, routers, graphics accelerators, etc.
- CPU cores: MiniRISC CW4K, Oak DSP
- MiniRISC compatible with MIPS R4000
- 0.5u CMOS, 2 mW/MHz, 60 MHz, 3-stage pipeline
- 32-bit address/data bus
- full scan, 99% fault coverage, gate-level timing model
- Interface: PCI, Fibre Channel, SerialLink
- Networking: Ethernet, ATM (SAR), Viterbi, RS
- Compression etc.: MPEG, JPEG, DAC/ADC.
Core Examples
- Only a representative sample of cores. Not exhaustive or even comparative.
- Processor cores
- LSI Logic CW4001, CW4010
- ARM (7) processors
- Motorola FlexCore
- Memory cores
- 16M/18M Rambus DRAM
- Multimedia cores
- CompCore CD2
- Networking
- Media Access Controller (MAC)
- Encryption cores
- VLSI cores, ASIC international.
LSI Logic CW4001 Core
- Behavioral Verilog/VHDL model
- Gate-level timing accurate model
- Specifications
- 60 MHz, 60 MIPS (45 MIPS average), 3-stage pipeline
- 0.5-micron CMOS process, 4 sq. mm, 2 mW/MHz
- Full scan with 99% fault coverage.
- Interfaces
- CBUS, Computational Bolt-On (CBO), co-processor, MMU
- Customizability
- BIU, cache controller, MDU, MMU, DRAM/SRAM controllers, timers, caches (<16K), RAM/ROM, DMAc
- Up to 3 co-processors (FPU, graphics, compression, network protocol), MPY/DIV unit, CRC, direct access to CPU GPRs
Using CW4001
- Co-processor has its own instruction set, including
- read data bus for instruction, read/write to external memory
- read/write to CPU registers, stall and interrupt CPU
- CW4001 delivers the opcode fields in bits 0-5 and 26-31 to the co-processor instruction decoder
- Co-processor executes in lockstep with the CPU pipeline stages.
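The CW4001 is MIPS-compatible, and in the MIPS encoding bits 31:26 hold the major opcode and bits 5:0 the function field, the two fields passed to the co-processor decoder. A minimal sketch of that field extraction follows; the function name and the sample word are hypothetical, with the sample chosen to have major opcode 010010, which MIPS assigns to COP2 (co-processor 2).

```python
def coprocessor_opcode_fields(instr: int) -> tuple[int, int]:
    """Split a 32-bit MIPS-style instruction word into its two opcode fields.

    Returns (major opcode from bits 31:26, function field from bits 5:0).
    """
    major = (instr >> 26) & 0x3F  # bits 31..26
    function = instr & 0x3F       # bits 5..0
    return major, function


# Hypothetical COP2-class instruction word
major, function = coprocessor_opcode_fields(0x4A000001)
print(f"major={major:06b} function={function:06b}")  # major=010010 function=000001
```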
CW4010 CPU Core
- Verilog/VHDL model with gate-level timing
- 80 MHz, 160 MIPS (110 MIPS average), 6-stage pipeline
- 0.5-micron CMOS, 9 sq. mm, 5 mW/MHz
- Integrated cache controllers with separate I and D caches
- cache size from 2-16 KB
- 64-bit memory and cache interface
- Up to 3 co-processors
- Full scan with 99% fault coverage.
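The mW/MHz figures allow a quick power comparison of the two LSI Logic cores. Assuming power scales linearly with clock rate and ignoring caches, memories, and I/O (first-order assumptions, not vendor data), the CW4001 at 60 MHz draws about 120 mW and the CW4010 at 80 MHz about 400 mW; the sketch below also derives MIPS per mW from the average MIPS figures.

```python
def core_power_mw(mw_per_mhz: float, clock_mhz: float) -> float:
    """First-order core power: the quoted mW/MHz figure times the clock rate."""
    return mw_per_mhz * clock_mhz


def mips_per_mw(mips: float, mw_per_mhz: float, clock_mhz: float) -> float:
    """Efficiency in MIPS per milliwatt at the given operating point."""
    return mips / core_power_mw(mw_per_mhz, clock_mhz)


# Figures from the CW4001 and CW4010 slides (average MIPS)
for name, mips, mw_mhz, mhz in [("CW4001", 45, 2, 60), ("CW4010", 110, 5, 80)]:
    print(name, core_power_mw(mw_mhz, mhz), "mW,",
          round(mips_per_mw(mips, mw_mhz, mhz), 3), "MIPS/mW")
```

Under these assumptions the CW4010 delivers more than twice the average throughput but fewer MIPS per mW, which matters for the portable applications mentioned for DSP cores.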
Advanced RISC Machines (ARM)
- A family of 32-bit RISC processor cores
- ARM6, ARM7: MPU with cache, MMU, write buffer and JTAG
- ARM7TDMI: ARM7 with Thumb ISA, ICE, debug, MPY
- ARM8: cached, low power, 5-stage pipe (vs. 3 in others)
- StrongARM1, StrongARM2: available as Digital SA-110 (21285)
- Piccolo: DSP co-processor for ARM, shares the system bus (AMBA)
- support for Viterbi, bit-manipulation operations
- four nestable zero-overhead hardware loop constructs
- splittable ALU, 1-cycle dual 16-bit operations
- saturation arithmetic
- 1024-point in-place complex radix-2 FFT in 33,331 cycles
- Manufacturing partnerships and/or licensing with
- Cirrus Logic, GEC Plessey, Sharp, TI and VLSI Technology.
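To make "splittable ALU" and "saturation arithmetic" concrete, the sketch below models one dual 16-bit saturating add on 32-bit words: each half is an independent signed lane, no carry crosses between lanes, and a lane that overflows clamps to the largest or smallest 16-bit value instead of wrapping. It illustrates the arithmetic only; the function names are hypothetical and it does not reproduce Piccolo's actual instruction set or flags.

```python
INT16_MIN, INT16_MAX = -0x8000, 0x7FFF


def saturate16(x: int) -> int:
    """Clamp x to the signed 16-bit range instead of letting it wrap around."""
    return max(INT16_MIN, min(INT16_MAX, x))


def to_signed16(h: int) -> int:
    """Interpret the low 16 bits of h as a two's-complement number."""
    h &= 0xFFFF
    return h - 0x10000 if h & 0x8000 else h


def dual_add_sat16(a: int, b: int) -> int:
    """Add two 32-bit words as two independent signed 16-bit lanes.

    Each lane saturates on overflow, and no carry crosses from the low lane
    into the high lane (the 'split' ALU behaviour).
    """
    result = 0
    for shift in (0, 16):
        lane = saturate16(to_signed16(a >> shift) + to_signed16(b >> shift))
        result |= (lane & 0xFFFF) << shift
    return result


print(hex(dual_add_sat16(0x7FFF0001, 0x00010001)))  # 0x7fff0002: high lane saturates
```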
ARM Processor Cores
Source: ARM Inc.
- Enhancements: ARM7D, ARM7DM, ARM7DMI
- M: 64-bit-result hardware multiplier running at 8 bits/cycle
- D: 2 boundary-scan chains for basic debug
- I: EmbeddedICE debug
- T: Thumb instruction set
ARM Enhancements: EmbeddedICE
- The EmbeddedICE core cell allows debugging of an ARM core embedded within an ASIC
- real-time address and data-dependent breakpoints
- full access to and control of the CPU
- can be reduced for size savings once the part goes into production.
[Figure: a debug host running ARMsd connects through the ASIC's boundary-scan pins to the EmbeddedICE cell beside the ARM core; 40 KB/s software download]
Source: ARM Inc.
ARM Enhancements: Thumb ISA
- 8- or 16-bit external, 32-bit internal
- Thumb instruction set is a subset of the 32-bit ARM instruction set
- 16-bit instructions
- expanded into 32-bit ARM instructions at run time without any penalty
- code size typically 65-70% of the equivalent ARM code
- 130% of ARM performance with 8/16-bit memory
- 85% of ARM performance with 32-bit memory
Example: ADD Rd, #Constant
16-bit Thumb instr.: 001 (maj. opc.) | 10 (min. opc.) | Rd (dest. and src.) | Constant
32-bit ARM instr.: 1110 (always) | 001 | 01001 | 0 Rd | 0 Rd | 0000 | Constant (zero extended)
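The bit fields of this example can be checked with a small decoder. The sketch below, assuming the standard Thumb and ARM encodings the example shows, expands a Thumb `ADD Rd, #Constant` into the corresponding ARM data-processing instruction `ADDS Rd, Rd, #Constant` (the `01001` field is the ADD opcode 0100 followed by the set-flags bit). The function name is hypothetical, it handles only this one instruction form, and it is not a model of the actual decompressor hardware.

```python
def thumb_add_imm_to_arm(thumb: int) -> int:
    """Expand a 16-bit Thumb 'ADD Rd, #imm8' into the equivalent 32-bit ARM word.

    Thumb layout: 001 | 10 | Rd (3 bits) | imm8
    ARM layout:   1110 (always) | 001 | 0100 (ADD) | 1 (set flags) | Rn | Rd | 0000 | imm8
    """
    if ((thumb >> 11) & 0b11111) != 0b00110:
        raise ValueError("not a Thumb ADD Rd, #imm8 instruction")
    rd = (thumb >> 8) & 0b111   # destination and source register
    imm8 = thumb & 0xFF         # constant, zero-extended
    return ((0b1110 << 28)      # condition: always
            | (0b001 << 25)     # data processing with immediate operand
            | (0b0100 << 21)    # opcode: ADD
            | (1 << 20)         # S bit: update condition flags
            | (rd << 16)        # Rn = Rd (4-bit field, leading 0)
            | (rd << 12)        # Rd (4-bit field, leading 0)
            | (0b0000 << 8)     # rotate amount 0
            | imm8)


# Thumb ADD R1, #5 is 0x3105
print(hex(thumb_add_imm_to_arm(0x3105)))  # 0xe2911005, i.e. ARM ADDS R1, R1, #5
```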
ARM Applications
- Widely used in a variety of applications
- low cost 16-bit applications
- mobile phones, modems, fax machines, pagers
- hard disk and CD drive controllers
- engine management
- low cost 32-bit applications
- smart cards
- ATM and ethernet network interfaces
- low power, on-chip application code
- high performance 32-bit applications
- digital cameras
- set top boxes, network switches, laser printers
- external memory system (RAM, ROMs)
Courtesy S. Dey, ICCAD96
Motorola FlexCore
- CPU cores based on the 680x0 family
- EC000, EC020, EC030
- all with static operation, 5/3.3 V supplies
- performance
- EC000: 2.7 MIPS @ 16.67 MHz, 33 mW
- EC020: 7.4 MIPS @ 25 MHz, 150 mW
- EC030: 11.8 MIPS @ 33 MHz, 258 mW
- Serial I/O cores: 68681 UART, MBus, SPI
- RT clock, dual timer cores
- SCSI, parallel I/O, 8051 interfaces
- DRAM, interrupt, JTAG controllers
- PLA, PLL, oscillators, power management cells.
Memory Core Example
- Virtual Chips 16M/18M-bit Rambus DRAM
- Verilog/VHDL simulation model
- Organization
- two banks, 512 pages per bank, 72x256 per page
- dual internal banks, 2K-byte cache per bank
- Programmable ack, write, and read delays through control registers
- Synchronous protocol for fast block-oriented transfers
- Modes of operation
- reset, stand-by, power-down, active
- Deliverables: VHDL and Verilog source, test bench, test vectors, documentation
- Others: Sand DRAM, VRAM Verilog models.
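Reading the organization as 72 x 256 bits per page (an assumption, since the slide gives no unit), the numbers account for the 18M-bit part, and the same array with 64 x 256 bits per page gives the 16M-bit part; the 16M/18M pairing presumably reflects 8-bit versus 9-bit bytes. A 64 x 256-bit page is 16,384 bits, i.e. 2K bytes, matching the per-bank cache size. The sketch below just multiplies out the organization; the function name is hypothetical.

```python
def dram_capacity_bits(banks: int, pages_per_bank: int, bits_per_page: int) -> int:
    """Total storage: banks x pages per bank x bits per page."""
    return banks * pages_per_bank * bits_per_page


MBIT = 2**20
bits_18m = dram_capacity_bits(2, 512, 72 * 256)  # 72 x 256 bits per page
bits_16m = dram_capacity_bits(2, 512, 64 * 256)  # same array with 8-bit bytes
print(bits_18m // MBIT, bits_16m // MBIT)        # 18 16
```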
Multimedia Cores
Source: CompCore
- JPEG compression, MPEG decoding, video DAC, etc.
- IBM Microelectronics, LSI Logic, PalmChip, Silicon Engineering, Mentor Graphics, CompCore, Intrinsix (VGA)
- Example: MPEG-2 decoder from CompCore
- 70K-80K gates
- 18K bits of internal SRAM
- 16 Mbit SDRAM (external)
- bitstream buffering, frames
- 54 MHz, 16-bit external memory bus
[Figure: CompCore CD2 decoder block diagram: MPEG input, microcontroller interface, audio decoder, video decoder, synchronization, virtual and physical memory controllers, three internal SRAMs, external 1Mx16 SDRAM; outputs an audio stream and a video stream]
Other Core Categories
Networking
- Protocol choices
- switched Ethernet, switched Token Ring, ATM155, ATM25
- Example: SYM1000 from Symbios
- HDL code, 3.3 V, 0.5u
- CSMA/CD Ethernet
- programmable inter-packet gap
- optional CRC insertion and check
- MII interface to the physical-layer device
- host bus interface
- LSI Logic ATMizer
Encryption
- VLSI Cores
- PKuP encryption core
- implements modular exponentiation
- synthesizable HDL core
- DES core as a synthesizable Verilog model
- two models: 8 bytes/8 cycles, 8 bytes/16 cycles
- ASIC International
- DES cores
- exponentiator engine
- hash function cores
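The PKuP core is described as implementing modular exponentiation, the core operation of RSA-style public-key cryptography. A common way to compute it is binary square-and-multiply, one squaring per exponent bit plus a multiply for each 1 bit; hardware often pairs this with Montgomery multiplication, but the slide does not say which method PKuP uses. The sketch below is a plain software reference model of square-and-multiply (the kind of golden model a test bench might compare against), not the core's algorithm.

```python
def mod_exp(base: int, exponent: int, modulus: int) -> int:
    """Right-to-left binary (square-and-multiply) modular exponentiation."""
    if modulus == 1:
        return 0
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:                      # multiply step for a 1 bit
            result = (result * base) % modulus
        base = (base * base) % modulus        # square step for every bit
        exponent >>= 1
    return result


print(mod_exp(4, 13, 497))  # 445
```

Python's built-in three-argument `pow(base, exponent, modulus)` computes the same result and is a convenient cross-check.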
Challenges in Using Cores
- A core cell is not a single product
- a PCI cell consists of 25 separate Verilog files
- plus as many synthesis scripts
- immature interface abstraction
- e.g., there is no direct access to the core from the end product; access must be created.
- A core is not an end product
- a core cell is the design know-how for using it with a particular process, tools, and even application
- Testability and testing are a challenge
- as opposed to design, testing is not a hierarchical problem
- using 90%-testable cores does not give 90% system testability
- tests are core-specific and not applicable from the primary I/O
- What is an efficient design methodology using cores?
Summary of Part I
- Core cells present a new market opportunity
- core cells are breathing life into many old designs (6502)
- a new class of third-party vendors who bridge the gap between design houses and EDA vendors
- Productization of cores faces many challenges
- portability of cores versus design reuse
- socketing standards (portability and reuse)
- IP protection: encryption, product versus technology
- design and test methodologies
- Research outlook is aligned with industry expectations
- all new designs start with an HDL description
- immediate focus on validation and testability issues
- long-term focus on software optimization and complexity management.