Title: Wireless Systems-on-a-Chip Research at the Berkeley Wireless Research Center
1Wireless Systems-on-a-Chip Research at the
Berkeley Wireless Research Center
Bob Brodersen, Dept. of EECS, Univ. of Calif., Berkeley
- http://bwrc.eecs.berkeley.edu
2Outline
- What is BWRC?
- Defining a wireless system and how you design it
- Why software radios are a bad idea
- Single chip radios in a day
31990-1997 InfoPad
(Example InfoPad application slide)
Set-top box doubles as basestation and gateway from WAN
Allows family and personal use of set-top box access
4The Success of InfoPad - the Research Projects
- ATM and Fast Ethernet Backbone
- Scheduling for Quality of Service and Capacity Optimization
- MAC Layer Protocols for Up and Down Links
- CDMA Power Control Algorithms
- Optimization of Modulation and SS Techniques for Interference Limited, Picocellular Channels
- User Interface based on Pen and Speech Input
- Indoor Picocellular Channel Measurements
- Network and InfoPad Aware Design Tools
- Support for Distributed Processing for the Mobile Network (the InfoNet)
- Low Power Monolithic CMOS Radio Implementations - Wideband Spread Spectrum, DECT Radio (TDMA)
- System Level Power Analysis Tools
- Migration of processing between Pad and Network
- Low Power Signal Processing Accelerators
- Energy Optimized ARM Processor Core
- Text/Graphics Decompression for Color Display
- Concurrent Electrical/Mechanical Casing Design
(Diagram labels: Speech and Handwriting Recognition; Base Station and Wired to Wireless Bridge; Design Tool Applications; InfoNet; Audio, Pen and Video Type Servers; Nameserver, Cell and Pad Servers)
5What didn't work...
- Interaction with Industry - great during retreats, but inconsistent in between
- Communication between groups
- Collaborative tools didn't work
- Large overhead in meetings
- Common environment - Spread between multiple floors of Cory and Soda Hall
So... we now have the BWRC
6Center Organization
- University affiliated but industrial support will be through overhead free gift funds
- 7 member companies
- Center lifetime of at least 6 years with yearly reviews and 3 year informal commitments
- New research space off campus
- 11,000 square foot facility designed for collaboration
- Approximately 4-7 faculty members, 40-60 students and 6 staff
7Center Research
- Involvement from senior researchers from companies to mentor students
- Full time at center (3 weeks/month)
- Part time (week/month)
- Day or two per week
- Research projects driven by center faculty
- Uses inputs and capabilities of members
- Long range to avoid competitive issues
- Goal is to put research in the public domain as soon as possible
8Application Drivers
- 1) Universal Spectrum Sharing
- An approach to channel utilization which allows uncoordinated use of spectra without loss in capacity
- Extensible over time to exploit advances in technology and support new applications
- 2) PicoRadio
- System on a chip implementation supporting all functions up to external interface (sensors, transducers)
- Total power dissipation in the 100s of microwatts achieved through optimization of protocols and architectures
9Center Activities
(Diagram of center activities, by layer)
- Applications: Universal Spectrum Sharing, PicoRadio
- Design Tools: Behavioral/Architectural Specification, Verification and Optimization; Automated Design
- Implementation: PicoNode Testbed, BEE Testbed, IC Implementation
10Outline
- What is BWRC?
- Defining a wireless system and how to design it
- Why software radios are a bad idea
- Single chip radios in a day
11CMOS is the technology
(Plot: transistor ft versus year, 1975 to 1999, with axis ticks at 1GHz, 3GHz, 10GHz, 30GHz and 100GHz. CMOS is traced through feature sizes of 3u, 2u, 1.5u, 1u, 0.8u, 0.6u, 0.5u, 0.35u, 0.25u and 0.18u and compared with Bipolar, GaAs and HEMTs/HBTs.)
12A Complete Wireless System
Communication Algorithms
Analog Baseband and RF Circuits
Protocols
13Wireless System Design Issues
- It is now possible to use CMOS to integrate all analog and digital radio functions
- What makes an algorithm appropriate for implementation is rapidly changing
- Complex analog circuits linearly degrading
- Digital computation exponentially improving
- Even protocols (Physical and MAC level) require high levels of computation for wideband links
14Our Design Environment for Wireless Systems
Specification: Matlab, Opnet
Columns: Analog Data Processing; Digital Data Processing; Protocols, Control
Conceptual: Opnet, VCC; Matlab; Matlab
Behavioral: Simulink; Simulink, Stateflow; C, Stateflow
Structural: Synopsys, Cadence, Unicad; Spectre and Spectre RF; ARMulator, ARM Compiler
Physical: Agilent ADS ASITIC Cadence; ARM FPGAs; Unicad Cadence, Mentor Power TimeMill
15Communications Algorithms and Their Implementation
- Blast algorithms (Lucent) - antenna arrays which have demonstrated 40 bits/s/Hz (1Mb/s in 25kHz)
- Multiuser detection - eliminates multiuser interference
- Digital implementation of timing and carrier synchronization
Requires 1000s of MOPS of processing: how to do it at the lowest energy and smallest area?
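The spectral efficiency quoted for the Blast result follows directly from the two numbers on the slide. A minimal check, where the function name is illustrative and not from the talk:

```python
def spectral_efficiency(bit_rate_bps: float, bandwidth_hz: float) -> float:
    """Return spectral efficiency in bits per second per hertz."""
    return bit_rate_bps / bandwidth_hz


# Figures from the slide: 1 Mb/s carried in a 25 kHz channel.
blast_efficiency = spectral_efficiency(1e6, 25e3)
print(blast_efficiency)  # 40.0
```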
16Outline
- What is BWRC?
- Defining a wireless system and how to design it
- Why software radios are a bad idea
- Single chip radios in a day
17First choose the right architecture
(Chart: flexibility versus area or power)
- Embedded Processor, DSP (e.g. TI 320CXX): .5-5 MIPS/mW
- Reconfigurable Processors (Maia), Embedded FPGA: 10-100 MOPS/mW
- Direct Mapped Hardware: 100-1000 MOPS/mW
- Factor of 100-1000 in area or power across this range
18Fully parallel implementations
- Basic building block - adaptive correlator
- 25 MHz clock
- 36 multipliers
- 1.2 GOPS (operations = multiplies, adds and MACs)
- 7 mW
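The correlator figures can be turned into the MOPS/mW metric used elsewhere in the talk. This is a back-of-the-envelope sketch using only the slide's numbers. The multiply-rate line assumes every multiplier produces one result per clock cycle, which the slide does not state.

```python
def mops_per_mw(ops_per_second: float, power_mw: float) -> float:
    """Energy efficiency in millions of operations per second per milliwatt."""
    return ops_per_second / 1e6 / power_mw


# Slide figures: 1.2 GOPS at 7 mW.
correlator_efficiency = mops_per_mw(1.2e9, 7.0)
print(round(correlator_efficiency))  # 171, inside the 100-1000 MOPS/mW band

# Assumption: all 36 multipliers fire on every 25 MHz clock cycle.
multiplies_per_second = 36 * 25e6
print(multiplies_per_second)  # 900000000.0, i.e. 0.9 of the 1.2 GOPS
```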
19Comparison - Software vs Direct mapped
- Software solutions > 100 times less efficient (even ignoring overhead of parallel processing)
- .5-5 MIPS/mW software DSP (best case) processor
- 100-1000 MOPS/mW dedicated
20But aren't software processors improving with Moore's law?
- Primary means of performance increase of software
processors is by increasing clock rate
21The Result Is the Power Crisis
Source: Microprocessor Report
- Increasing clock rate directly increases the
power dissipation
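The link between clock rate and power is the standard CMOS switching-power relation, P = a * C * Vdd^2 * f. The sketch below uses made-up values for activity factor, capacitance and supply voltage (none of them are from the talk) to show that power scales in direct proportion to clock frequency when everything else is held fixed.

```python
def dynamic_power_w(activity: float, capacitance_f: float,
                    vdd_v: float, clock_hz: float) -> float:
    """Switching power of CMOS logic: P = activity * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * clock_hz


# Hypothetical operating point, chosen only for illustration.
base_power = dynamic_power_w(0.1, 1e-9, 2.5, 100e6)
doubled_clock_power = dynamic_power_w(0.1, 1e-9, 2.5, 200e6)

# Doubling the clock doubles the switching power.
power_ratio = doubled_clock_power / base_power
print(power_ratio)
```

In practice a faster clock often needs a higher supply voltage as well, which makes the increase worse than linear because of the Vdd^2 term.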
22Is Arbitrary Digital Complexity Possible?
- Complexity is increasing by a factor of 100 every 10 years so that is not a problem
- The power requirements are!
- Conclusion: the energy efficiency of the architectures and algorithms is critical
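The factor of 100 per decade is consistent with the commonly quoted form of Moore's law. The 18-month doubling period below is an assumption for this check; the slide itself gives only the per-decade figure.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Compound growth after `years` when capacity doubles every period."""
    return 2.0 ** (years / doubling_period_years)


# Assumed doubling period: 18 months.
complexity_per_decade = growth_factor(10, 1.5)
print(round(complexity_per_decade))  # 102
```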
23What is the problem?
The Von Neumann architecture was developed in 1945!!
- The assumptions back then
- Hardware is expensive
- Scientific computation is the application
- Cost, size and power are not an issue
- Hardware and software were separate
- Time sharing the hardware was absolutely necessary
24The Situation Now for Embedded Applications
- Hardware is cheap
- Potentially 1000s of multipliers on a chip
- Power, cost and size are critical
- Applications are I/O and DSP intensive
- Software is becoming harder than hardware
- Hardware and software are on one chip
25Time multiplexing a multiplier: is that a good idea?
DSP processor (25 mm2)
12x12 multiplier (.05 mm2)
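The two areas on the slide give a direct sense of scale. A quick calculation, using only the slide's figures:

```python
DSP_AREA_MM2 = 25.0          # DSP processor area from the slide
MULTIPLIER_AREA_MM2 = 0.05   # 12x12 multiplier area from the slide

# How many dedicated multipliers fit in the silicon area of one DSP core.
multipliers_per_dsp_area = DSP_AREA_MM2 / MULTIPLIER_AREA_MM2
print(round(multipliers_per_dsp_area))  # 500
```

This ignores routing, registers and control, so it is an upper bound on multiplier count rather than a floorplan.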
26Software radios?
- Computation is incredibly inefficient (for communication algorithms): .5-5 MIPS/mW vs. 100-1000 MOPS/mW in dedicated hardware
- Moore's law will not fix this problem
- Myths about software
- Much faster to develop (prototype yes, final product no)
- It is flexible (not true in embedded systems)
- Don't need to make early (or any) decisions (decisions need to be made sometime)
- Can fix problems after the product has shipped (is this really viable and what does it cost?)
- The success of software in the GP environment is not applicable in the embedded world
27What is the solution?
- Software based parallelism is becoming increasingly inefficient
- Speculative execution, Superscalar, VLIW
- The basic problem is that a conventional software description obscures the parallelism
Algorithms (Parallel) -> Software (Sequential) -> Architecture (Parallel)
A Better Approach - skip the sequential description
Algorithms (Parallel) -> Architecture (Parallel)
28An Energy Efficient Architecture: Direct Map
Describe the algorithm using a description which
preserves the parallelism and directly convert to
hardware
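The contrast between a time-multiplexed and a direct-mapped implementation can be sketched with a small FIR filter. This is an illustrative Python model, not the center's tool flow. The tap values and cycle counting are assumptions made for the example.

```python
from typing import List, Tuple


def fir_time_multiplexed(taps: List[float], window: List[float]) -> Tuple[float, int]:
    """One shared multiply-accumulate unit: one tap is processed per clock cycle."""
    acc = 0.0
    cycles = 0
    for coeff, sample in zip(taps, window):
        acc += coeff * sample
        cycles += 1
    return acc, cycles


def fir_direct_mapped(taps: List[float], window: List[float]) -> Tuple[float, int]:
    """One multiplier per tap: all products form in the same cycle, then are summed."""
    products = [coeff * sample for coeff, sample in zip(taps, window)]  # parallel multipliers
    return sum(products), 1


taps = [0.25, 0.5, 0.25]     # hypothetical 3-tap filter
window = [1.0, 2.0, 3.0]     # most recent input samples

seq_out, seq_cycles = fir_time_multiplexed(taps, window)
par_out, par_cycles = fir_direct_mapped(taps, window)
print(seq_out, seq_cycles)  # 2.0 3
print(par_out, par_cycles)  # 2.0 1
```

Both produce the same output, but the shared unit needs a clock N times the sample rate for an N-tap filter, while the direct-mapped version can run at the sample rate.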
29Mathworks tools can be used for algorithm,
analog modeling and protocols
- Matlab - Procedural language for algorithm design
- High level language with I/O support
- Well documented, supported and known
- Extensive libraries for DSP and Communications
- Simulink - DSP
- Block diagram discrete time simulator
- Finite word length, explicit clocks
- Analog models
- Stateflow - Control and Protocols
- Extended finite state machine description
- Integrated with Simulink
30Outline
- What is BWRC?
- Defining a wireless system and how to design it
- Why software radios are a bad idea
- Single chip radios in a day
31How do we get to a chip?
- Start from an enhanced Simulink/Stateflow description
- Add floorplan
- Based on a library of blocks that have physical level module generators
- Can get estimation of area, power and delay
- Only use synthesis from Stateflow descriptions of control
- Use block level place and route tools that work at the multiplier/adder/shifter level, not gates
32Mapping the Algorithm into Hardware
33Module generation
- Take parameters (e.g. bit width) from block diagram as input and generate layout
- Allows deterministic area, power and delay estimates
- Retains optimized density, speed and power of custom design
- Allows reuse
e.g. 12X12 Multiplier
34Energy, Area and Delay parameters from Module
Generators
Energy model of real multiplier in terms of word
length
Area model of complex MAC in terms of word length
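The models on this slide were plots that did not survive in this transcript, so the sketch below only illustrates the idea of a parameterized estimate. It assumes multiplier area scales with the product of the operand word lengths, calibrated to the 12x12 multiplier at .05 mm2 quoted earlier in the talk. The scaling rule and all names are assumptions, not the center's module generator.

```python
from dataclasses import dataclass

REF_BITS = 12
REF_AREA_MM2 = 0.05  # 12x12 multiplier area quoted earlier in the talk


@dataclass(frozen=True)
class ModuleEstimate:
    area_mm2: float


def multiplier_estimate(a_bits: int, b_bits: int) -> ModuleEstimate:
    """Estimate multiplier area from word lengths.

    ASSUMPTION: area grows in proportion to a_bits * b_bits, as for a
    simple array multiplier. A real generator would use fitted models.
    """
    scale = (a_bits * b_bits) / (REF_BITS * REF_BITS)
    return ModuleEstimate(area_mm2=REF_AREA_MM2 * scale)


print(multiplier_estimate(12, 12).area_mm2)
print(multiplier_estimate(24, 24).area_mm2)
```

Under this assumed rule, doubling both word lengths quadruples the area estimate.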
35Control
- Stateflow
- Extended Finite State Machine
- Subset of Syntax
- Converted to VHDL
- Synthesized
- VHDL
- Synthesized directly
VHDL and Stateflow Macros map to a netlist of Standard Cells using standard synthesis
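An extended finite state machine is an ordinary state machine plus data variables that guard and update on transitions. The sketch below models one in Python. The send/ack/timeout controller is a hypothetical example, and this is neither Stateflow syntax nor the VHDL the flow would produce.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    WAIT_ACK = auto()
    DONE = auto()
    FAILED = auto()


class SendController:
    """Hypothetical link controller: states plus a retry counter (the 'extended' part)."""

    def __init__(self, max_retries: int = 2) -> None:
        self.state = State.IDLE
        self.retries = 0
        self.max_retries = max_retries

    def step(self, event: str) -> State:
        if self.state is State.IDLE and event == "send":
            self.retries = 0
            self.state = State.WAIT_ACK
        elif self.state is State.WAIT_ACK and event == "ack":
            self.state = State.DONE
        elif self.state is State.WAIT_ACK and event == "timeout":
            if self.retries < self.max_retries:
                self.retries += 1          # guard passed: retry, stay in WAIT_ACK
            else:
                self.state = State.FAILED  # retry budget exhausted
        return self.state


ctrl = SendController(max_retries=2)
trace = [ctrl.step(event) for event in ["send", "timeout", "timeout", "timeout"]]
print([s.name for s in trace])  # ['WAIT_ACK', 'WAIT_ACK', 'WAIT_ACK', 'FAILED']
```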
36Summary: The Standard ASIC Design Flow
(Flow: Architecture and Micro-Architecture, Front-End, Back-End)
- Difficulties
- Logic Verification
- Timing Closure
- Routing Congestion
Critical Problem: Indeterminate Design Time
- Design Decisions made at Every Step
- Critical information lost below Architecture level
37Our Domain Specific Approach
- Fully Automated
- Make design decisions at top level
- Primary architecture support is for Direct-Mapped
communication algorithms
Goal: Provide predictability in the design process and a fully automated path
38(an aside) Déjà vu???
- The Simulink driven design with parameterized modules is just the reincarnation of good ole Silicon Compilation of >10 years ago
- What happened?
- A decline of research into design methodologies
- A single dominant flow has resulted - the Verilog-Synopsys-Standard Cell
- Processor solutions therefore seem competitive
- Lack of methodologies to support alternative styles of design
39A Complete Wireless System is more than DSP
- Analog RF and baseband circuits
- Amplify
- Mix
- A/D and D/A
- DSP and Communication algorithms
- Protocols
40Minimizing the Analog Components
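(Block diagram: the RF input (fc = 2GHz) passes through an RF filter and LNA, is mixed with cos(wot) and sin(wot) derived from a crystal reference, and the resulting I and Q signals are each sampled by an A/D at 50MS/s and fed to the digital baseband receiver. The analog and digital sections and the chip boundary are marked.)
A zero IF (direct conversion) receiver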
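A small numerical model shows why both the cos and sin branches are needed: together they form a complex baseband signal that distinguishes tones above the carrier from tones below it. The frequencies are normalized toy values, not the 2GHz and 50MS/s of the slide. The sign on the Q branch is one common convention and is an assumption here. A single DFT bin stands in for the low-pass filtering a real receiver would use.

```python
import cmath
import math

FS = 64        # samples per unit time (normalized, hypothetical)
F_LO = 16      # local oscillator frequency, equal to the carrier
F_OFFSET = 1   # test tone sits this far above the carrier
N = 64         # number of samples analysed


def downconvert(rf):
    """Mix a real RF sequence with cos and -sin of the LO to get I + jQ."""
    out = []
    for n, x in enumerate(rf):
        phase = 2 * math.pi * F_LO * n / FS
        out.append(complex(x * math.cos(phase), -x * math.sin(phase)))
    return out


def dft_bin(samples, k):
    """Normalized DFT at integer bin k (negative k selects negative frequency)."""
    total = len(samples)
    return sum(s * cmath.exp(-2j * math.pi * k * n / total)
               for n, s in enumerate(samples)) / total


rf = [math.cos(2 * math.pi * (F_LO + F_OFFSET) * n / FS) for n in range(N)]
baseband = downconvert(rf)

upper = abs(dft_bin(baseband, F_OFFSET))    # energy at +offset
lower = abs(dft_bin(baseband, -F_OFFSET))   # energy at -offset
print(round(upper, 6), round(lower, 6))     # 0.5 0.0
```

With only the I branch, the tone would appear equally at both offsets and the two sidebands could not be separated.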
41Receiver Prototype
- Active Area 4 mm2
- Noise Figure (DSB) 8.5 dB
- S11 < -30 dB
- Voltage Gain 41 dB
- -3-dB Bandwidth 90 kHz < f < 18 MHz
- -1-dB Compression -31.1 dBm
- IIP2 (27 MHz, 37MHz) - 6.7 dBm
- IIP3 (35 MHz, 60MHz) - 18.3 dBm
- PLL Phase Noise -85 dBc/Hz @ 2.5 MHz
- LO-to-RF Leakage -81 dBm
- S-D Dynamic Range 42 dB @ 200 MHz
- Power Dissipation 106 mW
0.25-um, 6-metal CMOS process
42Analog RF Flow - What is needed?
- Characterization and modeling of on-chip passive elements, MOS devices and subcircuits
- Integration of Circuit and RF simulation capability - Spectre, Spectre RF, EEsof
- Library of reusable analog modules
- Designs that support technology scaling rules for analog components
- But most importantly
- How to co-design the analog circuits, communication algorithms and protocols?
43Simulink description of a radio system
RF modeling
Digital modeling
44Baseband equivalent analog modeling
45System Simulation of Zero-IF Receiver
- 10 users (equal power)
- 13.5dB receiver NF
- PLL -80dBc/Hz @ 100kHz
- 2.5° I/Q phase mismatch
- 82dB gain
- 4% gain mismatch
- IIP2 -11dBm
- IIP3 -18dBm
- 500kHz DC notch filter
- 20MHz Butterworth LPF
- 10-bit, 200MHz S-D ADC
Output SNR 15dB
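One parameter in this list, the receiver noise figure, can be related to an input-referred noise floor with the usual thermal-noise formula. This is a generic link-budget sketch, not the Simulink model behind the slide. It assumes a 290 K reference (about -174 dBm/Hz) and takes the 20MHz filter bandwidth as the noise bandwidth.

```python
import math

THERMAL_NOISE_DBM_PER_HZ = -174.0  # kT at roughly 290 K, standard approximation


def noise_floor_dbm(bandwidth_hz: float, noise_figure_db: float) -> float:
    """Input-referred noise floor: kT + 10*log10(bandwidth) + noise figure."""
    return THERMAL_NOISE_DBM_PER_HZ + 10 * math.log10(bandwidth_hz) + noise_figure_db


# Slide parameters: 13.5dB NF; 20MHz assumed as the noise bandwidth.
floor = noise_floor_dbm(20e6, 13.5)
print(round(floor, 2))  # -87.49
```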
46With Analog Impairments
- ideal receiver
- real receiver
- 10 users (equal power)
- 20MHz Butterworth LPF
- 500kHz DC notch filter
- 13.5dB receiver NF
- 82dB gain
- 4% gain mismatch
- 2.5° I/Q phase mismatch
- IIP2 -11dBm
- IIP3 -18dBm
- PLL -80dBc/Hz @ 100kHz
- 10-bit, 200MHz S-D ADC
47Our Design Environment for Wireless Systems
Specification (UML)
Columns: Analog Data Processing; Digital Data Processing; Protocols, Control
Conceptual: Rational ROSE, Visual Modeler
Behavioral: Matlab, Simulink; Matlab, Simulink; Telelogic, Stateflow
Structural: Synopsys, Unicad; HSPICE; ARMulator, ARM Compiler
Physical: HP EEsof ASITIC Cadence; ARM FPGA Express; Unicad Cadence, Power TimeMill
48Conclusions
- What is BWRC?
- Coordinated research effort on Single chip radio systems
- Defining a wireless system and how to design it
- Analog RF and baseband processing, Communication algorithms and Protocols
- Why software radios are a bad idea
- 100 to 1000 times more energy and area, bad mapping to the technology
- Single chip radio implementation in a day
- Algorithm, Architecture and Physical design from the same description can make it possible
49Conclusions
- A domain specific approach to wireless system on a chip design can provide
- Orders of magnitude improvement in power and area
- Deterministic and rapid chip implementation
- Accurate high level estimation leading to useful high level design optimization
50Design of the Analog Components
51Block Module Macros
- Block
- Fixed Layout and Schematic (Analog)
- Module
- Parameterized
- Tiled Layout
- Generated Schematic
Block Module Macros map to a Single Abstract
52Microprocessor Macros
- Includes
- Processor
- Memory
- Bus
- Interface
- Hard and Soft Cores
- Automatic Code Generation from Stateflow
- Modeled in Simulink
Microprocessor macro is a self-contained
processor subsystem
53DSP and Protocol Design Flow
Specification (UML)
Columns: Analog Data Processing; Digital Data Processing; Protocols, Control
Conceptual: Rational ROSE, Visual Modeler
Behavioral: Matlab, Simulink; Matlab, Simulink; Telelogic, Stateflow
Structural: Synopsys, Unicad; HSPICE; ARMulator, ARM Compiler
Physical: HP EEsof ASITIC Cadence; ARM FPGA Express; Unicad Cadence, Power TimeMill
54Prototype: A Zero-IF Receiver
- Zero-IF architecture for high integration and
efficient power consumption
55Gated Clocks for Low Power
- Clock gating is modeled with Enable signals which can freeze the state of a register at the architecture level
- Enable Generators become gated clock buffers in the physical design
Entered at the algorithmic/architecture level
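The enable-based view of clock gating can be modeled in a few lines. This is a behavioral sketch with hypothetical names. It counts how many clock edges actually reach the register, which is the activity a gated clock buffer would save.

```python
class EnabledRegister:
    """Register whose state is frozen whenever enable is low."""

    def __init__(self, initial: int = 0) -> None:
        self.q = initial
        self.clock_events = 0  # clock edges that reached the register

    def tick(self, d: int, enable: bool) -> int:
        """Model one clock cycle; the register loads d only when enabled."""
        if enable:
            self.q = d
            self.clock_events += 1
        return self.q


reg = EnabledRegister()
stimulus = [(5, True), (7, False), (9, False), (3, True)]
outputs = [reg.tick(d, enable) for d, enable in stimulus]
print(outputs)           # [5, 5, 5, 3]
print(reg.clock_events)  # 2 of 4 cycles clocked the register
```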
56Analog/DSP/Protocol
Specification (C, Matlab, SDL)
Analog Data Processing
Protocols Control
Behavioral
Digital Data Processing
Behavioral/ Structural
VCC, Opnet, Telelogic, Stateflow
Stateflow Simulink
Matlab, Simulink
Structural
Unicad, Cadence, Synopsys
Spectre
ARMulator, ARM Compiler
Physical
HP EEsof ASITIC Cadence
ARM FPGA Express
Unicad Cadence, Power TimeMill