Title: MSc - Microprocessors
1- MSc - Microprocessors
- Dr. Konstantinos Tatas
- com.tk_at_fit.ac.cy
2Useful Information
- Instructor: Lecturer K. Tatas
- Office hours: TBA
- E-mail: com.tk_at_fit.ac.cy
- http://staff.fit.ac.cy/com.tk
- Lecture periods/week: 3
- Duration: 10 weeks
- ECTS: 7 (175 hours)
3Course Objectives
- By the end of the course students should be able to:
- Evaluate the complex trade-offs involved in embedded system design
- Write detailed embedded system requirements and specification documents
- Write executable specifications using UML/SystemC
- Develop applications using ARM Developer Suite
- Write efficient ARM assembly and C programs in ARM and Thumb mode
- Analyze program performance using traces
- Use code transformations to improve performance/code size/power consumption.
4Course Outline (1/2)
- Week 1: Introduction to embedded systems. Embedded microprocessor evolution. Design metrics and constraints (performance, power, cost, time-to-market) and design optimization challenges. Distributed and real-time systems.
- Week 2: Key embedded system technologies: integrated circuit technology, microprocessor technology, CAD tool technology, sensor technology.
- Week 3: Embedded system specification and modeling. Object-oriented specification (UML/C/SystemC). Assignment 1.
- Week 4: Computer architecture. Instruction sets, RISC vs. CISC, pipelining. The ARM microprocessor architecture. ARM assembly: ARM mode, Thumb mode. ARM and Thumb instruction set. ARM conditional execution.
- Week 5: Processor I/O: serial I/O, busy/wait I/O, interrupts, exceptions, traps, ARM memory-mapped I/O. Caches, memory management units, protection units, ARM cache and MMU. Assignment 2.
5Course Outline (2/2)
- Week 6: Assignment 1
- Week 7: Programme design and analysis: DFGs, CDFGs, compilers, assemblers, linkers. Basic compiler optimizations/code transformations. Measuring programme speed. Trace-driven performance analysis. Energy optimization. Programme size optimization.
- Week 8: Code transformations: loop unrolling, loop merging, loop tiling, performance-optimizing transformations.
- Week 9: Test
- Week 10: Assignment 2
6Course Assessment
- Final exam: 40%
- Coursework: 60%
- Assignment 1: 15%
- Assignment 2: 15%
- Quizzes: 10%
- Test: 10%
- Lab exercises: 10%
7References
- Books
- W. Wolf, Computers as Components
- W. Wolf, High-Performance Embedded Computing
- H. Kopetz, Real-Time Systems: Design Principles for Distributed Embedded Applications
- S. Furber, ARM System-on-Chip Architecture
- P. Panda, Memory Issues in Embedded Systems-on-Chip
- F. Vahid and T. Givargis, Embedded System Design: A Unified Hardware/Software Introduction
- F. Catthoor, Data Access and Storage Management for Embedded Programmable Processors
8Microprocessors for Embedded systems
- Computing systems are everywhere
- Most of us think of desktop computers
- PCs
- Laptops
- Mainframes
- Servers
- But there's another type of computing system
- Far more common...
9Embedded systems overview
- Embedded computing systems
- Computing systems embedded within electronic devices
- Hard to define. Nearly any computing system other than a desktop computer
- Billions of units produced yearly, versus millions of desktop units
- Perhaps 50 per household and per automobile
Computers are in here...
and here...
and even here...
Lots more of these, though they cost a lot less each.
10A short list of embedded systems
Anti-lock brakes; Auto-focus cameras; Automatic teller machines; Automatic toll systems; Automatic transmission; Avionic systems; Battery chargers; Camcorders; Cell phones; Cell-phone base stations; Cordless phones; Cruise control; Curbside check-in systems; Digital cameras; Disk drives; Electronic card readers; Electronic instruments; Electronic toys/games; Factory control; Fax machines; Fingerprint identifiers; Home security systems; Life-support systems; Medical testing systems
Modems; MPEG decoders; Network cards; Network switches/routers; On-board navigation; Pagers; Photocopiers; Point-of-sale systems; Portable video games; Printers; Satellite phones; Scanners; Smart ovens/dishwashers; Speech recognizers; Stereo systems; Teleconferencing systems; Televisions; Temperature controllers; Theft tracking systems; TV set-top boxes; VCRs, DVD players; Video game consoles; Video phones; Washers and dryers
- And the list goes on and on
11Some common characteristics of embedded systems
- Single-functioned
- Executes a single program, repeatedly
- Tightly-constrained
- Low cost, low power, small, fast, etc.
- Reactive and real-time
- Continually reacts to changes in the system's environment
- Must compute certain results in real-time without delay
12An embedded system example Digital camera
- Single-functioned -- always a digital camera
- Tightly-constrained -- Low cost, low power, small, fast
- Reactive and real-time -- only to a small extent
13Embedded Software Development Requires as
Much/More Design Effort Than Hardware
14A System-on-a-Chip Example
Courtesy Philips
15Design at a crossroad: System-on-a-Chip
- Embedded applications where cost, performance, and energy are the real issues!
- DSP and control intensive
- Mixed-mode
- Combines programmable and application-specific modules
- Software plays crucial role
16Disciplines involved in Embedded System Design
- Digital System Design
- Software Design
- Analog/Mixed-Signal/RF System Design
- Operating Systems
- Microprocessors/Computer Architecture
- Verification
- Testing
- etc
17Languages traditionally used in Embedded System
Design
- Software design
- C/C++
- Java
- Assembly
- Verification
- VHDL/Verilog
- SystemVerilog
- Tcl/tk
- Vera
- Specification/modeling
- UML
- SDL
- C/C++
- Hardware design
- VHDL
- Verilog
18Design challenge optimizing design metrics
- Obvious design goal
- Construct an implementation with desired functionality
- Key design challenge
- Simultaneously optimize numerous design metrics
- Design metric
- A measurable feature of a system's implementation
- Optimizing design metrics is a key challenge
19Design challenge optimizing design metrics
- Common metrics
- Unit cost: the monetary cost of manufacturing each copy of the system, excluding NRE cost
- NRE cost (Non-Recurring Engineering cost): the one-time monetary cost of designing the system
- Size: the physical space required by the system
- Performance: the execution time or throughput of the system
- Power: the amount of power consumed by the system
- Flexibility: the ability to change the functionality of the system without incurring heavy NRE cost
20Design challenge optimizing design metrics
- Common metrics (continued)
- Time-to-prototype: the time needed to build a working version of the system
- Time-to-market: the time required to develop a system to the point that it can be released and sold to customers
- Maintainability: the ability to modify the system after its initial release
- Correctness, safety, many more
21Design metric competition -- improving one may
worsen others
- Expertise with both software and hardware is needed to optimize design metrics
- Not just a hardware or software expert, as is common
- A designer must be comfortable with various technologies in order to choose the best for a given application and constraints
22Time-to-market a demanding design metric
- Time required to develop a product to the point it can be sold to customers
- Market window
- Period during which the product would have highest sales
- Average time-to-market constraint is about 8 months
- Delays can be costly
23Losses due to delayed market entry
- Simplified revenue model
- Product life = 2W, peak at W
- Time of market entry defines a triangle, representing market penetration
- Triangle area equals revenue
- Loss
- The difference between the on-time and delayed triangle areas
24Losses due to delayed market entry (cont.)
- Area = 1/2 * base * height
- On-time = 1/2 * 2W * W
- Delayed = 1/2 * (W-D+W) * (W-D)
- Percentage revenue loss = (D(3W-D) / 2W^2) * 100%
- Try some examples
- Lifetime 2W = 52 wks, delay D = 4 wks
- (4 * (3*26 - 4) / (2 * 26^2)) = 22%
- Lifetime 2W = 52 wks, delay D = 10 wks
- (10 * (3*26 - 10) / (2 * 26^2)) = 50%
- Delays are costly!
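The two worked examples above can be checked with a short sketch. The function name `revenue_loss_percent` is a hypothetical helper, not part of the slides; it computes the loss directly from the two triangle areas rather than from the closed-form expression, so it doubles as a check of that formula.

```python
def revenue_loss_percent(W: float, D: float) -> float:
    """Percentage of revenue lost in the simplified triangle model.

    W is half the product lifetime (the peak is at W), D is the delay,
    both in the same time unit. Assumes 0 <= D <= W.
    """
    if not 0 <= D <= W:
        raise ValueError("delay D must lie between 0 and W")
    on_time = 0.5 * (2 * W) * W            # base 2W, height W
    delayed = 0.5 * (W - D + W) * (W - D)  # base 2W - D, height W - D
    return 100.0 * (on_time - delayed) / on_time


if __name__ == "__main__":
    for delay in (4, 10):
        # Lifetime 2W = 52 weeks, so W = 26
        print(f"D = {delay} wks: {revenue_loss_percent(26, delay):.1f}% loss")
```

Rounded to whole percentages, the two delays give the 22% and 50% figures quoted on the slide.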
25The performance design metric
- Widely-used measure of system, widely-abused
- Clock frequency, instructions per second: not good measures
- Digital camera example: a user cares about how fast it processes images, not clock speed or instructions per second
- Latency (response time)
- Time between task start and end
- e.g., Cameras A and B process images in 0.25 seconds
- Throughput
- Tasks per second, e.g. Camera A processes 4 images per second
- Throughput can be more than latency seems to imply due to concurrency, e.g. Camera B may process 8 images per second (by capturing a new image while previous image is being stored).
- Speedup of B over A = B's performance / A's performance
- Throughput speedup = 8/4 = 2
26Three key embedded system technologies
- Technology
- A manner of accomplishing a task, especially using technical processes, methods, or knowledge
- Three key technologies for embedded systems
- Processor technology
- IC technology
- Design technology
27Processor technology
- The architecture of the computation engine used to implement a system's desired functionality
- Processor does not have to be programmable
- "Processor" not equal to general-purpose processor
[Figure: three processor organizations, each consisting of a controller and a datapath. General-purpose (software): control logic and state register, IR, PC, register file, general ALU, program memory holding assembly code for "total = 0, for i = 1 to ...", data memory. Application-specific: control logic and state register, IR, PC, registers, custom ALU, program memory holding assembly code, data memory. Single-purpose (hardware): control logic, state register, index and total registers, data memory.]
28Processor technology
- Processors vary in their customization for the problem at hand
Desired functionality: total = 0; for i = 1 to N loop total += M[i] end loop
Implementation options: general-purpose processor, application-specific processor, single-purpose processor
29General-purpose processors
- Programmable device used in a variety of applications
- Also known as "microprocessor"
- Features
- Program memory
- General datapath with large register file and general ALU
- User benefits
- Low time-to-market and NRE costs
- High flexibility
- "Pentium" the most well-known, but there are hundreds of others
30Single-purpose processors
- Digital circuit designed to execute exactly one program
- a.k.a. coprocessor, accelerator or peripheral
- Features
- Contains only the components needed to execute a single program
- No program memory
- Benefits
- Fast
- Low power
- Small size
31Application-specific processors
- Programmable processor optimized for a particular class of applications having common characteristics
- Compromise between general-purpose and single-purpose processors
- Features
- Program memory
- Optimized datapath
- Special functional units
- Benefits
- Some flexibility, good performance, size and power
32IC technology
- The manner in which a digital (gate-level) implementation is mapped onto an IC
- IC: Integrated circuit, or "chip"
- IC technologies differ in their customization to a design
- ICs consist of numerous layers (perhaps 10 or more)
- IC technologies differ with respect to who builds each layer and when
33IC technology Design Approaches
34Full-custom design
- All layers are optimized for an embedded system's particular digital implementation
- Placing transistors
- Sizing transistors
- Routing wires
- Benefits
- Excellent performance, small size, low power
- Drawbacks
- High NRE cost (e.g., $300k), long time-to-market
35The Custom Approach
Intel 4004
Courtesy Intel
36Transition to Automation and Regular Structures
Courtesy Intel
38IC technology Design Approaches
IC Technology Implementation Approaches
- Custom
- Semicustom, cell-based: standard cells, compiled cells, macro cells
- Semicustom, array-based: pre-diffused (gate arrays), pre-wired (FPGAs)
39Semi-custom
- Lower layers are fully or partially built
- Designers are left with routing of wires and maybe placing some blocks
- Benefits
- Good performance, good size, less NRE cost than a full-custom implementation (perhaps $10k to $100k)
- Drawbacks
- Still require weeks to months to develop
40Cell-based Design (or standard cells)
Routing channel requirements are reduced by
presence of more interconnect layers
41Standard Cell Example
Brodersen92
42Standard Cell - Example
3-input NAND cell (from ST Microelectronics). C = load capacitance, T = input rise/fall time
43IC technology Design Approaches
IC Technology Implementation Approaches
- Custom
- Semicustom, cell-based: standard cells, compiled cells, macro cells
- Semicustom, array-based: pre-diffused (gate arrays), pre-wired (FPGAs)
44Programmable Logic Devices
- All layers (diffusion, polysilicon, multi-metal) may exist
- Designers can purchase an IC
- Connections on the IC are either created or destroyed to implement desired functionality
- Field-Programmable Gate Array (FPGA) and recently Gate Arrays are very popular
- Benefits
- Low NRE costs, almost instant IC availability
- Drawbacks
- Bigger, expensive (perhaps $30 per unit), power hungry, slower
45Gate Array Sea-of-gates
Uncommitted Cell
Committed Cell (4-input NOR)
46Sea-of-gate Primitive Cells
Using oxide-isolation
Using gate-isolation
47Sea-of-gates
Random Logic
Memory Subsystem
LSI Logic LEA300K (0.6 µm CMOS)
48Prewired Arrays
- Classification of prewired arrays (or field-programmable devices)
- Based on Programming Technique
- Fuse-based (program-once)
- Non-volatile EPROM based
- RAM based
- Programmable Logic Style
- Array-Based
- Look-up Table
- Programmable Interconnect Style
- Channel-routing
- Mesh networks
49Altera MAX
From Smith97
50Altera MAX Interconnect Architecture
row channel
column channel
LAB
Array-based (MAX 3000-7000)
Mesh-based (MAX 9000)
51LUT-Based Logic Cell
[Figure: LUT-based logic cell (Xilinx 4000 Series), showing logic-function blocks of up to four inputs each, with multiplexers controlled by the configuration program]
52Array-Based Programmable Wiring
Vertical tracks
53Transistor Implementation of Mesh
Courtesy Dehon and Wawrzyniek
54RAM-based FPGA
Xilinx XC4000ex
55Design Technology
- The manner in which we convert our concept of desired system functionality into an implementation
- Compilation/Synthesis: automates exploration and insertion of implementation details for lower level.
- Libraries/IP: incorporates pre-designed implementation from lower abstraction level into higher level.
- Test/Verification: ensures correct functionality at each level, thus reducing costly iterations between levels.
Specification level | Compilation/Synthesis | Libraries/IP | Test/Verification
System specification | System synthesis | Hw/Sw/OS | Model simulators/checkers
Behavioral specification | Behavior synthesis | Cores | Hw-Sw cosimulators
RT specification | RT synthesis | RT components | HDL simulators
Logic specification | Logic synthesis | Gates/Cells | Gate simulators
To final implementation
56The co-design ladder
- In the past
- Hardware and software design technologies were very different
- Recent maturation of synthesis enables a unified view of hardware and software
- Hardware/software "codesign"
The choice of hardware versus software for a particular function is simply a tradeoff among various design metrics, like performance, power, size, NRE cost, and especially flexibility; there is no fundamental difference between what hardware or software can implement.
57Independence of processor and IC technologies
- Basic tradeoff
- General vs. custom
- With respect to processor technology or IC technology
- The two technologies are independent
58Design Decision Trade-offs
59Generalised Design Flow
60Architecture ReUse
- Silicon System Platform
- Flexible architecture for hardware and software
- Specific (programmable) components
- Network architecture
- Software modules
- Rules and guidelines for design of HW and SW
- Has been successful in PCs
- Dominance of a few players who specify and control architecture
- Application-domain specific (difference in constraints)
- Speed (compute power)
- Dissipation
- Costs
- Real / non-real time data
61Platform-Based Design
"Only the consumer gets freedom of choice; designers need freedom from choice" (Orfali, et al, 1996, p.522)
- A platform is a restriction on the space of possible implementation choices, providing a well-defined abstraction of the underlying technology for the application developer
- New platforms will be defined at the architecture-micro-architecture boundary
- They will be component-based, and will provide a range of choices from structured-custom to fully programmable implementations
- Key to such approaches is the representation of communication in the platform model
Source: R. Newton
62Platform-based Design System-on-Chip
- Use of predefined Intellectual Property (IP)
- A platform-based system consists of a RISC processor, memories, busses and a common language
- Platform-based design poses the problem of partitioning a solution between hardware (HDL) and software (programming processors)
63Platforms Enable Simplified SoC Design
- Customer demands
- Fast turn-around time
- Easy access to pre-qualified building blocks
- Web enabled
- Design technology
- Core platforms
- Big IP
- Emerging SoC bus standards
- Embedded software
- HW/SW co-verification
64And Automation of IP Selection Integration
65Heterogeneous Programmable Platforms
FPGA Fabric
Embedded memories
Embedded PowerPC
Hardwired multipliers
Xilinx Virtex-II Pro
High-speed I/O
66Xilinx's products
67Xilinx's products
68Comparison of CMOS design methods
Design Method | NRE | Unit Cost | Power Dissipation | Complexity of Implementation | Time-to-Market | Performance | Flexibility
µProcessor/DSP | low | medium | high | low | low | low | high
PLA | low | medium | medium | low | low | medium | low
FPGA | low | high | medium | medium | medium | medium | medium
Gate Array | medium | medium | low | medium | medium | medium | medium
Cell Based | high | low | low | high | high | high | low
Custom Design | high | low | low | high | high | very high | low
Platform Based | high | low/medium | low | high | medium/low | high | medium
69Impact of Implementation Choices
70Design Economics (1)
- The selling price of an IC: S_total = C_total / (1 - m), where C_total is the manufacturing cost of a single IC and m is the desired profit margin
- Costs to produce an IC
- Non-recurring engineering costs (NREs)
- Recurring engineering costs
- Fixed costs
71Design Economics (2)
- Non-recurring engineering costs (NREs)
- Engineering design cost
- Prototype manufacturing cost
- Recurring costs
- Process
- Package
- Test
72NRE and unit cost metrics
- Costs
- Unit cost: the monetary cost of manufacturing each copy of the system, excluding NRE cost
- NRE cost (Non-Recurring Engineering cost): the one-time monetary cost of designing the system
- total cost = NRE cost + unit cost * # of units
- per-product cost = total cost / # of units
- = (NRE cost / # of units) + unit cost
- Example
- NRE = $2000, unit = $100
- For 10 units
- total cost = $2000 + 10 * $100 = $3000
- per-product cost = $2000/10 + $100 = $300
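The cost relations above translate directly into two small functions. This is a minimal sketch; the function names `total_cost` and `per_product_cost` are illustrative and do not come from the slides.

```python
def total_cost(nre: float, unit_cost: float, units: int) -> float:
    """Total cost = one-time NRE cost + unit cost for every copy built."""
    return nre + unit_cost * units


def per_product_cost(nre: float, unit_cost: float, units: int) -> float:
    """Per-product cost = total cost spread evenly over all units."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_cost(nre, unit_cost, units) / units


if __name__ == "__main__":
    # Slide example: NRE = 2000, unit cost = 100, 10 units
    print(total_cost(2000, 100, 10))        # 3000
    print(per_product_cost(2000, 100, 10))  # 300.0
```

As the volume grows, the NRE share per product shrinks and the per-product cost approaches the unit cost.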
73NRE and unit cost metrics
- Compare technologies by costs -- best depends on quantity
- Technology A: NRE = $2,000, unit = $100
- Technology B: NRE = $30,000, unit = $30
- Technology C: NRE = $100,000, unit = $2
- But, must also consider time-to-market
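To see how the best technology depends on quantity, the following sketch compares the total cost of A, B and C at a given volume and computes the break-even volume between two technologies. The dictionary `TECHNOLOGIES` and the helpers `cheapest` and `break_even_units` are hypothetical names introduced for illustration; time-to-market is deliberately ignored here.

```python
# (NRE cost, unit cost) for each technology, taken from the slide
TECHNOLOGIES = {
    "A": (2_000, 100),
    "B": (30_000, 30),
    "C": (100_000, 2),
}


def total_cost(nre: float, unit_cost: float, units: int) -> float:
    """Total cost = NRE cost + unit cost * number of units."""
    return nre + unit_cost * units


def cheapest(units: int) -> str:
    """Name of the technology with the lowest total cost at this volume."""
    return min(TECHNOLOGIES, key=lambda name: total_cost(*TECHNOLOGIES[name], units))


def break_even_units(first: str, second: str) -> float:
    """Volume at which the two technologies have the same total cost.

    Solves nre1 + unit1 * n = nre2 + unit2 * n for n.
    """
    nre1, unit1 = TECHNOLOGIES[first]
    nre2, unit2 = TECHNOLOGIES[second]
    return (nre2 - nre1) / (unit1 - unit2)


if __name__ == "__main__":
    print("A/B break-even:", break_even_units("A", "B"), "units")
    print("B/C break-even:", break_even_units("B", "C"), "units")
    for volume in (100, 1_000, 10_000):
        print(volume, "units ->", cheapest(volume))
```

Under these numbers A is cheapest below 400 units, B between 400 and 2,500 units, and C above 2,500 units.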
74Wafer and die cost
Die yield = number of good dies / total number of dies
75Example
- Assuming
- 20 engineers are employed full-time for a year with a 50,000/year average salary
- Additional 200,000 overhead costs, of which 100,000 for total testing
- A wafer cost of 200 per wafer
- A 2 packaging cost per chip
- 10 dies/wafer
- 70% die yield
- 98% final test yield
- A market for 100,000 items
- Calculate the minimum shelf price of the chip
76Design productivity exponential increase
[Figure: design productivity in K transistors per staff-month (log scale, 0.01 to 100,000) plotted against year, 1981 to 2009]
- Exponential increase over the past few decades
77The growing design-productivity gap
[Figure: Design Productivity Crisis (SRC 1997), potential design complexity versus designer productivity. Logic transistors per chip (M) grow at 58%/yr compounded (complexity growth rate), while productivity (K transistors per staff-month) grows at 21%/yr compounded (productivity growth rate). Also shown: Moore's Law for standard cell density (Kgates/mm^2) and ASIC clock speed (MHz), and the equivalent added complexity.]
78Design productivity gap
- 1981 leading edge chip required 100 designer months
- 10,000 transistors / 100 transistors/month
- 2002 leading edge chip requires 30,000 designer months
- 150,000,000 / 5000 transistors/month
- Designer cost increase from $1M to $300M
- While designer productivity has grown at an impressive rate over the past decades, the rate of improvement has not kept pace with chip capacity
79The mythical man-month
- The situation is even worse than the productivity gap indicates
- In theory, adding designers to team reduces project completion time
- In reality, productivity per designer decreases due to complexities of team management and communication
- In the software community, known as the "mythical man-month" (Brooks 1975)
- At some point, can actually lengthen project completion time! ("Too many cooks")
- 1M transistors, 1 designer = 5000 trans/month
- Each additional designer reduces productivity by 100 trans/month per designer
- So 2 designers produce 4900 trans/month each
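The numbers on this slide define a simple model that can be explored in a few lines. This is a sketch under the slide's assumptions (a 1M-transistor design, 5000 transistors/month for a lone designer, and a 100 transistors/month drop per designer for each additional team member); the function names are hypothetical.

```python
def team_output(designers: int, base: int = 5000, penalty: int = 100) -> int:
    """Transistors per month produced by the whole team.

    Each designer produces `base` minus `penalty` for every other team
    member, floored at zero so that huge teams do not go negative.
    """
    per_designer = max(base - penalty * (designers - 1), 0)
    return designers * per_designer


def months_to_complete(transistors: int, designers: int) -> float:
    """Project duration in months for a design of the given size."""
    output = team_output(designers)
    return float("inf") if output == 0 else transistors / output


if __name__ == "__main__":
    for n in (1, 2, 10, 25, 26, 27, 40):
        months = months_to_complete(1_000_000, n)
        print(f"{n:2d} designers: {team_output(n):6d} trans/month, {months:6.1f} months")
```

In this model team output peaks at 25 to 26 designers; adding a 27th designer already lengthens the project, which is the "too many cooks" effect.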
80Summary
- Embedded systems are everywhere
- Key challenge: optimization of design metrics
- Design metrics compete with one another
- A unified view of hardware and software is necessary to improve productivity
- Three key technologies
- Processor: general-purpose, application-specific, single-purpose
- IC: full-custom, semi-custom, PLD
- Design: compilation/synthesis, libraries/IP, test/verification
81Real-time and distributed systems
82What is real-time? Is there any other kind?
- A real-time computer system is a computer system where the correctness of the system behavior depends not only on the logical results of the computations, but also on the physical time when these results are produced.
- By system behavior we mean the sequence of outputs in time of a system.
83Real-time means reactive
- A real-time computer system must react to stimuli from its environment
- The instant when a result must be produced is called a deadline.
- If a result has utility even after the deadline has passed, the deadline is classified as soft, otherwise it is firm.
- If severe consequences could result if a firm deadline is missed, the deadline is called hard.
- Example: Consider a traffic signal at a road before a railway crossing. If the traffic signal does not change to red before the train arrives, an accident could result.
84Reliability
- The reliability R(t) of a system is the probability that a system will provide the specified service until time t, given that the system was operational at the beginning (t = t0)
- The probability that a system will fail in a given interval of time is expressed by the failure rate, measured in FITs (Failure In Time).
- A failure rate of 1 FIT means that the mean time to a failure (MTTF) of a device is 10^9 h, i.e., one failure occurs in about 115,000 years.
- If a system has a constant failure rate of λ failures/h, then the reliability at time t is given by
- R(t) = exp(-λ(t - t0))
- MTTF = 1/λ
85Example
- What must be the system failure rate so that 99% of the systems in the field work reliably for the first 100,000 hours?
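One way to approach this question is to invert the constant-failure-rate model R(t) = exp(-λ(t - t0)) for λ. The sketch below assumes that model applies and that t0 = 0; the function names are hypothetical, and the FIT conversion uses the definition of 1 FIT as one failure per 10^9 hours.

```python
import math

HOURS_PER_FIT = 1e9  # 1 FIT corresponds to one failure per 10^9 hours


def reliability(failure_rate: float, t: float, t0: float = 0.0) -> float:
    """R(t) = exp(-lambda * (t - t0)) for a constant failure rate (per hour)."""
    return math.exp(-failure_rate * (t - t0))


def required_failure_rate(target: float, t: float, t0: float = 0.0) -> float:
    """Largest constant failure rate (per hour) that still gives R(t) >= target."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target reliability must be in (0, 1]")
    return -math.log(target) / (t - t0)


if __name__ == "__main__":
    rate = required_failure_rate(0.99, 100_000)
    print(f"failure rate: {rate:.3e} failures/h")
    print(f"in FIT:       {rate * HOURS_PER_FIT:.1f}")
    print(f"MTTF:         {1 / rate:.3e} h")
```

Under these assumptions the required failure rate comes out at roughly 1e-7 failures/h, that is, about 100 FIT.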
86Safety
87Maintainability
88Name some hard, firm and soft deadline embedded
systems
89Example
- An automotive company produces 2,000,000 electronic engine controllers of a special type.
- The following design alternatives are discussed:
- (a) Construct the engine control unit as a single SRU with the application software in Read Only Memory (ROM). The production cost of such a unit is 250. In case of an error, the complete unit has to be replaced.
- (b) Construct the engine control unit such that the software is contained in a ROM that is placed on a socket and can be replaced in case of a software error. The production cost of the unit without the ROM is 248. The cost of the ROM is 5.
- (c) Construct the engine control unit as a single SRU where the software is loaded in a Flash EPROM that can be reloaded. The production cost of such a unit is 255.
- The labor cost of repair is assumed to be 50 for each vehicle. (It is assumed to be the same for each one of the three alternatives).
- Calculate the cost of a software error for each one of the three alternative designs if 300,000 cars have to be recalled because of the software error (example in Sect. 1.6.1).
- Which one is the lowest cost alternative if only 1,000 cars are affected by a recall?
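A possible way to set up this exercise is sketched below. It rests on assumptions that the slide does not state explicitly: a replaced part costs its production cost, the "lowest cost alternative" is judged on production cost for all 2,000,000 units plus recall cost, and all amounts are in the same unspecified currency. The `Design` class and helper names are hypothetical.

```python
from dataclasses import dataclass

LABOR_PER_REPAIR = 50        # same for all three alternatives (from the slide)
UNITS_PRODUCED = 2_000_000   # from the slide


@dataclass(frozen=True)
class Design:
    name: str
    production_cost: float   # per unit, including any ROM
    parts_per_repair: float  # hardware replaced per recalled vehicle (assumption)


DESIGNS = [
    Design("a: software in fixed ROM", 250, 250),   # whole unit replaced
    Design("b: socketed ROM", 248 + 5, 5),          # only the ROM replaced
    Design("c: Flash EPROM", 255, 0),               # software reloaded
]


def recall_cost(design: Design, recalled: int) -> float:
    """Cost of the software error: parts plus labor for every recalled car."""
    return recalled * (design.parts_per_repair + LABOR_PER_REPAIR)


def lifetime_cost(design: Design, recalled: int) -> float:
    """Production cost of the whole series plus the cost of the recall."""
    return UNITS_PRODUCED * design.production_cost + recall_cost(design, recalled)


if __name__ == "__main__":
    for recalled in (300_000, 1_000):
        print(f"recall of {recalled} cars")
        for d in DESIGNS:
            print(f"  {d.name}: recall {recall_cost(d, recalled):,.0f}, "
                  f"total {lifetime_cost(d, recalled):,.0f}")
        best = min(DESIGNS, key=lambda d: lifetime_cost(d, recalled))
        print(f"  lowest total cost: {best.name}")
```

With these assumptions, alternative (a) has by far the highest recall cost for 300,000 cars, yet it becomes the cheapest overall when only 1,000 cars are affected, because its lower production cost dominates.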
90Distributed RT system model
- From the POV of an outside observer, a real-time (RT) system can be decomposed into three communicating subsystems:
- a controlled object (the physical subsystem, the behavior of which is governed by the laws of physics),
- a distributed computer subsystem (the cyber system, the behavior of which is governed by the programs that are executed on digital computers)
- a human user or operator
- The distributed computer system consists of computational nodes that interact by the exchange of messages.
- A computational node can host one or more computational components.
91Event-Triggered Control Versus Time-Triggered
Control