Title: Embedded System HW
1Embedded System HW
2Processor Technology
- General Purpose (software)
- Application Specific
- Single Purpose (Hardware)
- IC technology
- Full Custom/VLSI
- Semi-custom ASIC (gate-array, standard cell)
- PLD
3Custom single-purpose processors Hardware
4Outline
- Introduction
- Combinational logic
- Sequential logic
- Custom single-purpose processor design
- RT-level custom single-purpose processor design
- Read chapter 2 in Embedded System Design: A Unified Hardware/Software Introduction, Frank Vahid and Tony Givargis.
5Introduction
- Processor
- Digital circuit that performs computation tasks
- Controller and datapath
- General-purpose: variety of computation tasks
- Single-purpose: one particular computation task
- Custom single-purpose: non-standard task
- A custom single-purpose processor may be
- Fast, small, low power
- But high NRE, longer time-to-market, less flexible
6Custom single-purpose processor basic model
7Example greatest common divisor
- First create algorithm
- Convert algorithm to complex state machine
- Known as FSMD: finite-state machine with datapath
- Can use templates to perform such conversion
(b) desired functionality
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }
(c) state diagram
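The algorithm-to-FSMD conversion can be tried out in software before any hardware is drawn. The following is a minimal sketch in C, assuming one state per numbered line of the algorithm and assuming go_i is already asserted. The function name gcd_fsmd, the enum names, and the zero-input guard are illustrative additions, not part of the original slides.

```c
#include <stdint.h>

/* One state per numbered line of the GCD algorithm (names are hypothetical). */
enum gcd_state {
    S_LOAD_X   = 3,  /* x = x_i            */
    S_LOAD_Y   = 4,  /* y = y_i            */
    S_TEST_NEQ = 5,  /* while (x != y)     */
    S_TEST_LT  = 6,  /* if (x < y)         */
    S_SUB_Y    = 7,  /* y = y - x          */
    S_SUB_X    = 8,  /* x = x - y          */
    S_OUT      = 9   /* d_o = x            */
};

/* Steps the FSMD one state at a time and returns the value driven on d_o. */
uint32_t gcd_fsmd(uint32_t x_i, uint32_t y_i)
{
    uint32_t x = 0, y = 0, d_o = 0;
    enum gcd_state s = S_LOAD_X;

    /* Guard added for this sketch: subtraction never terminates on a zero input. */
    if (x_i == 0 || y_i == 0)
        return 0;

    for (;;) {
        switch (s) {
        case S_LOAD_X:   x = x_i;   s = S_LOAD_Y;   break;
        case S_LOAD_Y:   y = y_i;   s = S_TEST_NEQ; break;
        case S_TEST_NEQ: s = (x != y) ? S_TEST_LT : S_OUT;  break;
        case S_TEST_LT:  s = (x < y)  ? S_SUB_Y  : S_SUB_X; break;
        case S_SUB_Y:    y = y - x; s = S_TEST_NEQ; break;
        case S_SUB_X:    x = x - y; s = S_TEST_NEQ; break;
        case S_OUT:      d_o = x;   return d_o;
        }
    }
}
```

Calling gcd_fsmd(12, 8) walks the same state sequence the controller would and returns 4. Each case corresponds to one register load or one comparison, which is what the datapath has to provide.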
8State diagram templates
9Creating the datapath
- Create a register for any declared variable
- Create a functional unit for each arithmetic operation
- Connect the ports, registers and functional units
- Based on reads and writes
- Use multiplexors for multiple sources
- Create unique identifier
- for each datapath component control input and output
10Creating the controllers FSM
- Same structure as FSMD
- Replace complex actions/conditions with datapath
configurations
11Splitting into a controller and datapath
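[Figure: controller FSM with input go_i. States and their 4-bit encodings:]
- 1 (0000): tests the constant condition 1
- 2 (0001): tests !go_i
- 2-J (0010)
- 3 (0011): x_sel = 0, x_ld = 1
- 4 (0100): y_sel = 0, y_ld = 1
- 5 (0101): tests x_neq_y (1 enters the loop body, 0 leaves the loop)
- 6 (0110): tests x_lt_y (1 selects state 7, 0 selects state 8)
- 7 (0111): y_sel = 1, y_ld = 1
- 8 (1000): x_sel = 1, x_ld = 1
- 6-J (1001)
- 5-J (1010)
- 9 (1011): d_ld = 1
- 1-J (1100)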
12Controller state table for the GCD example
13Completing the GCD custom single-purpose
processor design
- We finished the datapath
- We have a state table for the next state and control logic
- All that's left is combinational logic design
- This is not an optimized design, but we see the basic steps
14Summary
- Custom single-purpose processors
- Straightforward design techniques
- Can be built to execute algorithms
- Typically start with FSMD
- CAD tools can be of great assistance
15General-Purpose Processors Software
16Introduction
- General-Purpose Processor
- Processor designed for a variety of computation tasks
- Low unit cost, in part because manufacturer spreads NRE over large numbers of units
- Motorola sold half a billion 68HC05 microcontrollers in 1996 alone
- ARM processors: 1.5 billion processors
- Carefully designed since higher NRE is acceptable
- Can yield good performance, size and power
- Low NRE cost, short time-to-market/prototype, high flexibility
- User just writes software; no processor design
- a.k.a. "microprocessor": "micro" used when they were implemented on one or a few chips rather than entire rooms
17Why use microprocessors?
- Alternatives: field-programmable gate arrays (FPGAs), custom logic, etc. (custom single-purpose processor or HW logic)
- Microprocessors are often very efficient: can use same logic to perform many different functions.
- Microprocessors simplify the design of families of products.
18The performance paradox
- Microprocessors use much more logic to implement a function than does custom logic.
- But microprocessors are often at least as fast
- heavily pipelined
- large design teams
- aggressive VLSI technology.
19Power
- Custom logic is a clear winner for low-power devices.
- Modern microprocessors offer features to help control power consumption.
- Software design techniques can help reduce power consumption.
20Basic Architecture
21Basic Architecture
- Control unit and datapath
- Note similarity to single-purpose processor
- Key differences
- Datapath is general
- Control unit doesn't store the algorithm; the algorithm is programmed into the memory
22Superscalar and VLIW Architectures
- Performance can be improved by
- Faster clock (but there's a limit)
- Pipelining: slice up instruction into stages, overlap stages
- Multiple ALUs to support more than one instruction stream
- Superscalar
- Scalar: non-vector operations
- Fetches instructions in batches, executes as many as possible
- May require extensive hardware to detect independent instructions
- VLIW: each word in memory has multiple independent instructions
- Currently growing in popularity
- Relies on the compiler to detect and schedule instructions
23Pipelining Increasing Instruction Throughput
[Figure: non-pipelined vs. pipelined dish cleaning. Wash and dry stages for dishes 1-8 plotted against time.]
[Figure: pipelined instruction execution. Instructions 1-8 pass through five stages over time: fetch instruction, decode, fetch operands, execute, store result.]
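The throughput gain from pipelining can be put into numbers. The sketch below assumes an idealized pipeline in which every stage takes one time unit and there are no stalls or hazards. The function names are illustrative only.

```c
#include <stdint.h>

/* Time units for n items through k stages when each item must finish
 * all k stages before the next one starts. */
uint32_t cycles_nonpipelined(uint32_t n, uint32_t k)
{
    return n * k;
}

/* Time units when stages overlap: k units until the first item completes,
 * then one more item completes every unit (idealized, no stalls). */
uint32_t cycles_pipelined(uint32_t n, uint32_t k)
{
    return (n == 0) ? 0 : k + (n - 1);
}
```

For 8 instructions and the 5 stages listed in the figure, this gives 40 time units without overlap and 12 with overlap. Real pipelines fall short of this because of branches and data dependences.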
24Two Memory Architectures
- Princeton
- Fewer memory wires
- Harvard
- Simultaneous program and data memory access
25Princeton vs. Harvard
- Harvard can't use self-modifying code.
- Harvard allows two simultaneous memory fetches.
- Most DSPs use Harvard architecture for streaming data
- greater memory bandwidth
- more predictable bandwidth.
26Cache Memory
- Memory access may be slow
- Cache is small but fast memory close to processor
- Holds copy of part of memory
- Hits and misses
27Application-Specific Instruction-Set Processors
(ASIPs)
28Application-Specific Instruction-Set Processors
(ASIPs)
- General-purpose processors
- Sometimes too general to be effective in demanding application
- e.g., video processing requires huge video buffers and operations on large arrays of data, inefficient on a GPP
- But single-purpose processor has high NRE, not programmable
- ASIPs: targeted to a particular domain
- Contain architectural features specific to that domain
- e.g., embedded control, digital signal processing, video processing, network processing, telecommunications, etc.
- Still programmable
29Microprocessor varieties
- Microcontroller: includes I/O devices, on-board memory.
- Digital signal processor (DSP): microprocessor optimized for digital signal processing.
- Typical embedded word sizes: 8-bit, 16-bit, 32-bit.
30Embedded Processors
Netsilicon NETARM Embedded Processor
31Many Types of Programmable Processors
- Past
- Microprocessor
- Microcontroller
- DSP
- Graphics Processor
- Now / Future
- Network Processor
- Sensor Processor
- Cryptoprocessor
- Game Processor
- Wearable Processor
- Mobile Processor
32A Common ASIP Microcontroller
- For embedded control applications
- Reading sensors, setting actuators
- Mostly dealing with events (bits): data is present, but not in huge amounts
- e.g., VCR, disk drive, digital camera (assuming SPP for image compression), washing machine, microwave oven
- Microcontroller features
- On-chip peripherals
- Timers, analog-digital converters, serial communication, etc.
- Tightly integrated for programmer, typically part of register space
- On-chip program and data memory
- Direct programmer access to many of the chip's pins
- Specialized instructions for bit-manipulation and other low-level operations
33Another Common ASIP Digital Signal Processors
(DSP)
- For signal processing applications
- Large amounts of digitized data, often streaming
- Data transformations must be applied fast
- e.g., cell-phone voice filter, digital TV, music synthesizer
- DSP features
- Several instruction execution units
- Multiply-accumulate single-cycle instruction, other instrs.
- Efficient vector operations, e.g., add two arrays
- Vector ALUs, loop buffers, etc.
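The kernel that a single-cycle multiply-accumulate instruction speeds up looks like the loop below. This is a minimal sketch in C assuming 16-bit samples and a wide accumulator. The name dot_mac and the data widths are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of products c[i] * x[i]; each loop iteration is one multiply-accumulate.
 * A 64-bit accumulator is used here so the sketch cannot overflow. */
int64_t dot_mac(const int16_t *c, const int16_t *x, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)c[i] * (int32_t)x[i];
    return acc;
}
```

On a general-purpose processor each iteration costs two loads, a multiply, an add, and loop overhead. A DSP aims to fold these into roughly one cycle per tap using its MAC unit, dual memory access, and loop buffers.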
34Trend Even More Customized ASIPs
- In the past, microprocessors were acquired as chips
- Today, we increasingly acquire a processor as Intellectual Property (IP)
- e.g., synthesizable VHDL model
- Opportunity to add custom datapath hardware and a few custom instructions, or delete a few instructions
- Can have significant performance, power and size impacts
- Problem: need compiler/debugger for customized ASIP
- Remember, most development uses structured languages
- One solution: automatic compiler/debugger generation
- e.g., www.tensillica.com
- Another solution: retargetable compilers
- e.g., www.improvsys.com (customized VLIW architectures)
35Reconfigurable SoC
Other examples: Atmel's FPSLIC (AVR + FPGA), Altera's Nios (configurable RISC on a PLD)
36Selecting a Microprocessor
- Issues
- Technical: speed, power, size, cost
- Other: development environment, prior expertise, licensing, etc.
- Speed: how to evaluate a processor's speed?
- Clock speed: but instructions per cycle may differ
- Instructions per second: but work per instr. may differ
- Dhrystone: synthetic benchmark, developed in 1984. Dhrystones/sec.
- MIPS: 1 MIPS = 1757 Dhrystones per second (based on Digital's VAX 11/780). A.k.a. Dhrystone MIPS. Commonly used today.
- So, 750 MIPS = 750 x 1757 = 1,317,750 Dhrystones per second
- SPEC: set of more realistic benchmarks, but oriented to desktops
- EEMBC: EDN Embedded Benchmark Consortium, www.eembc.org
- Suites of benchmarks: automotive, consumer electronics, networking, office automation, telecommunications
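The Dhrystone MIPS conversion is a single scale factor, so it is easy to check by hand or in code. The sketch below uses the 1757 Dhrystones/second figure quoted above. The function and macro names are hypothetical.

```c
#include <stdint.h>

/* Dhrystones per second achieved by the reference VAX 11/780 (1 MIPS). */
#define DHRYSTONES_PER_MIPS 1757u

/* Converts a Dhrystone MIPS rating to raw Dhrystones per second. */
uint64_t dmips_to_dhrystones(uint64_t dmips)
{
    return dmips * DHRYSTONES_PER_MIPS;
}

/* Converts a measured Dhrystones-per-second score to Dhrystone MIPS. */
double dhrystones_to_dmips(double dhrystones_per_sec)
{
    return dhrystones_per_sec / DHRYSTONES_PER_MIPS;
}
```

For a 750 MIPS part, dmips_to_dhrystones(750) gives 1,317,750 Dhrystones per second, matching the figure on the slide.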
37Processors
Sources: Intel, Motorola, MIPS, ARM, TI, and IBM website/datasheet; Embedded Systems Programming, Nov. 1998
38Summary
- General-purpose processors
- Good performance, low NRE, flexible
- Controller, datapath, and memory
- Structured languages prevail
- But some assembly level programming still necessary
- Many tools available
- Including instruction-set simulators, and in-circuit emulators
- ASIPs
- Microcontrollers, DSPs, network processors, more customized ASIPs
- Choosing among processors is an important step
- Designing a general-purpose processor is conceptually the same as designing a single-purpose processor
39Instruction Sets
40RISC vs. CISC
- Complex instruction set computer (CISC)
- many addressing modes
- many operations.
- Reduced instruction set computer (RISC)
- load/store
- pipelinable instructions.
41CISC - History
- Intel processors by year, with transistor counts
Year  Processor    Transistors   Notes
1971  4004         2,250         Busicom
1972  8008         2,500         Mark-8
1974  8080         5,000         Altair
1978  8086/8088    29,000        IBM-PC XT
1982  80286        120,000       IBM-PC AT
1985  80386        275,000       32-bit
1989  80486        1,180,000
1993  Pentium      3,100,000
1995  Pentium Pro  5,500,000     Dynamic Execution
1997  Pentium 2    7,500,000     MMX
1999  Pentium 3    24,000,000    SIMD
2001  Itanium      25,000,000    64-bit, Explicitly Parallel Instruction Computing (EPIC)
2002  Pentium 4    55,000,000
2003  Itanium 2    410,000,000   Machine Check Architecture, EPIC, 6MB L3
42CISC - History: Packaging
43CISC - History
44Instruction set characteristics
- Fixed vs. variable length.
- Addressing modes.
- Number of operands.
- Types of operands.
45ARM data processing Instruction Format(RISC)
Data processing immediate shift
Data processing register shift
Data processing 32-bit immediate
46Intel IA-32 Instruction Format (CISC)
47Programming model
- Programming model: registers visible to the programmer.
- Some registers are not visible (IR).
48Multiple implementations
- Successful architectures have several implementations
- varying clock speeds
- different bus widths
- different cache sizes
- etc.
49ARM Architecture
- Advanced RISC Machines(1990)
- (ACORN and Apple Computer)
50ARM Architecture
- ARM versions.
- ARM assembly language.
- ARM programming model.
51ARM versions
- ARM architecture has been extended over several versions.
- We will concentrate on ARMv5
52Evolution of the ARM architecture versions
53ARMv6 Improvement
- Memory management
- Multiprocessing
- Multimedia support SIMD capability
54Evolution of the ARM architecture
ARM11
55Introduction
- To allow very small, yet high-performance implementations
- RISC
- Large uniform register file
- Load/store architecture
- Simple addressing modes
- Uniform and fixed-length instruction fields
- Auto-increment and auto-decrement addressing modes
- Conditional execution of all instructions
56Extension of the RISC rules
- High code density, low power, and small die size
- Variable cycle execution
- Multiple load and store
- Improves code density, reduces instruction fetches, and reduces overall power consumption
- Inline barrel shifter
- Conditional execution
- 16-bit Thumb instruction set
- Enhanced DSP instructions
- 16x16 multiply, arithmetic saturation
- DSP-specific routines
57ARM assembly language
- Fairly standard assembly language
- LDR r0,[r8] ; a comment
- label ADD r4,r0,r1
58Programming Model
59ARM data types
- Byte
- Halfword: 16 bits
- Must be aligned to two-byte boundaries
- Word: 32 bits
- Must be aligned to four-byte boundaries
- ARM addresses can be 32 bits long.
- Address refers to byte.
- Address 4 starts at byte 4.
- Can be configured at power-up as either little- or big-endian mode.
60Processor modes
- User (usr): normal program execution mode
- FIQ (fiq): supports a high-speed data transfer or channel process
- IRQ (irq): used for general-purpose interrupt handling
- Supervisor (svc): a protected mode for the OS
- Abort (abt): implements VM and/or memory protection
- Undefined (und): supports software emulation of HW coprocessors
- System (sys): runs privileged OS tasks
- fiq, irq, svc, abt, und: exception modes
61Registers
[Figure: ARM register set. r0-r15, each 32 bits wide (bits 0 to 31), plus the CPSR. r14 is the link register and r15 is the PC. The figure distinguishes unbanked registers from banked registers.]
63Endianness
- Relationship between bit and byte/word ordering defines endianness
little-endian: bit 31 ... bit 0 | byte 3 | byte 2 | byte 1 | byte 0
big-endian: bit 0 ... bit 31 | byte 0 | byte 1 | byte 2 | byte 3
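The difference between the two byte orders shows up when a 32-bit word is assembled from four consecutive bytes in memory. The following sketch in C reads the same byte sequence both ways. The function names load_le32 and load_be32 are illustrative and not tied to any particular library.

```c
#include <stdint.h>

/* Little-endian: the byte at the lowest address is the least significant. */
uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Big-endian: the byte at the lowest address is the most significant. */
uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         | (uint32_t)p[3];
}
```

Given the bytes 0x78, 0x56, 0x34, 0x12 at increasing addresses, load_le32 yields 0x12345678 and load_be32 yields 0x78563412. Writing loads this way keeps code independent of the host processor's own byte order.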
64ARM status bits
- Every arithmetic, logical, or shifting operation may set CPSR (current program status register) bits
- N (negative), Z (zero), C (carry), V (overflow).
- Examples
- -1 + 1 = 0: NZCV = 0110.
- 2^31 - 1 + 1 = -2^31: NZCV = 1001.
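The flag values for an addition can be derived mechanically from the operands and the 32-bit result. The sketch below models this in C. The function name add_nzcv and the packing of the flags into a 4-bit value (N in bit 3 down to V in bit 0) are choices made for this example.

```c
#include <stdint.h>

/* Returns the N, Z, C, V flags of the 32-bit addition a + b,
 * packed as a 4-bit value in the order N Z C V. */
unsigned add_nzcv(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;                 /* wraps modulo 2^32 */
    unsigned n = (unsigned)(r >> 31);   /* result is negative as a signed value */
    unsigned z = (r == 0);              /* result is zero */
    unsigned c = (r < a);               /* unsigned carry out of bit 31 */
    /* signed overflow: operands share a sign that differs from the result's */
    unsigned v = (unsigned)((~(a ^ b) & (a ^ r)) >> 31) & 1u;
    return (n << 3) | (z << 2) | (c << 1) | v;
}
```

Adding 0xFFFFFFFF (-1) and 1 returns binary 0110: the result is zero and a carry is produced. Adding 0x7FFFFFFF (2^31 - 1) and 1 returns binary 1001: the result is negative and signed overflow occurred, with no carry.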
65ARM data processing operand addressing
- Instruction syntax
- <opcode>{<cond>}{S} <Rd>, <Rn>, <shifter-operand>
- <shifter-operand> has 11 options
66Condition field
- Almost all ARM instructions can be conditionally executed
67ARM data processing operand addressing
Data processing immediate shift
Data processing register shift
Data processing 32-bit immediate
68Shifter operand
- Immediate
- 8-bit constant and a 4-bit rotate (0, 2, 4, ..., 30)
- mov r0, #0
- add r9, r9, #1
- Register operand
- mov r2, r0
- Shifted register operand
- ASR, LSL, LSR, ROR, RRX (by one bit)
- mov r2, r0, LSL #2 ; shift r0 left by 2, write to r2 (r2 = r0 x 4)
- sub r10, r9, r8, LSR #4 ; r10 = r9 - r8/16
- sub r10, r9, r8, ROR r3 ; r10 = r9 - (r8 rotated by value of r3)
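Because an immediate is an 8-bit constant rotated right by an even amount, not every 32-bit value can appear directly in an instruction. The sketch below checks whether a value fits that encoding. The function names are hypothetical, and the check covers only the classic 32-bit ARM data-processing immediate described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotates v left by s bits (s taken modulo 32). */
static uint32_t rol32(uint32_t v, unsigned s)
{
    s &= 31u;
    return s ? (v << s) | (v >> (32u - s)) : v;
}

/* True if value equals some 8-bit constant rotated right by 0, 2, 4, ..., 30.
 * Undoing a right rotation is a left rotation by the same amount. */
bool arm_imm_encodable(uint32_t value)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        if (rol32(value, rot) <= 0xFFu)
            return true;
    }
    return false;
}
```

Values such as 0xFF000000 or 0x3FC are encodable, while 0x101 (set bits too far apart) and 0x1FE (needs an odd rotation) are not. Such constants are typically loaded from memory or built in two instructions.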
69ARM data-processing
- AND
- EOR
- SUB: Rd = Rn - shifter operand
- RSB: Rd = shifter operand - Rn
- ADD
- ADC (with carry)
- SBC
- RSC (reverse SBC)
- TST: update flags after Rn AND shifter operand
- TEQ
- CMP
- CMN: compare negated
- ORR (logical OR)
- MOV
- BIC
- MVN (mov not)
70ARM data-processing
- Shift, rotate: applied through the shifter operand
- LSL, LSR: logical shift left/right
- ASR: arithmetic shift right
- ROR: rotate right
- RRX: rotate right extended with C
71Data operation varieties
- Logical shift
- fills with zeroes.
- Arithmetic shift
- fills with sign extension
- RRX performs 33-bit rotate, including C bit from
CPSR above sign bit.
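The shift and rotate varieties differ only in what enters the vacated bits. The sketch below models them in C for shift amounts from 1 to 31. The function names are illustrative, and the range restriction is an assumption of this sketch rather than a statement about the hardware.

```c
#include <stdint.h>

/* Logical shift right: vacated high bits are filled with zeroes. n in 1..31. */
uint32_t lsr32(uint32_t v, unsigned n)
{
    return v >> n;
}

/* Arithmetic shift right: vacated high bits copy the sign bit. n in 1..31. */
uint32_t asr32(uint32_t v, unsigned n)
{
    uint32_t r = v >> n;
    if (v & 0x80000000u)
        r |= ~(0xFFFFFFFFu >> n);   /* set the top n bits */
    return r;
}

/* Rotate right: bits shifted out at the bottom re-enter at the top. n in 1..31. */
uint32_t ror32(uint32_t v, unsigned n)
{
    return (v >> n) | (v << (32u - n));
}

/* Rotate right extended: a 33-bit rotate by one through the carry flag.
 * carry_in enters bit 31; the old bit 0 is returned through carry_out. */
uint32_t rrx32(uint32_t v, unsigned carry_in, unsigned *carry_out)
{
    *carry_out = v & 1u;
    return (v >> 1) | ((uint32_t)(carry_in & 1u) << 31);
}
```

For the input 0x80000000 and a shift of 4, lsr32 gives 0x08000000 while asr32 gives 0xF8000000, which is the sign-extension behavior that makes ASR a signed divide by a power of two.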
72Load and Store instructions
- Two types
- 32-bit word or an 8-bit unsigned byte
- Load and store halfword and load signed byte
- Addressing modes
- Base register
- Any one of the GPRs (including the PC)
- Offset
- Three formats
73Addressing modes
- Offset
- Immediate: unsigned number (12 bits or 8 bits)
- Register: GPR (not the PC)
- Scaled register: shifted by an immediate value
- LSL, LSR, ASR, ROR, RRX
- Three ways to form the memory address
- EA = base register +/- offset
- Offset
- Pre-indexed
- Post-indexed
74Addressing modes
- Base-plus-offset addressing
- LDR r0,[r1,#16]
- Loads from location r1+16
- Pre-indexing increments base register
- LDR r0,[r1,#16]!
- Post-indexing fetches, then does offset
- LDR r0,[r1],#16
- Loads r0 from r1, then adds 16 to r1.
75Load and store
- LDR
- LDRB
- LDRH
- LDRSB (signed byte)
- LDRSH (signed halfw)
76Examples
- LDR R1, [R0] ; load R1 from the address in R0
- LDR R8, [R3, #4] ; EA = R3 + 4
- LDR R8, [R3, #-4] ; EA = R3 - 4
- STRB R10, [R7, -R4] ; EA = R7 - R4
- LDR R11, [R3, R5, LSL #2] ; EA = R3 + (R5 x 4)
- LDR R3, [R9], #4 ; EA = R9, then R9 = R9 + 4 (post-indexed)
- LDR R1, [R0, #2]! ; EA = R0 + 2, R0 = R0 + 2 (pre-indexed)
- LDR R0, [PC, #0x40] ; load R0 from PC + 0x40 (= address of the instruction + 8 + 0x40)
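The three ways of forming the memory address differ in when the offset is applied and whether the base register is written back. The sketch below models them in C. The type and function names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

typedef enum {
    AM_OFFSET,        /* [Rn, #off]   : EA = Rn + off, Rn unchanged       */
    AM_PRE_INDEXED,   /* [Rn, #off]!  : EA = Rn + off, then Rn = EA       */
    AM_POST_INDEXED   /* [Rn], #off   : EA = Rn,       then Rn = Rn + off */
} addr_mode;

/* Returns the effective address and applies any base-register write-back. */
uint32_t effective_address(uint32_t *base, int32_t offset, addr_mode mode)
{
    uint32_t ea;
    switch (mode) {
    case AM_OFFSET:
        ea = *base + (uint32_t)offset;
        break;
    case AM_PRE_INDEXED:
        ea = *base + (uint32_t)offset;
        *base = ea;
        break;
    case AM_POST_INDEXED:
    default:
        ea = *base;
        *base += (uint32_t)offset;
        break;
    }
    return ea;
}
```

Starting from a base of 0x1000 with an offset of 16, offset mode returns 0x1010 and leaves the base alone, pre-indexed mode returns 0x1010 and updates the base to 0x1010, and post-indexed mode returns the old base and then advances it. Post-indexing is what makes walking an array one word at a time a single instruction per element.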
77Load and store multiple
- Addressing modes
- IA increment after
- IB increment before
- DA decrement after
- DB decrement before
78Load and store multiple
- LDM
- STM
- Examples
- LDMIA r0, {r5-r8} ; load multiple r5-r8 from the address in r0
- STMDA r1!, {r2, r5, r7-r9, r11} ; update r1
79Branch instructions
- Conditional branch forwards or backwards up to 32 MB
- Sign-extending the 24-bit imm_data to 32 bits
- Shifting the result left two bits
- Adding this to the PC (the address of the branch + 8)
- Approximately +/-32 MB
- B, BL
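The branch target computation described in the steps above can be written out directly. The sketch below is a C model of it, assuming the classic 32-bit ARM encoding with a 24-bit offset field. The function name branch_target is an illustrative choice.

```c
#include <stdint.h>

/* Computes the target of a B/BL instruction located at branch_addr
 * whose 24-bit offset field holds imm24. */
uint32_t branch_target(uint32_t branch_addr, uint32_t imm24)
{
    uint32_t off = imm24 & 0x00FFFFFFu;

    /* Sign-extend the 24-bit field to 32 bits. */
    if (off & 0x00800000u)
        off |= 0xFF000000u;

    /* Offsets count words, so shift left two bits to get bytes. */
    off <<= 2;

    /* The PC reads as the branch address plus 8. */
    return branch_addr + 8u + off;
}
```

A field value of 0xFFFFFE (that is, -2 words) at address 0x8000 yields 0x8000, a branch to itself. The largest forward offset is 0x7FFFFF words, just under 32 MB.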
80Examples
- B label
- BCC label ; branch if carry flag is clear
- BEQ label ; if zero flag is set
- MOV PC, #0 ; branch to location zero
- BL func ; subroutine call
- MOV PC, LR ; return
- MOV LR, PC
- LDR PC, =func
81ARM ADR pseudo-op
- Cannot refer to an address directly in an instruction.
- Generate value by performing arithmetic on PC.
- ADR pseudo-op generates instruction required to calculate address
- ADR r1,FOO
82Examples
- start MOV r0, #10
- ADR r4, start => SUB r4, pc, #0xc
- start = pc - 4 - 8 = pc - 12 = pc - 0xc
83Example C assignments
- C
- x = (a + b) - c;
- Assembler
- ADR r4,a ; get address for a
- LDR r0,[r4] ; get value of a
- ADR r4,b ; get address for b, reusing r4
- LDR r1,[r4] ; get value of b
- ADD r3,r0,r1 ; compute a+b
- ADR r4,c ; get address for c
- LDR r2,[r4] ; get value of c
84C assignment, contd.
- SUB r3,r3,r2 ; complete computation of x
- ADR r4,x ; get address for x
- STR r3,[r4] ; store value of x
85Example C assignment
- C
- y = a*(b+c);
- Assembler
- ADR r4,b ; get address for b
- LDR r0,[r4] ; get value of b
- ADR r4,c ; get address for c
- LDR r1,[r4] ; get value of c
- ADD r2,r0,r1 ; compute partial result
- ADR r4,a ; get address for a
- LDR r0,[r4] ; get value of a
86C assignment, contd.
- MUL r2,r2,r0 ; compute final value for y
- ADR r4,y ; get address for y
- STR r2,[r4] ; store y
87Example C assignment
- C
- z = (a << 2) | (b & 15);
- Assembler
- ADR r4,a ; get address for a
- LDR r0,[r4] ; get value of a
- MOV r0,r0,LSL #2 ; perform shift
- ADR r4,b ; get address for b
- LDR r1,[r4] ; get value of b
- AND r1,r1,#15 ; perform AND
- ORR r1,r0,r1 ; perform OR
88C assignment, contd.
- ADR r4,z ; get address for z
- STR r1,[r4] ; store value for z
89Example if statement
- C
- if (a < b) { x = 5; y = c + d; } else x = c - d;
- Assembler
- ; compute and test condition
- ADR r4,a ; get address for a
- LDR r0,[r4] ; get value of a
- ADR r4,b ; get address for b
- LDR r1,[r4] ; get value for b
- CMP r0,r1 ; compare a < b
- BGE fblock ; if a >= b, branch to false block
90If statement, contd.
- ; true block
- MOV r0,#5 ; generate value for x
- ADR r4,x ; get address for x
- STR r0,[r4] ; store x
- ADR r4,c ; get address for c
- LDR r0,[r4] ; get value of c
- ADR r4,d ; get address for d
- LDR r1,[r4] ; get value of d
- ADD r0,r0,r1 ; compute y
- ADR r4,y ; get address for y
- STR r0,[r4] ; store y
- B after ; branch around false block
91If statement, contd.
- ; false block
- fblock ADR r4,c ; get address for c
- LDR r0,[r4] ; get value of c
- ADR r4,d ; get address for d
- LDR r1,[r4] ; get value for d
- SUB r0,r0,r1 ; compute c-d
- ADR r4,x ; get address for x
- STR r0,[r4] ; store value of x
- after ...
92Example Conditional instruction implementation
- ; true block
- MOVLT r0,#5 ; generate value for x
- ADRLT r4,x ; get address for x
- STRLT r0,[r4] ; store x
- ADRLT r4,c ; get address for c
- LDRLT r0,[r4] ; get value of c
- ADRLT r4,d ; get address for d
- LDRLT r1,[r4] ; get value of d
- ADDLT r0,r0,r1 ; compute y
- ADRLT r4,y ; get address for y
- STRLT r0,[r4] ; store y
93Conditional instruction implementation, contd.
- ; false block
- ADRGE r4,c ; get address for c
- LDRGE r0,[r4] ; get value of c
- ADRGE r4,d ; get address for d
- LDRGE r1,[r4] ; get value for d
- SUBGE r0,r0,r1 ; compute c-d
- ADRGE r4,x ; get address for x
- STRGE r0,[r4] ; store value of x
94Example FIR filter
- C
- for (i=0, f=0; i<N; i++)
- f = f + c[i]*x[i];
- Assembler
- ; loop initiation code
- MOV r0,#0 ; use r0 for i
- MOV r8,#0 ; use separate index for arrays
- ADR r2,N ; get address for N
- LDR r1,[r2] ; get value of N
- MOV r2,#0 ; use r2 for f
95FIR filter, cont.d
- ADR r3,c ; load r3 with base of c
- ADR r5,x ; load r5 with base of x
- ; loop body
- loop LDR r4,[r3,r8] ; get c[i]
- LDR r6,[r5,r8] ; get x[i]
- MUL r4,r4,r6 ; compute c[i]*x[i]
- ADD r2,r2,r4 ; add into running sum
- ADD r8,r8,#4 ; add one word offset to array index
- ADD r0,r0,#1 ; add 1 to i
- CMP r0,r1 ; exit?
- BLT loop ; if i < N, continue
96Nested subroutine calls
- Nesting/recursion requires coding convention
- f1 LDR r0,[r13] ; load arg into r0 from stack
- ; call f2()
- STR r14,[r13,#4]! ; store f1's return adrs
- STR r0,[r13,#4]! ; store arg to f2 on stack
- BL f2 ; branch and link to f2
- ; return from f1()
- SUB r13,r13,#4 ; pop f2's arg off stack
- LDR r15,[r13],#-4 ; restore register and return
97Summary
- Load/store architecture
- Most instructions are RISCy, operate in single cycle.
- Some multi-register operations take longer.
- All instructions can be executed conditionally.
98MPC850
99Reference Manuals
- MPC850 Family User Manual
- PowerPC Programming Environment Manual
- Course Home Page: http://calab.kaist.ac.kr/maeng/cs310/micro02.htm
- Motorola Home Page
- http://e-www.motorola.com
100Overview
- Versatile, one-chip, integrated communication processor
- Embedded PowerPC core
- Versatile memory controller
- Communication processor module (CPM)
- Serial communication controllers (SCCs)
- One USB
- Etc.
102Embedded PowerPC core
- Single issue, 32-bit version
- Branch folding and prediction
- 2-Kbyte I-cache, 1-Kbyte D-cache
- 2-way set-associative
- Physical
- MMUs with 8-entry TLBs
- 4K, 16K, 256K, 512K, and 8MB page sizes
103Other Features
- Dynamic data bus sizing 8-, 16-, 32-bit
- CPU clock 0-80MHz
- System Integration Unit (SIU)
- Memory Controller
- General Purpose timer
- CPM, SCCs, SMCs, etc.
104PowerPC Architecture
105PowerPC instruction set
- Overview
- Operand Conventions
- PowerPC Registers and programming model
- Addressing Modes
- Instruction Set
- Cache model
- Exception Model
- Memory management model
106PowerPC Architecture
- Motorola, IBM, Apple computer
- Power Architecture RS/6000 family
- 64-bit architecture with a 32-bit subset
- Three Levels of the architecture
- Flexibility degrees of SW compatibility
- UISA (User instruction set architecture)
- VEA (Virtual environment architecture)
- OEA (Operating environment architecture)
107Features not defined by the PowerPC Architecture
- For flexibility
- System bus interface signals
- Cache design
- The number and the nature of execution units
- Other internal micro-architecture issues
108Endianness
- Relationship between bit and byte/word ordering defines endianness
little-endian (ARM, Intel): bit 31 ... bit 0 | byte 3 | byte 2 | byte 1 | byte 0
big-endian (PowerPC, IBM, Motorola): bit 0 ... bit 31 | byte 0 | byte 1 | byte 2 | byte 3
109Programming Model Registers
111PowerPC programming model - Register Set
- User Model UISA (32-bit architecture)
- GPR0 ... GPR31 (32 bits each)
- FGPR0 ... FGPR31 (64 bits each)
- CR (32): condition register
- FPSCR (32): FP status and control register
- XER (32): XER register
- LR (64/32): link register
- CTR (64/32): count register
112Condition Registers (CR)
- For testing and branching
- Eight 4-bit fields, CR0 ... CR7, occupying bits 0 to 31
- CR1: FP
- CRn field: compare instruction
- For all integer instrs.: Bit 0 Negative (LT), Bit 1 Positive (GT), Bit 2 Zero (EQ), Bit 3 Summary Overflow (SO)
113XER Register (XER)
114XER Register (XER), contd
115Link Register (LR), Count Register (CTR)
bclrx (bc to link register) Branch with link
update
116Counter Register
117VEA Register Set Time Base
118OEA Register Set
119Machine State Register (MSR)
122Addressing Modes
- Effective Address Calculation
- Register indirect with immediate index mode
- Register indirect with index mode
- Register indirect mode
123Register Indirect with Immediate Index Addressing
124Register Indirect with Index
125Register Indirect
126Instruction Formats
- 4 bytes long and word-aligned
- Bits 0-5 always specify the primary opcode
- Extended opcode
127Instruction set
- Integer
- Floating-point
- Load and store
- Flow control
- Processor control
- Memory synchronization
- Memory control
- External control
128Summary
- UISA, VEA, OEA
- Register set
- Fixed size instruction - RISC
- Load and store architecture
- 3 addressing modes
- Condition Register Update Rc field
- 8 condition registers
- Branch addressing modes
- BO, BI fields
- Relative, absolute, LR, CTR
129RISC Xscale Microarchitecture Features
- ARM Architecture Version 5TE ISA
- Core clock: 400MHz
- Modified Harvard Architecture
- Separate instruction cache and data cache (2 caches)
- 32KB Instruction Cache
- 32KB Data Cache
- Intel Media Processing Technology
- Instruction and Data Memory Management Unit
- Branch Target Buffer
- Debug Capability via JTAG Port
- 0.35µm 3 Layer metal CMOS, 2.6 million transistor
- 256 PBGA package (17 x 17mm)
130RISC Xscale System Integration Features
- Memory controller
- Power management controller
- Normal, idle, and sleep modes
- USB client
- Multi channel DMA controller
- LCD controller
- AC97 codec
- Multimedia card: serial interface to standard memory card, FIFO
- FIR communication
- Synchronous serial protocol port
- I2C
131RISC Xscale System Integration Features
- 85 GPIO ports
- IRQ and wake-up interrupts
- UART
- Real-time clock and timer
- 32-bit counter, 32.7kHz crystal, accuracy +/- 5sec/mon
- OS timer with alarm register
- Pulse width modulation
- Interrupt controller
132RISC XScale
133Internal Structure
134RISC - Xscale
- Palm size device - Example
135 PXA255 Pin
[Figure: Intel XScale PXA250, 256 pins. Signal groups as labeled in the pin diagram:]
- Serial Channel 0 (USB): UDC+, UDC-
- Serial Channel 1: RXD_1, TXD_1
- Serial Channel 2 (IrDA): RXD_2, TXD_2
- Serial Channel 3 (UART): RXD_3, TXD_3
- Serial Channel 4 (CODEC): TXD_C, RXD_C, SFRM_C, SCLK_C
- LCD Control: L_DD(15:0), L_FCLK, L_LCLK, L_PCLK, L_BIAS
- GPIO Ports: GP(27:0)
- Memory Control: nCAS/DQM(3:0), nRAS/nSDCS(3:0), nOE, nWE, nCS(5:0), RDY, nSDRAS, nSDCAS, SDCKE<1:0>, SDCLK<2:0>
- Transceiver Control: RD/nWR
- PCMCIA Bus Signals: nPOE, nPWE, nPIOR, nPIOW, nPCE<2:1>, PSKTSEL, nPREG, nPWAIT, nIOIS16
- Power Management: BATT_FAULT, VDD_FAULT, PWR_EN
- Clocks, Reset and Test: TCK_BYP, TESTCLK, PEXTAL, PXTAL, TEXTAL, TXTAL, nRESET, nRESET_OUT, SMROM_EN, ROM_SEL
- Address Bus: A<25:0>
- Data Bus: D<31:0>
- JTAG: TCK, TDI, TDO, TMS, nTRST
- Supply: VDD, VDDX, VSS/VSSX
136RISC Xscale running modes
137 PXA255 Processor
- XScale Core
- 32Bit RISC
- 32Bit registers
- 32Bit instructions
- Longword aligned
- 32Bit datapaths
- 7/8-stage pipeline