Chapter 8 CPU and Memory: Design, Implementation, and Enhancement

1
Chapter 8 CPU and Memory: Design, Implementation, and Enhancement
  • The Architecture of Computer Hardware and Systems
    Software: An Information Technology Approach
  • 3rd Edition, Irv Englander
  • John Wiley and Sons © 2003

2
CPU Architecture Overview
  • CISC: Complex Instruction Set Computer
  • RISC: Reduced Instruction Set Computer
  • CISC vs. RISC Comparisons
  • VLIW: Very Long Instruction Word
  • EPIC: Explicitly Parallel Instruction Computing

3
CISC Architecture
  • Examples
    ◦ Intel x86, IBM Z-Series mainframes, older CPU
      architectures
  • Characteristics
    ◦ Few general-purpose registers
    ◦ Many addressing modes
    ◦ Large number of specialized, complex instructions
    ◦ Instructions are of varying sizes

4
Limitations of CISC Architecture
  • Complex instructions are infrequently used by
    programmers and compilers
  • Memory references (loads and stores) are slow and
    account for a significant fraction of all
    instructions
  • Procedure and function calls are a major
    bottleneck
    ◦ Passing arguments
    ◦ Storing and retrieving values in registers

5
RISC Features
  • Examples
    ◦ PowerPC, Sun SPARC
  • Limited and simple instruction set
  • Fixed-length, fixed-format instruction words
    ◦ Enable pipelining, parallel fetches and
      executions
  • Limited addressing modes
    ◦ Reduce complicated hardware
  • Register-oriented instruction set
    ◦ Reduce memory accesses
  • Large bank of registers
    ◦ Reduce memory accesses
    ◦ Efficient procedure calls

6
CISC vs. RISC Processing
7
Circular Register Buffer
8
Circular Register Buffer: After Procedure Call
9
CISC vs. RISC Performance Comparison
  • RISC → simpler instructions → more instructions
    → more memory accesses
  • RISC → more bus traffic and increased cache
    memory misses
  • More registers would improve CISC performance, but
    there is no space available for them
  • Modern CISC and RISC architectures are becoming
    similar

10
VLIW Architecture
  • Transmeta Crusoe CPU
    ◦ 128-bit instruction bundle (molecule)
    ◦ Four 32-bit atoms (atom = instruction)
    ◦ Parallel processing of 4 instructions
    ◦ 64 general-purpose registers
    ◦ Code morphing layer
      ▪ Translates instructions written for other CPUs
        into molecules
      ▪ Instructions are not written directly for the
        Crusoe CPU
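How four 32-bit atoms fill one 128-bit molecule can be shown with plain bit shifting. This is a minimal sketch that assumes atom 0 sits in the low-order bits; the function names are hypothetical and the bit ordering is not Transmeta's documented encoding.

```python
ATOM_BITS = 32
ATOMS_PER_MOLECULE = 4
ATOM_MASK = (1 << ATOM_BITS) - 1


def pack_molecule(atoms: list[int]) -> int:
    """Pack four 32-bit atoms into one 128-bit molecule, atom 0 in the low bits."""
    if len(atoms) != ATOMS_PER_MOLECULE:
        raise ValueError("a molecule holds exactly four atoms")
    molecule = 0
    for i, atom in enumerate(atoms):
        if not 0 <= atom <= ATOM_MASK:
            raise ValueError(f"atom {i} does not fit in 32 bits")
        molecule |= atom << (ATOM_BITS * i)
    return molecule


def unpack_molecule(molecule: int) -> list[int]:
    """Split a 128-bit molecule back into its four atoms."""
    return [(molecule >> (ATOM_BITS * i)) & ATOM_MASK
            for i in range(ATOMS_PER_MOLECULE)]


m = pack_molecule([0x11111111, 0x22222222, 0x33333333, 0x44444444])
print(hex(m))
print([hex(a) for a in unpack_molecule(m)])
```

Because each atom has a fixed position, the hardware can route all four to execution units in the same cycle without decoding instruction lengths.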

11
EPIC Architecture
  • Intel Itanium CPU
    ◦ 128-bit instruction bundle
      ▪ Three 41-bit instructions
      ▪ 5 bits to identify the type of instructions in
        the bundle
    ◦ 128 64-bit general-purpose registers
    ◦ 128 82-bit floating-point registers
    ◦ Intel x86 instruction set included
    ◦ Programmers and compilers follow guidelines to
      ensure parallel execution of instructions
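The bundle arithmetic (5 + 3 × 41 = 128 bits) can be checked with a short sketch. It assumes the IA-64 layout in which the 5-bit template occupies bits 0 to 4, followed by slots 0, 1, and 2; the function names and the template value used below are hypothetical.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOTS_PER_BUNDLE = 3
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1
SLOT_MASK = (1 << SLOT_BITS) - 1

assert TEMPLATE_BITS + SLOTS_PER_BUNDLE * SLOT_BITS == 128  # 5 + 3 * 41


def pack_bundle(template: int, slots: list[int]) -> int:
    """Build a 128-bit bundle: template in bits 0-4, then slots 0, 1, 2."""
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template must fit in 5 bits")
    if len(slots) != SLOTS_PER_BUNDLE:
        raise ValueError("a bundle holds exactly three instructions")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError(f"slot {i} does not fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + SLOT_BITS * i)
    return bundle


def unpack_bundle(bundle: int) -> tuple[int, list[int]]:
    """Return (template, [slot0, slot1, slot2]) from a 128-bit bundle."""
    template = bundle & TEMPLATE_MASK
    slots = [(bundle >> (TEMPLATE_BITS + SLOT_BITS * i)) & SLOT_MASK
             for i in range(SLOTS_PER_BUNDLE)]
    return template, slots
```

The template field is what lets the CPU learn the instruction types in a bundle without decoding each 41-bit slot first.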

12
Paging
  • Managed by the operating system
  • Built into the hardware
  • Independent of application

13
Logical vs. Physical Addresses
  • Logical addresses are relative locations of data,
    instructions, and branch targets, and are separate
    from physical addresses
  • Logical addresses are mapped to physical addresses
  • Physical addresses do not need to be consecutive

14
Logical vs. Physical Address
15
Page Address Layout
16
Page Translation Process
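The page translation process can be sketched as splitting a logical address into a page number and an offset, looking the page up in a page table, and joining the frame number with the unchanged offset. The 4 KB page size and the table contents below are assumptions for illustration; the table is a plain dictionary, not any real operating system's structure.

```python
PAGE_SIZE = 4096                          # assumed page size in bytes
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # 12 offset bits for 4 KB pages


def translate(logical: int, page_table: dict[int, int]) -> int:
    """Map a logical address to a physical address through a page table."""
    page = logical >> OFFSET_BITS          # high bits: logical page number
    offset = logical & (PAGE_SIZE - 1)     # low bits: offset within the page
    if page not in page_table:
        raise LookupError(f"page fault: page {page} is not in memory")
    frame = page_table[page]               # physical frame holding that page
    return (frame << OFFSET_BITS) | offset


# Consecutive logical pages 0, 1, 2 live in scattered physical frames.
page_table = {0: 5, 1: 2, 2: 9}
print(hex(translate(0x1234, page_table)))
```

Only the page number is translated; the offset passes through untouched, which is why physical frames need not be consecutive.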
17
Memory Enhancements
  • Memory is slow compared to CPU processing speeds!
    ◦ 2 GHz CPU: 1 cycle in ½ of a billionth of a
      second (0.5 ns)
    ◦ 70 ns DRAM: 1 access in 70 billionths of a second
  • Methods to improve memory accesses
    ◦ Wide Path Memory Access
      ▪ Retrieve multiple bytes instead of 1 byte at a
        time
    ◦ Memory Interleaving
      ▪ Partition memory into subsections, each with its
        own address register and data register
    ◦ Cache Memory

18
Memory Interleaving
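One common way to divide addresses among the subsections is low-order (n-way) interleaving, in which the low bits of the address select the bank. The sketch assumes four banks and a hypothetical `bank_of` helper; the slides do not say which mapping is used.

```python
def bank_of(address: int, num_banks: int = 4) -> tuple[int, int]:
    """Low-order interleaving: return (bank, word address within that bank)."""
    return address % num_banks, address // num_banks


# Eight consecutive addresses cycle through all four banks, so each bank
# in a sequential run can work on its own access in parallel.
for addr in range(8):
    bank, offset = bank_of(addr)
    print(f"address {addr} -> bank {bank}, word {offset}")
```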
19
Why Cache?
  • Even the fastest hard disk has an access time of
    about 10 milliseconds
  • A 2 GHz CPU waiting 10 milliseconds wastes 20
    million clock cycles!
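The cycle counts on these slides are just clock rate multiplied by waiting time. A minimal sketch with a hypothetical helper, using the 2 GHz clock, 70 ns DRAM, and 10 ms disk figures from the slides:

```python
def wasted_cycles(clock_hz: float, latency_s: float) -> int:
    """Clock cycles that elapse while the CPU waits `latency_s` seconds."""
    return round(clock_hz * latency_s)


CLOCK_HZ = 2e9  # 2 GHz CPU, as on the slides
print(wasted_cycles(CLOCK_HZ, 70e-9))  # 70 ns DRAM access: 140 cycles
print(wasted_cycles(CLOCK_HZ, 10e-3))  # 10 ms disk access: 20000000 cycles
```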


20
Cache Memory
  • Blocks: 8 or 16 bytes
  • Tags: identify each block's location in main memory
  • Cache controller
    ◦ Hardware that checks tags
  • Cache Line
    ◦ Unit of transfer between storage and cache memory
  • Hit Ratio: number of hits divided by total requests
  • Synchronizing cache and memory
    ◦ Write through
    ◦ Write back
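The interplay of blocks, tags, hit ratio, and the two write policies can be sketched as a small simulator. The slides do not specify how blocks map to cache lines, so this assumes a direct-mapped cache with 16-byte blocks; the class name and parameters are hypothetical. As a simplification, `memory_writes` counts each write-through store and each dirty line copied back, not bytes.

```python
class DirectMappedCache:
    """Hypothetical direct-mapped cache that counts hits and memory writes."""

    def __init__(self, num_lines=8, block_size=16, write_back=True):
        self.num_lines = num_lines
        self.block_size = block_size
        self.write_back = write_back
        self.tags = [None] * num_lines    # tag stored in each cache line
        self.dirty = [False] * num_lines  # used only by write-back
        self.hits = 0
        self.requests = 0
        self.memory_writes = 0

    def _locate(self, address):
        block = address // self.block_size
        return block % self.num_lines, block // self.num_lines  # (line, tag)

    def access(self, address, is_write=False):
        line, tag = self._locate(address)
        self.requests += 1
        if self.tags[line] == tag:        # controller compares the tag
            self.hits += 1
        else:
            # Miss: under write-back, a dirty victim is copied to memory first.
            if self.write_back and self.dirty[line]:
                self.memory_writes += 1
            self.tags[line] = tag         # bring the new block in
            self.dirty[line] = False
        if is_write:
            if self.write_back:
                self.dirty[line] = True   # defer the memory update
            else:
                self.memory_writes += 1   # write through to memory now

    def hit_ratio(self):
        return self.hits / self.requests if self.requests else 0.0


for wb in (False, True):
    cache = DirectMappedCache(write_back=wb)
    for _ in range(10):                   # a small loop over a 64-byte array
        for addr in range(0, 64, 4):
            cache.access(addr, is_write=True)
    print("write-back" if wb else "write-through",
          round(cache.hit_ratio(), 3), cache.memory_writes)
```

Both runs miss only on the first touch of each of the four blocks. With write-through all 160 stores reach memory; with write-back none do until a dirty line is evicted.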

21
Step-by-Step Use of Cache
22
Step-by-Step Use of Cache
23
Performance Advantages
  • Hit ratios of 90% are common
  • 50% improved execution speed
  • Locality of reference is why caching works
    ◦ Most memory references are confined to a small
      region of memory at any given time
      ▪ Well-written program in a small loop, procedure,
        or function
      ▪ Data likely in an array
      ▪ Variables stored together
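How a hit ratio turns into speed can be seen from the usual effective access time formula. This is a simplified model that assumes a hit costs the cache access time t_c and a miss costs one main memory access time t_m; the 90% hit ratio and 70 ns DRAM come from the earlier slides, while the 5 ns cache time is an assumed value.

```latex
t_{\text{eff}} = h\,t_c + (1 - h)\,t_m

t_{\text{eff}} = 0.9 \times 5\ \text{ns} + 0.1 \times 70\ \text{ns}
             = 4.5\ \text{ns} + 7\ \text{ns} = 11.5\ \text{ns}
```

Under these assumptions the average memory access drops from 70 ns to 11.5 ns, and the miss term still dominates, which is why even a few percent more hits matter.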

24
Two-level Caches
  • Why do the sizes of the caches have to be
    different?

25
Cache vs. Virtual Memory
  • Cache speeds up memory access
  • Virtual memory increases the amount of perceived
    storage
    ◦ Independence from the configuration and capacity
      of the memory system
    ◦ Low cost per bit

26
Modern CPU Processing Methods
  • Timing Issues
  • Separate Fetch/Execute Units
  • Pipelining
  • Scalar Processing
  • Superscalar Processing

27
Timing Issues
  • Computer clock used for timing purposes
    ◦ MHz: million steps per second
    ◦ GHz: billion steps per second
  • Instructions can (and often do) take more than one
    step
  • Data word width can require multiple steps

28
Separate Fetch-Execute Units
  • Fetch Unit
    ◦ Instruction fetch unit
    ◦ Instruction decode unit
      ▪ Determine opcode
      ▪ Identify type of instruction and operands
    ◦ Several instructions are fetched in parallel and
      held in a buffer until decoded and executed
    ◦ IP: Instruction Pointer register
  • Execute Unit
    ◦ Receives instructions from the decode unit
    ◦ Appropriate execution unit services the
      instruction

29
Alternative CPU Organization
30
Instruction Pipelining
  • Assembly-line technique that overlaps the
    fetch-execute cycles of a sequence of
    instructions
  • Only one instruction is being executed to
    completion at a time
  • Scalar processing
    ◦ Average instruction execution rate is
      approximately one instruction per clock cycle
  • Problems from stalling
    ◦ Instructions have different numbers of steps
  • Problems from branching
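The assembly-line overlap can be drawn as a stage-by-cycle table. This sketch assumes an ideal scalar pipeline with four hypothetical one-cycle stages (F fetch, D decode, E execute, W write back) and no stalls or branches; under those assumptions n instructions finish in k + n − 1 cycles instead of k × n.

```python
def pipeline_diagram(num_instructions: int, stages=("F", "D", "E", "W")) -> list[str]:
    """Return one row per instruction showing which stage it occupies each cycle.

    Assumes an ideal scalar pipeline: one instruction enters per cycle,
    every stage takes one cycle, and there are no stalls or branches.
    """
    k = len(stages)
    total_cycles = k + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        row = ["."] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i is in stage s during cycle i + s
        rows.append(" ".join(row))
    return rows


def pipelined_cycles(n: int, k: int) -> int:
    return k + n - 1  # fill the pipeline once, then one completion per cycle


def unpipelined_cycles(n: int, k: int) -> int:
    return k * n      # each instruction runs all k steps alone


for row in pipeline_diagram(4):
    print(row)
print(pipelined_cycles(4, 4), "vs", unpipelined_cycles(4, 4), "cycles")
```

Each column of the diagram shows one completion at most, matching the scalar rate of about one instruction per clock cycle once the pipeline is full.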

31
Branch Problem Solutions
  • Separate pipelines for both possibilities
  • Probabilistic approach
  • Requiring that the following instruction not
    depend on the branch
  • Instruction reordering (superscalar processing)

32
Pipelining Example
33
Superscalar Processing
  • Process more than one instruction per clock cycle
  • Separate fetch and execute cycles as much as
    possible
  • Buffers for fetch and decode phases
  • Parallel execution units
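Issuing more than one instruction per clock cycle, and why data dependencies limit it, can be sketched as a simplified in-order issue model. The `Instr` type, register names, and issue width are hypothetical; the model assumes every instruction completes in one cycle and ignores reordering and register renaming, which real superscalar CPUs add.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str
    srcs: tuple[str, ...]


def issue_schedule(program: list[Instr], width: int) -> list[list[int]]:
    """Group instruction indices into cycles, issuing strictly in order.

    Up to `width` instructions issue per cycle, but an instruction cannot
    issue in the same cycle as an earlier one whose result it reads
    (a read-after-write data dependency).
    """
    cycles: list[list[int]] = []
    current: list[int] = []
    written: set[str] = set()  # registers written by instructions in `current`
    for i, ins in enumerate(program):
        depends = any(src in written for src in ins.srcs)
        if len(current) == width or depends:
            cycles.append(current)         # close this cycle's issue group
            current, written = [], set()
        current.append(i)
        written.add(ins.dest)
    if current:
        cycles.append(current)
    return cycles


program = [
    Instr("r1", ("r2", "r3")),  # r1 = r2 + r3
    Instr("r4", ("r5", "r6")),  # r4 = r5 + r6 (independent)
    Instr("r7", ("r1", "r4")),  # r7 = r1 + r4 (needs both results)
    Instr("r8", ("r2", "r5")),  # r8 = r2 + r5 (independent)
]
print("scalar:     ", issue_schedule(program, width=1))
print("superscalar:", issue_schedule(program, width=2))
```

With width 2 the four instructions need two cycles instead of four, but widening further does not help here because the third instruction must wait for the first two.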

34
Superscalar CPU Block Diagram
35
Scalar vs. Superscalar Processing
36
Superscalar Issues
  • Out-of-order processing dependencies (hazards)
    ◦ Data dependencies
    ◦ Branch (flow) dependencies and speculative
      execution
      ▪ Parallel speculative execution or branch
        prediction
      ▪ Branch History Table
    ◦ Register access conflicts
      ▪ Logical registers
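A branch history table is often built from 2-bit saturating counters indexed by the branch address. The slides do not specify the counter scheme, so this is a minimal sketch under that assumption; the class name, table size, and the address 0x40 are hypothetical.

```python
class BranchHistoryTable:
    """Hypothetical branch history table of 2-bit saturating counters.

    Counter values 0-1 predict "not taken", 2-3 predict "taken". Each entry
    is indexed by the low bits of the branch instruction's address.
    """

    def __init__(self, entries: int = 16):
        self.entries = entries
        self.counters = [1] * entries  # start at "weakly not taken"

    def _index(self, pc: int) -> int:
        return pc % self.entries

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


bht = BranchHistoryTable()
outcomes = ([True] * 9 + [False]) * 3  # loop branch: taken 9 times, then exits
correct = 0
for taken in outcomes:
    correct += bht.predict(0x40) == taken
    bht.update(0x40, taken)
print(correct, "of", len(outcomes), "predicted correctly")
```

Two bits of history mean a single loop exit does not flip the prediction, so after warming up the table mispredicts only the exit of each pass.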

37
Hardware Implementation
  • Hardware operations are implemented by logic
    gates
  • Advantages
    ◦ Speed
  • RISC designs are simple and typically implemented
    in hardware

38
Microprogrammed Implementation
  • Microcode consists of tiny programs stored in ROM
    that carry out CPU instructions
  • Advantages
    ◦ More flexible
    ◦ Easier to implement complex instructions
    ◦ Can emulate other CPUs
  • Disadvantage
    ◦ Requires more clock cycles
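The idea of microcode can be sketched as a lookup table (the control store ROM) that expands each machine instruction into a sequence of micro-operations. The opcodes, register names, and micro-op spellings below are hypothetical, instruction fetch is not modeled, and each micro-op is assumed to take one clock cycle.

```python
# Hypothetical control store: each opcode maps to its micro-operation sequence.
MICROCODE_ROM = {
    "LOAD":  ["MAR<-IR.addr", "MDR<-MEM[MAR]", "A<-MDR"],
    "ADD":   ["MAR<-IR.addr", "MDR<-MEM[MAR]", "A<-A+MDR"],
    "STORE": ["MAR<-IR.addr", "MDR<-A", "MEM[MAR]<-MDR"],
}


def execute(program, memory):
    """Run (opcode, address) pairs by stepping through their microcode."""
    regs = {"A": 0, "MAR": 0, "MDR": 0}
    cycles = 0
    for opcode, addr in program:
        for micro_op in MICROCODE_ROM[opcode]:  # microsequencer walks the ROM
            cycles += 1                         # assume one cycle per micro-op
            if micro_op == "MAR<-IR.addr":
                regs["MAR"] = addr
            elif micro_op == "MDR<-MEM[MAR]":
                regs["MDR"] = memory[regs["MAR"]]
            elif micro_op == "A<-MDR":
                regs["A"] = regs["MDR"]
            elif micro_op == "A<-A+MDR":
                regs["A"] += regs["MDR"]
            elif micro_op == "MDR<-A":
                regs["MDR"] = regs["A"]
            elif micro_op == "MEM[MAR]<-MDR":
                memory[regs["MAR"]] = regs["MDR"]
            else:
                raise ValueError(f"unknown micro-op {micro_op}")
    return regs["A"], cycles


memory = {0: 5, 1: 7, 2: 0}
acc, cycles = execute([("LOAD", 0), ("ADD", 1), ("STORE", 2)], memory)
print(acc, memory[2], cycles)
```

Adding an instruction only means adding a ROM entry built from existing micro-ops, which is the flexibility the slide describes; the cost is that each instruction takes several micro-op cycles.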