Unit-4 Memory System Design - PowerPoint PPT Presentation
Transcript and Presenter's Notes

Title: Unit-4 Memory System Design


1
Unit-4 Memory System Design
2
Memory System
  • There are two basic parameters that determine a memory system's performance.
  • Access Time: the time for a processor request to be transmitted to the memory system, access a datum, and return it to the processor. (Depends on physical parameters such as bus delay, chip delay, etc.)
  • Memory Bandwidth: the ability of the memory to respond to requests per unit of time. (Depends on memory system organization, number of memory modules, etc.)

3
Memory System Organization
4
Memory System Organization
  • A memory system consists of a number of memory banks, each consisting of a number of memory modules, each capable of performing one memory access at a time.
  • Multiple memory modules in a memory bank share the same input and output buses.
  • In one bus cycle, only one module within a memory bank can begin or complete a memory operation.
  • The memory cycle time should be greater than the bus cycle time.

5
Memory System Organization
  • In systems with multiple processors, or with complex single processors, multiple requests may occur at the same time, causing bus or network congestion.
  • Even in a single-processor system, requests arising from different buffered sources may contend for the same memory module, resulting in memory system contention that degrades bandwidth.

6
Memory System Organization
  • The maximum theoretical bandwidth of the memory system is the number of memory modules divided by the memory cycle time.
  • The Offered Request Rate is the rate at which the processor would submit memory requests if the memory had unlimited bandwidth.
  • The offered request rate and the maximum memory bandwidth together determine the maximum Achieved Memory Bandwidth.

7
Achieved vs. Offered Bandwidth
  • Offered Request Rate: the rate at which the processor(s) would make requests if memory had unlimited bandwidth and no contention.

8
Memory System Organization
  • The offered request rate does not depend on the organization of the memory system.
  • It depends on the processor architecture, the instruction set, etc.
  • The analysis and modeling of a memory system depend on the number of processors that request service from the common shared memory system.
  • For this we use a model in which n simple processors access m independent modules.

9
Memory System Organization
  • Contention develops when multiple processors access the same module.
  • A single pipelined processor making n requests to the memory system during a memory cycle resembles the n-processor, m-module memory system.

10
The Physical Memory Module
  • A memory module has two important parameters:
  • Module Access Time: the amount of time to retrieve a word into the output memory buffer of the module, given a valid address in its address register.
  • Module Cycle Time: the minimum time between requests directed at the same module.
  • Memory Access Time is the total time for the processor to access a word in memory. In a large interleaved memory system it includes the module access time plus transit time on the bus, bus accessing overhead, error detection and correction delay, etc.

11
Semiconductor Memories
  • Semiconductor memories fall into two categories:
  • Static RAM (SRAM)
  • Dynamic RAM (DRAM)
  • The data retention method of SRAM is static, whereas that of DRAM is dynamic.
  • Data in SRAM remains in a stable state as long as power is on.
  • Data in DRAM must be refreshed at regular time intervals.

12
DRAM Cell
(Figure: a one-transistor DRAM cell — the address line gates a transistor that connects the data line to a storage capacitor tied to ground.)
13
SRAM vs. DRAM
  • An SRAM cell uses 6 transistors and resembles a flip-flop in construction.
  • Data remains in a stable state as long as power is on.
  • SRAM is much less dense than DRAM but has much faster access and cycle times.
  • In a DRAM cell, data is stored as charge on a capacitor, which decays with time and requires periodic refresh. This increases access and cycle times.

14
SRAM vs. DRAM
  • DRAM cells, constructed from a capacitor controlled by a single transistor, offer very high storage density.
  • DRAM uses a destructive read-out process, so the data read out must be amplified and subsequently written back to the cell.
  • This operation can be combined with the periodic refreshing required by DRAMs.
  • The main advantages of the DRAM cell are its small size, offering very high storage density, and its low power consumption.

15
Memory Module
  • Memory modules are composed of DRAM chips.
  • A DRAM chip is usually organized as 2^n x 1 bit, where n is an even number.
  • Internally, the chip is a two-dimensional array of memory cells consisting of rows and columns.
  • Half of the memory address is used to specify a row address (one of 2^(n/2) row lines).
  • The other half is similarly used to specify one of 2^(n/2) column lines.

16
A Memory Chip
17
Memory Module
  • To save on pinout, for better overall density, the row and column addresses are multiplexed onto the same lines.
  • Two additional lines, RAS (Row Address Strobe) and CAS (Column Address Strobe), gate first the row address and then the column address into the chip.
  • The row and column addresses are then each decoded to select one out of 2^(n/2) possible lines.
  • The intersection of the active row and column lines is the desired bit of information.

18
Memory Module
  • The column line signals are then amplified by a sense amplifier and transmitted to the output pin Dout during a Read Cycle.
  • During a Write Cycle, the write enable signal stores the contents of Din at the selected bit address.

19
Memory Chip Timing
20
Memory Timing
  • At the beginning of a Read Cycle, the RAS line is activated first and the row address is put on the address lines.
  • With RAS active and CAS inactive, this information is stored in the row address register.
  • This activates the row decoder and selects a row line in the memory array.
  • Next, CAS is activated and the column address is put on the address lines.

21
Memory Timing
  • CAS gates the column address into the column address register.
  • The column address decoder then selects a column line.
  • The desired data bit lies at the intersection of the active row and column lines.
  • During a Read Cycle the Write Enable is inactive (low), and the output line Dout is in a high-impedance state until it is driven high or low depending on the contents of the selected location.

22
Memory Timing
  • The time from the beginning of RAS until the data output line is activated is called the chip access time (t chip access).
  • t chip cycle is the time required by the row and column address lines to recover before the next address can be entered and a read or write process initiated.
  • It is determined by the amount of time that the RAS line is active and the minimum amount of time that RAS must remain inactive to let the chip and sense amplifiers fully recover for the next operation.

23
Memory Module
  • In addition to memory chips, a memory module contains a Dynamic Memory Controller and a Memory Timing Controller to provide the following functions:
  • Multiplexing of the n address bits into row and column addresses.
  • Creation of the correct RAS and CAS signals at the appropriate times.
  • Timely refresh of the memory system.

24
Memory Module
(Block diagram: n address bits enter the Dynamic Memory Controller, which drives n/2 multiplexed address bits into 2^n x 1 memory chips under the Memory Timing Controller; the p-bit Dout path passes through bus drivers onto the memory bus.)
25
Memory Module
  • As a memory read operation completes, the data out signals are directed to bus drivers, which interface with the memory bus common to all the memory modules.
  • The access and cycle times of the module differ from the chip access and cycle times.
  • Module access time includes the delays due to the dynamic memory controller, the chip access time, and the delay in transiting the output bus drivers.

26
Memory Module
  • So in a memory system we have three pairs of access and cycle times:
  • Chip access and chip cycle time
  • Module access and module cycle time
  • Memory (system) access and cycle time
  • (Each lower item includes the ones above it.)

27
Memory Module
  • Two important features found on a number of memory chips are used to improve the transfer rate of memory words:
  • Nibble Mode
  • Page Mode

28
Nibble Mode
  • A single address is presented to the memory chip and the CAS line is toggled repeatedly.
  • The chip interprets this CAS toggling as a mod 2^w progression of low-order column addresses.
  • For w = 2, four sequential words can be accessed at a higher rate from the memory chip:
  • 00 → 01 → 10 → 11

29
Page Mode
  • A single row is selected, and non-sequential column addresses may be entered at a higher rate by repeatedly activating the CAS line.
  • It is slower than nibble mode but has greater flexibility in addressing multiple words within a single address page.
  • Nibble mode usually refers to access of four consecutive words. Chips that feature retrieval of more than four consecutive words call this feature fast page mode.

30
Error Detection and Correction
  • DRAM cells at very high density are very small.
  • Each cell thus carries a very small amount of charge to determine its data state.
  • The chance of corruption is high, due to environmental perturbations, static electricity, etc.
  • Error detection and correction is thus an intrinsic part of memory system design.

31
Error Detection and Correction
  • The simplest type of error detection is parity.
  • A bit called the parity bit is added to each memory word, ensuring that the sum of the number of 1s in the word is even (or odd).
  • If a single error occurs in any bit of the word, the sum modulo 2 of the number of 1s in the word is inconsistent with the parity assumption, and the word is known to have been corrupted.

32
Error Detection and Correction
  • Most modern memories incorporate hardware to automatically correct single errors (ECC, error-correcting codes).
  • The simplest code of this type might consist of a geometric block code.
  • The message bits to be checked are arranged in a roughly square pattern, and each column and row is augmented with a parity bit.
  • If one row and one column indicate a flaw when decoded at the receiving end, the fault lies at their intersection bit, which can simply be inverted for error correction.
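The row/column scheme above can be sketched in a few lines of Python (an illustrative sketch; the function names and the 8x8 block size are assumptions, not part of the slides):

```python
# Two-dimensional (geometric) block code over an 8x8 data block: one parity
# bit per row and per column; a single-bit error is located at the
# intersection of the failing row and column parities.

def parities(block):
    """Return (row_parities, col_parities) for an 8x8 list of 0/1 bits."""
    rows = [sum(r) % 2 for r in block]
    cols = [sum(block[i][j] for i in range(8)) % 2 for j in range(8)]
    return rows, cols

def correct_single_error(block, stored_rows, stored_cols):
    """Recompute parities; if exactly one row and one column disagree with
    the stored parities, flip the bit at their intersection and report it."""
    rows, cols = parities(block)
    bad_rows = [i for i in range(8) if rows[i] != stored_rows[i]]
    bad_cols = [j for j in range(8) if cols[j] != stored_cols[j]]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        block[i][j] ^= 1            # invert the faulty bit
        return (i, j)
    return None                     # no error, or an uncorrectable pattern

# Usage: store the parities, inject a single-bit fault, then correct it.
data = [[(i * j) % 2 for j in range(8)] for i in range(8)]
stored = parities(data)
data[3][5] ^= 1                     # single-bit fault
assert correct_single_error(data, *stored) == (3, 5)
assert parities(data) == stored     # block restored
```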

33
Two Dimensional ECC
(Figure: an 8 x 8 block of data bits, rows 0-7 and columns 0-7, augmented with column parity bits C0-C7 and row parity bits P0-P8; the extra bit covers the parity row and column themselves.)
34
Error Detection and Correction
  • For 64 message bits we need to add 17 parity bits: 8 for the rows, 8 for the columns, and one additional bit to compute parity over the parity row and column.
  • If a failure is noted in only a single row, only a single column, or in multiple rows and columns, then it is a multi-bit failure and a non-correctable state is entered.

35
Achieved Memory Bandwidth
  • Two factors have a substantial effect on achieved memory bandwidth:
  • Memory Buffers: buffering should be provided for memory requests in the processor or memory system until the memory reference is complete. This maximizes the requests made by the processor, resulting in a possible increase in achieved bandwidth.
  • Partitioning of the Address Space: the memory space should be partitioned in such a manner that memory references are equally distributed across the memory modules.

36
Assignment of Address Space to m Memory Modules
Module:     0      1      2     ...   m-1
Addresses:  0      1      2     ...   m-1
            m      m+1    m+2   ...   2m-1
            2m     2m+1   2m+2  ...   3m-1
37
Interleaved Memory System
  • Partitioning the memory space into m memory modules is based on the premise that successive references tend to be to successive memory locations.
  • Successive memory locations are assigned to distinct memory modules.
  • For m memory modules, an address x is assigned to module x mod m.
  • This partitioning strategy is termed an interleaved memory system, and the number of modules m is the degree of interleaving.

38
Interleaved Memory System
  • Since m is a power of two, x mod m means the memory module to be referenced is determined by the low-order bits of the memory address.
  • This is called low-order interleaving.
  • Memory addresses can also be mapped to memory modules by high-order interleaving.
  • In high-order interleaving, the upper bits of the memory address define a module and the lower bits define a word within that module.
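The two mappings can be contrasted with a small Python sketch (the module count and words-per-module below are illustrative values, not from the slides):

```python
# Low-order vs. high-order interleaving for m = 8 modules.

M = 8                 # number of modules, a power of two

def low_order(addr):
    """Low-order interleaving: module = addr mod m (low address bits)."""
    return addr % M, addr // M            # (module, word within module)

def high_order(addr, words_per_module=1024):
    """High-order interleaving: module chosen by the upper address bits."""
    return addr // words_per_module, addr % words_per_module

# Successive addresses hit distinct modules under low-order interleaving...
assert [low_order(a)[0] for a in range(4)] == [0, 1, 2, 3]
# ...but stay within one module under high-order interleaving.
assert [high_order(a)[0] for a in range(4)] == [0, 0, 0, 0]
```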

39
Interleaved Memory System
  • In high-order interleaving most of the references tend to remain within a particular module, whereas in low-order interleaving the references tend to be distributed across all the modules.
  • Thus low-order interleaving provides better memory bandwidth, whereas high-order interleaving can be used to increase the reliability of the memory system by allowing it to be reconfigured.

40
Memory Systems Design
  • High-performance memory system design is an iterative process.
  • The bandwidth and partitioning of the system are determined by evaluation of cost, access time, and queuing requirements.
  • More modules provide more interleaving and more bandwidth, reduce queuing delay, and improve access time.
  • But they increase system cost, and the interconnection network becomes more complex, expensive, and slower.

41
Memory Systems Design
  • The basic design steps are as follows:
  • 1. Determine the number of memory modules and the partitioning of the memory system.
  • 2. Determine the offered bandwidth: the peak instruction processing rate multiplied by the expected memory references per instruction multiplied by the number of processors.
  • 3. Decide on the interconnection network: physical delay through the network plus delays due to network contention cause reduced bandwidth and increased access time. A high-performance time-multiplexed bus or crossbar switch can reduce contention but increases cost.
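Step 2 can be expressed as a one-line calculation (a sketch; the function name and the example figures are mine, not from the slides):

```python
def offered_bandwidth(mips, refs_per_instr, n_processors):
    """Offered bandwidth in million accesses per second (MAPS):
    peak MIPS x memory references per instruction x number of processors."""
    return mips * refs_per_instr * n_processors

# e.g. a 100-MIPS processor, 1.5 references/instruction, 2 processors:
assert offered_bandwidth(100, 1.5, 2) == 300.0
```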

42
Memory Systems Design
  • 4. Assess referencing behavior: a program's sequence of requests to memory can be:
  • - Purely sequential: each request follows the previous one in sequence.
  • - Random: requests uniformly distributed across modules.
  • - Regular: each access separated by a fixed stride (vector or array references).
  • A random request pattern is most commonly used in memory system evaluation.

43
Memory Systems Design
  • 5. Evaluate the memory model: assess the achieved bandwidth, the actual memory access time, and the queuing required in the memory system to support the achieved bandwidth.

44
Memory Models
  • Nature of the processor:
  • Simple Processor: makes a single request and waits for the response from memory.
  • Pipelined Processor: makes multiple requests from various buffers in each memory cycle.
  • Multiple Processors: each requests once every memory cycle.
  • A single processor with n requests per memory cycle is asymptotically equivalent to n processors, each requesting once every memory cycle.

45
Memory Models
  • Achieved Bandwidth: the bandwidth available from the memory system.
  • B(m) or B(m, n): the number of requests serviced per module service time Ts = Tc (m is the number of modules and n the number of requests per cycle).
  • B(w): the number of requests serviced per second.
  • B(w) = B(m) / Ts

46
Hellerman's Model
  • One of the best-known memory models.
  • Assumes a single sequence of addresses.
  • Bandwidth is determined by the average length of a conflict-free sequence of addresses (i.e., no match in the w low-order bit positions, where w = log2 m and m is the number of modules).
  • The modeling assumption is that no address queue is present and no out-of-order requests are possible.

47
Hellerman's Model
  • Under these conditions the maximum available bandwidth is found to be approximately
  • B(m) ≈ √m
  • and B(w) ≈ √m / Ts
  • The lack of queuing limits the applicability of this model to simple unbuffered processors with strict in-order referencing to memory.

48
Strecker's Model
  • Model assumptions:
  • n simple-processor requests are made per memory cycle, and there are m modules.
  • There is no bus contention.
  • Requests are random and uniformly distributed across the modules; the probability of any one request going to a particular module is 1/m.
  • Any busy module serves one request.
  • All unserviced requests are dropped each cycle.
  • There are no queues.

49
Strecker's Model
  • Model analysis:
  • The bandwidth B(m, n) is the average number of memory requests serviced per memory cycle.
  • This equals the average number of memory modules busy during each memory cycle.
  • Probability that a module is not referenced by one processor: (1 - 1/m).
  • Probability that a module is not referenced by any processor: (1 - 1/m)^n.
  • Probability that a module is busy: 1 - (1 - 1/m)^n.
  • So B(m, n) = average number of busy modules
  •   = m[1 - (1 - 1/m)^n]
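As a quick numerical check, Strecker's result can be evaluated directly (a sketch; the function name is mine):

```python
# Strecker's approximation: the expected number of busy modules when n
# independent requests each pick one of m modules uniformly at random.

def strecker_bandwidth(m, n):
    """B(m, n) = m * (1 - (1 - 1/m)**n), requests serviced per memory cycle."""
    return m * (1 - (1 - 1 / m) ** n)

# With m = n = 8, roughly 5.25 of the 8 modules are busy each cycle --
# well below the theoretical maximum of 8, because of contention.
assert round(strecker_bandwidth(8, 8), 2) == 5.25
```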

50
Strecker's Model
  • The achieved memory bandwidth is less than the theoretical maximum due to contention.
  • Neglecting the congestion carried over from previous cycles makes the calculated bandwidth higher still.

51
Processor Memory Modeling Using Queuing Theory
  • Most real-life processors make buffered requests to memory.
  • Whenever requests are buffered, the effects of contention and the resulting delays are reduced.
  • More powerful tools, such as queuing theory, are needed to accurately model processor-memory relationships incorporating buffered requests.

52
Queuing Theory
  • A statistical tool applicable to general environments in which requestors desire service from a common server.
  • The requestors are assumed to be independent of each other, and they make requests according to a request probability distribution function.
  • The server processes requests one at a time, each independently of the others, with service time distributed according to a server probability distribution function.

53
Queuing Theory
  • The mean of the arrival or request rate (measured in items per unit of time) is called λ.
  • The mean of the service rate distribution is called μ. (Mean service time Ts = 1/μ.)
  • The ratio of the arrival rate (λ) to the service rate (μ) is called the utilization or occupancy of the system and is denoted by ρ (= λ/μ).
  • The standard deviation of the service time (Ts) distribution is called σ.

54
Queuing Theory
  • Queue models are categorized by the triple.
  • Arrival Distribution / Service Distribution /
    Number of servers.
  • Terminology used to indicate particular
    probability distribution.
  • M Poisson / Exponential c1
  • MB Binomial c1
  • D Constant c0
  • G General c arbitrary

55
Queuing Theory
  • C is coefficient of variance.
  • C variance of service time / mean service
    time.
  • s / (1/µ) sµ.
  • Thus M/M/1 is a single server queue with poisson
    arrival and exponential service distribution.

56
Queue Properties
(Figure: a queue of size N holding Q waiting items; arrivals at rate λ, service at rate μ; total time in the system T = waiting time Tw + service time Ts.)
57
Queue Properties
  • The average time spent in the system (T) consists of the average service time (Ts) plus the waiting time (Tw):
  • T = Ts + Tw
  • Average queue length (including requests being serviced):
  • N = λT (Little's formula).
  • Since N consists of the items waiting in the queue plus the item in service:
  • N = Q + ρ (ρ is the system occupancy, the average number of items in service).

58
Queue Properties
  • Since N ?T
  • Q? ? (TsTw)
  • ? (1/µ Tw)
  • ?/µ ? Tw
  • ? ? Tw
  • Or Q ? Tw
  • The Tw (Waiting Time ) and Q (No of items
    waiting in Queue) are calculated using standard
    queue formulae for various type of Queue
    Combinations.

59
Queue Properties
  • For M/G/1 Queue Model
  • Mean waiting time Tw (1/?) ?2(1c2)/2(1-?)Mea
    n items in queue Q ? Tw ?2(1c2)/2(1-?)
  • For M/M/1 Queue Model C2 1
  • Tw (1/?) ?2/ (1-?)
  • Q ?2/(1-?)
  • For M/D/1 Queue Model C2 0
  • Tw (1/?) ?2/ 2(1-?)
  • Q ?2/2(1-?)
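These formulas can be checked with a short Python sketch (the function names and the example rates are mine; M/M/1 and M/D/1 fall out of the M/G/1 result as the c² = 1 and c² = 0 special cases):

```python
# Waiting time and queue length for a single-server queue, M/G/1 form.

def mg1_wait(lam, rho, c2):
    """Tw = (1/lam) * rho**2 * (1 + c2) / (2 * (1 - rho))."""
    return (1 / lam) * rho ** 2 * (1 + c2) / (2 * (1 - rho))

def queue_length(lam, tw):
    """Q = lam * Tw (items waiting, excluding the one in service)."""
    return lam * tw

lam, mu = 0.5, 1.0               # illustrative rates
rho = lam / mu                   # occupancy = 0.5
# Deterministic service (M/D/1) waits half as long as exponential (M/M/1).
assert mg1_wait(lam, rho, 0) == 0.5 * mg1_wait(lam, rho, 1)
assert queue_length(lam, mg1_wait(lam, rho, 1)) == 0.5
```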

60
Queue Properties
  • For the MB/D/1 queue model (c² = 0):
  • Tw = (1/λ) (ρ² - pρ) / 2(1 - ρ)
  • Q = (ρ² - pρ) / 2(1 - ρ)
  • For the simple binomial model, p = 1/m (the probability of the processor making a request each Tc is 1).
  • For the δ (delta) binomial model, p = δ/m, where δ is the probability of the processor making a request.
61
Open, Closed and Mixed Queue Models
  • Open queue models are the simplest queuing form. These models assume:
  • Arrival rate: independent of the service rate.
  • This results in a queue of unbounded length as well as unbounded waiting time.
  • In a processor-memory interaction, the processor's request rate decreases with memory congestion; thus the arrival rate is a function of the total service time (including waiting).

62
Open, Closed and Mixed Queue Models
  • This situation can be modeled by a queue with feedback.

(Figure: feedback queue — an offered rate λ0 enters the closed queue Qc; the achieved rate λa reaches the server μ, and the excess λ0 - λa is fed back.)

Such systems are called closed queues, as they have bounded size and waiting time.
63
Open, Closed and Mixed Queue Models
  • Certain systems can behave as open queues up to a certain queue size and then behave as closed queues.
  • Such systems are called mixed queue systems.

64
Open Queue (Flores) Memory Model
  • The open queue model is not very suitable for processor-memory interaction, but it is the simplest model and can be used as an initial guess for the partitioning of memory modules.
  • This model was originally proposed by Flores using the M/D/1 queue, but the MB/D/1 queue is more appropriate.

65
Open Queue (Flores) Memory Model
  • The total processor request rate λs is assumed to split uniformly over the m modules.
  • So the request rate at each module is λ = λs / m.
  • Since μ = 1/Tc (Tc is the memory cycle time),
  • ρ = λ/μ = (λs / m) Tc
  • We can now use the MB/D/1 model to determine Tw and Q0 (the per-module buffer size).

66
Open Queue (Flores) Memory Model
  • Design steps:
  • Find the peak processor instruction execution rate in MIPS.
  • MIPS x references/instruction = MAPS (million accesses per second).
  • Choose m so that ρ ≤ 0.5 and m = 2^k (k an integer).
  • Calculate Tw and Q0.
  • Total memory access time = Tw + Ta
  • Average open queue size = m Q0

67
Open Queue (Flores) Memory Model
  • Example:
  • Design a memory system for a processor with a peak performance of 50 MIPS and one instruction decoded per cycle.
  • Assume each memory module has Ta = 200 ns and Tc = 100 ns, and there are 1.5 memory references per instruction.

68
Open Queue (Flores) Memory Model
  • Solution:
  • MAPS = 1.5 x 50 = 75 MAPS
  • Now ρ = (λs / m) Tc
  • So ρ = 75 x 10^6 x (1/m) x 0.1 x 10^-6 = 7.5/m
  • Now choose m so that ρ ≤ 0.5:
  • If m = 16 then ρ = 0.47
  • For the MB/D/1 model, Tw = (1/λ)(ρ² - ρp) / 2(1 - ρ)
  •   = Tc(ρ - 1/m) / 2(1 - ρ)
  •   ≈ 38 ns

69
Open Queue (Flores) Memory Model
  • Total memory access time = Ta + Tw = 238 ns
  • Q0 = (ρ² - ρp) / 2(1 - ρ) ≈ 0.18
  • So the total mean queue size = m x Q0 = 16 x 0.18 ≈ 3
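The worked example can be verified numerically (a sketch using the MB/D/1 formulas with p = 1/m; variable names are mine):

```python
# Flores open-queue example: Ta = 200 ns, Tc = 100 ns, 75 MAPS, m = 16.

tc, ta, maps, m = 100e-9, 200e-9, 75e6, 16
rho = maps * tc / m                           # per-module occupancy
tw = tc * (rho - 1 / m) / (2 * (1 - rho))     # MB/D/1 waiting time
q0 = rho * (rho - 1 / m) / (2 * (1 - rho))    # per-module queue length

assert round(rho, 2) == 0.47
assert round(tw * 1e9) == 38                  # waiting time, ns
assert round((ta + tw) * 1e9) == 238          # total access time, ns
assert round(m * q0, 1) == 2.9                # total mean queue size, ~3
```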

70
Closed Queues
  • The closed queue model assumes that the arrival rate is immediately affected by service contention.
  • Let λ be the offered arrival rate and λa the achieved arrival rate.
  • Let ρ be the occupancy for λ and ρa for λa.
  • Then (ρ - ρa) is the number of items in the closed queue Qc.

71
Closed Queues
  • Suppose we have an (n, m) system in overall stability.
  • The average queue size (including items in service) is N = n/m, and
  • the closed queue size Qc = n/m - ρa = ρ - ρa, where ρa is the achieved occupancy.
  • From the discussion of open queues we know that
  • average queue size N = Q0 + ρ

72
Closed Queues
  • Since in the closed queue the achieved occupancy is ρa, and for M/D/1 Q0 = ρ²/2(1 - ρ), we have
  • N = n/m = ρa²/2(1 - ρa) + ρa
  • Solving for ρa,
  • we have ρa = (1 + n/m) - √((n/m)² + 1)
  • Bandwidth B(m, n) = m ρa, so
  • B(m, n) = m + n - √(n² + m²)
  • This solution is called the asymptotic solution.
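Both forms of the asymptotic solution can be checked against each other in Python (a sketch; function names are mine):

```python
# The asymptotic closed-queue solution, in occupancy and bandwidth form.
import math

def achieved_occupancy(n, m):
    """rho_a = (1 + n/m) - sqrt((n/m)**2 + 1)."""
    return (1 + n / m) - math.sqrt((n / m) ** 2 + 1)

def asymptotic_bandwidth(m, n):
    """B(m, n) = m + n - sqrt(n**2 + m**2)."""
    return m + n - math.sqrt(n ** 2 + m ** 2)

# The two forms agree: B(m, n) = m * rho_a.
assert abs(asymptotic_bandwidth(16, 8) - 16 * achieved_occupancy(8, 16)) < 1e-9
assert round(asymptotic_bandwidth(16, 8), 2) == 6.11
```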

73
Closed Queues
  • Since N = n/m is the same as the open queue occupancy ρ, we can say
  • ρa = (1 + ρ) - √(ρ² + 1)
  • Simple Binomial Model: while deriving the asymptotic solution, we assumed m and n to be very large and used the M/D/1 model.
  • For small n or m, the binomial rather than the Poisson is a better characterization of the request distribution.

74
Binomial Approximation
  • Substituting the queue size for MB/D/1:
  • N = n/m = (ρa² - pρa) / 2(1 - ρa) + ρa
  • Since the processor makes one request per Tc,
  • p = 1/m (probability of a request to one module).
  • Substituting this and solving for ρa:
  • ρa = 1 + n/m - 1/2m - √((1 + n/m - 1/2m)² - 2n/m)
  • and B(m, n) = m ρa:
  • B(m, n) = m + n - 1/2 - √((m + n - 1/2)² - 2mn)
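A quick numerical sketch of the binomial-approximation bandwidth (function name is mine):

```python
# Binomial-approximation bandwidth with p = 1/m substituted.
import math

def binomial_bandwidth(m, n):
    """B(m, n) = m + n - 1/2 - sqrt((m + n - 1/2)**2 - 2*m*n)."""
    s = m + n - 0.5
    return s - math.sqrt(s * s - 2 * m * n)

# Slightly above the asymptotic value m + n - sqrt(n^2 + m^2), as expected
# for a small system.
assert round(binomial_bandwidth(16, 8), 2) == 6.29
assert binomial_bandwidth(16, 8) > 16 + 8 - math.sqrt(8**2 + 16**2)
```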

75
Binomial Approximation
  • The binomial approximation is useful whenever we have:
  • A simple processor-memory configuration (a binomial arrival distribution).
  • n > 1 and m > 1.
  • Request-response behavior where the processor makes exactly n requests per Tc.

76
The (δ) Binomial Model
  • If the simple processor is replaced with a pipelined processor with buffers (I-buffer, register set, cache, etc.), the simple binomial model may fail.
  • The simple binomial model cannot distinguish between a single simple processor making one request per Tc with probability 1, and two processors each making 0.5 requests per Tc.
  • In the second case there can be contention, and both processors may make requests with varying probability.

77
The (δ) Binomial Model
  • To correct this, the δ-binomial model is used.
  • Here the probability of a processor access during Tc is not 1 but δ, so p = δ/m.
  • Substituting this, we get a more general definition:
  • B(m, n, δ) = m + n - δ/2 - √((m + n - δ/2)² - 2mn)

78
The (δ) Binomial Model
  • This model is useful in many processor designs where the source is buffered or makes requests on a statistical basis.
  • If n is the mean request rate and z is the number of sources, then δ = n/z.

79
The (δ) Binomial Model
  • This model can be summarized as follows:
  • The processor makes n requests per Tc.
  • Each processor request source makes a request with probability δ.
  • Offered bandwidth per second: Bw = n/Tc = mρ/Tc
  • Achieved bandwidth per Tc: B(m, n, δ)
  • Achieved bandwidth per second:
  • B(m, n, δ) / Tc = mρa/Tc
  • Achieved performance = (ρa/ρ) x (offered performance)

80
Using the δ-Binomial Performance Model
  • Assume a processor with a cycle time of 40 ns. Memory requests each cycle are made as follows:
  • Prob (IF in any cycle) = 0.6
  • Prob (DF in any cycle) = 0.4
  • Prob (DS in any cycle) = 0.2
  • The execution rate is 1 CPI, Ta = 120 ns, Tc = 120 ns.
  • Determine the achieved bandwidth and achieved performance (assuming four-way interleaving).

81
Using the δ-Binomial Performance Model
  • m = 4. Compute n (the mean number of requests per Tc):
  • n = requests per cycle x cycles per Tc
  •   = (0.6 + 0.4 + 0.2) x 120/40
  •   = 3.6 requests/Tc
  • Compute δ: z = cp x Tc / (processor cycle time),
  • where cp is the number of processor request sources.
  • So z = 3 x 120/40 = 9
  • So δ = n/z = 3.6/9 = 0.4

82
Using the δ-Binomial Performance Model
  • Compute B(m, n, δ):
  • B(m, n, δ) = m + n - δ/2 - √((m + n - δ/2)² - 2mn)
  •   = 2.3 requests/Tc
  • So the processor offers 3.6 requests each Tc, but the memory system can deliver only 2.3. This has a direct effect on processor performance.
  • Performance achieved = (2.3/3.6) x (offered performance)
  • At 1 CPI with a 40 ns cycle, offered performance = 25 MIPS.
  • Achieved performance = (2.3/3.6)(25) ≈ 16 MIPS
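The worked example can be reproduced numerically (a sketch; the function name is mine):

```python
# Delta-binomial worked example: m = 4, n = 3.6 requests/Tc, delta = 0.4.
import math

def delta_binomial_bandwidth(m, n, delta):
    """B(m, n, delta) = m + n - delta/2 - sqrt((m + n - delta/2)**2 - 2*m*n)."""
    s = m + n - delta / 2
    return s - math.sqrt(s * s - 2 * m * n)

n = (0.6 + 0.4 + 0.2) * (120 / 40)    # 3.6 requests per Tc
z = 3 * (120 / 40)                     # 9 request sources per Tc
delta = n / z                          # 0.4
b = delta_binomial_bandwidth(4, n, delta)

assert round(b, 1) == 2.3              # achieved requests per Tc
assert round(b / n * 25) == 16         # achieved MIPS out of 25 offered
```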

83
Comparison of Memory Models
  • Each model is valid for a particular type of
    processor memory interaction.
  • Hellermans model represents simplest type of
    processor. Since processor can not skip over
    conflicting requests and has no buffer, it
    achieves lowest bandwidth.
  • Streckers model anticipates out of order
    requests but no queues. Its applicable to
    multiple simple un buffered processors.

84
Comparison of Memory Models
  • M/D/1 open (Flores) Model has limited accuracy
    still it is useful for initial estimates or in
    mixed queue models.
  • Closed Queue MB/D/1 model represent a processor
    memory in equilibrium, where queue length
    including the item in service equals n/m on a per
    module basis.
  • Simple binomial model is suitable only for
    processors making n requests per Tc

85
Comparison of Memory Models
  • The d binomial model is suitable for simple
    pipelined processors where n requests per Tc are
    each made with probability d.

86
Review and Selection of Queuing Models
  • There are basically three dimensions to simple (single-server) queuing models.
  • These three represent the statistical characterization of the arrival rate, the service rate, and the amount of buffering present before the system saturates.
  • For the arrival rate: if the source always requests service during a service interval, use the MB (simple binomial) model.

87
Review and Selection of Queuing Models
  • If the particular requestor has diminishingly
    small probability of making a request during a
    particular service interval, use poisson arrival.
  • For service rate if service time is fixed , use
    constant (D) service distribution.
  • If service time varies but variance is unknown,
    (choose c21 for ease of analysis) use
    exponential (M) service distribution.

88
Review and Selection of Queuing Models
  • If variance is known and C2 can be calculated use
    M/G/1 model.
  • The third parameter determining the simple
    queuing model is amount of buffering available to
    the requestor to hold pending requests.

89
Processors with Cache
  • The addition of a cache to a memory system
    complicates the performance evaluation and
    design.
  • For CBWA caches, the requests to memory consists
    of line read and line write requests.
  • For WTNWA caches, its line read requests and word
    write requests.
  • In order to develop models of memory systems with
    caches two basic parameters must be evaluated

90
Processors with Cache
  1. T line access ,time it takes to access a line in
    memory.
  2. Tbusy , potential contention time (when memory is
    busy and processor/cache is able to make requests
    to memory)

91
Accessing a Line: T line access
  • Consider a pipelined single-processor system using interleaving to support fast line access.
  • Assume the cache has a line size of L physical words (bus word size) and the memory uses low-order interleaving of degree m.
  • Now if m ≥ L, the total time to move a line (for both read and write operations) is
  • T line access = Ta + (L - 1) T bus
  • where Ta is the word access time and T bus is the bus cycle time.

92
Accessing a Line: T line access
  • If L > m, a module has to be accessed more than once, so the module cycle time Tc plays a role.
  • If Tc ≤ m T bus, the module first used will recover before it is to be used again, so even for L > m:
  • T line access = Ta + (L - 1) T bus
  • But for L > m and Tc > m T bus, the memory cycle time dominates the bus transfer.

93
Accessing a Line: T line access
  • The line access time now depends on the relationship between Ta and Tc, and we use:
  • T line access = Ta + Tc (⌈L/m⌉ - 1) + T bus ((L - 1) mod m)
  • The first word in the line is available after Ta, but the module is not available again until Tc. A total of ⌈L/m⌉ accesses must be made to the first module, with the first access accounted for in Ta, so an additional ⌈L/m⌉ - 1 cycles are required.

94
Accessing a Line: T line access
  • Finally, ((L - 1) mod m) bus cycles are required for the other modules to complete the line transfer.
  • For a single-module memory system (m = 1) with a nibble-mode or FPM-enabled module, let v be the number of fast sequential accesses and Tv the time between each access:
  • T line access = Ta + Tc (⌈L/v⌉ - 1) + max(T bus, Tv) (L - L/v)
95
Accessing a Line: T line access
  • Now consider a mixed case, i.e., m > 1 with nibble or FPM mode:
  • T line access = Ta + Tc (⌈L/(m v)⌉ - 1) + T bus (L - L/(m v))

96
Computing T line access
  • Case 1 Ta 300ns, Tc200ns, m2, Tbus50 ns and
    L8.
  • Here we have Lgtm and Tc gt m.T bus
  • So T line acces Ta Tc((L/m) -1)Tbus ((L-1)
    mod m).
  • 300200(4-1)50(1) 950ns

97
Computing T line access
  • Case 2 Ta200ns, Tc150ns, Tv40ns,T bus 50 ns,
    L8, v4, m1.
  • T line access Ta Tc((L/v)-1) max(Tbus, Tv)(
    L-L/v).
  • 200 150((8/4 )-1) 50(8-(8/4))
  • 200 150 300
  • 650 ns

98
Computing T line access
  • Case 3: Ta = 200 ns, Tc = 150 ns, Tv = 50 ns, T bus = 25 ns, L = 16, v = 4, m = 2.
  • T line access = Ta + Tc (⌈L/(m v)⌉ - 1) + T bus (L - L/(m v))
  •   = 200 + 150((16/(2 x 4)) - 1) + 25(16 - (16/(2 x 4)))
  •   = 200 + 150 + 350
  •   = 700 ns
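All three cases can be checked with one small function (a sketch; the function name and parameter defaults are mine, and times are in ns):

```python
# Line access time for interleaving degree m and v-way nibble/page mode.
import math

def t_line_access(ta, tc, tbus, L, m=1, v=1, tv=0):
    """Covers the three cases above: plain interleaving, single module with
    fast mode, and the mixed case."""
    if v == 1:                        # plain interleaving
        if m >= L or tc <= m * tbus:
            return ta + (L - 1) * tbus
        return ta + tc * (math.ceil(L / m) - 1) + tbus * ((L - 1) % m)
    if m == 1:                        # single module with v-way fast mode
        return ta + tc * (math.ceil(L / v) - 1) + max(tbus, tv) * (L - L // v)
    # mixed case: m modules, each with v-way fast mode
    return ta + tc * (math.ceil(L / (m * v)) - 1) + tbus * (L - L // (m * v))

assert t_line_access(300, 200, 50, 8, m=2) == 950                 # case 1
assert t_line_access(200, 150, 50, 8, m=1, v=4, tv=40) == 650     # case 2
assert t_line_access(200, 150, 25, 16, m=2, v=4, tv=50) == 700    # case 3
```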

99
Contention Time: Copy-back Caches
  • In a simple copy-back cache, the processor stalls on a cache miss and does not resume until the dirty line (w = probability of a dirty line) is written back to main memory and the new line is read into the cache.
  • The miss time penalty thus is:
  • T miss = (1 + w) T line access

100
Contention Time: Copy-back Caches
  • The miss time may be different for the cache and main memory.
  • T c.miss: the time the processor is idle due to a cache miss.
  • T m.miss: the total time main memory takes to process a miss.
  • T busy = T m.miss - T c.miss: the potential contention time.
  • T busy is 0 for a normal CBWA cache.

101
Contention Time: Copy-back Caches
  • Consider the case where the dirty line is written to a write buffer while the new line is read into the cache. When the processor resumes, the dirty line is written back to memory from the buffer.
  • T m.miss = (1 + w) T line access
  • T c.miss = T line access
  • So T busy = w T line access
  • In the case of a wrap-around load:
  • T busy = (1 + w) T line access - Ta

102
Contention Time: Copy-back Caches
  • If the processor creates a miss during T busy, we call the additional delay T interference.
  • T interference = expected number of misses during T busy x delay per miss.
  • The expected number of misses = number of requests during T busy x probability of a miss
  •   = λp T busy f, where λp is the processor request rate and f the miss rate.
  • The delay, given a miss during T busy, is simply estimated as T busy / 2.
  • So T interference = λp T busy f (T busy / 2)

103
Contention Time: Copy-back Caches
  • T interference = λp f (T busy)² / 2, and the total miss time seen from the processor is
  • T miss = T c.miss + T interference. The relative processor performance is
  • Perf rel = 1 / (1 + f λp T miss)
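Putting the pieces together, the interference and relative-performance formulas can be sketched with illustrative numbers (all values below are assumptions, not from the slides; the write-buffer case is taken, so T c.miss = T line access and T busy = w x T line access):

```python
# Interference delay and relative performance for a copy-back cache.

def t_interference(lam_p, f, t_busy):
    """T_interference = lam_p * f * T_busy**2 / 2."""
    return lam_p * f * t_busy ** 2 / 2

def perf_rel(f, lam_p, t_miss):
    """Relative performance = 1 / (1 + f * lam_p * T_miss)."""
    return 1 / (1 + f * lam_p * t_miss)

lam_p, f = 0.02, 0.05            # one request per 50 ns, 5% miss rate
t_line = 400.0                   # ns, assumed line access time
w = 0.3                          # probability the replaced line is dirty
t_busy = w * t_line              # write-buffer case
t_miss = t_line + t_interference(lam_p, f, t_busy)

assert round(t_interference(lam_p, f, t_busy), 1) == 7.2
assert round(perf_rel(f, lam_p, t_miss), 3) == 0.711
```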