11-May-04 <1> - PowerPoint PPT Presentation

1
A Distributed Colouring Algorithm for Control Hazards in Asynchronous Pipelines
Qianyi Zhang, School of Computer Science, University of Birmingham
(Supervisor: Dr Georgios Theodoropoulos)
2
Outline
  • Asynchronous Hardware
  • Handling the Control Hazards Problem
      • In a pipeline architecture
      • In asynchronous hardware
      • In a multistage asynchronous pipeline
  • A Generic Distributed Colouring Solution
      • Multi-colour vector
      • Function of each pipeline stage and of the arbitration unit
  • A Constructive Proof
  • Some Results
  • Summary

3
The Problem of Synchronisation
  • Digital system: a collection of subsystems performing different
    computations and communicating to exchange information.
  • Before a communication transaction, the subsystems need to
    synchronise: they wait for a common control state to be reached
    which guarantees the validity of the data exchanged.
  • Synchronous: a global clock defines the points in time when
    communication can take place (time driven)
  • Problems
      • Clock Skew
      • Power Skew
      • Modularity
      • Performance

4
Asynchronous Logic
  • An alternative digital design philosophy
  • It allows each subsystem to operate at its own rate and to
    communicate with its peers only when it needs to exchange
    information.
  • Synchronisation is achieved by the communication protocol: local
    request and acknowledge signals provide information regarding the
    validity of the data signals.
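
The request/acknowledge synchronisation can be illustrated with a minimal Python sketch. It assumes a four-phase (return-to-zero) handshake, which the slides do not specify; the function name `four_phase_transfer` and the trace format are hypothetical.

```python
def four_phase_transfer(data):
    """Simulate one transfer using an ASSUMED four-phase handshake.

    Returns the value latched by the receiver and a trace of
    (req, ack, bus) states recorded after each step.
    """
    req, ack, bus = 0, 0, None
    trace = []

    # 1. Sender places valid data on the bus, then raises request.
    bus, req = data, 1
    trace.append((req, ack, bus))

    # 2. Receiver sees req high, latches the data, raises acknowledge.
    latched, ack = bus, 1
    trace.append((req, ack, bus))

    # 3. Sender sees ack high and lowers request; data may now change.
    req, bus = 0, None
    trace.append((req, ack, bus))

    # 4. Receiver sees req low and lowers acknowledge; channel is idle.
    ack = 0
    trace.append((req, ack, bus))

    return latched, trace
```

No clock appears anywhere in the sketch: each step is triggered only by the other side's signal, which is the point of the local protocol.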

5
Asynchronous Logic
  • Asynchronous design techniques have been explored since the 1950s
    but failed to become mainstream: it is difficult to enforce
    specific orderings of operations and to deal with circuit hazards
    and dynamic states
  • The last decade has witnessed a resurgence of interest in
    Asynchronous Logic
      • Solution to the clock skew problem: no clock, no skew!
      • Potential for low power: circuit components are activated only
        when necessary
      • Potential for higher performance: lower power allows increased
        supply voltages; average-case optimisation
      • Potential for better technology migration: modularity
      • Better EMC: generates low, uncorrelated Electro-Magnetic
        Interference
  • However, it also brings new problems
      • May result in larger circuits: REQ and ACK signals
      • Circuits are more difficult to design and their behaviour is
        harder to understand

6
Handling Control Hazards
  • The control hazards problem in a pipelined architecture
      • Control hazards arise when an instruction such as a branch or a
        jump, or the occurrence of an unpredictable event such as an
        exception, changes the flow of control.
      • In a pipeline architecture, the prefetched instructions
        following a hazard must be removed from the pipeline before the
        new stream arrives.
      • The processor must be able to distinguish between instructions
        originating from the branch or exception target and
        instructions already prefetched.

[Figure: pipeline holding the instructions SUB 2, 1, 3; AND 3, 2, 4; OR 4, 1, 2; JR 25; SW 15, 100(2)]
7
Handling Control Hazards
  • Control hazards in synchronous vs asynchronous hardware
      • Synchronous: the depth of prefetching is defined by the clock
        cycles and is therefore deterministic.
      • Asynchronous: the exact number of prefetched instructions is
        nondeterministic and therefore unpredictable; the depth of
        prefetching depends on the precise point at which the branch or
        the exception takes place.

[Figure: pipeline holding the instructions SUB 2, 1, 3; AND 3, 2, 4; OR 4, 1, 2; JR 25; SW 15, 100(2)]
Need a new strategy!
8
Using Colour
[Figure: AND 3, 2, 4; OR 4, 1, 2; JR 25 and the new stream, labelled with colour bits 0 and 1]
  • Technique devised for the AMULET1 processor (Manchester)
  • When a control hazard occurs, the colour of the processor changes
  • Each instruction address issued to memory carries the latest
    operating colour of the processor, which is used to mark the
    corresponding fetched instruction.
  • The colour bit of an instruction which arrives at the datapath for
    execution is compared with the current colour of the processor; if
    they do not match, the instruction is discarded.
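
A minimal Python sketch of the single-colour-bit scheme described on this slide. The tuple layout, the function name `execute_stream` and the example stream are illustrative assumptions, not the AMULET1 implementation; every control-transfer instruction is assumed to be taken.

```python
def execute_stream(stream):
    """Return the operations that are actually executed.

    `stream` is a list of (op, colour_tag, changes_flow) tuples in the
    order they reach the datapath. colour_tag is the processor colour
    that was current when the instruction address was issued.
    """
    processor_colour = 0
    executed = []
    for op, colour_tag, changes_flow in stream:
        if colour_tag != processor_colour:
            # Fetched before the hazard took effect: discard.
            continue
        executed.append(op)
        if changes_flow:
            # Control hazard: the processor changes colour, so addresses
            # issued from now on (the new stream) carry the new colour.
            processor_colour ^= 1
    return executed


stream = [
    ("JR", 0, True),         # jump executes and flips the colour to 1
    ("SW", 0, False),        # prefetched with the old colour
    ("SUB", 0, False),       # prefetched with the old colour
    ("target_0", 1, False),  # first instruction of the new stream
    ("target_1", 1, False),
]
print(execute_stream(stream))  # ['JR', 'target_0', 'target_1']
```

The number of old-colour instructions behind the jump can be anything; the comparison discards them all, which is why the scheme tolerates a nondeterministic prefetch depth.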

9
Control Hazard in Multiple Stages
  • One colour bit is not enough
  • How many colours do we need?
  • How to arbitrate if more than one stage sends requests
    simultaneously?
  • Two basic observations
      • The state of the system is distributed
      • Stages that are deeper in the pipeline have higher priority
        than stages before them: a control transfer event that occurs
        at a pipeline stage renders other events that may occur in
        earlier pipeline stages irrelevant and invalid, even if the
        latter precede the former in time.

10
A Generic Distributed Technique
  • A colour vector with priorities
      • One colour bit per stage
      • A vector c = (c1, c2, c3, ..., cn) in the set C^n, where
        C = {0, 1} is the set of colours, n is the number of stages in
        the pipeline and ci is the colour of stage i.
      • Priority of ci > priority of cj for i > j
  • Two arbitrations are made
      • An Address Arbitration Unit (AAU) rejects invalid control
        hazard requests
      • Each stage discards the prefetched instructions following the
        hazard

[Figure: pipeline stages S1, S2, S3, S4]
11
A Generic Distributed Technique
  • The Address Arbitration Unit
      • Operates as an autonomous unit, issuing instruction addresses
        to memory as they arrive from the Program Counter (normal
        operation) or from the pipeline stages (when a control hazard
        occurs).
      • Keeps a record of the colour state of the processor (vector c)
      • If a new transfer address arrives from stage Sk
          • If any higher-priority colour bit (cj where j > k) in the
            address differs from the corresponding colour bit of the
            AAU, rejects the address
          • Otherwise lets it through and updates its own copy of
            vector c
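
The AAU rule on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Balsa model from the presentation: the class and method names are hypothetical, stages are numbered from 1, and the colour vector is a plain list in which position j-1 holds cj.

```python
from typing import List, Sequence


class AddressArbitrationUnit:
    """Sketch of the AAU arbitration rule for an n-stage pipeline."""

    def __init__(self, n_stages: int) -> None:
        self.n_stages = n_stages
        # The AAU's own copy of the colour vector (c1, ..., cn).
        self.colour: List[int] = [0] * n_stages

    def request(self, k: int, colour: Sequence[int]) -> bool:
        """Handle a transfer address arriving from stage Sk (1-based).

        `colour` is the vector carried by the address. Returns True if
        the address is let through, False if it is rejected.
        """
        # Bits cj with j > k sit at 0-based positions k .. n-1.
        for j in range(k, self.n_stages):
            if colour[j] != self.colour[j]:
                return False  # a higher-priority bit disagrees: reject
        self.colour = list(colour)  # accept and update own copy
        return True


aau = AddressArbitrationUnit(4)
print(aau.request(2, [0, 1, 0, 0]))  # True: no higher-priority bit differs
print(aau.request(1, [1, 0, 0, 0]))  # False: c2 differs from the AAU's copy
print(aau.request(4, [0, 0, 0, 1]))  # True: S4 has no higher-priority bits
```

The second request models a stage S1 that raised its hazard while still holding the old vector; the AAU rejects it because a deeper stage has already been served.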

12
A Generic Distributed Technique
  • Each stage Sk in the pipeline
      • Keeps a record of the colour state of the processor (vector c),
        which it reads from the instructions as they pass through
      • For each new instruction that arrives
          • If any higher-priority colour bit (cj where j > k) in the
            instruction differs from the corresponding colour bit of
            the stage, lets the instruction through and updates its own
            copy of vector c
          • Otherwise
              • If its own colour bit differs, rejects the instruction
              • Otherwise executes the instruction
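
The per-stage rule can be sketched in Python in the same spirit. The class, method and constant names are hypothetical. The slide does not say how a stage announces its own hazard, so `raise_hazard` rests on an explicit assumption: the stage flips its own colour bit and sends the resulting vector with the transfer address.

```python
from typing import List, Sequence

LET_THROUGH, REJECT, EXECUTE = "let through", "reject", "execute"


class Stage:
    """Sketch of the colour check performed by pipeline stage Sk."""

    def __init__(self, k: int, n_stages: int) -> None:
        self.k = k  # 1-based stage index
        self.n_stages = n_stages
        self.colour: List[int] = [0] * n_stages  # own copy of vector c

    def receive(self, colour: Sequence[int]) -> str:
        """Classify an arriving instruction by the vector it carries."""
        # Bits cj with j > k sit at 0-based positions k .. n-1.
        higher_differs = any(
            colour[j] != self.colour[j]
            for j in range(self.k, self.n_stages)
        )
        if higher_differs:
            # A higher-priority bit changed: adopt the new vector.
            self.colour = list(colour)
            return LET_THROUGH
        if colour[self.k - 1] != self.colour[self.k - 1]:
            return REJECT  # own bit differs: discard the instruction
        return EXECUTE

    def raise_hazard(self) -> List[int]:
        """ASSUMPTION (not on the slide): flip own bit, return the vector
        that would accompany the transfer address sent to the AAU."""
        self.colour[self.k - 1] ^= 1
        return list(self.colour)


s2 = Stage(2, 4)
print(s2.raise_hazard())         # [0, 1, 0, 0]
print(s2.receive([0, 0, 0, 0]))  # reject: prefetched before S2's hazard
print(s2.receive([0, 1, 0, 0]))  # execute: belongs to S2's new stream
print(s2.receive([0, 0, 0, 1]))  # let through: c4 changed
```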

[Figure: pipeline stages S1, S2, S3, S4]
13
A Constructive Proof (1)
[Figure: AAU and pipeline diagram annotated with 4-bit colour vectors; values shown: 0000 and 0100]

14
A Constructive Proof (1)
[Figure: AAU and pipeline diagram annotated with 4-bit colour vectors; values shown: 0100 and 0000]

15
A Constructive Proof (2)
[Figure: AAU and pipeline diagram annotated with 4-bit colour vectors; values shown: 0000, 0100 and 0001]

16
A Constructive Proof (2)
[Figure: AAU and pipeline diagram annotated with 4-bit colour vectors; values shown: 0000, 0100 and 0001]

17
A Constructive Proof (2)
[Figure: AAU and pipeline diagram annotated with 4-bit colour vectors; values shown: 0000, 0100 and 0001]

18
A Constructive Proof (2)
[Figure: AAU and pipeline diagram annotated with 4-bit colour vectors; values shown: 0000, 0001 and 0100]

19
A Constructive Proof (2)
[Figure: AAU and pipeline diagram annotated with 4-bit colour vectors; values shown: 0000, 0001 and 0100]
20
An Integrated Framework for Formal Verification and Distributed Simulation of Asynchronous Hardware
EPSRC Project No. GR/S11091/01, GR/S11084/01
  • 380,000 - for 3 years starting April 2003
  • Objectives
  • Exploit compositionality of designs to enable
    automatic support for refinement checking,
    equivalence checking and deadlock detection.
  • Investigate applicability of data independence as
    a means to automate datapath abstraction and
    verification of parameterised component
    descriptions.
  • Investigate applicability of semi-formal
    techniques in the context of asynchronous
    hardware.
  • Develop algorithms and techniques for
    partitioning, load balancing, synchronisation and
    monitoring to support the distributed simulation.
  • Develop a prototype CSP-oriented integrated
    environment for the specification, distributed
    simulation and formal verification of
    asynchronous VLSI systems.
  • Develop test cases and conduct experiments to
    test and evaluate our approach.

21
Evaluation
  • Synthesisable asynchronous implementation of the MIPS R3000
    processor core
      • Instruction set compatible with the R3000
      • 5-stage pipeline datapath
      • With precise exceptions
  • Balsa
      • A synthesis tool for asynchronous hardware, developed by the
        AMULET group
      • An asynchronous hardware description language based on CSP
      • A discrete event simulator at RTL level
      • A compiler to gate-level netlists
SAMIPS
22
Evaluation
  • The Balsa model of S1
  • The Balsa model of AAU

Cost comparison with SAMIPS
Cost Estimation of Stages
23
Summary and Future Work
  • A distributed colouring algorithm for dealing with control hazards
    in asynchronous pipelines
  • The main advantages
      • It provides flexibility in designing the pipeline of the
        processor, enabling prefetching at any depth
      • Low extra cost in terms of silicon area
  • This approach has just been integrated into SAMIPS and proved
    functionally correct.
  • We will evaluate the performance of this approach and the overhead
    it imposes in terms of time and power