Reconfigurable Computing Systems: An Overview - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Reconfigurable Computing Systems: An Overview


1
Reconfigurable Computing Systems: An Overview
  • Presented by
  • Gurwant Kaur Koonar
  • Vijay Pandya
  • 14th March 2003

2
Introduction
  • Reconfigurable Computing (RC) is an emerging
    paradigm for digital systems design. Its key
    feature is the ability to perform computations
    in hardware, approaching the performance of
    ASICs while keeping the flexibility of GP
    processors.
  • Technology improvements have made possible new
    programmable logic devices (FPGAs, CPLDs).
  • Objective of the talk: give an overview of
    reconfigurable computing, its hardware
    architectures, and the software that targets
    these machines, such as compilation tools.

3
Definition
  • Reconfigurable Computing (RC) is a computing
    paradigm in which algorithms are implemented as a
    temporally and spatially ordered set of very
    complex tasks. These tasks are executed on a
    large set of interconnected programmable hardware
    elements

4
Definition (contd)
  • Computing paradigm: defines the basic RC
    computing model without reference to
    implementation.
  • Very complex tasks: commonly referred to as
    configurations. RC tasks require more time than
    general-purpose computing instructions and more
    area than the typical general-purpose execution
    unit.
  • Spatial and temporal partitioning: algorithms
    are decomposed into tasks in both the space and
    time domains.
  • Hardware elements: at their core, RC devices
    consist of a very large set of simple
    programmable elements, collectively called the
    Reconfigurable Execution Unit (REU).

5
  • General Characteristics of RC
  • Stored configuration algorithms
  • No software
  • Pipeline architectures are common
  • Real-time applications
  • Advantages
  • Flexible
  • Configurable
  • Cost comparable to GPP
  • Hardware is readily available
  • Shorter development cycle than ASICs
  • Parallelism
  • Algorithm parallelism exploited in custom
    architecture
  • Problem specific operators and control
  • High-performance
  • Reduced memory dependence and exploitation of
    fine-grained algorithm parallelism
  • Timesharing
  • Hardware can be time multiplexed by multiple
    applications

6
Disadvantages
  • Additional area requirements
  • Configuration memory (internal/external),
    Internal switches and other hardware overhead
  • Time Overhead
  • Device configuration, and internal switches

7
Traditional Computing
  • Using Application-Specific Integrated Circuits
    (ASICs) to hard-wire an algorithm in hardware. 
  • Extremely fast
  • Requires less silicon area
  • Less power hungry than GP architectures
  • Extremely inflexible
  • Expensive both in design and fabrication
  • Errors are difficult to correct
  • Examples: consumer electronics,
    telecommunications, automotive industry

8
Traditional Computing (Cont'd)
  • General-purpose hardware, combined with
    application-specific software
  • Extremely flexible due to versatile instruction
    set.
  • Much less expensive to develop.
  • Poor performance compared to ASICs.
  • Errors can be dynamically patched.
  • Examples: commodity PC hardware running
    commercial software.

9
Reasons for Poor Software Performance 
  • Fetching of instructions
  • Interpretation of instructions
  • Scheduling of instructions
  • Wrong mix of hardware resources for a
    particular application's needs
  • Reconfigurable computing is therefore intended
    to fill the gap between HW and SW.

10
Flexibility and Efficiency Tradeoffs
11
Can we call FPGAs Reconfigurable Processing
Units?
  • Traditional FPGAs are configurable, but not
    run-time reconfigurable
  • Traditional FPGAs expect to read their
    configuration out of a serial EEPROM, one bit at
    a time.
  • Therefore, the FPGA must be reprogrammed in its
    entirety, and its previous internal state
    cannot be captured beforehand.

12
Features for Reconfigurable Hardware
  • On-the-Fly Reprogrammability
  • Partial Reprogrammability
  • Externally-Visible Internal State

13
Kress ALU Array-III (KrAA-III)
  • instruction level parallelism
  • transparently scalable
  • fast routing and placement (seconds only)
  • dynamically and partially reconfigurable
    (microseconds)
  • suitable for full custom design
  • on microprocessor chip: much higher acceleration
    than by caches
  • on microprocessor chip: fast and low power by
    full custom design
  • acceleration by massive migration from run time
    to compile time

14
Kress ALU Array-III (KrAA-III)
  • KrAA-III consists of PEs called rDPU-III
    (reconfigurable DataPath Unit III) arranged in a
    NEWS network.
  • Figure shows the KrAA-III chip containing 9
    rDPUs.
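
A NEWS network connects each processing element to its North, East, West, and South neighbours. The sketch below enumerates those links for a small grid. It assumes the 9 rDPUs form a 3 x 3 mesh without wrap-around at the edges, which the slide does not state; the function name is hypothetical.

```python
def news_neighbors(row, col, rows=3, cols=3):
    """Return the N/E/W/S neighbours of one element in a rows x cols mesh.

    Assumption: edges do not wrap around, so border elements
    have fewer than four neighbours.
    """
    candidates = {
        "N": (row - 1, col),
        "E": (row, col + 1),
        "W": (row, col - 1),
        "S": (row + 1, col),
    }
    return {
        direction: (r, c)
        for direction, (r, c) in candidates.items()
        if 0 <= r < rows and 0 <= c < cols
    }


# The centre element of a 3 x 3 array sees all four neighbours,
# while a corner element sees only two.
print(news_neighbors(1, 1))
print(news_neighbors(0, 0))
```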

15
Basic Architecture of today's commercial
reconfigurable processors
16
Devices which combine an FPGA with a standard
processor core
  • Triscend's E5 and A7
  • Altera's two Excalibur families
  • Atmel's FPSLIC
  • Chameleon Systems' CS2000
17
Zippy Architecture
  • It is used to develop reconfigurable processor
    technology for domain of handheld and wearable
    computing.
  • It investigates new trade-offs between
    performance, power consumption and system cost
  • It is an international research effort led by
    the Swiss Federal Institute of Technology

18
Reconfigurable Computing: Merging Efficiency and
Versatility
19
Hardware Design steps
20
Examples: SPLASH II, a multi-FPGA parallel computer
with orchestrated systolic communications to
perform inter-FPGA data transfer
21
Garp: for general-purpose loop acceleration
22
CMC Rapid Prototyping Platform
23
RC Applications
  • RC has demonstrated a >10x performance density
    advantage over microprocessors and DSPs
  • Pattern matching
  • Data encryption
  • Data compression
  • Video and image processing
  • Commercial Push
  • Handheld devices - PDAs, mobile Phones,
    specialized tools
  • Networks - telecom switches, network routers,
    network bridges
  • High-performance computing: supercomputers,
    medical appliances, robot navigation and planning
  • Defense: ballistic missiles, KV navigation,
    spacecraft processing

24
RC Implementations
  • Hardware
  • Catalina Research Incorporated -
    http://www.catalinaresearch.com/Chameleon
  • Annapolis Microsystems -
    http://www.annapmicro.com/Wildstar
  • Alpha Data Parallel Systems -
    http://www.alpha-data.com
  • Tools
  • Celoxica - http://www.celoxica.com
  • Star Bridge Systems -
    http://www.starbridgesystems.com
  • Annapolis Microsystems -
    http://www.annapmicro.com/CoreFire

25
Content
  • Coupling Approaches (Reconfigurable Hardware with
    General Processor)
  • Granularity of the FPGA as an RCS
  • Implementation Approaches
  • Compile Time Reconfiguration
  • Run Time Reconfiguration
  • Some more advantages
  • Challenges
  • Software like Design environment

26
Coupling Approaches for Reconfigurable Hardware
(RH)
  • RH can be coupled to GP as
  • A functional unit (Tight Coupling)
  • A Co-processor
  • An Attached processing unit
  • A Standalone processing unit (Loosely coupled)

27
Coupling Approaches Contd
  • As a Functional Unit
  • Within a host processor (General purpose GP)
  • Uses data-path of a host machine
  • As a Coprocessor
  • Without constant supervision of the GP
  • GP initializes the RH
  • Independent parallel computation
  • Less communication overhead

28
Coupling Approaches Contd
  • As an attached processing unit
  • Behaves as an additional processor
  • Memory Cache not visible
  • Independent Computation but high communication
    overhead
  • As a Standalone
  • The most loosely coupled to GP
  • Infrequent Communication with the GP
  • Independent computation for very long period of
    time

29
Different levels of coupling
(Figure: a workstation containing the CPU, FU,
memory caches and I/O interface, with the
coprocessor, attached processing unit and
standalone processing unit shown at their
respective levels of coupling.)
30
Pros and Cons of different coupling approaches
  • The tight integration
  • Very low communication overhead
  • RH cannot operate alone for long periods of
    time
  • Amount of reconfigurable logic is limited
  • The loose integration
  • Greater parallelism
  • Higher communication overhead
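
The trade-off above can be illustrated with a toy timing model: offloading a task to the RH only pays off when the hardware execution time plus the communication overhead stays below the software time. This is a sketch under simplified assumptions (a single task, a fixed overhead per offload); the function names and all numbers are hypothetical and not taken from the talk.

```python
def offload_time(software_time, hw_speedup, comm_overhead):
    """Time to run a task on the RH, including host<->RH communication."""
    if hw_speedup <= 0:
        raise ValueError("hw_speedup must be positive")
    return software_time / hw_speedup + comm_overhead


def offload_pays_off(software_time, hw_speedup, comm_overhead):
    """True when running on the RH beats running in software."""
    return offload_time(software_time, hw_speedup, comm_overhead) < software_time


# Hypothetical numbers: the same 10 ms task with a 5x hardware speedup.
# A tightly coupled unit (1 ms overhead) wins; a loosely coupled unit
# with 9 ms of overhead per call does not.
print(offload_pays_off(10.0, 5.0, 1.0))
print(offload_pays_off(10.0, 5.0, 9.0))
```

In this model a loosely coupled unit becomes attractive only when each offloaded task is long enough to amortise the overhead, which matches the slide's point that loose coupling suits long independent computations.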

31
Logic Block Granularity
  • Refers to the size and complexity of the
    configurable logic block (CLB)
  • Fine-grained logic block
  • Less complex; e.g. the Altera FLEX 10K logic
    block consists of a single 4-input LUT with a
    flip-flop
  • Useful for bit-level manipulation
  • Can exceed the performance of a GP processor for
    operations on data of variable bit width
  • Smaller area, high amount of computation
    (compact)
  • Encryption and image processing applications
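
To make the fine-grained logic block concrete, here is a minimal behavioural sketch of a 4-input LUT followed by an optional flip-flop. It is a generic model rather than the actual Altera FLEX 10K cell; the class and helper names are hypothetical.

```python
class Lut4Cell:
    """Toy fine-grained logic cell: a 4-input LUT feeding an optional flip-flop."""

    def __init__(self, truth_table, registered=False):
        if not 0 <= truth_table < (1 << 16):
            raise ValueError("a 4-input LUT holds exactly 16 configuration bits")
        self.truth_table = truth_table  # the cell's "configuration"
        self.registered = registered
        self.ff = 0  # flip-flop state, cleared at configuration time

    def lookup(self, a, b, c, d):
        """Combinational output: the four inputs index into the table."""
        index = (d << 3) | (c << 2) | (b << 1) | a
        return (self.truth_table >> index) & 1

    def clock(self, a, b, c, d):
        """Apply one clock edge and return the cell output after the edge."""
        value = self.lookup(a, b, c, d)
        if self.registered:
            self.ff = value
            return self.ff
        return value


def table_for(func):
    """Build the 16-bit configuration word for any 4-input boolean function."""
    bits = 0
    for index in range(16):
        a, b, c, d = index & 1, (index >> 1) & 1, (index >> 2) & 1, (index >> 3) & 1
        if func(a, b, c, d):
            bits |= 1 << index
    return bits


# Reconfiguring the cell only means loading a different 16-bit table.
xor4 = Lut4Cell(table_for(lambda a, b, c, d: a ^ b ^ c ^ d))
print(hex(xor4.truth_table))    # 0x6996
print(xor4.lookup(1, 0, 1, 1))  # 1
```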

32
Logic Block Granularity contd
  • Coarse grained logic block
  • Larger granularity of the CLB
  • Helps perform more complex operations
  • Four 2-bit inputs (GARP) and multiplier in each
    logic block for 4 x 4 multiplication
  • Finite State Machine
  • Word-width (16-bit) datapath circuits are
    implemented in very coarse-grained structures
  • Logic block is closer to a small processor

33
Implementation Approaches
  • Compile Time Reconfiguration (CTR)
  • Static implementation strategy
  • Single system-wide configuration
  • Configuration doesn't change during computation
  • Similar to using ASIC for application
    acceleration
  • Run Time Reconfiguration (RTR)
  • Dynamic implementation strategy
  • Multiple time-exclusive configurations
  • Dynamic hardware allocation (run-time)

34
RTR
  • Main task: dividing the algorithm into
    time-exclusive segments
  • Global RTR
  • Allocates whole FPGA resources for each
    configuration
  • Single system wide configuration for each phase
  • Local RTR
  • Locally reconfigure subsets of logic at run-time
  • Partial reconfiguration, flexibility
  • Functional division of labor
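
The difference between global and local RTR can be sketched with a toy timing model. It assumes that every global phase pays a full-device load, that local RTR pays only a partial load per phase, and that loading never overlaps with execution. The function names and the microsecond figures are hypothetical.

```python
def global_rtr_time(exec_times, full_load_time):
    """Total time when the whole FPGA is reloaded before every phase."""
    return sum(full_load_time + t for t in exec_times)


def local_rtr_time(exec_times, partial_load_times):
    """Total time when only a subset of the logic is reloaded per phase."""
    if len(exec_times) != len(partial_load_times):
        raise ValueError("need one partial load time per phase")
    return sum(load + t for load, t in zip(partial_load_times, exec_times))


# Three phases A, B, C (times in microseconds, made up for illustration).
phases = [100, 200, 150]
print(global_rtr_time(phases, full_load_time=50))               # 600
print(local_rtr_time(phases, partial_load_times=[50, 10, 10]))  # 520
```

A real local RTR system could also overlap reconfiguring one region with execution in another, which this model deliberately leaves out.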

35
RTR Contd
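(Figure: Global RTR shown as a sequence of
whole-device phases: EXE. A, LOAD B, EXE. B,
LOAD C, EXE. C, LOAD A. Local RTR shown as blocks
A, B, C and D, with some blocks executing (EXE.)
while others are reconfigured.)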
36
Implementation Issues
  • Temporal partitioning is an iterative process
  • Possibly inefficient usage of FPGA resources in
    global RTR
  • Simulation
  • Efficient usage of hardware in local RTR
  • Current CAD tools poor match for local RTR
  • (Examples of local RTR: RRANN-2 and DISC)
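
As an illustration of temporal partitioning, the sketch below packs an ordered list of tasks into time-exclusive configurations so that each one fits the device. It is a single greedy pass, not the iterative refinement that the slide describes; the task names, areas and capacity are hypothetical.

```python
def partition_temporally(tasks, device_capacity):
    """Greedily split ordered (name, area) tasks into time-exclusive segments.

    A new segment (i.e. a new configuration) is started whenever
    the next task would no longer fit on the device.
    """
    segments = []
    current, used = [], 0
    for name, area in tasks:
        if area > device_capacity:
            raise ValueError(f"task {name} does not fit on the device at all")
        if used + area > device_capacity:
            segments.append(current)
            current, used = [], 0
        current.append(name)
        used += area
    if current:
        segments.append(current)
    return segments


# Four tasks with made-up CLB counts on a device with 100 CLBs.
tasks = [("A", 60), ("B", 30), ("C", 50), ("D", 40)]
print(partition_temporally(tasks, device_capacity=100))  # [['A', 'B'], ['C', 'D']]
```

The unused area in each segment (10 CLBs in both segments here) is one way to see the possibly inefficient resource usage of global RTR.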

37
Power Savings in RC system
  • Exploitation of numerical properties of an
    application
  • Higher number of operations per clock due to deep
    pipelines
  • Sensor/actuator pre-conditioning and glue logic
    functions on chip

38
Some Challenges
  • Access to the development of RCS restricted to
    hardware developers
  • Run-time environment, RTR scheduling
  • Difficulties in routing for RC hardware having
    large number of CLBs
  • Connection scheme in multi-FPGA system

39
Software Aspect
  • Software-like design environment
  • SystemC (Synopsys), Handel C (Celoxica)
  • Hardware-software co-design (ARM Rapid
    Prototyping Platform (RPP))
  • Generation of a detailed gate-level description
    (netlist) from an HLL (high-level language)
  • Technology mapping, placement and routing
  • Generation of .bit files (language of the FPGA)

40
Software Aspect Contd
  • Programming language/HDL
  • An SoC consists of 50% to 90% software
  • Wide acceptability of C/C++
  • Simulation timing
  • Simulation takes a long time in current CAD tools
  • C/C++ debuggers are very efficient

41
RC1000 Celoxica platform
  • DK1 design suite (Handel C)
  • RC1000 plug-in card, PCI bus interfacing
  • Xilinx Virtex-1000 FPGA (1 million gates)
  • Design Flow

(Figure: design flow with the steps Handel C source
files, compile, generate EDIF (netlist), generate
VHDL/Verilog, simulate netlist, place & route
tools, bitstream generation.)
42
Hardware-Software Co-design
  • Amdahl's Law
  • $T = \dfrac{1}{(1 - a) + a / s}$
  • T: overall speedup
  • a: fraction of the original program that could
    be enhanced by transferring to h/w
  • s: speedup obtained for that particular fraction
    of the program
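
Amdahl's law, T = 1 / ((1 - a) + a / s), is easy to evaluate numerically when deciding how much of a program to move into hardware. The short sketch below uses the slide's symbols; the function name and the example values are assumptions for illustration.

```python
def amdahl_speedup(a, s):
    """Overall speedup T when a fraction a of the program is sped up by s."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must be a fraction between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - a) + a / s)


# Moving 80% of the run time to hardware that is 10x faster gives
# well under 10x overall, because the remaining 20% stays in software.
print(round(amdahl_speedup(0.8, 10.0), 2))  # 3.57
# Even an arbitrarily fast accelerator is capped at 1 / (1 - a) = 5x here.
print(round(amdahl_speedup(0.8, 1e9), 2))   # 5.0
```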

43
Summary
  • RCS to bridge the gap between Software and
    hardware (flexibility and performance)
  • FPGA ideal candidate for an RH
  • Spatial Execution
  • Reprogrammability
  • Design time
  • Design and synthesis flow for CAD tools
  • Hybrid Architecture
  • Recent advancement in CAD tools

44
Questions?