Embedded Software Architecture for Low Power

2
NATURE: Non-Volatile Nanotube RAM based
Field-Programmable Gate Arrays
  • Wei Zhang, Niraj K. Jha and Li Shang
  • Dept. of Electrical Engineering, Princeton University
  • Dept. of Electrical and Computer Engineering, Queen's University

3
A Hybrid CMOS/NAnoTUbe REconfigurable Architecture
  • Motivation
  • Background on CNT and NRAM
  • Architecture of NATURE
  • Logic Folding
  • Experimental Results
  • Conclusions

4
Motivation
  • Moore's Law: what's next?
    • Carbon nanotubes (CNTs)
    • Nanowires
    • Single electron devices
    • ...
  • Challenges in nano-circuits/architectures
    • Lack of a mature fabrication process
    • Defects and run-time failures
  • Reconfigurable architectures, such as FPGAs, are
    favored
    • Regular structures ease fabrication
    • Fault tolerance through reconfiguration

5
Motivation (Contd.)
  • Problems of existing reconfigurable architectures
    • High reconfiguration time overhead
    • Low area efficiency
  • Some recent works on programmable nanofabrics
    • Molecular logic array (Goldstein et al.,
      ICCAD 2002)
    • Nanowire PLA (DeHon et al., FPGA 2004)
    • CMOS/nanowire hybrid architecture CMOL
      (Strukov et al., Nanotechnology 2005)
  • Fabrication problem not yet solved

6
Advantages of NATURE
  • Hybrid design leverages beneficial aspects of
    both CMOS and CNT technologies
  • NRAMs are distributed in NATURE to store
    multi-context reconfiguration bits
  • Fine-grain reconfiguration (even cycle-by-cycle)
  • Enables temporal logic folding
  • Flexibility to perform area-performance
    trade-offs
  • One-to-two orders of magnitude increase in logic
    density

[Figure: NATURE diagram with labels: CMOS fabrication compatible, NRAM-based, run-time reconfiguration, temporal logic folding, logic density, design flexibility]
7
Background
  • Carbon nanotube (CNT)
    • Metallic or semiconducting
    • Single-wall or multi-wall
    • Diameter: 1-100 nm
    • Length: up to millimeters
    • Ballistic transport
    • Excellent thermal conductivity
    • Very high current density
    • High chemical stability
    • Robust to environment

Source: Euronanotrade
8
Background (Contd.)
Source: Nantero
  • Non-volatile nanotube random-access memory (NRAM)
    • Whether a nanotube is mechanically bent or not
      determines the bistable on/off state
    • Fully CMOS-compatible manufacturing process
    • Prototype chip: 10 Gbit NRAM
    • Will be ready for the market in the near future

9
NRAMs
  • Properties of NRAMs
    • Non-volatile
    • Similar speed to SRAM
    • Similar density to DRAM
    • Chemically and mechanically stable
  • NATURE is not tied to NRAMs; other options include
    • Phase change RAM
    • Magnetoresistive RAM
    • Ferroelectric RAM

10
Architecture of NATURE
  • Island-style logic blocks (LBs) connected by
    various levels of interconnects
  • An LB contains a super macroblock (SMB) and a
    local switch matrix

11
Architecture of a Super Macroblock (SMB)
  • n1 macroblocks (MBs) comprise an SMB; here n1 = 4

12
Architecture of a Macroblock (MB)
  • n2 logic elements (LEs) comprise an MB; here
    n2 = 4

13
Logic Element and Interconnect
  • An LE implements a computation and contains
    • An m-input look-up table (LUT)
    • A flip-flop
    • A pass transistor
  • Interconnect
    • Mixed wire segment scheme
    • 25%, 50% and 25% distribution for length-1,
      length-4 and long wires
    • Direct links from one LB to its 4 neighbors
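
As a rough illustration of the logic element described on this slide, the sketch below models an m-input LUT whose truth table can be swapped between several stored configurations, together with a flip-flop that holds the last output. The class name `LogicElement`, its methods and the list-of-truth-tables representation are assumptions made for illustration; the slides do not specify an implementation, and the pass transistor is left out.

```python
from typing import List, Sequence


class LogicElement:
    """Toy model of an LE: one m-input LUT plus a flip-flop (hypothetical names)."""

    def __init__(self, m: int, configs: Sequence[Sequence[int]]):
        # Each configuration is a truth table with 2**m one-bit entries.
        for table in configs:
            if len(table) != 2 ** m:
                raise ValueError("truth table must have 2**m entries")
        self.m = m
        self.configs: List[List[int]] = [list(t) for t in configs]
        self.active = 0      # index of the configuration currently in use
        self.flip_flop = 0   # output latched by the last evaluation

    def reconfigure(self, index: int) -> None:
        """Select another stored configuration (stands in for an NRAM read)."""
        if not 0 <= index < len(self.configs):
            raise IndexError("no such configuration")
        self.active = index

    def evaluate(self, inputs: Sequence[int]) -> int:
        """Look up the output for the given input bits and latch it."""
        if len(inputs) != self.m:
            raise ValueError("expected m input bits")
        address = 0
        for bit in inputs:  # the first input is the most significant address bit
            address = (address << 1) | (bit & 1)
        self.flip_flop = self.configs[self.active][address]
        return self.flip_flop


# Two configurations for a 2-input LUT: AND, then XOR.
AND_TABLE = [0, 0, 0, 1]
XOR_TABLE = [0, 1, 1, 0]
le = LogicElement(m=2, configs=[AND_TABLE, XOR_TABLE])
and_out = le.evaluate([1, 1])   # same LE computes AND ...
le.reconfigure(1)
xor_out = le.evaluate([1, 1])   # ... and, after reconfiguration, XOR
```

Keeping all configurations inside the element mirrors the idea of storing several reconfiguration sets next to the logic, so switching function is an index change rather than a reload.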

14
Support for Reconfiguration
  • Reconfiguration time is short: 160 ps
  • Area overhead of NRAMs
    • k: number of reconfiguration sets per NRAM; assume
      k = 16
    • Area overhead: 20.5% per LB, assuming 100 nm
      technology for CMOS logic and nanotube length
    • Logic density = k (conf. copies) × area per
      configuration = 16 × (1 − 0.205) ≈ 12.75
    • Appropriate value for k obtained through design
      space exploration
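
The logic-density estimate on this slide can be reproduced with a few lines of Python. This is a minimal sketch: the function name `logic_density` is hypothetical, and it treats the area overhead as a fixed fraction, although the 20.5% figure is quoted only for k = 16, so results for other values of k are illustrative at best. Evaluated exactly, 16 × (1 − 0.205) gives 12.72, which the slide quotes as roughly 12.75.

```python
def logic_density(k: int, area_overhead: float) -> float:
    """Relative logic density: k configuration copies, each scaled by (1 - overhead).

    k             -- number of reconfiguration sets stored per NRAM
    area_overhead -- fraction of LB area spent on NRAMs (0.205 on the slide)
    """
    if k < 1 or not 0.0 <= area_overhead < 1.0:
        raise ValueError("k must be >= 1 and area_overhead must be in [0, 1)")
    return k * (1.0 - area_overhead)


# Values taken from the slide: k = 16 and 20.5% area overhead per LB.
density = logic_density(k=16, area_overhead=0.205)
print(f"logic density improvement: {density:.2f}x")
```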

15
Temporal Logic Folding
  • Basic idea: one can use NRAM-enabled run-time
    reconfiguration to realize different Boolean
    functions in the same logic element (LE) every
    few cycles

16
Example
Without logic folding
  • Number of LEs: 6
  • Delay = 4 LE delays + interconnect delay
With logic folding
  • Number of LEs: 2
  • Delay = 4 × clock period
  • Clock period = LE delay + reconfiguration delay +
    interconnect delay
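
The delay expressions in this example can be compared numerically. In the sketch below only the 160 ps reconfiguration time and the LE counts (6 and 2) come from the slides; the LE delay and both interconnect delays are hypothetical placeholders, and the function names are invented for illustration. All times are in picoseconds.

```python
def delay_without_folding(n_levels: int, le_delay: int, interconnect: int) -> int:
    """Delay = n_levels LE delays + interconnect delay (one combinational pass)."""
    return n_levels * le_delay + interconnect


def clock_period(le_delay: int, reconfig: int, interconnect: int) -> int:
    """Clock period = LE delay + reconfiguration delay + interconnect delay."""
    return le_delay + reconfig + interconnect


def delay_with_folding(n_cycles: int, period: int) -> int:
    """Delay = number of clock cycles x clock period."""
    return n_cycles * period


LE_DELAY = 500             # hypothetical LE delay
RECONFIG = 160             # reconfiguration time quoted in the slides
GLOBAL_INTERCONNECT = 1200 # hypothetical total interconnect delay, no folding
LOCAL_INTERCONNECT = 200   # hypothetical per-cycle interconnect delay, folding

unfolded = delay_without_folding(4, LE_DELAY, GLOBAL_INTERCONNECT)
period = clock_period(LE_DELAY, RECONFIG, LOCAL_INTERCONNECT)
folded = delay_with_folding(4, period)

# Area-time product, using the LE count as a stand-in for area.
at_unfolded = 6 * unfolded
at_folded = 2 * folded
print(unfolded, folded, at_unfolded, at_folded)
```

With these placeholder numbers the folded circuit is slightly slower but uses a third of the LEs, which is the kind of area-performance trade-off the example is meant to show; different delay values would shift the balance.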
17
Folding Levels
  • Logic folding can be performed at different
    levels of granularity, providing flexibility to
    perform area-performance trade-offs
  • A level-p folding implies reconfiguration of the
    LE after the execution of p LUT computations

[Figure: (a) level-1 folding; (b) level-2 folding]
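
One simplified way to picture level-p folding is to group the LUT levels of a circuit into stages of p levels each, run one stage per clock cycle, and reconfigure the LEs between stages. The sketch below follows that model. The function `fold`, the per-level LUT counts and the assumption that every LUT in a stage needs its own LE are illustrative choices, not the mapping procedure used by the authors; storage of intermediate results and routing are ignored.

```python
from typing import List, Tuple


def fold(luts_per_level: List[int], p: int) -> Tuple[int, int]:
    """Return (LEs needed, clock cycles) for level-p folding.

    luts_per_level -- number of LUTs at each logic depth, inputs first
    p              -- LUT computations executed before each reconfiguration
    """
    if p < 1:
        raise ValueError("folding level must be >= 1")
    # Split the circuit into consecutive stages of p levels each.
    stages = [luts_per_level[i:i + p] for i in range(0, len(luts_per_level), p)]
    # LEs are reused across stages, so the largest stage sets the LE count.
    les_needed = max((sum(stage) for stage in stages), default=0)
    return les_needed, len(stages)


# Hypothetical depth-4 circuit with 6 LUTs in total.
circuit = [2, 2, 1, 1]
level1 = fold(circuit, 1)             # reconfigure after every LUT level
level2 = fold(circuit, 2)             # reconfigure after every two levels
no_folding = fold(circuit, len(circuit))  # whole circuit in a single stage
print(level1, level2, no_folding)
```

The per-level counts were chosen so that level-1 folding needs 2 LEs over 4 cycles while the unfolded circuit needs 6 LEs, matching the totals in the example slide; the actual circuit behind that example is not given in the transcript.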
18
Choosing the Folding Level
As the folding level increases:
  • Clock period increases
  • Routing delay increases
  • Number of clock cycles decreases
  • Reconfiguration time decreases
  • Total delay typically decreases
  • Number of LEs increases
  • Area increases
  • Advantages of logic folding
    • Significant flexibility for performing
      area-performance trade-offs
    • Ability to map much larger circuits using the
      same number of LEs
    • Significant improvement in the area/circuit delay
      product
    • Reduction in the need for global routing

19
Experimental Setup
  • Instance of architecture: 4 MBs in an SMB, 4 LEs
    in an MB, and each LE contains a 4-input LUT
  • Number of reconfiguration copies k varied in
    order to compare implementations corresponding to
    selected folding levels: level-1, level-2,
    level-4 and no logic folding
  • Results based on 100 nm CMOS technology parameters

20
Experimental Results
Average area-time product advantage: 2X
Maximum area-time product advantage: 3X
21
Experimental Results (Contd.)
16-RCA: 16-bit ripple carry adder
16-CLA: 16-bit carry lookahead adder
16-CSA: 16-bit carry select adder
8-MUL: 8-bit multiplier
Average area-time product advantage: 13X
Maximum area-time product advantage: 35X
22
Experimental Results (Contd.)
  • Flexibility in performing area-performance
    trade-off
  • For the area-time (AT) product, the larger the
    circuit depth, the greater the advantage of
    level-1 folding relative to no folding
  • For the 64-bit ripple-carry adder, this advantage
    is about 35X
  • LE utilization and logic density very high, with
    a reduced need for a deep interconnect hierarchy

23
Conclusions
  • NATURE: a novel high-performance run-time
    reconfigurable architecture
  • Introduction of NRAMs into the architecture
    enables cycle-by-cycle reconfiguration and logic
    folding
  • Choice of different folding levels allows the
    flexibility of performing area-performance
    trade-offs
  • Logic density and area-time product improved
    significantly
  • Can be very useful for cost-conscious embedded
    systems and future FPGA improvement