CA-RAM: A High-Performance Memory Substrate for Search-Intensive Applications

CA-RAM: A High-Performance Memory Substrate for Search-Intensive Applications Sangyeun Cho, J. R. Martin, R. Xu, M. H. Hammoud and R. Melhem Dept. of Computer Science –

1
CA-RAM: A High-Performance Memory Substrate for
Search-Intensive Applications
  • Sangyeun Cho, J. R. Martin, R. Xu,
  • M. H. Hammoud and R. Melhem

Dept. of Computer Science University of Pittsburgh
2
Search ops in applications
  • Search (or lookup) operations represent an
    important common function
  • Network packet processing
  • For each arriving packet, determine the output
    port
  • Given packet information, find a matching
    classification rule
  • Each lookup can incur many memory accesses
  • Speech recognition
  • Searching (e.g., dictionary lookup) takes up 24%
    of CPU cycles
  • Forthcoming RMS (Recognition, Mining, and
    Synthesis) apps

3
Search performance and power
  • Search performance must match increasing line
    speeds
  • For OC-768, up to 104M packets must be processed
    per second
  • Network traffic has doubled every year
    [McKeown03]
  • Routing tables (200K prefixes in a core router)
    are growing [RIS]
  • IPv6
  • Power and thermal issues are already a critical
    limiting factor in network processing device
    design [McKeown03]
  • Search in battery-operated devices should be
    energy-efficient
  • Conventional search solutions
  • Software methods (tries, hash tables, ...)
  • Hardware methods (CAM, TCAM, ...)
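
The 104M packets/s figure for OC-768 can be reproduced with simple arithmetic. The sketch below assumes a nominal 40 Gb/s line rate and 48-byte minimum-size packets; neither value is stated on the slide, so both are assumptions.

```python
def packets_per_second(line_rate_bps: float, packet_bytes: int) -> float:
    """Worst-case packet rate when every packet has the minimum size."""
    return line_rate_bps / (packet_bytes * 8)


# Assumed values: nominal OC-768 rate and a 48-byte minimum packet.
rate = packets_per_second(40e9, 48)
budget_ns = 1e9 / rate  # time available for one lookup, in nanoseconds

print(f"{rate / 1e6:.0f}M packets/s, {budget_ns:.1f} ns per packet")
```

Under these assumptions each lookup has a budget of under 10 ns, which is why a lookup that needs many dependent memory accesses struggles to keep up.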

4
IP lookup using a trie
Consider an IP address 0 1 0 0 0 1 1 0
  • Software approach is flexible
  • High memory capacity requirement
  • High memory bandwidth requirement
  • Not scalable
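
A minimal sketch of a binary-trie lookup illustrates why the software approach needs many memory accesses: every bit of the address can cost one node visit. The prefixes and port names below are made up for illustration, and the class and function names are hypothetical.

```python
class TrieNode:
    __slots__ = ("children", "port")

    def __init__(self):
        self.children = [None, None]  # child for bit 0 and bit 1
        self.port = None              # set only if a prefix ends here


def insert(root, prefix, port):
    node = root
    for bit in prefix:
        i = int(bit)
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.port = port


def longest_prefix_match(root, address):
    """Return (port of the longest matching prefix, number of nodes visited)."""
    node, best, accesses = root, root.port, 1
    for bit in address:
        node = node.children[int(bit)]
        if node is None:
            break
        accesses += 1
        if node.port is not None:
            best = node.port  # remember the longest match seen so far
    return best, accesses


root = TrieNode()
for prefix in ["0", "0100", "01000", "10", "1101"]:  # illustrative prefixes
    insert(root, prefix, "port-" + prefix)

best, accesses = longest_prefix_match(root, "01000110")
print(best, accesses)
```

In this toy table the lookup visits six nodes for an 8-bit address; with 32-bit or 128-bit addresses the number of dependent accesses grows accordingly.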

5
IP lookup using TCAM
Consider an IP address 0 1 0 0 0 1 1 0
[Figure: TCAM entries ordered by prefix length, longest first: 110100, 110101, 110111, 01000, 01100, 01101, 11011, 0100, 0110, 1101, 10, 0]
  • High bandwidth, constant time lookup
  • TCAMs are relatively small, expensive
  • Power consumption very high
  • Not scalable

Choose the first among the matched
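
The TCAM behavior on this slide can be emulated in a few lines. This is only a software sketch: real hardware compares every entry in the same cycle and a priority encoder picks the first hit, whereas the loop below does so sequentially. The function name is hypothetical; the entries are the ones shown on the slide.

```python
# Entries as listed on the slide, longest prefixes first.
TCAM_ENTRIES = ["110100", "110101", "110111", "01000", "01100", "01101",
                "11011", "0100", "0110", "1101", "10", "0"]


def tcam_lookup(entries, address):
    """Return (index, prefix) of the first matching entry, or None."""
    # One match line per entry; hardware evaluates all of them in parallel.
    match_lines = [address.startswith(prefix) for prefix in entries]
    # Priority encoder: the lowest index wins, i.e. the longest prefix.
    for index, hit in enumerate(match_lines):
        if hit:
            return index, entries[index]
    return None


print(tcam_lookup(TCAM_ENTRIES, "01000110"))
```

For the address 01000110, three entries match (01000, 0100 and 0), and the first of them is the longest prefix because of the ordering.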
6
CA-RAM: a hybrid approach
  • Can we do better than the existing conventional
    schemes?
  • CAM-like search performance
  • RAM-like cost and power
  • CA-RAM combines hashing w/ hardware parallel
    matching
  • CA-RAM design goals
  • High lookup performance
  • Low power consumption
  • Smaller chip area per stored datum
  • Straightforward system-level integration

7
Talk roadmap
  • What is CA-RAM?
  • Prototype design
  • Case study 1: IP lookup
  • Case study 2: Trigram lookup for speech
    recognition

8
CA-RAM: Content Addressable RAM
[Figure: conventional CAM/TCAM vs. CA-RAM, showing memory cells and match logic]
  • Separate match logic and memory
  • Match logic for a single row, not every row
  • Allows the use of dense RAM technology
  • Enables highly reconfigurable match logic
  • Keep keys sorted in each row, not in entire array

9
Very simple, yet efficient
  • Use hashing to store keys in a particular row
  • To look up, hash the search key and retrieve one
    row
  • Perform matching on entire row in parallel
  • Achieve full content addressability w/o paying
    overhead!

[Figure: the search key enters an index generator, which selects one memory row (Key_i1, Key_i2, ... or Key_j1, Key_j2, ...); match processors 1 and 2 compare the row's keys]
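
The lookup steps on this slide can be sketched as a toy software model. The class name, the method names and the identity index function are assumptions made for illustration; the hardware compares all keys of a row at once, which the list comprehension only imitates.

```python
class CARAM:
    """Toy model of a CA-RAM: hashed rows plus whole-row matching."""

    def __init__(self, num_rows, entries_per_row, index_fn):
        self.rows = [[] for _ in range(num_rows)]
        self.entries_per_row = entries_per_row
        self.index_fn = index_fn  # stands in for the index generator

    def _row(self, key):
        return self.rows[self.index_fn(key) % len(self.rows)]

    def insert(self, key, data):
        row = self._row(key)
        if len(row) >= self.entries_per_row:
            return False  # bucket overflow; must be handled elsewhere
        row.append((key, data))
        return True

    def lookup(self, key):
        row = self._row(key)  # a single memory access fetches the whole row
        # Hardware compares every entry at once; this list is the match vector.
        match_vector = [stored == key for stored, _ in row]
        for hit, (_, data) in zip(match_vector, row):
            if hit:
                return data
        return None


table = CARAM(num_rows=8, entries_per_row=2, index_fn=lambda k: k)
table.insert(5, "port A")
table.insert(13, "port B")  # 13 % 8 == 5, so it shares row 5 with key 5
print(table.lookup(13))
```

A third key hashing to row 5 would be rejected by `insert`, which is the bucket overflow case.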
10
Pipelined CA-RAM operation
[Figure: pipelined operation on rows of keys (Key_i1, Key_i2, Key_i3; Key_j1, Key_j2, Key_j3); stages: index generation, memory access, key matching, result forwarding]
11
Dealing w/ bucket overflows
  • Careful design of hash function
  • Increase bucket size
  • Reduce load factor (α) = # of occupied entries
    / # of total entries
  • Use chaining: store overflows in subsequent
    rows
  • Multiple accesses per lookup
  • Use a small overflow CAM, accessed in parallel
  • Similar to popular victim caching
  • Use two-level hashing and employ multiple CA-RAM
    banks
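
A small sketch shows how the load factor and the number of overflowing entries relate to the table shape and the hash function. The function names, the identity index function and the key sets are hypothetical, chosen only to make the effect visible.

```python
from collections import Counter


def load_factor(num_entries, num_rows, entries_per_row):
    """Occupied entries divided by total entries."""
    return num_entries / (num_rows * entries_per_row)


def count_spills(keys, num_rows, entries_per_row, index_fn):
    """Entries that do not fit in their home row and must go elsewhere."""
    occupancy = Counter(index_fn(k) % num_rows for k in keys)
    return sum(max(0, n - entries_per_row) for n in occupancy.values())


def identity(key):
    return key


spread_keys = list(range(100))              # hash evenly over 16 rows
clustered_keys = [16 * i for i in range(10)]  # all land in row 0

print(load_factor(100, 16, 8), count_spills(spread_keys, 16, 8, identity))
print(load_factor(10, 16, 8), count_spills(clustered_keys, 16, 8, identity))
```

The clustered key set spills two entries even at a load factor below 0.1, while the evenly spread set spills none at about 0.78. Spilled entries are the ones that chaining, an overflow CAM, or a second hash level would have to absorb.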

12
CA-RAM reconfig. opportunities
  • Reconfigurable match logic allows
  • Adapting key size to apps
  • Same hardware to support multiple apps or
    standards

13
Adapting key size
  • Adapting key size is straightforward
  • Will benefit supporting multiple apps/standards

[Figure: rows of keys (Key_i1, Key_i2, Key_i3; Key_j1, Key_j2, Key_j3) and the resulting match information]
14
CA-RAM reconfig. opportunities
  • Reconfigurable match logic allows
  • Adapting key size to apps
  • Same hardware to support multiple apps or
    standards
  • Binary and ternary matching
  • Some apps require ternary matching, some don't

15
Supporting binary/ternary matching
  • Developed configurable comparator
  • Ternary matching requires 2 bits per symbol
  • Supporting different types of matching
    in different bit positions is feasible

[Figure: rows holding keys and masks (Key_i1, Key_i2, Mask_i1; Key_j1, Key_j2, Mask_j1); the search key is compared to produce match information]
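
The "2 bits per symbol" cost of ternary matching can be illustrated with a value/care-mask encoding. This is one common encoding, assumed here for the sketch; the slides do not say which encoding the prototype comparator uses, and the function names are hypothetical.

```python
def encode_ternary(pattern, width):
    """Encode a pattern of '0', '1', 'X' as (value, care_mask).

    Each symbol takes two stored bits: one value bit and one care bit.
    Patterns shorter than `width` are padded with don't-care symbols.
    """
    value = care = 0
    for symbol in pattern.ljust(width, "X"):
        value <<= 1
        care <<= 1
        if symbol in "01":
            value |= int(symbol)
            care |= 1
        elif symbol != "X":
            raise ValueError(f"bad symbol: {symbol!r}")
    return value, care


def ternary_match(search_key, value, care):
    """True if the key agrees with the value on every position that matters."""
    return ((search_key ^ value) & care) == 0


value, care = encode_ternary("01000", 8)  # the prefix 01000XXX
print(ternary_match(0b01000110, value, care))  # True
print(ternary_match(0b01100110, value, care))  # False
```

Binary matching is the special case where the care mask is all ones, so a comparator that can ignore the mask handles both modes.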
16
CA-RAM reconfig. opportunities
  • Reconfigurable match logic allows
  • Adapting key size to apps
  • Same hardware to support multiple apps or
    standards
  • Binary and ternary matching
  • Some apps require ternary matching, some don't
  • Storing data and keys in a CA-RAM module
  • Cuts # of memory accesses for a lookup by half

17
Simult. key matching & data access
  • Data access follows TCAM lookup
  • CA-RAM supports data embedding
  • Cuts memory traffic & latency by half

[Figure: rows holding keys with embedded data (Key_i1, Key_i2, Data_i1; Key_j1, Key_j2, Data_j1); the search key produces match information, then the match result and data]
18
CA-RAM reconfig. opportunities
  • Reconfigurable match logic allows
  • Adapting key size to apps
  • Same hardware to support multiple apps or
    standards
  • Binary and ternary matching
  • Some apps require ternary matching, some don't
  • Storing data and keys in a CA-RAM module
  • Cuts # of memory accesses for IP lookup by half
  • Providing range checking capabilities
  • Beneficial for rule-based packet filtering

19
Supporting range checking
  • (Range checking causes troubles)
  • (Entries must be expanded)
  • CA-RAM can support range checking efficiently

[Figure: rows holding keys and ranges (Key_i1, Range_i1; Key_j1, Range_j1); the search key is compared to produce match information]
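
To see why entries must be expanded when only ternary matching is available, the sketch below splits a range into ternary prefixes and compares that with a direct range comparison. The expansion routine is a standard textbook technique, not taken from the slides, and all names are hypothetical.

```python
def range_to_prefixes(lo, hi, width):
    """Split [lo, hi] into ternary prefixes, one table entry per prefix."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block aligned at lo...
        size = lo & -lo if lo else 1 << width
        # ...shrunk until it fits in what is left of the range.
        while size > hi - lo + 1:
            size >>= 1
        wild = size.bit_length() - 1  # number of trailing don't-care bits
        head = format(lo >> wild, f"0{width - wild}b") if wild < width else ""
        prefixes.append(head + "X" * wild)
        lo += size
    return prefixes


def prefix_match(key, prefix):
    bits = format(key, f"0{len(prefix)}b")
    return all(p in ("X", b) for p, b in zip(prefix, bits))


def range_match(key, lo, hi):
    """What a range-capable match processor checks with a single entry."""
    return lo <= key <= hi


entries = range_to_prefixes(1, 14, 4)
print(entries)
```

The 4-bit range [1, 14] needs six ternary entries, whereas a comparator with range checking needs one entry holding the two bounds.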
20
CA-RAM-based memory subsystem
21
Prototype implementation
  • We implemented a prototype CA-RAM slice design
    (w/ a degree of reconfigurability) and evaluated
    its power and area advantages over
    state-of-the-art TCAMs
  • We used a standard cell (0.16 µm) based ASIC
    design flow

Step                     # cells   Area, µm²   Delay, ns
Expand search key          3,804      66,228      (0.89)
Calculate match vector     5,252      10,591        0.95
Decode match vector          899       1,970        1.91
Extract result             6,037      21,775        1.99
Total                     15,992     100,564        4.85
22
Area and power: CA-RAM vs. TCAM
[Figure: cell area (µm²) at 130 nm CMOS]
  • CA-RAM area advantage: 4.5x to 11x
  • CA-RAM power advantage: 4x to 14x

[Figure: power (W), 4.5 Mb at 143 MHz]
23
Performance: CA-RAM vs. (T)CAM
24
Case study 1: IP lookup
25
Problem description
  • Given
  • A set of prefixes (each prefix is associated with
    output port number)
  • IP address
  • Find a prefix that matches with input IP address
    and return output port number associated with it
  • In the presence of multiple matching prefixes,
    choose the longest
  • Procedure
  • Find a good hash function to distribute prefixes
  • Determine CA-RAM organization

26
Data set and hashing method
  • IP core router's table having 186,760 entries
  • Bit selection scheme [Zane et al. 03]
  • 98% of prefixes are at least 16 bits long
  • Select hash bits from the first 16 bits
    (low-order bits)
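
Bit selection can be sketched as taking the low-order bits of the first 16 address bits as the row index. The function name, the parameters and the 11-bit index (enough for 2,048 rows) are assumptions for illustration; prefixes shorter than 16 bits need separate handling, which this sketch does not cover.

```python
import ipaddress


def bit_select_index(address, index_bits, top_bits=16, width=32):
    """Row index = low-order `index_bits` of the first `top_bits` address bits."""
    top = address >> (width - top_bits)
    return top & ((1 << index_bits) - 1)


addr = int(ipaddress.IPv4Address("192.168.1.1"))
index = bit_select_index(addr, index_bits=11)
print(index)
```

Every address that shares the same first 16 bits maps to the same row, so all prefixes of at least 16 bits that could match an address are found in one row fetch.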

27
Shaping CA-RAM
  • Consider multiple design points

[Figure: CA-RAM shapes, 2,048 rows × (32 entries) and 4,096 rows × (64 entries); Design A (α = 0.47), Design B (α = 0.40), Design C (α = 0.36), Design D (α = 0.36), Design E (α = 0.24), Design F (α = 0.36)]
28
Performance
  • With a properly chosen α,
    CA-RAM achieves near-constant AMAL

[Figure: spilled entries and average memory access latency (AMAL) for designs A to F (α = 0.47, 0.40, 0.36, 0.36, 0.24, 0.36), under uniform and skewed traffic]
29
Area and power
[Figure: relative area and power, Design B vs. TCAM]
  • CA-RAM advantageous over TCAM
30
Case study 2: Trigram lookup in speech recognition
31
Problem, data set, and hashing
  • Problem
  • Look up a trigram in the trigram database
  • Data set
  • A subset of the Sphinx trigram database
  • We picked entries having 13 to 16 characters
  • Still 5,385,231 entries or 86MB
  • Hashing
  • DJB, an efficient string hash function
  • (Used in Sphinx)
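
For reference, a sketch of the widely known djb2 string hash follows. Whether the "DJB" function on this slide is exactly this variant is an assumption; the 32-bit truncation and the row count in the usage line are also illustrative choices.

```python
def djb2(text):
    """djb2 string hash: h = h * 33 + byte, starting from 5381."""
    h = 5381
    for byte in text.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF  # keep the result in 32 bits
    return h


def row_index(trigram, num_rows):
    """Map a trigram string to a CA-RAM row (hypothetical helper)."""
    return djb2(trigram) % num_rows


print(row_index("a quick brown", 4096))
```

The function costs one multiply-add per character, which keeps index generation cheap for 13 to 16 character keys.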

32
Result
33
Data distribution
34
Area comparison
[Figure: relative area, CAM vs. CA-RAM]
35
CA-RAM conclusions
  • Compared w/ software methods
  • Fewer memory accesses → higher lookup
    performance
  • Compared w/ CAM or TCAM
  • Higher density, matching that of DRAM → large
    lookup table
  • Competitive performance
  • Low power: a critical advantage for
    cost-effective system design
  • Reconfigurable
  • Can accommodate apps having different key/record
    sizes, binary vs. ternary searching requirements,
    range checking, ...
  • Can adopt new standards much more easily, e.g.,
    IPv6
  • Two case studies show the efficacy of the CA-RAM
    approach
  • 35 improvement in area and power, compared with
    CAM/TCAM

36
CA-RAM: A High-Performance Memory Substrate for
Search-Intensive Applications
  • Questions?