Compressing State: New Approaches and Results

1
Compressing State: New Approaches and Results
Balaji Prabhakar (balaji_at_stanford.edu)
  • NSF Workshop, Boston University
  • May 25, 2006

2
Background
  • State of a computer system (a rough definition):
  • It is the dynamically generated information the
    system stores at any time to help process all
    possible future inputs
  • State can be very large
  • In distributed systems, state may have locality:
    a part of the state may be generated at one
    location and later used at (perhaps) another
    location
  • Generating and maintaining state:
  • Can be very useful for speeding up processing
  • State is compact, hence easy to maintain, when
    the class of inputs is small
  • Can be extremely expensive to store if the input
    space is big
  • Can be difficult to maintain consistently in a
    large distributed system
  • The Internet and Ethernet are large distributed
    systems with nearly arbitrary inputs

3
State in networks
  • Is essentially non-existent inside the network,
    usually pushed out to the edge
  • This idea is rooted in the famous end-to-end
    design philosophy
  • Perform all flow-related processing at the edge,
    leave the network to process packets: dumb
    network, smart end-systems
  • In fact, Ethernet is so stateless that its
    switches actively age out forwarding information
  • e.g. your Ethernet switch will forget you if you
    did not send or receive a packet in the most
    recent 20 minutes: forward-and-forget
  • The end-to-end principle was gradually adopted by
    router manufacturers to simplify router design
  • Maintaining per-flow state can be expensive
  • That was then

4
Now
  • Various new applications (bandwidth partitioning,
    security, accounting, etc.) require the network to
    recognize flows
  • This talk is about some simple schemes which
    enable the network to recognize flows
  • The key is to give up on accurate flow state
    maintenance
  • Fuzzy state gives a tremendous boost in
    performance at a low cost in many cases
  • In the rest of the talk:
  • We overview some recent results on how the
    network can retain fuzzy flow state
  • And mention the use of these ideas in wireless
    sensor networks, which are particularly
    resource-limited

5
Flow recognition
  • Approximate flow state: two varieties
  • Nearly exact state for a small number of flows
  • Approximate state for all flows: fuzzy flow
    memories
  • Yi Lu, B.P., Flavio Bonomi, Allerton 05.

6
Bloom filters
  • Invented by Bloom in 1970 as a data structure for
    supporting set membership queries at very high
    speed.
  • Finds applications in databases, networking,
    and other areas.
  • In networking, quite useful for designing
    high-speed hardware algorithms
  • White lists, blacklists (access control)
  • Flow-level measurements
  • Deep packet inspection
  • Address lookup
  • Many other uses

7
Bloom filters
  • Work as follows.
  • Suppose we were given a set, S, of items that we
    would like to store compactly.
  • Notation:
  • Number of items in S = n
  • Size of bit-map = m
  • Number of bits in signature = k
  • (Signatures are chosen IID uniformly from
    $\{d \in \{0,1\}^m : d \text{ has exactly } k \text{ 1s in it}\}$)
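
The notation above can be made concrete with a minimal sketch. It assumes items are strings and that a signature is derived by seeding a pseudo-random generator from a SHA-256 hash of the item; the slides do not say how signatures are generated, so `signature`, `build_bitmap`, and `query` are illustrative names, not the author's implementation.

```python
import hashlib
import random


def signature(item, m, k):
    """Return k distinct bit positions in [0, m): an m-bit word with exactly k 1s.

    Assumption: the positions come from a PRNG seeded by a hash of the item,
    which stands in for "chosen uniformly at random" in the slides.
    """
    seed = int.from_bytes(hashlib.sha256(item.encode()).digest(), "big")
    return random.Random(seed).sample(range(m), k)


def build_bitmap(items, m, k):
    """OR the signatures of all items in S into one m-bit bit-map."""
    bitmap = [0] * m
    for item in items:
        for pos in signature(item, m, k):
            bitmap[pos] = 1
    return bitmap


def query(bitmap, item, k):
    """An item is reported present if all k of its signature bits are set."""
    return all(bitmap[pos] for pos in signature(item, len(bitmap), k))


S = ["flow-a", "flow-b", "flow-c"]   # n = 3 items
bitmap = build_bitmap(S, m=64, k=4)  # m = 64 bits, k = 4 bits per signature
print([query(bitmap, item, 4) for item in S])
```

Every item that was loaded is always reported present; an item that was never loaded may still be reported present if other signatures happen to cover all of its bits, which is the false positive case.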

8
Bloom filters: Querying
  • Now that we have generated the signature bit-map
    for the set S, we can use it to answer queries

  • False positive probability is minimized by choosing
    k ≈ (m/n) ln 2
  • Clever argument of Paul Cuff: use rate-distortion
    theory to obtain a lower bound on the amount of
    space needed. This bound is better than the
    above by a factor of just ln 2: BFs are nearly
    optimal.
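
For reference, the textbook analysis behind the optimal signature size and the ln 2 factor can be summarized as follows. This is the standard derivation, assuming each item sets k approximately independent bit positions; it is not taken from the slide itself, and the symbols p_fp and k* are introduced here for illustration.

```latex
% Assumption: each of the n items sets k (approximately) independent positions
% in an m-bit bit-map.
\begin{align*}
\Pr[\text{a given bit is still } 0 \text{ after } n \text{ insertions}]
  &= \left(1 - \tfrac{1}{m}\right)^{kn} \approx e^{-kn/m}, \\
p_{\mathrm{fp}} &\approx \left(1 - e^{-kn/m}\right)^{k}, \\
k^{*} &= \frac{m}{n}\,\ln 2
  \quad\Longrightarrow\quad
  p_{\mathrm{fp}} \approx 2^{-k^{*}}, \\
m &= \frac{n \log_2(1/p_{\mathrm{fp}})}{\ln 2}
  \approx 1.44\, n \log_2(1/p_{\mathrm{fp}}).
\end{align*}
```

Compared with a lower bound of roughly n log2(1/p_fp) bits, the Bloom filter's space is larger by a factor of 1/ln 2, about 1.44.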

9
Deletions and false negatives
  • Suppose the red item is to be deleted from the
    set
  • Not possible to simply unset its signature bits
    in the bit-map
  • The blue and orange flows will become false
    negatives
  • Counting Bloom filters solve this problem by
    using extra space
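
A toy illustration of why unsetting bits fails. The three signatures below are made up so that red shares one bit with blue and one with orange; the colors follow the slide, but the bit positions are hypothetical.

```python
# Hypothetical signatures (bit positions) in a 16-bit bit-map.
m = 16
signatures = {
    "red": [1, 4, 7],
    "blue": [4, 9, 12],     # shares bit 4 with red
    "orange": [7, 10, 13],  # shares bit 7 with red
}

bitmap = [0] * m
for positions in signatures.values():
    for pos in positions:
        bitmap[pos] = 1


def present(name):
    """Plain Bloom filter query: all signature bits must be set."""
    return all(bitmap[pos] for pos in signatures[name])


# Naive deletion: simply unset red's signature bits.
for pos in signatures["red"]:
    bitmap[pos] = 0

# Blue and orange were never deleted, yet both queries now fail.
false_negatives = [name for name in ("blue", "orange") if not present(name)]
print(false_negatives)
```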

10
Deletions and false negatives
  • In practice:
  • Counting BFs need 3- or 4-bit counters to keep
    false negatives rare enough; this could be too
    much space for high-speed hardware
    implementations
  • So, if the set S varies with time, we need a BF
    which tracks it inexpensively
  • We propose the idea of variable-length flow
    signatures that help cope with deletions
    naturally and have other uses
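
For comparison, a counting Bloom filter can be sketched as below. It assumes 4-bit saturating counters and the common convention of never decrementing a saturated counter; the class name and the toy signatures are illustrative, not from the talk.

```python
class CountingBloomFilter:
    """Each bit of the bit-map is replaced by a small saturating counter."""

    def __init__(self, m, counter_bits=4):
        self.counters = [0] * m
        self.max_count = (1 << counter_bits) - 1  # 15 for 4-bit counters

    def insert(self, positions):
        for pos in positions:
            if self.counters[pos] < self.max_count:
                self.counters[pos] += 1

    def delete(self, positions):
        for pos in positions:
            # A saturated counter no longer knows its true count, so it is
            # left alone rather than risk creating a false negative.
            if 0 < self.counters[pos] < self.max_count:
                self.counters[pos] -= 1

    def contains(self, positions):
        return all(self.counters[pos] > 0 for pos in positions)


# Hypothetical signatures: red overlaps blue at bit 4 and orange at bit 7.
red, blue, orange = [1, 4, 7], [4, 9, 12], [7, 10, 13]
cbf = CountingBloomFilter(m=16)
for sig in (red, blue, orange):
    cbf.insert(sig)
cbf.delete(red)
print(cbf.contains(blue), cbf.contains(orange), cbf.contains(red))
```

The cost is visible in the constructor: with 4-bit counters the structure uses four times the space of a plain bit-map of the same length.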

11
VBFs: Bloom filters with variable-length
signatures
  • Definition of VBFs
  • Insertion: set t ≤ k bits of a flow's signature
    in the bit-map
  • Query: a flow is valid if at least q of its bits
    are present in bit-map B
  • Deletion: unset k - d bits of a flow's signature
    (d < q)
  • Recover: option to set additional bits in a valid
    flow's signature (strengthening faint signatures)
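
The four operations can be sketched as follows. Several details are assumptions made for illustration, since the slide does not specify them: signatures are ordered lists of k distinct positions derived from a hash, insertion sets the first t positions, deletion unsets the first k - d positions, and recovery sets all k positions.

```python
import hashlib
import random


def signature(flow, m, k):
    """Ordered list of k distinct bit positions for a flow (hash-seeded; an assumption)."""
    seed = int.from_bytes(hashlib.sha256(flow.encode()).digest(), "big")
    return random.Random(seed).sample(range(m), k)


class VBF:
    def __init__(self, m, k, q, d):
        assert d < q <= k, "need d < q so a deleted flow fails the query"
        self.m, self.k, self.q, self.d = m, k, q, d
        self.bits = [0] * m

    def insert(self, flow, t):
        """Set t <= k bits of the flow's signature."""
        assert t <= self.k
        for pos in signature(flow, self.m, self.k)[:t]:
            self.bits[pos] = 1

    def bits_present(self, flow):
        return sum(self.bits[pos] for pos in signature(flow, self.m, self.k))

    def query(self, flow):
        """A flow is valid if at least q of its signature bits are set."""
        return self.bits_present(flow) >= self.q

    def delete(self, flow):
        """Unset k - d signature bits, leaving at most d < q of the flow's own bits."""
        for pos in signature(flow, self.m, self.k)[: self.k - self.d]:
            self.bits[pos] = 0

    def recover(self, flow):
        """Strengthen a faint but still valid signature by setting all k bits."""
        if self.query(flow):
            for pos in signature(flow, self.m, self.k):
                self.bits[pos] = 1


vbf = VBF(m=256, k=8, q=4, d=2)
vbf.insert("flow-x", t=5)
print(vbf.bits_present("flow-x"), vbf.query("flow-x"))
```

Because a query needs only q of the k bits, a flow can lose a few bits to other flows' deletions and still be recognized, which is what lets this structure cope with deletions without per-bit counters.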

12
Uses of VBFs
  • VBFs help perform deletions. Two other uses are:
  • Approximately determining the length of a flow
  • Signature length as a proxy (sufficient
    statistic) for flow length
  • Fuzzy flow memories

13
Bank of BFs
  • A BF can tell whether an item is in a set or not.
    It cannot identify the item.
  • Question: How can we use BFs to identify
    items?
  • Blood test puzzle: Exactly 1 person in a group of
    n people has a virus which can be detected via a
    blood test. What is the minimum number of blood
    tests needed to identify the infected person?
  • Maximum number: n
  • Minimum number: log2 n
  • Classification problem: Let the set S of flows
    be partitioned into subsets Sj, where Sj is the
    subset of flows to whose packets action Aj must
    be applied. Given a packet, identify the subset
    to which its flow belongs.
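
The log2 n answer to the blood test puzzle can be sketched as below, assuming samples may be pooled (the slide does not say so explicitly, but the bound implies it). Test b mixes the blood of everyone whose index has bit b set, so the outcomes spell out the infected person's index in binary. `find_infected` and `pool_is_positive` are hypothetical names.

```python
def find_infected(n, pool_is_positive):
    """Identify the single infected person among n using ceil(log2 n) pooled tests.

    pool_is_positive(pool) stands in for the lab: it returns True if the
    pooled sample of the people in `pool` contains the virus.
    """
    num_tests = (n - 1).bit_length()  # exact ceil(log2 n) for n >= 1
    infected = 0
    for bit in range(num_tests):
        pool = [person for person in range(n) if (person >> bit) & 1]
        if pool_is_positive(pool):
            infected |= 1 << bit
    return infected, num_tests


secret = 617  # index of the infected person, unknown to the tester
person, tests_used = find_infected(1000, lambda pool: secret in pool)
print(person, tests_used)
```

The same idea carries over to classification: each Bloom filter plays the role of one pooled test, and the pattern of filters that match a packet encodes the action.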

14
Bank of BFs
  • Solution 1: Consider a bank of BFs, one per
    action. Thus, total number of BFs = A. Load
    signatures of flows corresponding to Aj into the
    jth BF. Scan incoming packets across the bank to
    determine the action. (Chang et al. stated this
    explicitly; it seems well known in practice.)
  • Problems of Solution 1:
  • Large number of filters
  • Uneven distribution of flows across actions
  • (a clever load-balancing trick of Chang et al.
    results in a more even load)
  • Errors and false positives
  • Solution 2: Use log2 n filters, encode the action
    and load signatures appropriately. This reduces
    the number of filters dramatically
  • Problems of Solution 2: Uses much more total
    space in bits for the same false positive and
    error rates (because of redundant loading of
    signatures)
  • Best to use action-encodings

15
Action encodings
  • Consider the following example:
  • Number of flows = 100,000
  • Number of actions = 1000
  • Each flow belongs to one of the actions uniformly
    at random
  • 1-encoding: Take 1000 BFs.
  • Encode action Aj as the binary number
    (0,0,...,1,0,0)
  • Load the signatures of all flows corresponding to
    Aj into the filter whose index equals the index of
    the 1 in the encoding for Aj
  • 2-encoding: Take 45 BFs (because
    $\binom{45}{2} \approx 1000$)
  • Encode Aj as a 45-bit binary number with exactly
    two 1s in it.
  • Load the signatures of flows for Aj into the
    filters corresponding to the 1s in the encoding
  • 3-encoding: Take 20 BFs, etc.
16
Comparing encoding schemes

17
Applications to Wireless Sensor Nets
  • Given their inexpensive nature, approximate
    memories seem to be made for resource-limited
    wireless sensor networks
  • We are considering the following two uses
  • Routing in mobile networks
  • Locating information (distributed database)

18
Routing in mobile networks
  • Consider the n × n two-dimensional torus
  • Assume that wireless nodes move according to some
    process (e.g. random walk)
  • Assume that there is a switch or a router at each
    grid point
  • Problem: We would like to route messages to the
    mobile wireless nodes; the underlying network is
    ad hoc
  • We assume each stationary node keeps a small
    history of recently seen nodes, say the 10 most
    recently seen mobiles
  • In the straightforward solution to this problem:
  • A message follows its intended mobile in a
    directed random way
  • At each grid point, query neighbors to see if any
    of them has seen your target
  • If yes, proceed to that node (breaking ties at
    random)
  • Else, choose one of the neighbors at random
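
One hop of the straightforward scheme can be sketched as below. The sketch assumes four neighbors per grid point on the torus and a per-node history held in a bounded deque; `record_sighting`, `next_hop`, and the mobile identifiers are hypothetical names, not from the talk.

```python
import random
from collections import deque

HISTORY_SIZE = 10  # "the 10 most recently seen mobiles"


def neighbors(pos, n):
    """The four neighbors of a grid point on the n x n torus (wrap-around)."""
    x, y = pos
    return [((x + 1) % n, y), ((x - 1) % n, y), (x, (y + 1) % n), (x, (y - 1) % n)]


def record_sighting(history, pos, mobile):
    """A stationary node remembers a mobile; the oldest entry falls out when full."""
    history.setdefault(pos, deque(maxlen=HISTORY_SIZE)).append(mobile)


def next_hop(pos, target, history, n, rng):
    """Move toward a neighbor that has seen the target, else to a random neighbor."""
    nbrs = neighbors(pos, n)
    seen = [p for p in nbrs if target in history.get(p, ())]
    return rng.choice(seen if seen else nbrs)  # ties broken at random


n = 5
history = {}
record_sighting(history, (1, 0), "mobile-7")
rng = random.Random(0)
print(next_hop((0, 0), "mobile-7", history, n, rng))
```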

19
Bloom filter based algorithm
  • Store the signatures of the recently seen flows
    in a VBF
  • Two advantages
  • Can store many more entries, chance of collision
    is small
  • More importantly, entries fade away; their
    presence in the memory is not binary, similar to
    ants and pheromones
  • This is preliminary; we will report progress in
    the near future

20
Conclusions
  • State is worth maintaining, leads to much better
    performance
  • Devices like Bloom filters are interesting
    candidates for building fuzzy flow memories
  • They seem particularly suited for
    resource-constrained wireless sensor networks