End-to-end data deduplication for the mobile Web

1
End-to-end data deduplication for the mobile Web
Ricardo Filipe and João Barreto
Distributed Systems Group, INESC-ID / Instituto Superior Técnico
rfilipe_at_gsd.inesc-id.pt
2
Motivation
6
Problem?
8
Solution: Eliminate Redundant Data
12
Basic Deduplication Techniques
  • Classical Caching
      • Detects only fully redundant resources
  • Gzip Compression
      • Detects redundant chunks only within the resource
  • Delta Encoding
      • Detects redundant data only between pairs of
        resources
      • Slow delta computation (offline)
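
The gap between compressing a resource in isolation and exploiting data shared with another resource can be illustrated with Python's standard zlib module. This is only a sketch: the payloads are made up, and the preset dictionary (zdict) is a stand-in for cross-resource redundancy detection, not the mechanism used by any of the systems named in the talk.

```python
import random
import zlib

rng = random.Random(0)
# Hypothetical payloads: one incompressible block shared by two resources.
shared = bytes(rng.randrange(256) for _ in range(4096))
resource_a = b"<header a>" + shared
resource_b = b"<header b>" + shared  # different resource, mostly the same bytes


def compressed_alone(data: bytes) -> int:
    """Size when the resource is compressed in isolation (Gzip-style)."""
    return len(zlib.compress(data, 9))


def compressed_against(data: bytes, reference: bytes) -> int:
    """Size when the compressor may refer back to an earlier resource."""
    comp = zlib.compressobj(level=9, zdict=reference)
    return len(comp.compress(data) + comp.flush())


alone = compressed_alone(resource_b)
against_a = compressed_against(resource_b, resource_a)
print("alone:", alone, "bytes; against resource_a:", against_a, "bytes")
```

Compressed alone, resource_b barely shrinks because the redundancy lies in another resource; with resource_a available as a reference, almost nothing needs to be sent.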

17
Advanced Deduplication Techniques
  • Value-Based Web Caching (Rhea et al., WWW '03)
      • An ISP proxy is not suitable for roaming clients
      • The ISP proxy is useless for encrypted data
        (HTTPS)
      • High resource usage on the client
  • DedupHTTP solves all these limitations!
      • And some more

18
DedupHTTP in one slide
23
Server Algorithm: Chunk division
  • Karp-Rabin rolling hash
  • Winnowing
  • MurmurHash (or MD5, SHA1, etc.)
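
The chunk division steps named above can be sketched as follows. This is a minimal illustration, not the DedupHTTP implementation: the parameters k and w, the base and modulus, and the helper names are assumptions, and SHA-1 from the standard library stands in for MurmurHash, which Python does not ship.

```python
import hashlib
import random

BASE = 257              # assumed polynomial base
MOD = (1 << 61) - 1     # assumed large prime modulus


def rolling_hashes(data: bytes, k: int) -> list[int]:
    """Karp-Rabin hash of every k-byte window, updated in O(1) per step."""
    if len(data) < k:
        return []
    top = pow(BASE, k - 1, MOD)      # weight of the byte leaving the window
    h = 0
    for b in data[:k]:
        h = (h * BASE + b) % MOD
    out = [h]
    for i in range(k, len(data)):
        h = ((h - data[i - k] * top) * BASE + data[i]) % MOD
        out.append(h)
    return out


def winnow(hashes: list[int], w: int) -> list[int]:
    """Winnowing: in each window of w hashes pick the rightmost minimum."""
    picked: list[int] = []
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        pos = start + (w - 1 - window[::-1].index(min(window)))
        if not picked or picked[-1] != pos:
            picked.append(pos)
    return picked


def divide(data: bytes, k: int = 8, w: int = 32) -> list[bytes]:
    """Cut the data at the winnowed positions (content-defined chunks)."""
    if not data:
        return []
    cuts = [p for p in winnow(rolling_hashes(data, k), w) if p > 0]
    bounds = [0] + cuts + [len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:])]


def chunk_ids(data: bytes) -> list[str]:
    """Identify each chunk by a digest (SHA-1 here instead of MurmurHash)."""
    return [hashlib.sha1(c).hexdigest() for c in divide(data)]


rng = random.Random(1)
original = bytes(rng.randrange(256) for _ in range(4000))
shifted = b"XYZ" + original          # three bytes inserted at the front
ids_a, ids_b = chunk_ids(original), chunk_ids(shifted)
print(len(ids_a), "chunks;", len(set(ids_a) & set(ids_b)), "shared after the edit")
```

Because boundaries depend only on local content, inserting bytes at the front disturbs the first chunk at most, and every later chunk keeps its identifier. Fixed-size blocks would shift every boundary instead.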

24
Server Algorithm: Metadata Storage
  • Chunk array
  • Chunk Hash Table
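
One way the two structures could fit together is sketched below. The slides do not describe field layouts, so every name here (ChunkMeta, ChunkStore, add_resource, find) is hypothetical: a per-resource chunk array keeps chunk metadata in order, and a hash table maps each chunk digest to its first known location.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ChunkMeta:
    digest: bytes   # chunk identifier (SHA-1 here; an assumption)
    offset: int     # byte offset of the chunk inside its resource
    length: int


@dataclass
class ChunkStore:
    # Chunk array: ordered chunk metadata per resource.
    arrays: dict[str, list[ChunkMeta]] = field(default_factory=dict)
    # Chunk hash table: digest -> (resource id, index into its chunk array).
    table: dict[bytes, tuple[str, int]] = field(default_factory=dict)

    def add_resource(self, resource_id: str, chunks: list[bytes]) -> None:
        metas: list[ChunkMeta] = []
        offset = 0
        for index, chunk in enumerate(chunks):
            digest = hashlib.sha1(chunk).digest()
            metas.append(ChunkMeta(digest, offset, len(chunk)))
            # Keep the first location seen for each distinct chunk.
            self.table.setdefault(digest, (resource_id, index))
            offset += len(chunk)
        self.arrays[resource_id] = metas

    def find(self, chunk: bytes) -> Optional[tuple[str, ChunkMeta]]:
        """Chunk search: is this chunk already part of a stored resource?"""
        hit = self.table.get(hashlib.sha1(chunk).digest())
        if hit is None:
            return None
        resource_id, index = hit
        return resource_id, self.arrays[resource_id][index]


store = ChunkStore()
store.add_resource("/index.html", [b"<html>", b"<body>hello", b"</html>"])
found = store.find(b"<body>hello")
missing = store.find(b"never seen")
```

The hash table gives constant-time lookup for a candidate chunk, while the array preserves order, so neighbouring chunks of a hit can be checked cheaply.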

25
Server Algorithm: Chunk Search
26
Server Algorithm: Encoding
28
Optimizations
  • Metadata Coalescing
      • Join contiguous chunk metadata blocks into one
        response metadata block
      • Especially relevant for (almost) fully redundant
        resources
  • Old resource versions only need to store their
    metadata on the server
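
Metadata coalescing can be sketched as merging adjacent chunk references. The response format below is an assumption made for illustration (the slides do not specify one): a response is a list whose items are either literal bytes or a hypothetical Ref(resource_id, offset, length) pointing into a resource the client already holds.

```python
from typing import NamedTuple, Union


class Ref(NamedTuple):
    resource_id: str   # resource the client already has
    offset: int        # where the redundant bytes start in that resource
    length: int


Item = Union[Ref, bytes]   # a reference, or literal (non-redundant) data


def coalesce(items: list[Item]) -> list[Item]:
    """Merge references that are contiguous in the same resource."""
    out: list[Item] = []
    for item in items:
        prev = out[-1] if out else None
        if (isinstance(item, Ref) and isinstance(prev, Ref)
                and prev.resource_id == item.resource_id
                and prev.offset + prev.length == item.offset):
            out[-1] = Ref(prev.resource_id, prev.offset,
                          prev.length + item.length)
        else:
            out.append(item)
    return out


# A fully redundant resource collapses to a single metadata block.
fully_redundant = coalesce([Ref("/a", 0, 10), Ref("/a", 10, 5), Ref("/a", 15, 5)])

# Literal data or a jump to another resource ends the current run.
mixed = coalesce([Ref("/a", 0, 10), Ref("/a", 10, 5), b"new",
                  Ref("/a", 15, 5), Ref("/b", 20, 5)])
```

For an almost fully redundant resource the response shrinks from one metadata block per chunk to a handful of blocks, which is why the slide singles out that case.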

31
DedupHTTP Advantages
  • Online Deduplication
  • Detects redundancy between different resources
    and versions of resources
  • High detail in redundancy detection

32
Evaluation
  • Implemented as proxies on a Web browser machine
    and a Web server machine, connected over the
    Internet (not a LAN)
  • Compared Systems
      • DedupHTTP
      • Gzip
      • DedupHTTP + Gzip (Hybrid)
      • Delta-Encoding
      • Classical Caching

Workload Name        Number of resources   Total Size
Cnn.com              337                   42 MB
Engadget.com         335                   36 MB
Huffingtonpost.com   401                   62 MB
33
Evaluation: Average Chunk Size
34
Evaluation: Comparison of redundancy detected
35
Evaluation: Time To Display Over Internet
38
Conclusions
  • HTTP transfers can be greatly improved
  • DedupHTTP: a solution that takes most of the good
    points of previous solutions and discards the bad
    ones
  • DedupHTTP was evaluated against reference
    solutions for Web deduplication on accesses to
    relevant Web sites
  • Traffic savings of up to 94.5% without
    deterioration of Time To Display

39
Future Work
  • Reorganize and improve response metadata
  • Create a resource storage heuristic that works
    for sites that can be accessed through several
    Web Servers

40
Questions?
technology from seed
http://www.gsd.inesc-id.pt/
rfilipe_at_gsd.inesc-id.pt