Title: Semantic Data Caching and Replacement
1Semantic Data Caching and Replacement
- Shaul Dar, Michael J. Franklin, Bjorn T. Jonsson,
- Divesh Srivastava, Michael Tan
Proceedings of the 22nd VLDB Conference, Mumbai
(Bombay), India, 1996
Presented by Kunhao Zhou
2Outline
- Motivation
- Client Caching Architecture
- Model of Semantic Caching
- Simulations and Results
- Conclusion and Future Work
3Motivation
- Distributed database
- Clients are high-end workstations (fat clients).
- High computational power.
- Big local storage
4Motivation (Contd.)
- Effective use of the client is key to achieving
high performance.
- Less network traffic.
- Faster response time.
- Higher server throughput.
- Better scalability.
5Client Caching Architecture
- Data-Shipping.
- Clients process queries.
- Data are brought on demand from servers.
- Navigational access.
- Object ID (Tuple ID or Page ID).
- Can be categorized as tuple-based or page-based
- Cache Replacement Policies
- LRU.
- MRU.
6Client Caching Architecture (Contd.)
- Data-Shipping.
- Problem.
- Applications require associative access to data,
e.g., as provided by relational query languages.
7Client Caching Architecture (Contd.)
- Query-Shipping.
- Associative access to data.
- Problems.
- Implementations do not support client caching
(no caching).
8Client Caching Architecture (Contd.)
- Semantic Caching.
- A model that integrates support for associative
access into an architecture based on
data-shipping.
- Advantage.
- Exploits semantic information to manage the
client cache effectively.
9Client Caching Architecture (Contd.)
- Semantic Caching.
- Uses a semantic description of the cached data rather than
record IDs or page IDs.
- The description can be used to generate a remainder query to send
to the server if the requested tuples are not
available locally.
- Information for replacement is maintained per
semantic region.
- Low overhead, insensitive to bad clustering.
- Cache replacement uses a value function based on the
semantic description, not just LRU or MRU.
10Client Caching Architecture (Contd.)
11Model of Semantic Caching
- Remainder Query
- Semantic Regions
- Replacement Issues
12Remainder Query
- Relation Re, query Q, client cache V.
- Probe query P(Q, V) = Q ∧ V can be answered
locally.
- Remainder query R(Q, V) = Q ∧ (¬V) should be sent
to the server.
- Example
- Select * from E where
salary < 60,000 and salary > 30,000.
- The client caches all the tuples
with salary < 50,000.
- Q = (salary < 60,000) ∧ (salary > 30,000).
- V = (salary < 50,000).
- P = (salary < 50,000) ∧ (salary > 30,000).
- R = (salary ≥ 50,000) ∧ (salary < 60,000).
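(Figure: relation Re containing the cached region V and the query Q; the overlap of Q and V is the probe query P, and the part of Q outside V is the remainder query R.)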
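The salary example can be traced with a minimal Python sketch. It assumes predicates are plain functions over a single salary value, and the list `server` is a hypothetical set of salaries in relation E, made up for illustration. The helper names `probe` and `remainder` are not from the slides.

```python
from typing import Callable, List

Predicate = Callable[[int], bool]  # a constraint on the salary attribute


def probe(q: Predicate, v: Predicate) -> Predicate:
    """P(Q, V) = Q AND V: the part of the query answerable from the cache."""
    return lambda salary: q(salary) and v(salary)


def remainder(q: Predicate, v: Predicate) -> Predicate:
    """R(Q, V) = Q AND NOT V: the part that must be fetched from the server."""
    return lambda salary: q(salary) and not v(salary)


# Predicates from the example on this slide.
q: Predicate = lambda s: 30_000 < s < 60_000
v: Predicate = lambda s: s < 50_000

# Hypothetical salary values stored at the server (assumption for illustration).
server: List[int] = [20_000, 35_000, 49_999, 50_000, 55_000, 60_000, 70_000]

# The client cache holds every tuple that satisfies V.
cache: List[int] = [s for s in server if v(s)]

local = [s for s in cache if probe(q, v)(s)]          # answered locally
fetched = [s for s in server if remainder(q, v)(s)]   # shipped by the server
answer = sorted(local + fetched)
```

Because the cache holds salary < 50,000, the remainder starts at salary ≥ 50,000, so the tuple with salary exactly 50,000 comes from the server.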
13Semantic Regions
- The unit of cache management and replacement.
- Grouped by semantic value. All tuples in a semantic region
share the same replacement value.
- Described by a constraint formula.
- Consideration
- Semantic region merging (never merge).
(Figure: (a) original regions; (b) regions after Q, with no merging.)
14Semantic Regions
- The unit of cache management and replacement.
- Grouped by semantic value. All tuples in a semantic region
share the same replacement value.
- Described by a constraint formula.
- Consideration
- Semantic region merging (always merge).
(Figure: (a) original regions; (b) regions after Q, with merging.)
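The two merge policies can be compared in a small sketch. It makes several simplifying assumptions that are not in the slides: regions are disjoint half-open intervals [lo, hi) on a single attribute, a query is one such interval, and the function name `apply_query` is hypothetical.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # half-open interval [lo, hi) on one attribute


def apply_query(regions: List[Region], q: Region, always_merge: bool) -> List[Region]:
    """Return the cached regions after query q, assuming disjoint input regions."""
    q_lo, q_hi = q
    outside: List[Region] = []  # untouched regions and region parts outside q
    inside: List[Region] = []   # region parts that overlap q
    for lo, hi in regions:
        o_lo, o_hi = max(lo, q_lo), min(hi, q_hi)
        if o_lo >= o_hi:                 # no overlap: region stays as it is
            outside.append((lo, hi))
            continue
        inside.append((o_lo, o_hi))      # split off the part covered by q
        if lo < o_lo:
            outside.append((lo, o_lo))
        if o_hi < hi:
            outside.append((o_hi, hi))
    if always_merge:
        # All pieces that belong to the query collapse into one region.
        return sorted(outside + [q])
    # Never merge: data fetched by the remainder query forms its own region(s).
    gaps: List[Region] = []
    cursor = q_lo
    for lo, hi in sorted(inside):
        if cursor < lo:
            gaps.append((cursor, lo))
        cursor = hi
    if cursor < q_hi:
        gaps.append((cursor, q_hi))
    return sorted(outside + inside + gaps)


original = [(0, 10), (20, 30)]
never = apply_query(original, (5, 25), always_merge=False)
always = apply_query(original, (5, 25), always_merge=True)
```

In this run the never-merge policy leaves five small regions and the always-merge policy leaves three, which illustrates the trade-off between many small regions and fewer large ones.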
15Replacement Issues
- Temporal locality
- LRU, MRU
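As a minimal sketch of the two temporal policies, assume each cached item (tuple, page, or region) carries a logical timestamp of its last use. The dictionary `last_used` and the function `choose_victim` are hypothetical names for illustration.

```python
from typing import Dict


def choose_victim(last_used: Dict[str, int], policy: str) -> str:
    """Pick the cached item to evict under a temporal-locality policy."""
    if policy == "LRU":
        # Least recently used: evict the item with the oldest timestamp.
        return min(last_used, key=lambda item: last_used[item])
    if policy == "MRU":
        # Most recently used: evict the item with the newest timestamp.
        return max(last_used, key=lambda item: last_used[item])
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical last-use timestamps for three cached items.
last_used = {"A": 1, "B": 5, "C": 3}
lru_victim = choose_victim(last_used, "LRU")
mru_victim = choose_victim(last_used, "MRU")
```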
16Replacement Issues (Contd.)
- Semantic locality
- Manhattan distance
- (Note) Manhattan distance definition: the
distance between two points measured along axes
at right angles. In a plane with p1 at (x1, y1)
and p2 at (x2, y2), it is |x1 - x2| + |y1 - y2|.
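(Figure: points p1 and p2 joined through a corner point O; the Manhattan distance from p1 to p2 is the sum of the axis-parallel segments p1O and p2O.)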
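The definition translates directly into code. The sketch below also shows one way a distance-based replacement choice could look, under the assumption that each semantic region is summarized by a center point and compared against the center of the most recent query. The names `region_centers` and `choose_victim`, and the sample coordinates, are hypothetical.

```python
from typing import Dict, Tuple

Point = Tuple[float, ...]


def manhattan(p1: Point, p2: Point) -> float:
    """Sum of absolute coordinate differences: |x1 - x2| + |y1 - y2| + ..."""
    return sum(abs(a - b) for a, b in zip(p1, p2))


def choose_victim(region_centers: Dict[str, Point], query_center: Point) -> str:
    """Evict the region farthest (in Manhattan distance) from the latest query."""
    return max(
        region_centers,
        key=lambda name: manhattan(region_centers[name], query_center),
    )


# Hypothetical region centers in a two-attribute space.
region_centers = {"r1": (10, 10), "r2": (50, 80), "r3": (12, 9)}
victim = choose_victim(region_centers, query_center=(11, 10))
```

Regions far from recent queries get the lowest value, so they are the first candidates for replacement.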
17Simulation and Result
- The relation has three candidate keys: Unique2 is
indexed and clustered, Unique1 is indexed and
unclustered, and Unique3 is unindexed and unclustered.
18Simulation and Result (Contd.)
- Unique2 (Clustered Index).
- Performance
- Almost the same for all caching schemes.
- Page-based is slightly better.
- Reason
- Page-based overhead is smaller.
19Simulation and Result (Contd.)
- Unique1 (Unclustered Index).
- Performance
- Tuple-based and semantic-based caching are much better.
- Reason
- Page-based caching is sensitive to clustering.
20Simulation and Result (Contd.)
- Unique3 (Unindexed and Unclustered).
- Performance
- Semantic-based is better.
- Reason
- The remainder query enables the client and server
to process the query in parallel.
21Simulation and Result (Contd.)
- Semantic locality / Manhattan distance on Unique1.
- Performance
- Manhattan distance is better than LRU.
- Reason
- Cold regions are replaced faster.
22Conclusion and Future Work
- Conclusion.
- In a simple model with selection queries, semantic
caching provides better performance.
- Future work.
- Implementation issues for complex queries, updates,
deletions, and insertions.
- Concurrency control.
- Consistency.
- Completeness.
- A Predicate-based Caching Scheme for
Client-Server Database Architectures (Arthur M.
Keller and Julie Basu).