Title: HiFi Systems: Network-Centric Query Processing for the Physical World
1HiFi Systems: Network-Centric Query Processing for the Physical World
- Michael J. Franklin, Shawn R. Jeffrey, et al.
- UC Berkeley TelegraphCQ Team
- 2nd CIDR Conf. 2005
2Table of Contents
- One line Comment
- Motivating Scenario
- HiFi System with CSAVA processing stages
- Internal Architecture of HiFi Node
- Critiques
- New Ideas 1 and 2
3One line Comment
- It is preliminary work describing the group's vision of distributing their TelegraphCQ system across a hierarchical network
4Motivating Scenario: Supply Chain Management
- Smart Shelves continuously monitor item addition and removal.
- Info is sent back through the supply chain.
5High Fan-In system
Ursa-Major (TelegraphCQ w/Archiving)
Mid-tier Stargate Mid-tier Processing Node
6Characteristics of HiFi Systems
- High Fan-In, globally-distributed architecture
- Large data volumes generated at edges
- Filtering and cleaning must be done there
- Successive aggregation as you move inwards
- Summaries/anomalies continually, details later
- Strong temporal focus
- Strong spatial/geographic focus
- Streaming data and stored data
- Integration within and across enterprises
7A View on this example
Archiving (provenance and schema evolution)
Filtering,Cleaning,Alerts
Monitoring, Time-series
Data mining (recent history)
8High fan-in system levels with associated CSAVA
processing stages
Headquarters
Regional Centers
Warehouse
Warehouse Doors
Receptor
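The idea of successive aggregation across these levels can be sketched in a few lines. This is only an illustration, not the HiFi implementation: the node names (`receptor-1`, `door-A`, and so on), the `PARENT` map, and the `roll_up` function are hypothetical, and a plain tag count stands in for whatever summary each level actually computes.

```python
from collections import Counter

# Hypothetical hierarchy, edge to core, following the level names on the slide.
PARENT = {
    "receptor-1": "door-A",
    "receptor-2": "door-A",
    "receptor-3": "door-B",
    "door-A": "warehouse-1",
    "door-B": "warehouse-1",
    "warehouse-1": "region-west",
    "region-west": "headquarters",
}


def roll_up(edge_counts):
    """Push per-receptor counts inwards, summing at every level on the way."""
    totals = Counter(edge_counts)
    frontier = dict(edge_counts)
    while frontier:
        next_frontier = Counter()
        for node, count in frontier.items():
            parent = PARENT.get(node)
            if parent is not None:  # headquarters has no parent
                next_frontier[parent] += count
        for node, count in next_frontier.items():
            totals[node] += count
        frontier = next_frontier
    return dict(totals)


summary = roll_up({"receptor-1": 4, "receptor-2": 1, "receptor-3": 7})
print(summary["door-A"], summary["warehouse-1"], summary["headquarters"])
```

Each inner level only sees the sums of its children, which is the fan-in property: data volume shrinks as it moves towards headquarters.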
9Internal Architecture of a HiFi node
Query Placement Service
Query Listener
Control Manager
Data Disseminator
Query Planner
Metadata Repository
Data Stream Processor
Local View Manager
DSP Manager
Logical Query Planner
Archive Manager
Physical Query Planner
Cache Manager
Data Flow
Resource Manager
Query Dispatcher
Query Flow
Data Listener
HiFi Glue
Control Flow
10Critiques
- Strong Points
- They classify and formulate five distinct data processing stages
- They develop the prototype system (in VLDB 05)
- Weak Points
- Designing the Metadata Repository (MDR) is critical, but no initial effort has been made
- No new system requirement
- Solutions are not technically deep
11New Idea - 1
By-passing
SP Accel
Buffering
Filtered out
Data Source
CQ engine
Web Server
Clients
12New Idea related to SPAccel
- Designing front-end component (Cache??)
- Filtering out unwanted input data
- By-passing data matching query predicates
- Buffering data for windowed queries (views) or distributed queries
- Buffering Query Results
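One possible reading of the three paths above (filter out, by-pass, buffer) is a small router sitting in front of the CQ engine. The sketch below is an assumption about how such a front end might behave, not a description of an existing component: the `FrontEnd` class, its predicates, and the sample tuples are all hypothetical.

```python
from collections import deque


class FrontEnd:
    """Hypothetical SPAccel-style front end: send each tuple down one of three paths."""

    def __init__(self, bypass_pred, window_pred, window_size):
        self.bypass_pred = bypass_pred   # tuples that already satisfy a query predicate
        self.window_pred = window_pred   # tuples that some windowed query still needs
        self.buffer = deque(maxlen=window_size)  # oldest tuples fall out automatically
        self.bypassed = []               # stands in for "deliver without the CQ engine"
        self.dropped = 0

    def ingest(self, tup):
        if self.bypass_pred(tup):
            self.bypassed.append(tup)
            return "bypass"
        if self.window_pred(tup):
            self.buffer.append(tup)
            return "buffer"
        self.dropped += 1
        return "filtered"


fe = FrontEnd(
    bypass_pred=lambda t: t["temp"] > 90,
    window_pred=lambda t: t["sensor"] == "s1",
    window_size=2,
)
routes = [fe.ingest(t) for t in [
    {"sensor": "s1", "temp": 95},
    {"sensor": "s1", "temp": 20},
    {"sensor": "s2", "temp": 30},
    {"sensor": "s1", "temp": 21},
    {"sensor": "s1", "temp": 22},
]]
print(routes)
```

The bounded `deque` keeps only the most recent `window_size` tuples, which is one simple way to hold data for a windowed query.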
13Issues expected
- Cache replacement mechanism
- How to index cached elements
- What to cache?
- How much?
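To make the cache replacement question concrete, here is a minimal sketch of one common policy, least-recently-used (LRU) eviction. LRU is only an example choice; the slides do not commit to any policy. The `LRUResultCache` class and the query keys are hypothetical.

```python
from collections import OrderedDict


class LRUResultCache:
    """Keep at most `capacity` entries, evicting the least recently used one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


cache = LRUResultCache(capacity=2)
cache.put("q1", [1, 2])
cache.put("q2", [3])
cache.get("q1")       # q1 becomes the most recently used entry
cache.put("q3", [4])  # capacity exceeded, so q2 is evicted
```

Here the dictionary key doubles as the index, so "how to index cached elements" reduces to choosing the key; "what to cache" and "how much" remain open and show up only as the `capacity` parameter.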
14New Idea - 2: Processing stream data for OLAP queries
- Attribute | OLTP | OLAP
- Users | clerk, IT professional | knowledge worker
- Function | day-to-day operations | decision support
- DB design | application-oriented | subject-oriented
- Data | current, up-to-date; detailed, flat relational; isolated | historical, summarized; multidimensional; integrated, consolidated
- Usage | repetitive | ad-hoc
- Access | read/write; index/hash on prim. key | lots of scans
- Unit of work | short, simple transaction | complex query
- Records accessed | tens | millions
- Number of users | thousands | hundreds
- DB size | 100MB-GB | 100GB-TB
- Metric | transaction throughput | query throughput/response
15A Sample Data Cube
- Date: 1Q, 2Q, 3Q, 4Q
- Product: camera, video, CD
- Country: USA, Canada, Mexico
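A cube over these three dimensions can be thought of as the set of all group-bys over subsets of (date, product, country). The sketch below illustrates that idea on a handful of invented rows; the `sales` measure, the values in `FACTS`, and the `group_by` helper are assumptions made for the example.

```python
from collections import defaultdict

# Made-up facts; only the dimension values come from the slide.
FACTS = [
    {"date": "1Q", "product": "camera", "country": "USA", "sales": 10},
    {"date": "1Q", "product": "video", "country": "USA", "sales": 5},
    {"date": "2Q", "product": "camera", "country": "Canada", "sales": 7},
    {"date": "2Q", "product": "CD", "country": "Mexico", "sales": 3},
]


def group_by(facts, dims):
    """Sum the sales measure for every combination of the chosen dimensions."""
    out = defaultdict(int)
    for fact in facts:
        out[tuple(fact[d] for d in dims)] += fact["sales"]
    return dict(out)


by_product = group_by(FACTS, ("product",))
by_date_country = group_by(FACTS, ("date", "country"))
grand_total = group_by(FACTS, ())  # the apex of the cube: no dimensions at all

print(by_product)
print(by_date_country)
print(grand_total)
```

Every call scans all facts, which hints at why answering many such queries over long histories is memory and computation intensive.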
16New Idea - 2
- Consider stream data in terms of the OLAP domain
- OLAP queries
- Are inherently multidimensional
- Span a long time
- Need data from multiple sources
- Processing OLAP queries is
- Memory intensive
- Computation intensive
17Naïve Solution
- Pre-computing popular computation paths
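As a rough illustration of pre-computation over a stream, the sketch below keeps running sums for a fixed set of "popular" groupings and updates them as each tuple arrives. How popularity would be determined is not addressed here. The `PrecomputedAggregates` class, the `sales` measure, and the sample tuples are hypothetical.

```python
from collections import defaultdict


class PrecomputedAggregates:
    """Maintain running sums for a fixed set of popular group-bys."""

    def __init__(self, popular_groupings):
        self.sums = {dims: defaultdict(int) for dims in popular_groupings}

    def on_tuple(self, tup, measure="sales"):
        # Update every precomputed grouping once per arriving tuple.
        for dims, table in self.sums.items():
            table[tuple(tup[d] for d in dims)] += tup[measure]

    def query(self, dims, key):
        # Raises KeyError if `dims` is not one of the precomputed groupings.
        return self.sums[dims].get(key, 0)


agg = PrecomputedAggregates([("product",), ("date", "country")])
for tup in [
    {"date": "1Q", "product": "camera", "country": "USA", "sales": 10},
    {"date": "2Q", "product": "camera", "country": "Canada", "sales": 7},
    {"date": "1Q", "product": "video", "country": "USA", "sales": 5},
]:
    agg.on_tuple(tup)

camera_total = agg.query(("product",), ("camera",))
q1_usa_total = agg.query(("date", "country"), ("1Q", "USA"))
```

Queries on a precomputed grouping become a dictionary lookup; anything outside the chosen set still needs the raw data, which is why this is only a naive solution.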
18Supplementary Slide
- Cleaning
    CREATE VIEW cleaned_rfid_stream AS
    (SELECT receptor_id, tag_id
     FROM rfid_stream rs
     WHERE read_strength > strength_T)
- Smoothing
    CREATE VIEW smoothed_rfid_stream AS
    (SELECT receptor_id, tag_id
     FROM cleaned_rfid_stream
     GROUP BY receptor_id, tag_id
     HAVING count(*) > count_T)
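For readers who want to trace the two views step by step, the following sketch mirrors them in Python over a single batch of readings. It is an approximation under stated assumptions: the threshold values, the sample `window`, and the function names `clean` and `smooth` are invented here, and the batch stands in for whatever window the stream engine would apply.

```python
from collections import Counter

STRENGTH_T = 50  # assumed threshold, not taken from the slides
COUNT_T = 1      # assumed threshold, not taken from the slides


def clean(readings, strength_t=STRENGTH_T):
    """Cleaning: keep (receptor_id, tag_id) for readings above the strength threshold."""
    return [
        (r["receptor_id"], r["tag_id"])
        for r in readings
        if r["read_strength"] > strength_t
    ]


def smooth(cleaned, count_t=COUNT_T):
    """Smoothing: keep pairs seen more than count_t times in the batch."""
    counts = Counter(cleaned)
    return sorted(pair for pair, n in counts.items() if n > count_t)


window = [
    {"receptor_id": "r1", "tag_id": "t1", "read_strength": 80},
    {"receptor_id": "r1", "tag_id": "t1", "read_strength": 75},
    {"receptor_id": "r1", "tag_id": "t2", "read_strength": 90},
    {"receptor_id": "r1", "tag_id": "t3", "read_strength": 10},
    {"receptor_id": "r1", "tag_id": "t3", "read_strength": 12},
]
kept = smooth(clean(window))
print(kept)
```

In this batch, tag `t3` is removed by cleaning (weak reads) and tag `t2` by smoothing (seen only once), leaving only `t1`.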