Query Processing and Networking Infrastructures

About This Presentation

Title:

Query Processing and Networking Infrastructures

Description:

Day 2: Seed some cross-fertilized research. Especially with networking ... Day 2: Research Synergies w/Networking. Queries as indirection, revisited ... – PowerPoint PPT presentation

Number of Views:182

Avg rating:3.0/5.0

Slides: 96

Provided by: joehell

Learn more at: https://dsf.berkeley.edu

Category:

more less

Transcript and Presenter's Notes

Title: Query Processing and Networking Infrastructures

1
Query Processing and Networking Infrastructures

Day 1 of 2
Joe Hellerstein
UC Berkeley
Septemer 20, 2002

2
Two Goals

Day 1 Primer on query processing
Targeted to networking/OS folk
Bias systems issues
Day 2 Seed some cross-fertilized research
Especially with networking
Thesis dataflow convergence
query processing and routing
Clearly other resonances here
Dataflow HW architectures
Event-based systems designs
ML and Control Theory
Online Algorithms

3
(Sub)Space of Possible Topics
Distributed Federated QP
TransactionalStorage Networking
Active DBs (Trigger Systems)
TraditionalRelational QPOptimization
Execution
Data Model Query LanguageDesign
Parallel QP
NFNF Data Models (OO, XML, Semistructured)
Adaptive QP
Online and Approximate QP
Indexing
Data Reduction
Data Streams Continuous Queries
Visual Querying Data Visualization
Media Queries,Feature Extraction Similarity
Search
Compression
Statistical Data Analysis(Mining)
Boolean Text Search
Traditional TextRanking
HypertextRanking
4
Likely Topics Here
TransactionalStorage Networking
Distributed Federated QP
Active DBs (Trigger Systems)
TraditionalRelational QPOptimization
Execution
Data Model Query LanguageDesign
Parallel QP
NFNF Data Models (OO, XML, Semistructured)
Adaptive QP
Online and Approximate QP
Indexing
Data Reduction
Data Streams Continuous Queries
Visual Querying Data Visualization
Media Queries,Feature Extraction Similarity
Search
Compression
Statistical Data Analysis(Mining)
Boolean Text Search
Traditional TextRanking
HypertextRanking
5
Plus Some Speculative Ones
Distributed Federated QP
TraditionalRelational QPOptimization
Execution
ContentRouting
IndirectionArchitectures
Parallel QP
Adaptive QP
Online and Approximate QP
Indexing
Peer-to-PeerQP
Data Streams Continuous Queries
SensornetQP
NetworkMonitoring
Boolean Text Search
Traditional TextRanking
6
Outline

Day 1 Query Processing Crash Course
Intro
Queries as indirection
How do relational databases run queries?
How do search engines run queries?
Scaling up cluster parallelism and distribution
Day 2 Research Synergies w/Networking
Queries as indirection, revisited
Useful (?) analogies to networking research
Some of our recent research at the seams
Some of your research?
Directions and collective discussion

7
Getting Off on the Right Foot
8
Roots database and IR research

Top-down traditions (applications)
Usually begins with semantics and models.
Common Misconceptions
Query processing Oracle or Google.
Need not be so heavyweight or monolithic! Many
reusable lessons within
IR search and DB querying are fundamentally
different
Very similar from a query processing perspective
Many similarities in other data models as well
Querying is a synchronous, interactive process.
Triggers, rules and "continuous queries" not so
different from plain old queries.

9
So well go bottom-up

Focus on resuable building blocks
Attempt to be language- and model-agnostic
illustrate with various querying scenarios

10
Confession Two Biases

Relational query engines
Most mature and general query technology
Best documented in the literature
Conceptually general enough to capture most all
other models/schemes
Everybody does web searches
So its both an important app, and an inescapable
usage bias we carry around
It will inform our discussion. Shouldnt skew it
Lots of other query systems/languages you can
keep in mind as we go
LDAP, DNS, XSL/Xpath/XQuery, Datalog

11
What Are Queries For? I

Obvious answer search and analysis over big data
sets
Search select data of interest
Boolean expressions over content
sometimes with an implicit ordering on results
Analysis construct new information from base
data.
compute functions over each datum
concatenate related records (join)
partition into groups, summarize (aggregates)
aside Mining vs. Querying? As a rule of
thumb, think of mining as WYGIWIGY.
Not the most general, powerful answer

12
What Are Queries For? II

Queries bridge a (large!) level of indirection
Declarative programming what you want, not how
to get it
Easy (er) to express
Allows the how to change under the covers
A critical issue!
Not just for querying
Method invocation, data update, etc

?? !!
13
Motivation for this Indirection

Critical when rates of change differ across
layers
In particular, when dapp/dt ltlt denvironment/dt
E.g. DB apps are used for years, decades (!!)
E.g. networked env high rates of change (??)
DB lit calls this data independence

14
Data Independence Background

Bad Old Days
Hierarchical and Network (yep!) data models
Nesting pointers mean that apps explicitly
traverse data, become brittle when data layouts
change
Apps with persistent data have slow dapp/dt
And the database environments change faster!
Logical changes to representation (schema)
Physical changes in storage (indexes, layouts,
HW)
DBs often shared by multiple apps!
In B.O.D., all apps had to be rewritten on change

15
Its a SW Engineering Thing

Analogy imagine if your C structs were to
survive for decades
youd keep them very simple
encapsulation to allow future mods
Similar Analogy to NWs
protocol simplicity is good
soft state is good (discourages hardcoded refs to
transient resources)
But the fun systems part follows directly
Achieve the goal w/respectable performance over a
dynamic execution environment

16
Codds Data Independence

Ted Codd, IBM c. 1969 and forward
Turing award 1981
Two layers of indirection

Applications
Spanned by views and query rewriting
Logical Independence
Logical Representation(schema)
Spanned by queryoptimization andexecution
Physical Independence
Physical Representation (storage)
17
A More Architectural Picture
Bridges logical independence
Declarative queryover views
Query Rewriter
Bridges physical independence
Query Processor
Declarative queryover base tables
Optimizer
Query Plan (Procedural)
Executor
N.B. This classical QParchitecture raises
someproblems. To be revisited!
IteratorAPI
Access Methods
18
Access Methods Indexing
19
Access Methods

Base data access layer
Model Data stored in unordered collections
Relations, tables, one type per collection
Interface iterators
Open(predicate) -gt cursor
Usually simple predicates attribute op constant
op usually arithmetic (lt, gt, ), though well see
extensions (e.g. multi-d ops)
Next(cursor) -gt datum (of known type)
Close(cursor)
Insert(datum of correct type)
Delete(cursor)

20
Typical Access Methods

Heap files
unordered array of records
usually sequential on disk
predicates just save cross-layer costs
Traditional Index AMs
B-trees
actually, B-trees all data at leaves
Can scan across leaves for range search
predicates (lt,gt,, between) result in fewer I/Os
random I/Os (at least to find beginning of range)
Linear Hash index
Litwin 78. Supports equality predicates only.
This is it for IR and standard relational DBs
Though when IR folks say indexing, they
sometimes mean all of query processing

21
Primary Secondary Indexes
Directory
Directory
Data
(key, ptr) pairs
Data
22
Primary Secondary Indexes
Directory
(key, ptr) pairs
23
An Exotic Forest of Search Trees

Multi-dimensional indexes
For geodata, multimedia search, etc.
Dozens! E.g. R-tree family, disk-based
Quad-Trees, kdB-trees
And of course linearizations with B-trees
Path indexes
For XML and OO path queries
E.g. Xfilter
Etc.
Lots of one-off indexes, often many per workload
No clear winners here
Extensible indexing scheme would be nice

24
Generalized Search Trees (GiST)
Hellerstein et al., VLDB 95

What is a (tree-based) DB index? Typically
A clustering of data into leaf blocks
Hierarchical summaries (subtree predicates --
SPs) for pointers in directory blocks

25
Generalized Search Trees (GiST)

Can realize that abstraction with simple
interface
User registers opaque SP objects with a few
methods
Consistent(q, p) should query q traverse
subtree?
Penalty(d, p) how bad is it to insert d below p
Union (p1, p2) form SP that includes p1, p2
PickSplit(p1, , pn) partition SPs into 2
Tree maintenance, concurrency, recovery all
doable under the covers
Covers many popular multi-dimensional indexes
Most of which had no concurrency/recovery story
http//gist.cs.berkeley.edu

26
Some Additional Indexing Tricks
ONeil/Quass, SIGMOD 97

Bitmap indexing
Many matches per value in (secondary) index?
Rather than storing pointers to heap file in
leaves, store a bitmap of matches in a (sorted)
heap file.
Only works if file reorg is infrequent
Can make intersection, COUNT, etc. quicker during
query processing
Can mix/match bitmaps and lists in a single index
Works with any (secondary) index with duplicate
matches
Vertical Partitioning / Columnar storage
Again, for sorted, relatively static files
Bit-slice indexes

27
Query Processing Dataflow Infrastructures
28
Dataflow Infrastructure

Dataflow abstraction is very simple
box-and-arrow diagrams
(typed) collections of objects flow along edges
Details can be tricky
Push or Pull?
More to it than that
How do control-flow anddataflow interact?
Where does the data live?
Dont want to copy data
If passing pointers, where doesthe real data
live?

29
Iterators

Most uniprocessor DB engines use iterators
Open() -gt cursor
Next(cursor) -gt typed record
Close(cursor)
Simple and elegant
Control-flow and dataflow coupled
Familiar single-threaded, procedure-call API
Data refs passed on stack, no buffering
Blocking-agnostic
Works w/blocking ops -- e.g. Sort
Works w/pipelined ops
Note well-behaved iterators come up for air
in inner loops
E.g. for interrupt handling

g
f
S
R
30
Where is the In-Flight Data?

In standard DBMS, raw data lives in disk format,
in shared Buffer Pool
Iterators pass references to BufPool
A tuple slot per iterator input
Never copy along edges of dataflow
Join results are arrays of refs to base tables
Operators may pin pages in BufPool
BufPool never replaces pinned pages
Ops should release pins ASAP (esp. across Next()
calls!!)
Some operators copy data into their internal
state
Canspill this state to private disk space

31
Weaknesses of Simple Iterators

Evolution of uniprocessor archs to parallel archs
esp. shared-nothing clusters
Opportunity for pipelined parallelism
Opportunity for partition parallelism
Take a single box in the dataflow, and split it
across multiple machines
Problems with iterators in this environment
Spoils pipelined parallelism opportunity
Polling (Next()) across the network is
inefficient
Nodes sit idle until polled, and during comm
A blocking producer blocks its consumer
But would like to keep iterator abstraction
Especially to save legacy query processor code
And simplify debugging (single-threaded,
synchronous)

32
Exchange
Graefe, SIGMOD 90

Encapsulate partition parallelism asynchrony
Keep the iterator API between ops
Exchange operator partitions input data by
content
E.g. join or sort keys
Note basic architectural idea!
Encapsulate dataflowtricks in operators,
leavinginfrastructure untouched
Well see this again next week, e.g. in Eddies

33
Exchange Internals

Really 2 operators, XIN and XOUT
XIN is top of a plan, and pulls, pushing
results to XOUT queue
XOUT spins on its local queue
One thread becomes two
Producer graph XIN
Consumer graph XOUT
Routing table/fn in XINsupports partition
parallelism
E.g. for sort, join, etc.
Producer and consumer see iterator API
Queue thread barrier turns NW-based push into
iterator-style pull

XOUT
XIN
route
Exchange
34
Exchange Benefits?

Remember Iterator limitations?
Spoils pipelined parallelism opportunity
solved by Exchange thread boundary
Polling (Next()) across the network is
inefficient
Solved by XIN pushing to XOUT queue
A blocking producer blocks its consumer
Still a problem!

35
Exchange Limitations

Doesnt allow consumer work to overlap w/blocking
producers
E.g. streaming data sources, events
E.g. sort, some join algs
Entire consumer graph blocks if XOUT queue empty
Control flow coupled to dataflow, so XOUT wont
return without data
Queue is encapsulated from consumer
But
Note that exchange model is fine for most
traditional DB Query Processing
May need to be extended for new settings

36
Fjords
Madden/Franklin, ICDE 01

Thread of control per operator
Queues between each operator
Asynch or synch calls
Can do asynch poll-and-yield iteration in each
operator (for both consumer and producer)
Or can do synchronous get_next iteration
Can get traditional behavior if you want
Synch polls queue of size 1
? Iterators
Synch consumer, asynch producer
Exchange
Asynch calls solve the blocking problem of
Exchange

iterator
exchange
37
Fjords

Disadvantages
Lots of threads
Best done in an event-programming style, not OS
threads
Operators really have to come up for air
(yield)
Need to write your own scheduler
Harder to debug
But
Maximizes flexibility for operators at the
endpoints
Still provides a fairly simple interface for
operator-writers

38
Basic Relational Operators and Implementation
39
Relational Algebra Semantics

Selection sp(R)
Returns all rows in R that satisfy p
Projection pC(R)
Returns all rows in R projected to columns in C
In strict relational model, remove duplicate rows
In SQL, preserve duplicates (multiset semantics)
Cartesian Product R ? S
Union R ? S Difference R S
Note R, S must have matching schemata
Join R p S sp(R ? S)
Missing Grouping Aggregation, Sorting

40
Operator Overview Basics

Selection
Typically free, so pushed down
Often omitted from diagrams
Projection
In SQL, typically free, so pushed down
No duplicate elimination
Always pass the minimal set of columns downstream
Typically omitted from diagrams
Cartesian Product
Unavoidable nested loop to generate output
Union
Concat, or concat followed by dup. elim.

41
Operator Overview, Cont.

Unary operators Grouping Sorting
Grouping can be done with hash or sort schemes
(as well see)
Binary matching Joins/Intersections
Alternative algorithms
Nested loops
Loop with index lookup (Index N.L.)
Sort-merge
Hash Join
Dont forget have to write as iterators
Every time you get called with Next(), you adjust
your state and produce an output record

42
Unary External Hashing
Bratbergsengen, VLDB 84

E.g. GROUP BY, DISTINCT
Two hash functions, hc (coarse) and hf (fine)
Two phases
Phase 1 for each tuple of input, hash via hc
into a spill partition to be put on disk
B-1 blocks of memory used to hold output buffers
for writing a block at a time per partition

43
Unary External Hashing

Phase 2 for each partition, read off disk and
hash into a main-memory hashtable via hf
For distinct, when you find a value already in
hashtable, discard the copy
For GROUP BY, associate some agg state (e.g.
running SUM) with each group in the hash table,
and maintain

44
External Hashing Analysis

To utilize memory well in Phase 2, would like
each partition to be B blocks big
Hence works in two phases when B gt ?R
Same req as external sorting!
Else can recursively partition the partitions in
Phase 2
Can be made to pipeline, to adapt nicely to small
data sets, etc.

45
Hash Join (GRACE)
Fushimi, et al., VLDB 84

Phase 1 partition each relation on the join key
with hc, spilling to disk
Phase 2
build each partition of smaller relation into a
hashtable via hf
scan matching partition of bigger relation, and
for each tuple probe the hashtable via hf for
matches
Would like each partition of smaller relation to
fit in memory
So works well if B gt ?smaller
Size of bigger is irrelevant!! (Vs. sort-merge
join)
Popular optimization Hybrid hash join
Partition 0 doesnt spill -- it builds and
probes immediately
Partitions 1 through n use rest of memory for
output buffers
DeWitt/Katz/Olken/Shapiro/Stonebraker/Wood,
SIGMOD 84

46
Hash-Join
Original Relations
Partitions
OUTPUT
1
1
2
INPUT
2
hash function hc
. . .
B-1
B-1
B main memory buffers
Disk
Disk
Partitions of R S
Hash table for partition Ri (k lt B-1 pages)
hash
fn
hf
hf
Join Result
Output buffer
Input buffer for Si
B main memory buffers
Disk
47
Symmetric Hash Join
Mikillineni Su, TOSE 88 Wilschut Apers,
PDIS 91

Pipelining, in-core variant
Build and probe symmetrically
Correctness Each output tuple generated when
its last-arriving component appears
Can be extended to out-of-core case
Tukwila Ives HaLevy, SIGMOD 99
Xjoin Spill and read partitions multiple times
Correctness guaranteed by timestamping tuples and
partitions
Urhan Franklin, DEBull 00

48
Relational Query Engines
49
A Basic SQL primer
SELECT DISTINCT ltoutput expressionsgt FROM
lttablesgt WHERE ltpredicatesgt GROUP BY
ltgb-expressiongt HAVING lth-predicatesgt ORDER
BY ltexpressiongt

Join tables in FROM clause
applying predicates in WHERE clause
If GROUP BY, partition results by GROUP
And maintain aggregate output expressions per
group
Delete groups that dont satisfy HAVING clause
If ORDER BY, sort output accordingly

50
Examples

Single-table S-F-W
DISTINCT, ORDER BY
Multi-table S-F-W
And self-join
Scalar output expressions
Aggregate output expressions
With and without DISTINCT
Group By
Having
Nested queries
Uncorrelated and correlated

51
A Dopey Query Optimizer
spredicates

For each S-F-W query block
Create a plan that
Forms the cartesian product of the FROM clause
Applies the WHERE clause
Incredibly inefficient
Huge intermediate results!
Then, as needed
Apply the GROUP BY clause
Apply the HAVING clause
Apply any projections and output expressions
Apply duplicate elimination and/or ORDER BY

?

tables
52
An Oracular Query Optimizer

For each possible correct plan
Run the plan (infinitely fast)
Measure its performance in reality
Pick the best plan, and run it in reality

53
A Standard Query Optimizer

Three aspects to the problem
Legal plan space (transformation rules)
Cost model
Search Strategy

54
Plan Space

Many legal algebraic transformations, e.g.
Cartesian product followed by selection can be
rewritten as join
Join is commutative and associative
Can reorder the join tree arbitrarily
NP-hard to find best join tree in general
Selections should (usually) be pushed down
Projections can be pushed down
And physical choices
Choice of Access Methods
Choice of Join algorithms
Taking advantage of sorted nature of some streams
Complicates Dynamic Programming, as well see

55
Cost Model Selectivity Estimation

Cost of a physical operator can be modeled fairly
accurately
E.g. number of random and sequential I/Os
Requires metadata about input tables
Number of rows (cardinality)
Bytes per tuple (physical schema)
In a query pipeline, metadata on intermediate
tables is trickier
Cardinality?
Requires selectivity (COUNT) estimation
Wet-finger estimates
Histograms, joint distributions and other
summaries
Sampling

56
Search Strategy

Dynamic Programming
Used in most commercial systems
IBMs System R Selinger, et al. SIGMOD 79
Top-Down
Branch and bound with memoization
Exodus, Volcano Cascades Graefe, SIGMOD 87,
ICDE 93, DEBull 95
Used in a few commercial systems (Microsoft SQL
Server, especially)
Randomized
Simulated Annealing, etc. Ioannidis Kang
SIGMOD 90

57
Dynamic Programming

Use principle of optimality
Any subtree of the optimal plan is itself optimal
for its sub-expression
Plans enumerated in N passes (if N relations
joined)
Pass 1 Find best 1-relation plan for each
relation.
Pass 2 Find best way to join result of each
1-relation plan (as outer) to another relation.
(All 2-relation plans.)
Pass N Find best way to join result of a
(N-1)-relation plan (as outer) to the Nth
relation. (All N-relation plans.)
This gives all left-deep plans. Generalization
is easy
A wrinkle physical properties (e.g. sort orders)
violate principle of optimality!
Use partial-order dynamic programming
I.e. keep undominated plans at each step --
optimal for each setting of the physical
properties (each interesting order)

58
Relational Architecture Review
Query Parsing and Optimization
Query Executor
Access Methods
Buffer Management
Disk Space Management
DB
59
Text Search
60
Information Retrieval

A research field traditionally separate from
Databases
Goes back to IBM, Rand and Lockheed in the 50s
G. Salton at Cornell in the 60s
Lots of research since then
Products traditionally separate
Originally, document management systems for
libraries, government, law, etc.
Gained prominence in recent years due to web
search
Today simple IR techniques
Show similarities to DBMS techniques you already
know

61
IR vs. DBMS

Seem like very different beasts
Under the hood, not as different as they might
seem
But in practice, you have to choose between the 2

IR DBMS
Imprecise Semantics Precise Semantics
Keyword search SQL
Unstructured data format Structured data
Read-Mostly. Add docs occasionally Expect reasonable number of updates
Page through top k results Generate full answer
62
IRs Bag of Words Model

Typical IR data model
Each document is just a bag of words (terms)
Detail 1 Stop Words
Certain words are considered irrelevant and not
placed in the bag
e.g. the
e.g. HTML tags like ltH1gt
Detail 2 Stemming
Using English-specific rules, convert words to
their basic form
e.g. surfing, surfed --gt surf
Detail 3 we may decorate the words with other
attributes
E.g. position, font info, etc.
Not exactly bag of words after all

63
Boolean Text Search

Find all documents that match a Boolean
containment expression
Windows AND (Glass OR Door) AND NOT
Microsoft
Note query terms are also filtered via stemming
and stop words
When web search engines say 10,000 documents
found, thats the Boolean search result size.

64
Text Indexes

When IR folks say index or indexing
Usually mean more than what DB people mean
In our terms, both tables and indexes
Really a logical schema (i.e. tables)
With a physical schema (i.e. indexes)
Usually not stored in a DBMS
Tables implemented as files in a file system

65
A Simple Relational Text Index

Create and populate a table
InvertedFile(term string, docID int64)
Build a B-tree or Hash index on
InvertedFile.term
May be lots of duplicate docIDs per term
Secondary index list compression per term
possible
This is often called an inverted file or
inverted index
Maps from words -gt docs
whereas normal files map docs to the words in the
doc (?!)
Can now do single-word text search queries

66
Handling Boolean Logic

How to do term1 OR term2?
Union of two docID sets
How to do term1 AND term2?
Intersection (ID join) of two DocID sets!
How to do term1 AND NOT term2?
Set subtraction
Also a join algorithm
How to do term1 OR NOT term2
Union of term1 and NOT term2.
Not term2 all docs not containing term2.
Yuck!
Usually forbidden at UI/parser
Refinement what order to handle terms if you
have many ANDs/NOTs?

67
Boolean Search in SQL
Windows AND (Glass OR Door) AND NOT
Microsoft

(SELECT docID FROM InvertedFile WHERE word
window INTERSECT SELECT docID FROM
InvertedFile WHERE word glass OR word
door)EXCEPTSELECT docID FROM InvertedFile
WHERE wordMicrosoftORDER BY magic_rank()
Really theres only one query (template) in IR
Single-table selects, UNION, INTERSECT, EXCEPT
Note that INTERSECT is a shorthand for equijoin
on a key
Often theres only one query plan in the system,
too!
magic_rank() is the secret sauce in the search
engines

68
Fancier Phrases and Near

Suppose you want a phrase
E.g. Happy Days
Add a position attribute to the schema
InvertedFile (term string, docID int64, position
int)
Index on term
Enhance join condition in query
Cant use INTERSECT syntax, but query is nearly
the same
SELECT I1.docID FROM InvertedFile I1,
InvertedFile I
WHERE I1.word HAPPY AND I2.word DAYS
AND I1.docID I2.docID AND I2.position -
I1.position 1ORDER BY magic_rank()
Can relax to term1 NEAR term2
Position lt k off

69
Classical Document Ranking

TF ? IDF (Term Freq. ? Inverse Doc Freq.)
For each term t in the query
QueryTermRank occurrences of t in q
TF ?
log((total docs)/(docs with this term)) IDF
? normalization-factor
For each doc d in the boolean result
DocTermRank occurrences of t in d
TF
? log((total docs)/(docs with this term))
IDF ? normalization-factor
Rank DocTermRankQueryTermRank
Requires more to our schema
InvertedFile (term string, docID int64, position
int, DocTermRank float)
TermInfo(term string, numDocs int)
Can compress DocTermRank non-relationally
This basically works fine for raw text
There are other schemes, but this is the standard

70
Some Additional Ranking Tricks

Phrases/Proximity
Ranking function can incorporate position
Query expansion, suggestions
Can keep a similarity matrix on terms, and
expand/modify peoples queries
Document expansion
Can add terms to a doc
E.g. in anchor text of refs to the doc
Not all occurrences are created equal
Mess with DocTermRank based on
Fonts, position in doc (title, etc.)

71
Hypertext Ranking

Also factor in graph structure
Social Network Theory (Citation Analysis)
Hubs and Authorities (Clever), PageRank
(Google)
Intuition recursively weighted in-degrees,
out-degrees
Math eigenvector computation
PageRank sure seems to help
Though word on the street is that other factors
matter as much
Anchor text, title/bold text, etc.

72
Updates and Text Search

Text search engines are designed to be
query-mostly
Deletes and modifications are rare
Can postpone updates (nobody notices, no
transactions!)
Updates done in batch (rebuild the index)
Cant afford to go offline for an update?
Create a 2nd index on a separate machine
Replace the 1st index with the 2nd
Can do this incrementally with a level of
indirection
So no concurrency control problems
Can compress to search-friendly,
update-unfriendly format
For these reasons, text search engines and DBMSs
are usually separate products
Also, text-search engines tune that one SQL query
to death!
The benefits of a special-case workload.

73
Architectural Comparison
Search String Modifier
Ranking Algorithm

The Query

Simple DBMS
The Access Method
OS
Buffer Management
Disk Space Management
Concurrencyand RecoveryNeeded
DB
DBMS
Search Engine
74
Revisiting Our IR/DBMS Distinctions

Data Modeling Query Complexity
DBMS supports any schema queries
Requires you to define schema
Complex query language (hard for folks to learn)
Multiple applications at output
RowSet API (cursors)
IR supports only one schema query
No schema design required (unstructured text)
Trivial query language
Single application behavior
Page through output in rank order, ignore most of
output
Storage Semantics
DBMS online, transactional storage
IR batch, unreliable storage

75
Distribution Parallelism
76
Roots

Distributed QP vs. Parallel QP
Distributed QP envisioned as a k-node intranet
for k 10
Sound old-fashioned? Think of multiple hosting
sites (e.g. one per continent)
Parallel QP grew out of DB Machine research
All in one room, one administrator
Parallel DBMS architecture options
Shared-Nothing
Shared-Everything
Shared-Disk
Shared-nothing is most general, most scalable

77
Distributed QP Semi-Joins
Bernstein/Goodman 79

Main query processing issue in distributed DB
lit use semi-joins
R S pR(R S)
Observe that R S ? (R p(S)) S
Assume each table lives at one site, R is bigger.
To reduce communication
Ship Ss join columns to Rs site, do semijoin
there, ship result to Ss site for the join
Notes
Im sloppy about dups in my defns above
Semi-joins arent always a win
Extra cost estimation task for a distributed
optimizer

78
Bloom Joins
Babb, TODS 79

A constant optimization on semi-joins
Idea (R p(S)) S is redundant
Semi-join can safely return false hits from R
Rather than shipping p(S), ship a superset
A particular kind of lossy set compression
allowed
Bloom Filter (B. Bloom, 1970)
Hash each value in a set via k independent hash
functions onto an array of n bits
Check membership correspondingly
By tuning k and n, can control false hit rate
Rediscovered recently in web lit, some new
wrinkles (Mitzenmachers compressed B.F.s,
Rheas attenuated B.F.s)

79
Sideways Information Passing

These ideas generalize more broadly

Set o Stuff
CostlySet Generator
80
Sideways Information Passing

These ideas generalize more broadly
E.g. magic sets rewriting in datalog SQL
Tricky to do optimally in those settings, but
wins can be very big

Set o Stuff
81
Parallelism 101
See DeWitt Gray, CACM 92

Pipelined vs. Partitioned
Pipelined typically inter-operator
Nominal benefits in a dataflow
Partition typically intra-operator
E.g. hash join or sort using k nodes
Speedup Scaleup
Speedup xold_time/new_time
Ideal linear
Scaleup small_sys_elapsed_small_problem /
big_sys_elapse_big_problem
Ideal 1
Transaction scaleup N times as many TPC-Cs for
N machines
Batch scaleup N times as big a DB for a query on
N machines

82
Impediments to Good Parallelism

Startup overheads
Amortized for big queries
Interference
usually the result of unpredictable communication
delays (comm cost, empty pipelines)
Skew
Of these, skew is the real issue in DBs
Embarrassingly parallel
I.e. it works

83
Data Layout

Horizontal Partitioning
For each table, assign rows to machines by some
key
Or assign arbitrarily (round-robin)
Vertical Partitioning
Sort table, and slice off columns
Usually not a parallelism trick
But nice for processing queries on read-mostly
data (projection is free!)

84
Intra-Operator Parallelism

E.g. for Hash Join
Every site with a horizontal partition of either
R or S fires off a scan thread
Every storage site reroutes its data among join
nodes based on hash of the join column(s)
Upon receipt, each site does local hash join

85
Recall Exchange!
86
Skew Handling

Skew happens
Even when hashing? Yep.
Can pre-sample and/or pre-summarize data to
partition better
Solving skew on the fly is harder
Need to migrate accumulated dataflow state
FLuX Fault-Tolerant, Load-balancing eXchange

87
In Current Architectures

All DBMSs can run on shared memory, many on
shared-nothing
The high end belongs to clusters
The biggest web-search engines run on clusters
(Google, Inktomi)
And use pretty textbook DB stuff for Boolean
search
Fun tradeoffs between answer quality and
availability/management here (the Inktomi story)

88
Precomputations
89
Views and Materialization

A view is a logical table
A query with a name
In general, not updatable
If to be used often, could be materialized
Pre-compute and/or cache result
Could even choose to do this for common query
sub-expressions
Neednt require a DBA to say this is a view

90
Challenges in Materialized Views

Three main issues
Given a workload, which views should be
materialized
Given a query, how can mat-views be incorporated
into query optimizer
As base tables are updated, how can views be
incrementally maintained?
See readings book, Gupta Mumick

91
Precomputation in IR

Often want to save results of common queries
E.g. no point re-running Britney Spears or
Harry Potter as Boolean queries
Can also use as subquery results
E.g. the query Harry Potter Loves Britney
Spears can use the Harry Potter and Britney
Spears results
Constrained version of mat-views
No surprise -- constrained relational workload
And consistency of matview with raw tables is not
critical, so maintenance not such an issue.

92
Precomputed Aggregates

Aggregation queries work on numerical sets of
data
Math tricks apply here
Some trivial, some fancy
Theme replace the raw data with small
statistical summaries, get approximate results
Histograms, wavelets, samples, dependency-based
models, random projections, etc.
Heavily used in query optimizers for selectivity
estimation (a COUNT aggregate)
Spate of recent work on approximate query
processing for AVG, SUM, etc.
Garofalakis, Gehrke, Rastogi tutorial, SIGMOD
02