Title: Query Processing and Networking Infrastructures
1Query Processing and Networking Infrastructures
- Day 1 of 2
- Joe Hellerstein
- UC Berkeley
- Septemer 20, 2002
2Two Goals
- Day 1 Primer on query processing
- Targeted to networking/OS folk
- Bias systems issues
- Day 2 Seed some cross-fertilized research
- Especially with networking
- Thesis dataflow convergence
- query processing and routing
- Clearly other resonances here
- Dataflow HW architectures
- Event-based systems designs
- ML and Control Theory
- Online Algorithms
3(Sub)Space of Possible Topics
Distributed Federated QP
TransactionalStorage Networking
Active DBs (Trigger Systems)
TraditionalRelational QPOptimization
Execution
Data Model Query LanguageDesign
Parallel QP
NFNF Data Models (OO, XML, Semistructured)
Adaptive QP
Online and Approximate QP
Indexing
Data Reduction
Data Streams Continuous Queries
Visual Querying Data Visualization
Media Queries,Feature Extraction Similarity
Search
Compression
Statistical Data Analysis(Mining)
Boolean Text Search
Traditional TextRanking
HypertextRanking
4Likely Topics Here
TransactionalStorage Networking
Distributed Federated QP
Active DBs (Trigger Systems)
TraditionalRelational QPOptimization
Execution
Data Model Query LanguageDesign
Parallel QP
NFNF Data Models (OO, XML, Semistructured)
Adaptive QP
Online and Approximate QP
Indexing
Data Reduction
Data Streams Continuous Queries
Visual Querying Data Visualization
Media Queries,Feature Extraction Similarity
Search
Compression
Statistical Data Analysis(Mining)
Boolean Text Search
Traditional TextRanking
HypertextRanking
5Plus Some Speculative Ones
Distributed Federated QP
TraditionalRelational QPOptimization
Execution
ContentRouting
IndirectionArchitectures
Parallel QP
Adaptive QP
Online and Approximate QP
Indexing
Peer-to-PeerQP
Data Streams Continuous Queries
SensornetQP
NetworkMonitoring
Boolean Text Search
Traditional TextRanking
6Outline
- Day 1 Query Processing Crash Course
- Intro
- Queries as indirection
- How do relational databases run queries?
- How do search engines run queries?
- Scaling up cluster parallelism and distribution
- Day 2 Research Synergies w/Networking
- Queries as indirection, revisited
- Useful (?) analogies to networking research
- Some of our recent research at the seams
- Some of your research?
- Directions and collective discussion
7Getting Off on the Right Foot
8Roots database and IR research
- Top-down traditions (applications)
- Usually begins with semantics and models.
- Common Misconceptions
- Query processing Oracle or Google.
- Need not be so heavyweight or monolithic! Many
reusable lessons within - IR search and DB querying are fundamentally
different - Very similar from a query processing perspective
- Many similarities in other data models as well
- Querying is a synchronous, interactive process.
- Triggers, rules and "continuous queries" not so
different from plain old queries.
9So well go bottom-up
- Focus on resuable building blocks
- Attempt to be language- and model-agnostic
- illustrate with various querying scenarios
10Confession Two Biases
- Relational query engines
- Most mature and general query technology
- Best documented in the literature
- Conceptually general enough to capture most all
other models/schemes - Everybody does web searches
- So its both an important app, and an inescapable
usage bias we carry around - It will inform our discussion. Shouldnt skew it
- Lots of other query systems/languages you can
keep in mind as we go - LDAP, DNS, XSL/Xpath/XQuery, Datalog
11What Are Queries For? I
- Obvious answer search and analysis over big data
sets - Search select data of interest
- Boolean expressions over content
- sometimes with an implicit ordering on results
- Analysis construct new information from base
data. - compute functions over each datum
- concatenate related records (join)
- partition into groups, summarize (aggregates)
- aside Mining vs. Querying? As a rule of
thumb, think of mining as WYGIWIGY. - Not the most general, powerful answer
12What Are Queries For? II
- Queries bridge a (large!) level of indirection
- Declarative programming what you want, not how
to get it - Easy (er) to express
- Allows the how to change under the covers
- A critical issue!
- Not just for querying
- Method invocation, data update, etc
?? !!
13Motivation for this Indirection
- Critical when rates of change differ across
layers - In particular, when dapp/dt ltlt denvironment/dt
- E.g. DB apps are used for years, decades (!!)
- E.g. networked env high rates of change (??)
- DB lit calls this data independence
14Data Independence Background
- Bad Old Days
- Hierarchical and Network (yep!) data models
- Nesting pointers mean that apps explicitly
traverse data, become brittle when data layouts
change - Apps with persistent data have slow dapp/dt
- And the database environments change faster!
- Logical changes to representation (schema)
- Physical changes in storage (indexes, layouts,
HW) - DBs often shared by multiple apps!
- In B.O.D., all apps had to be rewritten on change
15Its a SW Engineering Thing
- Analogy imagine if your C structs were to
survive for decades - youd keep them very simple
- encapsulation to allow future mods
- Similar Analogy to NWs
- protocol simplicity is good
- soft state is good (discourages hardcoded refs to
transient resources) - But the fun systems part follows directly
- Achieve the goal w/respectable performance over a
dynamic execution environment
16Codds Data Independence
- Ted Codd, IBM c. 1969 and forward
- Turing award 1981
- Two layers of indirection
Applications
Spanned by views and query rewriting
Logical Independence
Logical Representation(schema)
Spanned by queryoptimization andexecution
Physical Independence
Physical Representation (storage)
17A More Architectural Picture
Bridges logical independence
Declarative queryover views
Query Rewriter
Bridges physical independence
Query Processor
Declarative queryover base tables
Optimizer
Query Plan (Procedural)
Executor
N.B. This classical QParchitecture raises
someproblems. To be revisited!
IteratorAPI
Access Methods
18Access Methods Indexing
19Access Methods
- Base data access layer
- Model Data stored in unordered collections
- Relations, tables, one type per collection
- Interface iterators
- Open(predicate) -gt cursor
- Usually simple predicates attribute op constant
- op usually arithmetic (lt, gt, ), though well see
extensions (e.g. multi-d ops) - Next(cursor) -gt datum (of known type)
- Close(cursor)
- Insert(datum of correct type)
- Delete(cursor)
20Typical Access Methods
- Heap files
- unordered array of records
- usually sequential on disk
- predicates just save cross-layer costs
- Traditional Index AMs
- B-trees
- actually, B-trees all data at leaves
- Can scan across leaves for range search
- predicates (lt,gt,, between) result in fewer I/Os
- random I/Os (at least to find beginning of range)
- Linear Hash index
- Litwin 78. Supports equality predicates only.
- This is it for IR and standard relational DBs
- Though when IR folks say indexing, they
sometimes mean all of query processing
21Primary Secondary Indexes
Directory
Directory
Data
(key, ptr) pairs
Data
22Primary Secondary Indexes
Directory
(key, ptr) pairs
23An Exotic Forest of Search Trees
- Multi-dimensional indexes
- For geodata, multimedia search, etc.
- Dozens! E.g. R-tree family, disk-based
Quad-Trees, kdB-trees - And of course linearizations with B-trees
- Path indexes
- For XML and OO path queries
- E.g. Xfilter
- Etc.
- Lots of one-off indexes, often many per workload
- No clear winners here
- Extensible indexing scheme would be nice
24Generalized Search Trees (GiST)
Hellerstein et al., VLDB 95
- What is a (tree-based) DB index? Typically
- A clustering of data into leaf blocks
- Hierarchical summaries (subtree predicates --
SPs) for pointers in directory blocks
25Generalized Search Trees (GiST)
- Can realize that abstraction with simple
interface - User registers opaque SP objects with a few
methods - Consistent(q, p) should query q traverse
subtree? - Penalty(d, p) how bad is it to insert d below p
- Union (p1, p2) form SP that includes p1, p2
- PickSplit(p1, , pn) partition SPs into 2
- Tree maintenance, concurrency, recovery all
doable under the covers - Covers many popular multi-dimensional indexes
- Most of which had no concurrency/recovery story
- http//gist.cs.berkeley.edu
26Some Additional Indexing Tricks
ONeil/Quass, SIGMOD 97
- Bitmap indexing
- Many matches per value in (secondary) index?
Rather than storing pointers to heap file in
leaves, store a bitmap of matches in a (sorted)
heap file. - Only works if file reorg is infrequent
- Can make intersection, COUNT, etc. quicker during
query processing - Can mix/match bitmaps and lists in a single index
- Works with any (secondary) index with duplicate
matches - Vertical Partitioning / Columnar storage
- Again, for sorted, relatively static files
- Bit-slice indexes
27Query Processing Dataflow Infrastructures
28Dataflow Infrastructure
- Dataflow abstraction is very simple
- box-and-arrow diagrams
- (typed) collections of objects flow along edges
- Details can be tricky
- Push or Pull?
- More to it than that
- How do control-flow anddataflow interact?
- Where does the data live?
- Dont want to copy data
- If passing pointers, where doesthe real data
live?
29Iterators
- Most uniprocessor DB engines use iterators
- Open() -gt cursor
- Next(cursor) -gt typed record
- Close(cursor)
- Simple and elegant
- Control-flow and dataflow coupled
- Familiar single-threaded, procedure-call API
- Data refs passed on stack, no buffering
- Blocking-agnostic
- Works w/blocking ops -- e.g. Sort
- Works w/pipelined ops
- Note well-behaved iterators come up for air
in inner loops - E.g. for interrupt handling
g
f
S
R
30Where is the In-Flight Data?
- In standard DBMS, raw data lives in disk format,
in shared Buffer Pool - Iterators pass references to BufPool
- A tuple slot per iterator input
- Never copy along edges of dataflow
- Join results are arrays of refs to base tables
- Operators may pin pages in BufPool
- BufPool never replaces pinned pages
- Ops should release pins ASAP (esp. across Next()
calls!!) - Some operators copy data into their internal
state - Canspill this state to private disk space
31Weaknesses of Simple Iterators
- Evolution of uniprocessor archs to parallel archs
- esp. shared-nothing clusters
- Opportunity for pipelined parallelism
- Opportunity for partition parallelism
- Take a single box in the dataflow, and split it
across multiple machines - Problems with iterators in this environment
- Spoils pipelined parallelism opportunity
- Polling (Next()) across the network is
inefficient - Nodes sit idle until polled, and during comm
- A blocking producer blocks its consumer
- But would like to keep iterator abstraction
- Especially to save legacy query processor code
- And simplify debugging (single-threaded,
synchronous)
32Exchange
Graefe, SIGMOD 90
- Encapsulate partition parallelism asynchrony
- Keep the iterator API between ops
- Exchange operator partitions input data by
content - E.g. join or sort keys
- Note basic architectural idea!
- Encapsulate dataflowtricks in operators,
leavinginfrastructure untouched - Well see this again next week, e.g. in Eddies
33Exchange Internals
- Really 2 operators, XIN and XOUT
- XIN is top of a plan, and pulls, pushing
results to XOUT queue - XOUT spins on its local queue
- One thread becomes two
- Producer graph XIN
- Consumer graph XOUT
- Routing table/fn in XINsupports partition
parallelism - E.g. for sort, join, etc.
- Producer and consumer see iterator API
- Queue thread barrier turns NW-based push into
iterator-style pull
XOUT
XIN
route
Exchange
34Exchange Benefits?
- Remember Iterator limitations?
- Spoils pipelined parallelism opportunity
- solved by Exchange thread boundary
- Polling (Next()) across the network is
inefficient - Solved by XIN pushing to XOUT queue
- A blocking producer blocks its consumer
- Still a problem!
35Exchange Limitations
- Doesnt allow consumer work to overlap w/blocking
producers - E.g. streaming data sources, events
- E.g. sort, some join algs
- Entire consumer graph blocks if XOUT queue empty
- Control flow coupled to dataflow, so XOUT wont
return without data - Queue is encapsulated from consumer
- But
- Note that exchange model is fine for most
traditional DB Query Processing - May need to be extended for new settings
36Fjords
Madden/Franklin, ICDE 01
- Thread of control per operator
- Queues between each operator
- Asynch or synch calls
- Can do asynch poll-and-yield iteration in each
operator (for both consumer and producer) - Or can do synchronous get_next iteration
- Can get traditional behavior if you want
- Synch polls queue of size 1
- ? Iterators
- Synch consumer, asynch producer
- Exchange
- Asynch calls solve the blocking problem of
Exchange
iterator
exchange
37Fjords
- Disadvantages
- Lots of threads
- Best done in an event-programming style, not OS
threads - Operators really have to come up for air
(yield) - Need to write your own scheduler
- Harder to debug
- But
- Maximizes flexibility for operators at the
endpoints - Still provides a fairly simple interface for
operator-writers
38Basic Relational Operators and Implementation
39Relational Algebra Semantics
- Selection sp(R)
- Returns all rows in R that satisfy p
- Projection pC(R)
- Returns all rows in R projected to columns in C
- In strict relational model, remove duplicate rows
- In SQL, preserve duplicates (multiset semantics)
- Cartesian Product R ? S
- Union R ? S Difference R S
- Note R, S must have matching schemata
- Join R p S sp(R ? S)
- Missing Grouping Aggregation, Sorting
40Operator Overview Basics
- Selection
- Typically free, so pushed down
- Often omitted from diagrams
- Projection
- In SQL, typically free, so pushed down
- No duplicate elimination
- Always pass the minimal set of columns downstream
- Typically omitted from diagrams
- Cartesian Product
- Unavoidable nested loop to generate output
- Union
- Concat, or concat followed by dup. elim.
41Operator Overview, Cont.
- Unary operators Grouping Sorting
- Grouping can be done with hash or sort schemes
(as well see) - Binary matching Joins/Intersections
- Alternative algorithms
- Nested loops
- Loop with index lookup (Index N.L.)
- Sort-merge
- Hash Join
- Dont forget have to write as iterators
- Every time you get called with Next(), you adjust
your state and produce an output record
42Unary External Hashing
Bratbergsengen, VLDB 84
- E.g. GROUP BY, DISTINCT
- Two hash functions, hc (coarse) and hf (fine)
- Two phases
- Phase 1 for each tuple of input, hash via hc
into a spill partition to be put on disk - B-1 blocks of memory used to hold output buffers
for writing a block at a time per partition
43Unary External Hashing
- Phase 2 for each partition, read off disk and
hash into a main-memory hashtable via hf - For distinct, when you find a value already in
hashtable, discard the copy - For GROUP BY, associate some agg state (e.g.
running SUM) with each group in the hash table,
and maintain
44External Hashing Analysis
- To utilize memory well in Phase 2, would like
each partition to be B blocks big - Hence works in two phases when B gt ?R
- Same req as external sorting!
- Else can recursively partition the partitions in
Phase 2 - Can be made to pipeline, to adapt nicely to small
data sets, etc.
45Hash Join (GRACE)
Fushimi, et al., VLDB 84
- Phase 1 partition each relation on the join key
with hc, spilling to disk - Phase 2
- build each partition of smaller relation into a
hashtable via hf - scan matching partition of bigger relation, and
for each tuple probe the hashtable via hf for
matches - Would like each partition of smaller relation to
fit in memory - So works well if B gt ?smaller
- Size of bigger is irrelevant!! (Vs. sort-merge
join) - Popular optimization Hybrid hash join
- Partition 0 doesnt spill -- it builds and
probes immediately - Partitions 1 through n use rest of memory for
output buffers - DeWitt/Katz/Olken/Shapiro/Stonebraker/Wood,
SIGMOD 84
46Hash-Join
Original Relations
Partitions
OUTPUT
1
1
2
INPUT
2
hash function hc
. . .
B-1
B-1
B main memory buffers
Disk
Disk
Partitions of R S
Hash table for partition Ri (k lt B-1 pages)
hash
fn
hf
hf
Join Result
Output buffer
Input buffer for Si
B main memory buffers
Disk
47Symmetric Hash Join
Mikillineni Su, TOSE 88 Wilschut Apers,
PDIS 91
- Pipelining, in-core variant
- Build and probe symmetrically
- Correctness Each output tuple generated when
its last-arriving component appears - Can be extended to out-of-core case
- Tukwila Ives HaLevy, SIGMOD 99
- Xjoin Spill and read partitions multiple times
- Correctness guaranteed by timestamping tuples and
partitions - Urhan Franklin, DEBull 00
48Relational Query Engines
49A Basic SQL primer
SELECT DISTINCT ltoutput expressionsgt FROM
lttablesgt WHERE ltpredicatesgt GROUP BY
ltgb-expressiongt HAVING lth-predicatesgt ORDER
BY ltexpressiongt
- Join tables in FROM clause
- applying predicates in WHERE clause
- If GROUP BY, partition results by GROUP
- And maintain aggregate output expressions per
group - Delete groups that dont satisfy HAVING clause
- If ORDER BY, sort output accordingly
50Examples
- Single-table S-F-W
- DISTINCT, ORDER BY
- Multi-table S-F-W
- And self-join
- Scalar output expressions
- Aggregate output expressions
- With and without DISTINCT
- Group By
- Having
- Nested queries
- Uncorrelated and correlated
51A Dopey Query Optimizer
spredicates
- For each S-F-W query block
- Create a plan that
- Forms the cartesian product of the FROM clause
- Applies the WHERE clause
- Incredibly inefficient
- Huge intermediate results!
- Then, as needed
- Apply the GROUP BY clause
- Apply the HAVING clause
- Apply any projections and output expressions
- Apply duplicate elimination and/or ORDER BY
?
tables
52An Oracular Query Optimizer
- For each possible correct plan
- Run the plan (infinitely fast)
- Measure its performance in reality
- Pick the best plan, and run it in reality
53A Standard Query Optimizer
- Three aspects to the problem
- Legal plan space (transformation rules)
- Cost model
- Search Strategy
54Plan Space
- Many legal algebraic transformations, e.g.
- Cartesian product followed by selection can be
rewritten as join - Join is commutative and associative
- Can reorder the join tree arbitrarily
- NP-hard to find best join tree in general
- Selections should (usually) be pushed down
- Projections can be pushed down
- And physical choices
- Choice of Access Methods
- Choice of Join algorithms
- Taking advantage of sorted nature of some streams
- Complicates Dynamic Programming, as well see
55Cost Model Selectivity Estimation
- Cost of a physical operator can be modeled fairly
accurately - E.g. number of random and sequential I/Os
- Requires metadata about input tables
- Number of rows (cardinality)
- Bytes per tuple (physical schema)
- In a query pipeline, metadata on intermediate
tables is trickier - Cardinality?
- Requires selectivity (COUNT) estimation
- Wet-finger estimates
- Histograms, joint distributions and other
summaries - Sampling
56Search Strategy
- Dynamic Programming
- Used in most commercial systems
- IBMs System R Selinger, et al. SIGMOD 79
- Top-Down
- Branch and bound with memoization
- Exodus, Volcano Cascades Graefe, SIGMOD 87,
ICDE 93, DEBull 95 - Used in a few commercial systems (Microsoft SQL
Server, especially) - Randomized
- Simulated Annealing, etc. Ioannidis Kang
SIGMOD 90
57Dynamic Programming
- Use principle of optimality
- Any subtree of the optimal plan is itself optimal
for its sub-expression - Plans enumerated in N passes (if N relations
joined) - Pass 1 Find best 1-relation plan for each
relation. - Pass 2 Find best way to join result of each
1-relation plan (as outer) to another relation.
(All 2-relation plans.) - Pass N Find best way to join result of a
(N-1)-relation plan (as outer) to the Nth
relation. (All N-relation plans.) - This gives all left-deep plans. Generalization
is easy - A wrinkle physical properties (e.g. sort orders)
violate principle of optimality! - Use partial-order dynamic programming
- I.e. keep undominated plans at each step --
optimal for each setting of the physical
properties (each interesting order)
58Relational Architecture Review
Query Parsing and Optimization
Query Executor
Access Methods
Buffer Management
Disk Space Management
DB
59Text Search
60Information Retrieval
- A research field traditionally separate from
Databases - Goes back to IBM, Rand and Lockheed in the 50s
- G. Salton at Cornell in the 60s
- Lots of research since then
- Products traditionally separate
- Originally, document management systems for
libraries, government, law, etc. - Gained prominence in recent years due to web
search - Today simple IR techniques
- Show similarities to DBMS techniques you already
know
61IR vs. DBMS
- Seem like very different beasts
- Under the hood, not as different as they might
seem - But in practice, you have to choose between the 2
IR DBMS
Imprecise Semantics Precise Semantics
Keyword search SQL
Unstructured data format Structured data
Read-Mostly. Add docs occasionally Expect reasonable number of updates
Page through top k results Generate full answer
62IRs Bag of Words Model
- Typical IR data model
- Each document is just a bag of words (terms)
- Detail 1 Stop Words
- Certain words are considered irrelevant and not
placed in the bag - e.g. the
- e.g. HTML tags like ltH1gt
- Detail 2 Stemming
- Using English-specific rules, convert words to
their basic form - e.g. surfing, surfed --gt surf
- Detail 3 we may decorate the words with other
attributes - E.g. position, font info, etc.
- Not exactly bag of words after all
63Boolean Text Search
- Find all documents that match a Boolean
containment expression - Windows AND (Glass OR Door) AND NOT
Microsoft - Note query terms are also filtered via stemming
and stop words - When web search engines say 10,000 documents
found, thats the Boolean search result size.
64Text Indexes
- When IR folks say index or indexing
- Usually mean more than what DB people mean
- In our terms, both tables and indexes
- Really a logical schema (i.e. tables)
- With a physical schema (i.e. indexes)
- Usually not stored in a DBMS
- Tables implemented as files in a file system
65A Simple Relational Text Index
- Create and populate a table
- InvertedFile(term string, docID int64)
- Build a B-tree or Hash index on
InvertedFile.term - May be lots of duplicate docIDs per term
- Secondary index list compression per term
possible - This is often called an inverted file or
inverted index - Maps from words -gt docs
- whereas normal files map docs to the words in the
doc (?!) - Can now do single-word text search queries
66Handling Boolean Logic
- How to do term1 OR term2?
- Union of two docID sets
- How to do term1 AND term2?
- Intersection (ID join) of two DocID sets!
- How to do term1 AND NOT term2?
- Set subtraction
- Also a join algorithm
- How to do term1 OR NOT term2
- Union of term1 and NOT term2.
- Not term2 all docs not containing term2.
Yuck! - Usually forbidden at UI/parser
- Refinement what order to handle terms if you
have many ANDs/NOTs?
67Boolean Search in SQL
Windows AND (Glass OR Door) AND NOT
Microsoft
- (SELECT docID FROM InvertedFile WHERE word
window INTERSECT SELECT docID FROM
InvertedFile WHERE word glass OR word
door)EXCEPTSELECT docID FROM InvertedFile
WHERE wordMicrosoftORDER BY magic_rank() - Really theres only one query (template) in IR
- Single-table selects, UNION, INTERSECT, EXCEPT
- Note that INTERSECT is a shorthand for equijoin
on a key - Often theres only one query plan in the system,
too! - magic_rank() is the secret sauce in the search
engines
68Fancier Phrases and Near
- Suppose you want a phrase
- E.g. Happy Days
- Add a position attribute to the schema
- InvertedFile (term string, docID int64, position
int) - Index on term
- Enhance join condition in query
- Cant use INTERSECT syntax, but query is nearly
the same - SELECT I1.docID FROM InvertedFile I1,
InvertedFile I - WHERE I1.word HAPPY AND I2.word DAYS
AND I1.docID I2.docID AND I2.position -
I1.position 1ORDER BY magic_rank() - Can relax to term1 NEAR term2
- Position lt k off
69Classical Document Ranking
- TF ? IDF (Term Freq. ? Inverse Doc Freq.)
- For each term t in the query
- QueryTermRank occurrences of t in q
TF ?
log((total docs)/(docs with this term)) IDF
? normalization-factor - For each doc d in the boolean result
- DocTermRank occurrences of t in d
TF
? log((total docs)/(docs with this term))
IDF ? normalization-factor
- Rank DocTermRankQueryTermRank
- Requires more to our schema
- InvertedFile (term string, docID int64, position
int, DocTermRank float) - TermInfo(term string, numDocs int)
- Can compress DocTermRank non-relationally
- This basically works fine for raw text
- There are other schemes, but this is the standard
70Some Additional Ranking Tricks
- Phrases/Proximity
- Ranking function can incorporate position
- Query expansion, suggestions
- Can keep a similarity matrix on terms, and
expand/modify peoples queries - Document expansion
- Can add terms to a doc
- E.g. in anchor text of refs to the doc
- Not all occurrences are created equal
- Mess with DocTermRank based on
- Fonts, position in doc (title, etc.)
71Hypertext Ranking
- Also factor in graph structure
- Social Network Theory (Citation Analysis)
- Hubs and Authorities (Clever), PageRank
(Google) - Intuition recursively weighted in-degrees,
out-degrees - Math eigenvector computation
- PageRank sure seems to help
- Though word on the street is that other factors
matter as much - Anchor text, title/bold text, etc.
72Updates and Text Search
- Text search engines are designed to be
query-mostly - Deletes and modifications are rare
- Can postpone updates (nobody notices, no
transactions!) - Updates done in batch (rebuild the index)
- Cant afford to go offline for an update?
- Create a 2nd index on a separate machine
- Replace the 1st index with the 2nd
- Can do this incrementally with a level of
indirection - So no concurrency control problems
- Can compress to search-friendly,
update-unfriendly format - For these reasons, text search engines and DBMSs
are usually separate products - Also, text-search engines tune that one SQL query
to death! - The benefits of a special-case workload.
73Architectural Comparison
Search String Modifier
Ranking Algorithm
The Query
Simple DBMS
The Access Method
OS
Buffer Management
Disk Space Management
Concurrencyand RecoveryNeeded
DB
DBMS
Search Engine
74Revisiting Our IR/DBMS Distinctions
- Data Modeling Query Complexity
- DBMS supports any schema queries
- Requires you to define schema
- Complex query language (hard for folks to learn)
- Multiple applications at output
- RowSet API (cursors)
- IR supports only one schema query
- No schema design required (unstructured text)
- Trivial query language
- Single application behavior
- Page through output in rank order, ignore most of
output - Storage Semantics
- DBMS online, transactional storage
- IR batch, unreliable storage
75Distribution Parallelism
76Roots
- Distributed QP vs. Parallel QP
- Distributed QP envisioned as a k-node intranet
for k 10 - Sound old-fashioned? Think of multiple hosting
sites (e.g. one per continent) - Parallel QP grew out of DB Machine research
- All in one room, one administrator
- Parallel DBMS architecture options
- Shared-Nothing
- Shared-Everything
- Shared-Disk
- Shared-nothing is most general, most scalable
77Distributed QP Semi-Joins
Bernstein/Goodman 79
- Main query processing issue in distributed DB
lit use semi-joins - R S pR(R S)
- Observe that R S ? (R p(S)) S
- Assume each table lives at one site, R is bigger.
To reduce communication - Ship Ss join columns to Rs site, do semijoin
there, ship result to Ss site for the join - Notes
- Im sloppy about dups in my defns above
- Semi-joins arent always a win
- Extra cost estimation task for a distributed
optimizer
78Bloom Joins
Babb, TODS 79
- A constant optimization on semi-joins
- Idea (R p(S)) S is redundant
- Semi-join can safely return false hits from R
- Rather than shipping p(S), ship a superset
- A particular kind of lossy set compression
allowed - Bloom Filter (B. Bloom, 1970)
- Hash each value in a set via k independent hash
functions onto an array of n bits - Check membership correspondingly
- By tuning k and n, can control false hit rate
- Rediscovered recently in web lit, some new
wrinkles (Mitzenmachers compressed B.F.s,
Rheas attenuated B.F.s)
79Sideways Information Passing
- These ideas generalize more broadly
Set o Stuff
CostlySet Generator
80Sideways Information Passing
- These ideas generalize more broadly
- E.g. magic sets rewriting in datalog SQL
- Tricky to do optimally in those settings, but
wins can be very big
Set o Stuff
81Parallelism 101
See DeWitt Gray, CACM 92
- Pipelined vs. Partitioned
- Pipelined typically inter-operator
- Nominal benefits in a dataflow
- Partition typically intra-operator
- E.g. hash join or sort using k nodes
- Speedup Scaleup
- Speedup xold_time/new_time
- Ideal linear
- Scaleup small_sys_elapsed_small_problem /
big_sys_elapse_big_problem - Ideal 1
- Transaction scaleup N times as many TPC-Cs for
N machines - Batch scaleup N times as big a DB for a query on
N machines
82Impediments to Good Parallelism
- Startup overheads
- Amortized for big queries
- Interference
- usually the result of unpredictable communication
delays (comm cost, empty pipelines) - Skew
- Of these, skew is the real issue in DBs
- Embarrassingly parallel
- I.e. it works
83Data Layout
- Horizontal Partitioning
- For each table, assign rows to machines by some
key - Or assign arbitrarily (round-robin)
- Vertical Partitioning
- Sort table, and slice off columns
- Usually not a parallelism trick
- But nice for processing queries on read-mostly
data (projection is free!)
84Intra-Operator Parallelism
- E.g. for Hash Join
- Every site with a horizontal partition of either
R or S fires off a scan thread - Every storage site reroutes its data among join
nodes based on hash of the join column(s) - Upon receipt, each site does local hash join
85Recall Exchange!
86Skew Handling
- Skew happens
- Even when hashing? Yep.
- Can pre-sample and/or pre-summarize data to
partition better - Solving skew on the fly is harder
- Need to migrate accumulated dataflow state
- FLuX Fault-Tolerant, Load-balancing eXchange
87In Current Architectures
- All DBMSs can run on shared memory, many on
shared-nothing - The high end belongs to clusters
- The biggest web-search engines run on clusters
(Google, Inktomi) - And use pretty textbook DB stuff for Boolean
search - Fun tradeoffs between answer quality and
availability/management here (the Inktomi story)
88Precomputations
89Views and Materialization
- A view is a logical table
- A query with a name
- In general, not updatable
- If to be used often, could be materialized
- Pre-compute and/or cache result
- Could even choose to do this for common query
sub-expressions - Neednt require a DBA to say this is a view
90Challenges in Materialized Views
- Three main issues
- Given a workload, which views should be
materialized - Given a query, how can mat-views be incorporated
into query optimizer - As base tables are updated, how can views be
incrementally maintained? - See readings book, Gupta Mumick
91Precomputation in IR
- Often want to save results of common queries
- E.g. no point re-running Britney Spears or
Harry Potter as Boolean queries - Can also use as subquery results
- E.g. the query Harry Potter Loves Britney
Spears can use the Harry Potter and Britney
Spears results - Constrained version of mat-views
- No surprise -- constrained relational workload
- And consistency of matview with raw tables is not
critical, so maintenance not such an issue.
92Precomputed Aggregates
- Aggregation queries work on numerical sets of
data - Math tricks apply here
- Some trivial, some fancy
- Theme replace the raw data with small
statistical summaries, get approximate results - Histograms, wavelets, samples, dependency-based
models, random projections, etc. - Heavily used in query optimizers for selectivity
estimation (a COUNT aggregate) - Spate of recent work on approximate query
processing for AVG, SUM, etc. - Garofalakis, Gehrke, Rastogi tutorial, SIGMOD
02
93A Taste of Next Week
94Query Dataflows Meet NWs
- Some more presentation
- Indirection in space and time
- Thematic similarities, differences in NW/QP
- Revisit a NW classic through a DB lens
- Adaptive QP In Telegraph
- Eddies, Stems, FluX
- A taste of QP in Sensor Networks
- TinyDB (TAG), Directed Diffusion
- A taste of QP in p2p
- PIER project at Berkeley parallel QP over DHTs
- Presentations from MITers?
- Open Discussion
95Contact
- jmh_at_cs.berkeley.edu
- http//www.cs.berkeley.edu/jmh