Title: Exploiting Asynchronous IO using the Asynchronous Iterator Model
1 Exploiting Asynchronous IO using the Asynchronous Iterator Model
- Suresh Iyengar, S. Sudarshan, Santosh Kumar, Raja Agrawal
- IIT Bombay
- Current affiliations: Microsoft Hyderabad, Guruji.com, SAP
2 Agenda
- AIO Background
- Exploiting AIO in query processing
- Asynchronous Iterator model
- Asynchronous Index Nested Loops Join
- Asynchronous versions of other operators
- Performance results
- Related Work
- Conclusion
3 IO Processing: Traditional way
[Figure: timeline between Application and Kernel. Labels: system call Read(), initiate IO, context switch, read response with data; the application is blocked throughout.]
- CPU is idle most of the time, waiting for an IO completion
4 IO Processing: Async. way
[Figure: timeline between Application and Kernel. Labels: system call AIO Read(), initiate IO, read response, notify, data; the application does other work in the meantime.]
5 IO Processing: Async. way
- Asynchronous approach
- Overlap of CPU and IO processing
- Application can generate multiple IO requests
- Allows the IO subsystem to reorder accesses to data on disk
- Important in RAID environments
6 Asynchronous IO Interface
- aio structure: (file descriptor, offset, buffer, numBytes, ...)
- Linux 2.6 kernel
- aio_read(aio structure): request an AIO read operation
- aio_error(aio structure): check the status of an AIO request
- lio_listio(array of aio structures): initiate a list of AIO operations
- We use list AIO in our implementation
- Can initiate multiple IO read operations in one system call
7 Handling AIO completion
- Signal-based handler
- A signal is generated on IO completion
- Callback using interrupts
- An interrupt is generated on IO completion
- Both of the above methods involve concurrent access to the completion handler and shared data structures
- Polling
- Store IO requests in a pending queue and poll periodically for completion
- Call the completion handler for each completed request
- Our experiments show polling beats the signal/interrupt-based approaches
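The polling approach can be sketched roughly as follows. Python's standard library has no wrapper for lio_listio, so as an assumption a thread pool running os.pread stands in for kernel AIO; the names submit_batch, poll_completed and demo, and the 4096-byte page size, are hypothetical and not from the slides.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 4096  # assumed page size for this illustration


def submit_batch(pool, fd, offsets, pending):
    """Issue one read per offset and record each request in the pending queue."""
    for off in offsets:
        # os.pread runs in a worker thread; this stands in for a kernel AIO read.
        pending.append((off, pool.submit(os.pread, fd, PAGE_SIZE, off)))


def poll_completed(pending, on_complete):
    """Check every pending request once; call the handler for finished ones."""
    still_pending = []
    for off, fut in pending:
        if fut.done():
            on_complete(off, fut.result())
        else:
            still_pending.append((off, fut))
    pending[:] = still_pending


def demo():
    """Write four pages to a temp file, read them back via submit + poll."""
    results = {}
    with tempfile.TemporaryFile() as f:
        for i in range(4):
            f.write(bytes([i]) * PAGE_SIZE)
        f.flush()
        pending = []
        with ThreadPoolExecutor(max_workers=4) as pool:
            submit_batch(pool, f.fileno(), [i * PAGE_SIZE for i in range(4)], pending)
            while pending:
                poll_completed(pending, lambda off, data: results.update({off: data}))
                time.sleep(0.001)  # the application could do other work here
    return results
```

In this sketch the completion handler runs in the polling thread itself, so it never races with the code that touches the shared results, which is the concurrency concern raised for the signal and interrupt methods.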
8 Demand-Driven Iterator
[Figure: plan tree with an NLJ node over two scan nodes, one on Table A and one on Table B; the call into a scan is a blocking call.]
- Bottom-level nodes perform operations such as sequential scans or index scans.
- Upper-level nodes are join nodes or other operator nodes such as sort or aggregate.
9 Agenda
- AIO Background
- Exploiting AIO in query processing
- Asynchronous Iterator model
- Asynchronous Index Nested Loop (INL) Joins
- Asynchronous versions of other operators
- Performance results
- Related Work
- Conclusion
10 Asynchronous Iterator
[Figure: plan tree with an NLJ node over two scan nodes, one on Table A and one on Table B; the call into a scan is a non-blocking call.]
- "I don't have the tuple available in memory!"
- Issue AIO read operation
- Return LATER
11 Asynchronous Iterator Model (AIM)
- Allow a node to return a status LATER to the parent
- Instead of blocking for IO completion
- The parent operator could
- Perform other work, such as fetching data from another input
- Simply return a LATER status to its parent node
- Or just loop, reinvoking the child operator till it returns a tuple
- E.g. root of the execution plan tree
- Exact action depends on operator
- Asynchronous versions of different operators
- Focus on Asynchronous Indexed Nested Loops join
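A minimal sketch of the LATER protocol, under stated assumptions: AsyncScan fakes IO latency by answering LATER a fixed number of times per tuple, UnionAll is a hypothetical two-input parent chosen only to show "perform other work on another input", and run_root plays the root that loops until it gets a tuple. None of these names come from the slides or from PostgreSQL.

```python
from enum import Enum


class Status(Enum):
    OK = 1
    LATER = 2
    END_OF_RESULT = 3


class AsyncScan:
    """Toy scan: each tuple needs `delay` calls before its simulated IO completes."""

    def __init__(self, tuples, delay=2):
        self.tuples = list(tuples)
        self.pos = 0
        self.delay = delay
        self.waited = 0

    def next(self):
        if self.pos >= len(self.tuples):
            return Status.END_OF_RESULT, None
        if self.waited < self.delay:
            self.waited += 1  # IO still pending: return LATER instead of blocking
            return Status.LATER, None
        self.waited = 0
        tup = self.tuples[self.pos]
        self.pos += 1
        return Status.OK, tup


class UnionAll:
    """Parent with two inputs: if one says LATER, try the other before giving up."""

    def __init__(self, left, right):
        self.children = [left, right]

    def next(self):
        saw_later = False
        for child in list(self.children):
            status, tup = child.next()
            if status is Status.OK:
                return status, tup
            if status is Status.END_OF_RESULT:
                self.children.remove(child)
            else:
                saw_later = True
        # No input had a tuple ready: pass LATER up, or finish if all inputs ended.
        if saw_later:
            return Status.LATER, None
        return Status.END_OF_RESULT, None


def run_root(plan):
    """Root of the plan: loop, reinvoking the child until it returns a tuple."""
    out, later_count = [], 0
    while True:
        status, tup = plan.next()
        if status is Status.END_OF_RESULT:
            return out, later_count
        if status is Status.LATER:
            later_count += 1
            continue
        out.append(tup)
```

For example, run_root(UnionAll(AsyncScan([1, 2], delay=1), AsyncScan([10], delay=0))) returns the tuples of both inputs while the root sees LATER only once, because the second input covers for the first while its read is pending.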
12 Asynchronous INL Joins
- Original state of Indexed Nested Loops (INL) node
- Left and right subplans and qualifier lists
- Augmented state for async INL node
- An array of outer tuples, each having a queue of matching inner TIDs
- AIO may have been issued for some already, others later
- A workqueue for outer slots which already have AIO issued for their matching inner TIDs
- An IO queue recording all pending AIO requests made by the node
- Used to poll for completion of AIO requests
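The augmented node state listed above might be laid out as below. This is only an illustrative sketch: the class and field names (OuterSlot, AsyncINLState, aio_issued and so on) are hypothetical and do not reflect the actual PostgreSQL structures used by the authors.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class OuterSlot:
    """One outer tuple plus the inner TIDs that still have to be joined with it."""
    outer_tuple: tuple
    inner_tids: deque = field(default_factory=deque)
    aio_issued: bool = False  # AIO may be issued for some slots now, others later


@dataclass
class AsyncINLState:
    """Extra state an async INL node keeps on top of the original INL state."""
    outer_slots: list = field(default_factory=list)  # array of outer tuples
    workqueue: deque = field(default_factory=deque)  # slots whose AIO is already issued
    io_queue: deque = field(default_factory=deque)   # pending AIO requests, polled for completion
```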
13 Asynchronous INL Join (contd.)
- We divide the async INL join operations into two stages
- Stage 1: Fetch outer tuples and issue AIO requests
- Stage 2: Check for AIO completion, process AIO results and return join results
- Stages are interleaved
- Stage 1 may be in progress for some tuples, and Stage 2 for others
14 Asynchronous INL Join (contd.)
Stage 1
Fetch outer tuples
For each outer tuple:
Find the matching inner TIDs
Put the outer tuple in the workqueue
Issue list AIO for the matching inner TIDs of all outer tuples in the workqueue (subject to BATCH_SIZE)
15 Asynchronous INL Join (contd.)
- Rules
- Batch size
- BATCH_SIZE: max number of outstanding AIO requests
- Why? OS limits, efficiency issues
- We set the MAX_BATCH_SIZE per node to 200 in our experiments
- Scale BATCH_SIZE in powers of 2 till MAX_BATCH_SIZE so that async INL can output tuples quickly at the onset
- Case where an outer tuple matches a large number of inner tuples is handled appropriately
- Keeping the AIO queue filled
- We issue further AIO requests (fetching outer tuples as required) if 10% of earlier AIO requests have completed
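The batch-size rules can be illustrated with a small sketch. Two assumptions are made here: the schedule starts at a batch size of 1 (the slides do not give the starting value), and the refill threshold is read as 10 percent (the percent sign appears to have been lost from the slide text). The function names are hypothetical.

```python
MAX_BATCH_SIZE = 200  # per-node cap used in the experiments
REFILL_PERCENT = 10   # assumed reading of the refill threshold


def next_batch_size(current, max_batch=MAX_BATCH_SIZE):
    """Double the batch size, capped at max_batch."""
    return min(current * 2, max_batch)


def batch_schedule(start=1, max_batch=MAX_BATCH_SIZE):
    """Successive batch sizes from `start` (assumed) up to the cap."""
    sizes = [start]
    while sizes[-1] < max_batch:
        sizes.append(next_batch_size(sizes[-1], max_batch))
    return sizes


def should_refill(issued, completed, percent=REFILL_PERCENT):
    """True once at least `percent` of the issued requests have completed."""
    return issued > 0 and completed * 100 >= percent * issued
```

Starting small means the first few join results appear after only a handful of reads, while later batches are large enough to keep the IO subsystem busy.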
16 Asynchronous INL Join (contd.)
Stage 2
For each outer tuple in the workqueue:
Check whether any of its matching inner TIDs are present in memory
If present: remove that inner TID from the outer tuple's TID array, perform the join and add to the result; if a join result is found, break from the loop
If not present: move on to the next outer tuple
Update the workqueue
17 Asynchronous INL Join (contd.)
Any join results? If yes, return the result to the parent node
If no, poll for AIO completion
If a tuple is found, or the parent node cannot handle LATER: go back to the start of Stage 2
Otherwise, if there are no outstanding outer tuples and the end of the outer tuples has been reached: tupStat = END_OF_RESULT, result = NULL
Otherwise: tupStat = LATER, result = NULL
Return result and tupStat to the parent node
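The two stages can be put together in a toy sketch. It rests on several assumptions and simplifications: FakeAIO is a hypothetical stand-in for list AIO in which a requested TID becomes readable after a fixed number of polls, batch_size caps the number of workqueue slots rather than outstanding requests, and after one poll the node simply returns LATER instead of looping back into Stage 2. It is not the authors' PostgreSQL code.

```python
from collections import deque

OK, LATER, END_OF_RESULT = "OK", "LATER", "END_OF_RESULT"


class FakeAIO:
    """Stand-in for list AIO: a requested TID becomes readable after `latency` polls."""

    def __init__(self, table, latency=2):
        self.table, self.latency = table, latency
        self.pending = {}    # tid -> polls remaining
        self.in_memory = {}  # tid -> inner tuple

    def issue(self, tids):
        for tid in tids:
            if tid not in self.in_memory:
                self.pending.setdefault(tid, self.latency)

    def poll(self):
        for tid in list(self.pending):
            self.pending[tid] -= 1
            if self.pending[tid] <= 0:
                self.in_memory[tid] = self.table[tid]
                del self.pending[tid]


class AsyncINLJoin:
    def __init__(self, outer, index, aio, batch_size=2):
        self.outer = iter(outer)  # outer tuples (plain join keys in this toy)
        self.index = index        # join key -> list of matching inner TIDs
        self.aio = aio
        self.batch_size = batch_size
        self.workqueue = deque()  # entries: [outer tuple, deque of unjoined inner TIDs]
        self.outer_done = False

    def _stage1(self):
        """Fetch outer tuples and issue AIO for their matching inner TIDs."""
        while not self.outer_done and len(self.workqueue) < self.batch_size:
            try:
                o = next(self.outer)
            except StopIteration:
                self.outer_done = True
                break
            tids = deque(self.index.get(o, []))
            if tids:
                self.aio.issue(tids)
                self.workqueue.append([o, tids])

    def _stage2(self):
        """Return one join result whose inner tuple is already in memory, if any."""
        for slot in list(self.workqueue):
            o, tids = slot
            for tid in list(tids):
                if tid in self.aio.in_memory:
                    tids.remove(tid)
                    if not tids:
                        self.workqueue.remove(slot)  # slot fully joined
                    return (o, self.aio.in_memory[tid])
        return None

    def next(self):
        self._stage1()
        result = self._stage2()
        if result is not None:
            return OK, result
        self.aio.poll()
        if self.outer_done and not self.workqueue:
            return END_OF_RESULT, None
        return LATER, None


def run(join):
    """Root-style driver: keep calling next() and count LATER responses."""
    out, laters = [], 0
    while True:
        status, row = join.next()
        if status == END_OF_RESULT:
            return out, laters
        if status == LATER:
            laters += 1
        else:
            out.append(row)
```

With an inner table of three tuples, an index mapping "a" to two TIDs and "b" to one, and outer keys "a", "c", "b", the driver sees LATER twice while the reads are in flight and then receives the three join results without further waiting.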
18 Async. versions of other operators
- Async Sequential scan
- Check if next tuple is in the in-memory buffer
- If it's present, return the tuple
- Else initiate an async read, set tupStat = LATER and return
- Out-of-order sequential scan
- Start returning the tuples of a particular relation which are already in memory, even if out of order
- Concurrently, issue AIO for other tuples
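A rough sketch of the out-of-order scan idea, working at page rather than tuple granularity for brevity. The buffer pool is modelled as a plain dict from page number to contents and issue_read is a hypothetical callback that would start an AIO read; both are assumptions for illustration.

```python
class OutOfOrderScan:
    """Return buffered pages first, then the rest as their reads complete."""

    def __init__(self, num_pages, buffer_pool, issue_read):
        self.buffer_pool = buffer_pool
        self.cached = [p for p in range(num_pages) if p in buffer_pool]
        self.missing = [p for p in range(num_pages) if p not in buffer_pool]
        for p in self.missing:
            issue_read(p)  # start AIO for every page not already in memory

    def next(self):
        if self.cached:
            p = self.cached.pop(0)
            return "OK", (p, self.buffer_pool[p])
        for p in self.missing:
            if p in self.buffer_pool:  # this read has completed in the meantime
                self.missing.remove(p)
                return "OK", (p, self.buffer_pool[p])
        if self.missing:
            return "LATER", None  # reads still pending: do not block
        return "END_OF_RESULT", None
```

A scan like this only makes sense when the consumer does not rely on physical order, which ties in with the ordering constraints listed under future work.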
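19 Async. versions of other operators
[Figure: Merge Join over two sort nodes, each over a Seq scan (on T1 and T2). One Seq scan initiates an AIO read and returns LATER, which its sort passes up to the Merge Join: "I can start the sorting of the other input!"]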
20 Performance Results
- Experiments with TPC-H database with scale factors of 1 and 10 in three different setups
- Core 2 duo P4 with
- 1GB RAM and TPC-H 1 GB database (single disk)
- 1GB RAM and TPC-H 10 GB database (single disk)
- 3.2GB RAM and TPC-H 10 GB database (4 disks / RAID 10)
- We use PostgreSQL 8.1.3 as the code base
- Compare it with our modified version of the same code base, incorporating the asynchronous iterator model
- with async INL and async seq. scan
21 Performance Results: 1GB RAM
Query 1a: select l_orderkey, l_quantity from orders, lineitem where o_orderkey = l_orderkey and l_orderkey1002 and l_linestatus = 'F'
[Charts: TPCH 1 GB, TPCH 10 GB]
22 Performance Results: 1 GB RAM
Query 2a: select l_orderkey, l_quantity from orders, lineitem, customer where o_orderkey = l_orderkey and o_custkey = c_custkey and l_orderkey1002 and l_linestatus = 'F'
[Charts: TPCH 10 GB, TPCH 1 GB]
23 Performance Results: 1GB RAM
Query 2a: Join of orders, lineitem and customer with filter (TPCH 1GB)
[Chart annotation: startup effect]
24 Performance Results: 1 GB RAM
Query 2b: select l_orderkey, l_quantity from myorders, lineitem, customer where o_orderkey = l_orderkey and o_custkey = c_custkey -- No tight selection
[Charts: TPCH 1 GB, TPCH 10 GB]
25 Performance Results: 3.2 GB RAID
Query 2a: Join of orders, lineitem and customer with filter
Query 1a: Join of orders and lineitem with filter
[Charts: TPC-H 10GB / 3.2GB RAM / 4 disks RAID10]
26 Performance Results: 3.2 GB RAID
Query 1b: Join of myorders, lineitem
Query 2b: Join of myorders, lineitem and customer
[Charts: TPC-H 10GB / 3.2GB RAM / 4 disks RAID10]
27 Performance Results
TPC-H Q12: select l_shipmode, sum(...) from orders, lineitem where o_orderkey = l_orderkey and <several selections> group by l_shipmode order by l_shipmode
Setup | Original INL | Async INL | Gain
TPCH 1GB, 1GB RAM | 64.7 sec | 48 sec | 25%
TPCH 10 GB, 1GB RAM | 687 sec | 431 sec | 37%
TPCH 10GB, RAID 10 4 disks, 3.2 GB RAM | 164 sec | 147 sec | 10%
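As a quick check of the table, assuming "Gain" means the relative reduction in elapsed time (the slides do not define it), the figures can be recomputed as below. The helper name gain_percent is hypothetical.

```python
def gain_percent(original_sec, async_sec):
    """Relative reduction in elapsed time, as a percentage."""
    return 100.0 * (original_sec - async_sec) / original_sec


# (setup, original INL seconds, async INL seconds) as listed in the table
rows = [
    ("1GB data, 1GB RAM", 64.7, 48.0),
    ("10GB data, 1GB RAM", 687.0, 431.0),
    ("10GB data, RAID 10, 3.2GB RAM", 164.0, 147.0),
]

# Truncate to whole percent, which is how the slide figures appear to be reported.
gains = [int(gain_percent(orig, asyn)) for _, orig, asyn in rows]
```

Under that assumption the truncated values are 25, 37 and 10, matching the Gain column.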
28 Related Work
- Graefe's generalized spool iterator (Graefe, BTW03)
- Pre-fetches multiple outer tuples
- Issues AIO for matching inner TIDs
- Can be replenished when empty or when one tuple is joined
[Figure: plan with INL, Spool operator, Index lookup and scan nodes.]
29 Related Work
- AIO used in database products
- Microsoft SQL Server, IBM DB2, Oracle
- No public documentation on how these systems use AIO
- Asynchronous iteration for evaluating web queries (R. Goldman and J. Widom, SIGMOD 2000)
- They report results only on web queries
30 Conclusion
- Proposed the Asynchronous Iterator Model (AIM)
- Presented asynchronous versions of INL and some other operators
- Showed gains of over 50% in some cases
- AIM can be useful in web-service access and in data integration systems like IBM DataJoiner
- Future work
- Implementing async versions of the index lookup, subplan, sort and merge operators
- Performing async IO in the presence of ordering constraints
31 Thank You. Questions?
32 Plans
- Query 1a
- Seq scan on lineitem, probe on orders
- Merge Join
- -> Index Scan on orders
- -> Sort lineitem
- -> Seq Scan on lineitem
- Query 2a
- Nested Loop
- -> Nested Loop
- -> Seq Scan on lineitem
- -> Index Scan on orders
- -> Index Scan on customer
33 Plans
- Query 2a Merge Join
- -> Sort orders
- -> Merge Join
- -> Index Scan on orders
- -> Sort on lineitem
- -> Seq Scan on lineitem
- -> Index Scan on customer
- Query 2b
- Nested Loop
- -> Nested Loop
- -> Seq Scan on lineitem
- -> Index Scan on myorders
- -> Index Scan on customer