Title: Yakham Ndiaye, Witold Litwin,
AMOS-SDDS: A Scalable Distributed Data Manager for Windows Multicomputers
- Yakham Ndiaye, Witold Litwin,
- Yakham.Ndiaye, Witold.Litwin_at_dauphine.fr
- CERIA Université Paris IX Dauphine
- Tore Risch
- Tore.Risch_at_dis.uu.se
- Uppsala Universitet Dept. of Information Science
AMOS-SDDS: A Scalable Distributed Data Manager for Windows Multicomputers
- A Scalable Distributed Data Structure (SDDS)
- - A file that scales transparently for the application in the distributed RAM of a multicomputer.
- AMOS-II DBMS
- - AMOS-II is an object-relational DBMS with external data source capability.
- Coupling SDDS and AMOS-II
- - Yields a scalable RAM file supporting database queries.
Multicomputers
- A collection of loosely coupled computers
- Shared-nothing architecture
- Message passing through a high-speed network (≥ 100 Mb/s)
- Network multicomputers
- - Use general-purpose networks & PCs
- Switched multicomputers
- - Use a bus or a switch
SDDS
- New data structures designed specifically for multicomputers
- Data are structured
- - Records with keys
- Parallel scans & function shipping
- Data reside on servers
- - Waiting for access
- Overflowing servers split into new servers
- - Appended to the file without informing the clients
- Queries come from multiple autonomous clients
- - Access initiators
- - Not using any centralized directory for access computations
- See http://ceria.dauphine.fr for more
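The split behaviour described above can be illustrated with a small, single-process sketch of a range-partitioned file, where each `Bucket` stands for one server. This is not the AMOS-SDDS code: the class and method names, the bucket capacity, and the choice to split at the median key (moving the upper half to a new bucket) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """One (hypothetical) SDDS server holding the keys in [low, high)."""
    low: float
    high: float
    records: dict = field(default_factory=dict)


class RangeFile:
    """Hypothetical range-partitioned file whose buckets split on overflow."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # assumed maximum number of records per bucket
        # The file starts on a single server covering the whole key space.
        self.buckets = [Bucket(float("-inf"), float("inf"))]

    def _bucket_for(self, key: int) -> Bucket:
        # Exact lookup over all ranges, for simplicity only; an SDDS client
        # instead uses its own, possibly outdated, image of the file.
        for bucket in self.buckets:
            if bucket.low <= key < bucket.high:
                return bucket
        raise KeyError(key)

    def insert(self, key: int, value: object) -> None:
        bucket = self._bucket_for(key)
        bucket.records[key] = value
        if len(bucket.records) > self.capacity:
            self._split(bucket)

    def search(self, key: int) -> object:
        return self._bucket_for(key).records[key]

    def _split(self, bucket: Bucket) -> None:
        keys = sorted(bucket.records)
        median = keys[len(keys) // 2]
        # A new server takes the upper half of the key range and its records.
        new_bucket = Bucket(median, bucket.high)
        for key in keys[len(keys) // 2:]:
            new_bucket.records[key] = bucket.records.pop(key)
        bucket.high = median
        # The new bucket is simply appended to the file; this sketch keeps no
        # client-side state, so no client is informed of the split.
        self.buckets.append(new_bucket)
```

For example, inserting keys 0 to 19 with `capacity=4` spreads the records over several buckets, none holding more than 4 records.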
The AMOS-II DBMS
- AMOS-II: Active Mediating Object System
- A RAM (main-memory) database system.
- Declarative query language AMOSQL.
- External data source capability.
- External programs interface with AMOS-II using:
- - A call-level interface (call-in)
- - Foreign functions (call-out)
- See the AMOS-II page for more:
- http://www.dis.uu.se/udbl/
Coupling SDDS & AMOS-II
- Client/server system.
- Scalable RAM database.
- - Scalable distributed range partitioning
- - Increased storage and processing capabilities
- - Unlimited distributed RAM storage
- - Parallel/distributed queries
- SDDS
- - Distributed RAM storage manager.
- - Communication platform.
- - Efficiently supports key-range queries.
- AMOS-II
- - Database query processor.
- - Imports tuples and stores them locally in AMOS-II.
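Because the file is range-partitioned, a key-range query only has to reach the servers whose ranges overlap the requested interval. The sketch below assumes the client knows the exact split keys, kept in a sorted list `bounds` where server `i` holds `[bounds[i], bounds[i+1])` and the last server holds everything above `bounds[-1]`; the function name and this layout are hypothetical, and an actual SDDS client may work from an outdated image of the file.

```python
from bisect import bisect_right


def servers_for_range(bounds: list[float], lo: float, hi: float) -> list[int]:
    """Return the indices of servers whose key range intersects [lo, hi].

    bounds must be sorted and start with float("-inf"); server i covers
    [bounds[i], bounds[i + 1]), and the last server covers [bounds[-1], +inf).
    """
    if lo > hi:
        return []
    # bisect_right(...) - 1 is the index of the last bound <= the given key,
    # i.e. the server whose range contains that key.
    first = max(bisect_right(bounds, lo) - 1, 0)
    last = max(bisect_right(bounds, hi) - 1, 0)
    return list(range(first, last + 1))
```

With `bounds = [float("-inf"), 100, 200, 300]`, the range `[150, 250]` touches only servers 1 and 2, so the query is sent to two servers instead of four.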
Coupling SDDS & AMOS-II
[Figure: AMOS-SDDS overall architecture]
The Hardware
- Six Pentium III 700 MHz PCs with 256 MB of RAM each, running Windows 2000 on a 100 Mbit/s Ethernet network.
- One machine is used as the client and the other five as servers.
- The file scaled from 1 to 15 servers.
- We run several AMOS-SDDS servers on the same machine (up to 3 per machine).
Benchmark queries
- Benchmark data
- - Table Person (SS, Name, City).
- - Size: 20,000 to 300,000 tuples of 25 bytes.
- - 50 cities.
- - Random distribution.
- Benchmark query: couples of persons in the same city
- - Query 1: the file resides in a single AMOS-II.
- - Query 2: the file resides in AMOS-SDDS.
- Count join: count the couples in the same city
- - To determine the result transfer time to the client
- Join evaluation
- - Multicast, and nested loop or local index lookup.
- Measures
- - Speed-up & scale-up
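To make the benchmark concrete, here is a small single-process Python sketch of the count join over `Person (SS, Name, City)`, evaluated once by nested loop and once by probing a hash index on `City`. Assumptions not stated on the slide: a couple is an unordered pair of distinct persons, cities are drawn uniformly at random, and all names (`make_persons`, `count_couples_*`) are hypothetical. It says nothing about how AMOS-SDDS multicasts the query to its servers.

```python
import random
from collections import defaultdict

Person = tuple[int, str, str]  # (SS, Name, City)


def make_persons(n: int, n_cities: int = 50, seed: int = 0) -> list[Person]:
    """Generate n persons whose City is drawn uniformly from n_cities values."""
    rng = random.Random(seed)
    return [(ss, f"name{ss}", f"city{rng.randrange(n_cities)}") for ss in range(n)]


def count_couples_nested_loop(persons: list[Person]) -> int:
    """Compare every unordered pair once: O(n^2) comparisons."""
    count = 0
    for i in range(len(persons)):
        for j in range(i + 1, len(persons)):
            if persons[i][2] == persons[j][2]:
                count += 1
    return count


def count_couples_index(persons: list[Person]) -> int:
    """Probe a City -> [SS] hash index built on the fly, then add the person to it."""
    index: dict[str, list[int]] = defaultdict(list)
    count = 0
    for ss, _name, city in persons:
        count += len(index[city])  # couples with persons already indexed
        index[city].append(ss)
    return count


persons = make_persons(1_000)
print(count_couples_nested_loop(persons), count_couples_index(persons))
```

Both functions return the same count; the index version does one probe per person instead of comparing all pairs, which is the trade-off the nested-loop and index-lookup evaluations explore.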
Server Query Processing
- E-strategy
- - Data stay external to AMOS-II, within the SDDS bucket
- - Custom foreign functions perform the query
- I-strategy
- - Data are dynamically imported into AMOS-II
- - Possibly with local index creation
- - Deleted after the processing
- - Good for joins
- - AMOS-II performs the query
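The difference between the two strategies is mainly where the data live while the query runs. The following hypothetical sketch models one server's bucket as a list of `(SS, Name, City)` tuples and counts how many local tuples match a list of incoming `City` values. "Importing into AMOS-II" is modelled as a local copy and the temporary index as a per-City count, so no real AMOS-II interface is used or implied.

```python
from collections import Counter

Person = tuple[int, str, str]  # (SS, Name, City)


def e_strategy_count(bucket: list[Person], probe_cities: list[str]) -> int:
    """E-strategy: data stay in the bucket; a custom function scans it per probe."""
    return sum(1 for city in probe_cities for _ss, _name, c in bucket if c == city)


def i_strategy_count(
    bucket: list[Person], probe_cities: list[str], build_index: bool = True
) -> int:
    """I-strategy: import the bucket, optionally index it, query, then discard."""
    local = list(bucket)  # stands in for importing the tuples into the local DBMS
    if build_index:
        # Temporary index on City (here: number of local tuples per City);
        # it goes out of scope, i.e. is dropped, when the function returns.
        city_index = Counter(c for _ss, _name, c in local)
        result = sum(city_index[city] for city in probe_cities)
    else:
        result = sum(1 for city in probe_cities for _ss, _name, c in local if c == city)
    local.clear()  # imported data are deleted after the processing
    return result
```

Both functions compute the same answer; the I-strategy pays for the import and the index build once, then answers each probe with a lookup rather than a scan.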
Speed-up
File of 20,000 records, on AMOS-II and distributed over 1 to 5 AMOS-SDDS servers with the I-strategy.
[Figures: elapsed time of Query 1; elapsed time of Query 2 with the I-strategy; elapsed time per tuple of Query 2 with the I-strategy]
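Speed-up compares the elapsed time on one server with the elapsed time on n servers for the same file, S(n) = T(1) / T(n), so linear speed-up means S(n) = n; the time per tuple is presumably the elapsed time divided by the number of tuples. A minimal helper, with hypothetical names and no real measurements:

```python
def speed_up(elapsed_by_servers: dict[int, float]) -> dict[int, float]:
    """S(n) = T(1) / T(n), given elapsed times keyed by the number of servers n."""
    t1 = elapsed_by_servers[1]
    return {n: t1 / t for n, t in sorted(elapsed_by_servers.items())}


def time_per_tuple(elapsed: float, n_tuples: int) -> float:
    """Elapsed time divided by the number of tuples (same unit as `elapsed`)."""
    return elapsed / n_tuples
```

With made-up times `{1: 100.0, 2: 50.0, 5: 25.0}` the result is `{1: 1.0, 2: 2.0, 5: 4.0}`: linear up to 2 servers, sub-linear at 5.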
Scaling the file size
File of 100,000 records, on AMOS-II or on AMOS-SDDS, processed with the I-strategy over 5 servers.
[Figures: elapsed time of Query 1 on AMOS-II; Query 2 on AMOS-SDDS; performance of AMOS-II and of AMOS-SDDS for a scaling file]
Discussion
- File of 20,000 records: for the nested loop, the improvement ratio is 5.5 times (82% less elapsed time). For the index join, the improvement is about 1.4 times (29%).
- File of 100,000 records: for the nested loop, the improvement ratio is 6.5 times (85%). For the index join, the improvement is about 1.7 times (41%).
- Better scale-up for AMOS-SDDS when scaling the file size by a factor of 5
- - For AMOS-II, the nested-loop elapsed time per tuple increases by a factor of 5, from 13.15 to 65.57 ms (a factor of 4.8 for AMOS-SDDS). For the index join, it increases by a factor of 4.8, from 2.25 to 11.81 ms (4.3 for AMOS-SDDS).
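The percentages in parentheses are consistent with reading them as the saved fraction of elapsed time, 1 - 1/r for an r-fold improvement. A quick check, where the function name `time_saved` is ours:

```python
def time_saved(ratio: float) -> float:
    """Fraction of elapsed time saved by a `ratio`-fold improvement."""
    return 1.0 - 1.0 / ratio


# Improvement ratios quoted on this slide.
for label, ratio in [
    ("20,000 records, nested loop", 5.5),
    ("20,000 records, index join", 1.4),
    ("100,000 records, nested loop", 6.5),
    ("100,000 records, index join", 1.7),
]:
    print(f"{label}: {ratio}x -> {time_saved(ratio):.0%} less time")
```

This prints 82%, 29%, 85% and 41%, matching the four figures above.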
Scaling the number of servers
[Figure legend: Q1, AMOS-SDDS join; Q2, AMOS-SDDS join with count]
[Figures: time per tuple (extrapolated for AMOS-SDDS); expected time per tuple of join queries to AMOS-SDDS]
Discussion
- The file scales to 300,000 tuples.
- It spreads from 1 to 15 AMOS-SDDS servers.
- - Transparently for the application
- Results are extrapolated to 1 server per machine.
- - Basically, the CPU component of the elapsed time is divided by 3, since up to 3 servers currently share a machine
- The extrapolated time per tuple for AMOS-SDDS on a 300,000-tuple file on 15 servers is 12.72 ms.
- - This is 2.9 times better than AMOS-II alone (36.44 ms).
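The extrapolation can be written as a one-line model: only the CPU part of the elapsed time is divided by the number of servers that shared a machine (3 here), and the rest is kept unchanged. The function below is a hypothetical restatement, not the authors' code, and the split of elapsed time into CPU and non-CPU parts would have to come from measurements not shown on these slides.

```python
def extrapolate_one_server_per_machine(
    elapsed: float, cpu: float, servers_per_machine: int = 3
) -> float:
    """Hypothetical model: divide only the CPU component by servers_per_machine."""
    if not 0 <= cpu <= elapsed:
        raise ValueError("cpu must lie between 0 and the elapsed time")
    return (elapsed - cpu) + cpu / servers_per_machine


# Ratio quoted on the slide: AMOS-II alone vs extrapolated AMOS-SDDS (ms per tuple).
print(round(36.44 / 12.72, 1))  # 2.9
```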
Conclusion
- We have coupled an SDDS manager and a RAM DBMS
- AMOS-SDDS provides a scalable, high-performance data repository supporting database queries
- - An important goal studied by various researchers in the past
- We have explored, theoretically and experimentally, various complex design issues & implementation choices
- In particular, performance improves for larger files with respect to AMOS-II alone
- - 2 times better than AMOS-II alone: 18.55 vs 36.44 ms per tuple
- The I-strategy is more efficient for joins than the E-strategy
- Creating a local index on the fly outperforms nested-loop evaluation
- - Despite the cost of creating and dropping the index
- - Not always, however, e.g., for counting (not reported here)
Future Work
- Other types of DBMS queries
- A scalable distributed query decomposer at the client
- AMOS-II as the only local storage manager at the servers
- SD-AMOS prototype
- - SDDS provides the scalable distributed partitioning scheme.
- - The server DBMS performs the splits.
- - The client manages scalable query decomposition & execution.
- - The whole system generalizes parallel DBMS (PDBMS) technology, which supports static partitioning only.