Title: Distributed Data Storage and Processing over Commodity Clusters
1. Distributed Data Storage and Processing over Commodity Clusters
Sector / Sphere
Yunhong Gu, Univ. of Illinois at Chicago; presented at Univ. of Chicago, Feb. 17, 2009
2. What is Sector/Sphere?
- Sector: a distributed storage system.
- Sphere: run-time middleware that supports simplified distributed data processing.
- Open source software, GPL, written in C++.
- Started in 2006; current version 1.18.
- http://sector.sf.net
3. Overview
- Motivation
- Sector
- Sphere
- Experimental studies
- Future work
4. Motivation
Supercomputer model: expensive, with a data I/O bottleneck.
Sector/Sphere model: inexpensive, with parallel data I/O.
5. Motivation
Parallel/distributed programming with MPI, etc.: flexible and powerful, but complicated, with no data locality.
Sector/Sphere model: the cluster appears as a single entity to the developer, with a simplified programming interface and data-locality support from the storage layer. Limited to certain data-parallel applications.
6. Motivation
Systems designed for a single data center: require additional effort to locate and move data.
Sector/Sphere model: supports wide-area data collection and distribution.
7. Sector: Distributed Storage System
[Architecture diagram: the client connects to the security server and the master over SSL; the master provides storage system management, processing scheduling, and service provisioning; the security server handles user accounts, data protection, and system security; the client uses system access tools and application programming interfaces; data moves between the client and the slave nodes over UDT, with optional encryption; the slaves provide storage and processing.]
8. Sector: Distributed Storage System
- Sector stores files on the native/local file system of each slave node.
- Sector does not split files into blocks.
- Pro: simple and robust, suitable for wide-area deployment.
- Con: file size is limited (a file must fit on a single node).
- Sector uses replication for better reliability and availability.
- The master node maintains the file system metadata; no permanent metadata is needed.
- Topology aware.
9. Sector: Write/Read
- Write is exclusive.
- Replicas are updated in a chained manner: the client updates one replica, then that replica updates the next, and so on. All replicas are up to date upon completion of a write operation (a sketch of this pattern follows this slide).
- Read: different replicas can serve different clients at the same time. The replica nearest to the client is chosen whenever possible.
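A minimal sketch of the chained replica update described above, simulated in-process. The Replica type, the write_local stub, and chained_write are illustrative assumptions, not Sector's actual interfaces; in the real system the data moves between slaves over UDT.

#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical replica descriptor; Sector's internal structures differ.
struct Replica { std::string host; };

// Stub standing in for the local disk write on one slave.
static bool write_local(const Replica& r, const char* /*data*/, std::size_t size)
{
    std::printf("%s stored %lu bytes\n", r.host.c_str(), (unsigned long)size);
    return true;
}

// Chained update: replica i stores its copy, then forwards the same data to
// replica i+1; the write is complete only when the last replica has stored it.
// In a real deployment each step runs on a different slave; here the chain is
// simulated in-process by a recursive call.
static bool chained_write(const std::vector<Replica>& chain, std::size_t i,
                          const char* data, std::size_t size)
{
    if (i >= chain.size()) return true;                 // past the last link: done
    if (!write_local(chain[i], data, size)) return false;
    return chained_write(chain, i + 1, data, size);     // hand off to the next link
}

int main()
{
    std::vector<Replica> chain(3);
    chain[0].host = "slave-a";
    chain[1].host = "slave-b";
    chain[2].host = "slave-c";

    const char block[] = "record data";
    // The client contacts only the first replica; the others follow in a chain.
    return chained_write(chain, 0, block, sizeof(block)) ? 0 : 1;
}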
10. Sector: Tools and API
- Supported file system operations: ls, stat, mv, cp, mkdir, rm, upload, download.
- Wildcard characters are supported.
- System monitoring: sysinfo.
- C++ API: list, stat, move, copy, mkdir, remove, open, close, read, write, sysinfo.
11. Sphere: Simplified Data Processing
- Data-parallel applications.
- Data is processed where it resides, or on the nearest possible node (locality).
- The same user-defined function (UDF) can be applied to all elements (records, blocks, or files).
- Processing output can be written to Sector files, on the same node or on other nodes.
- Generalized Map/Reduce.
12. Sphere: Simplified Data Processing
[Diagram: three Sphere processing patterns: a single UDF mapping Input to Output; two chained UDFs mapping Input to an Intermediate result and then to Output; and one UDF combining Input 1 and Input 2 into a single Output.]
13. Sphere: Simplified Data Processing
Serial version:
for each file F in (SDSS datasets)
    for each image I in F
        findBrownDwarf(I, ...);

Sphere version:
SphereStream sdss;
sdss.init("sdss files");
SphereProcess myproc;
myproc->run(sdss, "findBrownDwarf", ...);
myproc->read(result);

UDF signature:
findBrownDwarf(char* image, int isize, char* result, int rsize);
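For illustration, a sketch of what the body of a UDF with the signature above might look like. Only the (image, isize, result, rsize) parameter pattern comes from the slide; the brightness test, the return convention, and the example input are hypothetical.

#include <cstdio>
#include <cstring>

// Illustrative UDF body: scan the raw image bytes, apply a stand-in
// "bright pixel" test, and write a small answer into the caller's buffer.
int findBrownDwarf(char* image, int isize, char* result, int rsize)
{
    int bright = 0;
    for (int i = 0; i < isize; ++i)
        if (static_cast<unsigned char>(image[i]) > 200)   // hypothetical threshold
            ++bright;

    const char* verdict = (bright > 0) ? "candidate" : "none";
    if (rsize > 0) {
        std::strncpy(result, verdict, rsize - 1);
        result[rsize - 1] = '\0';
    }
    return 0;   // assumed convention: 0 on success
}

int main()
{
    unsigned char raw[4] = {10, 250, 20, 30};             // fake pixel values
    char result[16];
    findBrownDwarf(reinterpret_cast<char*>(raw), 4, result, sizeof(result));
    std::printf("%s\n", result);                          // prints "candidate"
    return 0;
}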
14. Sphere: Data Movement
- Slave -> Slave (Local)
- Slave -> Slaves (Shuffle/Hash)
- Slave -> Client
15. Load Balancing & Fault Tolerance
- The number of data segments is much larger than the number of Sphere Processing Engines (SPEs). When an SPE completes a data segment, a new segment is assigned to it (see the scheduling sketch after this list).
- If an SPE fails, the data segment assigned to it is re-assigned to another SPE and processed again.
- Faulty nodes are detected and removed.
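A small, self-contained sketch of the scheduling policy described above, using a plain in-memory queue of segment IDs. The Scheduler type and its method names are illustrative assumptions, not the Sphere scheduler.

#include <deque>
#include <map>
#include <string>

// Illustrative scheduler state: segments waiting to be processed, plus the
// segment each SPE is currently working on.
struct Scheduler {
    std::deque<int> pending;                 // segment IDs not yet assigned
    std::map<std::string, int> running;      // SPE ID -> segment in progress

    // When an SPE becomes idle (its previous segment is done), hand it the
    // next pending segment; returns -1 when nothing is left.
    int on_spe_idle(const std::string& spe) {
        if (pending.empty()) return -1;
        int seg = pending.front();
        pending.pop_front();
        running[spe] = seg;
        return seg;
    }

    // When an SPE fails, its in-progress segment goes back to the queue so
    // another SPE will process it again.
    void on_spe_failure(const std::string& spe) {
        std::map<std::string, int>::iterator it = running.find(spe);
        if (it != running.end()) {
            pending.push_back(it->second);
            running.erase(it);
        }
    }
};

int main()
{
    Scheduler s;
    for (int seg = 0; seg < 6; ++seg) s.pending.push_back(seg);

    s.on_spe_idle("spe-1");                  // spe-1 gets segment 0
    s.on_spe_idle("spe-2");                  // spe-2 gets segment 1
    s.on_spe_failure("spe-1");               // segment 0 returns to the queue
    return s.on_spe_idle("spe-2") == 2 ? 0 : 1;   // spe-2 moves on to segment 2
}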
16. Open Cloud Testbed
- 4 racks in Baltimore (JHU), Chicago (StarLight and UIC), and San Diego (Calit2).
- 10Gb/s inter-site connections over CiscoWave.
- 1Gb/s inter-rack connections.
- Each node: two dual-core AMD CPUs, 12GB RAM, a single 1TB disk.
17. Open Cloud Testbed
18. Example: Sorting a Terabyte
- The data is split into small files scattered across all slaves.
- Stage 1: on each slave, an SPE scans its local files and sends each record to a bucket file on a remote node according to the record's key, so that the buckets themselves are in sorted order.
- Stage 2: on each destination node, an SPE sorts all data inside each bucket (a sketch of the Stage 1 bucketing follows this list).
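A minimal sketch of the Stage 1 bucketing described above and pictured on the next slide: the bucket ID comes from the first 10 bits of the 10-byte key, giving 1024 range-ordered buckets. The record layout follows the slides; the bit extraction and the example are illustrative.

#include <cstdio>

// TeraSort record layout from the slides: 10-byte key + 90-byte value.
struct Record {
    unsigned char key[10];
    unsigned char value[90];
};

// Stage 1 rule: the first 10 bits of the key select one of 1024 buckets,
// so bucket 0 holds the smallest keys and bucket 1023 the largest; sorting
// each bucket locally in Stage 2 then yields a globally sorted result.
int bucket_id(const Record& r)
{
    return (r.key[0] << 2) | (r.key[1] >> 6);   // 8 bits + top 2 bits = 10 bits
}

int main()
{
    Record r = {};
    r.key[0] = 0xAB;                            // 1010 1011
    r.key[1] = 0x40;                            // 01.. ....
    std::printf("bucket = %d\n", bucket_id(r)); // prints "bucket = 685"
    return 0;
}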
19. TeraSort
[Diagram: each binary record is 100 bytes, a 10-byte key followed by a 90-byte value. Stage 1: hash each record into one of 1024 buckets (0-1023) based on the first 10 bits of its key. Stage 2: sort each bucket on its local node.]
20. Performance Results: TeraSort
Run time in seconds, Sector v1.16 vs. Hadoop 0.17.

Racks                          | Data Size | Sphere | Hadoop (3 replicas) | Hadoop (1 replica)
UIC                            | 300GB     | 1265   | 2889                | 2252
UIC + StarLight                | 600GB     | 1361   | 2896                | 2617
UIC + StarLight + Calit2       | 900GB     | 1430   | 4341                | 3069
UIC + StarLight + Calit2 + JHU | 1.2TB     | 1526   | 6675                | 3702
21. Performance Results: TeraSort
- Sorting 1.2TB on 120 nodes.
- Hash vs. local sort: 981 sec vs. 545 sec.
- Hash stage:
- Per rack: 220GB in/out; per node: 10GB in/out.
- CPU: 130%, memory: 900MB.
- Local sort stage:
- No network I/O.
- CPU: 80%, memory: 1.4GB.
- Hadoop: CPU: 150%, memory: 2GB.
22. CreditStone
[Diagram: each text record is a transaction with fields Trans ID, Time, Merchant ID, Fraud flag, and Amount. Stage 1: transform each record into a (key, value) pair whose 3-byte key (000-999) is derived from the merchant ID, and hash it into the corresponding bucket (merch-000 ... merch-999). Stage 2: compute the fraudulent-transaction rate for each merchant.]
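A minimal sketch of the Stage 1 transform just described. Only the idea of hashing each transaction into one of 1000 merchant buckets comes from the slide; the Transaction field types, the modulo-based key derivation, and the bucket file naming are illustrative assumptions.

#include <cstdio>
#include <string>

// Illustrative transaction record; the real CreditStone layout differs.
struct Transaction {
    std::string trans_id;
    std::string time;
    long merchant_id;
    bool fraud;
    double amount;
};

// Stage 1 rule: derive a 3-digit key (000-999) from the merchant ID so that
// all of a merchant's transactions land in the same bucket file; Stage 2 can
// then compute each merchant's fraud rate without any further data movement.
std::string bucket_name(const Transaction& t)
{
    char buf[16];
    std::snprintf(buf, sizeof(buf), "merch-%03ld", t.merchant_id % 1000);
    return std::string(buf);
}

int main()
{
    Transaction t = {"tx-000001", "2007-09-27", 123456, false, 66.49};
    std::printf("%s\n", bucket_name(t).c_str());   // prints "merch-456"
    return 0;
}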
23. Performance Results: CreditStone

Racks                    | JHU | JHU, SL | JHU, SL, Calit2 | JHU, SL, Calit2, UIC
Number of Nodes          | 30  | 59      | 89              | 117
Size of Dataset (GB)     | 840 | 1652    | 2492            | 3276
Size of Dataset (rows)   | 15B | 29.5B   | 44.5B           | 58.5B
Hadoop (min)             | 179 | 180     | 191             | 189
Sector with Index (min)  | 46  | 47      | 64              | 71
Sector w/o Index (min)   | 36  | 37      | 53              | 55

Courtesy of Jonathan Seidman of Open Data Group.
24. System Monitoring (Testbed)
25. System Monitoring (Sector/Sphere)
26. Future Work
- High Availability
- Multiple master servers
- Scheduling
- Optimize data channel
- Enhance compute model and fault tolerance
27. For More Information
- Sector/Sphere code and docs: http://sector.sf.net
- Open Cloud Consortium: http://www.opencloudconsortium.org
- NCDM: http://www.ncdm.uic.edu
28. Inverted Index
[Diagram: Stage 1: process each HTML page and hash each (word, page_id) pair into buckets A-Z keyed by the word's first letter. Stage 2: sort each bucket on its local node and merge entries for the same word, e.g. (word_z, 1), (word_z, 5), (word_z, 10) merge into (word_z, {1, 5, 10}).]
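A small, self-contained sketch of the two stages just pictured, run in-process with standard containers. The tokenizer, the function names, and the driver are illustrative, not the Sphere UDFs used for this benchmark.

#include <cctype>
#include <cstddef>
#include <cstdio>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::pair<std::string, int> > Bucket;   // (word, page_id) pairs

// Stage 1 (per page): emit (word, page_id) pairs and route each pair to a
// bucket keyed by the word's first letter, mirroring the A-Z buckets above.
void stage1(const std::string& text, int page_id, std::map<char, Bucket>& buckets)
{
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        char key = std::toupper(static_cast<unsigned char>(word[0]));
        buckets[key].push_back(std::make_pair(word, page_id));
    }
}

// Stage 2 (per bucket): merge entries for the same word into one posting list.
std::map<std::string, std::set<int> > stage2(const Bucket& bucket)
{
    std::map<std::string, std::set<int> > index;
    for (std::size_t i = 0; i < bucket.size(); ++i)
        index[bucket[i].first].insert(bucket[i].second);
    return index;
}

int main()
{
    std::map<char, Bucket> buckets;
    stage1("data storage data", 1, buckets);     // pretend HTML page 1
    stage1("storage cluster", 5, buckets);       // pretend HTML page 5

    // Print every bucket's posting lists, e.g. "storage: 1 5".
    for (std::map<char, Bucket>::iterator b = buckets.begin(); b != buckets.end(); ++b) {
        std::map<std::string, std::set<int> > idx = stage2(b->second);
        for (std::map<std::string, std::set<int> >::iterator w = idx.begin(); w != idx.end(); ++w) {
            std::printf("%s:", w->first.c_str());
            for (std::set<int>::iterator p = w->second.begin(); p != w->second.end(); ++p)
                std::printf(" %d", *p);
            std::printf("\n");
        }
    }
    return 0;
}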