1
Distributed Data Storage and Processing over
Commodity Clusters
Sector/Sphere
Yunhong Gu, Univ. of Illinois at Chicago
@ Univ. of Chicago, Feb. 17, 2009
2
What is Sector/Sphere?
  • Sector: Distributed Storage System
  • Sphere: Run-time middleware that supports
    simplified distributed data processing.
  • Open source software, GPL, written in C++.
  • Started in 2006; current version is 1.18
  • http://sector.sf.net

3
Overview
  • Motivation
  • Sector
  • Sphere
  • Experimental studies
  • Future work

4
Motivation
Super-computer model: expensive, data I/O
bottleneck
Sector/Sphere model: inexpensive, parallel data I/O
5
Motivation
Parallel/distributed programming with MPI,
etc.: flexible and powerful, but complicated,
with no data locality.
Sector/Sphere model: the cluster appears as a
single entity to the developer; the programming
interface is simplified, and the storage layer
provides data locality support. Limited to
certain data-parallel applications.
6
Motivation
Systems designed for a single data center: require
additional effort to locate and move data.
Sector/Sphere model: supports wide-area data
collection and distribution.
7
Sector: Distributed Storage System
[Architecture diagram: Security Server, Master, Client, and slave nodes. The
Security Server handles user accounts, data protection, and system security;
the Master provides storage system management, processing scheduling, and the
service-provider role; the Client offers system access tools and application
programming interfaces; the slaves provide storage and processing. Control
connections (Client-Master, Master-Security Server) use SSL; data moves
between the Client and the slaves over UDT, with optional encryption.]
8
Sector: Distributed Storage System
  • Sector stores files on the native/local file
    system of each slave node.
  • Sector does not split files into blocks
  • Pro: simple/robust, suitable for wide area
  • Con: file size limit
  • Sector uses replication for better reliability
    and availability
  • The master node maintains the file system
    metadata. No permanent metadata is needed.
  • Topology aware

9
Sector: Write/Read
  • Write is exclusive
  • Replicas are updated in a chained manner: the
    client updates one replica, that replica updates
    the next, and so on. All replicas have been
    updated upon completion of a Write operation
    (see the sketch below).
  • Read: different replicas can serve different
    clients at the same time. The replica nearest to
    the client is chosen whenever possible.
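A rough illustration of the two rules above, as a plain C++ sketch: the
Replica type and its distance field are hypothetical stand-ins, not Sector's
actual data structures.

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical replica descriptor; in Sector each replica lives on a slave node.
struct Replica {
    std::string node;
    int distance_to_client;   // e.g. derived from topology: same rack, same site, remote
    std::string data;
};

// Chained write: the client sends the update to the first replica only; each
// replica then forwards it to the next. The write completes once the whole
// chain has been updated.
void chained_write(std::vector<Replica>& chain, const std::string& new_data) {
    for (auto& r : chain)
        r.data = new_data;    // replica i "forwards" to replica i+1
}

// Read: pick the replica nearest to the client whenever possible.
const Replica& nearest_replica(const std::vector<Replica>& replicas) {
    return *std::min_element(replicas.begin(), replicas.end(),
        [](const Replica& a, const Replica& b) {
            return a.distance_to_client < b.distance_to_client;
        });
}

int main() {
    std::vector<Replica> replicas = {
        {"slave-remote", 10, ""}, {"slave-same-rack", 1, ""}, {"slave-same-site", 3, ""}
    };
    chained_write(replicas, "updated contents");
    std::cout << "read from " << nearest_replica(replicas).node << "\n";  // slave-same-rack
}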

10
Sector: Tools and API
  • Supported file system operations: ls, stat, mv,
    cp, mkdir, rm, upload, download
  • Wildcard characters are supported
  • System monitoring: sysinfo
  • C++ API: list, stat, move, copy, mkdir, remove,
    open, close, read, write, sysinfo (see the
    sketch below).
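To make the listed operations concrete, here is a hedged sketch of a client
session; the SectorClient class, its method names, and its signatures are
hypothetical stand-ins, not the real API, which is documented at
http://sector.sf.net.

#include <iostream>
#include <string>
#include <vector>

// Hypothetical client wrapper mirroring the operations listed above. The real
// Sector API differs in class names and signatures; these stubs only show how
// a session might be structured.
class SectorClient {
public:
    bool login(const std::string& master, int port) {
        std::cout << "login to " << master << ":" << port << "\n";
        return true;
    }
    bool mkdir(const std::string& path) { return op("mkdir", path); }
    bool upload(const std::string& local, const std::string& remote) {
        return op("upload", local + " -> " + remote);
    }
    std::vector<std::string> list(const std::string& path) {
        op("ls", path);
        return {"image001.dat"};            // stubbed directory listing
    }
    bool remove(const std::string& path) { return op("rm", path); }
private:
    bool op(const std::string& name, const std::string& arg) {
        std::cout << name << " " << arg << "\n";
        return true;
    }
};

int main() {
    SectorClient sector;
    sector.login("master.example.org", 6000);
    sector.mkdir("/datasets/sdss");
    sector.upload("image001.dat", "/datasets/sdss/image001.dat");
    for (const auto& f : sector.list("/datasets/sdss"))
        std::cout << "found " << f << "\n";
}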

11
Sphere: Simplified Data Processing
  • Data-parallel applications
  • Data is processed where it resides, or on the
    nearest possible node (locality)
  • The same user-defined function (UDF) can be
    applied to all elements (records, blocks, or
    files)
  • Processing output can be written to Sector files,
    on the same node or on other nodes
  • Generalized Map/Reduce

12
Sphere: Simplified Data Processing
[Diagram: Sphere data flows built from UDFs: a single UDF maps an Input to an
Output; UDFs can be chained, with one UDF producing an Intermediate result
that a second UDF turns into the Output; and a UDF can combine Input 1 and
Input 2 into a single Output.]
13
Sphere: Simplified Data Processing
for each file F in (SDSS datasets)
  for each image I in F
    findBrownDwarf(I, ...)

SphereStream sdss;
sdss.init("sdss files");
SphereProcess* myproc;
myproc->run(sdss, "findBrownDwarf", ...);
myproc->read(result);

findBrownDwarf(char* image, int isize,
               char* result, int rsize)
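For illustration, a self-contained sketch of what the body of a UDF with the
signature above might look like; the "detection" test is a dummy placeholder
and the result-buffer convention is an assumption, not Sphere's actual
contract.

#include <cstdio>
#include <cstring>

// Illustrative UDF body for the signature shown on the slide. The brightness
// test is a dummy stand-in for real brown-dwarf detection.
int findBrownDwarf(char* image, int isize, char* result, int rsize) {
    int hits = 0;
    for (int i = 0; i < isize; ++i)
        if (static_cast<unsigned char>(image[i]) > 200)   // dummy brightness test
            ++hits;
    std::snprintf(result, rsize, "candidates=%d", hits);  // write one output record
    return 0;
}

int main() {
    char image[1024];
    std::memset(image, 0, sizeof(image));
    image[10] = static_cast<char>(230);                   // one fake "bright pixel"
    char result[64];
    findBrownDwarf(image, sizeof(image), result, sizeof(result));
    std::puts(result);                                    // prints: candidates=1
}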
14
Sphere: Data Movement
  • Slave -> Slave (Local)
  • Slave -> Slaves (Shuffle/Hash)
  • Slave -> Client

15
Load Balance & Fault Tolerance
  • The number of data segments is much larger than
    the number of SPEs. When an SPE completes a data
    segment, a new segment is assigned to it.
  • If an SPE fails, the data segment assigned to it
    is re-assigned to another SPE and processed again
    (a minimal scheduling sketch follows this list).
  • Detect and remove "faulty" nodes.
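The scheduling rule above (segments greatly outnumber SPEs, hand out a new
segment on completion, requeue a segment whose SPE fails) can be modeled as a
simple work queue. This is an illustrative simulation, not Sphere's scheduler.

#include <deque>
#include <iostream>
#include <vector>

int main() {
    const int num_segments = 12;   // in practice: far more segments than SPEs
    const int num_spes = 3;

    std::deque<int> pending;
    for (int s = 0; s < num_segments; ++s)
        pending.push_back(s);

    std::vector<int> assigned(num_spes, -1);   // segment on each SPE, -1 = idle
    int completed = 0;
    int tick = 0;

    while (completed < num_segments) {
        // Idle SPEs immediately receive the next pending segment.
        for (int spe = 0; spe < num_spes; ++spe)
            if (assigned[spe] == -1 && !pending.empty()) {
                assigned[spe] = pending.front();
                pending.pop_front();
            }
        // One processing round; SPE 1 "fails" on tick 2 and its segment is requeued.
        for (int spe = 0; spe < num_spes; ++spe) {
            if (assigned[spe] == -1) continue;
            if (spe == 1 && tick == 2) {
                std::cout << "SPE 1 failed, requeue segment " << assigned[spe] << "\n";
                pending.push_back(assigned[spe]);          // will be processed again
            } else {
                ++completed;                               // segment done
            }
            assigned[spe] = -1;
        }
        ++tick;
    }
    std::cout << "all " << num_segments << " segments processed\n";
}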

16
Open Cloud Testbed
  • 4 Racks in Baltimore (JHU), Chicago (StarLight
    and UIC), and San Diego (Calit2)
  • 10Gb/s inter-site connection on CiscoWave
  • 1Gb/s inter-rack connection
  • Each node: two dual-core AMD CPUs, 12GB RAM, a
    single 1TB disk

17
Open Cloud Testbed
18
Example: Sorting a TeraByte
  • Data is split into small files, scattered across
    all slaves
  • Stage 1: On each slave, an SPE scans its local
    files and sends each record to a bucket file on a
    remote node according to the record's key, so
    that the buckets themselves are in sorted order
    (see the sketch after this list).
  • Stage 2: On each destination node, an SPE sorts
    all data inside each bucket.
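The Stage 1 bucket choice (hash on the first 10 bits of the 10-byte key, as
drawn on the next slide) reduces to two shifts; a minimal sketch:

#include <cstdio>

// A 100-byte TeraSort-style record carries a 10-byte key; the first 10 bits
// of the key select one of 1024 buckets, so bucket i holds only keys smaller
// than every key in bucket i+1 and the buckets are sorted relative to each other.
int bucket_of(const unsigned char key[10]) {
    // top 8 bits from key[0], next 2 bits from key[1]  ->  value in [0, 1023]
    return (key[0] << 2) | (key[1] >> 6);
}

int main() {
    unsigned char key[10] = {0xFF, 0xC0, 0, 0, 0, 0, 0, 0, 0, 0};
    std::printf("bucket %d of 1024\n", bucket_of(key));   // prints: bucket 1023 of 1024
}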

19
TeraSort
[Diagram: each 100-byte binary record consists of a 10-byte key and a 90-byte
value. Stage 1: hash each record on the first 10 bits of its key into one of
1024 bucket files (Bucket-0 through Bucket-1023) on the destination nodes.
Stage 2: sort each bucket on its local node.]
20
Performance Results: TeraSort
Run time in seconds, Sector v1.16 vs. Hadoop 0.17

Racks                          Data Size   Sphere   Hadoop (3 replicas)   Hadoop (1 replica)
UIC                            300GB       1265     2889                  2252
UIC, StarLight                 600GB       1361     2896                  2617
UIC, StarLight, Calit2         900GB       1430     4341                  3069
UIC, StarLight, Calit2, JHU    1.2TB       1526     6675                  3702
21
Performance Results: TeraSort
  • Sorting 1.2TB on 120 nodes
  • Hash vs. local sort: 981 sec vs. 545 sec
  • Hash stage:
  • Per rack: 220GB in/out; per node: 10GB in/out
  • CPU: 130%, MEM: 900MB
  • Local sort stage:
  • No network I/O
  • CPU: 80%, MEM: 1.4GB
  • Hadoop: CPU 150%, MEM 2GB

22
CreditStone
[Diagram: each text record has the fields Trans ID, Time, Merchant ID, Fraud,
and Amount. Stage 1: process each record, transforming it into a (key, value)
pair that keeps the Merchant ID, Time, and Fraud fields, and hash it into one
of 1000 bucket files (merch-000 through merch-999) according to a 3-byte
merchant-ID prefix. Stage 2: compute the fraudulent-transaction rate for each
merchant.]
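Stage 2's per-merchant aggregation amounts to counting transactions and fraud
flags per merchant within each bucket; a minimal sketch, with an invented
Transaction struct standing in for the parsed text record.

#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Transaction {
    std::string merchant_id;
    bool fraud;
};

int main() {
    // Records that were hashed into this bucket by merchant-ID prefix.
    std::vector<Transaction> bucket = {
        {"merch-017", false}, {"merch-017", true}, {"merch-017", false},
        {"merch-042", false}, {"merch-042", false},
    };

    std::map<std::string, std::pair<long, long>> counts;   // merchant -> {total, fraud}
    for (const auto& t : bucket) {
        auto& c = counts[t.merchant_id];
        ++c.first;
        if (t.fraud) ++c.second;
    }
    for (const auto& [merchant, c] : counts)
        std::cout << merchant << " fraud rate = "
                  << static_cast<double>(c.second) / c.first << "\n";
}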
23
Performance Results: CreditStone

Racks                     JHU     JHU, SL   JHU, SL, Calit2   JHU, SL, Calit2, UIC
Number of Nodes           30      59        89                117
Size of Dataset (GB)      840     1652      2492              3276
Size of Dataset (rows)    15B     29.5B     44.5B             58.5B
Hadoop (min)              179     180       191               189
Sector with Index (min)   46      47        64                71
Sector w/o Index (min)    36      37        53                55

Courtesy of Jonathan Seidman of Open Data Group.
24
System Monitoring (Testbed)
25
System Monitoring (Sector/Sphere)
26
Future Work
  • High Availability
  • Multiple master servers
  • Scheduling
  • Optimize data channel
  • Enhance compute model and fault tolerance

27
For More Information
  • Sector/Sphere code and docs: http://sector.sf.net
  • Open Cloud Consortium:
    http://www.opencloudconsortium.org
  • NCDM: http://www.ncdm.uic.edu

28
Inverted Index
[Diagram: Stage 1: process each HTML page (e.g. HTML page_1 containing word_x,
word_y, word_y, word_z) and hash each (word, file_id) pair into a bucket
(Bucket-A through Bucket-Z) according to the word's first letter. Stage 2:
sort each bucket on its local node and merge entries for the same word into a
posting list, e.g. word_z appearing in pages 1, 5, and 10 becomes
word_z -> 1, 5, 10.]
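Stage 2 of the inverted-index example boils down to sorting a bucket of
(word, file_id) pairs and merging identical words into posting lists; a
minimal sketch:

#include <algorithm>
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

int main() {
    // Pairs hashed into this bucket by the first letter of the word.
    std::vector<std::pair<std::string, int>> bucket = {
        {"word_z", 1}, {"word_x", 1}, {"word_z", 10}, {"word_y", 5}, {"word_z", 5},
    };

    std::map<std::string, std::vector<int>> index;   // word -> sorted posting list
    for (const auto& [word, file_id] : bucket)
        index[word].push_back(file_id);

    for (auto& [word, postings] : index) {
        std::sort(postings.begin(), postings.end());
        std::cout << word << " ->";
        for (int id : postings)
            std::cout << " " << id;
        std::cout << "\n";                           // e.g. word_z -> 1 5 10
    }
}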