User Patterns from SkyServer


Slides: 6
Provided by: alex268
Learn more at: https://dsf.berkeley.edu


1
User Patterns from SkyServer
  • 1/f power law of session times and request sizes
  • No discrete classes of users!!
  • Users are willing to learn SQL when it gives them an advantage
  • Quickly adapted to the server-side MyDB environment
  • 1600 active users of MyDB
  • Start with small ad-hoc requests
  • Once template finalized, run on whole dataset
  • Share results with others
  • Through the web services interface
  • Frequent use of crawlers
  • Often bucketize requests (checkpointing)
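The bucketized, checkpointed crawl described in the last bullets can be sketched as follows. This is only an illustration of the pattern, not SkyServer's or CasJobs' actual mechanism: `make_buckets`, `crawl`, and the `run_query` callback are hypothetical names, and the checkpoint is simply an in-memory set of finished buckets.

```python
from typing import Callable, Iterator, Set, Tuple

Bucket = Tuple[int, int]  # half-open id range [start, stop)


def make_buckets(lo: int, hi: int, size: int) -> Iterator[Bucket]:
    """Split the id range [lo, hi) into consecutive buckets of at most `size` ids."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(lo, hi, size):
        yield (start, min(start + size, hi))


def crawl(
    lo: int,
    hi: int,
    size: int,
    run_query: Callable[[Bucket], None],  # hypothetical: issues one small request
    done: Set[Bucket],                    # checkpoint: buckets already finished
) -> int:
    """Run `run_query` once per unfinished bucket; return how many were run."""
    executed = 0
    for bucket in make_buckets(lo, hi, size):
        if bucket in done:
            continue  # finished in an earlier run, skip on restart
        run_query(bucket)
        done.add(bucket)  # record the checkpoint only after the request succeeds
        executed += 1
    return executed
```

If `run_query` raises partway through, calling `crawl` again with the same `done` set resumes at the first unfinished bucket instead of repeating the whole scan.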

2
MyDB/CasJobs
Nolan Li, JHU
  • Server side user databases
  • Can perform joins with main databases
  • Users can create their own user-defined functions (UDFs), etc.
  • Every user action is recorded
  • Can share tables, do simple graphics server side
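A join between a user database and a main database might look roughly like the sketch below. It uses SQLite from Python's standard library purely as a stand-in for SQL Server, and every table and column name (`photo_obj`, `my_targets`, `obj_id`, `ra_deg`, `dec_deg`, `note`) is an assumption made up for illustration, not the real SkyServer schema or CasJobs syntax.

```python
import sqlite3

# SQLite stands in for SQL Server here; all schema names are hypothetical.
conn = sqlite3.connect(":memory:")                  # plays the main catalog database
conn.execute("ATTACH DATABASE ':memory:' AS mydb")  # plays the user's MyDB

# A tiny "main" catalog table.
conn.execute(
    "CREATE TABLE main.photo_obj (obj_id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL)"
)
conn.executemany(
    "INSERT INTO main.photo_obj VALUES (?, ?, ?)",
    [(1, 10.0, -1.0), (2, 20.0, 2.0), (3, 30.0, 5.0)],
)

# A small user-owned table living in the user database.
conn.execute("CREATE TABLE mydb.my_targets (obj_id INTEGER, note TEXT)")
conn.executemany(
    "INSERT INTO mydb.my_targets VALUES (?, ?)",
    [(1, "candidate"), (3, "follow-up")],
)

# Join the user table against the main catalog, server side.
rows = conn.execute(
    """
    SELECT p.obj_id, p.ra_deg, p.dec_deg, t.note
    FROM mydb.my_targets AS t
    JOIN main.photo_obj AS p ON p.obj_id = t.obj_id
    ORDER BY p.obj_id
    """
).fetchall()

for row in rows:
    print(row)
```

The point of the pattern is that only the small user table and the final result ever belong to the user; the large catalog never leaves the server.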

[Diagram labels: MyDB, DR4, DR5, DR6, S4]

3
JHU Petascale Archive
  • We are building a distributed SQL Server cluster
    exceeding 1 Petabyte
  • Just becoming operational
  • 40x8-core servers with 22TB each, 6x16-core
    servers with 33TB each, connected with
    Infiniband
  • 10Gbit lambda uplink to StarTap
  • Funded by Moore Foundation, Microsoft and the
    Pan-STARRS project
  • Dedicated to eScience; will provide public access
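As a quick check on the "exceeding 1 Petabyte" figure, the server counts on this slide can be added up directly. The sketch below assumes the slide's per-server numbers are raw capacities and takes 1 PB = 1000 TB; `SERVER_GROUPS` and `cluster_totals` are names introduced here for illustration.

```python
# (server count, cores per server, storage per server in TB), as listed on the slide
SERVER_GROUPS = [(40, 8, 22), (6, 16, 33)]


def cluster_totals(groups):
    """Return (total servers, total cores, total storage in TB)."""
    servers = sum(count for count, _, _ in groups)
    cores = sum(count * per_server for count, per_server, _ in groups)
    storage_tb = sum(count * tb for count, _, tb in groups)
    return servers, cores, storage_tb


servers, cores, storage_tb = cluster_totals(SERVER_GROUPS)
print(servers, cores, storage_tb)  # 46 416 1078
```

That is 880 TB from the 8-core servers plus 198 TB from the 16-core servers, or 1078 TB in total, which is consistent with the petabyte claim.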

4
Amdahl's Laws
  • Gene Amdahl (1965): laws for a balanced system
  • Parallelism: max speedup is (S+P)/S, for serial part S and parallel part P
  • One bit of IO/sec per instruction/sec (BW)
  • One byte of memory per one instruction/sec (MEM)
  • One IO per 50,000 instructions (IO)
  • Modern multi-core systems move farther away from
    Amdahl's Laws (Bell, Gray and Szalay 2006)
  • For a Blue Gene, BW=0.013, MEM=0.471
  • For the JHU cluster, BW=0.664, MEM=1.099
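The balance ratios on this slide are simple quotients, so they are easy to compute for any machine. The sketch below is one reading of the definitions: speedup is bounded by (S+P)/S for serial part S and parallel part P, BW is bits of IO per second divided by instructions per second, and MEM is bytes of memory divided by instructions per second. The function names and the example machine are hypothetical; the numbers are not the Blue Gene or JHU configurations.

```python
def max_speedup(serial: float, parallel: float) -> float:
    """Upper bound on speedup when only the parallel part can be accelerated."""
    return (serial + parallel) / serial


def amdahl_bw(io_bits_per_sec: float, instructions_per_sec: float) -> float:
    """IO balance: bits of IO per second per instruction per second (balanced = 1)."""
    return io_bits_per_sec / instructions_per_sec


def amdahl_mem(memory_bytes: float, instructions_per_sec: float) -> float:
    """Memory balance: bytes of memory per instruction per second (balanced = 1)."""
    return memory_bytes / instructions_per_sec


def instructions_per_io(instructions_per_sec: float, ios_per_sec: float) -> float:
    """Instructions executed per IO operation (balanced = about 50,000)."""
    return instructions_per_sec / ios_per_sec


# Made-up machine: 1e10 instructions/s, 2.5e9 bits/s of IO, 8e9 bytes of memory.
print(max_speedup(1.0, 9.0))      # 10.0
print(amdahl_bw(2.5e9, 1e10))     # 0.25
print(amdahl_mem(8e9, 1e10))      # 0.8
```

A value well below 1 for BW or MEM means the processors can issue instructions faster than the IO system or memory can feed them, which is the imbalance the slide attributes to modern multi-core systems.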

5
Components
  • Data must be heavily partitioned
  • It must be simple to manage
  • Distributed SQL Server cluster
  • Management tools
  • Configuration tools
  • Workflow environment for loading/system jobs
  • Workflow environment for user requests
  • Provide crawler framework
  • Both SQL and procedural languages
  • User workspace environment (MyDB)