Title: SimMillennium and Beyond: From Computer Systems, Computational Science and Engineering in the Large to Petabyte Stores

Slide 1: SimMillennium and Beyond
From Computer Systems, Computational Science and Engineering in the Large to petabyte stores
- David Culler
- NSF Site Visit
- March 5, 2003
Slide 2: SimMillennium Project Goals
- Vision: to work, think, and study in a computationally rich environment with deep information stores and powerful services
- Enable major advances in Computational Science and Engineering
  - Simulation, Modeling, and Information Processing becoming ubiquitous
- Explore novel design techniques for large, complex systems
  - Fundamental Computer Science problems ahead are problems of scale
  - Organized in concert with University structure -> computational economy
- Develop fundamentally better ways of assimilating and interacting with large volumes of information and with each other
- Explore emerging technologies
  - networking, OS, devices
Slide 3: Research Infrastructure We Built
- Cluster of Clusters (CLUMPS) distributed over multiple departments
  - gigabit Ethernet within and between clusters
  - Myrinet high-speed interconnect
- Vineyard Cluster System Architecture
  - Rootstock: remote cluster installation tools
  - Ganglia: remote cluster monitoring
  - GEXEC: remote execution; GM (Myricom) messaging; MPI
  - PCP: parallel file tools
  - a collection of port daemons and tools to make it all hang together
- Gigabit to desktop, ImmersaDesk, ...
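To give a flavor of the Ganglia monitoring layer listed above: a gmond daemon publishes the cluster's state as one XML dump over TCP (port 8649 is the default, and `load_one` is a standard metric). The glue code below is a minimal sketch, not the project's tooling; the host name and helper names are illustrative.

```python
import socket
import xml.etree.ElementTree as ET

def parse_loads(xml_bytes):
    """Extract each host's one-minute load average from a gmond XML dump."""
    root = ET.fromstring(xml_bytes)
    return {h.get("NAME"): float(m.get("VAL"))
            for h in root.iter("HOST")
            for m in h.iter("METRIC")
            if m.get("NAME") == "load_one"}

def ganglia_snapshot(host="localhost", port=8649):
    """Connect to gmond, read the full XML dump, and return host loads."""
    chunks = []
    with socket.create_connection((host, port), timeout=5) as s:
        while True:  # gmond writes the whole dump, then closes the connection
            data = s.recv(65536)
            if not data:
                break
            chunks.append(data)
    return parse_loads(b"".join(chunks))
```

A monitoring page like the one on slide 6 only needs to poll one gmond per cluster, since every daemon gossips and caches the state of its peers.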
Slide 5: Cluster Counts
- Millennium Central Cluster
  - 99 Dell 2300/6400/6450 Xeon duals/quads: 336 processors
  - total 238 GB memory, 2 TB disk
  - Myrinet 2000 and 1000 Mb fiber Ethernet
- Millennium Campus Clusters (Astro, Math, CE, EE, Physics, Bio)
  - 176 proc, 34 GB mem, 1.2 TB local disk
  - total 512 proc, 292 GB mem, 3.2 TB scratch
- NPACI ROCKS Cluster
  - 8 proc, 2 GB mem, 36 GB disk
- OceanStore/ROC cluster
- PlanetLab Cluster
  - 6 proc at 1.32 GHz, 3 GB mem, 180 GB disk
- CITRIS Cluster 1, 3/2002 deployment (Intel donation)
  - 4 Dell Precision 730 Itanium duals: 8 processors
  - total 8 GB memory, 128 GB disk
  - Myrinet 2000 and 1000 Mb copper Ethernet (SimMillennium)
- CITRIS Cluster 2 deployment (Intel donation)
  - 128 Dell McKinley-class duals: 256 processors
  - 16x2 installed
Slide 6: Cluster Top Users, 2/2003
http://ganglia.millennium.berkeley.edu
- 800 users total on the central cluster
- 84 major users for 2/2003; average 62% total CPU utilization
- ROC: middle-tier storage layer testing/performance (bling, ach, fox@stanford)
- Computer Vision Group: image recognition, boundary detection and segmentation, data mining (aberg, lwalk, dmartin, ryanw, xren); 2 hours on the cluster vs. 2 weeks on local resources
- Computational Biology Lab: large-scale biological sequence database searches in parallel (brenner@compbio)
- Tempest: TCAD tools for Next Generation Lithography (yunfei)
- Internet services: performance characteristics of multithreaded servers (jrvb, jcondit)
- Sensor networks: power reduction (vwen)
- Economic modeling (stanton@haas)
- Machine learning: information retrieval, text processing (blei)
- Analyzing trends in BGP routing tables (sagarwal, mccaesar)
- Graphics: optical simulation and high-quality rendering (adamb, csh)
- Digital Library Project: image retrieval by image content (loretta)
- Bottleneck analysis of fine-grain parallelism (bfields)
- SPUR: earthquake simulation (jspark@ce)
- Titanium: compiler and runtime system design for high-performance parallel programming languages (bonachea)
- AMANDA: neutrino detection from polar ice core samples (amanda)
Slide 7: Impact
- Numerous groups doing research they could not have done without it
  - Malik: photorealistic rendering, physics simulation, ...
  - Yelick: Titanium, heart modeling, ...
  - Wilensky: Digital Library, image segmentation
  - Brewer, Culler: Ninja Internet service architecture, ...
  - Price: AMANDA, ...
  - Kubiatowicz: OceanStore; Katz: Sahara; Hellerstein: PIER
- First eScience portals
  - Tempest EUV lithography and Sugar MEMS simulation services
- safe.millennium.berkeley.edu on Sept. 11
  - built within hours, scaled to a million hits per day
- CS267: core of the MS of computational science X
- Cluster tools widely adopted
  - NPACI ROCKS
  - Ganglia: the most downloaded cluster tool, in all the distributions (OSCAR); open-source development team
Slide 8: Computational Economy
- Developed economics-based resource allocation
  - decentralized design
  - interactive and batch
- Advanced the state of the art
  - controlled experiments with priced and unpriced clusters
  - analysis of utility gain relative to traditional resource-allocation algorithms
- Picked up in several other areas
  - index pricing of internet bandwidth
  - iceberg pricing in the telco/internet merge
  - core to internet design for planetary-scale services
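The slide does not spell out the pricing mechanism, so as one concrete flavor of economics-based allocation, here is a proportional-share sketch: users spend from budgets and receive cluster share in proportion to their bids. The scheme and all names are illustrative, not the project's actual design.

```python
def allocate(bids, capacity):
    """Split cluster capacity among users in proportion to their bids."""
    total = sum(bids.values())
    if total == 0:
        return {user: 0.0 for user in bids}
    return {user: capacity * bid / total for user, bid in bids.items()}

def clearing_price(bids, capacity):
    """Effective price per CPU unit: total spending over total capacity."""
    return sum(bids.values()) / capacity
```

For example, `allocate({"interactive": 30, "batch": 10}, 100)` gives the interactive workload 75 CPUs. Raising any bid raises the clearing price for everyone, which is the feedback that distinguishes a priced cluster from the unpriced one in the controlled experiments above.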
Slide 9: Emergence of Planetary-Scale Services
- In the past year Millennium became THE simulation engine for P2P
  - OceanStore, I3, Sahara, BGP alternatives, PIER
- Ganglia was the technical enabler for PlanetLab
  - > 100 machines at > 50 sites in > 8 countries
  - THE testbed for internet-scale systems research
Slide 10: Fundamental Bottleneck: Storage
- Current storage hierarchy
  - based on the NPACI reference configuration
  - 3 TB local /scratch and /net/MMxx/scratch, 4-day deletion
  - 0.5 TB global NFS /work, 9-day deletion
    - inadequate BW and capacity
  - 4 TB /home and /project
    - uniform naming through automount
    - doesn't scale to cluster access
- -> augment capacity, BW, and metadata BW
- we've been tracking cluster storage options since xFS on NOW and Tertiary Disk in 1995
Slide 11: Another Cluster: a Storage Cluster
[Diagram: the Millennium and CITRIS clusters connect through a scalable GigE core and a Myrinet SAN to massive storage clusters]
- designed for higher reliability
- avoids competition from ongoing computation
- local disks remain heavily used as scratch
Slide 12: Initial Cluster Design with 3.5 TB Distributed File Store
[Diagram: 2 frontend nodes and 128 dual Itanium 2 compute nodes (1 TFlop, 1.6 TB memory) are linked by Myrinet 2000 and Gigabit Ethernet through Foundry 8000/1500 switches to the campus core, and to 4 storage controllers and 2 metaservers backed by 3.5 TB of Fibre Channel storage]
Slide 13: Initial 3.5 TB Cluster Data Store
[Diagram: 2 metaservers and 4 storage controllers, each controller fronting 864 GB of storage; a BlueArc si8300 with 24 36 GB 15K-rpm disks and growth room; interconnected by Fibre Channel, gigabit Ethernet, and Myrinet]
Slide 14: Lustre: A High-Performance, Scalable, Distributed File System for Clusters and Shared-Data Environments
- Progress since xFS
  - TruCluster, GPFS, PVFS, ...
  - need production quality
  - NAS is finally here
- History: CMU, Seagate, Los Alamos, Sandia, Tri-Labs
- Distributed filesystem replacing NFS
- Object-based file storage
  - an object, like an inode, represents a file
- Open-source development managed by Cluster File Systems, Inc.
- Gaining wide acceptance for production high-performance computing
  - PNNL and LLNL
  - Los Alamos and Sandia Labs
  - HP support as part of its Linux cluster effort
  - Intel Enterprise Architecture Lab
Slide 15: Lustre Key Advantages
- Open protocols and standards: Portals API, XML, LDAP
- Runs on commodity PC hardware, with 3rd-party OSTs
  - such as BlueArc
- Uses commodity filesystems on the OSTs
  - such as ext3, JFS, ReiserFS, and XFS
- Scalable and efficient design split:
  - (qty 2) metadata servers storing file system metadata
  - (up to 100) object storage targets storing files
  - to support up to 2000 clients
- Flexible model for adding new storage to an existing Lustre file system
- Metadata server failover
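To illustrate the metadata/object split above: once a client has a file's layout from a metadata server, striping the file's bytes across object storage targets is simple arithmetic. The sketch below shows RAID-0-style round-robin striping; the stripe size and OST count are assumed here, while in Lustre the layout is recorded per file by the metadata servers.

```python
def locate(offset, stripe_size=1 << 20, n_osts=4):
    """Map a logical byte offset to (OST index, offset in that OST's object)."""
    stripe = offset // stripe_size       # which fixed-size stripe holds the byte
    ost = stripe % n_osts                # stripes round-robin across OSTs
    within = offset % stripe_size        # position inside the stripe
    obj_offset = (stripe // n_osts) * stripe_size + within
    return ost, obj_offset
```

With 1 MB stripes over 4 OSTs, the first megabyte lands on OST 0, the next on OST 1, and so on, so a large sequential read fans out across all four targets in parallel; this is why file I/O scales with the number of OSTs while the metadata servers stay out of the data path.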
Slide 16: Lustre Functionality
[Diagram: clients go to the metaservers (metadata servers) for recovery, file status, file creation, directory metadata, and concurrency, and to the storage controllers (object storage targets) for system and parallel file I/O and file locking]
Slide 17: Growth Plan
- based on a conservative 50% per year density growth
- expect capacity to roughly double each year
  - y03: 3.5 TB, 4 SS, 2 MS
  - y04: 8 TB, 6 SS, 3 MS
  - y05: 14 TB, 8 SS, 3 MS
  - y06: 23 TB, 8 SS, 3 MS
  - y07: 35 TB, 8 SS, 3 MS
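As a back-of-envelope check, the planned figures are consistent with 50% annual density growth combined with the growth in storage servers; the per-server baseline below is derived from the y03 row, and everything else is arithmetic.

```python
# (year, planned TB, storage servers) from the growth plan above
plan = [("y03", 3.5, 4), ("y04", 8, 6), ("y05", 14, 8),
        ("y06", 23, 8), ("y07", 35, 8)]
per_server_y03 = 3.5 / 4          # TB per storage server at the y03 baseline

for n, (year, planned_tb, servers) in enumerate(plan):
    # density compounds 50%/year; capacity scales with server count too
    projected = per_server_y03 * 1.5 ** n * servers
    print(f"{year}: {projected:5.1f} TB projected vs {planned_tb} TB planned")
```

The projections land within roughly 13% of every planned figure, with the growth in early years coming mostly from added servers and the later years from density alone.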
Slide 18: Example Projects
- Cluster monitoring trace
  - ¼ TB per year for 300 nodes
- ROC failure data
  - ¼ TB per year; much higher if we get industrial feeds
- Digital Library
- Video
  - 100 GB/hour uncompressed
- Vision
  - 100 GB per experiment
- PlanetLab
  - internet-wide instrumentation and logging

"We will look back and say: we are doing research today that we could not have done without this."
Slide 19: End of the Tape Era
Slide 20: Emergence of the Sensor Net Era
- 100s of research groups and companies using the Berkeley Mote / TinyOS platform
  - dozens of projects on campus
- billions of networked devices connected to the physical world, constantly streaming data
- -> start building the storage and processing infrastructure for this new class of system today!
Slide 21: Environment Monitoring Experience
- Canonical patch-net architecture
- live and historical readings: www.greatduckisland.net
- 43 nodes, 7/13-11/18
  - above and below ground
- light, temperature, relative humidity, and occupancy data at 1-minute resolution
- > 1 million measurements
  - best nodes: 90,000
- 3 major maintenance events
- node design and packaging in a harsh environment
  - -20 to 100 degrees, rain, wind
- power management and its interplay with sensors and the environment
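The deployment numbers above hang together on a quick consistency check; the only assumption here is that the 7/13-11/18 window falls in 2002, the field season before this talk.

```python
from datetime import date

days = (date(2002, 11, 18) - date(2002, 7, 13)).days   # 128 days deployed
slots = days * 24 * 60            # one-minute sample slots per node: 184320
print(slots)
print(90_000 / slots)             # best nodes captured roughly half their slots
print(1_000_000 / 43)             # ~23k readings per node on average
```

So the best nodes sampled for about half the possible minutes, and the fleet-wide average of ~23,000 readings per node reflects the node losses and maintenance events noted above.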
Slide 22: Sample Results
[Figures: node lifetime and utility; effective communication phase; packet loss correlation]