Title: A Tentative Proposal for ISTORE-2
1. A Tentative Proposal for ISTORE-2
July 18, 2000
Winfried W. Wilcke, wilcke_at_almaden.ibm.com, (408) 927-2139, Almaden Research Center
Richard C. Booth, rcbooth_at_us.ibm.com, (408) 927-1879, Almaden Research Center
David A. Patterson, pattrsn_at_cs.berkeley.edu, (510) 642-6587, University of California, Berkeley
2. Underlying Beliefs...
- Commodity components are quickly winning the server wars
- Gigabit Ethernet will win everything
- x86 Processors
- Linux OS will prosper
- Large servers (100-10k nodes) will be quite common, and most are storage centric
- What matters most
- Ease of management, density of nodes and seamless geographical interconnect
3. Generations of IStore
- IStore (IStore-1): Present UCB Project
- IStore-2: Joint Research Prototype
- 2000 nodes
- Split between UCB, IBM and others
- Hardware similar to IStore-1
- Focus on real applications and management software
- Operational YE 2001
- Follow-on Work
4. Talk Outline
- Project Goals
- Applications
- Research Topics
- Hardware Architecture
- Development Schedule
- Working Relationships
- Next Steps
5. Applications, Research Topics
6. Candidate Applications
- Research Focus
- NOAA Severe Weather Warning (R. Arps, ARC)
- Fast Image Recognition (J. Malik, UCB)
- Commercial Focus
- Scalable E-business server (IGS): a must!
- Deep Searching of Entire Web: Webfountain (N. Pass)
- (tbd) Large Scale Network Attached Server (J. Palmer)
- (tbd) Speech Recognition Farms for Phone-based Special Web-services
7. NOAA Severe Weather.... Ron Arps
- Doppler Radar enables detection of violent tornadoes and of the wind shear that causes plane crashes
- Doubled warning time for residents in Oklahoma during '99 class 5 outbreaks
- Goal: 15 minutes avg. warning time in 2004
- Eventually 120 radar sites will be established
- Matches well with I-Store characteristics
- Needs scalable local storage/processing plus seamless transfer of data on geographical scale, manageable from one site
9. Webfountain (Norm Pass)
- Index entire Web every few weeks
- Google, Northernlight index 25%
- 4 TB index, > 200 TB in two years
- 'Miner' technology demonstrated
- Resumes, Prices, Geospatial,...
- Prototype running on a 30 node Linux farm
10. Software Model
- Users will see a standard Linux farm (shared nothing) programming model
- No porting effort for existing Linux farm applications (except dealing with different versions of Linux, of course)
- The system management functions are only visible to system administrators
- The exceptions are performance monitoring functions useful for tuning apps
11. Differences from a Linux Farm
- Much higher spatial density of Nodes or Bricks
- Single network protocol (Ethernet) for ALL off-node communications
- Design with geographical distribution in mind
- Diagnostic Processors
- Lego-like, standardized building blocks
- Regular and relaxed homogeneous
- Monitoring Hardware
- Measuring of relevant environmental parameters
- (New) System Management Language
- AME, SON and RAIN objectives
13. AME, RAIN and SON
- Three areas of system research to be explored with I-Store
- These three areas are largely independent of each other
14. AME
- Availability
- No single points of failure
- Introspection, failover and fast failure
- Fast repair by swapping identical blocks
- Maintainability
- Homogeneous structure
- System management language
- Extensibility/Scalability
- Shared nothing architecture
15. RAIN
- Redundant Array of Inexpensive Network (Switches)
- Issues to be explored
- Optimal topology
- Density/cost of ports, optics vs. copper
- Routing algorithms within a machine
- Need for TCP hardware acceleration
- Performance of Ethernet protocol
- Frame sizes
- Simplified switches
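One way to start on the frame size and TCP acceleration questions is a back-of-the-envelope calculation. The sketch below is not from the proposal: it assumes standard Ethernet framing overhead (preamble, header, FCS, inter-frame gap), IPv4 and TCP headers without options, and full-size frames on a 1 Gb/s link. The function names are hypothetical.

```python
# Per-frame overhead on the wire for standard Ethernet, in bytes.
PREAMBLE_SFD = 8       # preamble plus start-of-frame delimiter
ETH_HEADER = 14        # destination MAC, source MAC, EtherType
FCS = 4                # frame check sequence
INTERFRAME_GAP = 12    # mandatory idle time between frames
WIRE_OVERHEAD = PREAMBLE_SFD + ETH_HEADER + FCS + INTERFRAME_GAP  # 38 bytes

IP_TCP_HEADERS = 40    # assumption: IPv4 (20) + TCP (20), no options


def tcp_payload_efficiency(mtu: int) -> float:
    """Fraction of link capacity that carries TCP payload, full-size frames."""
    payload = mtu - IP_TCP_HEADERS
    return payload / (mtu + WIRE_OVERHEAD)


def frames_per_second(mtu: int, link_bps: float = 1e9) -> float:
    """Full-size frames per second needed to saturate the link."""
    return link_bps / ((mtu + WIRE_OVERHEAD) * 8)


if __name__ == "__main__":
    for mtu in (1500, 9000):  # standard MTU vs. a common jumbo-frame MTU
        print(mtu,
              round(tcp_payload_efficiency(mtu), 3),
              round(frames_per_second(mtu)))
```

Under these assumptions larger frames raise payload efficiency only slightly, but they cut the number of frames a node must process per second by roughly a factor of six, which is the quantity that bears on the need for TCP hardware acceleration.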
16. SON
- Storage Oriented Nodes
- Basic premise of one node = one disk = one processor
- It works in farms, but is it a good general choice?
- Is the loss of flexibility (in the ratio of disks per processor) a good tradeoff for easier management?
17. Additional Software Research Topics...
- Define AME, RAIN, SON benchmarks
- Server Management Language
- Parallel Searching of geographically distributed database
- Dynamic Resource Allocation (e.g. Firewalls)
- SCSI over TCP/IP (SAN within I-Store)
- Storage for mobile users (à la OceanStore)
18. System Management Language
- Define a high-level, interpretive(?) system management language
- May use facilities of system OS
- Highly regular I-Store is the first target
- Sample Verbs
- allocate, protect, share, map, backup, restore, copy, correlate, display, discover, ping, initialize, report, arm, define(node)....
19. System Management Language
- Should easily describe tasks such as
- Back up all data located in the Philippines to Colorado (a volcano is about to blow)
- Set alarm if any disk is more than 80% full
- Define protected subregions in the system
- Display CPU utilization by time and state
- Discover present routing topology
- Show 3D correlation plot of disk vibration vs. brick temperature vs. actual failure events
- .....
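To make one of these tasks concrete, here is a minimal Python sketch of what a management query such as "set alarm if any disk is more than 80% full" might reduce to. The proposal does not define the language, so everything here is an assumption for illustration: the Brick record, its fields, the function names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Brick:
    """Hypothetical per-node record a management layer might keep."""
    name: str
    site: str
    disk_capacity_gb: float
    disk_used_gb: float

    @property
    def disk_fill(self) -> float:
        """Fraction of the brick's disk currently in use."""
        return self.disk_used_gb / self.disk_capacity_gb


def disks_over_threshold(bricks, threshold=0.80):
    """Names of bricks whose disk is strictly more than `threshold` full."""
    return [b.name for b in bricks if b.disk_fill > threshold]


def bricks_at_site(bricks, site):
    """Names of bricks at one geographic site, e.g. as a backup source set."""
    return [b.name for b in bricks if b.site == site]


# Made-up sample inventory, for illustration only.
bricks = [
    Brick("brick-001", "Almaden", 32.0, 30.0),
    Brick("brick-002", "Berkeley", 32.0, 12.0),
    Brick("brick-003", "Almaden", 32.0, 20.0),
]

if __name__ == "__main__":
    print("alarm:", disks_over_threshold(bricks))
    print("at Almaden:", bricks_at_site(bricks, "Almaden"))
```

A regular, homogeneous system keeps such queries simple: every brick exposes the same fields, so a verb like "discover" or "report" can be a uniform filter over the node inventory.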
20. Hardware Architecture, Development Schedule, Working Relationships
21. IStore Hardware Architecture Goals
- Seamless Scalability
- O(10,000) AME Storage Nodes
- Optimized Storage Brick for Packaging Density
- Geographically Dispersed Nodes
- Gb Ethernet Connections to WAN Routers
- Storage Brick
- Full PME Brick: Processor, Memory, Cache
- Gb Ethernet as the Sole Interconnection Fabric
- Embedded Disk with 10s of GBytes
22. IStore Hardware Architecture Goals (cont.)
- State-of-the-art Intel Processor Memory Element (PME)
- 650 MHz Pentium III with 100 MHz System Bus
- 256 KB L2 cache
- O(512MB) main memory
- State-of-the-art Interconnect Fabric
- 1 Gb Ethernet Runtime Network
- 10/100 Mb Ethernet Diagnostic Network
- State-of-the-art Disks
- 2.5" 32 GB drive
23. IStore Hardware Architecture Goals (cont.)
- Berkeley AME Hardware Management Support
- Diagnostic processor
- Environmental sensors
- TCP/IP Hardware Accelerator
- Class 4 Hardware State Machine
- SCSI over TCP ("iSCSI") Support
- Compatible with Standard Ethernet Switches/Routers
24. IStore-1: Current Berkeley Design
- 80 nodes
- AME
- 266 MHz Pentium II
- Four 100 Mb Ethernet Ports/brick
- Integrated UPS
25. IStore-2: Deltas from IStore-1
- Geographically Dispersed Nodes
- O(1000) nodes at Almaden
- O(1000) nodes at Berkeley
- Upgraded Storage Brick
- Pentium III 650 MHz Processor
- Two Gb Ethernet Copper Ports/brick
- One 2.5" ATA disk
- User Supplied UPS Support
- Standard Ethernet Switches
26. Follow-on Work
- Ethernet Sourced in Memory Controller (North Bridge)
- TCP/IP Hardware Accelerator
- Class 4 Hardware State Machine
- SCSI over TCP Support
- Integrated UPS
27. Why an IStore-2 Prototype Is Interesting
- Storage Bricks
- New ratios for MIPS/bandwidth/storage
- New level of density
- AME Hardware Support
- Seamless scaling
- Self maintaining nodes
- It Exists
28. IStore-2: Core Design Team
- IBM (full time)
- System Architect: Winfried Wilcke
- Lead Designer: Richard Booth
- 1 Experienced Hardware Designer tbd
- 3 Designers tbd
- Berkeley
- 6 Graduate Students
29. IStore-2: Development Schedule
- Working Model
- 7/00 Agreement in Principle
- 8/00 Working Team Membership
- Design
- 9/00 Architecture Specification version 1.0
- 11/00 Design Workbook version 1.0
- Implementation
- 2Q/01 First 3 Nodes Power-up
- 3Q/01 O(64) nodes available to users
- 4Q/01 O(2000) nodes available to users
30. IStore-2 Footprint (per 1000 nodes)
- 16 Storage (19") Racks
- 64 Storage bricks/rack
- 8 type 1 storage bricks/drawer
- 8 storage drawers/rack
- Ethernet switches in rack
- 8 Global Ethernet Switch (19") Racks
- Requires 600 sq. ft. lab
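The footprint figures can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in this deck (bricks per drawer, drawers per rack, rack counts, lab area, and the 32 GB drive from the hardware goals). The variable names are illustrative, and the raw capacity figure assumes one 32 GB disk per brick with no redundancy overhead.

```python
BRICKS_PER_DRAWER = 8
DRAWERS_PER_RACK = 8
STORAGE_RACKS = 16
SWITCH_RACKS = 8
LAB_SQ_FT = 600
DISK_GB = 32  # assumption: one 2.5" 32 GB drive per brick

bricks_per_rack = BRICKS_PER_DRAWER * DRAWERS_PER_RACK   # 8 * 8 = 64
total_bricks = bricks_per_rack * STORAGE_RACKS           # 64 * 16 = 1024
total_racks = STORAGE_RACKS + SWITCH_RACKS               # 16 + 8 = 24
sq_ft_per_rack = LAB_SQ_FT / total_racks                 # 600 / 24 = 25.0
raw_capacity_tb = total_bricks * DISK_GB / 1000          # about 32.8 TB

if __name__ == "__main__":
    print(bricks_per_rack, total_bricks, total_racks)
    print(sq_ft_per_rack, raw_capacity_tb)
```

So "per 1000 nodes" corresponds to 1024 bricks in 16 storage racks, and the 600 sq. ft. lab allows about 25 sq. ft. per rack including aisle space.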
31. IStore-2 Platform: Required Resources
- Staffing
- 6 ARC/SSD IBMers
- 6 UCB Graduate Students
- Lab Space
- 600 sq. ft. lab at Almaden
- 600 sq. ft. lab at Berkeley
- Hardware Costs
- $3M (mostly 2001 dollars)
32. IStore-2: Working Model
- Jointly Authored Architecture Specification
- 1 or 2 Almaden authors
- 1 or 2 Berkeley authors
- Design Workbook
- Each Core Team Member owns a section
- Weekly Half Day Working Face-to-face Meetings
- Alternate between Almaden and Berkeley
- Shared Electronic Documentation
- Machine Available -for free- to Users From Either Institution
- IP is Handled Like Previous IBM/UCB Projects ??
- Fabrication (some design ?) Vendored Out
33. Next Steps
- Continue to Seek Feedback on Proposal
- Funding Discussion
- IBM
- Berkeley
- Form IBM Team
- Begin Regular Working Meetings
- Begin Architectural Design