ISTORE: Introspective Storage for Data-Intensive Network Services

1
ISTORE: Introspective Storage for Data-Intensive
Network Services
  • Aaron Brown, David Oppenheimer, Kimberly Keeton,
  • Randi Thomas, Noah Treuhaft, John Kubiatowicz,
    Kathy Yelick, and David Patterson
  • Computer Science Division
  • University of California, Berkeley
  • http://iram.cs.berkeley.edu/istore/

2
Technical Problem to Tackle?
  • Build HW/SW to provide a scalable, available,
    maintainable (SAM) server that is dedicated to
    a single data-intensive application
  • Current state-of-the-art emphasis is on
    cost-performance, with SAM largely ignored
  • Cost of administration per year is typically 3X
    the cost of the disks
  • RAID and Tandem-like availability are based on
    fail-fast components; nothing fails fast today
  • Themis.cs outage: relied on humans to watch the
    system's behavior and invoke the proper response;
    what if they take a vacation day or go to the
    dentist?

3
ISTORE-1 Hardware Prototype
  • Based on intelligent disk bricks (64 nodes)
  • fast embedded CPU performs local monitoring
    tasks and runs parallel application code (see the
    sketch below)
  • diagnostic hardware provides fail-fast behavior,
    self-testing, and additional monitoring →
    Introspection
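
As a rough illustration of the brick-local monitoring idea, the
hypothetical Python sketch below samples device statistics and reports
them; the metric names and reporting interface are assumptions, not
ISTORE's actual API.

  import random
  import time

  def read_disk_stats():
      # Stand-in for reading the brick's SCSI and environmental counters.
      return {"temp_c": 30 + 10 * random.random(),
              "io_per_s": random.randint(0, 500)}

  def report(stats):
      # Stand-in for publishing into the monitoring framework.
      print(stats)

  if __name__ == "__main__":
      for _ in range(3):              # a real monitor would loop forever
          report(read_disk_stats())
          time.sleep(1)               # sampling interval is a policy knob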

4
A Software Framework for Introspection
  • ISTORE hardware provides device monitoring
  • ISTORE software framework should simplify writing
    introspective applications
  • Rule-based adaptation engine encapsulates the
    mechanisms of collecting and processing monitoring
    data
  • Maintainability information is stored in a
    database
  • Policy compiler and mechanism libraries help turn
    application adaptation goals into rules and
    reaction code
  • These provide a high-level, abstract interface to
    the system's monitoring and adaptation mechanisms

5
What is our plan for success?
  • One-year view:
  • Run data-intensive kernels (sort, hash-join, ...)
    on a small cluster of PCs to establish a
    performance baseline
  • 3 OSs (Linux, FreeBSD, NetBSD) for genetic
    diversity
  • Install Berkeley DB to collect maintainability
    data
  • Learn about adaptability theory from Michael
    Jordan
  • Invent SAM benchmarks
  • Construct ISTORE-1
  • Three-year view:
  • Based on lessons learned, construct ISTORE-2
  • Policy-based monitoring and reaction to SAM
    events

6
Status and Conclusions
  • ISTORE's focus is on introspective systems
  • a new perspective on systems research priorities
  • Proposed framework for building introspection
  • intelligent, self-monitoring plug-and-play
    hardware
  • software that provides a higher level of
    abstraction for the construction of introspective
    systems
  • flexible, powerful rule system for monitoring
  • policy specification automates generation of
    adaptation
  • Status:
  • ISTORE-1 hardware prototype being constructed now
  • software prototyping just starting

7
Backup Slides
8
Related Work
  • Hardware
  • CMU and UCSB Active Disks
  • Software
  • Adaptive databases: MS AutoAdmin, Informix
    NoKnobs
  • Adaptive OSs: MS Millennium, adaptive VINO
  • Adaptive storage: HP AutoRAID, attribute-managed
    storage
  • Active databases: UFL Gator, TriggerMan
  • ISTORE unifies many of these techniques in a
    single system

9
ISTORE-1 Hardware Design
  • Brick:
  • processor board:
  • mobile Pentium II, 366 MHz, 128 MB SDRAM
  • PCI and ISA buses/controllers, SuperIO (serial
    ports)
  • Flash BIOS
  • 4 x 100 Mb Ethernet interfaces
  • Adaptec Ultra2-LVD SCSI interface
  • disk: one 18.2 GB 10,000 RPM low-profile SCSI
    disk
  • diagnostic processor

10
ISTORE-1 Hardware Design (2)
  • Network:
  • primary data network:
  • hierarchical, highly-redundant switched Ethernet
  • uses 16 20-port 100Mb switches at the leaves
  • each brick connects to 4 independent switches
  • root switching fabric is two ganged 25-port
    Gigabit switches (PacketEngines PowerRails)
  • diagnostic network

11
Diagnostic Support
  • Each brick has a diagnostic processor
  • Goal: a small, independent, trusted piece of
    hardware running hand-verifiable
    monitoring/control software
  • monitoring: CPU watchdog, environmental
    conditions (see the sketch after this list)
  • control:
  • reboot/power-cycle the main CPU
  • inject simulated faults: power, bus transients,
    memory errors, network interface failure, ...
  • Separate diagnostic network connects the
    diagnostic processors of each brick
  • provides an independent network path to the
    diagnostic CPU
  • works when the brick CPU is powered off or has
    failed
  • separate failure modes from the Ethernet
    interfaces
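
A minimal sketch of the watchdog half of this logic, assuming
hypothetical helper names (the slides do not show the actual diagnostic
firmware): if the main CPU stops answering heartbeats, power-cycle it
and report the event over the independent diagnostic network.

  import time

  HEARTBEAT_TIMEOUT_S = 5.0

  def cpu_heartbeat_ok():
      # Stand-in: poll the main CPU over the serial link.
      return True

  def power_cycle_cpu():
      # Stand-in: toggle the brick's power-control line.
      print("power-cycling main CPU")

  def notify_diagnostic_network(event):
      # Stand-in: report over the CAN-based diagnostic network,
      # which works even when the main CPU is down.
      print("event:", event)

  def watchdog_loop(poll_s=1.0):
      last_ok = time.monotonic()
      while True:
          if cpu_heartbeat_ok():
              last_ok = time.monotonic()
          elif time.monotonic() - last_ok > HEARTBEAT_TIMEOUT_S:
              notify_diagnostic_network("cpu-watchdog-expired")
              power_cycle_cpu()
              last_ok = time.monotonic()
          time.sleep(poll_s)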

12
Diagnostic Support Implementation
  • Not-so-small embedded Motorola 68k processor
  • provides the flexibility needed for research
    prototype
  • can communicate with CPU via serial port, if
    desired
  • still can run just a small, simple monitoring and
    control program if desired (no OS, networking,
    etc.)
  • CAN (Controller Area Network) diagnostic
    interconnect
  • one brick per shelf of 8 acts as a gateway from
    the CAN to the redundant switched Ethernet fabric
  • CAN connects directly to automotive environmental
    monitoring sensors (temperature, fan RPM, ...)

13
ISTORE Research Agenda
  • ISTORE goal: create a hardware/software
    framework for building introspective servers
  • Hardware
  • Software toolkit that allows programmers to
    easily define the system's adaptive behavior
  • provides abstractions for manipulating and
    reacting to monitoring data

14
Rule-based Adaptation
  • ISTORE's adaptation framework is built on the
    model of an active database
  • database includes:
  • hardware monitoring data: device status, access
    patterns, performance stats
  • software monitoring data: app-specific
    quality-of-service metrics, high-level workload
    patterns, ...
  • applications define views and triggers over the
    DB
  • views select and aggregate data of interest to
    the app.
  • triggers are rules that invoke
    application-specific reaction code when their
    predicates are satisfied
  • SQL-like declarative language used to specify
    views and trigger rules (see the sketch below)
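
The slides do not give the actual rule syntax; the hypothetical Python
sketch below illustrates the idea, with SQL-like view and trigger text
handed to an assumed rule engine and application-supplied reaction code.

  VIEW_HOT_DISKS = """
      CREATE VIEW hot_disks AS
        SELECT node_id, AVG(temp_c) AS avg_temp
        FROM hw_monitor GROUP BY node_id
  """

  TRIGGER_OVERHEAT = """
      CREATE TRIGGER overheat ON hot_disks
        WHEN avg_temp > 55
        CALL migrate_data_off_node(node_id)
  """

  def migrate_data_off_node(node_id):
      # Application-specific reaction code the trigger would invoke.
      print("migrating data off node", node_id)

  # engine.define(VIEW_HOT_DISKS)    # 'engine' is a hypothetical
  # engine.define(TRIGGER_OVERHEAT)  # rule-engine interface

Because the view isolates the application from raw monitoring data, it
can be redefined later without touching the reaction code, which is the
benefit the next slide makes explicit.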

15
Benefits of Views and Triggers
  • Allow applications to focus on adaptation, not
    monitoring
  • hide the mechanics of gathering and processing
    monitoring data
  • can be dynamically redefined as the situation
    changes, without altering adaptation code
  • Can be implemented without a real database
  • views and triggers are implemented as
    device-local and distributed filters and reaction
    rules
  • defined views and triggers control the frequency,
    granularity, and types of data gathered by HW
    monitoring
  • no materialized database is necessary (see the
    sketch below)
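
One way to realize this without a materialized database, as a
hypothetical Python sketch (all names assumed): compile each trigger
into a brick-local filter that also tells the hardware monitor what to
sample and how often.

  def make_local_filter(predicate, reaction, sample_period_s):
      # Compile one trigger into a device-local filter rule; the period
      # tells the hardware monitor how often to sample.
      def local_filter(sample):
          if predicate(sample):
              reaction(sample)
      return local_filter, sample_period_s

  # Only temperature matters to this trigger, so the brick can gather
  # just that statistic, at the requested granularity.
  overheat_filter, period = make_local_filter(
      predicate=lambda s: s["temp_c"] > 55,
      reaction=lambda s: print("overheat on node", s["node_id"]),
      sample_period_s=5,
  )
  overheat_filter({"node_id": 12, "temp_c": 61.0})  # -> overheat on node 12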

16
Raising the Level of Abstraction: Policy Compiler
and Mechanism Libs
  • Rule-based adaptation doesn't go far enough
  • application designer must still write views,
    triggers, and adaptation code by hand
  • but the designer thinks in terms of system
    policies
  • Solution: designer specifies policies to the
    system; the system implements them
  • policy compiler automatically generates views,
    triggers, and adaptation code
  • uses preexisting mechanism libraries to implement
    adaptation algorithms
  • claim: feasible for the common adaptation
    mechanisms needed by data-intensive network
    service apps (see the sketch below)
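
A deliberately tiny sketch of this pipeline, with hypothetical names
throughout (the real policy language and mechanism libraries are open
research issues, per the slides that follow): the designer states a
goal, and the compiler emits a trigger rule bound to a preexisting
mechanism-library routine.

  POLICY = {"goal": "availability", "min_replicas": 3}

  def compile_policy(policy):
      # Turn a high-level goal into a trigger predicate plus a reaction
      # drawn from an assumed mechanism library.
      trigger = f"replica_count < {policy['min_replicas']}"
      reaction = "mechanism_lib.re_replicate"
      return {"trigger": trigger, "reaction": reaction}

  print(compile_policy(POLICY))
  # {'trigger': 'replica_count < 3',
  #  'reaction': 'mechanism_lib.re_replicate'}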

17
Open Research Issues
  • Defining appropriate software abstractions
  • how should views and triggers be declared?
  • what is the system's schema?
  • how should heterogeneous hardware be integrated?
  • can it be extended by the user to include new
    types and statistics?
  • what should the policy language look like?
  • what level of policies can be expressed?
  • how much of the implementation can the system
    figure out automatically?
  • to what extent can the system reason about
    policies and their interactions?
  • what functions should mechanism libraries provide?

18
More Open Research Issues
  • Implementing an introspective system
  • what default policies should the system supply?
  • what are the internal and external interfaces?
  • debugging
  • visualization of states, triggers, ...
  • simulation/coverage analysis of policies,
    adaptation code
  • appropriate administrative interfaces
  • Measuring an introspective system
  • what are the right benchmarks for
    maintainability, availability, scalability?
  • O(>1000)-node scalability
  • how to write applications that scale and run well
    despite continual state of partial failure?

19
Motivation: Technology Trends
  • Disks, systems, switches are getting smaller
  • Convergence on intelligent disks (IDISKs)
  • MicroDrive + system-on-a-chip → tiny IDISK nodes
  • Inevitability of enormous-scale systems
  • by 2006, an O(10,000) IDISK-node cluster with
    90 TB of storage could fit in one rack

20
Disk Limit
  • Continued advances in capacity (60%/yr) and
    bandwidth (40%/yr)
  • Slow improvement in seek, rotation (8%/yr)
  • Time to read whole disk:
  • Year   Sequentially   Randomly (1 sector/seek)
  • 1990   4 minutes      6 hours
  • 2000   12 minutes     1 week (!)
  • Does the 3.5" form factor make sense in 5-7
    years?
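
The table's figures follow from capacity / bandwidth for the sequential
case, and (capacity / sector size) x per-sector access time for the
random case. The drive parameters below are assumptions chosen to
roughly reproduce the slide's numbers, not published specs.

  def read_times(capacity_gb, seq_mb_s, ms_per_sector, sector_b=512):
      # Returns (sequential minutes, random hours) to read a whole disk.
      seq_s = capacity_gb * 1e9 / (seq_mb_s * 1e6)
      rand_s = capacity_gb * 1e9 / sector_b * ms_per_sector / 1e3
      return seq_s / 60, rand_s / 3600

  print(read_times(1.2, 5, 10))  # ~1990: 4.0 min, ~6.5 h
  print(read_times(73, 100, 4))  # ~2000: 12.2 min, ~158 h (about a week)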

21
ISTORE-II Hardware Vision
  • System-on-a-chip enables computer, memory, and
    redundant network interfaces without
    significantly increasing the size of the disk
  • Target for 5-7 years out:
  • 1999 IBM MicroDrive:
  • 1.7" x 1.4" x 0.2" (43 mm x 36 mm x 5 mm)
  • 340 MB, 5400 RPM, 5 MB/s, 15 ms seek
  • 2006 MicroDrive?
  • 9 GB, 50 MB/s (1.6X/yr capacity, 1.4X/yr BW)
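
The 2006 numbers are just the 1999 MicroDrive figures compounded at the
stated growth rates; a quick check:

  years = 2006 - 1999
  print(0.34 * 1.6 ** years)  # capacity: ~9.1 GB   (slide says 9 GB)
  print(5 * 1.4 ** years)     # bandwidth: ~52.7 MB/s (slide says 50 MB/s)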

22
2006 ISTORE
  • ISTORE node:
  • Add 20% pad to MicroDrive size for packaging,
    connectors
  • Then double thickness to add IRAM
  • 2.0" x 1.7" x 0.5" (51 mm x 43 mm x 13 mm)
  • Crossbar switches growing by Moore's Law
  • 2X transistors/1.5 yrs → 4X transistors/3 yrs
  • Crossbar cost grows by N² → 2X switch ports/3 yrs
  • 16 x 16 in 1999 → 64 x 64 in 2005
  • ISTORE rack (19" x 33" x 84") (480 mm x 840 mm
    x 2130 mm)
  • 1 tray (3" high) → 16 x 32 → 512 ISTORE nodes
  • 20 trays + switches + UPS → 10,240 ISTORE
    nodes (!)
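
The node count multiplies out directly from the bullets above; a quick
check of the arithmetic (2005 vs. 1999 gives two 3-year port doublings):

  ports_1999 = 16
  doublings = (2005 - 1999) // 3        # 2X crossbar ports every 3 yrs
  print(ports_1999 * 2 ** doublings)    # 64   (a 64 x 64 crossbar)
  nodes_per_tray = 16 * 32              # 512 ISTORE nodes per tray
  print(nodes_per_tray * 20)            # 10240 nodes in a 20-tray rack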