ISTORE: Introspective Storage for Data-Intensive Network Services

1
ISTORE: Introspective Storage for Data-Intensive
Network Services
  • Aaron Brown, David Oppenheimer, Kimberly Keeton,
  • Randi Thomas, Noah Treuhaft, John Kubiatowicz,
    Kathy Yelick, and David Patterson
  • Computer Science Division
  • University of California, Berkeley
  • http://iram.cs.berkeley.edu/istore/

2
Technical Problem to Tackle?
  • Build HW/SW to provide a scalable, available,
    maintainable (SAM) server that is dedicated to
    a single data-intensive application
  • Current state-of-the-art emphasis is on
    cost-performance, with SAM largely ignored
  • Cost of administration per year is typically 3X
    the cost of the disks
  • RAID and Tandem-like availability are based on
    fail-fast components; nothing fails fast today
  • Themis.cs outage: relied on humans to watch the
    system's behavior and invoke the proper response;
    what if they take a vacation day or go to the
    dentist?

3
ISTORE-1 Hardware Prototype
  • Based on intelligent disk bricks (64 nodes)
  • fast embedded CPU performs local monitoring
    tasks and runs parallel application code (see the
    sketch below)
  • diagnostic hardware provides fail-fast behavior,
    self-testing, and additional monitoring →
    Introspection
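
As a rough illustration of the brick-local monitoring idea, the
hypothetical Python sketch below samples device statistics and reports
them; the metric names and reporting interface are assumptions, not
ISTORE's actual API.

  import random
  import time

  def read_disk_stats():
      # Stand-in for reading the brick's SCSI and environmental counters.
      return {"temp_c": 30 + 10 * random.random(),
              "io_per_s": random.randint(0, 500)}

  def report(stats):
      # Stand-in for publishing into the monitoring framework.
      print(stats)

  if __name__ == "__main__":
      for _ in range(3):              # a real monitor would loop forever
          report(read_disk_stats())
          time.sleep(1)               # sampling interval is a policy knob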

4
A Software Framework for Introspection
  • ISTORE hardware provides device monitoring
  • ISTORE software framework should simplify writing
    introspective applications
  • Rule-based adaptation engine encapsulates the
    mechanisms of collecting and processing monitoring
    data
  • Maintainability information is stored in a
    database
  • Policy compiler and mechanism libraries help turn
    application adaptation goals into rules and
    reaction code
  • These provide a high-level, abstract interface to
    the system's monitoring and adaptation mechanisms

5
What is our plan for success?
  • One-year view:
  • Run data-intensive kernels (sort, hash-join, ...)
    on a small cluster of PCs to establish a
    performance baseline
  • 3 OSs (Linux, FreeBSD, NetBSD) for genetic
    diversity
  • Install Berkeley DB to collect maintainability
    data
  • Learn about adaptability theory from Michael
    Jordan
  • Invent SAM benchmarks
  • Construct ISTORE-1
  • Three-year view:
  • Based on lessons learned, construct ISTORE-2
  • Policy-based monitoring and reaction to SAM
    events

6
Status and Conclusions
  • ISTORE's focus is on introspective systems
  • a new perspective on systems research priorities
  • Proposed framework for building introspection
  • intelligent, self-monitoring plug-and-play
    hardware
  • software that provides a higher level of
    abstraction for the construction of introspective
    systems
  • flexible, powerful rule system for monitoring
  • policy specification automates generation of
    adaptation
  • Status:
  • ISTORE-1 hardware prototype being constructed now
  • software prototyping just starting

7
Backup Slides
8
Related Work
  • Hardware
  • CMU and UCSB Active Disks
  • Software
  • Adaptive databases: MS AutoAdmin, Informix
    NoKnobs
  • Adaptive OSs: MS Millennium, adaptive VINO
  • Adaptive storage: HP AutoRAID, attribute-managed
    storage
  • Active databases: UFL Gator, TriggerMan
  • ISTORE unifies many of these techniques in a
    single system

9
ISTORE-1 Hardware Design
  • Brick:
  • processor board:
  • mobile Pentium II, 366 MHz, 128 MB SDRAM
  • PCI and ISA buses/controllers, SuperIO (serial
    ports)
  • Flash BIOS
  • 4 x 100 Mb Ethernet interfaces
  • Adaptec Ultra2-LVD SCSI interface
  • disk: one 18.2 GB 10,000 RPM low-profile SCSI
    disk
  • diagnostic processor

10
ISTORE-1 Hardware Design (2)
  • Network:
  • primary data network:
  • hierarchical, highly-redundant switched Ethernet
  • uses 16 20-port 100Mb switches at the leaves
  • each brick connects to 4 independent switches
  • root switching fabric is two ganged 25-port
    Gigabit switches (PacketEngines PowerRails)
  • diagnostic network

11
Diagnostic Support
  • Each brick has a diagnostic processor
  • Goal: a small, independent, trusted piece of
    hardware running hand-verifiable
    monitoring/control software
  • monitoring: CPU watchdog, environmental
    conditions (see the sketch after this list)
  • control:
  • reboot/power-cycle the main CPU
  • inject simulated faults: power, bus transients,
    memory errors, network interface failure, ...
  • Separate diagnostic network connects the
    diagnostic processors of each brick
  • provides an independent network path to the
    diagnostic CPU
  • works when the brick CPU is powered off or has
    failed
  • separate failure modes from the Ethernet
    interfaces
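
A minimal sketch of the watchdog half of this logic, assuming
hypothetical helper names (the slides do not show the actual diagnostic
firmware): if the main CPU stops answering heartbeats, power-cycle it
and report the event over the independent diagnostic network.

  import time

  HEARTBEAT_TIMEOUT_S = 5.0

  def cpu_heartbeat_ok():
      # Stand-in: poll the main CPU over the serial link.
      return True

  def power_cycle_cpu():
      # Stand-in: toggle the brick's power-control line.
      print("power-cycling main CPU")

  def notify_diagnostic_network(event):
      # Stand-in: report over the CAN-based diagnostic network,
      # which works even when the main CPU is down.
      print("event:", event)

  def watchdog_loop(poll_s=1.0):
      last_ok = time.monotonic()
      while True:
          if cpu_heartbeat_ok():
              last_ok = time.monotonic()
          elif time.monotonic() - last_ok > HEARTBEAT_TIMEOUT_S:
              notify_diagnostic_network("cpu-watchdog-expired")
              power_cycle_cpu()
              last_ok = time.monotonic()
          time.sleep(poll_s)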

12
Diagnostic Support Implementation
  • Not-so-small embedded Motorola 68k processor
  • provides the flexibility needed for research
    prototype
  • can communicate with CPU via serial port, if
    desired
  • still can run just a small, simple monitoring and
    control program if desired (no OS, networking,
    etc.)
  • CAN (Controller Area Network) diagnostic
    interconnect
  • one brick per shelf of 8 acts as a gateway from
    the CAN to the redundant switched Ethernet fabric
  • CAN connects directly to automotive environmental
    monitoring sensors (temperature, fan RPM, ...)

13
ISTORE Research Agenda
  • ISTORE goal: create a hardware/software
    framework for building introspective servers
  • Hardware
  • Software toolkit that allows programmers to
    easily define the system's adaptive behavior
  • provides abstractions for manipulating and
    reacting to monitoring data

14
Rule-based Adaptation
  • ISTORE's adaptation framework is built on the
    model of an active database
  • database includes:
  • hardware monitoring data: device status, access
    patterns, performance stats
  • software monitoring data: app-specific
    quality-of-service metrics, high-level workload
    patterns, ...
  • applications define views and triggers over the
    DB
  • views select and aggregate data of interest to
    the app.
  • triggers are rules that invoke
    application-specific reaction code when their
    predicates are satisfied
  • SQL-like declarative language used to specify
    views and trigger rules (see the sketch below)
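
The slides do not give the actual rule syntax; the hypothetical Python
sketch below illustrates the idea, with SQL-like view and trigger text
handed to an assumed rule engine and application-supplied reaction code.

  VIEW_HOT_DISKS = """
      CREATE VIEW hot_disks AS
        SELECT node_id, AVG(temp_c) AS avg_temp
        FROM hw_monitor GROUP BY node_id
  """

  TRIGGER_OVERHEAT = """
      CREATE TRIGGER overheat ON hot_disks
        WHEN avg_temp > 55
        CALL migrate_data_off_node(node_id)
  """

  def migrate_data_off_node(node_id):
      # Application-specific reaction code the trigger would invoke.
      print("migrating data off node", node_id)

  # engine.define(VIEW_HOT_DISKS)    # 'engine' is a hypothetical
  # engine.define(TRIGGER_OVERHEAT)  # rule-engine interface

Because the view isolates the application from raw monitoring data, it
can be redefined later without touching the reaction code, which is the
benefit the next slide makes explicit.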

15
Benefits of Views and Triggers
  • Allow applications to focus on adaptation, not
    monitoring
  • hide the mechanics of gathering and processing
    monitoring data
  • can be dynamically redefined as the situation
    changes, without altering adaptation code
  • Can be implemented without a real database
  • views and triggers are implemented as
    device-local and distributed filters and reaction
    rules
  • defined views and triggers control the frequency,
    granularity, and types of data gathered by HW
    monitoring
  • no materialized database is necessary (see the
    sketch below)
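
One way to realize this without a materialized database, as a
hypothetical Python sketch (all names assumed): compile each trigger
into a brick-local filter that also tells the hardware monitor what to
sample and how often.

  def make_local_filter(predicate, reaction, sample_period_s):
      # Compile one trigger into a device-local filter rule; the period
      # tells the hardware monitor how often to sample.
      def local_filter(sample):
          if predicate(sample):
              reaction(sample)
      return local_filter, sample_period_s

  # Only temperature matters to this trigger, so the brick can gather
  # just that statistic, at the requested granularity.
  overheat_filter, period = make_local_filter(
      predicate=lambda s: s["temp_c"] > 55,
      reaction=lambda s: print("overheat on node", s["node_id"]),
      sample_period_s=5,
  )
  overheat_filter({"node_id": 12, "temp_c": 61.0})  # -> overheat on node 12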

16
Raising the Level of Abstraction: Policy Compiler
and Mechanism Libs
  • Rule-based adaptation doesn't go far enough
  • application designer must still write views,
    triggers, and adaptation code by hand
  • but the designer thinks in terms of system
    policies
  • Solution: designer specifies policies to the
    system; the system implements them
  • policy compiler automatically generates views,
    triggers, and adaptation code
  • uses preexisting mechanism libraries to implement
    adaptation algorithms
  • claim: feasible for the common adaptation
    mechanisms needed by data-intensive network
    service apps (see the sketch below)
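
A deliberately tiny sketch of this pipeline, with hypothetical names
throughout (the real policy language and mechanism libraries are open
research issues, per the slides that follow): the designer states a
goal, and the compiler emits a trigger rule bound to a preexisting
mechanism-library routine.

  POLICY = {"goal": "availability", "min_replicas": 3}

  def compile_policy(policy):
      # Turn a high-level goal into a trigger predicate plus a reaction
      # drawn from an assumed mechanism library.
      trigger = f"replica_count < {policy['min_replicas']}"
      reaction = "mechanism_lib.re_replicate"
      return {"trigger": trigger, "reaction": reaction}

  print(compile_policy(POLICY))
  # {'trigger': 'replica_count < 3',
  #  'reaction': 'mechanism_lib.re_replicate'}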

17
Open Research Issues
  • Defining appropriate software abstractions
  • how should views and triggers be declared?
  • what is the system's schema?
  • how should heterogeneous hardware be integrated?
  • can it be extended by the user to include new
    types and statistics?
  • what should the policy language look like?
  • what level of policies can be expressed?
  • how much of the implementation can the system
    figure out automatically?
  • to what extent can the system reason about
    policies and their interactions?
  • what functions should mechanism libraries provide?

18
More Open Research Issues
  • Implementing an introspective system
  • what default policies should the system supply?
  • what are the internal and external interfaces?
  • debugging
  • visualization of states, triggers, ...
  • simulation/coverage analysis of policies,
    adaptation code
  • appropriate administrative interfaces
  • Measuring an introspective system
  • what are the right benchmarks for
    maintainability, availability, scalability?
  • O(>1000)-node scalability
  • how to write applications that scale and run well
    despite continual state of partial failure?

19
Motivation: Technology Trends
  • Disks, systems, switches are getting smaller
  • Convergence on intelligent disks (IDISKs)
  • MicroDrive + system-on-a-chip → tiny IDISK nodes
  • Inevitability of enormous-scale systems
  • by 2006, an O(10,000) IDISK-node cluster with
    90 TB of storage could fit in one rack

20
Disk Limit
  • Continued advances in capacity (60%/yr) and
    bandwidth (40%/yr)
  • Slow improvement in seek, rotation (8%/yr)
  • Time to read whole disk:
  • Year   Sequentially   Randomly (1 sector/seek)
  • 1990   4 minutes      6 hours
  • 2000   12 minutes     1 week (!)
  • Does the 3.5" form factor make sense in 5-7
    years?
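
The table's figures follow from capacity / bandwidth for the sequential
case, and (capacity / sector size) x per-sector access time for the
random case. The drive parameters below are assumptions chosen to
roughly reproduce the slide's numbers, not published specs.

  def read_times(capacity_gb, seq_mb_s, ms_per_sector, sector_b=512):
      # Returns (sequential minutes, random hours) to read a whole disk.
      seq_s = capacity_gb * 1e9 / (seq_mb_s * 1e6)
      rand_s = capacity_gb * 1e9 / sector_b * ms_per_sector / 1e3
      return seq_s / 60, rand_s / 3600

  print(read_times(1.2, 5, 10))  # ~1990: 4.0 min, ~6.5 h
  print(read_times(73, 100, 4))  # ~2000: 12.2 min, ~158 h (about a week)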

21
ISTORE-II Hardware Vision
  • System-on-a-chip enables computer, memory, and
    redundant network interfaces without
    significantly increasing the size of the disk
  • Target for 5-7 years out:
  • 1999 IBM MicroDrive:
  • 1.7" x 1.4" x 0.2" (43 mm x 36 mm x 5 mm)
  • 340 MB, 5400 RPM, 5 MB/s, 15 ms seek
  • 2006 MicroDrive?
  • 9 GB, 50 MB/s (1.6X/yr capacity, 1.4X/yr BW)
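
The 2006 numbers are just the 1999 MicroDrive figures compounded at the
stated growth rates; a quick check:

  years = 2006 - 1999
  print(0.34 * 1.6 ** years)  # capacity: ~9.1 GB   (slide says 9 GB)
  print(5 * 1.4 ** years)     # bandwidth: ~52.7 MB/s (slide says 50 MB/s)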

22
2006 ISTORE
  • ISTORE node:
  • Add 20% pad to MicroDrive size for packaging,
    connectors
  • Then double thickness to add IRAM
  • 2.0" x 1.7" x 0.5" (51 mm x 43 mm x 13 mm)
  • Crossbar switches growing by Moore's Law
  • 2X transistors/1.5 yrs → 4X transistors/3 yrs
  • Crossbar cost grows by N² → 2X switch ports/3 yrs
  • 16 x 16 in 1999 → 64 x 64 in 2005
  • ISTORE rack (19" x 33" x 84") (480 mm x 840 mm
    x 2130 mm)
  • 1 tray (3" high) → 16 x 32 → 512 ISTORE nodes
  • 20 trays + switches + UPS → 10,240 ISTORE
    nodes (!)
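
The node count multiplies out directly from the bullets above; a quick
check of the arithmetic (2005 vs. 1999 gives two 3-year port doublings):

  ports_1999 = 16
  doublings = (2005 - 1999) // 3        # 2X crossbar ports every 3 yrs
  print(ports_1999 * 2 ** doublings)    # 64   (a 64 x 64 crossbar)
  nodes_per_tray = 16 * 32              # 512 ISTORE nodes per tray
  print(nodes_per_tray * 20)            # 10240 nodes in a 20-tray rack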