Title: ISTORE%20Software%20Runtime%20Architecture
1ISTORE Software Runtime Architecture
- Aaron Brown, David Oppenheimer, Kimberly Keeton,
Randi Thomas, Jim Beck, John Kubiatowicz, and
David Patterson - http//iram.cs.berkeley.edu/istore
- 1999 Winter IRAM Retreat
2ISTORE RuntimeSoftware Architecture
- Runtime system goals for the ISTORE
meta-appliance - (1) Provide mechanisms that allow network service
applications to exploit introspection (monitor
adapt) - (2) Allow appliance designer to tailor runtime
system policies and interfaces - How the goals are achieved
- (1) Introspection layered local and global
runtime system libraries that manipulate and
react to monitoring data - (2) Specialization runtime system is extensible
using domain- specific languages (DSLs)
3Roadmap
- Layered software structure
- Example of introspection
- Runtime system extensibility using DSLs
- Conclusion
4Layered software structure
HW Device (NIC)
HW Device
5Device Interface Layer
HW Device (NIC)
HW Device
6Device interface layer
- Microkernel OS modules
- Traditional OS services
- Networking, mem management, process scheduling,
threads, - Device-specific monitoring
- Raw access patterns
- Utilization statistics
- Environmental parameters
- Indications of impending failure
- Self-characterization of performance, functional
capabilities
7Local runtime layer
HW Device (NIC)
HW Device
8Local runtime layer
- Non-distributed mechanisms needed by network
service applications - Feeds information to global layer or performs
local operations on behalf of global layer - Example mechanisms
- Application-specific filtering/aggregation of
device monitoring data - Example OLTP server vs. DSS server
- Data layout and naming
- Example record-based interface for DB,
file-based for web server - Device scheduling
- Example maximize TPS vs. maximize disk bandwidth
utilization - Caching
- Coherence essential vs. coherence unnecessary
- More efficient caching implementation possible in
second case
9Global runtime layer
HW Device (NIC)
HW Device
10Global runtime layer
- Aggregate, process, react to monitoring data
- Relies on local per-device runtime mechanisms to
provide monitoring data, implement control
actions - Provides application interface that hides
distributed implementation of runtime services - Example services
- High-level services
- Load balancing replicate and/or migrate heavily
used data objects when a disk becomes
over-utilized - Availability replicate data from failed or
failing component to restore required redundancy - Plug-and-play integrate new devices into the
system - Low-level services used to implement high-level
global services - Distributed directory tracks data and metadata
objects - Migration, replication, caching
- Inter-brick communication
- Distributed transactions
11Distributed application worker code
HW Device (NIC)
HW Device
12Distributed application worker code
- Runs on top of global runtime system
- Written by appliance designer
- Application-specific
- Database
- scan, sort, join, aggregate, update record,
delete record, ... - Transformational web proxy
- fetch web page (from disk or remote site), apply
transformation filter, update user preferences
database, ... - System administration tools implemented at this
level - Customized runtime system defines administrative
interface tailored to application
13Application front-end code
HW Device (NIC)
HW Device
14Application front-end code
- Runs on front-end interface bricks
- Accepts requests from LAN/WAN connection
- Incoming requests made using standard high-level
protocols - HTTP, NFS, SQL, ODBC,
- Invokes and coordinates appropriate worker code
components that execute on internal blocks - Takes into account locality and load balancing
- Database front-end performs SQL query
optimization, invokes distributed relational
operators on data storage devices - Transformational proxy front-end invokes
distiller thread on appropriate device brick - if data is cached, invoke on disk node
- otherwise, fetch data from web and invoke on
compute node or disk node
15Roadmap
- Layered software structure
- Example of introspection
- Runtime system extensibility using DSLs
- Conclusion
16From introspection to adaptation
Intelligent HWcomponents
Adaptive, self-maintaining appliance
Continuous monitoring
Extensible, application-tailored runtime system
- Example slowly-failing data disk in large DB
system - (1) Detect problem
- (2) Repair problem while continuing to handle
incoming requests - (3) Return to normal system operation
17Failing disk detection
- Microkernel monitoring module continuously
monitoring disks health detects exceptional
condition, e.g. - ECC failures
- Media errors
- Increased rates of ECC retries
- Notifies global fault handling mechanism
18Failing disk reaction
- Global fault handling mechanism
- Prevents system from sending more work to failed
device - Modifies global directory to remove entries
corresponding to failed components data - Application-specific response to impending
failure - Transactional system discard work currently in
progress on failing device, reissue to another
data replica - Non-transactional system w/o coherent replicas
checkpoint computation, restore on another data
replica - Transformational web proxy do nothing
- Instruct disk runtime system to shut disk down
- Disk device is considered failed
19Failing disk return to normal operation
- Global fault handling mechanism...
- Rebuilds data redundancy
- By allocating space for a new replica on a
functioning disk and copying data to it from
existing replicas - Using an application-specific data replication
mechanism - Where to allocate new replicas, how to copy data,
how to lay out data for new replicas, how to
update global directory - Example in upcoming slide
- Life returns to normal
- Degree of fault-tolerance has been restored
- Failed component can be replaced during
regularly-scheduled maintenance
20Roadmap
- Layered software structure
- Example of introspection
- Runtime system extensibility using DSLs
- Conclusion
21Runtime system extensibility
- Two ways of looking at system
- Partitioned on functional/mechanism boundaries
- Collection of libraries failure detection,
transactions, ... - Mechanisms are isolated
libfail
librepl
application
OS
libtrxn
libcache
22Runtime system extensibility
- Two ways of looking at system
- Partitioned on functional/mechanism boundaries
- Collection of libraries failure detection,
transactions, ... - Mechanisms are isolated
- Partitioned on global system properties
- This is how the programmer thinks about the
system (high-level) - e.g. application-specific data availability
policy - Failure detection (which devices to monitor, )
- Replication (used to restore redundancy)
- Transactions (how to restart work in progress)
- Caching (how to handle dirty cached objects
during failure)
libfail
librepl
application
OS
libtrxn
libcache
23Runtime system extensibility
- Two ways of looking at system
- Partitioned on functional/mechanism boundaries
- Collection of libraries failure detection,
transactions, ... - Mechanisms are isolated
- Partitioned on global system properties
- This is how the programmer thinks about the
system (high-level) - e.g. application-specific data availability
policy - Failure detection (which devices to monitor, )
- Replication (used to restore redundancy)
- Transactions (how to restart work in progress)
- Caching (how to handle dirty cached objects
during failure)
libfail
librepl
Customized runtimesystem library
application
OS
libtrxn
libcache
compiler
policy
24Extensibility using DSLs
- DSLs are languages specialized for a particular
task - Each ISTORE DSL
- Encapsulates high-level semantics of one system
behavior - Allows declarative specification of
- Behavior of one aspect of the system (a policy)
- Interfaces to coordinated mechanisms that
implement the policy - Is compiled into an implementation that might
coordinate several local and/or global base
runtime system mechanisms - May be implemented as background and/or
foreground tasks - Analysis tools can potentially infer unspecified
emergent system behaviors from the specifications - e.g. what impact will a new redundancy policy
have on transaction commit time - Extensions compiled together with local and
global base mechanisms form the distributed
runtime system
25Extensibility using DSLs Example
- AvailFailureDetected(Device d) Object o
ObjList objsTransaction t TxnList
txnsReplica x, c, rDirectoryMarkDeviceDisab
led(d)AdminAlertFailure(d)objs
DirectoryGetObjects(d) objs stored on
failed deviceforeach o (objs) x
DirectoryGetReplica(o,d) find os
replica on d DirectoryDeleteReplica(x)
delete from global directory txns
TxnGetActiveTxns(x) foreach t (txns)
TxnAbortTxn(t) abort pending txns
for o on d c DirectoryGetReplica(o)
find still-accessible copy r
LoadBalancerAllocateReplica(o) get space for
new repl LocalRuntimeCopyObject(c,c-gtdevice,r,r
-gtdevice) copy it DirectoryAddReplica(r,r-
gtdevice) update directory foreach t
(txns) TxnIssueTxn(txn,r) reissue
txns on new replica -
-
26Extensibility using DSLs (cont.)
- Similar specification written for each extension
to base library - Other examples of extensible system behaviors
- Transaction response time requirements
- Prioritizing operations based on type of data
processed - Resource allocation
- Backup policy
- Exported administrative interface
27Why use DSLs?
- Possible choices
- Each appliance designer writes runtime system
from scratch - Similar to exokernel operating systems
- All designers use single parameterized runtime
system library - Similar to tunable kernel parameters in modern
OSs - Designer writes high-level specification of
system behavior - DSL compiler automatically translates
specification into runtime system extensions that
coordinate base mechanisms - Advantages include
- Programmability
- Performance
- Reliability, verifiability, safety
- Artificial diversity
28DSL advantages (cont.)
- Programmability
- High-level specification close to designers
abstraction level - Easier to write, reason about, maintain, modify
runtime system code - Simple enough to allow site-specific
customization at installation time - Performance
- Aggressive DSL compiler can take advantage of
high-level semantics of specification language - Base library mechanisms can be highly optimized
optimization complexity hidden from appliance
designer - Web example infer that TCP checksums should be
stored with web pages
29DSL advantages (cont.)
- Reliability
- Automatically generate code thats easy to forget
or get wrong - Example synchronization operations to serialize
accesses to distributed data structure - Verifiability
- Of input code (DSL specification)
- More abstract form of semantic checking
- e.g. DSL supports types natural to behavior being
specified gt type-checking verifies some semantic
constraints - e.g. ensure no unencrypted objects are written
to disk - Of output code (coordinated use of base
mechanisms) - DSL compiler writer satisfied DSL compiler is
correct gt appliance designer inherits
verification effort - Safety (prevent runtime errors)
- Whole classes of general programming errors not
possible - DSLs hide details runtime memory management,
IPC, ... - Compiler automatically adds code
synchronization, ...
30DSL advantages (cont.)
Specification
DSL compiler
Implementation 1
Implementation 2
Implementation 3
...
- Artificial diversity
- Potentially allow system to continue operation in
face of internal bugs or malicious attack - Multiple implementations of component run
simultaneously on different data replicas - Continuously check each other with respect to
high-level behavior - Non-traditional fault-tolerance, but related to
process pairs - Potentially usable to enhance performance
- Select best-performing implementation(s) for
future use periodically reevaluate choice - Examples of possible implementation differences
- Low-level runtime memory layout, code ordering
and layout - High-level system resource usage (recompute vs.
use stored data, general space/time/bandwidth
tradeoffs)
31ISTORE software summary
- ISTORE software architecture provides an
extensible runtime environment for distributed
network service application code - Layered local and global mechanism libraries
provide introspection and self-maintenance - Mechanisms can be customized using DSL-based
specifications of application policy - DSL code coordinates base mechanisms to implement
application semantics and interfaces - DSL-based extension offers significant advantages
in programmability, performance, reliability,
safety, diversity
32ISTORE summary
- Network services are increasing in importance
- Self-maintaining scaleable storage appliances
match the needs of these services - ISTORE provides a flexible architecture for
implementing storage-based network service apps - Modular, intelligent, fault-tolerant hardware
platform is easy to configure, scale, and
administer - Runtime system allows applications to leverage
intelligent hardware, achieve introspection, and
provide self-maintenance through - Layered runtime software structure
- DSL-based extensibility that allows easy
application-specific customization
33Agenda
- Overview of ISTORE Motivation and Architecture
- Hardware Details and Prototype Plans
- Software Architecture
- Discussion and Feedback
34Backup slides
35What ISTORE is not
- An extensible operating system
- Use commodity OS, only add hardware monitoring
module - MM could just be a device driver gt no need for
microkernel OS - ISTORE could be built on top of an extensible
operating system for even greater flexibility - An attempt to make commodity OSs extensible
- Extensible runtime system allows designer to
customize higher-level operations than OS
extensions do - Closest to an extensible distributed operating
system built on top of a commodity single-node
operating system - A multiple-protection-domain system
- Assumes non-malicious programmer
- If user-downloaded code permitted, sandbox must
be implemented as part of (trusted) application - DSLs specify resource allocation/scheduling
policies, appliance designer responsible for
ensuring fairness - A framework for building generic servers
36ISTORE boot process
- (1) Initially, undifferentiated ISTORE system
- (2) On boot, each device block contacts system
boot server - (3) Device blocks download customized runtime
system and application worker code - Front-end blocks also download application
front-end code - Runtime system libraries structured as shared
libraries gt hot upgrade
37Example Appliances
- E-commerce
- Web search engine
- Transformational web/PDA proxy
- Election server
- Mail server
- News server
- NFS server
- Database server OLTP, DSS, mixed OLTP-DSS
- Video server