Title: NeST: Network Storage
1. NeST: Network Storage
Flexible Commodity Storage Appliances
John Bent, Miron Livny, Andrea Arpaci-Dusseau and Remzi Arpaci-Dusseau
2. Terms
- Appliance (Merriam-Webster)
- b: an instrument or device designed for a particular use; specifically: a household or office device
- Storage appliance
- Storage plus access methods
3. What storage users want
- Reliability and availability
- Manageability
- Cost of management > cost of storage itself
- No-futz computing
- Scalability
- Performance
4. What storage vendors have
- NetApp, EMC, and others make storage appliances (network-attached storage)
- Manageable
- Just plug it in and it works
- Administrative web interface
- Reliable and available
- Standard RAID techniques
- High performance
- Specialized, thin OS focused on serving files
5. What storage vendors get: annual revenues
- NetApp: $800 million in 2000
- EMC: $9 billion in 2000
6. What's the problem?
- False coupling between HW and SW
- Playground syndrome
- Myth of specialization
7. H/W and S/W are bundled
- Hardware decisions are imposed
- Hard to ride commodity curve
- Example
- NetApp F720
- $35,000.00, 252 GB
- $138/GB
- Maxtor DiamondMax
- $279.00, 80 GB
- $3.50/GB
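The price gap above is simple arithmetic, but it is the core of the argument, so it is worth making explicit. The sketch below recomputes dollars per gigabyte from the two list prices and capacities quoted on the slide (the slide's per-GB figures are rounded); the dictionary names are illustrative only.

```python
# List prices (USD) and capacities (GB) as quoted on the slide.
netapp_f720 = {"price": 35_000.00, "capacity_gb": 252}
maxtor_diamondmax = {"price": 279.00, "capacity_gb": 80}


def cost_per_gb(product: dict) -> float:
    """Return dollars per gigabyte for a storage product."""
    return product["price"] / product["capacity_gb"]


appliance = cost_per_gb(netapp_f720)        # roughly 139 $/GB
commodity = cost_per_gb(maxtor_diamondmax)  # roughly 3.49 $/GB
ratio = appliance / commodity               # roughly 40x

print(f"F720: ${appliance:.2f}/GB, DiamondMax: ${commodity:.2f}/GB, ratio {ratio:.1f}x")
```

The bundled appliance costs close to forty times more per gigabyte than the bare commodity disk, which is the premium a software-only appliance aims to remove.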
8. Playground syndrome
- We have storage appliances...
- if you use these protocols,
- if you use these security mechanisms,
- if you are comfortable with our data semantics
- Inflexible software entity
9. Myth of specialization
- Specialize for one protocol on one machine
- Specialization decreases over time as
- Protocols are added
- Product line expands
- Example: NetApp software
- Generation 1 fit on a single floppy
- Generation 2 took six floppies
- Generation 3?
10. Alternatives?
- Appliance (Merriam-Webster)
- a: a piece of equipment for adapting a tool or machine to a special purpose
11. Our game?
- Flexible, commodity-based, software-only storage appliances
- Goal
- Find a networked machine
- Drop some software on it
- Have a ready-to-use storage appliance with flexible mechanisms
12. New worlds, new problems
- Diverse hardware and software platforms
- NetApp, EMC advantage
- Fewer platforms, control over the OS
- Our approach
- Automate configuration to each host system
- Hardware example: use the host file system or self-manage storage
- Software example: use either read()/write() or mmap()
- Cost of flexibility
- Key is the design of the software
13. Outline
- Introduction
- Building flexible storage modules
- Big picture
- Protocol layer
- Concurrency architecture
- Storage layer
- Motivations for flexible storage appliances
- Conclusion and current status
14. NeST structure
- Cleanly separated modules for communication, transfer, and storage
- Protocol layer
- Maps diverse protocols into common control flows
- Concurrency architectures
- Different models to maximize system throughput
- Storage layer
- Provides abstract interface to disks
15. NeST structure
[Figure: NeST structure diagram; label: Central Control]
16. Protocol layer
A collection of servers is less than the sum of their parts.
17. Consolidate protocols
- Single point of control
- Storage quotas and guarantees can be supported across multiple protocols
- Bandwidth can be controlled and quality of service can be guaranteed
- Single administrative interface
- Set policies
- Manage user accounts
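A single point of control is what lets one quota span every protocol. The sketch below assumes a hypothetical `CentralControl` class that every protocol handler calls before storing data; the class, its methods, and the byte-based quota are illustrative assumptions, not NeST's interface.

```python
from collections import defaultdict


class QuotaExceeded(Exception):
    pass


class CentralControl:
    """Hypothetical single point of control: every protocol handler charges
    storage against the same per-user ledger."""

    def __init__(self, quotas: dict):
        self.quotas = dict(quotas)     # user -> bytes allowed
        self.used = defaultdict(int)   # user -> bytes currently stored

    def charge(self, user: str, nbytes: int, protocol: str) -> None:
        limit = self.quotas.get(user, 0)  # unknown users get no space
        if self.used[user] + nbytes > limit:
            raise QuotaExceeded(f"{user} over quota via {protocol}")
        self.used[user] += nbytes

    def release(self, user: str, nbytes: int) -> None:
        self.used[user] = max(0, self.used[user] - nbytes)


control = CentralControl({"alice": 100})
control.charge("alice", 60, protocol="ftp")
try:
    control.charge("alice", 60, protocol="http")  # same ledger, different protocol
except QuotaExceeded as err:
    print(err)  # alice over quota via http
```

Because the ledger lives in central control rather than in each server, a user cannot sidestep a quota by switching protocols.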
18. Protocol layer implementation
- Each protocol listens on a well-defined port
- Central control accepts connections
- Protocol layer reads from the connection and returns a generic request object
- Like Linux V-nodes
- Add a new protocol by writing a couple of methods
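The "couple of methods" idea can be sketched as an abstract base class: each protocol only parses its own wire format into a shared `Request` and formats replies, and central control dispatches by listening port. The class names, the two toy protocols, and the port numbers below are hypothetical; the toy parsers handle only single request lines.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Request:
    """Protocol-independent request handed to central control."""
    op: str    # e.g. "get", "put", "list"
    path: str


class Protocol(ABC):
    """Hypothetical base class: a new protocol supplies only these two methods."""
    port: int

    @abstractmethod
    def parse(self, line: str) -> Request: ...

    @abstractmethod
    def format_reply(self, status: int, body: str) -> str: ...


class ToyHttp(Protocol):
    port = 8080

    def parse(self, line: str) -> Request:
        method, path, _version = line.split()
        return Request({"GET": "get", "PUT": "put"}[method], path)

    def format_reply(self, status: int, body: str) -> str:
        return f"HTTP/1.0 {status}\r\n\r\n{body}"


class ToyFtp(Protocol):
    port = 2121

    def parse(self, line: str) -> Request:
        cmd, _, arg = line.partition(" ")
        op = {"RETR": "get", "STOR": "put", "LIST": "list"}[cmd]
        return Request(op, arg or ".")

    def format_reply(self, status: int, body: str) -> str:
        return f"{status} {body}"


# Central control picks the parser by the port the connection arrived on.
PROTOCOLS = {p.port: p for p in (ToyHttp(), ToyFtp())}
```

After parsing, both `GET /data/x HTTP/1.0` and `RETR /data/x` become the same `Request("get", "/data/x")`, so everything downstream of the protocol layer sees one control flow.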
19. Protocol layer example: directory list request
20. Concurrency architecture
- Three difficult goals
- Low latency
- High bandwidth
- Multiple simultaneous clients
- No single portable solution
- Provide multiple models to offer solutions on a range of different platforms
- Multi-threaded
- Multi-process
- Event driven
21. Concurrency architecture
- Central control creates a transfer object
- Socket descriptor from the protocol layer
- File descriptor from the storage layer
- Transfer object is passed to the concurrency architecture
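The hand-off above can be sketched with a transfer object that pairs two byte streams and concurrency models that share one `submit`/`drain` interface. Everything here is a hypothetical illustration: real sockets could be wrapped as streams with `socket.makefile("rb")` or `socket.makefile("wb")`, and the multi-process and event-driven models from the previous slide are omitted for brevity.

```python
import io
import shutil
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import BinaryIO


@dataclass
class Transfer:
    """Hypothetical transfer object: pairs the client-side stream from the
    protocol layer with the file-side stream from the storage layer."""
    source: BinaryIO  # the socket stream for a put, the file for a get
    sink: BinaryIO    # the file for a put, the socket stream for a get

    def run(self, chunk_size: int = 64 * 1024) -> None:
        # Copy in fixed-size chunks until the source is exhausted.
        shutil.copyfileobj(self.source, self.sink, chunk_size)


class ThreadedModel:
    """Multi-threaded model: transfers run on a pool of worker threads."""

    def __init__(self, workers: int = 4):
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.pending = []

    def submit(self, transfer: Transfer) -> None:
        self.pending.append(self.pool.submit(transfer.run))

    def drain(self) -> None:
        for future in self.pending:
            future.result()  # re-raises any error from the transfer
        self.pending.clear()


class SequentialModel:
    """Baseline model with the same interface: run each transfer inline."""

    def submit(self, transfer: Transfer) -> None:
        transfer.run()

    def drain(self) -> None:
        pass


# Central control only sees submit/drain, whichever model is configured.
model = ThreadedModel()
src, dst = io.BytesIO(b"x" * 200_000), io.BytesIO()
model.submit(Transfer(src, dst))
model.drain()
model.pool.shutdown()
```

Keeping the interface identical across models is what allows a model to be chosen per platform without touching the protocol or storage layers.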
22. Concurrency on Linux
23. Storage layer
- Three areas needing flexibility
- File system interfaces
- Example: read()/write() or mmap()
- Abstract storage models
- RAID, JBOD, etc.
- User account administration
- Creation and removal
- Quotas and guarantees for users and groups
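The read()/write() versus mmap() choice, together with the earlier goal of automating configuration to each host, can be sketched as two interchangeable readers plus a probe that times both on a sample file and keeps the faster one. The function names and the timing-based selection are assumptions for illustration, not a description of how NeST configures itself.

```python
import mmap
import os
import time


def read_with_syscalls(path: str, chunk_size: int = 64 * 1024) -> bytes:
    """Read a whole file through repeated os.read() calls."""
    # O_BINARY exists only on Windows; it prevents newline translation there.
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
    try:
        chunks = []
        while True:
            buf = os.read(fd, chunk_size)
            if not buf:  # empty bytes means end of file
                break
            chunks.append(buf)
        return b"".join(chunks)
    finally:
        os.close(fd)


def read_with_mmap(path: str) -> bytes:
    """Read a whole file by mapping it into the address space."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return b""  # mmap refuses to map an empty file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[:]


READERS = {"read": read_with_syscalls, "mmap": read_with_mmap}


def pick_reader(sample_path: str, trials: int = 3) -> str:
    """Time each interface on a sample file and return the fastest one's name."""
    timings = {}
    for name, reader in READERS.items():
        start = time.perf_counter()
        for _ in range(trials):
            reader(sample_path)
        timings[name] = time.perf_counter() - start
    return min(timings, key=timings.get)
```

Both readers return identical bytes, so the storage layer can swap between them based on whatever the probe measures on a given host.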
24. File system interfaces on Linux
25. Outline
- Introduction
- Building flexible storage modules
- Motivations for flexible storage appliances
- Conclusion and current status
26. Clients have different needs
- Communication protocols
- Replacement costs
- Data semantics
- Security and authentication
27. Communication protocols
- The Esperanto problem
- Too many protocols to implement them all
- Too many clients use proprietary protocols
Storage must allow pluggable protocols.
28. Replacement costs
- Infinite cost to replace first-class data
- Variable cost to replace cached data, depending on size and distance
- Variable cost to replace job output files, depending on computation cost
[Figure: stored data ranging from first-class data to cheap cached files]
Cost-aware storage can effectively increase its own capacity.
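One way to act on replacement costs is an eviction planner that frees space by dropping the data that is cheapest to replace and never touches first-class data. The sketch below assumes replacement cost is a single number per file (in arbitrary units, `math.inf` for first-class data) and ranks candidates by cost per byte; that ranking rule and all names are assumptions, not the talk's policy.

```python
import math
from dataclasses import dataclass


@dataclass
class StoredFile:
    name: str
    size: int            # bytes
    replace_cost: float  # arbitrary units; math.inf for first-class data


def make_room(files, capacity, needed):
    """Plan which files to evict so that `needed` more bytes fit.

    Evicts the files that are cheapest to replace per byte first and never
    evicts first-class data (infinite replacement cost). Returns
    (kept, evicted); raises RuntimeError if the space cannot be freed.
    """
    used = sum(f.size for f in files)
    # Most expensive per byte first, so the cheapest candidate sits at the end.
    kept = sorted(files, key=lambda f: f.replace_cost / max(f.size, 1), reverse=True)
    evicted = []
    while used + needed > capacity and kept and math.isfinite(kept[-1].replace_cost):
        victim = kept.pop()
        evicted.append(victim)
        used -= victim.size
    if used + needed > capacity:
        raise RuntimeError("cannot make room without evicting first-class data")
    return kept, evicted


files = [
    StoredFile("input.dat", 50, math.inf),  # first-class: irreplaceable
    StoredFile("cache.tmp", 30, 1.0),       # cheap to refetch
    StoredFile("job.out", 20, 100.0),       # expensive to recompute
]
kept, evicted = make_room(files, capacity=100, needed=25)
```

Here only `cache.tmp` is evicted: dropping it frees enough space, and the expensive job output and the first-class input both survive.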
29. Data semantics
- Must stored objects be protected from read and write dependencies?
- Is transaction support necessary?
- What replies to storage requests are acceptable?
30. Data semantics, example
- Problem
- PFS on top of FTP fakes open()
- A subsequent read() may then return "file not found"
- Solution
- Mechanisms are needed to support flexible semantics independent of the transfer protocol
Divorce semantics from the protocol.
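Divorcing semantics from the protocol can be sketched as an `open_file` wrapper whose behavior is chosen by a parameter rather than by the backend. The in-memory `Backend` stands in for a protocol with no real open, like FTP; the class names and the two semantics labels are hypothetical.

```python
class Backend:
    """Stand-in for a transfer protocol that, like FTP, has no real open()."""

    def __init__(self, files: dict):
        self.files = files  # path -> contents

    def exists(self, path: str) -> bool:
        return path in self.files

    def fetch(self, path: str) -> bytes:
        if path not in self.files:
            raise FileNotFoundError(path)
        return self.files[path]


class Handle:
    def __init__(self, backend: Backend, path: str):
        self.backend, self.path = backend, path

    def read(self) -> bytes:
        return self.backend.fetch(self.path)


def open_file(backend: Backend, path: str, semantics: str = "strict") -> Handle:
    """Open `path` with the requested semantics, whatever the backend protocol.

    "lazy" mimics a faked open: a missing file is reported only on read.
    "strict" checks existence up front, the way a POSIX open() would.
    """
    if semantics not in ("strict", "lazy"):
        raise ValueError(f"unknown semantics: {semantics}")
    if semantics == "strict" and not backend.exists(path):
        raise FileNotFoundError(path)
    return Handle(backend, path)
```

With `semantics="lazy"` the open of a missing file succeeds and the error surfaces on `read()`, reproducing the PFS-on-FTP surprise; with `"strict"` the same backend reports the error at open time.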
31. Security and authentication
- Ownership
- Privacy
- Encryption
- Authentication
- Access rights
32. Who, when, how and how much?
- Who is allowed to use the storage?
- Promiscuity and monogamy are easy
- Polygamy is also easy
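Read literally, the three cases are: anyone may use the storage, exactly one owner may, or a fixed group may. Under that interpretation (an assumption about what the slide means), each case reduces to a small access predicate; `make_policy` and the policy names are hypothetical.

```python
def make_policy(kind: str, users=()):
    """Return a predicate deciding who may use the storage.

    promiscuous: anyone; monogamous: exactly one owner; polygamous: a fixed group.
    """
    allowed = frozenset(users)
    if kind == "promiscuous":
        return lambda user: True
    if kind == "monogamous":
        if len(allowed) != 1:
            raise ValueError("monogamous policy needs exactly one user")
        return lambda user: user in allowed
    if kind == "polygamous":
        return lambda user: user in allowed
    raise ValueError(f"unknown policy: {kind}")
```

These static cases are easy; the harder case, taken up next, is a user the appliance has never seen before.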
33. Do I know you?
- Problem
- Migrant grid users may need temporary, preferential storage access
- Solution
- Provide mechanisms to
- advertise available storage
- create self-destructing user accounts
Matchmake applications with storage opportunities.
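A self-destructing account can be modeled as a lease: the account carries an expiry time, stops being valid once it passes, and a periodic reaper removes it. The sketch below is a hypothetical illustration; the clock is injectable so expiry can be tested without waiting, and advertising and matchmaking are not shown.

```python
import time
from dataclasses import dataclass


@dataclass
class Lease:
    user: str
    quota_bytes: int
    expires_at: float  # clock value after which the account is gone


class AccountTable:
    """Hypothetical table of temporary accounts that remove themselves."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock  # injectable so expiry can be tested
        self.leases = {}

    def grant(self, user: str, quota_bytes: int, lifetime_s: float) -> None:
        self.leases[user] = Lease(user, quota_bytes, self.clock() + lifetime_s)

    def is_valid(self, user: str) -> bool:
        lease = self.leases.get(user)
        return lease is not None and lease.expires_at > self.clock()

    def reap(self) -> list:
        """Delete expired accounts and return their user names."""
        now = self.clock()
        expired = [u for u, lease in self.leases.items() if lease.expires_at <= now]
        for user in expired:
            del self.leases[user]  # a real server would also reclaim the user's files
        return expired
```

A migrant user granted a one-hour lease can store data immediately; after the hour the account stops authenticating and the next `reap()` removes it, with no administrator involved.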
34. Outline
- Introduction
- Building flexible storage modules
- Motivations for flexible storage appliances
- Conclusion
- Current status
- Future work
- Concluding remarks
35. Current status
- Concurrency architectures are done
- Gets, puts, reads and writes perform well
- Virtual protocol class interface is built
- NeST speak is fully implemented
- GridFTP coming soon
- Simple first implementation of storage reservations and remote quota management is done
- Venkateshwaran Venkataramani
36. Future work
- Discovery process for client storage requirements
- Quality of service guarantees for bandwidth and storage
- Support for transient and opportunistic users
37. Concluding remarks
- Return storage to the commodity curve by creating software-only storage appliances
- Allow greater storage flexibility for a wide range of application needs