Title: The Pros and Cons of Content Addressed Storage
1 The Pros and Cons of Content Addressed Storage
- Arun Taneja
- Founder and Consulting Analyst
2 Current Data Protection Environment
- Data Tsunami
- No Backup Windows
- Cost of Downtime Increasing
- Regulations and Compliance Requirements
- Data Protection Technology at Break Point
3 Many New Technologies to the Rescue
4 What is CAS?
5 CAS vs Networked Storage
- SAN and NAS Use File Systems to Place and Locate Data (/abc/xyz/acme.doc)
- Hierarchical
- Difficult to Scale Beyond TBs
- Application Determines if Duplication of Object Exists
- Indexing can Become Complicated
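The contrast with path-based lookup can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the ContentStore class and its put/get methods are hypothetical names, and MD5 is assumed only because it yields a 128-bit address. The point it shows is that the store, not the application, detects duplicates, because identical content always produces the same address.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, for illustration)."""

    def __init__(self):
        self._objects = {}  # content address (hex digest) -> object bytes

    def put(self, data: bytes) -> str:
        # The address is derived from the content, not from a path.
        address = hashlib.md5(data).hexdigest()
        # Identical content maps to the same address, so it is kept once.
        self._objects.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        return self._objects[address]

    def __len__(self) -> int:
        return len(self._objects)


store = ContentStore()
first = store.put(b"quarterly report")
second = store.put(b"quarterly report")  # same bytes saved a second time
print(first == second, len(store))
```

A file system would hold two copies under two paths here; the sketch holds one object and hands back the same address twice.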
6 How is CAS Done?
- Algorithm Applied to the Object's Content
- File
- Portion of a file
- Directory or file system
- Unique 128-bit Code Results (160 bits for Avamar)
- 128-bit hash unique to that object (e.g., MD5)
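As a rough sketch of the steps above, the snippet below computes a content address for a whole file, a portion of a file, and a directory. The function names are hypothetical. MD5 is the 128-bit example named on the slide; SHA-1 is used for the 160-bit case only because it produces 160 bits, since the slide does not name the algorithm. The directory encoding (sorted "name address" lines) is likewise an assumption made for illustration.

```python
import hashlib
from typing import Mapping


def content_address(data: bytes, algorithm: str = "md5") -> str:
    """Return the hex content address of data under the given hash."""
    return hashlib.new(algorithm, data).hexdigest()


def directory_address(entries: Mapping[str, str]) -> str:
    """Address a directory by hashing its sorted (name, address) listing."""
    listing = "\n".join(f"{name} {addr}" for name, addr in sorted(entries.items()))
    return content_address(listing.encode("utf-8"))


file_bytes = b"hello, content addressed storage"
whole_file = content_address(file_bytes)           # 128-bit address (MD5)
portion = content_address(file_bytes[:5])          # address of a portion
wide = content_address(file_bytes, "sha1")         # 160-bit address (SHA-1)
folder = directory_address({"acme.doc": whole_file, "part.bin": portion})

print(len(whole_file) * 4, len(wide) * 4)  # address widths in bits
```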
7 What Can CAS Be Used For?
- Archival Storage
- Backup and Restore
- Disaster Recovery
- Content Management
8 Issues with Existing Architectures
9 Methods for Keeping More Data Online
- Bigger Primary Storage
- Compression of Data
- Hierarchical Storage Architectures
- Data Normalization: Finding Subsets of Data That are Common and Storing Them Only Once
- No Limit on the Effective Compression Ratio
- Indexing Systems Super Critical
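To see why data normalization has no fixed ceiling on its effective compression ratio, consider the small sketch below. It is an assumption-laden illustration: effective_ratio is a hypothetical helper, MD5 stands in for whatever hash a real system uses, and index overhead is ignored. The ratio is logical bytes divided by bytes actually stored, so it keeps growing as more copies of the same data arrive.

```python
import hashlib


def effective_ratio(objects):
    """Logical bytes divided by bytes stored after storing each unique object once."""
    logical = sum(len(obj) for obj in objects)
    unique = {hashlib.md5(obj).digest(): len(obj) for obj in objects}
    stored = sum(unique.values())
    return logical / stored


block_a = b"A" * 1000
block_b = b"B" * 1000

few_copies = [block_a] * 50 + [block_b]    # 51,000 logical bytes, 2,000 stored
many_copies = [block_a] * 500 + [block_b]  # 501,000 logical bytes, 2,000 stored

print(effective_ratio(few_copies), effective_ratio(many_copies))
```

The stored size stays at two blocks in both cases, which is also why the index that maps addresses to stored subsets becomes the critical component.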
10 Commonality Factoring Using CAS
- Fixed Size Atomics for Database
- Variable Size Atomics for File Systems
- CAS Algorithms Used to Calculate CA for Each Subset
- Data Structures Needed to Reconstruct from Atomics
- Above Data Kept with Atomics Data
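The bullets above can be sketched as follows. This is a simplified illustration under stated assumptions, not Avamar's algorithm: all function names are hypothetical, SHA-1 stands in for the CAS algorithm, and the variable-size splitter uses a very simple window-sum test to pick boundaries where production systems would use a stronger rolling hash. The "recipe" (an ordered list of content addresses) plays the role of the data structure needed to reconstruct an object from its atomics.

```python
import hashlib


def fixed_chunks(data: bytes, size: int = 256):
    """Fixed-size atomics: cut every `size` bytes (suits database pages)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data: bytes, window: int = 4, mask: int = 0x0F, min_size: int = 8):
    """Variable-size atomics: cut where the last `window` bytes satisfy a test.

    Assumes min_size >= window so the window never reaches before the chunk start.
    """
    chunks, start = [], 0
    for i in range(len(data)):
        if i + 1 - start < min_size:
            continue
        fingerprint = sum(data[i + 1 - window:i + 1])
        if fingerprint & mask == 0:  # boundary chosen by content, not offset
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def factor(data: bytes, chunker, store: dict):
    """Store each unique atomic once; return the recipe of content addresses."""
    recipe = []
    for chunk in chunker(data):
        address = hashlib.sha1(chunk).hexdigest()
        store.setdefault(address, chunk)
        recipe.append(address)
    return recipe


def reconstruct(recipe, store: dict) -> bytes:
    """Rebuild the original object from its recipe."""
    return b"".join(store[address] for address in recipe)


data = bytes(range(256)) * 4
store = {}
recipe = factor(data, fixed_chunks, store)
print(len(recipe), len(store), reconstruct(recipe, store) == data)
```

In a fuller design the recipe itself could be serialized and stored as another atomic, so the reconstruction data lives alongside the atomics it describes.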
11 CAS Example: Avamar
- CAS Applied to BU/Restore, Archive and DR (initial focus BU/R)
- Focus on Data Reduction
- Typical Secondary to Primary Ratio is 10:1
- Avamar Claims 1.2:1
- Never Do Full or Incremental Backups, Only SnapUps
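A quick worked calculation shows what the two ratios imply, reading them as 10 to 1 (typical) and 1.2 to 1 (Avamar's claim). The 10 TB primary figure and the function name are assumptions chosen purely for illustration.

```python
def secondary_storage_tb(primary_tb: float, secondary_to_primary: float) -> float:
    """Secondary (backup) capacity implied by a secondary-to-primary ratio."""
    return primary_tb * secondary_to_primary


primary_tb = 10.0  # hypothetical amount of primary data
conventional_tb = secondary_storage_tb(primary_tb, 10.0)  # typical ratio
reduced_tb = secondary_storage_tb(primary_tb, 1.2)        # claimed ratio
savings_tb = conventional_tb - reduced_tb

print(f"conventional: {conventional_tb:.1f} TB, "
      f"reduced: {reduced_tb:.1f} TB, saved: {savings_tb:.1f} TB")
```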
12 CAS Example: Avamar Systems Architecture
- Distributed Backup Repository
- Peer-to-Peer RAIN Architecture
- Each Node has Uniform and Consistent View of Repository
- Clients can Request Services from any Node
- Data Striped Across Nodes (similar to RAID)
- No Single Point of Failure
- Requires Agent on Each Client System
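13 CAS Archival Example: EMC Centera
[Figure: Centera store flow. The Application passes a file through the API; Centera calculates the CA and extracts metadata, stores the Blob, stores the CDF (XML C-clip holding the metadata and the CA), and returns the CA of the CDF to the Application. Source: EMC]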
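The store flow in the figure can be approximated with a short sketch. This is not the Centera API: store_object, retrieve_object, the XML element names, and the use of MD5 are all assumptions for illustration. It shows the two-level idea only: the blob is stored under its own CA, a small XML descriptor (the CDF) records that CA plus metadata, and the application keeps just the CA of the CDF.

```python
import hashlib
import xml.etree.ElementTree as ET


def store_object(store: dict, blob: bytes, metadata: dict) -> str:
    """Store a blob and its descriptor; return the CA of the descriptor."""
    blob_ca = hashlib.md5(blob).hexdigest()
    store[blob_ca] = blob

    # Build the descriptor: the blob's CA plus extracted metadata.
    cdf = ET.Element("cdf")
    ET.SubElement(cdf, "blob", {"ca": blob_ca})
    for name, value in sorted(metadata.items()):
        ET.SubElement(cdf, "meta", {"name": name, "value": value})
    cdf_bytes = ET.tostring(cdf)

    # The descriptor is itself content-addressed.
    cdf_ca = hashlib.md5(cdf_bytes).hexdigest()
    store[cdf_ca] = cdf_bytes
    return cdf_ca


def retrieve_object(store: dict, cdf_ca: str):
    """Follow the descriptor back to the blob and its metadata."""
    cdf = ET.fromstring(store[cdf_ca])
    blob_ca = cdf.find("blob").get("ca")
    metadata = {m.get("name"): m.get("value") for m in cdf.findall("meta")}
    return store[blob_ca], metadata


store = {}
handle = store_object(store, b"scanned invoice", {"owner": "finance", "type": "pdf"})
blob, meta = retrieve_object(store, handle)
print(blob, meta)
```

Because the descriptor includes the metadata, the same blob stored with different metadata yields a different descriptor address while the blob itself is still held only once.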
14 CAS Advantages: EMC Centera
15 CAS Players
- Persist Technologies
16 CAS Futures: What's Needed?
- Flexible Scaling Capabilities
- Integration with File Interfaces
- Easy API-free Application Integration
- Integrated Indexing
17 Summary
18 Taneja Group Recommendations
- Absolutely Test Out CAS Systems, but
- Apply to a Project at a Time (consider the disruptive factor)
- Keep a Fallback Position (run systems in parallel)
- Test Out Recoverability Regularly
- Keep in Mind: More Solutions Coming