Title: CSG Storage Workshop
1. CSG Storage Workshop
2. Workshop Team
- Bruce Vincent (Stanford)
- Scotty Logan (Stanford)
- Dennis Cromwell (Indiana)
- Ron Thielen (U Chicago)
- Kitty Bridges (Michigan)
- Additional presenters
- Cory Snavely (Michigan)
- Jim Pepin (USC, en route to Clemson)
3. Agenda
- 1:00-1:20 Survey Results (Kitty)
- 1:20-2:15 Framing the discussion: definitions, emerging technologies, policy, architecture (Ron)
- 2:15-2:45 Case study: research storage (lots of small files, as in genomics) (Jim)
- 2:45-3:00 BREAK
- 3:00-3:30 Case study: library archive storage (Cory)
- 3:30-4:00 Case study: research storage (lots of big files, as in astronomy, physics) (Dennis)
- 4:00-4:45 Case study: tiered storage architecture, virtualization (Ron, Scotty)
4. Survey Results
5. Institutions Responding (19)
- Brown
- Carnegie Mellon
- Columbia
- Duke
- Georgetown
- Harvard
- Indiana
- MIT
- NYU
- Penn State
- Princeton
- Stanford
- University of California
- University of Chicago
- University of Colorado
- University of Michigan
- University of Virginia
- University of Wisconsin
- Virginia Polytechnic Institute and State
University
6. File Service Offered (percent of respondents)
Distributed file service (AFS, DCE, DFS, WAFS) 42%
Microsoft DFS 11%
CIFS 53%
NFS 47%
WebDAV 47%
Xythos 26%
Other 37%
7. File Service Quotas
                    <100 MB   101-300 MB   301 MB-1 GB   1-10 GB   >10 GB
Students               4          4             2           8         0
Faculty                2          2             3           8         1
Staff                  2          2             3           8         1
Other individuals      2          2             2           6         0
Depts                  2          2             3           8         2
Courses                2          2             3           5         3
Research groups        2          2             2           5         3
8. How Many Staff Manage It?
1 to 3 staff: 11
4 or 5 staff: 6
6 or more: 2
Separate groups? Yes 74% (14), No 26% (5)
9. Staffing
- Do you have problems recruiting and retaining skilled storage staff?
- Yes 47.4% (9), No 52.6% (10)
10. Mitigating these problems?
- Adjusted the salary levels for storage staff to reflect the market.
- Hardware and software consolidation, automation/tools, delegation/self-service.
- By keeping things simple (e.g. no SAN requirements, etc.), but it is unclear how this will scale.
- Standardize on more current mainstream technologies and move away from technologies with a strong historical legacy but no significant current relevance, such as AFS.
- We're currently using some contractors, and more training for our existing staff, but we're still looking for ideas.
- Take longer to find people who love storage and want to make it their career.
11. Mitigating these problems?
- Staff members with good storage management skills
are just one of the areas where we worry about
finding and keeping talent. It is difficult for
us to match corporate compensation, but we find
that money is not the ultimate decision maker.
Training is important and we are diligent about
keeping a training budget. We allocate $1,400 per
FTE in the organization. This is pooled and
allocated as appropriate, so it gives us some
flexibility to keep staff skills fresh and to
build depth in critical areas. We also like to
keep work interesting, which is very important to
highly motivated technical staff. In addition, we
recognize outstanding work with both small
monetary bonuses and publicity. Still, we
recently lost our best SAN person.
12. Client replication technologies (MS Offline Files/Folders or Apple's Portable Home Directories)?
13. Block Storage (11 using or considering)
- Considering iSCSI
- Don't provide block storage direct to end-users. We think we can do it via iSCSI but haven't done much more than ponder the possibility.
- SAN and NAS, as well as DAS
- Provide block storage to major campus "services" through EMC software and the EMC DMX and Clariion storage systems.
- Fibre Channel primarily; iSCSI planned for the future
- EMC DMX and Clariion
- Fibre Channel SAN available to servers hosted in the data center
- Block storage is only provided in the data center
- NAS
- NFS, iSCSI and SAN attached
- SAN
14. Storage in Data Center
- EMC Clariion and DMX, HP EVA 8000 and 5000; also use Cisco directors for the SAN
- Network Appliance
- EMC, NetApp
- Almost exclusively EMC; currently researching other vendors' offerings (Sun, Hitachi and Apple)
- IBM, Sun/Hitachi
- EqualLogic for iSCSI, ACNC for direct attached
- Network Appliance, Sun, Apple, Hitachi, IBM
- IBM FAStT and Shark, HP SANStorage
- Network Systems: EMC CX700 storage array for databases, file services, Exchange and backup storage; Brocade Silkworm switches; StorageTek L180 tape library. Administrative Computing: StorageTek V2X storage array for mainframe, AIX and Oracle systems; Brocade Silkworm switches; StorageTek Timberwolf tape library
- EMC DMX and Clariion, HP EVA, EqualLogic iSCSI
15. Storage in Data Center
- Cisco MDS 9000 series switches, Brocade Silkworm, EMC Clariion, DMX and Sun
- EMC DMX and Clariion storage arrays, EMC NS704G NAS gateway, Nexsan ATAbeast and SATAbeast storage arrays, IBM and StorageTek tape libraries, Cisco and QLogic Fibre Channel directors and switches, Emulex and QLogic Fibre Channel HBAs
- NetApp, Dell/EMC SAN, Sun/Hitachi SAN, StorageTek SAN, IBM SAN
- EMC DMX, HDS USP TagmaStore, IBM DS4800, Sun 5320 NAS Gateway
- IBM, Sun, and Network Appliance for central services
- Primary SAN for enterprise systems will be a Hitachi TagmaStore, moving from an IBM Shark. We have a mix of direct attached storage, SAN and NAS for other areas, including EMC and HP.
- Primarily Sun storage (w/ Brocade switches) for Unix platforms. Windows environments typically have direct attached storage on file servers from Dell.
- IBM mid-range SAN disk, Brocade switches, IBM tape robotics with 3592 drives, NetApp filers, Linux and Windows servers, IBM SAN Volume Controller for virtualization
16. Remote Access
17. Underlying Tech for Service
18. Specialized Storage for HPC?
Yes 37% (7), No 63% (12)
- NetApp
- Sort of, we provide Network Appliance filer space to our research clusters.
- EMC DMX2000 and DMX800 via Cisco SAN fabric
- BlueArc (and other dedicated NFS servers)
- IBM SAN
- IBM DS4800
19. If one backend technology for all storage (7 reporting)
- Moving in that direction - virtualized storage - but not there yet. We use the IBM SAN Volume Controller (SVC) on the SAN side to virtualize across the EMC and HP storage arrays.
- Network Appliance filers
- Some is on direct attached storage (a case-by-case choice by the user/service), but the vast majority is on EMC storage over a Fibre Channel SAN.
- SAN technology with an EMC Clariion storage array that provides shared access to centralized storage.
- EMC DMX and Clariion FC storage presented via FC McData switches, and CIFS via EMC NS704G NAS gateway. Expect to expand to NFS and CIFS in the future.
- HDS TagmaStore USP 1100
20. Backing it up
TSM 12 (was 13)
TSM and replication 3
Legato 2
Atempo 1
Amanda 1
Veritas 1
Disk-based MAID technology 1
21. Meets Needs?
22. Backing up Mobile Devices?
- Mobile devices (PDAs): no. Mobile devices (laptops): yes, moving to LiveBackup from Atempo.
- Connected desktop backups
- TSM (3)
- Support for these devices is provided through a server that is backed up with our Backup Services. The only supported application is based on MS Exchange and utilizes existing mailboxes already on the servers and being backed up.
23. Desktop Backup
We are investigating providing desktop and laptop backup and have made a short list of vendors in this space. Included are Atempo, Asigra and Iron Mountain.
24. Tiered Solution (17)
- TSM (4)
- All of our (3) tiers of storage use the same core backup technology (TSM), which allows individual customizations of timing and overall strategy.
- Multiple tiers of disk storage, but predominantly use TSM clients on the hosts for backup. A few exceptions for our large ERP systems, where we clone the volumes storing the database and back them up from a dedicated server to minimize the backup window.
- TSM, NSR, IBM's "FlashCopy" and NetApp's snapshots
- IBM Tivoli Storage Manager policy domains, management groups, and storage groups allow us to implement numerous combinations of storage tiers and timing strategies for different requirements.
25. Tiered Solutions
- Use the traditional Unix methodology of daily, weekly, and monthly incremental backups (see the sketch after this list).
- See detailed slide deck (Walter Wong from CMU)
- We haven't done any tiered data with backups; we do, however, utilize different disk solutions based on the service requirements. For example: Exchange uses only Fibre Channel 15,000 RPM drives; file servers use RAID 5 ATA drives; backup-to-disk uses RAID 3 ATA drives.
- 2-tiered storage based on performance/availability. No backup distinction yet.
- Use Veritas NetBackup for large backups that require a small window to complete; because of the multiple-stream support Veritas provides, we can complete backups more quickly than we can with TSM.
- All issues we need to consider, though!
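A minimal sketch of the daily/weekly/monthly incremental rotation one respondent describes. The data directory, the first-of-month/Sunday schedule rules, and the tar archive target are assumptions added for illustration; real sites would typically drive dump(8), tar, or TSM schedules instead.

```python
# Pick a backup level the traditional Unix way and archive only the files
# that changed inside that level's window. Hypothetical paths and schedule.
import datetime
import os
import tarfile

def backup_level(today: datetime.date) -> tuple[str, float]:
    """Return (level name, seconds of history this backup should capture)."""
    if today.day == 1:                      # first of the month: full backup
        return "monthly-full", float("inf")
    if today.weekday() == 6:                # Sunday: changes from the past week
        return "weekly-incremental", 7 * 86400
    return "daily-incremental", 1 * 86400   # otherwise: changes since yesterday

def changed_files(root: str, max_age_seconds: float):
    """Yield files under root modified within the backup window."""
    now = datetime.datetime.now().timestamp()
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                if now - os.path.getmtime(path) <= max_age_seconds:
                    yield path
            except OSError:
                continue                    # file vanished or unreadable; skip it

if __name__ == "__main__":
    root = "."                              # hypothetical data directory
    level, window = backup_level(datetime.date.today())
    paths = list(changed_files(root, window))
    archive = f"backup-{datetime.date.today().isoformat()}-{level}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for path in paths:
            tar.add(path)
    print(f"{level}: wrote {archive} with {len(paths)} files")
```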
26. Separate Archive Service
- Operate an ADIC StorNext HSM for research data that sort of fits this mission.
- TSM (5)
- We currently do minimal archiving using a function in TSM and store both to tape and to disk.
- We provide archive via IBM Tivoli Storage Manager for servers and services hosted by IT Services.
- Do not provide a separate archive service; currently use the archive component of both TSM and Legato.
- We archive some end-of-semester data for 5 years.
- Minimal archive service for CIS internal use only, using archive to tape with Legato Networker. We see archival strategies as a means to increasing usable storage and will be looking into other application and file system archive solutions in the near future. Mainframe environment: monthly archive of unchanged files to tape.
27. Library Archives Service (19)
- Library doesn't provide service: 68.4% (13)
- Library alone provides service: 26.3% (5)
- Library provides a service in partnership with central IT: 5.3% (1)
28. Considered Outsourcing?
- We have a partnership with the Duke Health System. They have a large SAN for health care data that we leverage for top-tier storage.
- Twice researched the option and rejected it due both to cost and loss of local flexibility.
- Have looked at outsourcing backup twice. Both times the cost benefit was not evident. When you assess in detail the breakdown of roles/responsibilities between customer and provider, there is still a significant resource commitment on the customer side.
- Considered outsourcing our desktop backups. We had issues with lack of support for non-Windows platforms, some support and bandwidth problems, and no way to take a single charge from the vendor and break it out to re-bill our users.
- This topic brings perceived security/privacy concerns to the surface; a cost/benefit analysis will be interesting; outsourcing may fit as a piece of the life cycle management of information. We are just at the early stage of "consideration."
- Considered it but found we could do it cheaper by centralizing all of campus.
29. BC/DR Impact
- Increased the storage requirements and increased the importance of the storage strategy. A driving factor in our desire to explore a more centralized enterprise storage architecture.
- Customers are requesting shorter RPOs and RTOs, and in some cases are willing to accept longer RTOs if the RPO can be improved. This is causing us to more seriously assess replication, quasi-single-instance stores, virtualization, etc., and has pushed us toward service mirroring and replication (see the sketch after this list).
- For top-tier storage, the SAN is in a different location than the ATLs. TSM and offsite tape storage are also used to provide recovery capabilities.
- Working on a project with Virginia Tech to leverage our NLR connections to hold copies of critical data at their location.
- We must modernize and update to new software, new hardware, employ disk-to-disk-to-tape, and use off-site replication of storage arrays.
- Have looked to adopt technologies that support replication and failover in order to support an enhanced DR strategy.
- SAN extension and replication combined with both server and storage virtualization enable us to address BC and DR requirements in ways that were previously unapproachable.
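An illustrative sketch (not from the survey) of the RPO/RTO trade-off the responses mention: the replication or backup interval bounds the recovery point objective (worst-case data loss), while the recovery steps add up to the recovery time objective. All figures below are hypothetical examples, not institutional data.

```python
# RPO/RTO arithmetic with made-up numbers, only to make the terms concrete.

def worst_case_loss_hours(replication_interval_hours: float) -> float:
    """RPO: at worst, everything written since the last replica is lost."""
    return replication_interval_hours

def recovery_time_hours(step_durations_hours: list) -> float:
    """RTO: time to restore service is roughly the sum of the recovery steps."""
    return sum(step_durations_hours)

if __name__ == "__main__":
    strategies = {"nightly offsite tape": 24.0, "hourly array replication": 1.0}
    for name, interval in strategies.items():
        print(f"{name:>26}: worst-case loss = {worst_case_loss_hours(interval):.0f} h")
    # Hypothetical recovery plan: provision hardware, restore data, validate.
    print(f"example RTO = {recovery_time_hours([4.0, 10.0, 2.0]):.0f} h")
```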
30. BC/DR
- Post-9/11, BCP and DRP awareness has increased dramatically, but unfortunately that awareness has not been accompanied by significantly increased funding. We are now faced with almost uncontrolled growth in the amount of stored data. Since we actively mirror all institutional data, the end result of the explosion in data storage has been the consumption of our entire storage budget in "storing". Our costs have grown too fast for us to be able to mount a detailed and effective BCP.
- Currently in the process of re-evaluating our entire backup and storage architectures. Also driven by the new compliance issues around electronic evidence and e-discovery legislation.
- Current disaster recovery strategy is ambiguous. Nightly copies of our backup tapes are sent to Iron Mountain for storage. In the event of a true loss-of-datacenter disaster we would look for assistance from our vendors and peer institutions to assist us in our recovery. We have taken under advisement the need to have a well documented and institutionally driven business continuity plan. When we do move forward in developing this there should be many driving factors for storage, including requirements for data redundancy, archival, and data accessibility.
31. Summary of Unsolved Problems
- Funding, funding models, costs
- Smart data storage (data de-duplication, compression, life cycle management)
- Multi-platform, with as close to native access as possible
- Replacing current distributed file services (DCE, DFS)
- Virtualization and tiering
- More, more, more (and staying ahead of or on par with demand)
32. Details on Unsolved Problems
- In the process of implementing a broad array of storage, backup and recovery services. Developing the detailed delivery approaches, the funding/business models and the policy/procedure details will require considerable effort in the coming year.
- Data de-duplication technology will make a huge improvement in our ability to adequately back up our data (a minimal de-duplication sketch follows this list).
- Cost control, which equates to storage volume control. We are actively researching the acquisition of an automated archiving solution, which would necessarily include storage classification tools/software so we can control both what and where (in terms of tiers and/or archiving) we store, as well as simply how much. We are also looking into adding a 4th tier of storage with lower performance and (hopefully) greatly reduced costs.
- A good replacement for what we have in DCE/DFS now.
- Archival, desktop backup, centralized storage for research computing
- True virtualization of file and storage infrastructure.
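A minimal sketch of the idea behind the data de-duplication mentioned above, not any respondent's product: split data into fixed-size blocks, key each block by its content hash, and store a given block only once. Commercial backup dedup also does variable-size chunking, compression and on-disk indexing; none of that is shown here.

```python
# Toy block-level de-duplicating store: duplicate blocks are kept once.
import hashlib

BLOCK_SIZE = 4096  # fixed-size chunking for simplicity

class DedupStore:
    def __init__(self):
        self.blocks = {}        # content hash -> block bytes
        self.logical_bytes = 0  # total bytes handed to the store

    def write(self, data: bytes) -> list:
        """Store data, returning the list of block hashes that reference it."""
        refs = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # duplicate blocks stored once
            refs.append(digest)
            self.logical_bytes += len(block)
        return refs

    def read(self, refs: list) -> bytes:
        """Reassemble the original data from its block references."""
        return b"".join(self.blocks[r] for r in refs)

    @property
    def physical_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())

if __name__ == "__main__":
    store = DedupStore()
    nightly = b"A" * 100_000 + b"unchanged user data" * 500
    refs1 = store.write(nightly)                 # first full backup
    refs2 = store.write(nightly + b"new edits")  # next night: mostly identical
    assert store.read(refs1) == nightly
    print(f"logical {store.logical_bytes} B, physical {store.physical_bytes} B")
```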
33. Details on Unsolved Problems
- Universal client access with a reasonable authentication/authorization environment doesn't exist. Vendor storage implementations are about 10 years behind commonly accepted ideas in storage/filesystem research, and they don't seem to be generally aware of that fact. Scalable horizontal growth of storage is just starting to go mainstream, and usually is appearing with the iSCSI vendors. Backup technology is slow to move to CDP ("continuous data protection") and is still rooted in old models that don't fully leverage the cheap storage that is available. Transparent tiering of data based on usage with policy is barely a glimmer in commercial products. Selling "ILM" solutions is confusing the marketplace, and solutions for properly tagging data are being confused in, or buried by, marketing babble. Data scrubbing of failed disk drives in RAID sets doesn't seem to be in the common thought process yet; similarly, disk-level encryption is still in its infancy. Tape encryption is getting better, but that is still slow, and key escrow for all of this is also a pain. "Disks are cheap, storage is expensive" isn't common knowledge, so there is unreasonable pressure to provide "enterprise"-class storage at commodity pricing. Compare the cost of a 750 GB drive ($0.60 per GB) vs. "cheap RAID" ($1.33 per GB) vs. "enterprise class SAN" ($10 per GB)... and none of these costs include replication for BC/DR, backups, etc. (though they do include RAID and hot-spare overheads). (A back-of-the-envelope comparison using these figures follows this slide.)
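The per-GB figures quoted above can be turned into a rough budget comparison. The 50 TB working set and the 2x multiplier for a replicated BC/DR copy below are assumptions added only to make the ratios concrete.

```python
# Back-of-the-envelope cost comparison using the per-GB prices from the slide.

PER_GB_COST = {            # dollars per usable GB, as quoted above
    "bare 750 GB drive": 0.60,
    "cheap RAID": 1.33,
    "enterprise SAN": 10.00,
}

WORKING_SET_GB = 50_000    # hypothetical: 50 TB of institutional data
REPLICATION_FACTOR = 2     # hypothetical: one remote copy for BC/DR

for tier, per_gb in PER_GB_COST.items():
    primary = WORKING_SET_GB * per_gb
    replicated = primary * REPLICATION_FACTOR
    print(f"{tier:>18}: ${primary:>10,.0f} primary, ${replicated:>10,.0f} with a DR copy")
```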
34. Details on Unsolved Problems
- The need to provide a variety of network file system solutions (CIFS, NFS, iSCSI), and the lack of a good platform for enabling that type of access easily to the storage infrastructure we have built.
- No single file protocol that works well from on and off campus on Linux, Mac and Windows without extra client software, provides rich access controls, and supports our central authentication service. Enterprise-scale laptop backup software is still lacking for non-Windows platforms. Key escrow for device and backup encryption is poorly supported by vendors.
- Growth of digitization of information, information life cycle management, encryption, compression
- Petabyte storage solutions for researchers. Trying to find a scalable way to store tens of petabytes without managing multiple storage arrays.
- Adequate funding to keep up with the ever-increasing demand within the university community
- Increasing demand for centralized file services from departments. Multi-site redundancy of storage for critical services. An enterprise backup solution that would allow us to provide backup services for departments with local storage.
35. Summary
- Growth in data is a huge problem - an unfunded mandate in higher ed
- Federal and other requirements for keeping and protecting data for longer periods
- Unmanaged data is becoming a larger problem - we're just keeping everything because it is too hard to clean things up
- Inefficiency
- Not aligning least-used data with the least expensive solution (see the tiering sketch after this list)
- Backing up a lot that doesn't need backup, not backing up what does
- Cost and funding models
- Technology pieces complicated to knit together into a solution
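A minimal sketch, an illustration rather than any respondent's tool, of what "aligning least-used data with the least expensive solution" can mean in practice: walk a file tree, bucket files by days since last access, and report how much data a policy would move to cheaper tiers. The tier thresholds and the scanned path are assumptions.

```python
# Classify files by idle time to estimate how much data each storage tier
# would hold under a simple last-access policy. Thresholds are hypothetical.
import os
import time
from collections import defaultdict

TIERS = [               # (tier name, minimum days since last access)
    ("archive", 365),
    ("nearline", 90),
    ("primary", 0),
]

def pick_tier(days_idle: float) -> str:
    for name, min_idle in TIERS:
        if days_idle >= min_idle:
            return name
    return "primary"

def survey(root: str) -> dict:
    """Return bytes that would land in each tier under the policy above."""
    now = time.time()
    usage = defaultdict(int)
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # unreadable or vanished; skip it
            days_idle = (now - st.st_atime) / 86400
            usage[pick_tier(days_idle)] += st.st_size
    return usage

if __name__ == "__main__":
    for tier, size in survey(".").items():   # scan the current directory as a demo
        print(f"{tier:>9}: {size / 1e9:.2f} GB")
```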
36. NEXT: Setting the Stage; Case Studies
- Setting the stage
- Defining terms, regulations, architecture
- Case studies
- Library archive
- Large data stores
- Small data files
- Large data files
- Tiered storage and virtualization
- (Case Study on website)