1
Status: Central Storage Services
  • CD/LSC/CSI/CSG, June 26, 2007

2
Storage Services
  • File Based Storage
  • NFS/CIFS (BlueArc) -- fast on-site access
  • AFS -- global access, authenticated filesystem
  • Block Based Storage
  • Fibre-Channel connect to SAN
  • Archival Storage
  • Backups

3
NAS Status
  • Newest Service
  • 2 Production clusters
  • Fermi-Blue (1st generation cluster)
  • RHEA (2nd generation cluster)

4
NAS Status
  • 3/06 NAS heads ordered (Fermi-Blue)
  • 5/06 Pilot deployment
  • SLF, DSG, KITS, PPD and FESS department servers
  • Year 1 projection: 10TB deployed storage

5
NAS Status
Projected Rollout
  [Timeline chart: Phase 1 -- Department servers, array consolidation; Phase 2 -- rollout to Farms servers; Phase 3 -- rollout to Farms workers; plotted against Years 1-3]
6
NAS Status
Actual Rollout
  [Timeline chart: the same Phases 1-3 plotted against Years 1-3 -- actual rollout]
7
NAS Status
  • Actual Year 1 Deployment
  • Q2 2006 Pilot Program
  • Early adopters
  • Timing
  • CMS home area evaluation
  • Fermigrid NFS issues

8
NAS Status
  • Actual Year 1 Deployment (cont)
  • Q3 2006 Production
  • Phase 1 in full production
  • CMS and Fermigrid go to production (Phase 2)
  • Additional NAS heads purchased (RHEA)
  • Year 1 projection revised to 200TB deployed
    storage

9
NAS Status
  • Actual Year 1 deployment (cont)
  • Q4/2006
  • CMS and Fermigrid deploy to worker nodes
    (Phase-3)
  • Q1/Q2 2007
  • D0/CDF/Miniboone begin consolidation of servers
    into central NAS service
  • Requests for space from LHC, ILC and SDSS

10
NAS Status
  • NAS Storage Growth Year 1

11
NAS Status
Q3 2007 storage deployment @ 425-905TB
12
NAS Status
  • Current customers
  • Experiments
  • CMS, CDF, D0, FermiGrid/OSG, Miniboone
  • ILC, LHC, SDSS, Sciboone(?)
  • Departments
  • CD, Directorate, FESS, ESH, PPD, VMS
  • Services
  • Scientific Linux (FERMI), CVS, KITS, Alphaflow,
    Enstore

13
NAS Status
  • Benefits
  • Stability -- Savings multiplier
  • Effort re-directed towards supporting applications
  • Reduced downtime -- increased productivity
  • Consolidation (30 servers/storage arrays)
  • Reduce equipment support costs
  • Reduce power and cooling

14
NAS Status
  • Benefits (cont)
  • Ease of use
  • Familiar storage solution -- minimal training
  • Flexible
  • Choice of storage tiers, price points

15
NAS Status
  • Challenges
  • Growth higher than expected -- LUN limit
  • Each cluster is limited to 256 LUNs
  • Each LUN limited to 2TB
  • Upgrade to 64TB LUN support expected EOY 2008 (see the capacity sketch after this list)
  • Criticality of service
  • Central location
  • Offsite DR required?
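  The LUN limits above imply a hard capacity ceiling per cluster. A minimal
  back-of-the-envelope sketch in Python, using only the figures quoted on this
  slide plus the two production clusters from the earlier NAS Status slide:

    # Capacity ceiling implied by the BlueArc LUN limits quoted above
    # (figures from this slide; two production clusters assumed)
    luns_per_cluster = 256
    tb_per_lun_now = 2
    tb_per_lun_future = 64   # expected 64TB LUN support, EOY 2008

    ceiling_now = luns_per_cluster * tb_per_lun_now        # 512 TB per cluster
    ceiling_future = luns_per_cluster * tb_per_lun_future  # 16,384 TB per cluster

    print(f"Today: {ceiling_now} TB/cluster, {2 * ceiling_now} TB across both clusters")
    print(f"With 64TB LUNs: {ceiling_future} TB/cluster")

  With 2TB LUNs the two clusters together top out near 1PB, which the Q3 2007
  projection of 425-905TB already approaches -- hence the challenge.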

16
NAS Status
  • Challenges (cont)
  • Backup of large data is an issue
  • Large data areas > 5TB
  • Millions of files
  • Logistics
  • Power
  • Floor space

17
NAS Status
  • FY08 Plans
  • Expansion of service
  • Participate in Tier 3 evaluation
  • Development of better reporting tools

18
NAS Status
  • More info
  • http://computing.fnal.gov/nasan/bluearc.html
  • Questions?

19
SAN Status
  • 272 Fibre-Channel ports
  • 128 ports added to fabric in '07 (CMS
    contribution)
  • Qlogic switches
  • 2Gb Fibre Channel Connections

20
SAN Status
  • 23 storage arrays
  • 12 centrally managed
  • Database Array (3PAR) purchased and tested
  • D0ora2 deployment 7/2/2007
  • Start retiring 1st Generation Tier 2 storage
    arrays (Infortrend)
  • 11 externally managed

21
SAN Status
  • 346TB
  • 156TB centrally managed
  • 190TB externally managed

22
SAN Status
  • SAN fabric opened up to external members
  • CMS, CDF, D0, Miniboone
  • Must retire LSI storage array
  • End of support (year end 2007)
  • Impacts IMAP/POP, AFS, DSG(CDF)

23
SAN Status
  • FY07 Plans
  • Additional HDS array
  • NAS storage for SDSS, Windows Migration, DSG
  • Block storage for LSI migration

24
SAN Status
  • FY07 Plans (Cont)
  • Purchase 2 Nexsan SATAbeasts
  • Replace 4 Infortrend arrays
  • Backup cache disk, DSG RMAN disks
  • Test as possible tier 3 candidates

25
SAN Status
  • FY08 Plans
  • Additional capacity for 3PAR
  • For sparing
  • DSG migration
  • Additional capacity for NAS
  • Decommission remaining Infortrend arrays
  • Other tier 3 alternatives (nexgen HDS, DDN)
  • Virtualization across arrays

26
SAN Status
  • Questions?

27
Site Backup Status
  • Service entering 4th year 10/07
  • 2 Backup Servers
  • Chasm (infrastructure and business)
  • Canyon (experiment)
  • 1 Library (600 slots)
  • 8 SAIT-1 Tape Drives
  • 2 Infortrend Storage arrays
  • TiBS Backup Software

28
Site Backup Status
  • 22TB data
  • 12,700 backup volumes
  • 5,506 UNIX/Windows, 7,171 AFS, 25 NDMP
  • 452 clients
  • 18.5% increase in past 6 months (3.7TB) -- see the growth sketch after this list
  • No single volume > 100GB
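  A quick sketch of what the quoted growth implies if it simply continues at
  the same rate -- a planning assumption, not a figure from the presentation:

    # Project the backup pool forward assuming the 18.5%-per-6-months
    # growth above repeats (an assumption for illustration only).
    current_tb = 22.0          # total backup data today (from this slide)
    six_month_growth = 0.185   # 18.5% over the past 6 months

    one_year_out = current_tb * (1 + six_month_growth) ** 2
    print(f"Projected data in 12 months: {one_year_out:.1f} TB")   # ~30.9 TB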

29
Site-Backup Status
  • Typical Daily Backup Timeline (canyon)

[Timeline chart, 24-hour window: Incr/Network, Merges, Retry and Debug phases, with time markers at 6:00 PM, 2:00 AM, 1:00 PM and 1:40 PM]
30
Site-Backup Status
  • Issues
  • Resolving client backup issues
  • High client volatility
  • Reconfiguration/Renaming/Reinstalls
  • Large delta in data
  • Contacting admins
  • Slow client network performance

31
Site-Backup Status
  • Issues (cont)
  • Merge problems
  • Can be difficult to debug
  • Tape drive/Software or combination
  • Cache disk
  • Multiple disk failures

32
Site-Backup Status
  • Issues (cont)
  • SAIT-1 drive performance issues
  • Tapes written on one drive are slow to read on
    another
  • Long debug time > 1 hour
  • Usually requires multiple replacements
  • Sony and Spectra investigating
  • Too few drives

33
Site-Backup Status
  • FY07 Plans

[Architecture diagram: chasm and canyon backup servers, IP and SAN paths, disk cache, SAIT-1 drives, LTO-4 and NDMP]
  • Migrate more backups to NDMP -- relieve pressure on chasm
  • Migrate clients from canyon to chasm -- relieve pressure on canyon
34
Site-Backup Status
  • FY07 Plans (cont)
  • Upgrade cache disks
  • Replace aging Infortrend disks
  • Higher performing array
  • RAID 6

35
Site-Backup Status
  • Challenges
  • Desire from users to expand backups
  • Larger backup volumes
  • Larger backup sets

36
Site-Backup Status
  • FY08 Plans
  • Upgrade Servers to Solaris 10
  • Faster IP stack and Filesystem
  • Upgrade server hardware
  • Faster bus speed
  • Utilize faster cache disk
  • Take advantage of faster filesystem
  • Feed faster tape drives
  • Migrate canyon backups to LTO-4 (see the throughput sketch after this list)
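  A rough sketch of why faster drives matter, using nominal native drive specs
  (SAIT-1 roughly 500GB at 30MB/s, LTO-4 roughly 800GB at 120MB/s -- vendor
  figures assumed here, not measurements from this system):

    # Time to stream one of the "large data areas > 5TB" noted earlier,
    # at nominal native drive speeds (assumed, not measured).
    def hours_to_write(data_tb, mb_per_s):
        return data_tb * 1_000_000 / mb_per_s / 3600

    data_tb = 5
    print(f"SAIT-1 (~30 MB/s):  {hours_to_write(data_tb, 30):.0f} h")   # ~46 h
    print(f"LTO-4  (~120 MB/s): {hours_to_write(data_tb, 120):.0f} h")  # ~12 h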

37
Site-Backup Status
  • FY08 Plans (cont)
  • Investigate Disk-based library
  • TiBS specific implementation
  • Use common disks as a disk library
  • Synchronous copy to tape (also)
  • Faster restores, possibly backups
  • May increase overall backup system throughput

38
Site-Backup Status
  • FY08 Plans (cont)
  • Investigate Virtual Tape Library
  • Agnostic solution (not TiBS specific)
  • Asynchronous copy to tape
  • Emulate tape drives and libraries
  • Faster restores and backups
  • Will increase overall backup system throughput
  • Some systems offer data deduplication (sketched after this list)
  • Inline or post-process
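  A minimal sketch of the general idea behind inline deduplication -- fixed-size
  chunks keyed by a content hash. Chunk size, hash choice, and the in-memory
  store are illustrative assumptions, not how any particular VTL product works:

    # Inline dedup sketch: store each unique chunk once, keyed by its hash,
    # and keep a per-file "recipe" of chunk digests for restore.
    import hashlib

    CHUNK = 4 * 1024 * 1024       # 4 MiB chunks (arbitrary choice)
    store = {}                    # digest -> chunk bytes (the "disk library")

    def backup(path, recipe):
        with open(path, "rb") as f:
            while chunk := f.read(CHUNK):
                digest = hashlib.sha256(chunk).hexdigest()
                if digest not in store:   # inline: skip chunks already stored
                    store[digest] = chunk
                recipe.append(digest)

    def restore(recipe, path):
        with open(path, "wb") as f:
            for digest in recipe:
                f.write(store[digest])

  A post-process variant would write everything to cache first and fold duplicate
  chunks together later; inline dedup does the lookup on the write path.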

39
Site-Backup Status
  • More information
  • http://computing.fnal.gov/site-backups
  • Questions?

40
AFS Status
  • 12 AFS servers
  • 17TB storage
  • Largest customers: Minos and Web
  • Roughly 8-10% increase per year
  • (Based on number of volumes)
  • Must migrate servers off of LSI storage array and
    onto HDS Tier 2 storage.

41
AFS Status
  • FY07 Plans
  • Migrate data to HDS Tier 2 disks
  • Migration partially complete (1.8TB installed)
  • Tier 2 storage re-allocated to NAS due to high
    demand
  • Test Solaris 10 AFS server with ZFS

42
AFS Status
  • FY08 Plans
  • Upgrade Servers to Solaris 10
  • Faster OS filesystem and IP stack
  • Newer CPUs, low power
  • Dual Power Supply
  • Upgrade OpenAFS
  • Multi-domain support
  • Support for > 2GB files
  • Promote RO copies to RW copies (see the sketch after this list)
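  A minimal sketch of scripting the RO-to-RW promotion with the OpenAFS vos
  tool from Python. The vos convertROtoRW subcommand and its flags are quoted
  from memory, and the server/partition/volume names are hypothetical -- check
  the vos man page for the deployed OpenAFS release:

    # Promote a read-only AFS replica to read-write via "vos convertROtoRW"
    # (flag names assumed; verify against the installed vos documentation).
    import subprocess

    def promote_ro_to_rw(server, partition, volume, cell="fnal.gov"):
        cmd = ["vos", "convertROtoRW",
               "-server", server,
               "-partition", partition,
               "-id", volume,
               "-cell", cell]
        subprocess.run(cmd, check=True)

    # Hypothetical example:
    # promote_ro_to_rw("afs1.fnal.gov", "vicepa", "user.volume")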

43
AFS Status
  • More information
  • http://computing.fnal.gov/nasan/afs.html
  • Questions?