Central Services Databases

1
Central Services -- Databases & Farms
  • Run II Computing Review
  • September, 2005

2
Databases
  • Current Resources
  • Support Levels
  • Effort
  • Hardware & software
  • Monitoring tools
  • Completed and Current Projects
  • CDF Replication
  • SAM Schema
  • Freeware Support

3
Support Levels & Effort
  • Database system administration support for
  • CDF offline
  • CDF online
  • D0 offline
  • 24x7 support for production databases
  • primary & secondary on call
  • 5 machines & databases
  • 9x5 support for Development / Integration
    databases
  • 6 machines
  • 11 databases -- 4 dev, 4 int, 1 cdfval, 2
    testbeds
  • SAM support, plus other application support
  • Effort
  • 3.5 DBAs (Trumbo, Stanfield, Kumar, Bonham)
  • 2 Sysadmins (Mihalek, Kovich, Kastner)

4
Current Resources
5
Current Resources
6
Monitoring And Data Modeling Tools
  • Monitoring Tools
  • dbatool/toolman
  • To monitor space usage, users, SQL, tempspace,
    sniping of inactive sessions, auto start of
    listener, IA, estimate table/index stats
  • OEM (Oracle Enterprise Manager) DB Monitoring
    tool
  • http://www-cdserver.fnal.gov/cd_public/css/dsg/db_stats/data/db_stats.html
  • Ganglia
  • http://fcdfmon2.fnal.gov/
  • Data Modeling Tool
  • Oracle Designer is used for data modeling and
    initial space estimates for applications.
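The session-sniping behavior listed for dbatool can be sketched as follows. The session records, field names, and two-hour idle threshold below are hypothetical stand-ins for what dbatool would read from Oracle's V$SESSION view:

```python
from datetime import datetime, timedelta

# Hypothetical idle limit; in dbatool this would be a configured policy.
IDLE_LIMIT = timedelta(hours=2)

def sessions_to_snipe(sessions, now):
    """Return sessions idle longer than IDLE_LIMIT (candidates for sniping)."""
    return [s for s in sessions
            if s["status"] == "INACTIVE"
            and now - s["last_activity"] > IDLE_LIMIT]

now = datetime(2005, 9, 1, 12, 0)
sessions = [
    {"sid": 101, "status": "ACTIVE",   "last_activity": now},
    {"sid": 102, "status": "INACTIVE", "last_activity": now - timedelta(hours=3)},
    {"sid": 103, "status": "INACTIVE", "last_activity": now - timedelta(minutes=30)},
]
print([s["sid"] for s in sessions_to_snipe(sessions, now)])  # [102]
```

Only session 102 exceeds the idle limit; active sessions and recently idle ones are left alone.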

7
Uptimes
  • CDF Online
  • 100%
  • CDF Offline
  • 99.44% (1776 mins downtime since 11 Nov 2004)
  • CDF Offline Replica
  • 100%
  • D0 Offline Production
  • 99.85% uptime (420 mins downtime since 15 Nov 2004)
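As a check on the arithmetic behind these figures: the slide does not state the snapshot date, but assuming a measurement around 19 June 2005 (an assumption chosen purely for illustration) reproduces the CDF Offline number:

```python
from datetime import datetime

def uptime_pct(downtime_min: float, start: datetime, end: datetime) -> float:
    """Uptime percentage over [start, end) given total downtime in minutes."""
    total_min = (end - start).total_seconds() / 60
    return 100.0 * (1 - downtime_min / total_min)

# CDF Offline: 1776 minutes of downtime since 11 Nov 2004;
# the 19 Jun 2005 snapshot date is an assumption, not from the slide.
pct = uptime_pct(1776, datetime(2004, 11, 11), datetime(2005, 6, 19))
print(round(pct, 2))  # 99.44
```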

8
Completed Projects
  • CDF Online hardware replacement, implementation
    of Bzora1.
  • Replacement of basic replication with streams.
  • SAM Schema support for CDF and D0
  • Implementation of dcache/enstore for database
    backups to tape.
  • Introduction of SAN technology for backups.
  • Tested complete database recovery of d0ofprd1
    database.
  • New license agreement with Oracle Corporation.
  • Translated SAM schema to Postgres.
  • Replacement of CDF replica machine.
  • D0 Luminosity db deployed on Oracle 9i and 10g
    versions (on machine loaned from CDF).

9
Growth of D0 Offline DBs
10
Growth of CDF Online DB
11
Growth of CDF Offline DB
12
CDF Streams Replication
(Diagram; recoverable labels only)
  • Databases: On-line 9.2.0.6 (cdfonprd), Replica1 9.2.0.6 (cdfstrm1,
    replication distribution), Replica2 9.2.0.4 (cdfrep01, at fnal),
    Off-line 9.2.0.6 (cdfofprd)
  • Users/apps: on-line users, CAF and others, farm users, remote users;
    four apps on-line, 47 apps off-line
  • Streams links between the databases, with potential streams to remote
    sites and remote users
13
RMAN Backup on SAN
  • Inexpensive, large disk array can accommodate
    growing RMAN backups
  • Fast, reliable backup and recovery
  • 24x7 and 8x5 support tiers available
  • Can serve various O/S platforms
  • A June 16 briefing on database backup/recovery
    standardization discussed the SAN testing in
    more detail.
  • http://www-css.fnal.gov/dsg/internal/briefings_and_projects/briefings/standardizing_database_backups.ppt
  • Multiplexing of archives to local disk and SAN

14
RMAN to SAN Experience
  • d0ofdev1 RMANs to SAN since Nov. 04
  • Two 1TB SAN mount points available
  • Keep 2 alternating days of RMANs on SAN,
    once/week to local backup disk
  • RMAN validation to determine backup file
    integrity
  • One validation failure since Nov. 04
  • Recoveries from SAN were all successful
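RMAN's own `VALIDATE` command performs the block-level integrity checking inside Oracle; as a rough illustration of the underlying idea, an independent file-integrity check on a backup piece might look like this (the file name and contents below are made up):

```python
import hashlib
import pathlib
import tempfile

def checksum(path: pathlib.Path) -> str:
    """MD5 of a backup piece, read in 1 MB chunks so large files fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Simulate a backup piece on a SAN mount point (hypothetical name/contents).
piece = pathlib.Path(tempfile.mkdtemp()) / "rman_d0ofdev1_001.bkp"
piece.write_bytes(b"simulated backup piece contents")

recorded = checksum(piece)        # stored when the backup is written
ok = checksum(piece) == recorded  # re-verified before trusting a restore
print(ok)  # True
```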

15
SAM Schema
  • Production Deployments
  • Autodestination subsystem of SAM schema
  • Indexes on param values deployed in production
  • Data types correction cut
  • Indexes for volumes
  • Works-in-Progress
  • Request subsystem of SAM Schema
  • Cut in Mini-SAM
  • Upgrade to Mini-SAM as SAM Schema Evolved
  • This lets individual developers keep a copy of
    SAM metadata and seed data available for the
    server software rewrite if needed.
  • Mini-SAM in Postgres
  • Initiative to move towards freeware databases for
    SAM. Proof of product not complete, requires
    testing with a dbserver from the SAM development
    team

16
Oracle Contract
  • Negotiated a new contract w/ Oracle
  • Explicitly distinguishes between scientific and
    business use, w/ different arrangements
  • Scientific side provides for 2400 term user
    licenses, renewed annually
  • Covers the entire user base
  • Can increase or decrease as needed (including
    decrease to zero)
  • Negotiated 87% discount (better than DOE)
  • Annual cost for scientific use increased from
    $114k to $290k
  • But far less than feared
  • Discounts apply for five years

17
Freeware Support
  • Mysql/Postgres prototype
  • Proof of product with CDF data
  • Population mechanism is on-demand; it does not
    support updates
  • CDF successfully tested with CDF code
    (Karlsruhe)
  • Providing consulting for freeware databases
  • actively maintaining new versions of mysql and
    postgres in KITS and working towards a more
    robust environment
  • actively maintaining documentation for mysql and
    postgres in our freeware area
  • http://www-css.fnal.gov/dsg/external/freeware
  • actively assisting users with questions,
    upgrades, testing, etc. for freeware products
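The prototype's "population on demand, no updates" behavior can be illustrated with a toy read-through mirror. The dict-backed source and replica here are stand-ins for the Oracle master and the MySQL/Postgres copy:

```python
# Hedged sketch: a local freeware mirror is filled only when a key is
# first requested from the source; later source updates are not propagated,
# matching the prototype's stated limitation.
class OnDemandMirror:
    def __init__(self, source):
        self.source = source  # stand-in for the Oracle master
        self.local = {}       # stand-in for the freeware replica

    def get(self, key):
        if key not in self.local:          # populate on first request
            self.local[key] = self.source[key]
        return self.local[key]             # later source updates ignored

source = {"run": 123456}
mirror = OnDemandMirror(source)
print(mirror.get("run"))  # 123456 (fetched from the source)
source["run"] = 999999    # update at the source...
print(mirror.get("run"))  # 123456 (...not propagated to the mirror)
```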

18
Introduction to CSS Run II Farms Activities
  • Personnel
  • General system administration of farms
  • System design of new IO and concatenation systems
  • Deployment of new batch systems
  • Evaluation and benchmarking of new hardware types
  • Commissioning and troubleshooting of new compute
    servers
  • Fermigrid work for interoperability of farms

19
Personnel
  • 3 FTEs involved in day-to-day system
    administration (Tader, Van Conant, Syu)
  • Two team members in supporting role (Timm,
    Greaney) for planning and installation.

20
New CDF Farms Software
  • Production farms made the transition to a
    totally new mode of processing
  • SAM used for data delivery
  • SAM used for concatenation bookkeeping
  • Condor with CAF scripts as batch system
  • Production farms were the first large-scale demo
    of this setup (late 2004), now known as the
    Phase I farm.
  • Phase II production farm commissioned in June of
    2005
  • Condor successfully deployed as batch system;
    this success will lead us to use it on other
    farms as well.
  • No use of FBSNG/DFARM anymore
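For context, a Condor vanilla-universe job of the kind the CAF scripts wrap is driven by a submit description; this hypothetical helper renders a minimal one (the executable name and paths are made up, not from the slides):

```python
def submit_description(executable: str, args: str, log_dir: str, queue: int = 1) -> str:
    """Render a minimal Condor submit-file body (illustrative only)."""
    return "\n".join([
        f"executable = {executable}",
        f"arguments  = {args}",
        "universe   = vanilla",
        f"log        = {log_dir}/job.log",
        f"output     = {log_dir}/job.out",
        f"error      = {log_dir}/job.err",
        f"queue {queue}",
    ])

# Hypothetical reconstruction job submitting 10 queue slots.
print(submit_description("reco.sh", "run 123456", "/farm/logs", queue=10))
```

In practice the description would be handed to `condor_submit`; the CAF scripts generate and manage these on the user's behalf.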

21
Monitoring
  • All farms are monitored with Ganglia
  • http://fnpca.fnal.gov/ganglia/
  • ngop is used for detecting error conditions and
    generating alerts into the helpdesk system
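The detect-and-alert loop can be sketched as simple threshold rules over node metrics. The rule set, metric names, and node name below are hypothetical, in the spirit of what ngop evaluates against Ganglia-style data:

```python
# Hypothetical error-condition rules: metric name -> predicate that
# returns True when the value should raise a helpdesk alert.
RULES = {
    "disk_used_pct": lambda v: v > 90,   # nearly full disk
    "load_1min":     lambda v: v > 20,   # runaway load
    "node_up":       lambda v: v == 0,   # node down
}

def alerts(node, metrics):
    """Return alert strings for every metric that trips its rule."""
    return [f"{node}: {name}={value}"
            for name, value in metrics.items()
            if name in RULES and RULES[name](value)]

print(alerts("fnpc101", {"disk_used_pct": 95, "load_1min": 3.2, "node_up": 1}))
# ['fnpc101: disk_used_pct=95']
```

Each alert string would then be fed into the helpdesk ticketing system.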

22
New CDF Farms hardware
  • Phase I farm
  • 27 worker nodes, all off-warranty, used for SAM
    testing
  • Phase II production farm
  • 174 worker nodes, for production reconstruction
  • Six IDE RAID servers
  • Use for concatenation on local disk
  • Can be moved between two farms as needed.
  • Already have throughput of >2 TB/day, and can
    still increase.

23
CDF Condor Phase I
24
D0 Farms Design
  • 448 worker nodes in D0 farms currently
  • Goal is to migrate functions off of SGI head node
    d0bbin
  • Batch system already has been moved
  • Will use 5 linux worker nodes, SAN-attached disk,
    and Global File System for concatenation
  • New NFS head node coming soon, also will use
    SAN-attached disk
  • Master of SAM station will move to another node

25
D0 Farm Utilization
26
Opteron Hardware Evaluation
  • CSS led multi-department taskforce to evaluate
    AMD Opteron technology
  • Found Opterons of similar cost to their Intel
    counterparts to be significantly better on D0
    code
  • Opteron hardware will be purchased in the FY2005
    acquisition of worker nodes.
  • Significant improvement in performance per
    dollar, also more efficient in electrical power.

27
FY2004 Farms Acquisition
  • First major deployment in new Grid Computing
    Center
  • Farm team staff were contacts for installers
  • Supervised burn-in for 160 CDF & 280 D0 nodes
  • Includes CDF CAF and D0 CAB nodes as well
  • Led troubleshooting effort for several months
  • Physically present in GCC with vendor staff
  • Resonant vibration among system drives, cases,
    and fans resulted in intermittent disk errors

28
FERMIGRID
  • CSS personnel part of Fermigrid gateway team
  • Goals for Fermigrid
  • All Fermilab clusters can interoperate
  • Unified interface to Open Science Grid
  • General Purpose Farms are first Fermigrid compute
    element
  • Other farms we manage will follow shortly.
  • http://osg-cat.grid.iu.edu/

29
Open Science Grid Gatekeepers
  • GP Farms
  • Existing OSG gatekeeper being used by CDF to run
    Condor glide-ins (GlideCAF).
  • D0 Farms
  • Machine has been ordered, we will install.
  • D0 CAB
  • Run2-sys personnel are attending our training
    sessions and will learn to install; machine has
    been ordered.
  • CDF Farms
  • Some new nodes will be available to OSG and
    Condor glide-ins, we will install the gatekeeper.

30
SAMGrid Gatekeepers
  • J. Snow's experience running D0 jobs on CMS shows
    it is better to have a separate SAMGrid gatekeeper.
  • SAMGrid testbed part of GP Farms since early 2004
  • We manage gatekeepers
  • Used for testing mostly, not production
  • D0 Farms SAMGrid gatekeeper soon to be replaced
    with faster node.
