Core Services - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Core Services


1
Core Services
  • Run II Computing Review
  • Mark O. Kaletka, CD/CSS Department Head
  • 5/12/2015

2
Introduction
  • The Lab's approach to general computing strikes a
    balance among
  • Centralized infrastructure services
  • Common tools & approaches
  • Flexibility in accommodating local requirements
  • In the last two years, considerable progress
    towards
  • Consolidation of roles & services
  • Better coordination & cooperation
  • Increased flexibility in shifting resources

3
Core Services
  • Central Services
  • Scientific Linux
  • Customer Support
  • Database Administration
  • Farms Support
  • Equipment Support & Logistics
  • Covered in separate talks
  • Networks
  • Data Handling & Mass Storage
  • Software Development & Application Support

4
Central Services
  • Email
  • Email gateways route email and perform virus
    scanning & spam tagging, processing 800,000
    messages/wk
  • IMAP servers provide online storage of email &
    2nd tier of virus scanning, backed by SAN
  • OpenAFS
  • Global (in true sense) file system backed by SAN
  • Web Servers
  • Central web servers backed by OpenAFS
  • Backups
  • Early stages of enterprise backup service
  • FNALU
  • Central general-purpose interactive & batch UNIX
    cluster
  • Windows Support
  • Windows domain central infrastructure
  • Distributed server & local desktop support
  • Other services
  • Printing, CVS, sw & OS support, hw & sw
    contracts, hw repairs, licenses, news, video
    conferencing & streaming (would like to add
    more)

5
Fermi Linux
  • Experience with Fermi Linux back to FL 5.0 (1998)
  • http://www.fnal.gov/docs/Recommendations/dr0010.html
  • Currently 2 FTE of formal support & active
    community support
  • linux-users@fnal.gov
  • http://listserv.fnal.gov/archives/linux-users.html
  • Take advantage of GPL & other public sources
  • Changes we make (value we add)
  • Add common HEP applications
  • Fix things that are broken
  • Fermilab security requirements
  • Workgroup customizations
  • These have been discussed at HEPiX and elsewhere

6
Fermi Linux
  • Reconstruction & analysis farms are at FL 7.3
  • Largely driven by Run II experiment requirements
  • But (currently) also the case for CMS GRID
  • This also drives what is done on desktops
  • Currently frozen with security errata from
    Fedora
  • Fermi Linux LTS 3.0.1 is not yet fully certified
  • In the physics application sense
  • No clear indication when experiments are ready to
    move
  • Servers use commercial Linux
  • Driven by 24x7 HW & SW support
  • Typically Oracle database servers, plus a few
    others
  • Red Hat Enterprise 3
  • Typically purchased w/ server (e.g. Dell, HP)
  • Only a handful of systems
  • Plus a little bit of chaos
  • D0 postdocs run a Fedora desktop analysis cluster
    (ClueD0)
  • Others do other things (SuSE, ...)

7
Scientific Linux
  • Scientific Linux is
  • A natural outgrowth of Fermi Linux
  • And an abstraction layer on top of Fermi Linux
  • Workgroup -> Site
  • Offered to the HEP community as an alternative
  • Much of this is work we would do anyway, others
    should benefit
  • But, this may not be right for everyone
  • In the spirit of Open Source
  • Conforms to GPL license & Red Hat's trademark
    guidelines
  • An opportunity to work together
  • Intended to be a community support effort
  • But this will require some formal infrastructure
    & coordination
  • Not an offer by Fermilab to support everyone!
  • General agreement with LCG at Spring '04 HEPiX
    meeting
  • A real project: http://www.scientificlinux.org

8
Computer Security
  • Integrated computer security program
  • Modeled on integrated safety program
  • Computer security team provides
  • Guidance on policy & best practices
  • Educational programs
  • Expert technical assistance
  • Central authentication & vulnerability scanning
    services
  • Interface to other organizations
  • Incident response team drawn across Laboratory
  • Volunteer fire brigade of local experts
  • 24x7 call rotation

9
Customer Support
  • CD HelpDesk
  • First-tier support for users, dispatch for
    second-tier experts
  • Point of contact for accounts & authentication
  • Point of contact & dispatch for computer security
  • Remedy application to dispatch & track open
    tickets, escalate, page, etc.
  • Integrated w/ automated monitoring via API
  • Knowledge of which systems are important (i.e.
    24x7)
  • Knowledge of paging & escalation procedures
  • TelAssist commercial call center service
  • Provides off-hours human presence
  • Very flexible & cost-effective

10
Database Administration
  • Resources
  • Four production servers & seven SAM dbservers
  • 24x7 with primary & secondary on call
  • Seven development / integration servers
  • 9x5 with primary & secondary
  • 3 DBAs & 2.5 sysadmins
  • Current Projects
  • SAM schema support for CDF and D0
  • Replacement hardware for CDF online database
  • Streaming technology to replace basic replication
  • Alternatives for rman tape backups of very
    large dbs
  • Standby database hardware for CDF online
  • Procedure to allow remote sites to populate local
    replicas (freeware or Oracle) on demand
  • Oracle Advanced Security Option

11
Database Administration
12
Database Administration
13
Database Administration
[Diagram; recoverable labels only]
  • On-line DB (cdfonprd), 9.2.0.4
  • Off-line DB (cdfofprd), 8.7.1.4
  • Off-line DB (cdfrep01), 9.2.0.3: failover for read
    service ONLY
  • Link labels: Basic, 4 Applications, DFC
  • User groups: On-line Users, Farm Users, CAF and
    Others
14
Database Administration
[Diagram; recoverable labels only]
  • On-line (cdfonprd), 9.2.0.4
  • Off-line (cdfofprd), 9.2.0.4
  • Replica1 (cdfrep02), 9.2.0.4: Replication
    Distribution
  • fnal Replica2 (cdfrep01), 9.2.0.4
  • Remote Sites
  • Link labels: Streams, Four App, Five App
  • User groups: On-line Users, Farm Users, Remote
    Users, CAF and Others
15
Database Administration
  • Oracle Licensing
  • 135 concurrent user licenses will be renewed
    until Dec. 31, 2006
  • Paperwork in the works on this
  • Working with Business Services on new license
    model, in preparation for lab-wide Oracle license
    Jan. 2007
  • After Dec. 31, 2006, concurrent licenses are no
    longer an option; Oracle insists on per-CPU
    licensing for external-facing applications
  • Expensive! ($1-3M)
  • We're negotiating
  • Alternatives?

RUN II Databases Only: Avg Users Per Month
  Database       Avg users
  D0 Online      18
  D0 Offline     25
  CDF Offline    22
  CDF Online     32
  CDF Rep01      18
  Total          115
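
As a quick check on the licensing numbers, the sketch below re-adds the per-database averages from the table and compares the total with the 135 concurrent licenses quoted on this slide. It is only arithmetic on the slide's own figures. The table covers Run II databases only and reports monthly averages, so the difference is not a measure of real peak headroom.

```python
# Average users per month, copied from the slide's table (Run II databases only).
avg_users = {
    "D0 Online": 18,
    "D0 Offline": 25,
    "CDF Offline": 22,
    "CDF Online": 32,
    "CDF Rep01": 18,
}

# Concurrent user licenses quoted in the licensing bullets.
CONCURRENT_LICENSES = 135

total = sum(avg_users.values())
# Licenses left over after the Run II averages; other (non-Run II)
# databases and peak usage are not accounted for here.
remaining = CONCURRENT_LICENSES - total

print(f"Run II average users: {total}")
print(f"Licenses beyond Run II average: {remaining}")
```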
16
Farms Support
  • CSS provides system administration for
  • CDF Reconstruction
  • 132 nodes
  • D0 Reconstruction
  • 388 nodes
  • General Purpose
  • 171 nodes
  • CMS
  • 313 nodes
  • Prototype (test)
  • 18 nodes
  • Total of 1022 nodes installed (more on the way)
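
The node counts above can be tallied per farm to see how the installed base divides up. The sketch below uses only the figures on this slide; grouping CDF and D0 Reconstruction together as "Run II reconstruction" is an assumption made for illustration.

```python
# Installed nodes per farm, copied from the slide.
farm_nodes = {
    "CDF Reconstruction": 132,
    "D0 Reconstruction": 388,
    "General Purpose": 171,
    "CMS": 313,
    "Prototype (test)": 18,
}

total_nodes = sum(farm_nodes.values())

# Assumed grouping: the two reconstruction farms together.
run2_reco_nodes = farm_nodes["CDF Reconstruction"] + farm_nodes["D0 Reconstruction"]

for farm, nodes in farm_nodes.items():
    share = 100.0 * nodes / total_nodes
    print(f"{farm:20s} {nodes:5d} nodes  {share:5.1f}%")
print(f"{'Total':20s} {total_nodes:5d} nodes")
```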

17
Farms Support
thru Apr 2004
18
Farms Support
Installation of CDF & D0 systems at New Muon Lab
Construction of HDCF at Wide Band Lab
19
Farms Support
  • Software support
  • 4 FTEs from CSS department for system
    administration of Run II reconstruction farms
  • Also support for middleware (FBSNG, dfarm)
  • Hardware support
  • Vendor warranty (3 yrs) & D1 contract
  • Parts & on-site labor on recent purchases
  • Hardware repairs dominate system support

20
Farms Support
  • All farms are managed in a consistent way
  • Fermi Linux farms workgroup
  • Rocks for installation
  • NGOP monitoring
  • Ganglia monitoring
  • FBSNG scheduler
  • Hardware evaluation, specification, procurement &
    burn-in
  • Farms Users meetings

21
Farms Support
  • SAMGRID Testbed
  • Established in March of 2004
  • Part of General Purpose farms
  • This was requested so that development could
    occur in a generic environment with nothing Run
    II specific
  • 31 worker nodes, formerly from the old D0 farm,
    13 more nodes still coming
  • One grid interface node runs the Globus
    gatekeeper; the other is a SAM station that
    supplies data to and from the worker nodes
  • Local batch manager is FBSNG, hosted on fnsfo
  • Reserved for SAMGRID activities only
  • Only the SAMGRID user can run on them, and the
    SAMGRID user can't run anywhere else in the farm
  • Purpose was for SAM-GRID and JIM Developers to
    test out software
  • D0 Runtime Environment tests also requested
  • Development has centered on D0 MC and CDF MC thus
    far
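
The reservation rule for the testbed (only the SAMGRID user runs there, and that user runs nowhere else) is a two-way exclusion. The sketch below states it as a single predicate. It is not FBSNG configuration; the account name `samgrid` and the pool names `testbed` and `general` are hypothetical labels chosen for illustration.

```python
SAMGRID_USER = "samgrid"   # hypothetical account name
TESTBED_POOL = "testbed"   # hypothetical label for the reserved worker nodes


def may_run(user: str, pool: str) -> bool:
    """Return True if `user` is allowed to run jobs in `pool`.

    The rule is symmetric: the SAMGRID user is confined to the testbed,
    and the testbed is closed to everyone else.
    """
    is_samgrid_user = user == SAMGRID_USER
    is_testbed = pool == TESTBED_POOL
    return is_samgrid_user == is_testbed


# Check all four combinations of user and pool.
for user in (SAMGRID_USER, "alice"):
    for pool in (TESTBED_POOL, "general"):
        print(f"{user:8s} on {pool:8s}: {may_run(user, pool)}")
```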

22
Equipment Support & Logistics
  • Electronics Support & Repair
  • Repair & test electronics in direct support of
    RUN II
  • SVX type electronics, CAEN power supplies,
    Michigan TDC96 boards, BiRa HV system, FEB (Front
    End Board) boards, Motorola & VMIC VME crate
    processors, etc.
  • 330 items repaired in 2004
  • Also performed special CDF requests to install
    board modifications
  • Review electronic parts used in repair of SVX
    type boards for Run II
  • Identified small number of parts where an
    increase in spare parts inventory was recommended
  • Logistics Services
  • Receive, inventory, tag & deliver equipment
    purchases

23
Summary
  • We provide computing resources and services to
    help accomplish the laboratory's scientific
    mission.
  • These not only directly support the scientific
    program, especially Run II, but also provide
    support for many of the engineering, technical
    and administrative needs of the lab.