Title: FinalCommissioning
1Commissioning Monitoring
Mark Krasberg
April 30th, 2009
2DOMHubMonitor
- Has been running for several years
- Requires exact detector hardware response
- A single missing DOM will result in the winterovers being paged
- Once this season DOMHubMonitor complained that there were too many DOMs on a DOMHub, which is not the usual type of error. It turned out a quad had been plugged into the wrong hub accidentally
3DOMHubMonitor
- Detector becoming increasingly complex/non-uniform
- SUMMARY
----------------------------------------------------------------
HUB  AM 02 03 04 05 06 10 11 12 13 17 18 19 20 21 26 27 28 29 30
COMM  2 60 60 60 58 60 60 60 60 60 60 58 58 60 60 59 58 58 58 58
----------------------------------------------------------------
HUB  36 37 38 39 40 44 45 46 47 48 49 50 52 53 54 55 56 57 58 59
COMM 60 60 60 60 58 56 60 60 58 60 60 59 60 58 59 60 60 60 60 58
----------------------------------------------------------------
HUB  60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 83
COMM 60 60 60 60 60 60 56 60 60 55 60 59 60 60 59 60 60 60 60 60
----------------------------------------------------------------
HUB  T1 T2 T3 T4 T5 T6 T7 T8
COMM 32 32 32 32 32 32 32 12
----------------------------------------------------------------
DOMHubMonitor looks for deviations from the expected configuration (checks quad cables, DOR cards, firmware fuses, number of communicating DOMs)
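The comparison against an expected configuration can be sketched roughly as below. This is only an illustration, not DOMHubMonitor's actual code: the function name `find_deviations`, the dictionary layout, and the observed values are assumptions made for the example. The expected counts for hubs 02, 05 and T8 are taken from the SUMMARY table.

```python
def find_deviations(expected, observed):
    """Return (hub, expected_count, observed_count) for every hub that differs.

    Both arguments map a hub label to its number of communicating DOMs.
    A hub missing from one side is treated as having 0 DOMs there.
    """
    deviations = []
    for hub in sorted(set(expected) | set(observed)):
        exp = expected.get(hub, 0)
        obs = observed.get(hub, 0)
        if exp != obs:
            deviations.append((hub, exp, obs))
    return deviations


# Expected counts from the SUMMARY table; observed values are made up.
expected = {"02": 60, "05": 58, "T8": 12}
observed = {"02": 60, "05": 57, "T8": 13}

for hub, exp, obs in find_deviations(expected, observed):
    kind = "missing" if obs < exp else "extra"
    print(f"hub {hub}: expected {exp}, saw {obs} ({kind} DOMs)")
```

Because the check is an exact match, it flags extra DOMs (such as a quad plugged into the wrong hub) as well as missing ones.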
4checkDisk running since June 1st, 2008
- Design was to detect filling disks via trending
- Watching 440 partitions on >100 machines (servers and DOMhubs) at SPS
- Emails, then pages the winterovers if a disk is filling
- Preempts problems BEFORE they occur
- Otherwise we'd be running TestDAQ (backup) more often
- checkDisk has overachieved because it is an end-to-end test
- Filling disks (the design)
- Also
- Hung/crashed machines
- NFS problems
- LDAP problems
- Last week it even discovered a memory problem
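One way to picture "detecting filling disks via trending" is a straight-line fit to recent usage samples, extrapolated to 100% full. The sketch below is an assumption about how such a check could work, not checkDisk's real implementation. The function names and the 72-hour and 12-hour thresholds are hypothetical.

```python
def hours_until_full(samples):
    """Estimate hours from the last sample until the partition is full.

    samples: list of (hours, used_fraction) pairs, oldest first.
    Uses a least-squares line through the samples. Returns None when
    there are too few samples or usage is flat or falling.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None
    _, last_u = samples[-1]
    return max(0.0, (1.0 - last_u) / slope)


def alert_level(samples, email_hours=72.0, page_hours=12.0):
    """Map the estimate to 'ok', 'email' or 'page' (thresholds are hypothetical)."""
    eta = hours_until_full(samples)
    if eta is None or eta > email_hours:
        return "ok"
    return "page" if eta <= page_hours else "email"


# A partition growing by 5% per hour from 50% full would fill in about 8 hours.
fast = [(0, 0.50), (1, 0.55), (2, 0.60)]
print(alert_level(fast))
```

Escalating from email to page as the estimated time shrinks mirrors the "emails, then pages" behaviour described on the slide.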
5ICL power outage
Multiple failures including disk fills, NFS
mount issues, LDAP failures
CheckDisk implemented
CheckDisk turned off
CheckDisk turned back on
MAPO gets cold
Disk fills
ICL power outage (UPS)
Hub72 problem
Start of Paging system
Hub59 problem
pDAQ hangs
DOMHubMonitor (real time DOM hardware monitor)
implemented long ago
LID and Test runs ignored
6The end of AMANDA?
- AMANDA could cause problems for years
72008/2009 Commissioning Working Group
- Mark Krasberg
- Mathieu Labare
- Tilo Waldenmaier
- Camille Parisel
- Erik Verhagen
9We were slightly ahead of schedule most of the
season
- Commissioning timeline is almost completely determined by DOM freeze-in; commissioning itself took as long as expected.
102008/09 Commissioning Timeline
11High Current/LC problems
- We have found a small number of them every year during commissioning (this issue is never seen during DOMtesting)
- We took special LC waveform data (Chris Wendt helped develop this test) on the surface to try to pin down when the problem was occurring
- Lots of high current DOMs with broken LC discovered at the top of string 36 (DOMs turned on around 2 weeks after deployment)
- DOMTesting surface data was normal, which meant the problem was occurring later
12High Current/LC problems
- It was extra windy during the deployment of string 36
- Theory from Per Olof et al
- InIce cable charge buildup caused by wind, resulting in discharge into DOM (LC circuitry is not as protected from ESD as power circuitry is)
- During pole discussions about this we learned that the deployment ground strap for the InIce cable had broken
- May have been broken before string 36 deployment
13High Current/LC problems
- Actions taken
- Procedural changes were made for the remaining deployments to mitigate ESD
- Ground strap repaired
- Problem believed to be solved
- No high current DOMs found since
- There is often a question whether or not turning DOMs on soon after deployment is a good idea
- In this case it helped
14DOMs can repair themselves
- A small number of high current DOMs (from this season and also from previous years) with broken LC repaired themselves after being run for a few hours. We have no explanation.
15Local Coincidence part 2
- Multiple LC fails on the same string can indicate accidental quad-cable swaps.
- This season we found 8 swapped cables
- Cable swaps are easy to find/easy to fix
- At end of season the database was not reinitialized after unswapping the last pair of crossed cables
- Database issue discovered by Dawn Williams during flasher timing tests
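The first point, that several LC failures on one string hint at a cable swap, can be turned into a simple screening step. The sketch below is illustrative only: the function name `swap_candidates`, the `min_failures` threshold, and the DOM positions in the example are assumptions. DOM positions are written as "string-position", following the style used elsewhere in these slides (for example 59-45).

```python
from collections import defaultdict


def swap_candidates(failed_doms, min_failures=2):
    """Group LC-failing DOMs by string and keep strings with several failures.

    failed_doms: iterable of "string-position" labels, e.g. "36-05".
    Returns {string_number: sorted list of failing positions} for every
    string with at least min_failures failures (threshold is hypothetical).
    """
    by_string = defaultdict(list)
    for dom in failed_doms:
        string, position = dom.split("-")
        by_string[int(string)].append(int(position))
    return {
        string: sorted(positions)
        for string, positions in by_string.items()
        if len(positions) >= min_failures
    }


# Made-up failure list: three failures on string 36, one on string 40.
failures = ["36-05", "36-01", "40-17", "36-02"]
print(swap_candidates(failures))
```

A string flagged this way is only a candidate. The cabling itself still has to be inspected to confirm a swap.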
16Spontaneous DOM failures since last spring
- 59-45 Essex_6 (IC40) started producing bad time stamps
- Split off on Jan 24, 2009
- 19-60 Coxae ATWD chip broke in Feb/Mar 2009, pair split off
- 18-46 Triquetrum suffered bad COMMs failure in March, 2009, pair split off
17Results from 2009 commissioning
1876 DOMs
589 DOMs
1390 DOMs
2515 DOMs
66-33 New_York and 66-34 Dou_Mu go high
current
39-61 Hydrogen PMT breaks
39-22 Liljeholmen stops communicating properly
59-45 Essex producing bad data
54-47 Garbanzo_bean stops communicating
30-60 Rowan stops communicating
39-21 Aspudden slows down
19Commissioning with Multimon
20Problem DOMs
- There were several DOMs during commissioning with oscillatory rates and even DOMs with high LC rates (80 Hz instead of 15 Hz).
- Mathieu left them for me to study
- When I looked at these DOMs in March, I saw no problem
21Meteor Radar
1-minute ON 1-minute OFF
Always ON
22Meteor Radar Monitoring
- Two DOMs are currently devoted to monitoring the meteor radar (via multimon)
- In the last 10 days, the meteor radar has crashed three times, twice in the on position
- Camille and Erik have been reporting the problems to the meteor science tech
- Email from Erik V about the science tech:
- "And he mentioned how funny it was that since IceCube is complaining about it, there is not one day without the radar having trouble..."
23Plot from Dawn Williams
24Slide from Chris Wendt: Problem is not confined to the new obama DOMs
25SPOTS
5 minutes on, 5 minutes off
26IceTop InIce
Spikes have a 15-second period
Multimon has a 15-second period
27A Complex Configuration! (Slide from 2008)
- See http://wiki.icecube.wisc.edu/index.php/Problem_DOMs
- 12 dead or shorted DOMs
- 2 semi-useless DOMs (Blackberry and Hydrogen)
- 1 DOM set to low gain (Phenol)
- 1 IceTop DOM promoted to high gain DOM (Unagi)
- Florida has also been temporarily promoted because partner has one bad ATWD
- 26 unplugged DOMs
- 12 high current, 12 bad COMMs, 4 not frozen (77-47 Hamster is still in water!)
- 14 DOMs are being operated in headers-only mode
- Broken LC or no neighbors with working LC
- 12 DOMs have a broken LC link between them
- 85 DOMs have non-standard LC configuration
- We'd like to change the config so that a couple of the DOMs only read out one of the ATWDs (one of the ATWDs is bad). (recently implemented by John J).
- TOTAL NUMBER OF DOMs with non-standard config: 88
- Around 3% of the deployed DOMs
28Run Configuration continues to become more and
more complex
- Run Configurations now include anti-meteor radar measures.
- 130 DOMs will be operated at different gains (discriminators will be set to 0.25 PE to minimize physics impact of different gains).
- Maximum gain should be 3.5e7
- It is a challenge to get the configuration of every DOM right!
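A per-DOM sanity check is one way to catch configuration mistakes before a run. The sketch below assumes a hypothetical layout (a dictionary of DOM name to settings with `gain` and `disc_pe` keys) and assumes the 0.25 PE discriminator setting applies to every DOM listed. Only the limits themselves (maximum gain 3.5e7, discriminator 0.25 PE) come from the slide; the DOM names and values in the example are made up.

```python
def check_run_config(doms, max_gain=3.5e7, disc_pe=0.25):
    """Return a list of human-readable problems found in a run configuration.

    doms: {dom_name: {"gain": float, "disc_pe": float}} (hypothetical layout).
    """
    problems = []
    for name, settings in sorted(doms.items()):
        if settings["gain"] > max_gain:
            problems.append(
                f"{name}: gain {settings['gain']:.2e} exceeds {max_gain:.2e}"
            )
        if abs(settings["disc_pe"] - disc_pe) > 1e-9:
            problems.append(
                f"{name}: discriminator {settings['disc_pe']} PE, "
                f"expected {disc_pe} PE"
            )
    return problems


# Made-up example: dom_b has too much gain, dom_c has the wrong threshold.
config = {
    "dom_a": {"gain": 1.0e7, "disc_pe": 0.25},
    "dom_b": {"gain": 4.0e7, "disc_pe": 0.25},
    "dom_c": {"gain": 2.0e7, "disc_pe": 0.5},
}
for problem in check_run_config(config):
    print(problem)
```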
29Commissioning
- Thanks to
- Mathieu, Tilo, Camille and Erik for all the commissioning work
- had to learn a lot of information in a short time
30IceStories
- icestories.exploratorium.edu
- Was granted my own private TDRSS data transfer queue
- Some days I transferred more data than IceCube did
- Easy to get data from South Pole to New Mexico
- Takes longer to copy data from New Mexico to Madison