Failure in Railway Signal Box, Altona Germany

Provided by: vijairag
1
Failure in Railway Signal Box, Altona Germany
  • Real Time System Failure
  • A Case Study
  • Vijai Raghunathan
  • Instructor: Dr. Lumpp
  • Course: EE585 Fault Tolerant System, Fall 2006

2
Introduction
  • On 12 March 1995, the German railway replaced a switch
    tower with a computerized system.
  • Computer system by Siemens
  • Railway station involved: Hamburg Altona
  • Hamburg Altona is an extremely busy station, with
    30,000 passengers every day.

3
Altona Station
4
The Old System
  • 250 rail shunts
  • A few hundred signals
  • 7 major switch stands
  • 50 experienced switchmen

5
Electromechanical Systems
6
The New System
  • 18 switch stands
  • All of the above controlled by Intel 486-based
    real-time systems
  • One central operating and display system (BAR 16)
    coordinating all stands
  • BAR 16: 16-bit interface
  • BAR 16: redundant hardware with 2 processors,
    RAM, and no disk
  • Incompatible with the old system; cannot run in
    parallel with it
  • Needs 40 fewer switchmen than the old system

7
The Computer System
8
Failure of New System
  • BAR 16 failed immediately after start-up.
  • The cause was not found for hours.
  • Altona station was therefore temporarily shut down.
  • Passengers were forced to take other railway routes
    25 kilometers away.

9
Cause
  • Programming Error
  • Possible Stack Overflow Condition
  • The routine handling stack overflow went into an
    endless loop.
  • The stack allocation required a few bytes more
    than 3500 bytes.
  • But the RAM was only 3500 bytes.
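
The failure mode above can be illustrated with a minimal sketch. It is not the Siemens code, which the slides do not show. BoundedStack, StackOverflow, run_with_fail_safe, and the frame sizes are all hypothetical names and values chosen for illustration. The sketch models a fixed-size stack region and an overflow handler that stops once and reports, rather than retrying in a loop.

```python
class StackOverflow(Exception):
    """Raised when a push would exceed the fixed stack capacity."""


class BoundedStack:
    """Toy model of a fixed-size stack region, measured in bytes."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0

    def push(self, size_bytes):
        # Refuse the allocation instead of writing past the end of RAM.
        needed = self.used_bytes + size_bytes
        if needed > self.capacity_bytes:
            raise StackOverflow(
                f"need {needed} bytes, have {self.capacity_bytes}"
            )
        self.used_bytes = needed


def run_with_fail_safe(stack, frame_sizes):
    """Push frames; on overflow, stop once and report instead of retrying."""
    try:
        for size in frame_sizes:
            stack.push(size)
    except StackOverflow as err:
        # The handler terminates: there is no retry loop that could spin forever.
        return f"halted safely: {err}"
    return "ok"


# Hypothetical frame sizes that total a few bytes more than 3500.
frames = [1000, 1000, 1000, 504]
result_3500 = run_with_fail_safe(BoundedStack(3500), frames)
result_4000 = run_with_fail_safe(BoundedStack(4000), frames)
print(result_3500)  # halted safely: need 3504 bytes, have 3500
print(result_4000)  # ok
```

The point of the sketch is the shape of the handler: it is a path that is expected to run rarely, so it should be simple enough to be checked by hand and should always terminate.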

10
What happened?
[Figure: the program's stack size (algorithm), > 3500 bytes, compared with the actual hardware RAM present, 3500 bytes. The excess is marked as the error zone.]
11
Bug Fixed
  • The bug appeared only twice in 4 days.
  • The system failed on Sunday; the bug was finally
    fixed on Wednesday.
  • A new RAM with 4000 bytes was built to
    accommodate the stack.
  • 4000 bytes (500 more to give a safe upper limit)
  • Rail traffic was back at 1:59 pm on Wednesday.
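
The margin in the fix can be worked through as a short calculation. This is a sketch under one stated assumption: the old RAM size of 3500 bytes stands in for the stack demand, since the slides only say the demand was "a few more bytes" than that. The true margin is therefore slightly smaller than computed here. The function name headroom is hypothetical.

```python
def headroom(capacity_bytes, demand_bytes):
    """Return (spare bytes, spare as a fraction of demand)."""
    spare = capacity_bytes - demand_bytes
    return spare, spare / demand_bytes


OLD_RAM = 3500  # bytes, from the slides
NEW_RAM = 4000  # bytes, from the slides

# Assumption: treat the old RAM size as the approximate stack demand.
spare_bytes, spare_fraction = headroom(NEW_RAM, OLD_RAM)
spare_percent = round(spare_fraction * 100, 1)
print(spare_bytes, spare_percent)  # 500 14.3
```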

12
Conclusion
  • Hidden faults are tough to find!
  • A Siemens manager said the team felt the stack
    routine would not be used, due to the presence of
    a good RTOS.
  • The assumption that the stack routine would never
    be used was a mistake.
  • Expect your software to run in the worst
    situations when designing RT systems.
  • Altona incident: another lesson learnt by
    engineers!

13
References
  • http://catless.ncl.ac.uk/Risks/16.93.html#subj1.1
  • http://delivery.acm.org/10.1145/780000/773574/p7-neumann.pdf?key1=773574&key2=7387399511&coll=GUIDE&dl=GUIDE&CFID=1337449&CFTOKEN=44661065
  • http://www5.informatik.tu-muenchen.de/huckle/bugse.html