Title: Systems Prognostic Health Management EMIS 7305 March 28, 2006
1Systems Prognostic Health Management
EMIS 7305, March 28, 2006
Systems Engineering Program
- Christopher Thompson
- Senior Research Engineer
- Lockheed Martin Missiles and Fire Control
Disclaimer: This briefing is unclassified and
contains no proprietary information. Any views
expressed by the author are his, and in no way
represent those of Lockheed Martin Corporation.
2Topic Outline
- Introduction
- Definitions
- The Goal of Prognostic Health Management
- PHM Stakeholders
- PHM Modeling
- Sensors
- Prognostics Analysis Tools
- Availability
- Examples
3Introduction
- Education
- B.S. in Electrical Engineering, SMU (1997)
- M.S. in Mechanical Engineering, SMU (2001)
- - Focus: Fatigue and Fracture Mechanics
- M.S. in Systems Engineering (one class remaining)
- - Focus: Reliability, Statistical Analysis
- Ph.D. in Applied Science (anticipated 2008)
- - Proposed Dissertation Title:
"Sensor Optimization for Systems
Prognostic-Diagnostic Health Management in an
Unmanned Ground Combat Vehicle"
4Introduction
- Experience
- Lockheed Martin Missiles and Fire Control, Dallas TX
- - Systems Engineer: Multifunction Utility/Logistics Equipment (MULE)
- - Reliability Engineer: Army Tactical Missile System (TACMS)
- Lockheed Martin Aeronautics, Fort Worth TX
- - Vehicle Systems, Prognostic Health Management: F-35 Joint Strike Fighter
- SMU School of Engineering
- - TA for Dr. Jerrell Stracener
5Introduction
- Future Combat Systems
- MULE Program
6Introduction
- Some keys to the successful fielding of the U.S.
Army's Future Combat Systems are:
- Reducing the Logistics footprint
- Increasing Availability
- Reducing total cost of ownership
- Implementing Performance Based Logistics
- Improvements in the "ilities" (RAM-T):
- Reliability
- Availability
- Maintainability
- Testability
- Supportability
7Some Definitions
- Prognostics: Of or relating to prediction; a
sign of a future happening; a portent.
- Prognostics is the process of calculating and
reporting an estimate of remaining useful life
for a component, within sufficient time to repair
or replace it before failure occurs.
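As a rough illustration of what "calculating an estimate of remaining useful life" can mean in the simplest case, the sketch below fits a straight line to a wear indicator and extrapolates to a failure threshold. The function name, the wear data and the threshold are all hypothetical, and real degradation is rarely this linear.

```python
from statistics import mean

def estimate_rul(hours, wear, failure_threshold):
    """Least-squares line through (hours, wear), extrapolated to the
    assumed failure threshold. Returns remaining hours after the last
    reading, or None if no upward degradation trend is visible."""
    h_bar, w_bar = mean(hours), mean(wear)
    sxx = sum((h - h_bar) ** 2 for h in hours)
    sxy = sum((h - h_bar) * (w - w_bar) for h, w in zip(hours, wear))
    slope = sxy / sxx
    if slope <= 0:
        return None  # indicator is flat or improving: nothing to extrapolate
    intercept = w_bar - slope * h_bar
    hours_at_failure = (failure_threshold - intercept) / slope
    return max(0.0, hours_at_failure - hours[-1])

# Hypothetical wear readings (fraction of allowable wear) taken every 100 hours.
hours = [0, 100, 200, 300, 400]
wear = [0.10, 0.20, 0.30, 0.40, 0.50]
rul = estimate_rul(hours, wear, failure_threshold=1.0)
print(f"Estimated remaining useful life: {rul:.0f} hours")
```

The estimate is only useful if it arrives early enough to schedule the repair, which is why the definition ties it to lead time.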
8Some Definitions
- Prognostic Health Management (PHM): The
implementation of an integrated software and
hardware system which monitors the health, status
and performance of a vehicle or system, tracks
consumables (oil, batteries, ammunition, filters,
fuel, coolant) and configuration (software
versions, part history), and determines
remaining life of all safety and performance
critical components, predicting failures before
they occur, thereby enhancing logistics and
maintenance activities. PHM consists of
on-board as well as off-board components.
9Some Definitions
- Diagnostics: The identification of a fault or
failure condition of an element, component,
sub-system or system, combined with the deduction
of the lowest measurable cause of that condition
through confirmation, localization, and
isolation.
- - Confirmation is the process of validating that a
failure/fault has occurred, the filtering of
false alarms, and assessment of intermittent
behavior.
- - Localization is the process of restricting a
failure to a subset of possible causes.
- - Isolation is the process of identifying a
specific cause of failure, down to the smallest
possible ambiguity group.
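The three diagnostic steps can be sketched with a small fault signature table. Everything here is hypothetical (component names, test names, the single-fault assumption, and the rule that a test must fail in at least two samples to be confirmed); it is meant only to show how an ambiguity group falls out of the data.

```python
# Hypothetical fault signatures: the built-in tests each fault would cause to fail.
SIGNATURES = {
    "pump": {"pressure_low", "flow_low"},
    "filter": {"pressure_low", "flow_low"},
    "sensor_wiring": {"pressure_low"},
    "controller": {"flow_low", "temp_high"},
    "fan": {"temp_high"},
}

def confirm(observations, min_hits=2):
    """Confirmation: keep tests that failed in at least min_hits samples,
    so a one-off indication is treated as a false alarm."""
    counts = {}
    for sample in observations:
        for test in sample:
            counts[test] = counts.get(test, 0) + 1
    return {test for test, n in counts.items() if n >= min_hits}

def localize(failed_tests):
    """Localization: faults whose signature shares any confirmed failed test."""
    return {f for f, sig in SIGNATURES.items() if sig & failed_tests}

def isolate(failed_tests):
    """Isolation (single-fault assumption): faults whose signature matches the
    confirmed failures exactly. The result is the ambiguity group."""
    return {f for f, sig in SIGNATURES.items() if sig == failed_tests}

# Three samples of failed tests; temp_high appears once and is filtered out.
observations = [
    {"pressure_low", "flow_low"},
    {"pressure_low", "flow_low", "temp_high"},
    {"pressure_low", "flow_low"},
]
failed = confirm(observations)
candidates = localize(failed)
ambiguity_group = isolate(failed)
print(sorted(candidates), sorted(ambiguity_group))
```

In this made-up table the pump and the filter have identical signatures, so no amount of reasoning separates them; shrinking that ambiguity group would take an additional test or sensor.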
10Some Definitions
- Fault: A condition that renders an element
unable to perform its required function at
desired levels of performance, or in a degraded
mode.
- Failure: The inability of a component, system or
sub-system to perform its intended function as
designed. Failure may be the result of one or
more faults.
- Fault Tolerance: The design of a system so that
it will continue to operate in a degraded or
reduced level rather than failing completely,
when some part of the system fails.
11Some Definitions
- Failure Cascade: The result when a failure
occurs in a system of interconnected components,
and the successful operation of a component
depends on the successful operation of a
preceding component. Conversely, a failure can
trigger the failure of successive parts, and
potentially amplify the result or impact.
Redundancy and fault tolerant design can reduce
the criticality or impact of the cascade, but not
necessarily prevent a failure.
12Some Definitions
- Design Failures - These take place due to
inherent errors or flaws in the system design.
- Infant Mortality Failures - These cause newly
manufactured systems to fail, and can generally
be attributed to errors in the manufacturing
process, or poor material quality control.
- Random Failures - These can occur at any time
during the entire life of a system. Electrical
systems are more likely to fail in this manner.
- Wear Out Failures - As a system ages, degradation
will cause systems to fail. Mechanical systems
are more likely to fail in this manner.
13Some Definitions
- One-To-One Redundancy - Each active component in
a system has a redundant backup on standby. The
active component is monitored at all times, and
the standby component will activate if the
primary component fails. Since the probability of
both components failing at the same time is low,
One-To-One Redundancy provides the highest level
of availability, but at a considerable
disadvantage of requiring double the size,
weight, power and cost, while reducing
reliability (more components which can fail).
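A short worked calculation may help show why a redundant pair raises availability while also producing more failures to repair. The assumptions are not from the slides: each unit has steady-state availability A, the two units fail independently, switchover is perfect, and both units are powered with a constant failure rate λ.

```latex
\begin{align}
A_{\text{pair}} &= 1 - (1 - A)^2 = 2A - A^2 \\
A = 0.95 \;&\Rightarrow\; A_{\text{pair}} = 1 - 0.05^2 = 0.9975 \\
\lambda_{\text{any unit}} &= 2\lambda \;\Rightarrow\; \text{mean time to the first unit failure} = \frac{1}{2\lambda}
\end{align}
```

Under these assumptions the function is lost far less often, but a maintenance action is needed about twice as often as with a single unit.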
14Some Definitions
- N+X Redundancy: N components are required to
perform a function, but the system is configured
with N+X components. When any of the N
components fails, one of the X spare modules activates.
The advantage lies in reduced size, weight, power
and cost of the system, in the case where X is
smaller than N. In case of multiple component
failures, this scheme provides lower system
availability than One-To-One Redundancy.
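The trade between spares and availability can be sketched with a binomial model. This assumes, beyond what the slide states, that all units are identical and independent, that each is up with the same probability, and that any spare can take over for any failed unit with perfect switching. The numbers are hypothetical.

```python
from math import comb

def n_plus_x_availability(n, x, unit_availability):
    """Probability that at least n of the n + x units are up, with each
    unit independently up with probability unit_availability."""
    total = n + x
    a = unit_availability
    return sum(
        comb(total, k) * a**k * (1 - a) ** (total - k)
        for k in range(n, total + 1)
    )

a = 0.95  # hypothetical availability of a single unit

# Four required functions, each with its own dedicated backup (8 units total).
one_to_one = n_plus_x_availability(1, 1, a) ** 4

# Four required units sharing a single spare (5 units total).
four_plus_one = n_plus_x_availability(4, 1, a)

print(f"1:1 redundancy, 8 units: {one_to_one:.4f}")
print(f"4+1 redundancy, 5 units: {four_plus_one:.4f}")
```

With these inputs the shared-spare arrangement uses three fewer units and gives up some availability, which is the trade the slide describes.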
15Some Definitions
- Load Sharing: Multiple components share a
combined load. A higher level component manages
load distribution, and monitors the health and
status of the components. If one of the load
sharing components fails, the load is
re-distributed among the others, allowing for
graceful performance degradation. In this scheme,
there is almost no extra cost. The main
disadvantage is that, with multiple failures, system
performance may degrade below an acceptable
level.
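A minimal sketch of the load manager's job, assuming an even split among healthy units with each unit capped at its rated capacity. The unit names, capacities and total load are hypothetical, and an even split is only one possible policy.

```python
def redistribute(total_load, capacities, healthy):
    """Split total_load evenly across healthy units, capping each unit at its
    capacity. Returns (load per healthy unit, fraction of demand served)."""
    units = [u for u in capacities if healthy[u]]
    if not units:
        return {}, 0.0
    share = total_load / len(units)
    loads = {u: min(share, capacities[u]) for u in units}
    served = sum(loads.values()) / total_load
    return loads, served

capacities = {"gen_a": 50.0, "gen_b": 50.0, "gen_c": 50.0}  # hypothetical ratings
demand = 90.0

_, all_up = redistribute(demand, capacities, {"gen_a": True, "gen_b": True, "gen_c": True})
_, one_down = redistribute(demand, capacities, {"gen_a": True, "gen_b": True, "gen_c": False})
_, two_down = redistribute(demand, capacities, {"gen_a": True, "gen_b": False, "gen_c": False})
print(all_up, one_down, round(two_down, 3))
```

In this example one failure is absorbed by the remaining units, while a second failure leaves part of the demand unserved, which is the degradation below an acceptable level that the slide warns about.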
16The Ultimate Goal of Prognostics
- The purpose of Prognostic Health Management is to
repair systems before they fail, while maximizing
useful life consumption, and to have the
necessary parts, tools and maintainers waiting
nearby to resolve the correct problem as quickly
and efficiently as possible.
17PHM Stakeholders
Stakeholders: Systems Engineering; Software; Simulation; Test Engineering; Mechanical Engineering; Electrical Engineering; Training; Product Support
PHM activities: PHM Model Design; Interface Management; Requirements Development; Sensor Optimization; CAIV/WAIV Analysis; Prognostic Trending; System Architecture; PHM Model Integration; Software Interfaces; Fault/Failure Simulation; Continuous BIT/PHM Test Planning; Fault/Failure Criticality; Fault/Failure Propagation; Fault/Failure Simulation; Platform Integration; Crack Growth Sensing; Stress/Strain Sensing; Corrosion Sensing; Vibration Sensing; Consumables Monitoring; Acoustic Sensing; Thermal Sensing; Sensor Implementation; Sensor Integration; Data Management; Data Architecture; Reliability/Failure Modes; Maintainability; Testability; Logistics; Sustainment; Training; Safety
18Systems Engineering's Role in PHM
- Requirements Development
- System Integration
- System Architecture
- Interface Management
- Risk Assessment
- Performance Measures: TPMs, KPPs
- System Modeling and Knowledge Integration
- Functional Decomposition
19PHM Requirements
- The PHM system shall isolate X percent of all
detected failures to a single component, with Y
percent confidence.
- The PHM system shall predict X percent of
expected failures for the next Y hours of
operation.
- The PHM system shall predict all failures that
can result in a Safety Critical Failure.
- The PHM system shall incorporate sensors to
assess platform health, status and performance.
- The PHM system shall incorporate sensors to
monitor platform consumables.
- The PHM system shall record and store all sensor
data in onboard memory.
20The Ilities Product Support
- Reliability
- - FMECA: Failure Modes, Effects and Criticality Analysis
- - FRACAS: Failure Reporting, Analysis and Corrective Action System
- - Measures: MTBF, MTBSA, MTBEFF, MTBUMA
- Maintainability
- - Maintenance Ratio
- - Preventive Maintenance Checks
- - Condition Based Maintenance
- - Design for Maintainability
- Availability
- - AO, AI, AA
21The Ilities Product Support
- Testability
- - Verification and Validation
- - Fault Insertion
- - Simulation
- Supportability
- - Consumables Monitoring
- - Supply Planning and Prediction
- System Safety
- - Single and Multiple Fault Tolerant Design
- - Safety Critical Failures
- - Human/Machine Interaction
22PHM Modeling
- eXpress Modeling Tool
- Model Based Reasoning
- Case Based Reasoning
- Knowledge Bases
- Prognostics Analysis Tools
23eXpress Modeling Tool
[Figure labels: Data Mining; Diagnostic, Prognostic, PHM Design; Sensor Fusion; Requirements Analysis; Mission Assurance, Availability, Success; Run-Time Prognostic Health Management; Performance Based Logistics; CONOPS, Specs, Logistics; Risk Assessment; Life Cycle Trade Space; FRACAS, FMECA Development; Business Cases]
24Impact Technologies
Prognostics developed at Impact Technologies:
- Gas Turbine Engines and Auxiliary Systems
- Avionics PHM and Reasoning
- Aircraft Actuators (EMA, EHA)
- Switching Mode Power Supplies, GPS Receivers and Power Electronics
- Generators and Electric Drive Systems
- Bearings, Gears, Shafts, Drive Trains, and Clutches
- Hydraulic, Lube Oil and Fuel Systems
- Structures and Components
- Diesel Engines
25Impact Technologies
- Prognostics modules have been developed and
successfully tested on the following systems:
- - Pratt & Whitney F-100 engine on F-15 and F-22
- - Engine, generator, lubrication system and
gearbox on Honeywell F124
- - Oil wetted components on GE F110-129, GE F404,
Rolls-Royce F405
- - CH-47 T-55 engine and drive-train
- - CH-60 intermediate gearbox
- - Blackhawk Carrier Plate Prognosis System
- - JSF Clutch Wear and Lift-Fan Prognosis System
- - Fuel system and Power generation system on
DDG-class Navy Ships
26Impact Technologies
- A number of different techniques have been used
in the development of these prognostics:
- - Analytical and stochastic physics of failure
models
- - Advanced signal processing
- - Feature extraction methods
- - Health state estimation and prediction
algorithms
- - Statistical reliability
- - Bayesian updating methods
- - Component damage accumulation models
- - Probabilistic remaining useful life estimation
- - Data driven modeling techniques
27Model Based Reasoning
Model Based Reasoning (MBR) is a qualitative
scheme where a model of the system is combined
with an inference engine that is able to
accomplish fault detection and fault isolation.
The qualitative model is used to describe system
elements and components, interconnections, and
input/output behavior of the system being
diagnosed (the Knowledge Base), and to establish
an envelope of correct behavior. To accomplish
diagnosis, the model determines what differences
exist between the actual behavior of the system
and the model of the system. The inference
engine, using this comparison information,
accomplishes the fault isolation task.
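The compare-then-infer loop described above can be sketched in a few lines. The cooling-loop model, the tolerances that define the envelope of correct behavior, the dependency map and the single-fault assumption are all hypothetical; a real MBR tool would derive them from the system design.

```python
# Hypothetical model: expected value of each signal given the commands, plus
# the tolerance that defines the envelope of correct behavior.
MODEL = {
    "pump_flow_lpm": (lambda cmd: 10.0 * cmd["pump_speed"], 1.0),
    "outlet_temp_c": (lambda cmd: 20.0 + 30.0 * cmd["heat_load"], 3.0),
}

# Hypothetical dependency map: components that could explain a discrepancy.
SUSPECTS = {
    "pump_flow_lpm": {"pump", "flow_sensor"},
    "outlet_temp_c": {"pump", "radiator", "temp_sensor"},
}

def discrepancies(commands, measured):
    """Fault detection: signals whose measured value leaves the envelope."""
    return {
        signal
        for signal, (predict, tolerance) in MODEL.items()
        if abs(measured[signal] - predict(commands)) > tolerance
    }

def infer(discrepant_signals):
    """Fault isolation (single-fault assumption): components that could
    explain every observed discrepancy."""
    if not discrepant_signals:
        return set()
    return set.intersection(*(SUSPECTS[s] for s in discrepant_signals))

commands = {"pump_speed": 0.8, "heat_load": 0.5}
measured = {"pump_flow_lpm": 5.5, "outlet_temp_c": 41.0}
suspects = infer(discrepancies(commands, measured))
print(sorted(suspects))
```

Here low flow and high temperature together point at the pump, because it is the only component that appears behind both signals.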
28Case Based Reasoning
Case Based Reasoning (CBR) is the process of
solving problems based on past understanding of
similar problems. The vast majority of this type
of information resides with the
maintainers and operators, in the experience and
knowledge of the people using the system in
question. CBR compares the target problem with stored
cases, forms an implicit generalization of the
retrieved case, and then identifies
commonalities between the retrieved case and the
target problem.
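The retrieval step of CBR can be sketched as a nearest-neighbor search over past maintenance cases. The case library, the normalized symptom features and the similarity measure below are hypothetical stand-ins for recorded maintainer experience.

```python
# Hypothetical case library: (normalized symptom levels, action that fixed it).
CASE_LIBRARY = [
    ({"vibration": 0.9, "oil_temp": 0.4, "noise": 0.8}, "replace wheel bearing"),
    ({"vibration": 0.2, "oil_temp": 0.9, "noise": 0.1}, "flush oil cooler"),
    ({"vibration": 0.1, "oil_temp": 0.2, "noise": 0.9}, "tighten exhaust clamp"),
]

def similarity(a, b):
    """Similarity in (0, 1], based on Euclidean distance over shared features."""
    shared = a.keys() & b.keys()
    distance = sum((a[k] - b[k]) ** 2 for k in shared) ** 0.5
    return 1.0 / (1.0 + distance)

def retrieve(problem):
    """Return the stored case most similar to the target problem."""
    return max(CASE_LIBRARY, key=lambda case: similarity(problem, case[0]))

target = {"vibration": 0.8, "oil_temp": 0.5, "noise": 0.7}
symptoms, suggested_action = retrieve(target)
print(suggested_action)
```

A fuller CBR loop would also adapt the retrieved solution to the new situation and store the outcome as a new case.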
29Knowledge Bases
[Figure: knowledge base data flow]
Data sources: inorganic sensor data; organic sensor data; subsystem/LRU internal sensor data; BIT data; consumables monitors; maintainer inputs; off-board prognostic trend analysis
Processing: sensor fusion and signal conditioning; Database Management; Data Mining; Feature Extraction
KNOWLEDGE BASE: FMECA data; fault/failure propagation; system level interactions; functional interdependencies; physical interdependencies; design knowledge; prognostic trend analysis; CAD models; circuit layouts
30Prognostic Analysis Tools
- Learning Systems and Artificial Intelligence
- - Genetic Algorithms
- - Expert Systems
- - Fuzzy Logic
- - Neural Networks
- Database Techniques
- - Feature Extraction
- - Data Mining
- Mathematical Techniques
- - Kalman Filtering
- - Dempster-Shafer Method
- - Wavelets
- - Statistical Analysis
- - Chaos Math?
31Prognostic Analysis Tools
- Traditional Academic Solutions to PHM
- - Run-to-Failure analysis of large, expensive
systems, such as ship or rail engines
- - Analysis involves impractical, complex math
models that require years of training to
understand and interpret
- - Very expensive
- - Time consuming process
- - Rarely offer concrete design guidelines or
solutions
32Prognostic Analysis Tools
- Why Engineers in Industry Need More
- - We have bottom lines and schedules to meet!
- - We have customer requirements to satisfy!
- - Systems Engineers work with designers who don't
like impractical, complex math models that
require years of training to understand and
interpret!
- - We have program managers who don't like very
expensive, time consuming solutions!
- - We like concrete design guidelines and solutions!
33Sensor Technology
- BIT/BITE
- Sensor Fusion and Virtual Sensors
- Sensor Conditioning and Filtering
- Smart Sensors
34Availability Analysis
- Availability, Achieved
$$A_A = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}}$$
- where
- MTBF = Mean Time Between Failures
- MTTR = Mean Time To Repair
35Availability Analysis
- Availability, Operational
$$A_O = \frac{\text{MTBUMA}}{\text{MTBUMA} + \text{ALDT} + \text{MTTR}}$$
- where
- MTBUMA = Mean Time Between Unscheduled
Maintenance Actions
- ALDT = Administrative and Logistical Down Time
- MTTR = Mean Time To Repair
36Availability Analysis
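- MTBUMA = Mean Time Between Unscheduled
Maintenance Actions
$$\text{MTBUMA} = \left( \frac{1}{\text{MTBF}} + \frac{1}{\text{MTBM}_{\text{induced}}} + \frac{1}{\text{MTBM}_{\text{no defect}}} \right)^{-1}$$
- where
- MTBF = Mean Time Between Failures
- MTBM = Mean Time Between Maintenance (induced, no defect)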
37Availability Analysis
- How can we improve AO?
- - By decreasing Administrative and Logistical Down
Time (ALDT)
- - By increasing Mean Time Between Failures
(MTBF)
- - By decreasing Mean Time To Repair (MTTR)
- - By increasing Mean Time Between Unscheduled
Maintenance Actions (MTBUMA), by increasing
MTBM (induced) and MTBM (no defect)
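To see how these levers move the numbers, the sketch below assumes the commonly used forms: availability as uptime over uptime plus downtime, and MTBUMA as the reciprocal of the summed rates of inherent failures, induced maintenance and no-defect maintenance. These formulas and every input value are assumptions made for illustration, not figures from the briefing.

```python
def achieved_availability(mtbf, mttr):
    """Assumed form: A_A = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)

def mtbuma(mtbf, mtbm_induced, mtbm_no_defect):
    """Assumed form: unscheduled maintenance rates add, so MTBUMA is the
    reciprocal of the summed rates."""
    return 1.0 / (1.0 / mtbf + 1.0 / mtbm_induced + 1.0 / mtbm_no_defect)

def operational_availability(mtbuma_hours, aldt, mttr):
    """Assumed form: A_O = MTBUMA / (MTBUMA + ALDT + MTTR)."""
    return mtbuma_hours / (mtbuma_hours + aldt + mttr)

# Hypothetical inputs, all in hours.
baseline_mtbuma = mtbuma(mtbf=500.0, mtbm_induced=2000.0, mtbm_no_defect=2000.0)
baseline_ao = operational_availability(baseline_mtbuma, aldt=24.0, mttr=2.0)

# Lever 1: prognostics pre-positions parts, halving ALDT.
ao_less_aldt = operational_availability(baseline_mtbuma, aldt=12.0, mttr=2.0)

# Lever 2: fewer false alarms, doubling the no-defect MTBM.
better_mtbuma = mtbuma(mtbf=500.0, mtbm_induced=2000.0, mtbm_no_defect=4000.0)
ao_fewer_false_alarms = operational_availability(better_mtbuma, aldt=24.0, mttr=2.0)

print(f"baseline A_O:           {baseline_ao:.4f}")
print(f"ALDT halved:            {ao_less_aldt:.4f}")
print(f"false alarm rate cut:   {ao_fewer_false_alarms:.4f}")
```

With these made-up inputs the downtime is dominated by ALDT rather than MTTR, so shortening the logistics delay moves operational availability more than the other lever.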
38Availability Analysis
- How can we decrease ALDT?
- - By improving Logistics
- Improve scheduling of inspections
- Improve commonality of parts
- Decrease time to get replacements
- - By improving Prognostics
- Replace parts before they fail, not after
- Maximize use of component life
- Improve off-board prognostics trending
- More sensors!!
39Availability Analysis
- How can we increase MTBF?
- - By improving Reliability
- Select more rugged components
- Improve life screening and testing
- Improve thermal management
- - By improving Quality
- Better parts screening
- Better manufacturing processes
- - By adding Redundancy
- At the cost of Size, Weight and Power!
40Availability Analysis
- How can we decrease MTTR?
- - By improving Maintainability
- Improve quality and efficacy of training
- Simplify fault isolation
- Decrease number of tools and special equipment
- Decrease access time (panels, connectors)
- Improve Preventative Maintenance
- - By improving Diagnostics
- Improve BIT and BITE
- Decrease ambiguity group size
- Improve maintenance manuals and training
41Availability Analysis
- How can we increase MTBM (induced/no defect)?
- - By improving Safety
- Limit the potential for accidental damage
- - By improving Prognostics
- Improve PHM models to monitor induced damage
- - By improving Diagnostics
- Lower the false alarm rate
- Don't repair/replace things which aren't broken!
42Sensor Example
Engine Health/Performance Monitoring: Place an
acoustic sensor on the engine housing. Establish
nominal operating parameters. Develop a library
relating fault precedents to failures (odd
sounds which warn of impending failure). Monitor
for an out-of-nominal acoustic signature.
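A minimal sketch of the "establish nominal, then monitor" steps, using the RMS level of short audio frames and a three-sigma band as the nominal envelope. The sample values and the threshold are hypothetical, and a fielded system would more likely compare spectral features against the fault library.

```python
from math import sqrt
from statistics import mean, stdev

def rms(frame):
    """Root-mean-square level of one frame of acoustic samples."""
    return sqrt(sum(s * s for s in frame) / len(frame))

def build_baseline(nominal_frames):
    """Nominal operating parameters: mean and spread of RMS over healthy runs."""
    levels = [rms(frame) for frame in nominal_frames]
    return mean(levels), stdev(levels)

def out_of_nominal(frame, baseline, n_sigma=3.0):
    """Flag a frame whose RMS level falls outside the nominal band."""
    mu, sigma = baseline
    return abs(rms(frame) - mu) > n_sigma * sigma

# Hypothetical frames recorded while the engine was known to be healthy.
nominal_frames = [
    [0.10, -0.10, 0.10, -0.10],
    [0.11, -0.11, 0.11, -0.11],
    [0.09, -0.09, 0.09, -0.09],
    [0.10, -0.10, 0.10, -0.10],
    [0.105, -0.105, 0.105, -0.105],
]
baseline = build_baseline(nominal_frames)

quiet_frame = [0.10, -0.10, 0.10, -0.10]
loud_frame = [0.30, -0.30, 0.30, -0.30]
print(out_of_nominal(quiet_frame, baseline), out_of_nominal(loud_frame, baseline))
```

A level check like this only says that something changed; relating the change to a specific impending failure is the job of the fault precedent library.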
43PHM Example
Consider a toaster. Not just any toaster, but the
toaster on the first mission to Mars. NASA could
only afford to send one, and it must work, every
time, or else the astronauts won't have toast.
The toaster must also not endanger the mission
by causing a safety hazard or wasting bread.
Mission Critical Function:
- make toast
Safety Critical Functions:
- don't injure the astronauts
- don't damage the spaceship
- don't burn the toast!
44PHM Example
- Identify the elements of a toaster.
- What are the failure modes?
- What should we monitor for safety hazards?
- What elements should we monitor for diagnostics?
- What data should we collect for prognostics?
- How would we optimize the sensor coverage and
data collection?
45Issues Related to PHM
- Continually monitoring sensors and storing all
that data for analysis will quickly consume
available bandwidth and storage space.
- Capturing profound knowledge of a complex
engineered system and its myriad failure modes is
very difficult, and involves integrating
knowledge which crosses discipline boundaries:
SE, EE, ME, RAM-T, Safety, Software, Math,
Statistics, Physics.
- Prognostic analysis of data is a very difficult
problem, with no easy or universal solution.
- PHM is a relatively new field.
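The storage concern in the first bullet is easy to quantify with back-of-the-envelope arithmetic. The sensor count, sample rate, sample size and mission length below are hypothetical.

```python
def storage_bytes(num_sensors, sample_rate_hz, bytes_per_sample, hours):
    """Raw storage needed to log every sample from every sensor."""
    return num_sensors * sample_rate_hz * bytes_per_sample * hours * 3600

# Hypothetical platform: 200 sensors at 1 kHz, 2 bytes per sample, 8-hour mission.
raw = storage_bytes(num_sensors=200, sample_rate_hz=1000, bytes_per_sample=2, hours=8)
print(f"{raw / 1e9:.2f} GB per mission, {raw / (8 * 3600) / 1e3:.0f} kB/s sustained")
```

Numbers of this size are one reason to extract features on board and store only summaries or out-of-nominal events.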
46Final Remarks
- Do I have any practical PHM suggestions?
- - Aim for the low hanging fruit
- Use the sensors you already have in creative
ways.
- Only add sensors when you must.
- You can't monitor everything, so don't try.
- - Don't reinvent the wheel
- Build on others' work and experience.
- Find good tools to design your system.
47Additional Prognostic Analysis Tool