Civil and Environmental Engineering - PowerPoint PPT Presentation

About This Presentation
Title:

Civil and Environmental Engineering

Slides: 32
Provided by: rebeccabar

Transcript and Presenter's Notes


1
Sensors Knowledge Discovery (a.k.a. Data
Mining)
  • H. Scott Matthews
  • April 6, 2004
  • (originally presented by Rebecca Buchheit)

2
Recap
  • Sensors - what are they?
  • Sensor Networks - how they help us
  • Sensor Signal Acquisition and Use
  • Next - how to use the data!

3
Life Cycles of Sensor Networks
  • Currently, sensors and sensor systems are fairly
    proprietary
  • e.g. a Johnson Controls HVAC sensor system
    uses only their equipment
  • Need to design more robust networks that are
    standards-driven and open

4
Life Cycles (2)
  • In addition, sensor networks tend to have very
    short lifetimes
  • i.e. We build one, use it for a few years, and
    then replace it with a newer/better one
  • Need to plan for, and design architectures for
    sensor networks that will last the life of the
    infrastructure we are monitoring
  • e.g. 50-100 years for bridges (to manage LCC)

5
A Knowledge Discovery Framework for Civil
Infrastructure Contexts
  • Rebecca Buchheit
  • Department of Civil and Environmental Engineering
  • Carnegie Mellon University

6
Motivation
  • condition and usage patterns of critical
    infrastructure attracting increased attention
  • deteriorating infrastructure + cheap data
    collection methods => health monitoring,
    transportation management, other data-intensive
    civil infrastructure techniques

7
Motivation
  • amount of data, relationships between attributes,
    context-sensitivity, observational collection
    methods => data mining and knowledge discovery in
    databases (KDD) process
  • our ability to collect data far outstrips our
    ability to analyze and understand the data at a
    high level of abstraction

8
Databases, Statistics, Machine Learning, and
Data Mining
  • [Figure: diagram relating statistics, databases,
    and machine learning to data mining]

9
Definitions
  • Data Mining
  • algorithms to extract patterns from large data
    sets
  • Knowledge Discovery in Databases
  • "... the non-trivial process of identifying
    valid, novel, potentially useful, and ultimately
    understandable patterns in data." (Fayyad et al.)
  • Uses observational, not controlled, data

10
Knowledge Discovery Process Steps
  • domain understanding
  • data understanding
  • data preparation
  • data modeling (a.k.a. data mining)
  • results evaluation
  • deployment

11
CRISP-DM
  • CRoss-Industry Standard Process for Data Mining
  • high-level, hierarchical, iterative process model
    for KDD
  • provides framework for applying KDD consistently

12
Domain Understanding
  • evaluate fit between KDD and the problem
  • how much data?
  • what type of data?
  • perceived quality of data?
  • what is being measured?
  • right data to answer the question?
  • organizational support?

13
Data Understanding
  • summary statistics
  • plotting and visualization
  • missing values
  • randomly missing
  • influenced by a measured factor
  • influenced by an unmeasured factor
  • evaluate quality of existing data
  • what is good data?
  • what do we do with bad data?
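
As a minimal sketch of the first checks on this slide, the snippet below computes summary statistics and counts missing values for one sensor channel. The function name `summarize`, the use of `None` to mark a missing reading, and the sample values are all assumptions made for illustration; they do not come from the presentation.

```python
import statistics


def summarize(readings):
    """Summarize one sensor channel; None marks a missing reading (assumed convention)."""
    present = [r for r in readings if r is not None]
    return {
        "count": len(readings),
        "missing": len(readings) - len(present),
        "mean": statistics.mean(present),
        "stdev": statistics.stdev(present),  # sample standard deviation
    }


# Illustrative readings only, not data from the presentation.
summary = summarize([20.0, 22.0, None, 24.0, None, 26.0])
```

Counting missing values is only a starting point: whether they are randomly missing or tied to a measured or unmeasured factor still has to be judged from domain knowledge.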

14
Data Preparation
  • most time-consuming part of KDD
  • data selection
  • which records (rows) to use
  • which attributes (columns) to use
  • data cleaning
  • do something to bad and missing data
  • integrate data from different sources
  • transform data
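
The selection, cleaning, and transformation steps above might look like the following sketch. It assumes one simple policy (drop any record with a missing or out-of-range value, then rescale to [0, 1]); the function `prepare`, the attribute names, and the valid ranges are hypothetical, and a real project could just as well impute or flag bad values instead of dropping them.

```python
def prepare(records, valid_ranges):
    """Keep only the attributes named in valid_ranges, drop records with
    missing or out-of-range values, and rescale each value to [0, 1]."""
    prepared = []
    for rec in records:
        row = {}
        for attr, (low, high) in valid_ranges.items():
            value = rec.get(attr)
            if value is None or not (low <= value <= high):
                row = None  # bad or missing value: discard the whole record
                break
            row[attr] = (value - low) / (high - low)
        if row is not None:
            prepared.append(row)
    return prepared


# Hypothetical raw records; sensor_id is deliberately not selected.
raw = [
    {"temp_f": 40.0, "wind_mph": 10.0, "sensor_id": "a"},
    {"temp_f": None, "wind_mph": 5.0, "sensor_id": "b"},
    {"temp_f": 999.0, "wind_mph": 5.0, "sensor_id": "c"},
]
clean = prepare(raw, {"temp_f": (-40.0, 120.0), "wind_mph": (0.0, 100.0)})
```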

15
Data Modeling/Data Mining
  • choose an algorithm
  • choose parameters for that algorithm
  • apply algorithm to data
  • evaluate results
  • predictive accuracy
  • descriptive coverage
  • repeat as necessary
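
The choose, apply, evaluate loop can be sketched with a deliberately tiny model: a single threshold that predicts True when a value is at or above it. The candidate thresholds, the training and holdout samples, and both function names are assumptions for illustration; predictive accuracy is measured on held-out data the parameter search never saw.

```python
def accuracy(threshold, samples):
    """Fraction of (value, label) samples where 'value >= threshold' matches the label."""
    correct = sum((value >= threshold) == label for value, label in samples)
    return correct / len(samples)


def choose_threshold(candidates, train, holdout):
    """Pick the candidate with the best training accuracy, then score it on holdout data."""
    best = max(candidates, key=lambda t: accuracy(t, train))
    return best, accuracy(best, holdout)


# Illustrative samples only.
train = [(1.0, False), (2.0, False), (3.0, True), (4.0, True)]
holdout = [(1.5, False), (3.5, True), (2.2, True)]
best, holdout_accuracy = choose_threshold([1.5, 2.5, 3.5], train, holdout)
```

If the holdout score is unsatisfactory, the cycle repeats with a different algorithm or parameter set.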

16
Data Mining Goals
  • Prediction
  • predict the value of one or more variables based
    on the values of other variables
  • Description
  • describe the data set in a compact,
    human-understandable form

17
Data Mining Tasks
  • Classification
  • Regression
  • Clustering
  • Deviation detection
  • Summarization
  • Dependency modeling

18
Classification
  • learn how to classify data items into predefined
    groups
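
One simple way to illustrate classification is a one-nearest-neighbor rule, sketched below. This is a swapped-in example technique, not one named in the presentation, and the crack-width values and condition labels are invented.

```python
def nearest_neighbor(train, x):
    """Return the label of the training example whose value is closest to x (1-NN)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


# Hypothetical training data: (crack width in mm, predefined condition group).
train = [(0.1, "good"), (0.2, "good"), (1.5, "poor"), (2.0, "poor")]
label_small = nearest_neighbor(train, 0.3)
label_large = nearest_neighbor(train, 1.2)
```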

19
Regression
  • learn a function that maps one or more
    independent variables to a real-valued
    dependent variable
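
A minimal sketch of regression with one independent variable is an ordinary least-squares line fit. The function name `fit_line` and the data points are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Illustrative points lying exactly on y = 2x + 1.
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```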

20
Clustering
  • learn natural classes or clusters of data
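
As one possible illustration, the sketch below runs a small one-dimensional k-means, a common clustering algorithm that the presentation does not itself name. The starting centers, the iteration count, and the data are assumptions.

```python
def kmeans_1d(values, centers, iterations=10):
    """Alternate between assigning values to the nearest center and
    moving each center to the mean of its assigned values."""
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # An empty group keeps its previous center.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Illustrative values with two obvious clusters.
centers, groups = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0.0, 5.0])
```

Unlike classification, no labels are supplied: the two groups emerge from the data alone.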

21
Deviation Detection
  • detect changes or deviations from normal or
    baseline state
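
A minimal sketch of deviation detection, assuming the baseline state is described by the mean and standard deviation of earlier readings and that "deviation" means more than three standard deviations away. The three-sigma limit, the function name, and the readings are assumptions, not values from the presentation.

```python
import statistics


def deviations(baseline, new_readings, z_limit=3.0):
    """Return the new readings that lie more than z_limit standard
    deviations from the baseline mean."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    return [r for r in new_readings if abs(r - mean) > z_limit * stdev]


# Illustrative baseline and new readings.
baseline = [10.0, 10.2, 9.8, 10.1, 9.9]
flagged = deviations(baseline, [10.1, 10.4, 12.0, 9.0])
```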

22
Summarization
  • summarize subsets of data set

Example: computer industry mean salary 65k;
service industry mean salary 20k
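
A summary like the salary example can be sketched as a mean per subset. The individual salaries below are invented so that the group means match the figures on the slide; the function name `mean_by_group` is likewise an assumption.

```python
from collections import defaultdict


def mean_by_group(rows):
    """Compute the mean value for each group from (group, value) pairs."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [running sum, count]
    for group, value in rows:
        totals[group][0] += value
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Invented records chosen to reproduce the slide's 65k and 20k means.
rows = [
    ("computer", 60000), ("computer", 70000),
    ("service", 18000), ("service", 22000),
]
means = mean_by_group(rows)
```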

23
Dependency Modeling
  • learn relationships between attributes or
    between items in the data set
  • pattern recognition
  • time series analysis
  • association rules

In 80% of the cases, an engineer with a PE and 10
years experience is a project manager.
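
The percentage in a statement like this is the confidence of an association rule: among records that satisfy the IF part, the fraction that also satisfy the THEN part. The sketch below computes it over invented records; the field names, the reading of "10 years" as "at least 10 years", and the function name are assumptions.

```python
def confidence(records, antecedent, consequent):
    """Fraction of records matching the antecedent that also match the consequent."""
    matching = [r for r in records if antecedent(r)]
    if not matching:
        return None  # the rule never applies, so confidence is undefined
    return sum(1 for r in matching if consequent(r)) / len(matching)


# Invented records: five engineers match the IF part, four of them are project managers.
engineers = [
    {"pe": True, "years": 10, "role": "project manager"},
    {"pe": True, "years": 12, "role": "project manager"},
    {"pe": True, "years": 15, "role": "project manager"},
    {"pe": True, "years": 11, "role": "project manager"},
    {"pe": True, "years": 20, "role": "designer"},
    {"pe": False, "years": 10, "role": "designer"},
]
rule_confidence = confidence(
    engineers,
    lambda r: r["pe"] and r["years"] >= 10,
    lambda r: r["role"] == "project manager",
)
```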

24
Data Mining in the IW (Intelligent Workplace)
  • concept description using classification
  • environmental conditions affect hot water energy
    consumption
  • used outside temperature, solar radiation and
    wind speed
  • solar radiation and wind speed not significant
    above 80°F and below 50°F
  • IF temperature between 20°F and 30°F
  • THEN energy usage between 47,393 kJ and
    131,875 kJ
  • describes >50 instances in energy usage range

25
Results Evaluation
  • do results meet client's criteria?
  • novel?
  • understandable?
  • valid (modeling phase)?
  • useful?

26
Results Deployment
  • explain results to client
  • improvements to data collection?
  • ongoing process applied to new data?

27
Benefits of KDD
  • Intelligent Workplace
  • confirmation that system is (not) working
  • continue to monitor control system
  • in future, predict missing values to complete
    energy studies

28
Apply Data Mining to Civil Infrastructure?
  • civil infrastructure meets guidelines for
    selecting potential data mining problems
  • significant impact
  • no good alternatives exist
  • prior/domain knowledge
  • effects of noisy data are mitigated
  • sufficient data
  • relevant attributes are being measured

29
Background
  • sporadic use of KDD techniques in civil
    infrastructure
  • relative youth of data mining research
  • difficult to systematically apply KDD process
  • KDD process tools (CRISP-DM) still under
    development
  • KDD process highly domain dependent
  • time consuming to teach data mining analysts
    domain knowledge

30
Research Objectives
  • develop a framework for systematically applying
    KDD process to civil infrastructure data analysis
    needs
  • set of guidelines for inexperienced analysts
  • checklist for more experienced analysts
  • describe intersection of KDD process
    characteristics and civil infrastructure
  • what problems are well-suited to KDD?
  • what characteristics are unique to infrastructure?

31
Summary
  • increased data collection => increased need to
    intelligently analyze data
  • KDD process as a power tool for analyzing data
    for high-level knowledge
  • civil infrastructure problems are well-suited to
    data mining but will need to apply entire KDD
    process to get good results
  • proposed framework will help researchers to
    systematically apply KDD process to their data
    analysis problems