Collection of general data mining briefings - PowerPoint PPT Presentation

About This Presentation
Title:

Collection of general data mining briefings

Description:

Presented to: Olin Howard, AFMC/SC, 1/31/96; Walt Shafer, FBIS, 2/1/96; Mike Ware, NSA Y, 6/13/96

Slides: 47
Provided by: ChrisC251

Transcript and Presenter's Notes

Title: Collection of general data mining briefings


1
Data Management, Information Management, and
Knowledge Management for Network Centric Operations
Dr. Bhavani Thuraisingham
The University of Texas at Dallas
October 2005

2
Data, Information and Knowledge Management:
Definitions
  • Knowledge Management: acquiring knowledge;
    collaboration and sharing; managing the
    processes; disseminating the knowledge; taking
    action
  • Information Management: extracting information
    from the data; visualizing the data
  • Data Management: data administration; database
    management
3
What is data management?
  • One proposal: Data Management = Database System
    Management + Data Administration
  • Includes data analysis, data administration,
    database administration, auditing, data modeling,
    database system development, database application
    development

4
Data Administration
  • Identifying the data
  • Data may be in files, paper, databases, etc.
  • Analyzing the data
  • Is the data of good quality?
  • Is the data complete?
  • Data standardization
  • Should one standardize all the data elements and
    metadata?
  • Repositories for handling semantic heterogeneity?
  • Data Security
  • How should data be secured?
  • Data modeling
  • Structure the data, model the data and the
    processes

5
Data Administration (Continued)
  • Data quality provides some measure for
    determining the accuracy of the data
  • Is the data current? Can we trust the source?
  • Data quality parameters can be passed from source
    to source
  • E.g., Trust A 50 and Trust B 30
  • Data may have different semantics
  • E.g., Bank A may send out statements on the 20th
    day of each month and Bank B may send out
    statements on the 5th day of each month
  • Fighter jet and Passenger plane may be considered
    to be one and the same

6
Data Administration (Concluded)
  • Data Standards
  • Standards for data semantics and administration
  • E.g., XML (eXtensible Markup Language) for
    document interchange
  • Data security includes data confidentiality and
    integrity
  • Confidentiality is about preventing unauthorized
    access to the data
  • Integrity is about preventing malicious
    corruption of the data

7
An Example Database System
8
Metadata
  • Metadata describes the data in the database
  • Example: Database D consists of a relation EMP
    with attributes SS, Name, and Salary
  • Metadatabase stores the metadata
  • Could be physically stored with the database
  • Metadatabase may also store constraints and
    administrative information
  • Metadata is also referred to as the schema or
    data dictionary
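
The idea that a database carries its own description can be tried out with a small sketch. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in for the system on the slide; the column types and the primary key are assumptions added for illustration, since the slide names only the relation EMP and its attributes.

```python
import sqlite3

# In-memory database standing in for "Database D" (illustrative only).
conn = sqlite3.connect(":memory:")

# Column types and the key are assumptions; the slide lists only the names.
conn.execute(
    "CREATE TABLE EMP (SS TEXT PRIMARY KEY, Name TEXT NOT NULL, Salary REAL)"
)

# SQLite stores its metadata in the built-in sqlite_master catalog table.
schema_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'EMP'"
).fetchone()[0]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, default value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(EMP)")]

print(schema_sql)
print(columns)
```

Here the catalog plays the role of the metadatabase: it is stored with the data and can be queried like any other table.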

9
Three-level Schema Architecture: Details
Users A1, A2, A3: External Model A (External Schema A)
Users B1, B2: External Model B (External Schema B)
External/Conceptual Mapping A
External/Conceptual Mapping B
Conceptual Model (Conceptual Schema)
Conceptual/Internal Mapping
Stored Database: Internal Model (Internal Schema)
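
One way to picture the external and conceptual levels is with SQL views. The sketch below assumes SQLite; the table contents and the view names EMP_DIRECTORY and PAYROLL_TOTAL are hypothetical. The base table plays the conceptual schema, each view plays an external schema for one user group, and the view definition acts as the external/conceptual mapping. The internal level is left to the storage engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: the full logical description of the data.
conn.execute("CREATE TABLE EMP (SS TEXT PRIMARY KEY, Name TEXT, Salary REAL)")
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?)",
    [("001", "Ann", 50000.0), ("002", "Bob", 40000.0)],  # made-up rows
)

# External schema A (hypothetical): a directory that hides salaries.
conn.execute("CREATE VIEW EMP_DIRECTORY AS SELECT SS, Name FROM EMP")

# External schema B (hypothetical): payroll sees only an aggregate.
conn.execute("CREATE VIEW PAYROLL_TOTAL AS SELECT SUM(Salary) AS Total FROM EMP")

directory = conn.execute("SELECT * FROM EMP_DIRECTORY ORDER BY SS").fetchall()
total = conn.execute("SELECT Total FROM PAYROLL_TOTAL").fetchone()[0]

print(directory)
print(total)
```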
10
Functional Architecture
Data Management: User Interface Manager, Schema
(Data Dictionary) Manager (metadata), Security/
Integrity Manager, Query Manager, Transaction
Manager
Storage Management: File Manager, Disk Manager
11
Types of Database Systems
  • Relational Database Systems
  • Distributed and Federated Database Systems
  • Object Database Systems
  • Deductive Database Systems
  • Other
  • Real-time, Secure, Parallel, Scientific,
    Temporal, Wireless, Functional,
    Entity-Relationship, Sensor/Stream Database
    Systems, etc.

12
Relational Database Example
Relation S
S#   SNAME   STATUS   CITY
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens

Relation P
P#   PNAME   COLOR   WEIGHT   CITY
P1   Nut     Red     12       London
P2   Bolt    Green   17       Paris
P3   Screw   Blue    17       Rome
P4   Screw   Red     14       London
P5   Cam     Blue    12       Paris
P6   Cog     Red     19       London

Relation SP
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
S2   P1   300
S2   P2   400
S3   P2   200
S4   P2   200
S4   P4   300
S4   P5   400
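
The three relations can be loaded into a relational engine and queried. This is a minimal sketch assuming SQLite; the column names SNO and PNO are stand-ins for the supplier and part number columns, and the column types are assumptions. The two queries are illustrative and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT);
CREATE TABLE P  (PNO TEXT PRIMARY KEY, PNAME TEXT, COLOR TEXT,
                 WEIGHT INTEGER, CITY TEXT);
CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER, PRIMARY KEY (SNO, PNO));
""")
conn.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", [
    ("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"), ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
])
conn.executemany("INSERT INTO P VALUES (?, ?, ?, ?, ?)", [
    ("P1", "Nut", "Red", 12, "London"), ("P2", "Bolt", "Green", 17, "Paris"),
    ("P3", "Screw", "Blue", 17, "Rome"), ("P4", "Screw", "Red", 14, "London"),
    ("P5", "Cam", "Blue", 12, "Paris"), ("P6", "Cog", "Red", 19, "London"),
])
conn.executemany("INSERT INTO SP VALUES (?, ?, ?)", [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400), ("S1", "P4", 200),
    ("S1", "P5", 100), ("S1", "P6", 100), ("S2", "P1", 300), ("S2", "P2", 400),
    ("S3", "P2", 200), ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400),
])

# Join S and SP: names of the suppliers that supply part P2.
suppliers_of_p2 = [row[0] for row in conn.execute(
    "SELECT S.SNAME FROM S JOIN SP ON S.SNO = SP.SNO "
    "WHERE SP.PNO = 'P2' ORDER BY S.SNO"
)]

# Aggregate over SP: total quantity shipped per supplier.
totals = conn.execute(
    "SELECT SNO, SUM(QTY) FROM SP GROUP BY SNO ORDER BY SNO"
).fetchall()

print(suppliers_of_p2)
print(totals)
```

Supplier S5 has no shipments, so it does not appear in either result.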
13
Example Object
Composite Document Object
Section 1 Object
Section 2 Object
Paragraph 1 Object
Paragraph 2 Object
14
Distributed Database System
15
Query Processing Example
DQP (Distributed Query Processor): one DQP at each
site, in front of DBMS 1, DBMS 2 and DBMS 3,
connected by the Network
Fragments:
EMP1 (20), EMP3 (50), DEPT3 (30)
EMP2 (30), DEPT2 (20)
EMP1 (20)
Query at site 1: Join EMP and DEPT on D
  • Move EMP2 to site 3
  • Merge EMP1, EMP2, EMP3 to form EMP
  • Move DEPT2 to site 3
  • Merge DEPT2 and DEPT3 to form DEPT
  • Join EMP and DEPT
  • Move result to site 1
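
The plan above can be traced in a few lines of code. The sketch rests on several assumptions: the numbers in parentheses are read as tuple counts, site 3 is taken to hold EMP3, DEPT3 and a copy of EMP1 (one reading of the flattened diagram), site 2 holds EMP2 and DEPT2, and the row contents are synthetic. The helper names are hypothetical.

```python
def make_emp(start, count, n_depts=50):
    """Synthetic EMP rows: (employee id, department number D)."""
    return [(f"E{i}", i % n_depts) for i in range(start, start + count)]


def make_dept(start, count):
    """Synthetic DEPT rows: (department number D, department name)."""
    return [(d, f"Dept{d}") for d in range(start, start + count)]


# Assumed fragment placement; sizes follow the slide.
sites = {
    1: {"EMP1": make_emp(0, 20)},
    2: {"EMP2": make_emp(20, 30), "DEPT2": make_dept(0, 20)},
    3: {"EMP1": make_emp(0, 20), "EMP3": make_emp(50, 50),
        "DEPT3": make_dept(20, 30)},
}
shipped = 0  # tuples sent over the network


def move(name, src, dst):
    """Copy a fragment between sites and count the tuples shipped."""
    global shipped
    sites[dst][name] = sites[src][name]
    shipped += len(sites[src][name])


move("EMP2", 2, 3)                                            # Move EMP2 to site 3
emp = sites[3]["EMP1"] + sites[3]["EMP2"] + sites[3]["EMP3"]  # Merge to form EMP
move("DEPT2", 2, 3)                                           # Move DEPT2 to site 3
dept = sites[3]["DEPT2"] + sites[3]["DEPT3"]                  # Merge to form DEPT

# Join EMP and DEPT on D at site 3, using a hash lookup on D.
dept_by_d = {d: name for d, name in dept}
result = [(e, d, dept_by_d[d]) for e, d in emp if d in dept_by_d]

sites[1]["RESULT"] = result                                   # Move result to site 1
shipped += len(result)

print(len(emp), len(dept), len(result), shipped)
```

Counting shipped tuples makes the cost of a plan visible; a real distributed query processor would compare several plans on this kind of estimate before choosing one.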
16
Transaction Processing Example
DTM (Distributed Transaction Manager): responsible
for executing the distributed transaction
Issues: Concurrency control, Recovery, Data
Replication
Site 1 (Coordinator): Transaction Tj
Site 2 (Participant): Subtransaction Tj2
Site 3 (Participant): Subtransaction Tj3
Site 4 (Participant): Subtransaction Tj4
Two-phase commit: The coordinator queries the
participants whether they are ready to commit. If
all participants agree, the coordinator sends a
request for the participants to commit.
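
A minimal sketch of the two-phase commit exchange described above. The class and function names are hypothetical, and the abort branch is the standard behavior of the protocol rather than something stated on the slide. Timeouts, logging and recovery after failures are deliberately left out.

```python
class Participant:
    """A site executing one subtransaction (illustrative only)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        """Phase 1: vote on whether this site is ready to commit."""
        self.state = "ready" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Run the coordinator's side of the protocol and return the outcome."""
    # Phase 1: the coordinator asks every participant whether it is ready.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if every vote is yes; otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


ok = two_phase_commit([Participant("site2"), Participant("site3"),
                       Participant("site4")])
failed = two_phase_commit([Participant("site2"),
                           Participant("site3", can_commit=False),
                           Participant("site4")])
print(ok, failed)
```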
17
Interoperability of Heterogeneous Database Systems
Database System A (Relational)
Database System B (Object-Oriented)
Database System C (Legacy)
Network
Transparent access to heterogeneous databases -
both users and application programs; Query,
Transaction processing
18
Technical Issues on the Interoperability of
Heterogeneous Database Systems
  • Heterogeneity with respect to data models,
    schema, query processing, query languages,
    transaction management, semantics, integrity, and
    security policies
  • Interoperability based on client-server
    architectures
  • Federated database management
  • Collection of cooperating, autonomous, and
    possibly heterogeneous component database
    systems, each belonging to one or more
    federations

19
Different Data Models
Network connecting Node A, Node B, Node C and
Node D, each with its own Database
Data models in use: Network Model, Object-Oriented
Model, Relational Model, Hierarchical Model
Developments: Tools for interoperability;
commercial products
Challenges: Global data model
20
Schema Integration and Transformation: An Approach
External Schema I, External Schema II, External
Schema III
Global Schema: Integrate the generic schemas
Generic schema describing the relational database
Generic schema describing the network database
Generic schema describing the hierarchical database
Generic schema describing the object-oriented
database
Schema describing the relational database
Schema describing the network database
Schema describing the hierarchical database
Schema describing the object-oriented database
Challenges: Selecting appropriate generic
representation; maintaining consistency during
transformations

21
Semantic Heterogeneity
  • Semantic heterogeneity occurs when there is
    disagreement about the meaning or interpretation
    of the same data, that is, when the same data is
    interpreted differently

Object O
Node A and Node B, each with its own Database
One node interprets Object O as a passenger ship;
the other interprets it as a submarine
Challenges: Standard definitions; Repositories
22
Federated Database Management
Database System A, Database System B, Database
System C
Federation F1, Federation F2
Cooperating database systems yet maintaining some
degree of autonomy
23
Autonomy
Component A, Component B, Component C
Component A receives a local request and a request
from another component; component A honors the
local request first
Communication is through the federation; component
A does not communicate with component C
Challenges: Adapt techniques to handle autonomy,
e.g., transaction processing, schema integration;
transition research to products
24
Federated Data and Policy Management
Data/Policy for Federation
Each component exports Data/Policy to the
federation:
Component: Data/Policy for Agency A
Component: Data/Policy for Agency B
Component: Data/Policy for Agency C
25
What is Information Management?
  • Information management essentially analyzes the
    data and makes sense out of the data
  • Several technologies have to work together for
    effective information management
  • Data Warehousing: Extracting relevant data and
    putting this data into a repository for analysis
  • Data Mining: Extracting previously unknown
    information from the data
  • Multimedia: Managing different media including
    text, images, video and audio
  • Web: Managing the databases and libraries on the
    web

26
Data Warehouse
Data Warehouse: Data correlating Employees with
Medical Benefits and Projects
Could be any DBMS; usually based on the relational
data model
Users query the Warehouse
Sources: Oracle DBMS for Employees, Sybase DBMS for
Projects, Informix DBMS for Medical
27
What is Data Mining?
28
Steps to Data Mining
Data Sources
Integrate data sources
Clean/modify data sources
Mine the data
Examine results/Prune results
Report final results/Take actions
29
Data Mining Needs for Counterterrorism
Non-real-time Data Mining
  • Gather data from multiple sources
  • Information on terrorist attacks: who, what,
    where, when, how
  • Personal and business data: place of birth,
    ethnic origin, religion, education, work history,
    finances, criminal record, relatives, friends and
    associates, travel history, . . .
  • Unstructured data: newspaper articles, video
    clips, speeches, emails, phone records, . . .
  • Integrate the data, build warehouses and
    federations
  • Develop profiles of terrorists,
    activities/threats
  • Mine the data to extract patterns of potential
    terrorists and predict future activities and
    targets
  • Find the needle in the haystack - suspicious
    needles?
  • Data integrity is important
  • Techniques have to SCALE

30
Data Mining Needs for Counterterrorism
Real-time Data Mining
  • Nature of data
  • Data arriving from sensors and other devices
  • Continuous data streams
  • Breaking news, video releases, satellite images
  • Some critical data may also reside in caches
  • Rapidly sift through the data and set aside
    data not needed immediately for later use and
    analysis (non-real-time data mining)
  • Data mining techniques need to meet timing
    constraints
  • Quality of service (QoS) tradeoffs among
    timeliness, precision and accuracy
  • Presentation of results, visualization, real-time
    alerts and triggers

31
Data Mining as a Threat to Privacy
  • Data mining gives us facts that are not obvious
    to human analysts of the data
  • Can general trends across individuals be
    determined without revealing information about
    individuals?
  • Possible threats
  • Combine collections of data and infer information
    that is private
  • Disease information from prescription data
  • Military action from pizza deliveries to the Pentagon
  • Need to protect the associations and correlations
    between the data that are sensitive or private

32
Privacy Preserving Data Mining
User Interface Manager
Constraint Manager: Privacy Constraints
Database Design Tool: Structures the database
Data Miner: Makes correlations; ensures privacy
Query Processor: Constraints during query and
release operations
DBMS
Database
33
Current Status, Challenges and Directions
  • Status
  • Data Mining is now a technology
  • Several prototypes and tools exist; almost all
    of them work on relational databases
  • Challenges
  • Mining large quantities of data; dealing with
    noise and uncertainty; reasoning with incomplete
    data; eliminating false positives and false
    negatives
  • Directions
  • Mining multimedia and text databases, Web mining
    (structure, usage and content), Mining metadata,
    Real-time data mining, Privacy

34
Semantic Web Overview
  • According to Tim Berners-Lee, the Semantic Web
    supports
  • Machine readable and understandable web pages
  • Enterprise application integration
  • Nodes and links that essentially form a very
    large database
  • Premise
  • Semantic Web Applications: Web Database
    Management, Web Services, Information
    Integration, . . .
  • Semantic Web Technologies: XML, RDF, Ontologies,
    Rules-ML

35
Layered Architecture for Dependable
Semantic Web
  • Adapted from Tim Berners-Lee's description of the
    Semantic Web
  • Some challenges: interoperability between
    layers; security and privacy cut across all
    layers; integration of services; composability

36
What is XML all about?
  • XML is needed due to the limitations of HTML and
    complexities of SGML
  • It is an extensible markup language specified by
    the W3C (World Wide Web Consortium)
  • Designed to make the interchange of structured
    documents over the web easier
  • Key to XML are Document Type Definitions (DTDs)
    and XML Schemas
  • Allows users to bring multiple files together to
    form compound documents
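
A small structured document makes these points concrete. The sketch below uses Python's standard xml.etree.ElementTree module; the element names (briefing, title, section) and the DTD are made up for illustration. Note that this parser reads the DTD but does not validate the document against it, so checking conformance would need a validating parser.

```python
import xml.etree.ElementTree as ET

# A hypothetical document with an internal DTD describing its structure.
document = """<!DOCTYPE briefing [
  <!ELEMENT briefing (title, section+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT section (#PCDATA)>
  <!ATTLIST section id ID #REQUIRED>
]>
<briefing>
  <title>Data Management</title>
  <section id="s1">Data administration</section>
  <section id="s2">Database management</section>
</briefing>"""

# Parse the text into an element tree (no DTD validation is performed).
root = ET.fromstring(document)

# Because the markup is structured, a receiving program can pick out
# parts of the document by name.
title = root.findtext("title")
sections = {s.get("id"): s.text for s in root.findall("section")}

print(title)
print(sections)
```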

37
What is Knowledge Management?
  • Knowledge management, or KM, is the process
    through which organizations generate value from
    their intellectual property and knowledge-based
    assets
  • Gartner Group: KM is a discipline that promotes
    an integrated approach to identifying and sharing
    all of an enterprise's information assets,
    including databases, documents, policies and
    procedures as well as unarticulated expertise and
    experience resident in individual workers
  • Peter Senge: Knowledge is the capacity for
    effective action; this distinguishes knowledge
    from data and information. KM is just another
    term in the ongoing continuum of business
    management evolution

38
Knowledge Management Components
Components of Knowledge Management: Components,
Cycle and Technologies
Components: Strategies, Processes, Metrics
Cycle: Knowledge Creation, Sharing, Measurement
and Improvement
Technologies: Expert systems, Collaboration,
Training, Web
39
KM Strategy, Process and Metrics
  • Strategy
  • Motivation for KM and how to structure a KM
    program
  • Process
  • Use of KM to make existing practice more
    effective
  • Metrics
  • Measure the impact of KM on an organization

40
Strategy: Building Learning Organizations
  • Adaptive learning and Generative learning
  • Need to adapt to the changing environment
  • Total quality movement (TQM) in Japan has
    migrated to a generative learning model
  • Look at the world in a new way
  • Changing roles of the leader
  • Migrating from decision makers to designers,
    teachers and stewards
  • Building a shared vision
  • Encouraging ideas, Requesting support, Moving
    beyond blame, Effective communication
  • Learning tools
  • Learning laboratory

41
Knowledge Management in Process Management
  • Types of Processes
  • Simple processes: Low-level operations
  • Complex and nonadaptive processes: Systems that
    use the same rules
  • Complex and adaptive processes: Agents carrying
    out the processes are intelligent and adaptive
  • Linking knowledge management with processes
  • Knowledge management is needed for all
    processes; it is critical for complex and
    adaptive processes
  • Learn from experience and use the experience in
    unknown situations

42
Metrics: The Balanced Scorecard
  • Employee capabilities: Measuring the following
  • Employee satisfaction
  • Employee retention
  • Employee productivity
  • Information system capabilities: Measuring the
    following
  • Whether each employee segment has information to
    carry out its operations
  • Motivation and empowerment: Measuring the
    following
  • Suggestions made and implemented
  • Improvement
  • Team performance

43
Knowledge Management Architecture
Knowledge Creation and Acquisition Manager
Knowledge Representation Manager
Knowledge Dissemination and Sharing Manager
Knowledge Manipulation Manager
44
Secure Knowledge Management
  • Protecting the intellectual property of an
    organization
  • Access control including role-based access
    control
  • Security for process/activity management and
    workflow
  • Users must have certain credentials to carry out
    an activity
  • Composing multiple security policies across
    organizations
  • Security for knowledge management strategies and
    processes
  • Risk management and economic tradeoffs
  • Digital rights management and trust negotiation
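
Role-based access control, mentioned above, can be sketched as two lookup tables and one check. The roles, users and activities below are hypothetical; a deployed system would also need the credential checks, policy composition and trust negotiation that the slide lists.

```python
# Hypothetical role-to-permission and user-to-role assignments.
ROLE_PERMISSIONS = {
    "author": {"read", "write"},
    "reviewer": {"read", "approve"},
    "reader": {"read"},
}
USER_ROLES = {
    "alice": {"author"},
    "bob": {"reviewer", "reader"},
}


def is_allowed(user, activity):
    """A user may carry out an activity if any of their roles grants it."""
    return any(
        activity in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )


print(is_allowed("alice", "write"))   # alice holds the author role
print(is_allowed("bob", "write"))     # none of bob's roles grants write
print(is_allowed("carol", "read"))    # unknown users have no roles
```

Permissions attach to roles rather than to individual users, so changing what a reviewer may do means editing one table entry.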

45
Status and Directions
  • Knowledge management has exploded due to the web
  • Knowledge Management has different dimensions
  • Technology, Business
  • Goal is to take advantage of knowledge in a
    corporation for reuse
  • Tools are emerging
  • Need effective partnerships between business
    leaders, technologists and policy makers
  • Knowledge management may subsume information
    management and data management
  • Vague boundaries

46
Other Ideas and Directions?
  • Prof. Bhavani Thuraisingham
  • Director Cyber Security Center
  • Department of Computer Science
  • Erik Jonsson School of Engineering and Computer
    Science
  • The University of Texas at Dallas
  • Richardson, Texas
  • bhavani.thuraisingham_at_utdallas.edu
  • http://www.utdallas.edu/bxt043000/
  • President
  • Dr-Bhavani Security Consulting
  • Dallas, TX
  • www.dr-bhavani.org
