Addressing Diagnostic Complexity

Title: Accounts
Author: Walter Wong
Last modified by: Mark Poepping
Created Date: 3/11/2004 10:15:32 PM
Document presentation format: On-screen Show


Transcript and Presenter's Notes

1
Addressing Diagnostic Complexity
  • The EDDY Approach
  • End-to-end Diagnostic DiscoverY

Chas DiFatta chas_at_cmu.edu
Mark Poepping poepping_at_cmu.edu
Joel Smith joelms_at_cmu.edu
2
Problems of IT Management
  • Distributed systems are extremely difficult to
    diagnose because of their complexity and scale.
  • Limited number of domain experts.
  • Need to transfer the experience of experts to
    less experienced staff.
  • A growing number of internal services depend
    on external resources.

3
Problems of Distributed System Diagnosticians
  • No access to the diagnostic data
  • Discovering valuable information in a sea of data
  • Correlating different diagnostic data types
  • Providing evidence for non-repudiation of a
    diagnosis
  • Finding time to create tools to transfer
    diagnostic knowledge to less skilled
    organizations and/or individuals

4
From the end user's view
  • Workstation Centric
  • The performance of my workstation seems slow!
  • Is the network down?

  • Application centric
  • I can't open or launch the application!
  • I can't authenticate with the application!
  • The application says I'm not authorized to use
    it!
  • The application is behaving inconsistently!
  • The application is giving errors!
  • I can't use all the features of the application!
  • The application's performance is poor!

5
From the help desk's view
  • Workstation Centric
  • Does the user have the correct version and
    configuration?
  • Can the user connect to the network or is it a
    performance problem?
  • Is there a security problem, e.g. a botnet,
    virus, worm, or other compromise?
  • Is the user having a hardware problem?

  • Infrastructure centric (enterprise based)
  • Can the user get to key local services (e.g.
    DHCP, DNS, SMTP, POP/IMAP, NTP, Authn), and if
    so, are they operating properly?
  • Application centric
  • Vital signs: Is the application's basic
    functionality working?
  • Authn/Authz: Can the user log in, and what are
    their privileges?
  • Highly focused: Is a specific feature of the
    app not working?

6
From the diagnostician's view
  • Network Centric
  • Is there a connectivity problem between the
    user and the application?
  • Could there be a routing or firewall problem?
  • Is the performance or loss of the network at an
    acceptable level?

  • Service Centric
  • Are the middleware services that the end user
    depends on functioning at an acceptable level,
    and can the user reach them? E.g. DHCP,
    Authn, Authz, DNS, NTP, Email, VPN, Web, etc.
  • Application Centric
  • Does the application have enough critical
    resources?
  • Is the application reporting any internal or
    external errors?
  • Are the lower and upper middleware and the
    services that the application depends on
    functioning at an acceptable level?

7
From the CIO's view
  • Business centric
  • Are my customers being serviced properly?
  • How is the infrastructure operating?
  • Finance centric
  • What will I need to spend resources on?
  • Is my staff's time being used effectively?
  • What is the growth in specific areas?

  • Compliance centric
  • Are our diagnostic procedures consistent with
    our security and privacy policies?
  • Are we within compliance boundaries for
    processes and procedures, and for legal issues?

8
The end user's needs
  • A workstation centric tool that
  • Verifies basic hardware operation
  • Reports on software versions and any internal
    errors
  • Reports on network connectivity and performance
  • Verifies that key network services are available
  • Scans the system for external and internal
    security vulnerabilities
  • Publishes results to the diagnostic repository

  • Application centric tools that
  • Verify that the user has the correct resources
  • Confirm a baseline of functions to the user,
    e.g. what the user can do, with a test to prove
    to them that they can
  • Provide ways for users to tag errors at any
    phase of the application's operation and publish
    them to the diagnostic repository
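
As a rough illustration of the workstation centric tool described on this slide, the Python sketch below probes a list of key network services and builds a report that could be published to a diagnostic repository. The service names, hosts, ports, and the report layout are all assumptions made for the example; the slides do not specify any of them. A TCP connect only shows that a service is reachable, not that it is operating properly.

```python
import socket
from typing import Callable, Dict, Tuple

# Hypothetical service list: hosts and ports are placeholders, not from the slides.
SERVICES: Dict[str, Tuple[str, int]] = {
    "dns": ("dns.example.edu", 53),
    "smtp": ("smtp.example.edu", 25),
    "imap": ("imap.example.edu", 993),
}


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_services(
    services: Dict[str, Tuple[str, int]],
    probe: Callable[[str, int], bool] = tcp_probe,
) -> dict:
    """Probe each service and return a report suitable for publishing."""
    results = {name: probe(host, port) for name, (host, port) in services.items()}
    return {
        "results": results,
        "failed": sorted(name for name, ok in results.items() if not ok),
    }
```

The probe is passed in as a parameter so the same report logic can be exercised without a network, or swapped for a deeper per-service check.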

9
The help desk's needs
  • A workstation centric tool that verifies the
    user's
  • Baseline software and hardware configuration
  • Network connectivity and performance
  • Key network services are available (DHCP, DNS,
    NTP, Authn/Authz etc.)

  • Service centric (enterprise based) tools
  • Testing enterprise-based services (DNS, SMTP,
    POP/IMAP, NTP, Authn, Authz, etc.) from the
    perspective of the user
  • Querying the infrastructure about internal and
    external problems
  • The ability to share diagnostic information with
    internal groups

  • Application centric tools
  • Real-time views into the application as the user
    is operating
  • Medium-depth tools that verify the operation of
    the application and its supporting services
  • The ability to share diagnostic information with
    external groups

10
The diagnostician's needs
  • Network Centric Tools
  • Forensic tools that query the infrastructure
    about internal and external connectivity problems
  • Active testing tools that perform network
    tracers at specific intervals and report
    anomalies into a reporting infrastructure

  • Service Centric Tools
  • Highly focused passive and active diagnostic
    tools for lower and upper middleware that report
    anomalies
  • Forensic tools that query detailed events of key
    services using their logs and other means
  • Application Centric Tools
  • Verification that the application has the
    correct resources
  • Highly focused tests into specific features and
    modules
  • The ability to share detailed diagnostic
    information with the developer in near real-time
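
The active testing tools on this slide run measurements at intervals and report anomalies. One simple way to decide what counts as an anomaly is sketched below in Python: a latency sample is flagged when it sits well above the mean of the samples just before it. The window size, the threshold, and the use of a mean and standard deviation rule are assumptions chosen for illustration, not a method taken from the slides.

```python
from statistics import mean, stdev
from typing import List, Sequence


def find_anomalies(
    samples: Sequence[float], window: int = 5, threshold: float = 3.0
) -> List[int]:
    """Return indexes of samples that deviate from the preceding window.

    A sample is flagged when it lies more than `threshold` standard
    deviations above the mean of the previous `window` samples.
    """
    flagged = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        # Guard against a perfectly flat baseline (sigma == 0).
        limit = mu + threshold * max(sigma, 1e-9)
        if samples[i] > limit:
            flagged.append(i)
    return flagged
```

For example, with round-trip times in milliseconds such as `[10.0, 11.0, 10.5, 10.2, 10.8, 10.4, 95.0, 10.6]`, only the 95.0 sample (index 6) is flagged.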

11
The CIO's needs
  • Business centric
  • Reporting on help desk problems and resolution
  • Reporting on infrastructure health

  • Finance centric
  • Reporting on anomaly and problem costs
  • Reporting infrastructure growth
  • Compliance centric
  • Security process and event reporting
  • Reporting that specific processes are being done

12
State of Practice
  • Network, application, system and security events
    are kept separate and are therefore extremely
    difficult to correlate
  • Data represents only what has faulted
  • No end-to-end accountability for transactions,
    e.g. email, web, VoIP, intrusion

13
Vision
  • Create an activity audit ledger/application
    that...
  • Provides a means to study the behavior of faults
    and anomalies
  • Explores the impact of an Internet with assured
    electronic communications and its influence on
    infrastructure, security, reliability, privacy
    and trust
  • Assures the default electronic interaction by
    creating a means of non-repudiation between two
    or more parties

14
What if?
  • Events (application, network, system, security,
    environmental) could be collected, disseminated
    and correlated using a common backplane?
  • The backplane provides access to diagnostic data
    to tool developers and researchers?
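
To make the idea of a common backplane concrete, the Python sketch below shows a minimal in-process publish/subscribe hub: sources publish events, and tools subscribe to one category or to everything. The class name, the `category` field, and the `"*"` wildcard are hypothetical choices for this example and do not describe EDDY's actual interfaces.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, object]
Handler = Callable[[Event], None]


class Backplane:
    """In-process stand-in for a shared event backplane."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, category: str, handler: Handler) -> None:
        """Register a handler for one event category, or "*" for all."""
        self._subscribers[category].append(handler)

    def publish(self, event: Event) -> int:
        """Deliver an event to matching handlers; return the delivery count."""
        category = str(event["category"])
        handlers = self._subscribers.get(category, []) + self._subscribers.get("*", [])
        for handler in handlers:
            handler(event)
        return len(handlers)
```

A tool that wants every event subscribes with `"*"`, while a network-only tool subscribes with `"network"`; both then see the same events without knowing about the sources.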

15
Initial Direction
  • An enabling mechanism for investigating:
  • Machine to machine interaction, i.e. services
  • Taxonomic risk analysis of security anomalies
  • Automated diagnostic practices, not just what has
    faulted but how the fault occurred
  • Perceived anomalies versus actual faults
  • Embedded system events
  • High volume event driven systems

16
What is EDDY?
  • Consolidates events using a simple structure
    (CER) to enable a high degree of correlation
  • Event management environment to collect,
    disseminate, store and analyze events
  • Diagnostic tool platform that exposes the events
    to enhance and leverage existing tools as well as
    enable the next generation
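
The slides do not define the CER structure, so the Python sketch below uses a hypothetical record with a handful of assumed fields to show why one simple, shared structure makes correlation easier. Events from different sources are paired when they refer to the same subject and occur within a short time window; the field names and the five second default are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class EventRecord:
    """Hypothetical common record; field names are assumptions."""
    timestamp: float  # seconds since the epoch
    source: str       # e.g. "network", "application", "security"
    subject: str      # shared key, e.g. a host or user identifier
    message: str


def correlate(
    events: Iterable[EventRecord], window: float = 5.0
) -> List[Tuple[EventRecord, EventRecord]]:
    """Pair events from different sources that share a subject and
    occur within `window` seconds of each other."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    pairs = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.timestamp - first.timestamp > window:
                break  # later events are even further away in time
            if first.subject == second.subject and first.source != second.source:
                pairs.append((first, second))
    return pairs
```

Because every event carries the same fields, a network event and an application event about the same host can be matched with a plain comparison rather than per-format log parsing.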

17
Gory Technical Details
18
Diagnostic Data Lifecycle
[Diagram not captured in transcript; surviving labels: Access, Policy]
19
Sponsors
  • This work has been funded and supported by
  • NSF Middleware Initiative (Cooperative Agreement
    No. ANI-0330626)
  • Internet2
  • Carnegie Mellon University

20
To Now
  • Started in Fall 2003 (Internet2 mw-e2ed)
  • NSF NMI funding
  • Fall 2005: Release 1.0
  • Early code for experimentation: touch it
  • Models/schema will change (partly the point)

21
Looking Forward
  • Other development help (Duke, others)
  • Expand to other use cases external to CMU
  • Email performance and diagnostics
  • LionShare, Shibboleth
  • Operational/Commercial Interest
  • Abilene/NLR
  • IBM, Intel, Cisco, Microsoft
  • Architecture, standards, reference
    implementation, experimental environment

22
Looking Forward (2)
  • CMU campus adopters: initial use cases
  • CS/CyLab: security research
  • Dragnet: network flows
  • Architecture: environment monitoring/control
  • Environmental event data from many ultra small
    devices and embedded systems
  • Intelligent Workplace (you may have toured it)
  • Computing Services: Systems, Middleware, Network,
    Security
  • Consolidation of application log files, fault
    analysis
  • Traffic consumption, network event correlation
  • Security event correlation and forensics

23
Looking Forward (3)
  • Expanding (seeking partners/funding)
  • Mature base technology
  • Spawn effort for diagnostic application
    development
  • Enable multi-subsystem correlation
  • Experiment with extending research data flow
    analysis into multi-campus federating/automating
    some diagnostic data sharing
