System Administration: Drowning in Management Complexity

1
System Administration: Drowning in Management
Complexity
  • Chad Verbowski
  • Microsoft Research, Redmond

2
Overview
  • Problem Space
  • Complexity Grows Faster than We Can Handle
  • System Management Approaches
  • A New Approach: Data Driven Management
  • Examples/Results using Data Driven Mgmt

3
Motivation
  • Systems Management Complexity Scale
  • The Amount of Energy We Put Into Maintaining Our
    Systems
  • Energy: Software, Hardware, People, Resources
  • Complexity is Constantly Growing
  • Advances Reducing Development Complexity
  • Simplified Development Enables Complex Systems
  • Growing Number of Devices, Apps, Users
  • Advances Needed in Managing Complexity
  • To Avoid Drowning!!!

4
Systems Management Problem
  • Complexity AND Scale
  • Persistent-State Size O(10^5)
  • Persistent-State Access Trace Size O(10^8) per
    day
  • Number of Programs Interacting O(10^2)
  • Globally
  • Number of Machines O(10^9)
  • Each Runs a Different Combination of
    O(10^6) Programs
  • CyberSecurity (Anti-Malware)
  • Systems Management Adversaries
  • Digital Rights (DRM): Protecting Data
  • Cybersecurity: Untrusted Users

5
Complexity Comparison With Humans
  • Human DNA = 1 Billion Base Pairs = 1 GB
  • 0.25% Unique Pairs = 1.2 MB
  • 6 Billion People = 7.2 TB
  • Encoding of Relatives (3.6:1) = 2 TB
  • Lempel-Ziv Compress (10:1) = 200 GB
  • All Living People's DNA Fits on a Laptop!
  • Equivalent to 72 Gmail accounts OR 4 Blu-Ray
    Discs
  • People Ever Born (106,456,367,669) = O(10^11)
  • Storage Required for All Human DNA = 3.5 TB
  • Cost (Using $14k for 18 TB) = $2,725
  • Backup ($100 for 1 TB Tape) = $400
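
The back-of-the-envelope chain on this slide can be replayed as a short calculation. This is only a sketch: it takes the slide's 7.2 TB figure for all living people as given, reads the two parenthesized factors as a 3.6:1 and a 10:1 reduction, and the function and parameter names are made up for illustration.

```python
import math


def dna_storage_estimate(
    all_living_tb=7.2,          # slide figure for 6 billion people, taken as given
    relatives_ratio=3.6,        # assumed 3.6:1 saving from encoding relatives
    lz_ratio=10.0,              # assumed 10:1 Lempel-Ziv compression
    living=6_000_000_000,
    ever_born=106_456_367_669,
):
    """Return (GB for all living people, TB for everyone ever born)."""
    after_relatives_tb = all_living_tb / relatives_ratio
    living_gb = after_relatives_tb * 1000 / lz_ratio
    ever_born_tb = living_gb * (ever_born / living) / 1000
    return living_gb, ever_born_tb


def storage_cost(tb, array_price=14_000, array_tb=18, tape_price=100, tape_tb=1):
    """Return (disk cost, tape backup cost) in dollars for `tb` terabytes."""
    disk = tb * array_price / array_tb          # pro-rated share of the array price
    tapes = math.ceil(tb / tape_tb)             # whole tapes only
    return disk, tapes * tape_price


if __name__ == "__main__":
    living_gb, ever_born_tb = dna_storage_estimate()
    print(f"living: {living_gb:.0f} GB, ever born: {ever_born_tb:.2f} TB")
    print("cost for 3.5 TB:", storage_cost(3.5))
```

Feeding the slide's rounded 3.5 TB into `storage_cost` gives a disk cost of roughly $2,722 and four tapes, which is in line with the figures quoted on the slide.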

6
Growth in Software Complexity
  • Rate a Developer Can Code
  • OO/CORBA/COM Enables Componentization
  • Libraries Enable Sharing Code
  • New Languages: Less Coding
  • Better Tools: Easier to Debug, Build, Annotate
  • Density of Developer Collaboration
  • Source Control Systems
  • Improved Communication (Email, IM)
  • Enforceable Software Development Processes
  • Hardware Advancements
  • Software is a gas that expands to fill its
    container

7
Developers' Role in Manageability
  • Rely on Them for Manageable Software?
  • Probably Not, At Least Not for a While
  • Time to Adopt new platforms, applications, APIs
  • Third Party, In House, Legacy Software
  • Can They Completely Solve This?
  • Not Feeling the Pain of System Administrators
  • Manageability is the Top Priority, Right After
  • Easy to Manage Hardware?
  • Very Few Advancements in this area
  • State of the Art: SNMP v1, Circa 1988

8
The Software LifeCycle
9
The Management LifeCycle
  • Software LifeCycle != Management LifeCycle
  • ONGOING Cost That Starts With Deployment
  • Configuration, Provisioning
  • Monitoring, Troubleshooting
  • Upgrades, Patching
  • Integration with Other Components
  • Accumulation of Stuff To Manage
  • Applications, Hardware, Devices, Users, and Data
  • Ops Cost >> Software Cost
  • BIG Trouble Unless Significant Improvement

10
Who Is Going to Save Us?
  • Sys-Admins Are Ultimately Responsible
  • They Understand the Symptoms Best
  • Limited Time and Toolset for Fixing Manageability
  • Need Better Management Tools (Obviously)
  • Their Net Effect Should Not Be More Complexity!
  • They Need to Take Virtually No Input or
    Configuration
  • They Should Not Rely On Application Participation
  • ARE THESE IMPOSSIBLE CONSTRAINTS??

11
Motivation From Albert Einstein
  • Any fool can make things bigger, more complex,
    and more violent. It takes a touch of genius, and
    a lot of courage, to move in the opposite
    direction.

12
System Management Techniques: Bad System
Management Can Make Things Worse
  • Software Development Design Choices
  • Componentization is Good
  • But Don't Make Every Class a Component!
  • Security Checks and Locks Are Good
  • But Don't Unnecessarily Check/Lock At Every
    Layer!
  • System Management Technique Choices
  • No Single Technique Solves All Problems
  • Be Aware of the Capability and Limitations
  • Use Them Appropriately!

13
1. Prescriptive Management: The First Line of
Defense
  • Limit the Hardware and Software Used
  • You Can Only Buy THESE Servers/Laptops/Desktops
  • Only THESE Versions of App X Are Supported
  • Benefits
  • Less Stuff to Manage!
  • Challenges
  • Ongoing Cost to Maintain the List
  • Measuring Compliance is Hard
  • Difficult to Clean Up Existing Environments
  • User Happiness?

14
2. Signature Based: Avoid Solving the Same Problem
Over and Over and ...
  • Create Rules/Fingerprints for Known Problems
  • (AV/AS) Manual Sample Collection and Signature
    Derivation
  • (Mgmt) Manual Event Rules for Well-Known
    Problems
  • Benefits
  • Minimal Troubleshooting Time
  • Early Problem Detection
  • Challenges
  • Costly, Hard to Identify Root Cause
  • The Most Costly Issues Frequently Repeat

15
3. Manifest: Deep System Understanding Enables
Policy Based Management
  • Complete Description of Environment State
  • Each Item's Function Is Documented with
    Dependencies and Valid Values
  • Benefits
  • Policy Constraints Can be Created and Enforced
  • Wide and Deep Knowledge Minimizes Troubleshooting
  • Challenges
  • Determining What the Policy Should Be
  • Virtually Impossible to Create for ALL items
  • Third Party, In-House, and Legacy Applications
    (?)
  • Difficulty Resolving Late-Bound dependencies,
    Canonicalization Issues
  • Costly to Create a Manifest for Large
    Applications
  • Keeping the Manifest Current is Challenging

16
4. Simplified Management Model: Reduce Complexity
by Creating a Simpler Management Abstraction
  • Manage a Simplified Logical View
  • Complexity is Encapsulated in Components Forming
    a Logical View
  • e.g. A Service Description, and a Service
    Level Agreement
  • Benefits
  • The Management Space is Less Complex
  • Challenges
  • Hard to Define the Right Abstraction for
    Everyone
  • Creating the Model Definition
  • Mapping to New Model is Hard
  • Equivalence Across Vendor/Application/Version
  • Keeping the Real-World and Logical View in Sync

17
Motivation For a New Technique: Hard to Solve
Real-World Change Management Problems
  • My application worked yesterday, but it's not
    working today. What's the problem?
  • My system has been acting weird lately. What has
    changed?
  • If I apply this patch, which of the 3,000
    applications in my company may break?
  • Was this change consistently applied to all 850
    of my servers?
  • Some spyware program is hijacking my home page.
    How can I get rid of it, all of it?
  • Are there any Trojan programs hiding on my
    machine and stealing my bank account passwords?

18
Change Management Struggle
[Figure labels: Applications, App Popularity, App Versions]
19
Insights: A Pragmatic Look at Change Management
  • Cross Machine State CAN'T Be That Different
  • Most of the O(10^9) Systems Are Working Correctly
  • Most Environments Have Small Variation in
    Settings
  • System Workloads are Highly Repetitive
  • We Only Care About The State That Is Used
  • Only 10% of Files and Settings Are Actually Used
  • Process / State Interactions Provide Context
  • For Understanding Process Dependencies and the
    State
  • We Only Care About New System Changes
  • Only 1% of Files and Settings Typically Change

20
5. Data Driven Management: Reduce Complexity
Using Automated Monitoring and Analysis
  • Manage Only Globally Distinct Differences
  • Instrument the OS to Auto Track Process/State
    Interactions
  • Identify New Process Patterns and State
    Differences
  • Benefits
  • Simplifies the Troubleshooting Problem Space
  • Reduces the Problem Space for Other Techniques
  • Leverage Existing Machine Learning Work
  • Challenges
  • Scalable Low Overhead Data Collection and
    Analysis
  • Determining Cross Machine Equivalence
  • False Positives
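
One way to picture "manage only globally distinct differences" is a cross-machine comparison that reports only the settings where a machine departs from a clear majority. The sketch below uses a plain majority vote, which is a deliberately simple stand-in and not the statistical ranking used by the tools named later in this deck; the function name, the data layout, and the `min_agreement` threshold are all assumptions made for illustration.

```python
from collections import Counter


def distinct_differences(machines, min_agreement=0.75):
    """machines: {machine_name: {setting: value}}.

    Returns {machine_name: {setting: (its_value, majority_value)}} for every
    setting where the fleet mostly agrees and this machine does not.
    """
    settings = set().union(*(state.keys() for state in machines.values()))
    outliers = {}
    for setting in sorted(settings):
        values = Counter(state.get(setting) for state in machines.values())
        majority_value, count = values.most_common(1)[0]
        if count / len(machines) < min_agreement:
            continue  # no consensus (e.g. host names), so nothing to compare against
        for name, state in machines.items():
            if state.get(setting) != majority_value:
                entry = outliers.setdefault(name, {})
                entry[setting] = (state.get(setting), majority_value)
    return outliers


if __name__ == "__main__":
    fleet = {f"m{i}": {"proxy": "off", "host": f"m{i}"} for i in range(5)}
    fleet["m3"]["proxy"] = "on"
    print(distinct_differences(fleet))
```

Settings that are naturally unique per machine fall below the agreement threshold and drop out, which is one crude way to address the false-positive challenge listed above.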

21
System Building Challenges
22
Data Driven Examples
  • Troubleshooting: Strider, PeerPressure
  • Spyware Detection: GateKeeper
  • Patch Impact Analysis
  • Root Kit Detection: GhostBuster
  • Exploit Site Discovery: HoneyMonkey
  • Closing the Change Mgmt Loop: LiveOps

23
Strider Troubleshooter
  • My application worked yesterday, but it's not
    working today. What's the problem?
  • Cross-time Diff O(10^5) → O(10^3)
  • Windows XP System Restore Registry snapshot
  • Trace the app O(10^5) → O(10^3)
  • Registry read/write operations
  • Diff-Trace Intersection O(10^3) → O(10^1)
  • Inverse Change Frequency Ranking
  • GeneBank PeerPressure Ranking O(10^1) → O(10^0)
  • Mostly good Registry snapshots from the Mass for
    detecting anomalies
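
The narrowing steps on this slide can be sketched with plain set operations. This is a minimal illustration, assuming snapshots are dictionaries of Registry key to value and that "inverse change frequency" means entries that rarely changed in the past are ranked as more suspicious; the function names and the `past_diffs` history are hypothetical.

```python
from collections import Counter


def cross_time_diff(good_snapshot, bad_snapshot):
    """Keys added, removed, or changed between a good and a bad snapshot."""
    keys = good_snapshot.keys() | bad_snapshot.keys()
    return {k for k in keys if good_snapshot.get(k) != bad_snapshot.get(k)}


def rank_suspects(good_snapshot, bad_snapshot, traced_keys, past_diffs):
    """Intersect the diff with the app's trace, then rank rare changers first.

    traced_keys: keys the failing app read or wrote during the trace.
    past_diffs:  iterable of key sets from earlier, unrelated diffs.
    """
    candidates = cross_time_diff(good_snapshot, bad_snapshot) & set(traced_keys)
    change_counts = Counter(k for diff in past_diffs for k in diff)
    # Keys that change all the time are probably noise, so they sort last.
    return sorted(candidates, key=lambda k: (change_counts[k], k))


if __name__ == "__main__":
    good = {"A": 1, "B": 2, "C": 3, "D": 4}
    bad = {"A": 1, "B": 20, "C": 30, "D": 40, "E": 5}
    print(rank_suspects(good, bad, ["A", "B", "C"], [{"C"}, {"C", "D"}]))
```

The intersection is what does most of the work: only state that both changed and was actually touched by the failing application stays on the list.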
24
Experimental Results
25
AskStrider Auto-Scanner
  • My system has been acting weird lately. What has
    changed?
  • Running-module Snapshot O(10^5) → O(10^3)
  • Earliest-Latest Diff O(10^5) → O(10^3)
  • Diff-Snapshot Intersection O(10^3) → O(10^2)
  • Last-Update Timestamp Ranking O(10^2) → O(10^1)
  • Patch Filtering O(10^1) → O(10^0)
  • During patch troubleshooting, focus on files from
    patches
  • During malware troubleshooting, filter out files
    from patches as noise
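
The scan described here can be approximated in a few lines. The sketch assumes each snapshot maps a file path to its last-update timestamp and that the set of files delivered by patches is known; `ask_strider` and its parameters are illustrative names, not the tool's real interface.

```python
def ask_strider(running_modules, earliest, latest, patch_files, mode="malware"):
    """Rank recently changed files that are loaded by running processes.

    earliest / latest: {path: last_update_timestamp} snapshots.
    mode="patch":   keep only files that came from patches.
    mode="malware": drop files that came from patches as noise.
    """
    # Earliest-latest diff: files that are new or whose timestamp moved.
    changed = {path for path in latest if earliest.get(path) != latest[path]}
    # Diff-snapshot intersection: only changed files that are actually running.
    candidates = changed & set(running_modules)
    if mode == "patch":
        candidates &= set(patch_files)
    else:
        candidates -= set(patch_files)
    # Last-update ranking: most recently updated first, path as tie-breaker.
    return sorted(candidates, key=lambda path: (-latest[path], path))


if __name__ == "__main__":
    earliest = {"a.dll": 10, "b.dll": 10, "c.dll": 10}
    latest = {"a.dll": 10, "b.dll": 50, "c.dll": 40, "d.dll": 60, "e.dll": 70}
    running = ["a.dll", "b.dll", "c.dll", "d.dll"]
    print(ask_strider(running, earliest, latest, {"c.dll"}))
```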

26
(No Transcript)
27
Patch Impact Analysis
  • If I apply this patch, which of the 3,000
    applications in my company may break?
  • Trace patch installation O(10^5) → O(10^1)
  • Black-box patch manifest
  • For each of the O(10^3) apps
  • Trace it O(10^5) → O(10^3)
  • Black-box persistent-state app manifest
  • Diff-Trace Intersection O(10^3) → O(10^0) or 0
  • Test prioritization O(10^3) → 0 to O(10^1)
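
The intersection step can be sketched as follows, assuming a "black-box manifest" is simply the set of files and settings observed in a trace. Ordering the affected applications by overlap size is an assumption about how tests might be prioritized; the names are hypothetical.

```python
def patch_impact(patch_manifest, app_manifests):
    """Return [(app, overlapping_items)] for apps that touch patched state.

    patch_manifest: items written while tracing the patch installation.
    app_manifests:  {app_name: items the app was traced reading or writing}.
    Apps with no overlap are dropped; the rest are sorted by overlap size.
    """
    patched = set(patch_manifest)
    overlaps = {app: patched & set(items) for app, items in app_manifests.items()}
    affected = {app: items for app, items in overlaps.items() if items}
    return sorted(affected.items(), key=lambda pair: (-len(pair[1]), pair[0]))


if __name__ == "__main__":
    patch = {"x.dll", "y.dll", "reg:K"}
    apps = {
        "mail": {"x.dll", "y.dll", "mail.exe"},
        "editor": {"editor.exe"},
        "browser": {"reg:K", "browser.exe"},
    }
    for app, items in patch_impact(patch, apps):
        print(app, sorted(items))
```

Applications whose manifest never intersects the patch manifest fall out of the test plan entirely, which is where the reduction from thousands of candidates to a handful comes from.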

28
Improving OS Design: What Extensibility Points
Exist in the OS?
  • Extensibility Point (EP): Configuration Setting
    Containing the File Name of Code To Be Loaded At
    Application Runtime
  • Used by Malware to Automatically Start After
    Reboot
  • Solution: For Each Module Load, Identify
    Previously Read Settings Containing the Module
    Name.
  • Results
  • 364 Classes of EPs with 7227 EP Instances
  • 44% of EP Instances were never modified
  • Recommendation: Lock Down
  • 70% of EP Instances were used by a single
    application
  • Recommendation: Removal
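
The discovery rule stated in the "Solution" bullet can be sketched over a simple event trace. The event format below, tuples of `("read", setting, value)` and `("load", module_path)`, is invented for illustration, and the match is a plain case-insensitive substring test.

```python
import ntpath


def find_extensibility_points(events):
    """Map each setting to the module names it appears to have caused to load.

    events: ordered tuples, either ("read", setting, value) or ("load", path).
    A setting counts as a candidate extensibility point when its value was
    read before a module load and contains that module's file name.
    """
    reads = []   # (setting, lowercased value) seen so far, in order
    found = {}
    for kind, *rest in events:
        if kind == "read":
            setting, value = rest
            reads.append((setting, str(value).lower()))
        elif kind == "load":
            (module_path,) = rest
            # ntpath handles Windows-style paths on any host OS.
            name = ntpath.basename(module_path).lower()
            for setting, value in reads:
                if name in value:
                    found.setdefault(setting, set()).add(name)
    return found


if __name__ == "__main__":
    trace = [
        ("read", r"HKLM\Run\Updater", r"C:\Tools\updater.dll"),
        ("read", r"HKCU\Colors", "blue"),
        ("load", r"C:\Tools\Updater.DLL"),
    ]
    print(find_extensibility_points(trace))
```

Only reads that happened before the load are considered, which matches the "previously read" wording on the slide.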

29
Ghostware: The Ultimate Challenge to Trustworthy
Computing
  • Ghostware
  • Malware programs that patch the OS to hide their
    files, Registry entries, processes, loaded
    modules, network ports, etc. from other
    applications and OS utility programs
  • Bad things they can do
  • Install keyloggers to steal information
  • Use the disks as free storage
  • Use the machines to send spam emails
  • Release viruses and worms

30
CWS spyware detected by Ad-aware
31
(No Transcript)
32
CWS Spyware Hidden by Hacker Defender
33
GhostBuster ScanDiff
34
Strider GhostBuster: Ghostware Detector
  • Are there any Trojan programs hiding on my
    machine and stealing my bank account passwords?
  • File System and Registry Snapshot O(10^5)
  • Snapshot from a WinPE CD O(10^5)
  • Diff of the two snapshots O(10^5) → O(10^1)
  • Content-Diff Noise Filtering O(10^1) → O(10^0)
  • Only care about files and Registry entries that
    exist in the second snapshot, but not the first
    one
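
The comparison described on this slide reduces to a one-directional set difference. In the sketch below the first snapshot is taken from inside the possibly infected OS and the second from a clean boot; the `is_noise` predicate stands in for the noise-filtering step and is a hypothetical placeholder for whatever rules a real scanner would apply.

```python
def ghostbuster_diff(inside_scan, outside_scan, is_noise=lambda path: False):
    """Return entries visible from a clean boot but hidden from the live OS.

    inside_scan:  files and Registry entries reported by the running system.
    outside_scan: the same enumeration taken from a clean environment.
    is_noise:     predicate for entries that legitimately differ between scans.
    """
    # Only entries present in the second snapshot but not the first matter.
    hidden = set(outside_scan) - set(inside_scan)
    return sorted(path for path in hidden if not is_noise(path))


if __name__ == "__main__":
    inside = {"c:/win/a.dll", "c:/win/b.dll"}
    outside = inside | {"c:/win/hxdef.exe", "c:/win/temp/x.tmp"}
    print(ghostbuster_diff(inside, outside, lambda p: p.endswith(".tmp")))
```

Entries that appear only in the inside scan are ignored on purpose, since hiding software removes items from the live view rather than adding them.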

35
LiveOps: Closing The Change Management Loop
[Diagram labels: Person or Automation; Change Request; Change Tools; OS, Applications, Platform; Change Detected; LiveOps; Is The Change Approved?]
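
The decision box in this diagram, "Is The Change Approved?", suggests matching every detected change against the known change requests. The sketch below is one guess at that check, not a description of how LiveOps works; the `ChangeRequest` record and the `(machine, item)` change key are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRequest:
    machine: str
    item: str        # file or setting the request covers
    approved: bool


def classify_changes(detected, requests):
    """Split detected (machine, item) changes into approved and unapproved."""
    approved = {(r.machine, r.item) for r in requests if r.approved}
    report = {"approved": [], "unapproved": []}
    for change in detected:
        bucket = "approved" if change in approved else "unapproved"
        report[bucket].append(change)
    return report


if __name__ == "__main__":
    requests = [
        ChangeRequest("web01", "hosts", True),
        ChangeRequest("web01", "iis.conf", False),
    ]
    detected = [("web01", "hosts"), ("web01", "iis.conf"), ("db02", "my.ini")]
    print(classify_changes(detected, requests))
```

A change with no matching request at all lands in the unapproved bucket, the same as one whose request was rejected.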
36
Conclusion
  • Think of New Ways to Avoid Complexity
  • Don't Just Accept It, Find Better Ways to Manage It
  • Invest in Deep Thinking to Advance Ops
  • Not just fighting the fires!
  • (The Nearest Way to the Exit May be Behind You!)
  • To raise new questions, new possibilities, to
    regard old problems from a new angle, requires
    creative imagination and marks real advance in
    science.