Title: Rethinking Infrastructure Architecture: Model-based Management
1Rethinking Infrastructure Architecture
Model-based Management
- Kevin Sangwell
- Infrastructure Architect
- Microsoft EMEA HQ
2Problem 1
3Problem 1
4Infrastructure Architect, a 2nd class citizen
- Infrastructure Architecture isn't viewed with the
same rigour as solution architecture
- The Infrastructure Architect doesn't usually have
direct access to the business
- Solutions are seen as a competitive advantage
- Infrastructure is seen as a cost centre
5Problem 2
- Black box applications are difficult to support
- They require support staff to have specialist
application knowledge
6Solutions don't consider infrastructure or
operations
- Applications are built to solve Functional
Requirements
- Exposing the information IT Professionals need to
support the application is the last thing on a
developer's mind
- If something doesn't work, Solution Architects
can change the solution
- Infrastructure has to deploy whatever it's given
7Problem 3
- How do you fix these problems?
- My application is slow
- My application keeps failing
- My application doesn't scale
- How do I provide Disaster Recovery?
8Problem 3
9Problem 3
- Deployment
- Patch Management
- Backup
- Troubleshooting
10Root Causes
- Solutions and Infrastructure Architects are not
connected
- Infrastructure Architects don't have all the
tools or methodologies they need
- Solutions aren't designed to be supported (or
deployed, maintained)
- Solutions are mystical boxes
11The Ideal World: If only ...
- Solutions and Infrastructure Architects are peers
- A Visual Studio for Infrastructure Architects
- Solutions are like packaged software
- Where the impact of change is known, understood
and not feared
12DSI Core Technical Principles
- SW platforms and tools
- that enable ...
- Knowledge of an IT System
- Architectural intent
- Operational environment
- IT policies
- Resource needs
- Across platforms
- To be captured in
- Software Models
- MOM Management Packs
- Software update manifests
- System Definition Model
- That can be created, modified and operated on ...
- Across the IT Lifecycle
- Develop, Operate, Analyze/Act
13Demo
- Models let you perform "what if" analysis against your
applications
14Middle-East Bank Topology
Sites: Abu Dhabi, Dubai, Bahrain, Qatar
Branch types: Category 2 Branch, Category n Branch
Size bands: > 100, 50-100, < 50
- Ave. mailbox size: 160 MB
- Ave. emails received/day: 90
- Ave. emails sent/day: 45
- Ave. email size: 41 KB
15Models
- Developer
- Component topology
- Health model
- Configuration model
- Developer constraints
- IT Professional
- System topology
- IT governance model
- Servicing model
- End User
- Service level agreements
- Business requirement model
16You need to model the application
- If you don't understand the application, how can
you plan for it, support it, integrate it?
- Models let you perform "what if" analysis
- What happens if you add users?
- What happens if you change network topology?
- What if you merge with another infrastructure?
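Once the workload is modelled, a "what if" question reduces to a small calculation. The sketch below is only an illustration, assuming the per-user mail profile from the Middle-East Bank Topology slide (90 emails received and 45 sent per day, 41 KB average size). The `MailProfile` class, the function names and the user counts are hypothetical and do not come from any Microsoft tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MailProfile:
    """Per-user mail workload; defaults come from the topology slide."""
    received_per_day: int = 90
    sent_per_day: int = 45
    avg_email_kb: int = 41

    def daily_kb_per_user(self) -> int:
        # Total mail traffic one user generates per day, in KB.
        return (self.received_per_day + self.sent_per_day) * self.avg_email_kb


def daily_traffic_kb(users: int, profile: MailProfile) -> int:
    """Daily mail traffic for a site with the given number of users."""
    return users * profile.daily_kb_per_user()


def what_if_add_users(current_users: int, extra_users: int,
                      profile: MailProfile) -> int:
    """Extra daily traffic (KB) caused by adding users to a site."""
    before = daily_traffic_kb(current_users, profile)
    after = daily_traffic_kb(current_users + extra_users, profile)
    return after - before


if __name__ == "__main__":
    profile = MailProfile()
    # Hypothetical branch: 80 users today, 40 more after a merger.
    print(what_if_add_users(80, 40, profile), "KB/day of additional mail")
```

The same pattern extends to the other questions on this slide: a topology change or a merger becomes a change to the model's inputs, and the effect is computed before anything is deployed.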
17System Definition Model (SDM)
- System Definition Model (SDM)
- A formal model of a complete system
- All information pertinent to deployment and
operations
- Machine-readable, capturing intent of developers
and administrators
- System topology
- Developer constraints
- IT policy
- Installation directives
- Health model
- Monitoring rules
- Service Level Agreements
- Reports
SDM layers:
- Applications
- Application Hosts
- Network Topology / OS
- Hardware
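To make "machine-readable, capturing intent" concrete, here is a minimal sketch of an application bound to a host, with developer constraints that can be checked before deployment. This is not the SDM schema or any real SDM tooling. The class names, fields, host name and example constraints are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Host:
    """An application host in the modelled system (hypothetical shape)."""
    name: str
    os: str
    settings: Dict[str, str] = field(default_factory=dict)


# A constraint returns an error message, or None when it is satisfied.
Constraint = Callable[[Host], Optional[str]]


@dataclass
class Application:
    """An application plus the constraints its developer declares."""
    name: str
    host: Host
    constraints: List[Constraint] = field(default_factory=list)


def requires_setting(key: str, value: str) -> Constraint:
    """Build a constraint: the host must have settings[key] == value."""
    def check(host: Host) -> Optional[str]:
        actual = host.settings.get(key)
        if actual != value:
            return f"{host.name}: expected {key}={value}, found {actual}"
        return None
    return check


def validate(app: Application) -> List[str]:
    """Evaluate every constraint against the host the app is bound to."""
    errors = []
    for constraint in app.constraints:
        message = constraint(app.host)
        if message is not None:
            errors.append(message)
    return errors


# Hypothetical example: one constraint holds, one is violated.
host = Host("web01", "Windows Server 2003", {"auth": "windows"})
app = Application("OrderEntry", host, [
    requires_setting("auth", "windows"),
    requires_setting("session_state", "off"),
])
print(validate(app))
```

Because the developer's intent is data, the mismatch is reported at design time, when it is cheap to fix, and not during deployment.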
18IT Support Costs Limit Opportunity
IT Budgets
19What is a Health Model?
- What's not working?
- How bad is it?
- What should I do to get back to normal?
- How do I know whether I was successful?
20What is not working?
- Application
- Server
- Components
- Connectivity
- Services
- Dependencies
- Security
- SLAs
21How Bad is it?
- Application
- Server
- Task
- Business Process
- Customers
- Cost
- Scope
22Health and Diagnosability Workflow
- Detect
- Verify
- Diagnose
- Resolve
- Re-verify
- Instrumentation
- Events
- Performance Counters
- Automation / Scripting
23What is in a Health Model?
- Components
- Physical Code Elements
- Dependencies
- Managed Entity Hierarchy
- Logical objects Administrators Understand
- Objects that typically have pro-active tasks
- Aspects
- Part of Managed Entity of Interest from a
monitoring perspective
- Can only be in one health state at a time
24What is in a Health Model?
- State
- Green: ALL OK
- Yellow: Something not right
- Red: Really bad
- State Transitions
- State Indicators
- Performance Counters
- Windows Events
Event ID 2200
Queue Length > 20
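The relationship between state indicators and the three states can be sketched in a few lines. The example below uses the two indicators shown on this slide (Event ID 2200 and Queue Length > 20). Which state each indicator maps to is an assumption, since the slide does not say, and the names `HealthState` and `evaluate_aspect` are hypothetical.

```python
from enum import IntEnum
from typing import Iterable


class HealthState(IntEnum):
    GREEN = 0   # all OK
    YELLOW = 1  # something not right
    RED = 2     # really bad


# Indicator values from the slide; their severity is assumed here.
QUEUE_LENGTH_WARNING = 20     # Queue Length > 20 -> assumed YELLOW
CRITICAL_EVENT_IDS = {2200}   # Event ID 2200 -> assumed RED


def evaluate_aspect(queue_length: int,
                    event_ids: Iterable[int]) -> HealthState:
    """Return the single health state of one aspect from its indicators."""
    if any(event_id in CRITICAL_EVENT_IDS for event_id in event_ids):
        return HealthState.RED
    if queue_length > QUEUE_LENGTH_WARNING:
        return HealthState.YELLOW
    return HealthState.GREEN


print(evaluate_aspect(queue_length=35, event_ids=[]).name)
```

The function returns exactly one state, which mirrors the rule that an aspect can only be in one health state at a time.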
25State Transitions
States: Downloaded, Installed/Updated, Offline, Online, Failed, Uninstalled
Transitions:
- Delivered (Bits delivered to the target)
- Installed (All bits successfully installed)
- Installed/Updated (Installation completed)
- Started (All processes started)
- Problem detected (Health metrics beyond thresholds)
- Recovered successfully (Health metrics within thresholds)
- Stopped (All processes stopped)
- Stopped (Can't diagnose the problem)
- Shut down (All processes killed)
- Failed (Can't recover, can't even stop)
- Uninstalled (All bits removed)
- Uninstalled (All bits successfully removed)
26Who Builds a Health Model?
- Architect
- Developer
- Operations
- Administrator
- Information Worker
27Why Build Health Models?
- Identifies what indicators are needed for
Operations
- Provides Useful Instrumentation and Knowledge
- More Manageable Applications
- Cheaper
- Easier
- Predictable
28How to Build a Health Model
- Identify steps and determine instrumentation
required to detect, verify, diagnose and recover
from bad or degraded health states
- Identify Components
- Arrange into Hierarchy
- Identify Unit Monitors
- Identify State Roll-up
- Define Detectors
- Define Verification of Detectors
- Define Diagnostics Steps
- Define Resolution Steps
- Define Re-Verification steps
- Implement Health Model in Application
- Build Management Pack
29Identify Components
- Web Services
- Services
- Application Layers - Data
- Shared Components
30Arrange into Hierarchy
- Components
- Physical Code Elements
- Dependencies
- Managed Entity Hierarchy
- Logical objects Administrators Understand
- Objects that typically have pro-active tasks
- Aspects
- Part of Managed Entity of Interest from a
monitoring perspective
- Can only be in one health state at a time
31Identify Unit Monitors
- Group by functional Aspects
- System, Application or Business Driven
- Example: Print Queue
- Published Printers
- Print Jobs
- Queue Status
- Installation
32Identify State Roll-up
- What effect does a Unit Monitor have on its
Component?
- What effect does a sub-component have on its
parent?
- Most Severe
- Least Severe
- Percentage
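The three roll-up policies can be written as small functions over child states. This is a sketch under assumptions: the slide names the policies but does not define them, so the semantics below (in particular for the percentage policy) are one plausible reading, and all names are hypothetical.

```python
import math
from enum import IntEnum
from typing import Sequence


class HealthState(IntEnum):
    GREEN = 0
    YELLOW = 1
    RED = 2


def most_severe(children: Sequence[HealthState]) -> HealthState:
    """Parent takes the worst child state."""
    return max(children, default=HealthState.GREEN)


def least_severe(children: Sequence[HealthState]) -> HealthState:
    """Parent takes the best child state (e.g. redundant components)."""
    return min(children, default=HealthState.GREEN)


def percentage(children: Sequence[HealthState],
               healthy_pct: float) -> HealthState:
    """Assumed semantics: the parent is as healthy as the best
    `healthy_pct` percent of its children. Sort best-first and take
    the state at the cut-off position."""
    if not children:
        return HealthState.GREEN
    ordered = sorted(children)
    needed = max(1, math.ceil(len(ordered) * healthy_pct / 100))
    return ordered[needed - 1]


# Hypothetical farm of four servers, one of them failed.
farm = [HealthState.GREEN, HealthState.GREEN,
        HealthState.GREEN, HealthState.RED]
print(most_severe(farm).name, least_severe(farm).name,
      percentage(farm, 75).name)
```

Choosing the policy is a design decision per component: a single database behind a web service suggests "most severe", while a load-balanced farm may tolerate one failed member and suit "percentage" or "least severe".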
33Define Detectors
- Events
- Performance Counters
- Other Instrumentation
- NOT the lack of an event, etc.
- Example
- Event ID 24
34Verify Detectors
- Document what is needed to verify the indicated
fault does exist
- Automatic
- Manual Intervention
- Could be nothing
- Presence of an event is often enough
- The Detector may be the only indication there is
a problem
35Diagnostics
- Discover the cause of the problem
- Based upon detectors and verification
- Events
- An event can generally indicate the cause
- Performance Counters
- SLAs
- Investigation
36Resolution
- How to correct the problem
- What steps are required
- Automated
- Scripts
- Notifications
- Business Procedures
37Re-Verification
- How do you know it is fixed?
- Check that the problem has been corrected
- Document steps for operator to perform
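The five steps covered on the preceding slides (detect, verify, diagnose, resolve, re-verify) can be chained into one workflow. The sketch below is illustrative only: the `Playbook` structure, the `run` function and the simulated service are hypothetical, and a real implementation would call actual instrumentation in place of the lambdas.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Playbook:
    """One health problem and the five steps that handle it."""
    detect: Callable[[], bool]      # did an indicator fire?
    verify: Callable[[], bool]      # does the fault really exist?
    diagnose: Callable[[], str]     # what is the cause?
    resolve: Callable[[str], None]  # corrective action for that cause
    reverify: Callable[[], bool]    # is the problem gone?


def run(playbook: Playbook) -> List[str]:
    """Run the workflow once and return a log of the steps taken."""
    log: List[str] = []
    if not playbook.detect():
        return log
    log.append("detected")
    if not playbook.verify():
        log.append("false alarm")
        return log
    log.append("verified")
    cause = playbook.diagnose()
    log.append(f"diagnosed: {cause}")
    playbook.resolve(cause)
    log.append("resolved")
    log.append("fixed" if playbook.reverify() else "escalate to operator")
    return log


# Simulated, hypothetical service that has stopped.
service = {"running": False}
playbook = Playbook(
    detect=lambda: not service["running"],
    verify=lambda: not service["running"],
    diagnose=lambda: "service stopped",
    resolve=lambda cause: service.update(running=True),
    reverify=lambda: service["running"],
)
print(run(playbook))
```

When re-verification fails, the sketch falls back to an operator, which matches the slide's advice to document manual steps alongside any automation.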
38Implement Health Model in Application
- Detectors are Requirements
- Define Tests for Detectors
- Events
- Performance Counters
- WMI
39Build Management Pack
- Using MOM
- Health Model maps to Management Pack (MP)
- Detectors map to MP Rules
- Capture Knowledge in MP
40Health Models
- Simple monitoring: service is online/offline
- ping IP or WS / synthetic transaction
- Simple monitoring is not aspect-aware
- Print server is online, but published queue has
disappeared from AD
- Health Model result: web service performance is
poor because response from backend database is
slow
- The right Instrumentation in application
41What makes a health model?
- A finite state machine in software
- State transitions defined by rules
- Reflects system conditions
- Tasks respond to state changes
- Tasks can trigger state change
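The points above describe a rule-driven finite state machine with tasks attached to states. A minimal sketch follows; the state and event names are borrowed from the State Transitions slide, but the wiring between them is an assumption, and the `HealthModel` class is hypothetical and is not MOM's implementation.

```python
from typing import Callable, Dict, List, Tuple

# (current state, event) -> next state. The wiring is assumed.
RULES: Dict[Tuple[str, str], str] = {
    ("Offline", "Started"): "Online",
    ("Online", "Stopped"): "Offline",
    ("Online", "Problem detected"): "Failed",
    ("Failed", "Recovered successfully"): "Online",
}


class HealthModel:
    """Finite state machine whose transitions are defined by rules."""

    def __init__(self, initial: str,
                 rules: Dict[Tuple[str, str], str]) -> None:
        self.state = initial
        self.rules = rules
        self.tasks: Dict[str, List[Callable[["HealthModel"], None]]] = {}

    def on_enter(self, state: str,
                 task: Callable[["HealthModel"], None]) -> None:
        """Register a task that responds to entering `state`."""
        self.tasks.setdefault(state, []).append(task)

    def fire(self, event: str) -> bool:
        """Apply an event; return False if no rule matches."""
        next_state = self.rules.get((self.state, event))
        if next_state is None:
            return False  # event is ignored in the current state
        self.state = next_state
        for task in self.tasks.get(next_state, []):
            task(self)  # a task may call fire() and move the model on
        return True


model = HealthModel("Offline", RULES)
# Hypothetical recovery task: entering Failed triggers a recovery.
model.on_enter("Failed", lambda m: m.fire("Recovered successfully"))
model.fire("Started")
model.fire("Problem detected")
print(model.state)
```

The recovery task shows both directions named on the slide: it runs in response to a state change, and it triggers a further state change itself.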
42How to build a health model
- Top-down
- For service views
- Quasi-boolean logic for state transition
- R = critical, Y = degraded, G = OK
43DSI Foundation
44DSI System Center Roadmap
Windows Server 2003
Windows Server 2003 R2
Windows Vista Client
Windows Server 200x
Platform
VStudio 2005
MODEL
WSUS
SMS 2003 SP1
SMS 2003 Update
SMS v4
SMS
MODEL
MODEL
MOM v3
MOM 2005
MOM 2005 SP1
MOM
System Center Reporting Mgr v1
System Center Reporting Mgr v2
MODEL
New Tech
System Center Capacity Manager
System Center Capacity Planner 2006
System Center Data Protection Mgr
45Roadmap