Title: EarthLink and Micromuse: Growing up Together
1EarthLink and Micromuse: Growing up Together
- Doug McClure
- EarthLink Operations
- Sr. Manager, Fault and Performance Mgmt
- June 3, 2004
2Overview
- One of the Nation's Largest ISPs
- Headquarters in Atlanta, GA
- Key facilities in Dallas, TX; Pasadena and San Jose, CA; Knoxville, TN; and Seattle, WA
- Profitable, strong balance sheet
- Largest DSL footprint
- First-to-market with products that provide the best possible Internet experience
- Customer Advocacy: Fighting SPAM with technical solutions, litigation, legislative support, industry collaboration and consumer education
- Howard Carmack, aka the "Buffalo Spammer," was sentenced to 3-1/2 to seven years in prison on May 27th after EarthLink received a $16.4M civil judgment in May 2003
- 10th Anniversary (1994-2004)
- http://www.redefineyourworld.com
3Overview
- 5.25M Customers
- 4M Dialup (Premium 3.5M, Value 500K)
- 1.2M Broadband (Cable, xDSL)
- 160K Web Hosting (Unix, Windows)
- 50K Wireless (Blackberry, PDA, Laptops, Wi-Fi)
- Dial Access Coverage: 90% of US Population
- 16K Local Dial Access Numbers
- 500K Active Modem Ports (50% ELNK, 50% Outsourced)
- 400 PoPs (18 Core Backbone PoPs, four data centers)
- Broadband Coverage
- 200 Markets with Broadband Offerings
- Large and Diverse Infrastructure
- 2300 Network Elements
- 1500 Server Elements
- Thousands of Access Circuits, Hundreds of WAN
Circuits
4Overview
- Access Technology Innovation
- Premium and Value Dial-up
- Broadband (Cable, xDSL, Satellite)
- Voice (Converged Devices, VoIP)
- Wireless (WiFi, CDMA, Blackberry, PDA)
- Broadband over Power Lines (BPL)
- Value Added Service and Product Innovation
- Blocker Family: spamBlocker, POP-UP Blocker, ScamBlocker, Virus Blocker, Spyware Blocker
- Parental Controls
- Webmail
- Web Accelerator
5Overview
- Exceptional Customer Service
- 2003 PC Magazine Readers' Choice Awards for both high-speed and dial-up services
- 2003 highest ranking in customer satisfaction for the second year in a row for high-speed Internet service by J.D. Power and Associates in its Internet Service Provider Residential Customer Satisfaction Study(SM)
- 2003 CNET Editors' Choice award
6Innovation Constant Change
- Drivers
- Speed to Market, Competition: Do more, faster
- Quality, Performance, Support Costs
- Compliance - Sarbanes-Oxley
- Operational Challenges
- Release Management
- Change Management
- Service Level Management
7Operations Maturity Growing Up
- Production Improvement Program (PIP)
- Foundation in IT Service Management, ITIL, CobIT
- Focusing on four main areas: Service Level Mgmt, Change Mgmt, Release Mgmt, and Production Security
- Over 10% of Operations staff have now attended ITIL Foundation Training
- 1 Master Level Certified (more planned)
- 9 Practitioner Level Trained in CCR Quadrant (pending certification results)
- 114 Foundation Level Trained (most pending certification results)
8Operations Maturity Growing Up
- Service Level Management
- NOC, Help Desk
- Set and manage expectations internal/external to Operations
- Change Management
- Provide oversight and control of the production environment
- Minimize risk and impact from change activities
- Release Management
- Development -> Operations
- Minimize poor quality production releases
- Enterprise Security
- Compliance, control, audit
9EarthLink and Micromuse Facts
- Very Early Netcool Adopter
- EarthLink (Mindspring) was Micromuse's first US customer
- Began evaluating Micromuse Netcool in 1996, official customer April 1997
- Early Innovation
- Early joint innovation and development helped build foundation for many of Micromuse's key products
- EarthLink and Micromuse are revitalizing joint development projects with emerging service and business activity monitoring products
- Driving 3rd Party Vendor Integration Partnerships
- EarthLink requires detailed integration with Micromuse suite, much more than just sending SNMP traps
- Quest Software, Compuware, PeopleSoft, Remedy, Cisco Systems, Arbor Networks
- Current Deployment
- Netcool/OMNIbus, Internet Service Monitors, Desktop Clients, Webtop, Impact, numerous Gateways, Probes, Data Source Adaptors
- Two Senior System Engineers, Three System Engineers, Two System Analysts devoted to Fault and Performance Management (Netcool and Other)
- Services provided for NOC (3 shifts, 6 per shift), Systems Administration (3 shifts, 10 per shift), Network Engineering
10Moving Beyond MoM and Apple Pie
- EarthLink's Early Micromuse Netcool Deployment
- Focused on Netcool as the "Manager of Managers" or MoM
- Needed during EarthLink's rapid growth and expansion
- Enabled event management, eliminated "swivel chair" NOC
- "Apple Pie" is Event Correlation and Deduplication
- The Netcool sweet spot was providing EarthLink with event correlation and deduplication
- Able to reduce the event stream from 100,000s to 1,000s per week
- Further reduction expected to 100s per week through use of advanced Netcool/Impact policies and deployment of Netcool/Precision
- Enables NOC and support staffs to operate efficiently
- Focus now on End-to-End Service Management
- Netcool Suite allows EarthLink to manage entire service
- Understand service relationships, service levels, perform service modeling and service discovery
- Enables impact assessment, prioritization, understanding service delivery chain
- Eliminates "needle in the haystack" approach of event management
- "This is the problem that needs attention now" (compared to "I think this is the event causing problems")
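The deduplication idea on this slide can be illustrated with a short sketch. This is plain Python, not Netcool code: the `Event` fields, the `deduplicate` helper, the node names, and the choice of (node, summary) as the identifying key are all assumptions made for illustration. Repeated events collapse into one row that carries a tally and first/last occurrence times.

```python
from dataclasses import dataclass


@dataclass
class Event:
    node: str        # device or server that raised the event
    summary: str     # short description of the problem
    timestamp: int   # seconds since epoch (assumed unit)


def deduplicate(events):
    """Collapse repeated events into one row per (node, summary) key.

    Each row keeps a tally plus first/last occurrence times, so a
    flapping link shows up as one row instead of many.
    """
    rows = {}
    for ev in events:
        key = (ev.node, ev.summary)   # hypothetical identifier
        row = rows.get(key)
        if row is None:
            rows[key] = {"node": ev.node, "summary": ev.summary, "tally": 1,
                         "first": ev.timestamp, "last": ev.timestamp}
        else:
            row["tally"] += 1
            row["first"] = min(row["first"], ev.timestamp)
            row["last"] = max(row["last"], ev.timestamp)
    return list(rows.values())


# Illustrative data only: three repeats of one fault plus one unrelated event.
raw = [
    Event("rtr-atl-01", "Link down", 100),
    Event("rtr-atl-01", "Link down", 160),
    Event("rtr-atl-01", "Link down", 220),
    Event("mail-pas-02", "POP3 timeout", 130),
]
table = deduplicate(raw)
```

Here four raw events become two rows. The same principle, applied with richer keys and correlation rules, is what turns a very large raw stream into a list an operator can work through.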
11Service Management Complexity
Good Customer Experience? Performance?
Source: EarthLink Product Group
12Service Management Complexity
- The complexity and amount of data generated from end-to-end service management is enormous
- Networks, Firewalls, Servers, Applications, Switches, Routers, Load Balancers, Databases, etc.
- Netcool/ObjectServer is a must-have for EarthLink to effectively manage and understand EarthLink's service event stream from end-to-end
- Impact 3.0's cluster capability will enable EarthLink to analyze, enrich, suppress, and manage event stream regardless of our growth
Dealing with EarthLink Service Complexity
- RAD (future)
- Impact
- Precision (future)
- ObjectServer
Source: EarthLink Product Group
13The Customer IS Important
- Customer Experience Monitoring and Management
- The Micromuse Netcool Suite enables proactive, real-time monitoring of the customer's experience for core EarthLink services
- Over 14K Internet Service Monitor (ISM) instances in operation covering all key services (HTTP, HTTPS, SMTP, POP3, IMAP) and dedicated customers (ICMP)
- Allows for customer experience monitoring information to be correlated, analyzed, and presented in real-time
- Micromuse Netcool/ISMs, Keynote, Compuware Client Vantage, Quest Foglight
- External/Internal synthetic testing -> system and network element monitoring -> system and network port monitoring
- Immediate notification to support groups when customer's experience degrades
14The Business IS Important
- Business Activity Monitoring and Management
- Expands IT Operations visibility vertically and horizontally
- Ties IT Operations data and Business data together
- System Downtime vs. Contact Center Call Volume
- Real-Time Customer Subscriptions vs. Sales Forecasts
- Enables Real Time Monitoring and Management of Business and IT processes
- Change and Downtime Management
- Customer Registration Management
15Production Improvement Program
REQUEST FOR CHANGE (RFC): Corp Project, Ops Project, Non-Project
STATUS CHANGE (1): Prioritization, Risk Assessment and Forward Schedule of Change
STATUS CHANGE (2): Change Approval and Proj. Service Availability
STATUS CHANGE (3): Final Change Approval and Implementation
STATUS CHANGE (4): Review Changes
Change Mgt
Metrics Reporting
Release Mgt
Policy, Procedures, Standards Guidelines
Security Consulting
Security Assessment
Security Monitoring
Security Test Sign off
Prod Sec
Source: EarthLink SLM Group
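The four numbered status changes in this diagram suggest a simple linear lifecycle for a Request for Change. The following is a minimal sketch under that assumption, not EarthLink's actual RFC system: the `RfcStatus` and `Rfc` names, the initial SUBMITTED state, and the identifier format are hypothetical.

```python
from enum import Enum


class RfcStatus(Enum):
    SUBMITTED = 0     # RFC raised (Corp Project, Ops Project, Non-Project)
    PRIORITIZED = 1   # (1) prioritization, risk assessment, forward schedule
    APPROVED = 2      # (2) change approval and proj. service availability
    IMPLEMENTED = 3   # (3) final change approval and implementation
    REVIEWED = 4      # (4) review changes


class Rfc:
    def __init__(self, rfc_id, source):
        self.rfc_id = rfc_id
        self.source = source
        self.status = RfcStatus.SUBMITTED
        self.history = [RfcStatus.SUBMITTED]

    def advance(self):
        """Move to the next status; a reviewed RFC cannot advance further."""
        if self.status is RfcStatus.REVIEWED:
            raise ValueError(f"{self.rfc_id} has already been reviewed")
        self.status = RfcStatus(self.status.value + 1)
        self.history.append(self.status)
        return self.status


rfc = Rfc("RFC-0001", "Ops Project")   # hypothetical identifier
rfc.advance()   # status change (1)
rfc.advance()   # status change (2)
```

Keeping the full history on the object is one way to support the metrics reporting shown in the diagram, since time spent in each status can be derived from it once timestamps are added.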
16Business Activity Monitoring
Managing the Impact of Change and Downtime
Activities on the Business and Operations
17Overview
- Drivers
- Adoption of ITIL/COBIT Best Practices for Change Management
- Production Improvement Program (PIP), SOX Compliance, etc.
- Significant change for many groups: Fear, Uncertainty, Doubt (FUD)
- No Real-Time Visibility into Change/Downtime Management Activities
- Business Process
- Who, What, When, Where, Why, and How, Cost, Risk, and Impact
- Workflow: Monitor Lifecycle, SLAs, Bottlenecks
- Is the process enabling Operations or is it a bottleneck?
- Impact on Infrastructure: False Positives, Contact Center Call Volume (COGS)
- Drive out False Positives from Production Monitoring Systems
- Huge burden on NOC and other support staff
- Desire to have Automated Remedy Trouble Ticket Creation
- Reduce time to address problems, reduce MTTR
18Overview
- Solution
- Provide Real-Time Visibility into Change/Downtime Process
- "There are 12 pending and 24 scheduled change requests for tonight, 6 are underway and 8 start in 15 minutes or less"
- Create Actionable Information
- "Dept. 828 has five outstanding major change requests, attention is needed"
- Ensure Business Rules are Guiding/Enabling the Process, Not Hindering It
- Eliminate FUD
- Report (dashboards, reports) on Process and Impact
- NOC and other support groups know what's happening during change and downtime windows
- Management has oversight and visibility
- Business understands impact of change and downtime activity
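The kind of real-time status line quoted on this slide can be produced from a list of change requests with a simple roll-up. The sketch below is illustrative only: the record layout, the status names, and the `summarize` helper are assumptions, and the data is made up.

```python
from collections import Counter
from datetime import datetime, timedelta


def summarize(changes, now, soon=timedelta(minutes=15)):
    """Count change requests by status and flag those starting soon."""
    counts = Counter(c["status"] for c in changes)
    starting_soon = sum(
        1 for c in changes
        if c["status"] == "scheduled" and now <= c["start"] <= now + soon
    )
    return (
        f"{counts['pending']} pending, {counts['scheduled']} scheduled, "
        f"{counts['underway']} underway, {starting_soon} starting within "
        f"{int(soon.total_seconds() // 60)} minutes"
    )


# Hypothetical change requests for one evening.
now = datetime(2004, 6, 3, 22, 0)
changes = [
    {"id": "CR-1", "status": "pending", "start": now + timedelta(hours=3)},
    {"id": "CR-2", "status": "scheduled", "start": now + timedelta(minutes=10)},
    {"id": "CR-3", "status": "scheduled", "start": now + timedelta(hours=2)},
    {"id": "CR-4", "status": "underway", "start": now - timedelta(minutes=30)},
]
line = summarize(changes, now)
```

Grouping the same records by department instead of status would give the second kind of message on the slide, where one group's outstanding major changes are called out for attention.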
19Implementation
- Micromuse Netcool/OMNIbus
- Custom integration with Request for Change (RFC) and Downtime Management System
- ObjectServer flexibility allows for definition of important business and IT data in each event to capture Change/Downtime Status
- Service Impact, Business Impact, Customer Impact, SLA, Restoral Priority, Escalation Path, etc.
- Micromuse Netcool/Impact 3.0
- Impact policies build lists in real time for all nodes listed in change/downtime request
- As change/downtime activity progresses through its lifecycle, the change/downtime Netcool event changes states
- Change/Downtime event suppression policy updates all incoming events that match node list during the maintenance window with Suppression Status and Change/Downtime Reference Number
- Micromuse Netcool/Webtop 1.2 and RAD 2.0
- Process owner (Change/Downtime Management Group) dashboard for monitoring and managing the overall end-to-end process, workflow, and business impact
- Business group dashboards for monitoring change/downtime activities within area of control (Network Engineering, MIS, etc.)
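The suppression policy described on this slide can be sketched as follows. This is a Python stand-in, not Netcool/Impact policy language, and every name in it (`ChangeWindow`, `Event`, `apply_suppression`, the field names, the reference format) is hypothetical. It assumes a window is defined by a node list plus start and end times, and that matching events are tagged rather than discarded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeWindow:
    reference: str   # Change/Downtime Reference Number
    nodes: set       # nodes listed in the change/downtime request
    start: int       # window start, seconds since epoch (assumed unit)
    end: int         # window end


@dataclass
class Event:
    node: str
    summary: str
    timestamp: int
    suppressed: bool = False           # Suppression Status
    change_ref: Optional[str] = None   # set when a window explains the event


def apply_suppression(event, windows):
    """Tag an event if its node is inside an active change/downtime window."""
    for w in windows:
        in_window = w.start <= event.timestamp <= w.end
        if in_window and event.node in w.nodes:
            event.suppressed = True
            event.change_ref = w.reference
            break
    return event


# Illustrative data only.
windows = [ChangeWindow("CHG-1042", {"rtr-atl-01", "sw-atl-07"}, 1000, 4600)]
inside = apply_suppression(Event("rtr-atl-01", "Link down", 1200), windows)
outside = apply_suppression(Event("rtr-atl-01", "Link down", 9000), windows)
other = apply_suppression(Event("mail-pas-02", "POP3 timeout", 1200), windows)
```

Tagging instead of dropping keeps the event available for dashboards and later review while still letting operator views filter it out during the maintenance window.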
20Webtop 1.2 Presentation
21RAD 2.0 Presentation
22Netcool Event Management
Change/Downtime Request Events
Change / Downtime ID
Change / Downtime Status
Suppressed Change/Downtime Activity Events
Event Suppressed by Change / Downtime
23Future Enhancements
- Planned Netcool/Impact Policies
- COGS Impact
- Assess support cost impact due to change and downtime activities within Operations and Customer Support in Real-Time
- Data Gap Management
- A common question: Why does my chart or graph have gaps?
- The solution: Annotate graphs, charts, portals, etc. with the reason for data gaps caused by planned change/downtime activities
- How: Integrate change and downtime event information with all performance, utilization, and capacity monitoring solutions via Impact 3.0
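The data gap annotation idea can be illustrated with a small sketch. It assumes performance samples arrive at a fixed polling interval and that change/downtime windows are available as simple records; the function names, record layout, and sample values are hypothetical, and the integration with Impact itself is not shown.

```python
def find_gaps(timestamps, interval):
    """Return (start, end) pairs where consecutive samples are more than
    one polling interval apart."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > interval:
            gaps.append((prev, cur))
    return gaps


def annotate_gaps(gaps, windows):
    """Attach the reference of an overlapping change/downtime window to
    each gap, or None when no planned activity explains it."""
    notes = []
    for g_start, g_end in gaps:
        reason = None
        for w in windows:
            if w["start"] < g_end and w["end"] > g_start:   # intervals overlap
                reason = w["reference"]
                break
        notes.append({"gap": (g_start, g_end), "reason": reason})
    return notes


# Illustrative data: poll times in seconds, one planned window.
samples = [0, 300, 600, 2400, 2700, 3000, 4500]
windows = [{"reference": "CHG-1042", "start": 700, "end": 2300}]
notes = annotate_gaps(find_gaps(samples, interval=300), windows)
```

In this example the first gap is explained by the planned window and the second is not, which is the distinction a chart annotation would need to make visible.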
24Business Activity Monitoring
EarthLink Customer Registration, Provisioning,
and Fulfillment Dashboards
25RAD 2.0 Joint Development
Business Activity Monitoring Real-Time Customer
Registration Dashboard
26RAD 2.0 Joint Development
Business Activity Monitoring Real-Time Customer
Registration Dashboard
27Continuous Improvement
- Building better Network and Systems Management
- Founded Atlanta Network and Systems Management Technical User Group (ANSMTUG) in January 2004
- http://www.ansmtug.org
- Metro-Atlanta Fortune 100, Service Providers, Enterprise, Media, and Emerging Technology Companies
- BellSouth, The Home Depot, EarthLink, Southern Company, N2 Broadband, eDeltacom, Delta, CNN, Cingular, E*Trade, Knology Broadband, Cox Communications
- Customers helping Customers
- Use Micromuse and other NSM products better
- Collectively drive product requirements and features into Micromuse and other NSM vendors
- Special Interest Groups (SIG) Forming
- Best practices for NSM using Micromuse Netcool Suite
- Aligning NSM solutions to ITIL, MOF, CobIT, etc.
28Challenges facing Micromuse
- Product Development, Focus, and Release Cycle
- Business Monitoring (BAM, BSM, BI, BTI, B-I-N-G-O)
- Performance Monitoring and Management Solution
- Features vs. New Product: Finding the Right Balance
- Licensing Needs Review and Simpler Approach
- Support New Technologies Sooner Across Core Products
- Uniform Release Cycle (core architecture components and capabilities)
- Discovery, Root Cause Analysis (RCA), Next-Gen Polling
- Emerging Competition
- Service / Application Discovery and RCA
- Universal Poller Concept
- Out of the Box Functionality and Updates
- Appearance of Requiring Too Much Customization
- Competition is focusing on this
- Many customers have product still on the shelf
- Ease of Use
- More out of the box, templates, examples, plug and play, wizards
29Closing and Q&A
- Closing
- Q&A
- Doug McClure
- Sr. Manager, Fault and Performance Mgmt
- EarthLink Operations
- dmcclure_at_corp.earthlink.net
- 404-748-7665 (W)
- 678-362-7712 (C)