Scott Andersen - PowerPoint PPT Presentation

Slides: 18

Provided by: scot94

Transcript and Presenter's Notes

1
Designing a large mail system

Scott Andersen
Chairman, IASA Board of Education

2
Big Rocks

Design for the business
Gather requirements
Understand what is needed vs what is wanted
Design from the technology not TO the technology
Design for operations
Can I recover this?
Can I meet my SLA?
Design for the next migration
Sustained innovation

3
Architectural Decisions: A matter of scope
[Diagram labels: Product family scope; Product A scope; Product B scope; Component scope (x2)]
Product family architecture decisions: decisions optimize over the whole, making tradeoffs and compromises across the products for the overall good of the whole
Product architecture decisions: decisions made here are tuned to product A
Architecture is the set of decisions that cannot be delegated without compromising overall system objectives.

4
Design for the business

What does the business need from the solution?
Reliability
Scalability
Process enablement
Process Improvement

5
Design for the business

What does the business want from the solution?
Enable new technologies?
Mobility
Remote access
Any time, anywhere access to my email?
Reliable solution (it's just there)

6
Design from the technology

Selecting the right solution
Map business needs to the solutions
Map migration considerations to the solutions
Map operations requirements to the solutions
Design criteria: the weeds of the solution
Number of users today
Size of mailbox
User growth (can it happen in this economy?)
Storage design and requirements
Amount of mail moved
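
The design criteria above (users today, mailbox size, user growth, storage) can be combined into a rough capacity estimate. The following is a minimal sketch, not a sizing method from the presentation: the function name, the compound-growth model, the `overhead` allowance and every number in the example are assumptions for illustration.

```python
def projected_storage_gb(users, mailbox_quota_gb, annual_growth, years, overhead=0.2):
    """Estimate mailbox storage after `years` of compound user growth.

    users            -- number of mailboxes today
    mailbox_quota_gb -- planned size of each mailbox, in GB
    annual_growth    -- yearly user growth as a fraction (0.05 means 5%)
    years            -- planning horizon in years
    overhead         -- assumed extra fraction for indexes, logs and slack
    """
    future_users = users * (1 + annual_growth) ** years
    return future_users * mailbox_quota_gb * (1 + overhead)


if __name__ == "__main__":
    # Hypothetical inputs: 10,000 users, 2 GB mailboxes, 5% growth, 3 years.
    estimate = projected_storage_gb(10_000, 2.0, 0.05, 3)
    print(f"Plan for roughly {estimate:,.0f} GB")
```

Setting `annual_growth` to zero gives the "no growth in this economy" case, which makes the two scenarios easy to compare side by side.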

7
Design from the technology

Are mobile users worse than a virus?
Alignment with the business
Users: I want to keep all my mail
Legal or HR: we would like to be able to capture the mail of specific users
Legal: we do not want mail older than XX days stored on our network
Operations: we want to be able to back up this solution every night in our available window
Management: we want a mail system that enables productivity and collaboration without limiting users
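
Two of the requirements above pull against each other: Legal wants old mail gone, while Legal or HR wants the mail of specific users captured. One way to reconcile them is a retention rule with a legal-hold exception. This is a sketch under assumptions: `should_purge`, `retention_days` and `legal_hold_users` are hypothetical names, and the 90-day value in the example is only a stand-in for the unspecified "XX days".

```python
from datetime import datetime, timedelta


def should_purge(received, owner, now, retention_days, legal_hold_users):
    """Return True when a message may be deleted under the retention rule.

    Mail owned by a user on legal hold is never purged, regardless of age.
    """
    if owner in legal_hold_users:
        return False
    return now - received > timedelta(days=retention_days)


if __name__ == "__main__":
    now = datetime(2024, 1, 31)
    old_message = datetime(2023, 1, 1)
    # 90 days is a placeholder value, not a figure from the presentation.
    print(should_purge(old_message, "alice", now, 90, {"bob"}))
    print(should_purge(old_message, "bob", now, 90, {"bob"}))
```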

8
Design for operability
System-to-admin ratio helps understand admin costs
Tracking overall ops costs rather than head count doesn't work
Outsourcing will save ½ the expense without solving the underlying issue
Inefficient properties: 2:1
Average property: 150:1
Live Search: 2,200:1
Autopilot often cited as the solution
Autopilot only part of the solution
Hardest part is addressing the apps issues
80% of operational problems have genesis in design & development

9
Four Major Error Classes

Human operator error is the leading cause of
dependability problems in many domains

In my experience, O/L S/W failures are underrepresented in the above
H/W issues considerably less common
Automation reduces costs while also eliminating
admin interaction
Every interaction brings risk of error

Source: D. Patterson et al., "Recovery Oriented Computing (ROC): Motivation, Definition, Techniques, and Case Studies," UC Berkeley Technical Report UCB//CSD-02-1175, March 2002.

10
What does operations do?

Teams: Messenger, Contacts and Storage, OSSG and BUIT services
51% of time spent on deployment & incident management (known resolution technique)

Source: Deepak Patil, WLO (8/14/2006)

11
ROC Design Priorities

Recovery-Oriented Computing (ROC)
Assume everything will fail
S/W & H/W can fail at any time
Build redundancy at all levels in system
Scale out rather than up
'Gold Plated' machines not much more reliable
S/W failures dominate H/W
More workload on a single system increases potential failure impact
Even good H/W still fails more frequently than typical SLAs allow
Reduce operations costs
Inexpensive hardware slice
1 to 2 orders of magnitude fewer systems engineers
Increase reliability through redundancy & less operator interaction
Goal: 24x7 availability with 8x5 operations
Lights-out operation is more reliable
Write the system such that S/W quality problems are reported but don't show to customers. It should take many failures to miss SLA
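
The point that it should take many failures to miss the SLA can be illustrated with simple availability arithmetic. The sketch below assumes replicas fail independently, which real systems only approximate (shared software bugs and operator errors are correlated); the function names and figures are illustrative, not from the presentation.

```python
def combined_availability(single, replicas):
    """Availability of a service that is up while any one replica is up.

    Assumes independent failures: the service is down only when all
    replicas are down at once.
    """
    return 1 - (1 - single) ** replicas


def replicas_needed(single, target):
    """Smallest replica count whose combined availability meets `target`."""
    if not (0 < single < 1 and 0 < target < 1):
        raise ValueError("single and target must be strictly between 0 and 1")
    count = 1
    while combined_availability(single, count) < target:
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical: inexpensive machines at 95% each, 99.9% target.
    print(replicas_needed(0.95, 0.999))
```

Under the independence assumption, a few inexpensive machines reach a target that no single machine of that quality could, which is the scale-out argument made on the slide.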

12
Overall Application Design

Implement ROC Principles
Service Design Best Practices
Single-box deployment
Development and test in full environment
Quick service health check
Zero trust of underlying components: assume failure
Pod or cluster independence
Implement & test ops tools & disaster response
Simplicity in all things
Partition everything
Version everything

13
Dependency Management

Expect latency & failures in dependent services
Run on cached data or offer degraded services
Test failure & latency frequently in production
Don't depend upon features not yet shipped
It takes time to work out reliability & scaling issues
Select dependent components & services thoughtfully
On-server components need consistent quality goals
Dependent services should be large granule (worth sharing)
Isolate services & decouple components
Contain faults within services
Assume different upgrade rates
Rather than auth on each connect, use session key and refresh every N hours (avoids login storms)
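
"Run on cached data or offer degraded services" can be sketched as a thin wrapper around a dependency call. This is a minimal illustration under assumptions: the class name `CachedDependency`, the `max_stale_seconds` limit and the convention that the wrapped `fetch` callable enforces its own timeout and raises on failure are all hypothetical.

```python
import time


class CachedDependency:
    """Wrap a dependency call and fall back to the last good value."""

    def __init__(self, fetch, max_stale_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch              # callable that may raise or time out
        self._max_stale = max_stale_seconds
        self._clock = clock              # injectable clock, eases testing
        self._value = None
        self._stamp = None               # time of the last successful fetch

    def get(self):
        """Return (value, degraded); degraded is True when serving from cache."""
        try:
            self._value = self._fetch()
            self._stamp = self._clock()
            return self._value, False
        except Exception:
            fresh_enough = (
                self._stamp is not None
                and self._clock() - self._stamp <= self._max_stale
            )
            if fresh_enough:
                return self._value, True
            raise  # nothing usable cached, so surface the failure
```

Returning the `degraded` flag lets the caller decide how to present stale data rather than hiding the failure entirely.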

14
Release Cycle & Testing

Ship often (full release every 90 days)
Small releases ship more smoothly
Increases pace of innovation
Long stabilization periods not required in
services
Support 1 version, no-back-level support, 1
configuration, installed in 1 way, in 1
environment
Use production data to find problems (traffic
capture)
Measurable release criteria
Release criteria includes quality and throughput
data
Never deploy anything without tested roll-back
Test in production via incremental deployment & roll-back
Track all recovered errors to protect against
automation-supported service entropy
Test all error paths in integration & in production
Continue testing after release
2 to 5% load from automated testing is affordable and finds errors FAST
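
Incremental deployment with a tested roll-back can be sketched as a staged rollout loop. This is an illustration under assumptions, not the presenter's tooling: `staged_rollout`, the stage fractions, and the `deploy`, `healthy` and `rollback` callables are hypothetical stand-ins for real deployment tools.

```python
def staged_rollout(hosts, deploy, healthy, rollback, stages=(0.05, 0.25, 1.0)):
    """Deploy to growing fractions of hosts; undo everything on failure.

    Returns True when all stages pass the health check, False after a
    roll-back of every host touched so far.
    """
    done = []
    for fraction in stages:
        # Always touch at least one host so the first stage tests something.
        target = max(1, int(len(hosts) * fraction))
        for host in hosts[len(done):target]:
            deploy(host)
            done.append(host)
        if not all(healthy(host) for host in done):
            for host in reversed(done):
                rollback(host)
            return False
    return True


if __name__ == "__main__":
    hosts = [f"h{i}" for i in range(20)]
    ok = staged_rollout(hosts, print, lambda host: True, print)
    print("rollout succeeded" if ok else "rolled back")
```

Because the roll-back path runs through the same loop as the deployment, exercising it regularly keeps it tested, in line with "never deploy anything without tested roll-back".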

15
Design for the next migration

The great Oklahoma Land Grab
Would a GPS have been an advantage?
Why are migrations like a land grab?
Can you sustain innovation?

16
Migrations: many moving pieces

17
The IO of migrations
Basic: it sounds simple, and the word itself is often applied to simple things. Yet Basic represents the most complex state an organization can be in prior to a migration. Basic means that overall IT maturity is low, in the sense that there is not a lot of automation. This often occurs in older organizations or organizations that have recently undergone severe budget cuts. In this scenario the migration will only be successful if the organization is moved up a level (to Standardized) or gains a competitive advantage from the migration. A friend of mine once described these folks as being in 'job protection mode' all the time.

Standardized is our next tier. Just like Basic, Standardized sounds like an easy state to migrate from. In fact, while it represents a more automated IT environment than Basic does, it is still quite manual. Manual process is what drives cost and risk in migrations. So how do we help a Standardized IT shop move forward? The first thing is to help them build a plan for a Dynamic IT environment: they see the business value of moving to a Dynamic IT environment but are looking for the project that can lead them to this state. A competitive advantage is always a benefit for the customer.

Rationalized represents an organization well on the way to maturity. This is the first state that allows for easier migrations within the organization. Here we have an IT team that has automated many of the processes and procedures that involve touching end users. This is the first step in helping customers be consistently successful with their migrations. Consistently successful migrations lead to additional migrations and allow for the move from migration to transition.

Dynamic organizations move quickly, recognizing the business value of IT improvement and leveraging that improvement to automate and streamline processes. A Dynamic organization is ready for transition and no longer migrates anything; it has moved to the concept of transition.
