Title: System
1System Network Administration
- Chapter 3 Service
- By Chang-Sheng Chen (200803011)
2Contents of Chapter 3
- 3.1 The Basics
- 3.1.1 Customer Requirements
- 3.1.2 Operational Requirements
- 3.1.3 Open Architecture
- 3.1.4 Simplicity
- 3.1.5 Vendor Relation
- 3.1.6 Machine Independence
- 3.1.7 Environment
- 3.1.8 Restricted Access
- 3.1.9 Reliability
- 3.1.10 Single or Multiple Servers
- 3.1.11 Centralization and Standards
- 3.1.12 Performance
- 3.1.13 Monitoring
- 3.1.14 Service Rollout
- 3.2 The Icing
- 3.2.1 Dedicated Machine
- 3.2.2 Full Redundancy
- 3.3 Conclusion
3The Basics
- The most important thing to consider at all stages of design and deployment is the customers' requirements.
- Talk to the customers and find out what their needs and expectations are for the service.
- Then build a list of other requirements that are visible only to the SA team.
- Focus on what, rather than how.
- Services should be built on server-class machines that are kept in a suitable environment.
4The Basics (cont.)
- Access to server machines should be restricted to SAs for reasons of reliability and security.
- An SA has several decisions to make when building a service.
- Choosing vendors and products (software, hardware)
- Reliability, performance, etc.
5The Basics (cont.)
- Most services rely on other services.
- Understanding in detail how a service works will give you insight into the services on which it relies.
- For example, almost every service relies on the name service (DNS). DNS relies on the network; therefore, anything that relies on DNS also relies on the network.
- A service should be built as simply as possible, with as few dependencies as possible, to increase reliability and make it easier to support and maintain.
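The dependency chain described above can be sketched as a small graph walk. This is only an illustration: the service names and the dependency map are hypothetical, not taken from any real site.

```python
from typing import Dict, List, Set

# Hypothetical map: each service lists the services it relies on directly.
DEPENDS_ON: Dict[str, List[str]] = {
    "mail": ["dns", "authentication"],
    "authentication": ["dns"],
    "dns": ["network"],
    "network": [],
}


def all_dependencies(service: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every service that `service` relies on, directly or indirectly."""
    seen: Set[str] = set()
    stack = list(depends_on.get(service, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(depends_on.get(dep, []))
    return seen


print(sorted(all_dependencies("mail", DEPENDS_ON)))
```

The fewer entries such a walk returns for a service, the fewer things can take that service down.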
6The Basics (cont.)
- Another method of easing support and maintenance is to use standard hardware and software, standard configurations, and documentation kept in a standard location.
- A key part of implementing any new service is to make it independent of the particular machine it runs on.
73.1.1 Customer Requirements
- When building a new service, you should always start with the customer requirements.
- Gathering the customer requirements
- There are very few services that do not have customer requirements.
- DNS, authentication services, etc.
- A Service Level Agreement (SLA)
- An SLA enumerates the services that will be provided and the level of support they receive.
8 Service Level Agreement(cont.)
- A Service Level Agreement (SLA)
- It typically categorizes problems by severity and commits to response times for each category.
- The SLA usually defines an escalation process that increases the severity of a problem if it has not been resolved after a specified time and calls for managers to get involved if problems are getting out of hand.
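A minimal sketch of such an escalation rule follows. The severity levels, response times, and escalation limits are all hypothetical; real values would come from the SLA agreed with the customers.

```python
from dataclasses import dataclass

# Hypothetical values: severity 1 is the most severe. Times are in hours.
RESPONSE_HOURS = {1: 1, 2: 4, 3: 24}
ESCALATE_AFTER_HOURS = {1: 2, 2: 8, 3: 48}


@dataclass
class Ticket:
    severity: int
    hours_open: float
    resolved: bool = False


def response_due_hours(severity: int) -> int:
    """Committed response time for a problem of the given severity."""
    return RESPONSE_HOURS[severity]


def escalate(ticket: Ticket) -> Ticket:
    """Raise severity by one level if the ticket is unresolved past its limit."""
    overdue = ticket.hours_open > ESCALATE_AFTER_HOURS[ticket.severity]
    if not ticket.resolved and overdue and ticket.severity > 1:
        return Ticket(ticket.severity - 1, ticket.hours_open, ticket.resolved)
    return ticket


def needs_manager(ticket: Ticket) -> bool:
    """Managers get involved when a top-severity ticket is open past its limit."""
    return (not ticket.resolved
            and ticket.severity == 1
            and ticket.hours_open > ESCALATE_AFTER_HOURS[1])
```

Keeping the time limits in one table mirrors how an SLA states them: the numbers can be renegotiated without touching the escalation logic.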
9Service Level Agreement (SLA)
- The SLA process is a forum for the SAs to understand the customers' expectations and to set them appropriately, so that the customers understand what is and is NOT possible, and why.
- It is a tool to plan what resources will be required.
- The SLA should document the customers' requirements and set realistic goals for the SA team in terms of features, availability, performance, and support.
- It should document future needs and capacity so that all parties understand the growth plans.
103.1.2 Operational Requirements
- The SA team may have other requirements for the new service that are not immediately visible to the customers.
- These include the administrative interface, whether the service interoperates with existing services, and whether it can be integrated with central services such as authentication or directory services.
- SAs also need to consider how the service scales.
- A related consideration is the upgrade path for the service.
- The level of reliability
- Network performance issues
- Monitoring issues (availability, performance, etc.)
- Budget issues
11Operational Requirements (cont.)
- Questions about an upgrade process
- Does it involve an interruption of service?
- Does it involve touching every desktop?
- Is it possible to roll out the upgrade slowly, to test it on a few willing people before inflicting it on the whole organization?
- Try to design the service so that upgrades are easy, can be performed without service interruption, do not require touching the desktops, and can be rolled out slowly.
123.1.3 Open architecture
- Whenever possible, a new service should be built around open protocols and file formats.
- Any service with an open architecture can be more easily integrated with other services that follow the same standards.
- The business case for using open protocols is simple: it lets you build better services because you can select the best server and the best client, rather than being forced to pick, for example, the best client and then getting stuck with a less than optimal server.
13Open architecture (cont.)-The ability to
decouple the client and server selections
- A better way is to select protocols based on open standards and permit each side (i.e., client and server) to select its own software.
- Customers are free to choose the software that best fits their own needs, biases, and even platforms.
- SAs can independently choose a server solution based on their needs for reliability, scalability, and manageability.
- The SAs can now choose between competing server products, rather than being locked into the (potentially difficult to manage) server software and platform required for a particular client application.
- Open protocols provide a level playing field that inspires competition between vendors, which benefits you.
14Open architecture (cont.)
- Open protocols and file formats are typically quite static (or change only in upward-compatible ways) and widely supported, giving you the maximum product choice and the maximum chance of reliable, interoperable products.
- The other benefit of using open systems is that you do not require a gateway to the rest of the world.
- Gateways are additional services that require capacity planning, engineering, monitoring, and everything else mentioned in this chapter.
- Case study: hazards of proprietary e-mail software
- Selection based primarily on the client user interface and features (e.g., graphical user interface), with no concern for server management, reliability, or scalability
- All messages from all users in a single large file
- Protocol gateways reduce reliability
- Microsoft Exchange Server
153.1.4 Simplicity
- When architecting a new service, simplicity should be your foremost consideration.
- The simplest solution that satisfies all the requirements will be the most reliable, easiest to maintain, easiest to expand, and easiest to integrate with other systems.
- As the system grows, it will become complex. Starting out as simple as possible therefore delays the day when the system becomes too complex.
- Sometimes, one or two requirements from the customers or SAs add considerably to the complexity of the system.
- Reevaluate the importance of these requirements.
- These requirements could be met, but at a cost to reliability, support levels, and ongoing maintenance.
163.1.5 Vendor Relations
- When choosing hardware and software for a service, you should be able to talk to sales engineers from your vendors to get advice on the best configuration for your application.
- Hardware vendors sometimes have product configurations that are tuned for particular applications, such as databases or web servers.
- If there is more than one server vendor in your environment, and more than one of them has an appropriate product, you should use the situation to your advantage.
- Get those vendors bidding against each other
- Get more performance, reliability, or scalability for the same price
- Or get a better price and be able to invest the surplus
- Even if you know which vendor you will choose, do not let them know that you have decided until you are convinced that you have the best deal possible.
173.1.5 Vendor Relations (cont.)
- When choosing a vendor, particularly for a software product, it is important to understand the direction in which the vendor is taking the product.
- For key central services, such as authentication or directory services, it is essential to stay in touch with the product direction, or you may suddenly discover that the vendor no longer supports your platform.
- If possible, stick to vendors who develop the product primarily on the platform you use, rather than porting it to that platform.
- Benefits: fewer bugs, new features first, better support, etc.
183.1.6 Machine Independence
- For name-based services (Ch. 6, Name Service)
- Clients should always access a service using a generic name that is based on the function of the service.
- E.g., Smtp.nctu.edu.tw, pop3.nctu.edu.tw
- The machine should never have a primary machine name that is function-based, because ultimately the function may need to move to another machine. For example:
- Primary name: DcMg.nctu.edu.tw
- Alias (service) name: smtp.cc.nctu.edu.tw
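The indirection between service names and machine names can be sketched with a simple lookup table standing in for DNS alias records. The host names below are hypothetical, and a real deployment would change records in the name service rather than a Python dictionary.

```python
from typing import Dict

# Hypothetical alias table: functional service name -> primary machine name.
ALIASES: Dict[str, str] = {
    "smtp.example.edu": "host-a.example.edu",
    "pop3.example.edu": "host-a.example.edu",
}


def resolve(service_name: str, aliases: Dict[str, str]) -> str:
    """Return the primary machine name currently behind a service name."""
    return aliases[service_name]


def move_service(service_name: str, new_host: str, aliases: Dict[str, str]) -> None:
    """Point the service name at another machine; clients keep the same name."""
    aliases[service_name] = new_host


print(resolve("smtp.example.edu", ALIASES))
move_service("smtp.example.edu", "host-b.example.edu", ALIASES)
print(resolve("smtp.example.edu", ALIASES))
```

Only the alias changes when the service moves, so no client configuration has to be touched.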
193.1.6 Machine Independence (cont)
- For IP-address-based services, techniques such as layer 4 switching can give the machine that the service runs on multiple virtual IP addresses in addition to its primary real IP address.
- The virtual address and the service can then be moved to another machine relatively easily.
203.1.7 Environment
- A fundamental piece of building a service is providing a reasonably high level of availability, which means placing all the equipment associated with that service in a data center (cf. Ch. 17).
- A data center provides protected power, plenty of cooling, controlled humidity (vital in dry or damp climates), fire suppression, and a secure location where the machine should be free from accidental damage or disconnection.
- In addition, a server often needs much higher-speed network connections (e.g., high-speed links, more interfaces) than its clients because it needs to be able to communicate at reasonable speeds with many clients simultaneously.
- High-speed network cabling and hardware typically are expensive to deploy.
21Environment (cont.)
- None of the components of the service should rely on anything that runs on a machine that is not located in the data center.
- The service is only as reliable as the weakest link in the chain of components that need to be working for the service to be available.
- If a component does rely on a machine outside the data center, find a way to change the situation:
- Move the machine into a data center
- Replicate that service onto a data center machine
- Remove the dependency on the less reliable machine
- Case study: hazards of servers relying on non-servers
- NFS automount
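The weakest-link point can be made concrete with a rough calculation. The sketch below assumes component failures are independent, which is a simplification, and the availability figures are hypothetical.

```python
from math import prod
from typing import Iterable


def chain_availability(component_availabilities: Iterable[float]) -> float:
    """Availability of a service that needs every component working.

    Assumes independent failures, so the result is the product of the
    individual availabilities and can never exceed the weakest one.
    """
    return prod(component_availabilities)


# Hypothetical figures: two data center machines and one desktop-grade machine.
components = [0.999, 0.999, 0.95]
print(chain_availability(components))
print(min(components))
```

Under this assumption the chain is always at or below its least available component, which is why a single non-server dependency can dominate the result.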
223.1.8 Restricted Access
- Restricting server access to the SA team from the beginning is the best approach to ensure reliability and expected performance levels.
- There should be no reason for anyone to log in to a server other than an SA performing administrative work on the server.
- The fewer people who log in to a machine, the more stable it is.
- If a customer can log in to a particular server and becomes accustomed to doing so, he will probably start running other jobs on it that take CPU and I/O cycles away from the service, without realizing that he is adversely affecting the service.
- E.g., NFS server
233.1.9 Reliability
- If you have redundant hardware available, use it as effectively as you can.
- The single most effective way to make a service as reliable as possible is to make it as simple as possible.
- Find the simplest solution that meets all the requirements.
- When you are building a service at a central location that will be accessed from remote sites, it is particularly important to take network topology into account.
- If connectivity to the main site is down, can the service still be available to remote sites?
- Some, yes: name service (with stale data), authentication service
- Others, no: database or file service
243.1.10 Single or Multiple Servers
- Independent services (or daemons) should always be on separate machines, if cost and staffing levels permit.
- However, if the service that you are building is composed of more than one new application or daemon and the communication between those components is over a network, you need to consider whether to put all of the components on one machine or to split them across many machines.
- E.g., a website with a database, or a mail system with many filtering mechanisms (anti-spam, anti-virus, etc.)
- The choice may be determined by security, performance, or scaling concerns.
25Single or Multiple Servers (cont.)
- In other cases, one of the components will initially be used only for this one application, but may later be used by other applications. E.g.,
- The calendar service uses the LDAP server (initially)
- The mail service uses the LDAP server (later)
- If a service, such as LDAP, may be used by other services in the future, it should be placed on dedicated machines, so that the calendar service can be upgraded and patched independently of the (ultimately more critical) LDAP servers.
26Single or Multiple Servers (cont.)
- Sometimes, two applications or daemons are completely tied together and will never be used apart from each other.
- In this situation, it makes sense to put them both on the same machine.
- E.g., a mail server and a DNS caching server
- E.g., a video streaming service: encoding server and streaming server
273.1.11 Centralization and Standards
- An element of building a service is centralizing the tools, applications, and services that your customers need.
- Centralization means that the tools, applications, and services are primarily managed by one central group of SAs on a single central set of servers.
- Support for these services is provided by a central helpdesk.
- Centralizing services and building them in standard ways makes them easier to support and lowers training costs.
- The service should be designed and documented in a consistent way, so that the SA answering the support call knows where to find everything and can respond more quickly.
28Centralization and Standards (cont.)
- Centralization does not preclude centralizing on regional or organizational boundaries, particularly if each region or organization has its own support staff.
- Some services, such as e-mail, authentication services, and networks, are part of the infrastructure and need to be centralized.
- For large sites, these services can be built with a central core that feeds information to and from distributed regional and organizational systems.
- Other services, such as file services and CPU farms, are more naturally centralized around departmental boundaries.
293.1.12 Performance
- From a customer's view, two things are important in any service: Does it work? Is it fast?
- When designing a service, you need to pay attention to its performance characteristics, even though there may be many other difficult technical challenges to overcome.
303.1.12 Performance (cont.)
- Performance expectations increase constantly as networks, graphics, and processors get faster.
- Performance that is acceptable now may not be acceptable six months or a year from now.
- To build a service that performs well, you need to understand how it works and perhaps look at ways of splitting it effectively across multiple machines.
- You also need to consider how to scale the performance of the system as usage and expectations rise above what the initial system can do.
313.1.12 Performance (cont.)
- When choosing the servers that run the service, consider how the service works.
- A lot of disk I/O?
- More disk reads than writes (or vice versa)?
- Keeping large tables of data in main memory?
- Lots of fast memory and larger memory caches
- A network-based service that sends large amounts of data to clients or between servers?
- Multiple dedicated servers with high-speed interfaces, clusters of servers, etc.
32Performance (cont.)
- Case study: bad capacity planning makes a bad first impression
- Performance at remote sites (i.e., over wide-area links)
- Web site (e.g., different content for modem, T1, high-speed links, etc.)
- Solution: proxy server (HTTP accelerator)
- Handset windows vs. computer windows
33Performance at remote sites
- Performance of the service for remote sites may also be an issue.
- In some cases, quality of service or intelligent queuing mechanisms can be sufficient to make performance acceptable.
- E.g., mail relays/forwarders, web proxies, etc.
- In others, you may need to look at ways of reducing the network traffic.
- E.g., different content on a web system for different link speeds: text-only versions for low-speed links (modem, T1) and graphical versions for high-speed links.
343.1.13 Monitoring (Ch.24)
- A service is not complete and cannot be called a service unless it is being monitored for availability, problems, and performance, and capacity planning mechanisms are in place.
- The helpdesk, or front-line support group, must be automatically alerted to problems with the service so that they can start fixing them before too many people are affected.
- Likewise, the SA group should monitor the service on an ongoing basis from a capacity planning standpoint.
- E.g., network bandwidth, server performance, transaction rates, license and physical device availability, etc.
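A capacity check of the kind listed above can be sketched as a comparison of measured usage against limits. The metric names and thresholds are hypothetical; a real monitoring system would collect the measurements and deliver the alerts.

```python
from typing import Dict, List

# Hypothetical capacity limits, expressed as fractions of what is available.
LIMITS: Dict[str, float] = {
    "network_bandwidth": 0.80,
    "cpu": 0.85,
    "licenses_in_use": 0.90,
}


def capacity_alerts(usage: Dict[str, float], limits: Dict[str, float]) -> List[str]:
    """Return the names of metrics whose usage meets or exceeds its limit."""
    return sorted(
        name for name, value in usage.items()
        if name in limits and value >= limits[name]
    )


# Hypothetical measurements for one sampling interval.
sample = {"network_bandwidth": 0.65, "cpu": 0.91, "licenses_in_use": 0.90}
print(capacity_alerts(sample, LIMITS))
```

Setting the limits below 100% gives the SA group time to add capacity before customers notice a problem.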
35Monitoring Example- Statistics for
mail.nctu.edu.tw
393.1.14 Service Rollout
- Make sure the customers' first impressions are positive.
- The rollout and the customers' first experiences with the service will color the way that they view the service in the future.
- One of the key pieces of making a good impression is having all of the documentation available, the helpdesk familiar with and trained on the new service, and all the support procedures in place.
- There is nothing worse than having a problem with a new application and finding out that no one seems to know anything about it when you look for help.
403.1.14 Service Rollout (cont.)
- The rollout also includes building and testing a mechanism to install new software and configuration settings that are needed on each desktop.
- One-some-many technique
- One → Some → Many
- Ideally, no new desktop software or configuration should be required for the service, because that is less disruptive for your customers and reduces maintenance, but installing new client software on the desktops is frequently necessary.
- E.g., enabling the IEEE 802.1X authentication scheme, web browser (IE vs. Firefox)
- New trend
- Example: SSL VPN vs. PPTP VPN
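The one-some-many technique can be sketched as a staged loop. This is an illustration under assumptions: the `upgrade` callback, the stage sizes, and the stop-on-failure policy are hypothetical choices, not a prescribed procedure.

```python
from typing import Callable, List, Sequence


def one_some_many(hosts: Sequence[str],
                  upgrade: Callable[[str], bool],
                  some: int = 5) -> List[str]:
    """Upgrade one host, then a small group, then all remaining hosts.

    Stops after the first stage in which any upgrade fails, and returns
    the hosts that were upgraded successfully.
    """
    stages = [hosts[:1], hosts[1:1 + some], hosts[1 + some:]]
    done: List[str] = []
    for stage in stages:
        results = [(host, upgrade(host)) for host in stage]
        done.extend(host for host, ok in results if ok)
        if not all(ok for _, ok in results):
            break  # do not inflict a failing upgrade on the next, larger stage
    return done
```

A caller would pass the real installation routine as `upgrade`; a failure in the first or second stage then affects only a handful of desktops.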
413.2 The Icing
- 3.2.1 Dedicated Machine
- 3.2.2 Full Redundancy
- E.g., name service, authentication services
- Primary vs. secondary (duplicate) set of servers
- Fail-over, backup
- Tightly coupled vs. loosely coupled servers
- Load sharing, performance increasing
423.2.1 Dedicated Machine
- Having dedicated machines for each service
- More reliable
- Debugging is easier when there are reliability problems
- Outages are more limited in scope
- Upgrades and capacity planning are much easier
43Dedicated Machine (cont.)
- Sites that grow from a small company to a larger one generally end up with one central administrative machine.
- Eventually, this machine will have to be split up and the services spread across many servers because of increased load.
- IP address dependencies are the most difficult to deal with when splitting services from one machine to many.
- Name service (e.g., DNS, NIS), security service (e.g., router or firewall rules), etc.
443.2.2 Full Redundancy
- Consider which services would benefit your customers most from being completely redundant, and start there.
- Name service and authentication services are typically the first services to have full redundancy.
- They are designed to have secondary servers
- They are so critical
- Other critical services, such as e-mail, printing, and networks, tend to be considered much later because they are more complicated or more expensive to make completely redundant.
45Full Redundancy
- Another benefit of full redundancy
- It makes upgrade procedures easier.
- A rolling upgrade can be performed.
- Case study: designing e-mail services for reliability
- Incoming mail path vs. outgoing mail path
- Mail relays vs. mail routing hosts
- Mail delivery hosts
- Firewall
46Appendix
- Background: Internet Applications
- Networking Troubleshooting Process
- Case Study
- E-mail system operations and design considerations
- Security events
47Background - Internet Applications
48Truth Depends on Interpretation (e.g., anti-spam or anti-virus mail filtering)
[Figure labels: Filtering with H1(msg); Filtering with H2(msg); Accept; Discard; Mail Spool]
- MTA: Mail Transfer Agent
- MUA: Mail User Agent
49E-mail
[Figure labels: Internet; Incoming SMTP Gateway Farm; Mail Filtering (BL/GL/WL, Auto-learn); Outgoing SMTP Gateway Farm]
50Incoming Flow of a Typical Mail System
[Figure components: Internet; MTA; anti-virus programs; LDA; mail spool; user mail storage; POP3/IMAP server; MUA (Netscape, MS-outlook, etc.)]
51Generic E-mail Transmission Path
[Figure labels: SMTP; Firewall, filtering (shown twice); numbered steps 1 to 6]
52A Hybrid Model for Anti-spam -- Generic Mail Filtering
[Figure: generic mail filtering in four numbered stages: (1) White List, (2) Black List, (3) Grey List, (4) Automatic SPAM Learning. Edge labels: Pass, Reject, Fail, Fail temporarily, Update. Other labels: Client, Mail Spool.]
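One possible reading of the four-stage hybrid filter is sketched below. The order of the checks, the greylisting key, and the way the spam check updates the black list are assumptions made for illustration; the slide's figure does not survive in enough detail to confirm them.

```python
from typing import Callable, Set, Tuple


def filter_message(sender: str,
                   recipient: str,
                   message: str,
                   white_list: Set[str],
                   black_list: Set[str],
                   grey_seen: Set[Tuple[str, str]],
                   looks_like_spam: Callable[[str], bool]) -> str:
    """Return 'pass', 'reject', or 'tempfail' for one delivery attempt."""
    if sender in white_list:        # (1) white list: accept at once
        return "pass"
    if sender in black_list:        # (2) black list: reject at once
        return "reject"
    key = (sender, recipient)
    if key not in grey_seen:        # (3) grey list: first attempt fails temporarily
        grey_seen.add(key)
        return "tempfail"
    if looks_like_spam(message):    # (4) content check; assumed to update the black list
        black_list.add(sender)
        return "reject"
    return "pass"
```

Greylisting relies on legitimate mail servers retrying after a temporary failure, so a sender only reaches the content check on a later attempt.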
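53Sample anti-spam statistics for mail.TN.edu.tw (http://ms2.tn.edu.tw/report/day/)
[Figure: breakdown of all messages by filter (Greylist, ClamAV, SpamAssassin) and outcome (Rejected, Virus, Blocked with SpamLevel > 16, Passed with SpamLevel 6-15, Passed). Numeric labels in the figure: 25, 27, 5, 3, 17, 2, 73.]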
54Networking Troubleshooting Process
[Figure: a client and two sites, a and b. Each site has SMTP filtering (SMTP_a, SMTP_b), router/switch filtering (Router_a, Router_b), and DNS filtering (DNS_a, DNS_b).]
55Port-scanning summary on DNS servers of neighbor
sites
56DNS server: sample scenario
[Slide text lost to character-encoding damage. Surviving fragments: year 2000; DNS servers; a security hole on server-A; abuse and postmaster mail; root mail; e-mail; router; DNS.]
57Multiple outgoing paths and distributed DNS
[Figure labels: Layer-1; Layer-2; ISP-1; ISP-2; Internet; .com; .arpa; Others; SMTP; www, proxy]
58Traffic Amplifying Attacks via DNS Zone Transfer
[Figure labels: A = attacker; Q = zone transfer query, sent as Q(1) ... Q(n); D1, D2, ..., Dn, where n is some large number; R(1), R(2), ..., R(n); V = attacked site (victim)]
59Common Terms
- Reliability (from Wikipedia)
- In general, reliability (systemic definition) is the ability of a person or a system to perform and maintain its functions in routine circumstances, as well as in hostile or unexpected circumstances.
- The IEEE defines it as "... the ability of a system or component to perform its required functions under stated conditions for a specified period of time."
60Common Terms
- In telecommunications and reliability theory, the term availability has the following meanings:
- 1. Simply put, availability is the proportion of time a system is in a functioning condition.
- Note 1: The conditions determining operability and committability must be specified.
- Note 2: Expressed mathematically, availability is 1 minus the unavailability.
61Common Terms
- In telecommunications and reliability theory, the term availability has the following meanings:
- 2. The ratio of (a) the total time a functional unit is capable of being used during a given interval to (b) the length of the interval.
- Note 1: An example of availability is 100/168 if the unit is capable of being used for 100 hours in a week.
- Note 2: Typical availability objectives are specified either in decimal fractions, such as 0.9998, or sometimes in a logarithmic unit called nines, which corresponds roughly to a number of nines following the decimal point, such as "five nines" for 0.99999 reliability.
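The two notes above can be checked with a short calculation. The function names are illustrative, and the "nines" figure is computed here as -log10(1 - availability), which is one common way to express the logarithmic unit.

```python
import math


def availability(usable_hours: float, interval_hours: float) -> float:
    """Ratio of the time a unit is usable to the length of the interval."""
    return usable_hours / interval_hours


def nines(a: float) -> float:
    """Availability expressed in nines, computed as -log10(1 - a)."""
    return -math.log10(1.0 - a)


def downtime_hours_per_year(a: float) -> float:
    """Expected unavailable hours in a 365-day year at availability a."""
    return (1.0 - a) * 365 * 24


print(availability(100, 168))   # the 100-hours-in-a-week example
print(nines(0.99999))           # approximately five nines
print(downtime_hours_per_year(0.9998))
```

Converting an availability target into hours of downtime per year often makes the objective easier to discuss with customers.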
62Definition of availability
- Barlow and Proschan [1975] define availability of a repairable system as "the probability that the system is operating at a specified time t."
- Representation
- The simplest representation of availability is the ratio of the expected value of the uptime of a system to the sum of the expected values of its up and down time, or

$$A = \frac{E[\text{uptime}]}{E[\text{uptime}] + E[\text{downtime}]}$$