Title: Web Server Technologies
Web Server Technologies Part I: HTTP & Getting Started
Joe Lima, Director of Product Development, Port80 Software, Inc. (jlima_at_port80software.com)
Tutorial Content
Introduction to HTTP
- TCP/IP and application layer protocols
- URLs, resources and MIME Types
- HTTP request/response cycle and proxies
Setup and deployment
- Planning Web server site deployments
- Site structure and basic server configuration
- Managing users and hosts
Preliminaries - Recommended Texts
- Administrating Web Servers, Security & Maintenance, Larson and Stephens, Prentice Hall
- HTTP: The Definitive Guide, Gourley and Totty, et al., O'Reilly
- Online resources are plentiful and will be cited along the way.
The Role of a Web Server
- Web servers serve various resources
  - As file (document) servers
  - As application front ends
- Other servers also provide services on the Internet, each speaking its own protocol
  - SMTP, POP, IMAP, NNTP, FTP, etc.
- Web server = HTTP server
  - HTTP servers serve HTTP clients (browsers and other user agents) with the help of HTTP intermediaries (proxies)
- A box or a service?
An Introduction to HTTP
- HyperText Transfer Protocol
- One of the application layer protocols that make up the Internet
  - HTTP over TCP/IP
  - Like SMTP, POP, IMAP, NNTP, FTP, etc.
- The underlying language of the Web
- Three versions have been used; two are in common use and have been specified:
  - RFC 1945: HTTP 1.0 (1996)
  - RFC 2616: HTTP 1.1 (1999)
A Brief Digression on TCP/IP
HTTP sits atop the TCP/IP Protocol Stack:
- HTTP: Application Layer
- TCP: Transport Layer
- IP: Network Layer
- Network Interfaces: Data Link Layer
A Brief Digression on TCP/IP, cont.
- IP provides packets that are routed based on source and destination IP addresses
- TCP provides segments that ride inside the IP packets and add connection information based on source and destination ports
- The ports let TCP carry multiple protocols that connect services running on default ports:
  - HTTP on port 80
- HTTP with SSL (HTTPS) on port 443
- FTP on port 21
- SMTP on port 25
- POP on port 110
- SSH on port 22
A Brief Digression on TCP/IP, cont.
- TCP also provides mechanisms to make the connection a reliable bit pipe
  - 3-way handshake, sequence numbers, checksums, control flags
- A data stream is chopped up into chunks that are reassembled, complete and in correct order, on the other endpoint of the connection
- TCP segments, riding inside IP packets, carry the chunks of data
- When HTTP is the Application Layer protocol on top of the stack, these chunks of data are the contents of the HTTP Message
A Brief Digression on TCP/IP, cont.
How an HTTP Message is delivered over a TCP/IP connection:
- The HTTP Message's data stream is chopped up into chunks small enough to fit in a TCP segment
- The chunks ride inside TCP segments, which are used to reassemble them correctly on the other end of the connection
- The segments are shipped to the right destination inside IP datagrams
[Diagram: the request text "GET /index.html HTTP/1.1 Host: www.hostname.com Con..." split into chunks]
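The chopping and reassembly described above can be sketched as a toy in Python. This is not real TCP: there are no ports, checksums, acknowledgements or retransmissions, and the 16-byte chunk size is an arbitrary assumption (real segment sizes are negotiated per connection). The sketch only shows why sequence numbers, which here are byte offsets as in TCP, let the receiver rebuild the stream even when segments arrive out of order.

```python
import random

def segment(stream: bytes, chunk_size: int) -> list[tuple[int, bytes]]:
    """Chop a byte stream into (sequence_number, chunk) pairs.
    As in TCP, the sequence number is the byte offset of the chunk."""
    return [(offset, stream[offset:offset + chunk_size])
            for offset in range(0, len(stream), chunk_size)]

def reassemble(segments: list[tuple[int, bytes]]) -> bytes:
    """Put the chunks back in order by sequence number and join them."""
    return b"".join(chunk for _, chunk in sorted(segments))

request = b"GET /index.html HTTP/1.1\r\nHost: www.hostname.com\r\n\r\n"
segments = segment(request, chunk_size=16)
random.shuffle(segments)  # simulate segments arriving out of order
print(reassemble(segments) == request)
```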
A Brief Digression on TCP/IP, cont.
Other application layer protocols use TCP/IP to provide Internet services often found in company with HTTP:
- HTTPS (HTTP over SSL/TLS): although a different protocol, service and port, HTTPS is usually integrated with the Web server
- FTP: often run on the same box as the HTTP server to provide file transfer capabilities
- SMTP: sometimes run with the Web server (email gateways)
- SSH: widely used instead of telnet for remote administration
Introduction to HTTP - continued
- HTTP and URLs
  - URLs were used early on by all Internet protocols, including various document retrieval protocols
  - More specifications (both from 1994):
    - Uniform Resource Locators: RFC 1738
    - Universal Resource Identifiers: RFC 1630
  - Hypertext came to predominate as the most efficient way of providing access to resources
    - Fast, flexible, generic, extensible
    - Facilitated searching, collaboration, annotation
  - HTTP is now the central mechanism for requesting and serving URL-based resources
Introduction to HTTP - continued
- A Digression on MIME Types
  - URLs point to resources (content)
  - Resources are represented using different Media Types (MIME Types)
    - Multipurpose Internet Mail Extensions: RFC 2045 and RFC 2046
    - Should be registered with IANA (www.iana.org)
  - The MIME Type tells how content should be handled
  - File extensions are mapped to certain MIME Types
    - .html usually means a MIME Type of text/html
    - .jpg usually means a MIME Type of image/jpeg
    - But mapping by file extension depends on local software's conventions and might not be shared across applications or machines
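To see how extension-to-type mapping depends on local software, you can ask Python's standard `mimetypes` module for its guesses. The module combines a built-in table with system files (such as /etc/mime.types on many Unix hosts), so results for less common extensions can differ from one machine to another. The file names below are made up.

```python
import mimetypes

# Each guess comes from Python's built-in table plus any system mime.types
# files it finds, which is exactly the kind of local convention described above.
for name in ["index.html", "photo.jpg", "style.css", "report.pdf", "archive.unknownext"]:
    mime_type, encoding = mimetypes.guess_type(name)
    print(f"{name:20} -> {mime_type}")  # None means no mapping is known locally
```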
Introduction to HTTP - continued
- HTTP allows MIME Type info to be passed between client and server so both agree about the media type of the resource
  - Format: primary-type/sub-type
- The most common MIME Types used on the Web come from the text, image and application top-level groups:
  - text/html, text/css
  - image/gif, image/jpeg, image/png
  - application/pdf, application/octet-stream
  - application/x-javascript, application/x-shockwave-flash
Introduction to HTTP - continued
- HTTP servers turn URLs into resources through a request-response cycle
  - The user agent (client) issues an HTTP request to a host (server) for a given resource using its URL
  - The server resolves the URL and acts on the resource
    - Usually it retrieves the resource, but it may also launch it, modify it, etc.
  - The server sends an HTTP response back to the client
    - Usually (not always) a representation of the requested resource
    - Can also be info about the resource, its state, etc.
- Each request is independent of all previous requests: HTTP is stateless
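The following is a minimal sketch of the cycle in which Python's standard `http.server` and `http.client` modules stand in for a real server and user agent. The handler class `BarHandler` and the resource `/bar.html` are hypothetical, borrowed from the example URL http://www.foo.com/bar.html used on these slides. Each `fetch` call opens its own connection and the server keeps nothing between requests, which is the statelessness described above.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class BarHandler(BaseHTTPRequestHandler):
    """Serves one hypothetical resource, /bar.html; everything else is a 404."""

    def do_GET(self):
        if self.path == "/bar.html":
            body = b"<html><body>bar</body></html>"
            self.send_response(200)                      # status line
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()                           # blank line ends the headers
            self.wfile.write(body)                       # the entity body
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo quiet

def fetch(port: int, path: str) -> tuple:
    """One complete request/response exchange on a fresh connection."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("GET", path)
    resp = conn.getresponse()
    body = resp.read()
    conn.close()
    return resp.status, resp.getheader("Content-Type"), body

def demo():
    server = HTTPServer(("127.0.0.1", 0), BarHandler)  # port 0: the OS picks a free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch(port, "/bar.html"), fetch(port, "/missing")
    finally:
        server.shutdown()
        server.server_close()

if __name__ == "__main__":
    found, missing = demo()
    print(found)
    print(missing[0])
```

Run as a script, it prints the status, content type and body for /bar.html, followed by the 404 status for the missing resource.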
Basic HTTP Request/Response Cycle
[Diagram: an HTTP client asks for a resource by its URL, http://www.foo.com/bar.html, sending an HTTP request to the HTTP server www.foo.com, which answers with an HTTP response carrying the resource (labeled /bar).]
An HTTP Request/Response Chain
[Diagram elements: HTTP client, LAN, egress proxy, Internet, transparent proxies, DMZ, reverse proxy, HTTP server, network at hosting provider, local DNS, external DNS servers, root DNS servers.]
Types and Uses of Proxy Servers
- Proxies are HTTP Intermediaries
  - All act as both clients and servers
- Major types of proxies can be distinguished by where they live and how they get traffic:
  - Explicit (e.g., Egress)
  - Transparent/Intercepting
  - Reverse/Surrogate
- Three primary uses for proxies:
  - Security
  - Performance
  - Content Filtering
Looking into HTTP
To really understand Web servers (and clients), study the grammar, syntax and semantics of HTTP requests and responses:
- Look at the parts of the transaction you don't normally see in a browser
- Issue requests manually to understand how a user agent gets resources from a server
- Use protocol analyzers to spy on the HTTP conversation
- Learn to troubleshoot problems by reading and writing HTTP
Looking into HTTP - continued
- HTTP requests and responses are both types of Internet Messages (RFC 822) and share a general format:
  - A Start Line, followed by a CRLF
    - Request Line for requests
    - Status Line for responses
  - Zero or more Message Headers
    - field-name: field-value CRLF
  - An empty line
    - Two CRLFs mark the end of the Headers
  - An optional Message Body if there is a payload
    - All or part of the Entity Body or Entity
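You can sketch the general message format as a pair of Python functions that assemble a message and split it apart again. This is a simplification for illustration: it treats the message as text rather than bytes, ignores repeated and folded header lines, and does not compare header names case-insensitively, as real HTTP parsers must. The function names are hypothetical.

```python
CRLF = "\r\n"

def build_message(start_line: str, headers: dict[str, str], body: str = "") -> str:
    """Start Line + CRLF, one 'name: value' CRLF per header, an empty line, then the body."""
    header_lines = "".join(f"{name}: {value}{CRLF}" for name, value in headers.items())
    return start_line + CRLF + header_lines + CRLF + body

def parse_message(raw: str) -> tuple[str, dict[str, str], str]:
    """Split a message back into its Start Line, headers and body."""
    head, _, body = raw.partition(CRLF + CRLF)  # two CRLFs mark the end of the headers
    start_line, *header_lines = head.split(CRLF)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")    # split on the first colon only
        headers[name.strip()] = value.strip()
    return start_line, headers, body

request = build_message("GET / HTTP/1.1", {"Host": "www.google.com"})
response = build_message("HTTP/1.1 200 OK",
                         {"Content-Type": "text/plain", "Content-Length": "5"}, "hello")
print(parse_message(response))
```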
Making a simple HTTP request
- Open a TCP connection to a host
  - You can borrow the telnet client to do this by pointing it at the default HTTP port (80):
    - C:\>telnet www.google.com 80
- Ask for a resource using a minimal request syntax, ending with an empty line (press Enter twice):
  - GET / HTTP/1.1
  - Host: www.google.com
- A Host header is required for HTTP 1.1 requests, though not for HTTP 1.0
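The same telnet session can be reproduced with a plain TCP socket in Python. The function names are hypothetical. The extra `Connection: close` header, which is not part of the minimal request above, asks an HTTP 1.1 server to close the connection after its response, so the read loop knows when to stop instead of waiting on a persistent connection.

```python
import socket

def build_request(host: str, path: str = "/") -> bytes:
    """Request Line, Host header, Connection: close, then the empty line that ends the headers."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")

def fetch_raw(host: str, port: int = 80, path: str = "/") -> bytes:
    """Open a TCP connection (like 'telnet host 80'), send the request, read until the server closes."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:          # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

With network access, `fetch_raw("www.google.com").split(b"\r\n", 1)[0]` returns the Status Line. The exact status and headers depend on the server.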
A Closer Look at the Request Line
- Consists of three major parts:
  - The Request Method, followed by a SP
    - GET, POST, HEAD, TRACE, OPTIONS, PUT, DELETE and CONNECT
    - Extension methods such as those specified by WebDAV (RFC 2518)
  - The Request URI, followed by a SP
    - The URL associated with the resource
    - By far the most complex part of any Start Line
    - Defined by intension (rules) rather than extension (an enumerated list)
  - The HTTP Version, followed by the CRLF
    - 0.9, 1.0, 1.1
A Closer Look at the Request Methods
- GET
  - By far the most common method
  - Retrieves a resource from the server
  - Supports passing of query string arguments
- HEAD
  - Retrieves only the Headers associated with a resource, but not the entity itself
  - Highly useful for protocol analysis and diagnostics
- POST
  - Allows passing of data in the entity rather than the URL
  - Can transmit far larger arguments than GET
  - Arguments are not displayed in the URL
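Python's `urllib` can show where the arguments travel for each method without sending anything. The URL and form fields are hypothetical. `Request` chooses POST automatically when a body (`data`) is supplied.

```python
from urllib.parse import urlencode
from urllib.request import Request

args = {"q": "web servers", "page": "2"}  # hypothetical form arguments
encoded = urlencode(args)                  # q=web+servers&page=2

# GET: the arguments ride in the URL's query string, visible in the address bar
get_req = Request("http://www.foo.com/search?" + encoded)

# POST: the same arguments ride in the entity body instead of the URL
post_req = Request("http://www.foo.com/search", data=encoded.encode("ascii"))

for req in (get_req, post_req):
    print(req.get_method(), req.full_url, req.data)
```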
More Request Methods
- OPTIONS
  - Shows methods available for use on the resource (if given a path) or the host (if given a *)
- TRACE
  - Diagnostic method for assessing the impact of proxies along the request-response chain
- PUT, DELETE
  - Used in HTTP publishing (e.g., WebDAV)
- CONNECT
  - A common extension method for tunneling other protocols through HTTP
A Closer Look at the Request URI
- Absolute URI vs. Absolute Path
  - Explicit proxies require Absolute URIs
    - The client is connected directly to the proxy
    - The protocol and host name are needed to resolve the request
- Grammar of the Absolute Path
  - Like the Absolute URI minus the http://hostname
  - The initial / is the equivalent of the host's document root
    - In HTTP 1.1 with name-based virtual hosting, the Host header directs the request to the appropriate document root
  - Subsequent slashes, left to right, imply less significant distinctions
  - The * form is used to query the entire host
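Here is a sketch of the difference between the two forms, built around a hypothetical helper `to_origin_request`. Given the Absolute URI that a client sends to an explicit proxy, the helper recovers the Absolute Path and the Host header value that would go to the origin server. It uses `urllib.parse.urlsplit` and ignores details such as user info in the authority.

```python
from urllib.parse import urlsplit

def to_origin_request(absolute_uri: str) -> tuple[str, str]:
    """Return (absolute path, Host header value) for an absolute URI."""
    parts = urlsplit(absolute_uri)
    path = parts.path or "/"        # an empty path means the document root
    if parts.query:
        path += "?" + parts.query   # the query string stays with the path
    return path, parts.netloc       # netloc keeps any :port for the Host header

# Request Line as sent to an explicit proxy
proxy_line = "GET http://www.foo.com/virtual/index2.html HTTP/1.1"
method, uri, version = proxy_line.split(" ")
path, host = to_origin_request(uri)
print(f"{method} {path} {version}\r\nHost: {host}")
```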
A Closer Look at the Status Line
- Consists of three major parts:
  - The HTTP Version, followed by a SP
    - Just like the third part of the Request Line
  - The Status Code, followed by a SP
    - 3-digit integers, in 5 groups, indicating the result of the attempt to satisfy the request
      - 1xx are informational
      - 2xx are success codes
      - 3xx are for alternate resource locations (redirects)
      - 4xx indicate client-side errors
      - 5xx indicate server-side errors
  - The Reason Phrase, followed by the CRLF
    - Short textual description of the status code
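The three parts of the Status Line and the five code groups map onto a small parser. The function name and group labels are illustrative. Splitting at most twice keeps multi-word Reason Phrases such as "Not Found" intact.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def parse_status_line(line: str) -> tuple[str, int, str, str]:
    """Split 'HTTP-Version SP Status-Code SP Reason-Phrase CRLF' and classify the code."""
    version, code_text, reason = line.rstrip("\r\n").split(" ", 2)
    code = int(code_text)
    if not 100 <= code <= 599:
        raise ValueError(f"status code out of range: {code}")
    return version, code, reason, STATUS_CLASSES[code // 100]  # first digit picks the group

print(parse_status_line("HTTP/1.1 404 Not Found\r\n"))
```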
A Closer Look at HTTP Headers
- Headers come in four major types, some for requests, some for responses, some for both:
  - General Headers
    - Provide info about messages of both kinds
  - Request Headers
    - Provide request-specific info
  - Response Headers
    - Provide response-specific info
  - Entity Headers
    - Provide info about request and response entities
- Extension headers are also possible
A Closer Look at General Headers
- Connection: lets clients and servers manage connection state
  - Connection: Keep-Alive (HTTP 1.0)
  - Connection: close (HTTP 1.1)
- Date: when the message was created
  - Date: Sat, 31 May 2003 15:00:00 GMT
- Via: shows proxies that handled the message
  - Via: 1.1 www.myproxy.com (Squid/1.4)
- Cache-Control: among the most complex of headers, enables caching directives
  - Cache-Control: no-cache
A Closer Look at Request Headers
- Host: the hostname (and optionally port) of the server to which the request is being sent
  - Required for name-based virtual hosting
  - Host: www.port80software.com
- Referer: the URL of the resource from which the current request URI came
  - Misspelled in the specification, hence [sic]
  - Referer: http://www.host.com/login.asp
- User-Agent: name of the requesting application, used in browser sensing
  - User-Agent: Mozilla/4.0 (compatible; MSIE 6.0)
Some More Request Headers
- Accept and its variants: inform servers of the client's capabilities and preferences
  - Enables content negotiation
  - Accept: image/gif, image/jpeg;q=0.5
  - Accept- variants for Language, Encoding, Charset (Accept-Language, Accept-Encoding, Accept-Charset)
- If-Modified-Since and other conditionals
  - Frequently used by browsers to manage caches
  - If-Modified-Since: Sat, 31 May 2003 15:00:00 GMT
- Cookie: how clients pass cookies back to the servers that set them
  - Cookie: id=23432; level=3
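Content negotiation with Accept can be sketched as parsing the header into media ranges with q-values and then picking the best available type. This is a simplified illustration: the function names are hypothetical, ranges with equal q keep their header order, and the specification's precedence rules (where an exact type outranks type/* and */*) are not implemented.

```python
from typing import Optional

def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media-range, q) pairs, highest q first."""
    ranges = []
    for item in header.split(","):
        media_range, *params = [part.strip() for part in item.split(";")]
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        ranges.append((media_range, q))
    return sorted(ranges, key=lambda pair: pair[1], reverse=True)  # stable sort

def negotiate(accept: str, available: list[str]) -> Optional[str]:
    """Return the available type the client prefers most, or None (a server might send 406)."""
    for media_range, q in parse_accept(accept):
        if q == 0:
            continue  # q=0 means "not acceptable"
        for candidate in available:
            major = candidate.split("/")[0]
            if media_range in (candidate, f"{major}/*", "*/*"):
                return candidate
    return None

print(negotiate("image/gif, image/jpeg;q=0.5", ["image/jpeg", "image/png"]))
```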
A Closer Look at Response Headers
- Server: the server's name and version
  - Server: Microsoft-IIS/5.0
  - Can be problematic for security reasons
- Vary: tells client and proxy caches which headers were used for content negotiation
  - Vary: User-Agent, Accept
- Set-Cookie: this is how a server sets a cookie on a client
  - Set-Cookie: id=234; path=/shop; expires=Sat, 31-May-03 15:00:00 GMT; secure
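Python's standard `http.cookies.SimpleCookie` can produce the Set-Cookie header from the example above and can also parse a Cookie header that a client sends back. The attribute order and capitalization in the output are the module's own choices, so they may not match the slide's example character for character.

```python
from http.cookies import SimpleCookie

# Server side: build a Set-Cookie header like the example above
outgoing = SimpleCookie()
outgoing["id"] = "234"
outgoing["id"]["path"] = "/shop"
outgoing["id"]["expires"] = "Sat, 31-May-03 15:00:00 GMT"
outgoing["id"]["secure"] = True
print(outgoing.output())  # one "Set-Cookie: ..." line per cookie

# Server side, on a later request: parse the Cookie header the client sends back
incoming = SimpleCookie()
incoming.load("id=23432; level=3")
print({name: morsel.value for name, morsel in incoming.items()})
```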
A Closer Look at Entity Headers
- Allow: lists the request methods that can be used on the entity
  - Allow: GET, HEAD, POST
- Location: gives the alternate or new location of the entity (strictly, RFC 2616 classifies it as a response header)
  - Used with 3xx response codes (redirects)
  - Location: http://www.ibm.com/us/
- Content-Encoding: specifies encoding performed on the body of the response
  - Used with HTTP compression
  - Corresponds to the Accept-Encoding request header
  - Content-Encoding: gzip
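Here is a sketch of how a server might apply HTTP compression. It assumes gzip is the only coding the server offers and ignores q-values in Accept-Encoding (a fuller version would honor gzip;q=0). The page body and the `compress_response` helper are hypothetical. Content-Length is computed after compression, because it describes the entity body actually sent.

```python
import gzip

def compress_response(body: bytes, accept_encoding: str) -> tuple[bytes, dict[str, str]]:
    """Gzip the body only if the client listed gzip in Accept-Encoding."""
    headers = {"Content-Type": "text/html"}
    codings = [c.split(";")[0].strip().lower() for c in accept_encoding.split(",")]
    if "gzip" in codings:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))  # size of the body as transmitted
    return body, headers

page = b"<html><body>" + b"<p>Hello, HTTP compression!</p>" * 500 + b"</body></html>"
plain, plain_headers = compress_response(page, "identity")
packed, packed_headers = compress_response(page, "gzip, deflate")
print(plain_headers["Content-Length"], packed_headers["Content-Length"])
```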
More Entity Headers
- Content-Length: the size of the entity body in bytes
  - Value shrinks when compression is applied
  - Content-Length: 24000
- Content-Location: the actual URL of the resource, if different from its request URL
  - Often used to show the index or default page
  - Content-Location: http://www.foo.com/home.html
- Content-Type: specifies the Media (MIME) type of the entity body
  - Corresponds to the Accept header
  - Content-Type: image/png
More Entity Headers
- ETag: uniquely identifies a particular instance of a given resource (strictly, RFC 2616 classifies it as a response header)
  - Used with conditional request headers to validate cached instances of the resource
    - If-Match, If-None-Match
  - ETag: "adkskdashjgk07563AF"
- Expires: gives the expiration for the instance of the resource, for use in caching
  - Expires: Sat, 31 May 2003 19:00:00 GMT
- Last-Modified: date/time the entity was last changed (or created)
  - Last-Modified: Fri, 30 May 2003 09:00:00 GMT
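Validation of a cached copy can be sketched as follows. The function, the ETag and the date are hypothetical. The dates use the RFC 1123 form that HTTP 1.1 requires servers to generate. When both validators are present, this sketch lets If-None-Match win, as later revisions of the HTTP specification do. Weak ETags and malformed dates are not handled.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional

LAST_MODIFIED = datetime(2003, 5, 30, 9, 0, 0, tzinfo=timezone.utc)  # hypothetical
ETAG = '"adkskdashjgk07563AF"'                                       # hypothetical

def conditional_status(last_modified: datetime, etag: str,
                       if_modified_since: Optional[str] = None,
                       if_none_match: Optional[str] = None) -> int:
    """Return 304 (Not Modified) if the client's cached copy is still valid, else 200."""
    if if_none_match is not None:
        tags = [t.strip() for t in if_none_match.split(",")]
        return 304 if etag in tags or "*" in tags else 200
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)  # timezone-aware for "GMT"
        return 304 if last_modified <= since else 200
    return 200

print(format_datetime(LAST_MODIFIED, usegmt=True))  # value for the Last-Modified header
print(conditional_status(LAST_MODIFIED, ETAG,
                         if_modified_since="Sat, 31 May 2003 15:00:00 GMT"))
```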
Planning Web Server Deployments
- Major issues to consider when planning a Web server or Web site deployment:
  - What is the appropriate form of Web hosting?
  - What type of server software will be used?
  - What are the sizing requirements?
  - How will DNS be handled?
- There are no fixed answers to any of these questions
  - Planning should be guided by the goals of the deployment and should harmonize with the related business processes
Choosing Among the Hosting Options
- Host your own
  - Pro: complete control over the physical box
  - Con: expensive and difficult to maintain well
- Hosting provider schemes
  - Dedicated Server
    - Pro: control without the hardware purchase
    - Con: must manage the box remotely
  - Co-located Server
    - Pro: admin control of the entire box
    - Con: must purchase the box and manage it remotely
  - Virtual Hosting
    - Pro: cheapest and easiest solution to maintain
    - Con: server is shared; admin access is limited
Choosing Server Software
- Beware of sectarian quarrels, especially over performance and security
- Apache has the best reputation historically
  - OS started out more stable, secure and scalable
  - Features rapidly extended and refined via a modular and open development model
  - Strong administrator ethos: well-managed boxes
- IIS was formerly favored mainly for ease of use in less demanding environments, but 5.0 on Win2K closed most of the remaining quality gap
- Any modern HTTP server is very solid software that is terribly vulnerable when deployed and used naively
Choosing Server Software, cont.
- In the real world, usually a conditioned choice, if not a foregone conclusion
  - The biggest single factors are the type of deployment and prior commitment to an underlying OS
- Apache on UNIX and Linux predominates in universities, research institutes and virtual hosting setups, and has the majority of hosted domains
- Netscape/iPlanet used to have the large enterprise market almost to itself
- IIS started with smaller companies, often as part of a LAN server, but has now taken over Netscape's leading role in the enterprise
Sizing a Web Server
- Sizing is the process of determining the physical resources required to meet anticipated demand
- Processing power and memory are not typically a problem for the Web server
  - The basic HTTP server job of fetching files is not processor intensive
  - Resource constraints on the box are probably an effect of other server-side mechanisms:
    - Automated session management by app servers
    - Manipulation of large database queries
    - Lots of non-optimized code in Web applications
Sizing a Web Server, cont
- Network bottlenecks
  - Available bandwidth should accommodate the maximum number of HTTP operations (hits) under peak load
  - Assuming an average file size of 14,000 bytes:
    - A 56K modem could handle about 0.5 hits/sec
    - A T1 line (1.5 Mbps) could handle about 13 hits/sec
    - A T3 (45 Mbps) could handle about 400 hits/sec
    - An OC3 (155 Mbps) could handle about 1380 hits/sec
  - Bandwidth sizing should be adjusted based on your actual request frequency and size
    - Assume peaks at triple the average load
  - Also watch out for collisions and overloading of routers, switches, hubs and NICs on the network
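The hits-per-second figures come from dividing the link rate by the number of bits in an average response (14,000 bytes × 8 = 112,000 bits). The sketch below assumes the nominal line rates of 1.544 Mbps for T1, 44.736 Mbps for T3 and 155.52 Mbps for OC3. It ignores TCP/IP and HTTP header overhead, so real throughput is lower. The slide's figures are rounded versions of these results. The hypothetical `required_bps` helper applies the triple-the-average peak rule.

```python
def hits_per_second(link_bps: float, avg_file_bytes: int = 14_000) -> float:
    """Upper bound on hits/sec: link bits per second / bits per average response.
    Protocol overhead is ignored, so real numbers are lower."""
    return link_bps / (avg_file_bytes * 8)

def required_bps(avg_hits_per_sec: float, avg_file_bytes: int = 14_000,
                 peak_factor: float = 3.0) -> float:
    """Bandwidth needed at peak, assuming peaks at peak_factor times the average load."""
    return avg_hits_per_sec * peak_factor * avg_file_bytes * 8

LINKS = {  # nominal line rates in bits per second
    "56K modem": 56_000,
    "T1": 1_544_000,
    "T3": 44_736_000,
    "OC3": 155_520_000,
}

for name, bps in LINKS.items():
    print(f"{name:10} ~{hits_per_second(bps):8.1f} hits/sec")
```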
Dealing with DNS
- Making a site available by domain name requires its registration and the use of DNS
  - A domain name can be registered with many different registrars
  - During registration, a DNS server is designated to maintain the domain's DNS records
  - These records propagate to other DNS servers
  - DNS servers use them to resolve a domain such as www.port80software.com to a four-octet IP address such as 66.45.42.237
- ISPs offer DNS services; you can also maintain your own or use a 3rd-party service that lets you manage the records without running a DNS box
A Simplistic Model of the DNS System
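- The client asks its ISP's DNS server to resolve foo.com
- That DNS server asks a root DNS server whom to ask about foo.com
- The root DNS server points to the 2nd ISP's DNS server
- The 1st ISP's DNS server asks the 2nd ISP's DNS server
- The 2nd ISP's DNS server responds with the IP address
- The 1st ISP's DNS server replies to the client and caches the answer
[Diagram: the client, two ISP DNS servers and a root DNS server, with the exchanges numbered 1 to 6.]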
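You can simulate the six steps with a toy resolver. Everything here is hypothetical: the server name isp2-dns and the address 192.0.2.10 (from a range reserved for documentation) are placeholders. Real DNS also involves top-level-domain servers, record types and cache expiry (TTLs), all of which this simplified model omits.

```python
# Hypothetical data: names and the 192.0.2.10 address are placeholders.
ROOT_SERVER = {"foo.com": "isp2-dns"}                   # root knows whom to ask about foo.com
NAME_SERVERS = {"isp2-dns": {"foo.com": "192.0.2.10"}}  # the 2nd ISP's DNS holds the record

class CachingResolver:
    """Plays the role of the client's (1st) ISP DNS server in the simplified model."""

    def __init__(self) -> None:
        self.cache: dict[str, str] = {}
        self.upstream_queries: list[tuple[str, str]] = []  # log of (server asked, name)

    def resolve(self, name: str) -> str:
        if name in self.cache:                  # answered from cache, no upstream traffic
            return self.cache[name]
        self.upstream_queries.append(("root", name))
        server = ROOT_SERVER[name]              # root points to the 2nd ISP's DNS server
        self.upstream_queries.append((server, name))
        address = NAME_SERVERS[server][name]    # 2nd ISP's DNS responds with the IP
        self.cache[name] = address              # reply to the client and cache the answer
        return address

resolver = CachingResolver()
print(resolver.resolve("foo.com"), resolver.resolve("foo.com"), resolver.upstream_queries)
```

The second lookup is served from the cache, so only the first one generates upstream queries.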
Dealing with DNS, cont.
- You should learn to use nslookup to verify that your DNS lookups are working and to troubleshoot DNS problems
  - A command-line utility, also built into network analyzers like the free ieHTTPHeaders
  - C:\>nslookup google.com
- You can also point nslookup at specific DNS servers to test their ability to resolve:
  - C:\>nslookup
  - > server 206.13.30.12
  - > google.com
Virtual and Physical Site Structure
- Think of a site as having not one structure but two: virtual and physical
  - The virtual structure is described by the URLs used to request resources from the site
    - This is the public view of the site: the site as visitors will see it when they browse to it
  - The physical structure is the organization of the files and directories in the file system on the host machine's hard disk
    - This is the private view of the site, seen only by you and those users you choose to give access
- It will become obvious why this distinction is necessary to keep things straight
Configuring Virtual-Physical Mappings
- The Document Root
  - A directory in the file system of the host machine where the Web server looks for the files that constitute the Web site
    - Also called the root directory
    - Often given an index or default document that serves as the homepage of the site
  - Corresponds to the / at the end of the hostname portion of the URL
    - http://www.foo.com/index.html (virtual)
    - /var/www/index.html (physical)
    - C:\inetpub\wwwroot\index.html (physical)
Configuring Virtual-Physical Mappings
- Notice how the hostname portion of the URL maps to the same place pointed to by the physical path that lies to the left of the / representing the document root
  - The URL is virtual to the left of the document root, but it seems to be physical to the right of the document root
- In fact, a URL is purely virtual: there is no guarantee that the path to the right of the document root looks this way on disk
  - In this simple case, virtual and physical paths happen to coincide from the document root down, but such is not always the case
Configuring Virtual-Physical Mappings
- A virtual directory or alias in the URL path preempts the lookup in the document root
  - This extends the virtual structure to the right of (or below) the root / in the URL path
    - http://www.foo.com/virtual/index2.html
    - /htdocs/physical/index2.html
  - Here a virtual directory, "virtual", points to a physical directory that is outside of the document root altogether
- Nested virtual directories are also possible
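The lookup order described above (virtual directories first, then the document root) can be sketched as a path-mapping function. The document root /var/www and the alias from /virtual to /htdocs/physical come from the examples on these slides. The function itself is hypothetical. It refuses any path containing "..", a basic safeguard against escaping the mapped directories, but it does not decode percent-escapes, which a real server must do before this check.

```python
from pathlib import PurePosixPath
from typing import Optional

DOCUMENT_ROOT = PurePosixPath("/var/www")                  # from the slide example
ALIASES = {"/virtual": PurePosixPath("/htdocs/physical")}  # virtual directory -> physical dir

def map_url_path(url_path: str) -> Optional[PurePosixPath]:
    """Translate a URL path into a physical path; None means the request is refused."""
    segments = [s for s in url_path.split("/") if s not in ("", ".")]
    if ".." in segments:
        return None  # refuse attempts to climb out of the mapped directory
    # Longest alias first, so nested virtual directories win over their parents
    for prefix in sorted(ALIASES, key=len, reverse=True):
        prefix_segments = prefix.strip("/").split("/")
        if segments[:len(prefix_segments)] == prefix_segments:
            return ALIASES[prefix].joinpath(*segments[len(prefix_segments):])
    return DOCUMENT_ROOT.joinpath(*segments)

print(map_url_path("/index.html"))           # falls through to the document root
print(map_url_path("/virtual/index2.html"))  # preempted by the virtual directory
```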
Configuring Virtual-Physical Mappings
- You can (and should) take advantage of this virtual/physical distinction to:
  - Preserve the site's URL scheme even if the physical structure has to change
    - Avoids broken links due to site expansion/revision
  - Manage directory and file locations in ways that minimize security risks and facilitate backup procedures
  - Reduce redundant physical directories for supporting files
  - Allow developers to keep relative URLs in source code simple
Virtual Hosting
- We know the hostname part of the URL is a virtual locator for files that live (physically) in a site's document root
- Virtual hosting takes this idea a step further by allowing a single server to host many domains, each with its own document root
- Two methods of virtual hosting:
  - Old way: multiple IP addresses per server
  - New way: name-based, using Host headers
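Name-based virtual hosting comes down to choosing a document root from the Host header. Below is a minimal sketch with a hypothetical host table and function name. It lowercases the name, drops any :port suffix (IPv6 literals are not handled) and falls back to a default site when the header is missing or unknown.

```python
from typing import Optional

VIRTUAL_HOSTS = {                   # hypothetical host-to-document-root table
    "www.foo.com": "/var/www/foo",
    "www.bar.com": "/var/www/bar",
}
DEFAULT_ROOT = "/var/www/default"   # used when the Host header matches nothing

def document_root_for(host_header: Optional[str]) -> str:
    """Pick a document root from the Host header, ignoring case and any :port."""
    if not host_header:
        return DEFAULT_ROOT         # e.g. an HTTP 1.0 request with no Host header
    hostname = host_header.strip().lower().rsplit(":", 1)[0]
    return VIRTUAL_HOSTS.get(hostname, DEFAULT_ROOT)

print(document_root_for("WWW.Foo.com:80"))
```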
Managing Users and Hosts
- Users (developers) will need remote access that allows them to transfer files to and from the site's physical structure
- FTP (and other file transfer mechanisms) allow the administrator to restrict this access:
  - to sub-sections of the site
  - by user account or client IP
- These restrictions should be backed up by access control lists (ACLs) on the directories that enforce the principle of least access
Managing Users and Hosts
- Similar rules apply to managing access to the Web site itself by visitors
- ACLs in the Web site's physical file structure should be set to the minimum required by the Web server to serve the resources on the site
  - This gets tricky with server-side programming
- If the Web site (or part of it) does not need to be available for anonymous access from everywhere, then access should be restricted by user, group, host and IP
- HTTP Authentication can also be employed to make all or part of a site private and require login
Managing Users and Hosts
- Although HTTP authentication now offers safeguards like checksums and password hashing (Digest authentication), it is not very secure
  - Lack of end-to-end encryption of the entire message transmission makes hijacking, scanning and spoofing easy
- If all or part of the site requires authentication and serious security for users' login credentials, form-based authentication over SSL is the only choice
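The weakness of HTTP Basic authentication is easy to demonstrate: the credentials are only base64-encoded, not encrypted, so anyone who can read the request can recover them. The username, password and function names below are hypothetical. Digest authentication sends a hash instead, but without SSL the rest of the message is still exposed.

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value a browser sends for HTTP Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

def decode_basic_auth(header_value: str) -> tuple[str, str]:
    """What any eavesdropper can do: reverse the encoding to get the credentials back."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credentials header")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password

header = basic_auth_header("alice", "s3cret")
print(header, decode_basic_auth(header))
```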
Basic SSL Configuration
- Initiate an application for a certificate from a recognized Certificate Authority (CA)
  - The site (domain) owner will have to prove they are who they say they are
- Create a Certificate Signing Request (CSR)
  - Contains the site's Public Key, which matches up with a Private Key that is created at the same time and stored on the server
- Submit the request to the CA and pay up
- Retrieve the certificate and install it
- Test the certificate with an HTTPS request
About Port80 Software
- Solutions for Microsoft IIS Web Servers
- Port80 Software exposes control of server-side functionality to developers and streamlines tasks for administrators:
  - Increase security: lock down what info you broadcast and block intruders with ServerMask and ServerDefender
  - Protect your intellectual property: prevent hotlinking with LinkDeny
  - Improve performance: compress pages and manage cache controls for faster load times and bandwidth savings with CacheRight, httpZip and ZipEnable
  - Upgrade Web development tools: negotiate content based on device, language or other parameters with PageXchanger, and tighten code with w3compiler
- Visit us online at www.port80software.com