Web Servers - PowerPoint PPT Presentation


1
Web Servers
  • Herng-Yow Chen

2
Outline
  • Survey many different types of software and
    hardware web servers.
  • Describe how to write a simple diagnostic web
    server in Perl.
  • Explain how web servers process HTTP
    transactions, step by step.

3
Different types of web servers
  • General-purpose software web server
  • Web server appliances
  • Embedded web servers

4
Jobs of web servers
  • Implement HTTP and the related TCP connection
    handling.
  • Manage the server-side resources and provide
    administrative features to configure, control,
    and enhance the web service.

5
Jobs of Operating System
  • Manage the hardware details of the underlying
    computer system.
  • Provide TCP/IP network support
  • Provide filesystems to hold web resources
  • Provide process management to control computing
    activities.

6
General-purpose software web server
  • General-purpose software web servers run on
    standard, network-enabled computer systems.
  • Open source software (such as Apache or W3C's
    Jigsaw).
  • Commercial software (such as Microsoft's and
    iPlanet's web servers).
  • Web server software is available for just about
    every computer and operating system.

7
General-Purpose Software Web Servers
In September 2003, the Netcraft survey
(http://www.netcraft.com/survey/)
8
Web server appliances
  • Web server appliances are prepackaged
    software/hardware solutions. The vendor
    preinstalls a software server onto a
    vendor-chosen computer platform and preconfigures
    the software.
  • Sun/Cobalt RaQ web appliance
    (http://www.cobalt.com)
  • Toshiba Magnia SG10 (http://www.toshiba.com)
  • IBM Whistle web server appliance
    (http://www.whistle.com)
  • Appliance solutions remove the need to install
    and configure software and often greatly
    simplify administration. However, the web server
    often is less flexible and feature-rich, and the
    server hardware is not easily upgradable.

9
Embedded web servers
  • Embedded servers are tiny web servers intended to
    be embedded into consumer products (e.g.,
    printers or home appliances).
  • They allow users to administer their consumer
    devices using a convenient web browser interface.
  • iPic match-head sized web server
  • (http://www-ccs.cs.umass.edu/shri/iPic.html)
  • NetMedia SitePlayer SP1 Ethernet web server
  • (http://www.siteplayer.com)

10
A Minimal Perl Web server
  • Type-o-serve: a minimal Perl web server used for
    HTTP debugging
  • http://www.http-guide.com/tools/type-o-serve.pl

11
A Minimal Perl Web Server
HTTP request message:
    GET /blah.txt HTTP/1.1
    Accept: */*
    Accept-language: en-us
    Accept-encoding: gzip, deflate
    User-agent: Mozilla/4.0
    Host: www.csie.ncnu.edu.tw:8080
    Connection: Keep-alive
Type-o-serve dialog:
    ./type-o-serve.pl 8080
    <<Request From 'www.csie.ncnu.edu.tw'>>
    GET /blah.txt HTTP/1.1
    Accept: */*
    Accept-language: en-us
    Accept-encoding: gzip, deflate
    User-agent: Mozilla/4.0
    Host: www.csie.ncnu.edu.tw:8080
    Connection: Keep-alive
    <<Type Response Followed by '.'>>
    HTTP/1.0 200 OK
    Connection: close
    Content-type: text/plain

    Hi there!
HTTP response message:
    HTTP/1.0 200 OK
    Connection: close
    Content-type: text/plain

    Hi there!
12
What do web servers do?
  • Set up connection
  • Receive request
  • Process request
  • Access resource
  • Construct response
  • Send response
  • Log transaction

13
What Real Web Servers Do
[Figure: the seven steps, (1) set up connection, (2) receive request, (3) process request, (4) access resource, (5) create response, (6) send response, and (7) log transaction, drawn against the layers of a web server host: user space (the HTTP server software process), the operating system (the TCP/IP network stack), the network interface, and object storage.]
14
Step 1: Accepting client connections
  • Handling new connections
  • Extracting the client IP from a new TCP connection
  • Client hostname identification
  • Using reverse DNS
  • Determining the client user through ident
  • Some web servers support the IETF ident protocol

15
Handling new connection
  • When a client requests a TCP connection to the
    web server, the web server establishes the
    connection and determines which client is on the
    other side of the connection, extracting the IP
    address from the TCP connection (e.g., using
    the getpeername call on UNIX sockets).
  • The server is free to reject and immediately
    close connections, for example because the
    client IP is unauthorized or belongs to a known
    malicious client.
  • Once a new connection is established and
    accepted, the server adds the new connection to
    its list of existing connections and prepares to
    watch for data on the connection.
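The connection-handling steps above can be sketched in Python as follows. This is a minimal illustration, not how any particular server does it: the names `BLOCKED_IPS`, `is_allowed`, and `accept_one` and the blocklist contents are assumptions made for the example.

```python
import socket

# Hypothetical blocklist for illustration; a real server would load this
# from its configuration.
BLOCKED_IPS = {"203.0.113.9"}


def is_allowed(client_ip, blocked=BLOCKED_IPS):
    """Return True unless the client IP is on the blocklist."""
    return client_ip not in blocked


def accept_one(listener, connections, blocked=BLOCKED_IPS):
    """Accept one pending connection; reject it or add it to the watch list.

    Returns the client IP if the connection was kept, or None if rejected.
    """
    conn, _ = listener.accept()
    # Extract the client IP from the TCP connection itself.
    client_ip = conn.getpeername()[0]
    if not is_allowed(client_ip, blocked):
        conn.close()  # reject and close immediately
        return None
    connections.append(conn)  # the server will now watch this socket for data
    return client_ip
```

A caller would create a listening socket, call `accept_one` whenever a connection is pending, and then poll the sockets collected in `connections` for request data.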

16
Client host identification
  • Most web servers can be configured to convert
    client IP addresses into client hostnames, using
    reverse DNS.
  • The hostname information is used for detailed
    access control and logging.
  • Note that hostname lookups can take a long time,
    slowing down web transactions. Many
    high-performance web servers either disable
    hostname resolution or enable it only for
    particular content.
  • Example: configuring Apache to look up hostnames
    only for HTML and CGI resources
        HostnameLookups off
        <Files ~ "\.(html|htm|cgi)$">
            HostnameLookups on
        </Files>
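As a rough illustration of the same idea outside Apache, the sketch below performs a reverse DNS lookup only for selected resource types and falls back to the IP address when the lookup fails. The function names and the injectable `resolver` parameter are assumptions for this example; `socket.gethostbyaddr` is the standard-library call that does the actual reverse lookup.

```python
import socket


def client_hostname(ip, resolver=socket.gethostbyaddr):
    """Return the hostname for ip via reverse DNS, or the IP itself on failure."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return ip


def hostname_for_log(ip, path, lookup_suffixes=(".html", ".htm", ".cgi"),
                     resolver=socket.gethostbyaddr):
    """Resolve the hostname only for selected resource types."""
    if path.endswith(lookup_suffixes):
        return client_hostname(ip, resolver)
    return ip  # skip the slow lookup for everything else
```

Passing the resolver as a parameter keeps the slow network call replaceable, which also makes the function easy to exercise without DNS access.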

17
Determining the client user through ident
  • The ident protocol lets servers find out what
    username initiated an HTTP connection.
  • The username information is particularly useful
    for logging: the second field of the popular
    Common Log Format contains the ident username of
    each HTTP request. (Ident was originally
    specified in RFC 931; the updated specification
    is RFC 1413.)
  • If a client supports the ident protocol, the
    client listens on TCP port 113 for ident
    requests.

18
Determining the Client User Through ident
[Figure: an ident exchange between the client (Mary) and the web server.
(a) Mary establishes a new HTTP connection from client port 4236 to server port 80.
(b) The server establishes an ident connection to port 113 on the client.
(c) The server sends the request: 4236, 80
(d) The client returns the ident response: 4236, 80 : USERID : UNIX : MARY]
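The request and response exchanged in this example can be built and parsed with a few lines of Python. This is a sketch that covers only the simple USERID reply shown here; the function names are made up for illustration.

```python
def build_ident_query(client_port, server_port):
    """Build the ident query: the connection's port pair, ending in CRLF."""
    return f"{client_port}, {server_port}\r\n".encode("ascii")


def parse_ident_response(line):
    """Parse a reply such as '4236, 80 : USERID : UNIX : MARY'.

    Returns the username, or None for ERROR replies and malformed lines.
    """
    fields = [part.strip() for part in line.strip().split(":", 3)]
    if len(fields) == 4 and fields[1] == "USERID":
        return fields[3]
    return None
```

For the exchange above, `build_ident_query(4236, 80)` produces the request line and `parse_ident_response` extracts `MARY` from the reply.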
19
Ident protocol (cont.)
  • Ident can work inside organizations, but it does
    not work well across the public Internet for the
    following reasons:
  • Many client PCs don't run the identd
    Identification Protocol daemon software.
  • The ident protocol significantly delays HTTP
    transactions.
  • Many firewalls won't permit incoming ident
    traffic.
  • The ident protocol is insecure and easy to
    fabricate.
  • The ident protocol doesn't support virtual IP
    addresses well.
  • There are privacy concerns about exposing client
    usernames.
  • Enable ident lookup in Apache:
  • IdentityCheck on
  • Common Log Format log files typically contain
    hyphens (-) in the second field if no ident
    information is available.

20
Step 2: Receiving request messages
  • As the data arrives on connections, the server
    reads out the data and starts parsing the request
    message.
  • Parse the request line, looking for the request
    method, the specified URI, and the version
    number.
  • Read the message headers, each ending in CRLF.
  • Detect the end-of-headers blank line, ending in
    CRLF.
  • Read the request body, if any (length specified
    by the Content-Length header).
  • Internal representations of messages
  • Some web servers also store the request message
    in internal data structures that make the message
    easy to manipulate.
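The parsing steps listed above can be sketched as a small Python function that turns the raw bytes into an internal structure. The `Request` class and `parse_request` name are illustrative assumptions, and the sketch expects the whole message to be in memory already; a real server parses incrementally as data arrives.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Illustrative internal representation of a request message."""
    method: str
    uri: str
    version: str
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def parse_request(raw):
    """Parse a complete request message (bytes) into a Request."""
    # The blank line (CRLF CRLF) separates the headers from the body.
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Request line: method, URI, version.
    method, uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # The body length, if any, comes from Content-Length.
    length = int(headers.get("content-length", 0))
    return Request(method, uri, version, headers, rest[:length])
```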

21
Receiving Request Messages
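[Figure: a request message being read from the network. The server has read "GET /specials/hychen.gif HTTP/1.0<CRLF>Accept: image/gif<CRLF>Host: www.j" so far; the rest of the Host value (oes-hardware.com) and the final CRLF CRLF are still in transit from the client across the Internet.]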
22
Internal Representations of Message
Request message:
    GET /specials/saw-blade.gif HTTP/1.0<CRLF>
    Accept: image/gif<CRLF>
    Host: www.joes-hardware.com<CRLF>
    <CRLF>
Parsed into:
    method: 1
    version: 1.0
    uri: -> /specials/saw-blade.gif
    header count: 2
    headers: ->
        Name: Accept    Value: -> image/gif
        Name: Host      Value: -> www.joes-hardware.com
    body: -
23
Different web server architectures
  • Single-threaded web servers
  • Multi-process and multi-threaded web servers
  • Multiplexed I/O web servers
  • Non-blocking network accessing
  • Multiplexed multi-threaded web servers

24
Connection Input/Output Processing Architectures
25
Step 3: Processing requests
  • Once the web server has received a request, it
    can process the request using the method,
    resource, headers, and optional body.
  • Some methods (e.g., POST) require entity body
    data in the request message. A few methods
    (e.g., GET) do not carry entity body data in the
    request message.

26
Step 4: Mapping and accessing resources
  • Docroot
  • Virtually hosted docroots
  • User home directory docroots
  • Directory listings
  • Dynamic content resource mapping
  • Server-Side Include (SSI)
  • Access Control

27
Docroots
  • Web servers support different kinds of resource
    mapping, but the simplest form of mapping uses
    the request URI to name a file in the web
    server's filesystem.
  • Typically, a special folder in the web server
    filesystem is reserved for web content. The
    folder is called the document root, or docroot.
  • The web server takes the URI from the request
    message and appends it to the document root.
  • The docroot setting in Apache servers:
  • DocumentRoot /usr/local/httpd/files
  • Servers must be careful not to let relative URLs
    back up out of a document root and expose other
    parts of the filesystem.
  • E.g., http://www.csie.ncnu.edu.tw/../
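A minimal sketch of docroot mapping with a guard against ".." escapes might look like this in Python. The function name is an assumption, and the sketch ignores percent-decoding and symbolic links, which a real server also has to handle.

```python
import posixpath

DOCROOT = "/usr/local/httpd/files"  # the DocumentRoot value used above


def map_uri_to_path(uri_path, docroot=DOCROOT):
    """Append the request URI path to the docroot.

    Returns the filesystem path, or None if the URI would back up
    out of the document root.
    """
    candidate = posixpath.normpath(
        posixpath.join(docroot, uri_path.lstrip("/")))
    # After normalization the result must still be the docroot or below it.
    if candidate != docroot and not candidate.startswith(docroot + "/"):
        return None
    return candidate
```

With these definitions, `/specials/hychen.gif` maps to a file under the docroot, while a path such as `/../` is refused.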

28
Docroots
[Figure: docroot mapping. The client sends a request message across the Internet to the web server, whose document root /usr/local/httpd/files lives in object storage.
Request message: GET /specials/hychen.gif HTTP/1.0, Host: www.csie.ncnu.edu.tw
Request URI: /specials/hychen.gif
Server resource: /usr/local/httpd/files/specials/hychen.gif]
29
Virtually hosted docroots
  • Virtually hosted web servers host multiple web
    sites on the same web server, giving each site
    its own distinct document root on the server.
  • A virtually hosted web server identifies the
    correct document root to use from the IP address
    or from the hostname in the Host header.

30
Apache's virtual host configuration
    <VirtualHost www.joes-hardware.com>
        ServerName www.joes-hardware.com
        DocumentRoot /docs/joe
        TransferLog /logs/joe.access_log
        ErrorLog /logs/joe.error_log
    </VirtualHost>
    <VirtualHost www.marys-antiques.com>
        ServerName www.marys-antiques.com
        DocumentRoot /docs/mary
        TransferLog /logs/mary.access_log
        ErrorLog /logs/mary.error_log
    </VirtualHost>
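The name-based selection that this configuration expresses can be approximated in a few lines of Python. This is only a sketch: the `VHOSTS` table and `docroot_for` function are illustrative, the headers are assumed to be stored with lowercase names, and IPv6 literal hosts are not handled.

```python
# Hostname -> document root, using the two sites from the example.
VHOSTS = {
    "www.joes-hardware.com": "/docs/joe",
    "www.marys-antiques.com": "/docs/mary",
}


def docroot_for(headers, vhosts=VHOSTS, default=None):
    """Pick a docroot from the Host header (case-insensitive, port ignored)."""
    host = headers.get("host", "").strip().lower()
    hostname = host.rsplit(":", 1)[0] if ":" in host else host
    return vhosts.get(hostname, default)
```

Two requests for the same path, `/index.html`, therefore resolve to different files depending only on the Host header.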

31
Virtually hosted docroots
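[Figure: virtually hosted docroots. A client sends two requests across the Internet to one server that hosts both www.joes-hardware.com and www.marys-antiques.com.
Request message A: GET /index.html HTTP/1.0, Host: www.joes-hardware.com
Request message B: GET /index.html HTTP/1.0, Host: www.marys-antiques.com]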
32
User home directory docroots
[Figure: user home directory docroots. A client sends two requests across the Internet to the same server.
Request message A: GET /~bob/index.html HTTP/1.0, served from /home/bob/public_html
Request message B: GET /~betty/index.html HTTP/1.0, served from /home/betty/public_html]
33
User home directory docroots
  • Another common use of docroots gives people
    private web sites on a web server.
  • A typical convention maps URIs whose paths begin
    with a slash and tilde (/~) followed by a
    username to a private document root for that
    user.
  • The private docroot is often the folder called
    public_html inside that user's home directory,
    but it can be configured differently (e.g., in
    the NCNU web server, we use WWW as the user's
    private document root).
  • In Apache's configuration:
  • UserDir public_html
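The /~username convention can be sketched as follows. The `/home` base directory and the function name are assumptions for illustration; the `user_dir` parameter plays the role of the UserDir setting.

```python
import posixpath


def map_user_uri(uri_path, home_base="/home", user_dir="public_html"):
    """Map '/~user/rest' to '<home_base>/user/<user_dir>/rest'.

    Returns None if the URI is not a user URI or tries to escape
    the user's private docroot.
    """
    if not uri_path.startswith("/~"):
        return None
    user, _, rest = uri_path[2:].partition("/")
    if not user or user in (".", ".."):
        return None
    root = posixpath.join(home_base, user, user_dir)
    candidate = posixpath.normpath(posixpath.join(root, rest))
    if candidate != root and not candidate.startswith(root + "/"):
        return None
    return candidate
```

Calling it with `user_dir="WWW"` gives the variant described above, where the private docroot is named WWW instead of public_html.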

34
Directory listings
  • A web server can receive requests for directory
    URLs, where the path resolves to a directory, not
    a file.
  • Most web servers can be configured to take a few
    different actions when a client requests a
    directory URL:
  • Return an error.
  • Return a special, default "index file" instead
    of the directory.
  • Scan the directory, and return an HTML page
    containing the contents.
  • Most web servers look for a file named index.html
    or index.htm inside a directory to represent that
    directory.
  • In Apache's configuration:
  • DirectoryIndex index.html index.htm home.html
    home.htm index.cgi
  • Disable the automatic generation of directory
    index files with the Apache directive:
  • Options -Indexes
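The three possible actions for a directory URL can be sketched in one Python function. The return convention (a tuple tagged "file", "listing", or "error") and the `allow_listing` flag are assumptions made for this example.

```python
import html
import os
from urllib.parse import quote

INDEX_NAMES = ("index.html", "index.htm")


def resolve_directory(dir_path, index_names=INDEX_NAMES, allow_listing=False):
    """Decide how to answer a request whose path resolves to a directory.

    Returns ('file', path) for an index file, ('listing', html_page) for a
    generated listing, or ('error', 403) when listings are disabled.
    """
    # First choice: a default index file that represents the directory.
    for name in index_names:
        candidate = os.path.join(dir_path, name)
        if os.path.isfile(candidate):
            return ("file", candidate)
    # No index file: either refuse or scan the directory.
    if not allow_listing:
        return ("error", 403)
    items = "".join(
        f'<li><a href="{quote(entry)}">{html.escape(entry)}</a></li>'
        for entry in sorted(os.listdir(dir_path))
    )
    return ("listing", f"<html><body><ul>{items}</ul></body></html>")
```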

35
Dynamic content resource mapping
  • Web servers also can map URIs to dynamic
    resources, that is, to programs that generate
    content on demand.
  • In fact, a whole class of web servers called
    application servers connect web servers to
    sophisticated backend applications.
  • The web server needs to be able to tell when a
    resource is a dynamic resource, where the dynamic
    content generator program is located, and how to
    run the program.
  • In Apache's configuration:
  • ScriptAlias /cgi-bin/ /usr/local/etc/httpd/cgi-programs/
  • AddHandler cgi-script .cgi
  • CGI is an early, simple, and popular interface
    for executing server-side applications. Modern
    application servers have more powerful
    server-side dynamic content support, including
    Active Server Pages, Java servlets, and PHP.
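The decision "is this resource dynamic, and where is its program?" can be sketched with two rules that mirror the directives above: a URI prefix mapped to a program directory, and a file extension that marks a script. The table names and the return convention are assumptions for this example.

```python
# URI prefix -> program directory, in the spirit of ScriptAlias.
SCRIPT_ALIASES = {"/cgi-bin/": "/usr/local/etc/httpd/cgi-programs/"}
# File extensions treated as scripts, in the spirit of AddHandler.
CGI_EXTENSIONS = (".cgi",)


def classify_resource(uri_path):
    """Return ('cgi', location) for dynamic resources, else ('static', uri_path).

    For aliased prefixes the location is the program's filesystem path;
    for extension matches it is the URI path, which still needs the
    usual docroot mapping.
    """
    for prefix, directory in SCRIPT_ALIASES.items():
        if uri_path.startswith(prefix):
            return ("cgi", directory + uri_path[len(prefix):])
    if uri_path.endswith(CGI_EXTENSIONS):
        return ("cgi", uri_path)
    return ("static", uri_path)
```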

36
Dynamic Content Resource Mapping
[Figure: a client reaching dynamic content on the server across the Internet.]
37
Server-Side Includes (SSI)
  • Many web servers also provide support for
    server-side includes.
  • If a resource is flagged as containing
    server-side includes, the server processes the
    resource contents before sending them to the
    client.
  • The contents are scanned for certain special
    patterns, which can be variable names or embedded
    scripts. The special patterns are replaced with
    the values of variables or the output of
    executable scripts.
  • This is an easy way to create dynamic content.
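The scan-and-replace step can be illustrated with a deliberately simplified Python sketch. It recognizes only one directive form, modelled on the `<!--#echo var="NAME" -->` include syntax; the function name and the "(none)" placeholder for unset variables are assumptions for this example.

```python
import re

# One simplified directive form: <!--#echo var="NAME" -->
ECHO = re.compile(r'<!--#echo\s+var="([^"]+)"\s*-->')


def expand_includes(content, variables):
    """Replace each echo directive with the value of the named variable."""
    return ECHO.sub(
        lambda match: str(variables.get(match.group(1), "(none)")),
        content,
    )
```

For example, expanding `Hello <!--#echo var="USER" -->!` with `{"USER": "mary"}` yields `Hello mary!`, and content without directives passes through unchanged.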

38
Access controls
  • Web servers also can assign access controls to
    particular resources.
  • When a request arrives for an access-controlled
    resource, the web server can control access based
    on the IP address of the client, or it can issue
    a password challenge to get access to the
    resource.
  • We will see more details in the later lecture
    (HTTP authentication).
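IP-based access control can be sketched with the standard `ipaddress` module. The rule table, the `/internal/` prefix, and the network range are hypothetical values chosen for the example; password challenges are left out.

```python
import ipaddress

# Hypothetical rules: URI prefix -> networks allowed to read it.
ACCESS_RULES = {"/internal/": [ipaddress.ip_network("10.0.0.0/8")]}


def check_access(uri_path, client_ip, rules=ACCESS_RULES):
    """Return 200 if access is allowed, 403 if the client IP is not permitted."""
    address = ipaddress.ip_address(client_ip)
    for prefix, networks in rules.items():
        if uri_path.startswith(prefix):
            if not any(address in network for network in networks):
                return 403
    return 200
```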

39
Step 5: Building responses
  • Once the web server has identified the resource,
    it performs the action described in the request
    method and returns the response message, which
    contains a status code, response headers, and a
    response body.
  • Response Entities
  • MIME Typing
  • Redirection

40
Response entities
  • If the transaction generated a response body, the
    content is sent back with the response message,
    which usually contains
  • a Content-Type header, describing the MIME type of the body
  • a Content-Length header, describing body size
  • The actual message body content
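Put together, a response with these three parts can be assembled as below. This is a minimal sketch; `build_response` is an illustrative name, and only the two headers named above are emitted.

```python
def build_response(status, reason, body, content_type):
    """Assemble an HTTP/1.0 response carrying a body.

    body must be bytes so that Content-Length counts bytes, not characters.
    """
    head = (
        f"HTTP/1.0 {status} {reason}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body
```

Calling `build_response(200, "OK", b"Hi there!", "text/plain")` yields a message whose Content-Length is 9, the number of bytes in the body.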

41
MIME typing
  • The web server is responsible for determining the
    MIME type of the response body.
  • There are many ways to configure servers to
    associate MIME types with resources
  • mime.types: extension-based type association.
  • Magic typing: content-based association,
    scanning the content for known patterns.
  • Explicit typing: force particular files or
    directory contents to have a MIME type,
    regardless of the file extension or contents.
  • Type negotiation: the server is configured to
    store a resource in multiple document formats. In
    a client-server negotiation process, the server
    can determine the best format to use.
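The first two strategies can be combined in a short Python sketch: try the file extension first (via the standard `mimetypes` module), then fall back to a few content signatures. The `MAGIC` table holds only GIF and PNG signatures and is far from complete; the function name and the default type are assumptions.

```python
import mimetypes

# A tiny magic-number table: leading bytes -> MIME type.
MAGIC = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
]


def guess_mime_type(filename, first_bytes=b"", default="application/octet-stream"):
    """Guess a MIME type from the extension, then from the content."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type:
        return mime_type  # extension-based association
    for signature, magic_type in MAGIC:
        if first_bytes.startswith(signature):
            return magic_type  # content-based ("magic") association
    return default
```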

42
MIME Typing
[Figure: MIME typing. The client's HTTP request message contains the command and the URI (GET /specials/hychen.gif HTTP/1.1, Host: www.csie.ncnu.edu.tw). Labels: client, the server www.csie.ncnu.edu.tw, and the hychen.gif file.]
43
Redirection
  • Web servers sometimes return redirection
    responses (indicated by a 3XX return code)
    instead of success messages. The Location
    response header contains a URI for the new or
    preferred location of the content. Redirections
    are useful for
  • Permanently moved resources
  • Temporarily moved resources
  • URL augmentation
  • Load balancing
  • Server affinity
  • Canonicalizing directory names
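As one example from this list, canonicalizing a directory name means redirecting a directory URI that lacks its trailing slash. The sketch below builds such a response; the function name, its parameters, and the choice of status 301 are assumptions for illustration.

```python
def directory_redirect(host, uri_path, is_directory):
    """Return a redirect response (bytes) that adds the trailing slash.

    Returns None when no redirect is needed.
    """
    if is_directory and not uri_path.endswith("/"):
        location = f"http://{host}{uri_path}/"
        return (
            "HTTP/1.0 301 Moved Permanently\r\n"
            f"Location: {location}\r\n"
            "Content-Length: 0\r\n"
            "\r\n"
        ).encode("ascii")
    return None
```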

44
Step 6: Sending responses
  • The server may have many connections to many
    clients, some idle, some sending data to the
    server, and some carrying response data back to
    the clients.
  • The server needs to keep track of connection
    state and handle persistent connections with
    special care.
  • For non-persistent connections, the server is
    expected to close its side of the connection when
    the entire message is sent.
  • For persistent connections, the connection may
    stay open, in which case the server needs to be
    extra cautious to compute the Content-Length
    header correctly, or the client will have no way
    of knowing when a response ends.
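To see why Content-Length matters on a persistent connection, consider the client's view: responses arrive back to back in one byte stream, and the header is the only marker of where each body ends. The sketch below splits such a stream; `split_responses` is an illustrative name, and chunked encoding is deliberately ignored.

```python
def split_responses(stream):
    """Split back-to-back responses read from one persistent connection."""
    responses = []
    while stream:
        head, separator, rest = stream.partition(b"\r\n\r\n")
        if not separator:
            break  # headers not complete yet
        length = 0
        for line in head.split(b"\r\n")[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.decode("ascii"))
        if len(rest) < length:
            break  # body not complete yet
        responses.append(head + separator + rest[:length])
        stream = rest[length:]  # the next response starts right here
    return responses
```

If the server states a Content-Length that is too large, the client consumes the first bytes of the next response as body data, and every later message on the connection is misread.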

45
Step 7: Logging
  • Finally, when a transaction is complete, the web
    server writes an entry into a log file,
    describing the transaction performed.
  • Most web servers provide several configurable
    forms of logging. (See later lectures for
    details.)
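As a concrete case, a Common Log Format entry can be produced as sketched below. The function name and parameters are assumptions; the second field is the ident username, which defaults to a hyphen when no ident information is available.

```python
from datetime import datetime, timezone


def common_log_line(host, request_line, status, size, when, ident="-", user="-"):
    """Format one Common Log Format entry.

    when must be a timezone-aware datetime so that the offset can be printed.
    """
    timestamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    return f'{host} {ident} {user} [{timestamp}] "{request_line}" {status} {size}'


# Example entry for a 9-byte response, logged without ident information.
entry = common_log_line(
    "192.0.2.7", "GET /blah.txt HTTP/1.1", 200, 9,
    datetime(2003, 9, 1, 12, 0, 0, tzinfo=timezone.utc),
)
```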

46
References: Web servers
  • http://www.apache.org
  • The Apache web site
  • http://www.w3c.org/Jigsaw
  • Jigsaw, W3C's server
  • http://www.ietf.org/rfc/rfc1413.txt
  • RFC 1413, Identification Protocol, by M. St.
    Johns.