Internet Intranet CIS536 - PowerPoint PPT Presentation

Provided by: anv2

1
Internet / Intranet CIS-536
  • Class 4
  • Web Server Technology
  • HTTP Protocol
  • Log Files

2
Class 4 Agenda
  • Discuss Homework
  • Overview of Web Servers and Server Technology
  • HTTP
  • The Protocol For Communication Between Web
    Browser and Server
  • Log Files

3
Web Servers
  • A Basic Web Server is Just a File Server
  • Client Requests a File via HTTP Protocol
  • Server Delivers the File via HTTP Protocol
  • Server Maps URL to a Subdirectory
  • Web Server Needs Appropriate Permissions to
    Access Files/Directories
  • Supports Non-HTTP Protocols
  • FTP, Gopher, etc.
  • A Web Server is Not HTML Specific
  • Typically Identifies a Filetype by Extension
  • Or Directory Where File Exists
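
The URL-to-file mapping and extension-based type detection described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular server implements it: the document root /var/www/html and both function names are hypothetical, and a real server would also percent-decode the path and strip any query string first.

```python
import mimetypes
import posixpath

# Hypothetical document root; a real server reads this from its configuration.
DOCUMENT_ROOT = "/var/www/html"

def map_url_to_file(url_path: str, doc_root: str = DOCUMENT_ROOT) -> str:
    """Map a URL path such as /docs/index.html to a file under doc_root."""
    # Normalize "." and ".." segments so a request cannot climb above the root.
    clean = posixpath.normpath("/" + url_path.lstrip("/"))
    return posixpath.join(doc_root, clean.lstrip("/"))

def guess_content_type(file_path: str) -> str:
    """Identify the file type from its extension, falling back to raw bytes."""
    mime_type, _encoding = mimetypes.guess_type(file_path)
    return mime_type or "application/octet-stream"

print(map_url_to_file("/docs/index.html"))   # /var/www/html/docs/index.html
print(map_url_to_file("/../etc/passwd"))     # /var/www/html/etc/passwd
print(guess_content_type("index.html"))      # text/html
```

The content type found this way is what the server would send back in the Content-Type header.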

4
Additional Common Web Server Features
  • Additional Security Beyond That Provided by O/S
  • Scripting
  • Ability to Dynamically Create a Web Page
  • Run a Program Instead of Returning a File (CGI)
  • Return the Program Output as the Requested File
  • Administration
  • Log Files
  • Performance Monitoring
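
A CGI program's job is to write a header, a blank line, and a generated document to standard output; the server returns that output as if it were the requested file. A minimal Python sketch, assuming the server passes the query string in the QUERY_STRING environment variable as CGI does; the name parameter and the cgi_response function are hypothetical.

```python
import html
import os
import sys
from urllib.parse import parse_qs

def cgi_response(environ: dict) -> str:
    """Build the text a CGI program writes to stdout: header, blank line, body."""
    # The server hands request details to the program in environment variables.
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]   # hypothetical form field
    # Escape user input before placing it in HTML.
    body = f"<html><body><p>Hello, {html.escape(name)}!</p></body></html>"
    # CGI output: header line(s), then a blank line, then the generated page.
    return "Content-Type: text/html\n\n" + body

if __name__ == "__main__":
    sys.stdout.write(cgi_response(dict(os.environ)))
```

A request such as /cgi-bin/hello.py?name=Ann would then produce a page greeting Ann.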

5
Advanced Web Server Features
  • Virtual Hosting
  • Allow Multiple URLs to Map to Same Computer
  • Performance Optimization
  • Caching
  • Reliability
  • Scalability
  • Proxy Servers (For Security and Performance)
  • Fetch Documents That are on Other Computers
  • Cache Them Locally
  • Allows for Easy Scalability
  • Multiple Proxy Servers Can Cache Documents From
    One Source Computer
  • Embedded Scripting
  • Server Side Includes
  • Custom Scripting Languages
  • Server API
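
One common way to do virtual hosting is to pick the document tree from the Host header the client sends. A minimal sketch; the host names and directories in the table are hypothetical, and IPv6 literal hosts such as [::1]:80 are not handled.

```python
# Hypothetical site table: host name -> document root on this one computer.
VIRTUAL_HOSTS = {
    "www.example.com": "/var/www/example",
    "www.example.org": "/var/www/example-org",
}
DEFAULT_ROOT = "/var/www/default"

def document_root_for(host_header: str) -> str:
    """Choose a document root from the HTTP Host header."""
    # The header may carry a port ("www.example.com:8080"); host names are
    # case-insensitive, so compare them in lower case.
    host = host_header.split(":", 1)[0].strip().lower()
    return VIRTUAL_HOSTS.get(host, DEFAULT_ROOT)
```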

6
Web Servers Added Functionality
  • Database Connectivity
  • SQL, MySQL
  • Directory Listings
  • Icons, etc.
  • Built-In Search Engines
  • Built-In ImageMap Handling
  • Multimedia Support
  • Session Emulation
  • Streaming Multimedia
  • Advanced Security
  • Encrypted HTTP
  • S-HTTP (Secure HTTP) - CommerceNet
  • SSL (Secure Sockets Layer) - Netscape
  • Web Server Add-Ons
  • CGI Substitutes / CGI Optimizations
  • ColdFusion

7
Web Server History
  • All Web Servers Have a Common Root
  • httpd (NCSA)
  • UNIX Orientation
  • Many Features are Essentially UNIX Features
  • Apache
  • WebSite (O'Reilly)
  • Netscape Enterprise Server
  • Microsoft Internet Information Server
  • A Slew of Others

8
Apache
  • UNIX Origins Now Ported to NT
  • Evolved From httpd
  • Freeware
  • Typical UNIX Application
  • Public Source Code
  • Many Defaults, Conventions
  • BUT All is Configurable
  • No GUI Interface
  • Configured via Scripts, Shell Commands, Config
    Files
  • Various Flavors
  • Many Optional Features
  • API
  • ApacheSSL

9
IIS / Netscape
  • Microsoft IIS
  • Not Strictly Derived From httpd/Apache
  • Windows NT
  • However Functionally Very Similar to Apache
  • Emulates Many UNIX Conventions
  • E.g. Forward Slashes
  • Configuration via GUI
  • Personal Web Server
  • Peer Web Server
  • Netscape
  • Multi-Platform
  • UNIX is Preferred Platform
  • Less Open Than Apache
  • More Secure?

10
UNIX File Structure
  • Forward Slashes (/) to Separate Filenames,
    Directories
  • Case Sensitive File Names
  • Windows is Not
  • No Limit on Filename Size / Extensions
  • Extensions are by Convention
  • Root is /
  • User Home Directory is ~ (e.g. /home/username)
  • Symbolic Links / Aliases
  • Directories Can Be Spread Over Multiple Drives
  • Can Create Non-Hierarchical Structure
  • File Permissions
  • Read, Write, Execute
  • Separate Permissions for Owner, Group, All
  • Directories are Special Cases of Files
  • Execute Permission on a Directory Allows Entering It; Read Permission Allows Listing It
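
The owner/group/all permission bits can be inspected with Python's standard stat module. A minimal sketch; the helper names are hypothetical, and it assumes the common setup where the web server runs as an unprivileged user who is neither the file's owner nor in its group, so only the "all" bits matter to it.

```python
import stat

def describe_mode(mode: int) -> str:
    """Render file-type and permission bits the way `ls -l` shows them."""
    return stat.filemode(mode)

def others_can_read(mode: int) -> bool:
    """True if the 'all' (other) class has read permission."""
    return bool(mode & stat.S_IROTH)

def others_can_enter(mode: int) -> bool:
    """For a directory: True if the 'all' class has execute (search) permission."""
    return bool(mode & stat.S_IXOTH)

# Typical settings for content a web server must be able to serve:
print(describe_mode(stat.S_IFREG | 0o644))  # -rw-r--r--
print(describe_mode(stat.S_IFDIR | 0o755))  # drwxr-xr-x
```

In octal, 644 means owner read/write, group read, all read; 755 adds execute for everyone, which on a directory lets any user enter it.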

11
Web Server Configuration
  • Directory Structure
  • Virtual Document Tree
  • Access to User Directories
  • UNIX ~user Convention (e.g. /~username/)
  • Symbolic Links
  • Be Careful: May Link You Out of the Directory
    Structure
  • Case Sensitivity
  • Ownership Access
  • Server is a Process Started by a User.
  • Has the Permissions of the User Who Started It.
  • Default Documents
  • Allow Directory Browsing
  • Scripting
  • Who is Allowed to Run Scripts?
  • How are Scripts Identified?
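
The symbolic link warning above can be checked by resolving every link in a path and confirming the result still lies under the document root. A minimal sketch; is_inside_document_root is a hypothetical helper, not a setting of any particular server.

```python
import os

def is_inside_document_root(path: str, doc_root: str) -> bool:
    """True if path, after resolving any symbolic links, is still under doc_root."""
    real_root = os.path.realpath(doc_root)
    real_path = os.path.realpath(path)   # follows every symlink in the path
    try:
        return os.path.commonpath([real_root, real_path]) == real_root
    except ValueError:                   # e.g. paths on different Windows drives
        return False
```

A link such as /var/www/html/escape.txt pointing at /etc/passwd would resolve outside the root and be rejected.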

12
Web Server File Access Control / Security
  • Directory
  • O/S Level Security
  • IP, Domain Level Security
  • Spoofing
  • Directory Access
  • .htaccess
  • Microsoft FrontPage Extensions
  • Encryption
  • S-HTTP
  • Web Protocols Only
  • SSL
  • TCP/IP Level
  • V1.0, V2.x: Security Holes Found, Fixed
  • V3.0 Is Current
  • Uses Port 443
  • Microsoft PCT
  • Response to Holes in SSL 2.0
  • Now Use SSL

13
Server Administration
  • Need Sysadmin and O/S Expertise
  • Lots of Holes and Gotchas Whenever Scripts Are
    Allowed
  • FTP
  • Who is Allowed to Change Documents?
  • Who is Allowed to Change Server Configuration?
  • How do They Get Access?
  • Direct Access
  • Remote Access (e.g. FTP)
  • Log Files
  • Accessibility
  • Directory Structure
  • Management

14
HTTP
  • The Protocol For Requesting and Delivering Web
    Pages
  • Not Restricted to Returning HTML Files
  • Client Server Model
  • Request / Response
  • TCP/IP Protocol, Default Port 80
  • Supports Other Ports, Can Be Run Over Other
    Protocols
  • Replaced FTP as the Primary Method For Internet
    File Transfer
  • Stateless
  • Uses MIME Format to Encapsulate Data
  • Message Structure Similar to SMTP Mail Messages
  • Message Header (metadata)
  • Message Body (data)
  • Separated From Header by a Blank Line
  • Browser Only Displays Body, Not Header
  • No Restrictions on Message Size / Format (Unlike
    SMTP)
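
The header / blank line / body layout can be seen by splitting a raw response by hand. A minimal Python sketch that assumes the whole response is already in memory; parse_http_message is a hypothetical name, and repeated header fields simply overwrite one another here.

```python
def parse_http_message(raw: bytes):
    """Split a raw HTTP response into status line, header fields, and body."""
    # The header ends at the first blank line (CRLF CRLF); the rest is the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Field names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body

raw = (b"HTTP/1.0 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"Content-Length: 13\r\n"
       b"\r\n"
       b"<p>Hello</p>\n")
status, headers, body = parse_http_message(raw)
print(status)                    # HTTP/1.0 200 OK
print(headers["content-type"])   # text/html
```

Only the body (here, the HTML) is what a browser renders; the header is metadata about it.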

15
HTTP Versions
  • HTTP 1.0 - Commonly Used Version
  • HTTP 1.1
  • Formalizes Many Extensions to Version 1.0
  • Supports Persistent Connections
  • Supports Compression/Decompression
  • Supports Virtual Hosting
  • Multiple Host Names on a Single IP Address (via the Host Header)
  • Supports Multiple Languages
  • Supports Byte Range Transfers
  • Useful For Re-Sending Interrupted Data Transfers
  • Similar to the Crash Recovery in ZMODEM, etc.
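
A byte range request names the first and last bytes wanted, so a client that lost a transfer after 6 bytes can ask for "bytes=6-". A minimal server-side sketch that handles a single range only; apply_range is a hypothetical helper. A real server would reply with status 206 Partial Content and a Content-Range header.

```python
def apply_range(data: bytes, range_header: str) -> bytes:
    """Return the slice of data selected by a single 'bytes=first-last' range."""
    unit, _, spec = range_header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("this sketch handles a single byte range only")
    first, _, last = spec.strip().partition("-")
    if first == "":                          # "bytes=-N": the last N bytes
        n = int(last)
        return data[-n:] if n else b""
    start = int(first)
    end = int(last) if last else len(data) - 1   # "bytes=N-": through the end
    return data[start:end + 1]               # HTTP ranges include both ends

data = b"0123456789"
print(apply_range(data, "bytes=0-3"))   # b'0123'
print(apply_range(data, "bytes=6-"))    # b'6789'
```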

16
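HTTP OVERVIEW

[Diagram: the Client (Browser) sends an HTTP Request to the Web Server; the Web Server returns an HTML file from the File System, or runs a Server Application via CGI that generates HTML; the Web Server sends the HTTP Response (HTML) back to the Client.]
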
17
HTTP Commands
  • Simple Structure
  • Main Methods
  • GET <URI> HTTP/1.0
  • Request the File Specified By the URL
  • URI is the URL Path, Without Protocol/Host/Port
  • HEAD
  • Request the HTTP Header Information Only
  • Don't Return the File Itself
  • POST
  • Sends Data to The Server
  • Typically Data From a Form
  • Other Methods: Defined, But Not Widely Implemented
  • PUT
  • DELETE
  • LINK
  • UNLINK
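
Because the protocol is plain text, a request can be built by hand: a request line, header fields, and a blank line. A minimal sketch; build_request is a hypothetical helper, and www.example.com is only a placeholder host.

```python
def build_request(method: str, uri: str, host: str, version: str = "HTTP/1.0") -> bytes:
    """Compose a minimal request: request line, header fields, blank line."""
    lines = [
        f"{method} {uri} {version}",   # e.g. "GET /index.htm HTTP/1.0"
        f"Host: {host}",               # optional in 1.0, required in 1.1
        "Accept: text/html",
        "",                             # empty line ends the header...
        "",                             # ...so the message ends in CRLF CRLF
    ]
    return "\r\n".join(lines).encode("ascii")

print(build_request("HEAD", "/", "www.example.com"))
```

Sending it over a TCP connection to port 80 (for example with socket.create_connection and sendall) returns the server's response; with HEAD, only the header comes back.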

18
Common HTTP Header Fields
  • Additional Parameters to the HTTP Commands
  • Used in HTTP Requests
  • Accept
  • Lists the MIME Types That Client Can Accept
  • E.g. Accept: text/plain, text/html or Accept: */*
  • Accept-Charset
  • Lists the Character Sets the Client Can Accept
  • ASCII, ISO-8859-1 Are Assumed
  • Accept-Encoding
  • Accept-Language
  • Authorization
  • Basic UserName:Password (Base64 Encoded)
  • Cookie
  • From
  • E-mail Address of Requesting User
  • Not Typically Used For Privacy Reasons
  • Primarily Used By Automated Clients (e.g. Bots)
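
The Basic Authorization value is just "UserName:Password" in Base64, which is an encoding, not encryption; anyone who sees the header can decode it, which is why it belongs behind SSL. A minimal sketch with hypothetical helper names; the ISO-8859-1 charset is an assumption, since early specifications left it open.

```python
import base64

def basic_authorization(username: str, password: str) -> str:
    """Build an Authorization header value for HTTP Basic authentication."""
    credentials = f"{username}:{password}".encode("iso-8859-1")  # assumed charset
    return "Basic " + base64.b64encode(credentials).decode("ascii")

def decode_basic(header_value: str) -> tuple:
    """Recover (username, password) from a Basic Authorization value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not Basic authentication")
    user, _, password = base64.b64decode(token).decode("iso-8859-1").partition(":")
    return user, password

print(basic_authorization("Aladdin", "open sesame"))
# Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```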

19
Common HTTP Header Fields (2)
  • Host
  • Virtual Hosting: One Server Handles Multiple Sites
  • If-Modified-Since
  • Only Return Data if it Has Been Modified Since
    This Date
  • Pragma
  • General Purpose For Additional Headers Not in
    Standard
  • Referer (Spelled This Way in the HTTP Specification)
  • The URL of the Page That Linked to This URL
  • User-Agent
  • Name/Version of the HTTP Client
  • Used in HTTP Responses
  • Allow
  • Lists the Available Commands Supported by Server
  • Content-Encoding
  • Allows for Passing Data in Compressed Formats
  • Content-Language
  • Describes the Natural Language of the Intended
    Audience
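
If-Modified-Since lets the server skip resending an unchanged document: it compares the file's modification time with the client's date and answers 304 Not Modified when nothing changed. A minimal sketch with hypothetical helper names, using Python's standard email.utils date functions, which read and write the same date format HTTP uses.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def http_date(moment: datetime) -> str:
    """Format a UTC datetime as an HTTP date, e.g. for Last-Modified."""
    return format_datetime(moment, usegmt=True)

def conditional_status(last_modified: datetime, if_modified_since) -> int:
    """Return 304 if the document is unchanged since the client's date, else 200."""
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200                  # ignore an unparseable date; send the file
    if since.tzinfo is None:        # "-0000" dates parse without a time zone
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop sub-second precision.
    return 304 if last_modified.replace(microsecond=0) <= since else 200

modified = datetime(1998, 6, 10, 21, 23, 34, tzinfo=timezone.utc)
print(http_date(modified))   # Wed, 10 Jun 1998 21:23:34 GMT
```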

20
Common HTTP Header Fields (3)
  • Content-Length
  • Size of the Message Body
  • Content-Type
  • The MIME Type For the Data
  • Date
  • Expires
  • Clients Should Not Use a Cached Copy After This
    Date Without Rechecking
  • Last-Modified
  • Location
  • Used For Redirection
  • MIME-Version
  • Pragma
  • E.g. no-cache
  • Retry-After
  • When the Server is Unavailable, Tells the Client When
    to Try Again
  • Server
  • Name/Version of the HTTP Server

21
Common HTTP Header Fields (4)
  • Title
  • Descriptive Title of the File
  • WWW-Authenticate
  • When Authorization Denied, Tells Client Which
    Methods of Authentication are Supported
  • HTTP Status Codes
  • Returned By the Server In First Line of Response
  • Informational (100-199)
  • Successful (200-299)
  • Redirection (300-399)
  • Location in HTTP Header Specifies Redirection
  • Client Error (400-499)
  • Server Error (500-599)
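
Since the first digit carries the class, a client can act on codes it has never seen. A minimal sketch that reads the first line of a response and classifies the code; both function names are hypothetical.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def parse_status_line(line: str):
    """Split e.g. 'HTTP/1.0 404 Not Found' into version, code, reason phrase."""
    parts = line.split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason

def status_class(code: int) -> str:
    """Classify a status code by its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]

version, code, reason = parse_status_line("HTTP/1.0 404 Not Found")
print(code, status_class(code))   # 404 Client Error
```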

22
Common Status Values
  • 200 - OK
  • 201 - Created (POST Request Was Fulfilled)
  • 204 - No Content (OK, But Nothing For Client to
    Display)
  • 300 - Multiple Choices
  • Requested Resource Available From Multiple
    Locations
  • List of Locations Returned in the Response
  • 301 - Moved Permanently
  • 302 - Moved Temporarily
  • 304 - Not Modified
  • Document Hasn't Been Modified Since the If-Modified-
    Since Date
  • 400 - Bad Request
  • 401 - Unauthorized
  • 403 - Forbidden
  • 404 - Not Found
  • 500 - Internal Server Error
  • 501 - Not Implemented (Server Does Not Support
    This Request)
  • 502 - Bad Gateway (Invalid Response From an Upstream Server)
  • 503 - Service Unavailable

23
Cookies
  • Cookies Are Name/Value Pairs
  • Stored by the Client
  • Passed in the HTTP Header
  • Cookies Have Associated Expiration
  • Session (Default)
  • Date / Time
  • Associated With a URL Path, Not a Page!
  • Allows Passing Parameters Between Web Pages
  • Thus Cookies are Used to Provide State
    Information to a Stateless Protocol
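
The round trip can be sketched with Python's standard http.cookies module: the server sends a Set-Cookie header, and the client returns the pair in a Cookie header on later requests under the same path. The helper names and the session_id cookie are hypothetical.

```python
from http.cookies import SimpleCookie

def make_set_cookie(name: str, value: str, path: str = "/") -> str:
    """Server side: a response header asking the client to store a cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = path   # returned for every URL under this path
    return cookie.output()

def read_cookie(cookie_header: str, name: str):
    """Server side, later request: pull one value out of the Cookie header."""
    morsel = SimpleCookie(cookie_header).get(name)
    return morsel.value if morsel else None

print(make_set_cookie("session_id", "abc123", "/shop"))
# Set-Cookie: session_id=abc123; Path=/shop
```

With no expires or max-age attribute, the cookie is a session cookie and is discarded when the browser closes.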

24
Web Server HTTP Functionality
  • Content Negotiation
  • Choose From Several Different Formats Based on
    Request
  • Language Negotiation
  • Choose From Versions of Same Document Based on
    Request
  • Support for HTTP PUT, HTTP DELETE
  • Keep-Alive
  • As-Is
  • Server Doesn't Add HTTP Headers
  • Allows You to Create Specific Behavior
  • Redirect to Another Site
  • Never Saved in Browser's Cache

25
Some Definitions
  • Hits
  • Each HTTP Request is a Hit
  • Accessing a Web Page May Result in Multiple Hits
  • E.g. Each Graphic is a Hit
  • Page Views
  • Accessing a Single Web Page is a Page View
  • E.g. Typing in a URL or Clicking on a Link
  • Visits
  • A Single Client's Visit to Your Entire Site
    (Session)
  • May Include Multiple Page Views
  • What Constitutes a Second Visit From the Same
    Client?
  • Why is This Important?
  • Terms are Sometimes Used Interchangeably and
    Improperly
  • Compare Apples to Apples
  • Important for Commercial Web Sites
  • Advertising is Based on Site Access
  • Typically Sold on Page View Basis
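
The three measures can be computed from the same list of requests. This is a minimal sketch under two loudly stated assumptions, not a standard definition: only HTML files and directory URLs count as page views, and a gap of more than 30 minutes from the same client starts a new visit. The Hit type and summarize function are hypothetical.

```python
from dataclasses import dataclass

# Assumptions for this sketch, not industry rules:
PAGE_EXTENSIONS = (".htm", ".html")   # what counts as a page view
VISIT_TIMEOUT = 30 * 60               # seconds of inactivity ending a visit

@dataclass
class Hit:
    client: str    # IP address or host name from the log
    time: float    # seconds since some fixed starting point
    path: str      # requested URL path

def is_page_view(path: str) -> bool:
    return path.endswith("/") or path.lower().endswith(PAGE_EXTENSIONS)

def summarize(hits):
    page_views = [h for h in hits if is_page_view(h.path)]
    visits = 0
    last_seen = {}
    for h in sorted(page_views, key=lambda h: h.time):
        previous = last_seen.get(h.client)
        if previous is None or h.time - previous > VISIT_TIMEOUT:
            visits += 1                # first page view, or a long gap
        last_seen[h.client] = h.time
    return {"hits": len(hits), "page_views": len(page_views), "visits": visits}

hits = [
    Hit("192.0.2.1", 0, "/index.htm"),
    Hit("192.0.2.1", 1, "/logo.gif"),          # a hit, not a page view
    Hit("192.0.2.1", 60, "/products.htm"),
    Hit("192.0.2.1", 60 + 31 * 60, "/index.htm"),  # 31 minutes later
    Hit("192.0.2.2", 5, "/"),
]
print(summarize(hits))   # {'hits': 5, 'page_views': 4, 'visits': 3}
```

Changing either assumption changes the counts, which is one reason figures from different sites are hard to compare.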

26
Server Log Files
  • Many Variations to Web Server Log File Formats
  • Four Log Files
  • Access (Transfer) Log
  • Each Hit is Recorded
  • User, Date/Time, HTTP Request, etc.
  • Error Log
  • Date/Time, Error
  • Referrer Log
  • Referring Page, Destination Page
  • Agent (User) Log
  • Client's Browser
  • Clearly a Need for Standardization
  • Linking the Four Log Files Together

27
Common Log Format
  • Host
  • IP Address (or Hostname) of Client
  • Some Servers Perform Lookup of IP Address
  • RFC931
  • Remote User Identity From the ident Protocol (RFC 931)
  • Seldom Used.
  • Authuser
  • HTTP Request Authorization
  • UserName if Username Authorization is Required
  • Time Stamp
  • HTTP Response Date
  • E.g. 10/Jun/1998:14:23:34 -0700
  • Request
  • The Actual HTTP Request
  • E.g. GET /index.htm HTTP/1.1

28
Common Log Format (2)
  • Status
  • The HTTP Response Status Code
  • Transfer Volume
  • HTTP Response Content-Length
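
The seven fields can be pulled apart with one regular expression. A minimal sketch; the pattern and function name are hypothetical, and because it uses match() rather than fullmatch(), extra fields appended by extended formats (such as referrer and user agent) are tolerated and ignored.

```python
import re
from datetime import datetime

# host rfc931 authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<rfc931>\S+) (?P<authuser>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_clf_line(line: str) -> dict:
    match = CLF_PATTERN.match(line)
    if match is None:
        raise ValueError(f"not a Common Log Format line: {line!r}")
    fields = match.groupdict()
    fields["timestamp"] = datetime.strptime(fields["timestamp"],
                                            "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    # "-" in the bytes field means no body was sent.
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields

line = '192.0.2.10 - - [10/Jun/1998:14:23:34 -0700] "GET /index.htm HTTP/1.1" 200 2326'
entry = parse_clf_line(line)
print(entry["request"], entry["status"])   # GET /index.htm HTTP/1.1 200
```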

29
Extended Log File Format
  • Seven Common Log Format Fields Plus
  • Referrer
  • HTTP Request Referrer
  • User Agent
  • HTTP Request User-Agent
  • Identifies Browser
  • Other Common Fields
  • Cookies
  • Can Help Identify Users

30
Issues
  • Client vs. User
  • Typically Don't Have User Level Information
  • Only Record IP Address of Computer Used For
    Access
  • If Fixed IP Address For a Single User's Machine
  • This Can Identify the User
  • Dynamically Assigned IP Addresses
  • Identifies the Overall Domain (e.g. AOL.com)
  • Proxy Servers
  • All Clients Have IP Address of Proxy Server
  • Multiple Sessions at Same Time
  • Impossible to Have Truly Accurate Information
  • Log File Analysis Software Has Algorithms to
    Identify Page Views, Visits
  • Client Level Caching Affects Logs
  • ISP Level Caching Affects Logs
  • E.g. AOL Maintains a Cache
  • No Requirement for Clients, ISPs to Follow
    Expiration Info

31
Log File Maintenance on Server
  • Log Files Grow Rapidly
  • Log Files Compress Very Nicely
  • Server Configurable
  • Generate Daily/Weekly/Monthly Logs
  • Maintenance Scripts to Cleanup Log Files
  • Compress
  • Archive
  • Cycle
  • E.g. Maintain Current Month's Files
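
A maintenance script of the compress / archive / cycle kind could look like this. A minimal sketch under stated assumptions: the function names and the dated archive naming scheme are hypothetical, each archive directory holds archives of one log only, and scheduling (e.g. daily or monthly from cron) is left to the system.

```python
import gzip
import shutil
from datetime import date
from pathlib import Path

def rotate_log(log_path: Path, archive_dir: Path, today: date) -> Path:
    """Compress the current log into archive_dir, then empty the live log."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    archive = archive_dir / f"{log_path.name}.{today:%Y-%m-%d}.gz"
    with open(log_path, "rb") as src, gzip.open(archive, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # "Copy then truncate": empty the log in place. Many setups instead rename
    # the file and signal the server to reopen its logs.
    log_path.write_bytes(b"")
    return archive

def prune_archives(archive_dir: Path, keep: int) -> None:
    """Delete all but the newest `keep` archives (dated names sort in order)."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    for old in sorted(archive_dir.glob("*.gz"))[:-keep]:
        old.unlink()
```

Text logs are highly repetitive, so the .gz archives are typically a small fraction of the original size.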

32
Log File Analysis
  • Big Business
  • Bread and Butter of Sites Driven By Advertising
    Revenue
  • Evaluation Factors
  • Log File Formats Supported
  • Ability to Link Multiple Logs
  • How Log Files are Accessed (e.g. via FTP)
  • Display Methodology
  • E.g. Available Via Web Pages
  • Lookup Capabilities
  • E.g. Map User-Agent to Browser
  • E.g. Resolve IP Addresses to Domains, Regions
  • Level of Analysis
  • E.g. Calculating Visits, Return Visitors
  • Configurability
  • Drill-Down Capabilities
  • Enterprise Capabilities
  • Ability to Manage Multiple Sites

33
Log File Analysis Options
  • Important to Understand the Core Log Files
  • Log File Analysis Programs Make Some Assumptions
  • Freeware
  • Commercial
  • Service Bureaus

34
Resources
  • HTTP
  • Server Comparison
  • http://webcompare.internet.com/chart.htm
  • Apache Server
  • www.apache.org
  • Website Server
  • http://website.ora.com
  • Microsoft IIS
  • http://www.microsoft.com/NTWorkstation/downloads/Recommended/ServicePacks/NT4OptPk/Default.asp