LIS901N lecture 5: http URI and apache

Slides: 75

Provided by: open6

Learn more at: https://openlib.org

1
LIS901N lecture 5: http, URI and apache

Thomas Krichel
2003-01-19

2
Structure

http
URI
apache

3
http

Stands for the hypertext transfer protocol. This
is the most important application layer protocol
on the Internet today, because it provides the
foundation for the world wide web.
It is defined in Fielding, Roy T., James Gettys,
Jeffrey C. Mogul, Paul J. Leach and Tim Berners-Lee,
"Hypertext Transfer Protocol -- HTTP/1.1"
(1999), RFC 2616.

4
history

1990: version 0.9 allows for the transfer of raw
data.
1996: RFC 1945 defines version 1.0 by adding
attribute: value headers.
1999: RFC 2616 adds support for
hierarchical proxies,
caching,
virtual hosts and some
support for persistent connections,
and is more stringent.

5
http resource identification

identification of resources is assumed through
Uniform Resource Identifiers (URI).
As far as http is concerned, URIs are strings.
http can use "absolute" and "relative" URIs.
A URL is a special case of a URI.

6
rfc about http

An application-level protocol for distributed,
collaborative, hypermedia information systems.
HTTP is also used as a generic protocol for
communication between user agents and
proxies/gateways to other Internet systems,
including those supported by the SMTP, NNTP, FTP,
Gopher, and WAIS protocols. In this way, HTTP
allows basic hypermedia access to resources
available from diverse applications.

7
overall operation client side

Client sends request, required items are
method
request URI
protocol version
optional items are
request modifiers
client information

8
overall operation server side

Server sends response, required items are
status line
protocol version
success or error code
optional items are
server information
body

9
middleman

intermediaries come in three flavors
proxies, i.e. forwarding agents
gateways, i.e. receiving agents
tunnels, i.e. relay points that do not change the
message such as an encryption and decryption
device

10
http assumes transport

http assumes that there is a reliable way to
transport data from one host on the Internet to
another one.
http runs over TCP connections. In version 1.0
each request and its response used a separate TCP
connection; version 1.1 keeps connections open by
default. The default is TCP port 80, but
other ports can be used.

11
Absolute http URL

the absolute http URL is
http://host[:port][abs_path[?query]]
If abs_path is empty, it is /.
The scheme name "http" and the host name are
case-insensitive.
Characters other than those in the "reserved"
and "unsafe" sets of RFC 2396 are equivalent to
their "% HEX HEX" encoding.
Optional components are in [ ].
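
The URL components above can be illustrated with a small Python sketch. It is only an illustration, assuming Python's standard urllib.parse module; the function name http_url_parts and the host example.org are made up for this example and do not come from the lecture.

```python
from urllib.parse import urlsplit

def http_url_parts(url):
    """Split an absolute http URL into host, port, abs_path and query."""
    parts = urlsplit(url)
    # urlsplit lower-cases the scheme, which is case-insensitive anyway.
    if parts.scheme != "http":
        raise ValueError("not an http URL: " + url)
    return {
        "host": parts.hostname,         # hostname is returned in lower case
        "port": parts.port or 80,       # the port is optional, default 80
        "abs_path": parts.path or "/",  # an empty abs_path is equivalent to /
        "query": parts.query,           # the query is optional
    }

print(http_url_parts("HTTP://Example.ORG"))
print(http_url_parts("http://example.org:8080/home/krichel?lang=en"))
```

The first call shows the defaults at work: port 80 and the path / are filled in when the URL leaves them out.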

12
character sets

A character set is a method used with one or more
tables to convert a sequence of binary digits
into a sequence of characters.
http shares the same registry as MIME, the
multipurpose Internet mail extensions. It is based at the
IANA, at
http://www.isi.edu/in-notes/iana/assignments/media-types/media-types
The default character set is ISO-8859-1.

13
http messages

There are two types of messages.
Requests are sent from the client to the server.
Responses are sent from the server to the client.
The generic format is the same as for email
messages
start line
message headers
empty line
body
Empty lines before the start line are ignored.
The request's start line is called the
request-line
The response start line is called the
status-line.
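
The generic message format can be sketched in Python as follows. This is a simplified illustration, not a full parser: the function name parse_message is made up here, and the sketch ignores header lines that are folded over several lines.

```python
def parse_message(raw):
    """Split a raw http message into start line, headers and body."""
    # Empty lines before the start line are ignored.
    raw = raw.lstrip("\r\n")
    # The first empty line separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body

request = ("\r\nGET /home/krichel HTTP/1.1\r\n"
           "Host: openlib.org\r\n"
           "Accept: text/html\r\n"
           "\r\n")
print(parse_message(request))
```

Here the start line is a request-line; the same function can split a response, where the start line is the status-line.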

14
The request headers

Accept Accept-Charset
Accept-Encoding Accept-Language
Authorization Expect
From Host
If-Match If-Modified-Since
If-None-Match If-Range
If-Unmodified-Since Max-Forwards
Proxy-Authorization Range
Referer TE
User-Agent

15
The status line

The status line is a single line of
the form
HTTP-Version Status-Code Reason-Phrase
The status code is a 3-digit number used by the
computer.
The reason phrase is a friendly note for a human to
read.
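
As a small sketch of how a client might take the status line apart, the following Python function splits it into its three parts. The name parse_status_line is made up for this example.

```python
def parse_status_line(line):
    """Split a status line into version, numeric code and reason phrase."""
    parts = line.strip().split(" ", 2)
    version = parts[0]
    code = int(parts[1])  # the 3-digit number used by the computer
    # The reason phrase may contain spaces, and may be empty.
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason

print(parse_status_line("HTTP/1.1 404 Not Found"))
```

Splitting at most twice keeps a reason phrase such as "Not Found" in one piece.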

16
Status code classes

1xx Informational: Request received, continuing
process
2xx Success: The action was successfully received,
understood, and accepted
3xx Redirection: Further action must be taken in
order to complete the request
4xx Client Error: The request contains bad syntax
or cannot be understood
5xx Server Error: The request is valid but cannot
be executed by the server
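
Because the class is given by the first digit, software can treat a code it does not know like the general code of its class. A minimal Python sketch, with a made-up function name status_class:

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def status_class(code):
    """Return the class of a 3-digit status code, based on its first digit."""
    return STATUS_CLASSES.get(code // 100, "Unknown")

for code in (100, 206, 304, 404, 505):
    print(code, status_class(code))
```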

17
Status codes

100 Continue
101 Switching Protocols
200 OK
201 Created
202 Accepted
203 Non-Authoritative Information
204 No Content
205 Reset Content
206 Partial Content

18
Status codes II

300 Multiple Choices
301 Moved Permanently
302 Found
303 See Other
304 Not Modified
305 Use Proxy
307 Temporary Redirect

19
Error codes III

400 Bad Request
401 Unauthorized
402 Payment Required
403 Forbidden
404 Not Found
405 Method Not Allowed
406 Not Acceptable
407 Proxy Authentication Required
408 Request Time-out

20
Error codes IV

409 Conflict
410 Gone
411 Length Required
412 Precondition Failed
413 Request Entity Too Large
414 Request-URI Too Large
415 Unsupported Media Type
416 Requested range not satisfiable
417 Expectation failed

21
Error codes V

500 Internal Server Error
501 Not Implemented
502 Bad Gateway
503 Service Unavailable
504 Gateway Time-out
505 HTTP Version not supported

22
Response headers

Accept-Ranges
Age
ETag
Location
Proxy-Authenticate
Retry-After
Server
Vary
WWW-Authenticate

23
Entity headers, common to response and request

Allow
Content-Encoding
Content-Language
Content-Length
Content-Location
Content-MD5
Content-Range
Content-Type
Expires
Last-Modified

24
The body

The entity-body (if any) sent with an HTTP
request or response is in a format and encoding
defined by the entity-header fields.
When an entity-body is included with a message,
the data type of that body is determined via the
header fields Content-Type and Content-Encoding

25
GET and HEAD methods

The GET method means retrieve whatever
information (in the form of an entity) is
identified by the Request-URI. If the Request-URI
refers to a data-producing process, it is the
produced data which shall be returned as the
entity in the response and not the source text of
the process, unless that text happens to be the
output of the process.
The HEAD method is identical to GET except that
the server MUST NOT return a message-body in the
response.

26
Conditional and partial GET

The semantics of the GET method change to a
"conditional GET" if the request message
includes an
If-Modified-Since,
If-Unmodified-Since,
If-Match,
If-None-Match or
If-Range header.
The semantics of the GET method change to a
"partial GET" if the request message includes a
Range header field. A partial GET requests that
only part of the entity be transferred.
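
To make the two variants concrete, here is a Python sketch of what a server might do with an If-Modified-Since header and with a simple Range header. It is a sketch under assumptions: the function names are made up, only one conditional header is handled, and only byte ranges of the form first-last or first- are understood.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def conditional_get_status(request_headers, last_modified):
    """Return 304 if the entity is unchanged since If-Modified-Since, else 200."""
    since = request_headers.get("If-Modified-Since")
    if since is None:
        return 200  # a plain GET
    try:
        since_dt = parsedate_to_datetime(since)
    except (TypeError, ValueError):
        return 200  # a date that cannot be parsed is ignored
    return 304 if last_modified <= since_dt else 200

def byte_range(range_header, length):
    """Return (first, last) byte positions for a simple Range value, or None."""
    unit, _, spec = range_header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        return None  # other units and multiple ranges are not handled here
    first, _, last = spec.strip().partition("-")
    if not first.isdigit():
        return None  # suffix ranges such as bytes=-500 are not handled here
    start = int(first)
    end = int(last) if last.isdigit() else length - 1
    end = min(end, length - 1)
    return (start, end) if start <= end else None

modified = datetime(2003, 1, 19, 10, 0, 0, tzinfo=timezone.utc)
print(conditional_get_status(
    {"If-Modified-Since": "Sun, 19 Jan 2003 12:00:00 GMT"}, modified))
print(byte_range("bytes=0-499", 808))
```

A partial GET that can be satisfied is answered with 206 Partial Content rather than 200.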

27
The POST method

The POST method is used to request that the
origin server accept the entity enclosed in the
request as a new subordinate of the resource
identified by the Request-URI in the
Request-Line. POST is designed to allow a uniform
method to cover the following functions
Annotation of existing resources
Posting a message to a bulletin board, newsgroup,
mailing list, or similar group of articles
Providing a block of data, such as the result of
submitting a form, to a data-handling process
Extending a database through an append operation.

28
PUT and DELETE methods

The PUT method requests that the enclosed entity
be stored under the supplied Request-URI. If the
Request-URI refers to an already existing
resource, the enclosed entity should be
considered as a modified version of the one
residing on the origin server.
The DELETE method requests that the origin server
delete the resource identified by the Request-URI.

29
URIs (background)

URI: uniform resource identifier
Originally, a generalization of
URL (uniform resource locator),
URN (uniform resource name),
URC (uniform resource citation),
and potentially others,
but mainly, URL and URN

30
The difference (in theory) between URL and URN

a URL is bound to a location
when the resource moves, the URL changes
a URN is a name
thus location independent, and, in theory,
persistent (whatever persistent means)

31
The Other View

Distinction between URL and URN is artificial
Both terms should be abolished and replaced by
URI
thus all identifier schemes would be URI
schemes (even http) and no prefix would be
necessary (URL, URN, or even URI).

32
Reasoning

Original URI philosophy
URLs were a short-term solution and URNs the
long-term one.
URL would be a temporary identification mechanism
until a location-independent, persistent
identifier was developed, the URN.
Now it seems
URNs won't be any more persistent than URLs.
persistence is a social problem, not a technical
problem

33
URI vs URL

The term URL or Uniform Resource Locator is
not used in standards anymore. It generally means
a URI that contains a domain name, but it is
historical only.
This presentation uses the term URI exclusively.
The term URL is still sufficient to convey the
meaning but should not be used when precision is
necessary.

34
What does a URI identify?

A URI identifies a Resource.
A URI only comes into existence when it is bound
to a Resource.
A Resource is defined as anything that is
identified by a URI.
Resources only come into existence when a URI is
bound to them.
A URI cannot exist without a Resource.
A Resource cannot exist without a URI.

35
it all comes from Plato

The "URI identifies an abstract Resource"
formalism assumes the Platonic concept of form.
A Resource, once bound to a URI and brought into
existence, is only the abstract essence of the
real world thing we perceive.
Any physical or digital version of that Resource
is only one of all possible physical
representations of that Resource.
For example, http://openlib.org/home/krichel is a
URI for a homepage. Using language and content
negotiation it is possible to request that page
in many languages and formats. Which version is
the Resource?
Answer: none of them. Each is only a
representation. It is possible to assign a URI to
even the representations. But even still, each
Resource is only the abstraction of the physical
or digital thing, not the thing itself.

36
What is resolution?

Resolution means accessing some representation
of the Resource that a URI identifies.
For http://foo.com/ it means accessing the
homepage of foo.com.
For mailto:krichel@openlib.org it can mean
sending an email message to that address.
For URIs that contain network location
information it is simply a matter of visiting
that location and doing some function. I.e.
foo.com is the exact network host that can give
you the web page.

37
The history

Tim Berners-Lee came to the IETF in 1992 to
develop the WorldWideWeb standards. At the time
URIs were known as Universal Resource Locators.
RFC 1738 Uniform Resource Locators (URL) was
published in 1994.
RFC 1738 was updated by RFC 1808, RFC 2368, RFC
2396.
RFC 2396 Uniform Resource Identifiers (URI)
Generic Syntax is the current standard.
RFC 2396 may be updated to reflect developments
in internationalization, terminology updates, and
registration procedures.

38
Confusion

Due to misunderstandings and the formation of the
W3C separately from the IETF, there was a long
term disagreement on certain aspects of URIs,
especially when it came to Uniform Resource Names
(URNs).
A joint IETF/W3C URI Interest Group was formed in
2000 to investigate work that needed to be done
with URIs in general.
That group published "URIs, URLs, and URNs:
Clarifications and Recommendations", a report from
the joint W3C/IETF URI Planning Interest Group
(draft-mealling-uri-ig-01.txt), which begins to
clarify the problems and proposes solutions.

39
URN Uniform Resource Names

Are defined by RFC 2141 as a particular URI
scheme with these characteristics
Permanent: Once a URN is assigned to some
Resource it can never be re-assigned to something
else.
Location independent: The actual URN should not
contain any network location information such as
domain names, IP addresses, file path names, etc.

40
RFC2396

Berners-Lee, Tim, Roy T. Fielding and Larry
Masinter (1998) "Uniform Resource Identifiers
(URI): Generic Syntax", RFC 2396
A Uniform Resource Identifier (URI) is a compact
string of characters for identifying an abstract
or physical resource.
They provide a simple and extensible means for
identifying a resource.

41
operations on a URI

There is a set of operations that can be applied
to URIs. For example, for a URL, the access to
the resource.
To understand if a given URI instance is valid,
we have to study the operations applied to URIs.

42
benefits of uniformity

It allows different types of resource identifiers
to be used in the same context, even when the
mechanisms used to access those resources may
differ
it allows uniform semantic interpretation of
common syntactic conventions across different
types of resource identifiers
it allows introduction of new types of resource
identifiers without interfering with the way that
existing identifiers are used
it allows the identifiers to be reused in many
different contexts, thus permitting new
applications or protocols to leverage a
pre-existing, large, and widely-used set of
resource identifiers.

43
Resources and Identity in the RFC

A resource can be anything that has identity.
Not all resources are "network retrievable".
The resource is the conceptual mapping to an
entity or set of entities, not necessarily the
entity which corresponds to that mapping at any
particular instance in time.
An identifier is an object that can act as a
reference to something that has identity. In the
case of URI, the object is a sequence of
characters with a restricted syntax.

44
URI, URL, URN in the RFC

A URI can be further classified as a locator, a
name, or both. The term "Uniform Resource
Locator" (URL) refers to the subset of URI that
identify resources via a representation of their
primary access mechanism (e.g., their network
location), rather than identifying the resource
by name or by some other attribute(s) of that
resource.
The term "Uniform Resource Name" (URN)
refers to the subset of URI that are required to
remain globally unique and persistent even when
the resource ceases to exist or becomes
unavailable.

45
URN in the RFC

A URN differs from a URL in that its primary
purpose is persistent labeling of a resource with
an identifier. That identifier is drawn from one
of a set of defined namespaces, each of which has
its own set name structure and assignment
procedures. The "urn" scheme has been reserved
to establish the requirements for a standardized
URN namespace, as defined in "URN Syntax"
[RFC2141] and its related specifications.

46
transcribability

The URI syntax was designed with global
transcribability as one of its main concerns. A
URI is a sequence of characters from a very
limited set, i.e. the letters of the basic Latin
alphabet, digits, and a few special characters.
A URI may be represented in a variety of ways.

47
consequences of transcribability

A URI is a sequence of characters, which is not
always represented as a sequence of octets.
A URI may be transcribed from a non-network
source, and thus should consist of characters
that are most likely to be able to be typed into
a computer, within the constraints imposed by
keyboards (and related input devices) across
languages and locales.
A URI often needs to be remembered by people, and
it is easier for people to remember a URI when it
consists of meaningful components.

48
URI characters

URI consist of a restricted set of characters,
not a sequence of octets. The allowable characters
are primarily chosen to aid transcribability and
usability both in computer systems and in
non-computer communications. Characters used
conventionally as delimiters around URI are
excluded.
In the simplest case, the original character
sequence contains only characters that are
defined in US-ASCII, and the two levels of
mapping are simple and easily invertible: each
'original character' is represented as the octet
for the US-ASCII code for it, which is, in turn,
represented as either the US-ASCII character or
the "%" escape sequence for that octet.

49
reserved characters

Many URI include components consisting of, or
delimited by, certain special characters. These
characters are called "reserved", since their
usage within the URI component is limited to
their reserved purpose. If the data for a URI
component would conflict with the reserved
purpose, then the conflicting data must be
escaped before forming the URI.
They are ; / ? : @ & = + $ ,
They are allowed within a URI, but may not
be allowed within a particular component of the
generic URI syntax.

50
unreserved and excluded characters

Unreserved characters are allowed and
never take any special meaning. They are
the upper and lowercase letters a to z and A to
Z
the decimal digits 0 to 9
the following: - _ . ! ~ * ' ( )
All characters that are neither reserved nor
unreserved are excluded, for example
< > # % "
and the blank.
They have to be escaped.

51
escaping

When you want to use a character in a URI that
is one of the excluded characters, you have to
escape it. The way that this is done is to write a
construction of the form
%hexhex
where hex is a digit or one of the letters a to f
(uppercase or lowercase). The two hex characters
represent the octet value of the character
in hex. For example, %7e is the character ~.
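
Escaping can be tried out in Python. The sketch below first escapes by hand, to show the mechanism, and then uses the quote and unquote functions of the standard urllib.parse module. The function name escape_char is made up, and encoding characters as UTF-8 before escaping is an assumption of this sketch, since RFC 2396 does not fix a character encoding.

```python
from urllib.parse import quote, unquote

def escape_char(ch):
    """Escape one character as a percent sign plus two hex digits per octet."""
    # Assumption: the character is turned into octets using UTF-8.
    return "".join("%{:02x}".format(octet) for octet in ch.encode("utf-8"))

print(escape_char(" "))         # the blank
print(escape_char("<"))
print(unquote("%7e"))           # hex digits may be lower case
print(unquote("%7E"))           # or upper case
print(quote("a b<c>", safe=""))
```

Note that quote writes the hex digits in upper case, which is equivalent to the lower-case form.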

52
The Semantic Web

The W3C has been developing a new architecture
that applies knowledge representation technology
to the WWW.
Using the Resource Description Framework (RDF),
Statements are made using a Subject, Predicate
and Object (very similar to Lisp and other
predicate based languages).
Each Subject, Predicate or Object is a Resource
in the URI sense and is identified by a URI
within an RDF Statement using XML Namespaces.

53
example

This statement says that the Resource identified
by the URI http://openlib.org/home/krichel was
created by the person Thomas Krichel
<?xml version="1.0"?>
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Description about="http://openlib.org/home/krichel">
    <Creator xmlns="http://description.org/schema/">Thomas Krichel</Creator>
  </Description>
</RDF>
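
To see the Subject, Predicate and Object in such a statement, the XML can be read with Python's standard xml.etree.ElementTree module. This is only a sketch: it uses the creator named in the statement, the function name statements is made up, and it handles nothing beyond this simple form of RDF.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<?xml version="1.0"?>
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Description about="http://openlib.org/home/krichel">
    <Creator xmlns="http://description.org/schema/">Thomas Krichel</Creator>
  </Description>
</RDF>"""

def statements(rdf_xml):
    """Return (subject, predicate, object) triples from simple RDF/XML."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for description in root.findall("{%s}Description" % RDF_NS):
        subject = description.get("about")
        for prop in description:
            # ElementTree writes a qualified name as {namespace}localname.
            triples.append((subject, prop.tag, prop.text))
    return triples

for triple in statements(DOC):
    print(triple)
```

The predicate comes out as the namespace URI joined with the element name, which shows how XML Namespaces turn an element into a URI.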

54
The Semantic Web

The combination of Web Services and the Semantic
Web should give the Web the ability to turn any
existing Web Resource into a full node in a
purposefully built knowledge representation
system with a functional component that allows
that knowledge to be acted on.
And both are based on the simple Uniform Resource
Identifier.

55
Apache

Is a free, open-source web server that is
produced by the Apache Software Foundation, see
http://www.apache.org
It has over 50% of the market share.
It runs best on Unix systems but can run on a
Mickeysoft OS as well.
I will cover it here because it is freely
available.
I am covering version 1.3.

56
Apache in debian

/etc/apache/httpd.conf is the main configuration
file.
/etc/init.d/apache action, where action is one of
start
stop
restart
is used to fire the daemon up or down.
The daemon runs as user www-data.

57
Virtual host

On a single installation of apache several web
servers can be supported.
That means the server can behave in a different
way according to how it is being addressed.
The easiest way to implement addressing a server
in different ways is through DNS host names.

58
Directives in httpd.conf

The configuration directives are grouped into
three basic sections
Directives that control the operation of the
Apache server process as a whole (the 'global
environment').
Directives that define the parameters of the
'main' or 'default' server, which responds to
requests that aren't handled by a virtual host.
These directives also provide default values for
the settings of all virtual hosts.
Settings for virtual hosts, which allow Web
requests to be sent to different IP addresses or
hostnames and have them handled by the same
Apache server process.

59
Server type

On a Unix machine, the server can either be fired
up on its own, or it can be run as part of the
overall Internet daemon inetd.
Usually standalone is used.

60
Server root

Sets the directory where apache finds its own
configuration files.
If log file names are not given as absolute
paths, they will be taken relative to the server root
directory.

61
Timeout

This sets the number of seconds that the server
waits for the result of a request to be computed
before sending a timeout.
On wotan this is set to 300 seconds. This is
rather a long time; the user will have gone for
coffee by then.

62
Listen

Tells the server which port and IP address to
listen to. This can be used to have the server
respond only to requests to a certain IP
address or to listen to a non-standard port, i.e.
not port 80.

63
LoadModule

To extend apache, modules have been written. They have
to be loaded explicitly with
LoadModule module file
where module is the name of the module and file
is the name of the file that contains the module.
Looking at these lines gives you vital information about
what the server can do.

64
Server directives

User
Gives the user name apache runs under
Group
Gives the group name the server runs under
ServerAdmin
Email of a human who runs the default server
ServerName
The name of the default server
DocumentRoot
The top level directory of the default server

65
Directory options

Many options for a directory can be set with
<Directory name> instructions </Directory>
Name is the name of a directory.
Instructions can be a whole lot of stuff.

66
Directory instructions

Options sets global options for the directory, it
can be
None
All
Or any of
Indexes (generate directory indexes)
Includes (allow server-side includes)
FollowSymLinks (allow the server to follow symbolic
links)
ExecCGI (allow CGI scripts to be executed)
MultiViews (allow content negotiation by MultiViews)

67
Access control

Can be part of <Directory> to set directory level
access control
Example
Allow from friendly.com
Deny from evil.com
Sometimes you have to set the order, example
Order allow,deny
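
The effect of the order can be sketched in Python. With allow,deny a client is refused unless it matches an Allow line and no Deny line; with deny,allow a client is let in unless it matches a Deny line and no Allow line. The function names are made up, and matching is simplified to host names and domain suffixes; IP addresses and networks are left out.

```python
def host_matches(host, pattern):
    """True if host is the pattern, lies inside that domain, or pattern is all."""
    host, pattern = host.lower(), pattern.lower()
    return pattern == "all" or host == pattern or host.endswith("." + pattern)

def access_allowed(host, order, allow=(), deny=()):
    """Decide access for host under the given Order setting."""
    allowed = any(host_matches(host, p) for p in allow)
    denied = any(host_matches(host, p) for p in deny)
    if order == "allow,deny":
        # Refused by default; a matching Deny overrides a matching Allow.
        return allowed and not denied
    if order == "deny,allow":
        # Admitted by default; a matching Allow overrides a matching Deny.
        return allowed or not denied
    raise ValueError("unknown order: " + order)

rules = {"allow": ["friendly.com"], "deny": ["evil.com"]}
for host in ("www.friendly.com", "host.evil.com", "elsewhere.org"):
    print(host, access_allowed(host, "allow,deny", **rules))
```

The host elsewhere.org is a made-up name that matches neither line, so the outcome for it depends only on the order.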

68
Authentication

This is used to enable password access. In that
case the authentication is handled by a file
.htaccess in the directory.
The AllowOverride instruction is used to state
what the user can do within the .htaccess file.
Depending on its values, you can password protect
a web site.
We will not discuss this further here.

69
UserDir

This sets the directory that is created by the
user in her home directory to be accessed by
requests to /~user.
On wotan, we have
UserDir public_html
That is the default, actually.
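
The mapping from a request to a file can be sketched in Python. This is a simplification: it assumes that all home directories sit under /home, whereas the server looks up the real home directory of the user, and it does not guard against paths that contain "..". The function name userdir_path is made up.

```python
import posixpath

def userdir_path(url_path, userdir="public_html", home="/home"):
    """Map a request path of the form /~user/rest to a file-system path."""
    if not url_path.startswith("/~"):
        return None  # not a request for a user directory
    user, _, rest = url_path[2:].partition("/")
    if not user:
        return None
    # Assumption: the home directory of user is home/user.
    return posixpath.join(home, user, userdir, rest)

print(userdir_path("/~krichel/index.html"))
```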

70
Set up permission for user home directories

<Directory /home/*/public_html>
    AllowOverride FileInfo AuthConfig Limit
    Options Includes
    Options MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
    <Limit GET POST OPTIONS PROPFIND>
        Order allow,deny
        Allow from all
    </Limit>
    <Limit PUT DELETE PATCH PROPPATCH MKCOL COPY MOVE LOCK UNLOCK>
        Order deny,allow
        Deny from all
    </Limit>
</Directory>

71
Logs

The web server logs every transaction.
There are several types of logs that used to be
kept separately, in early days.
209.73.164.50 - - [26/Jan/2003:09:19:51 -0500] "GET /ramon/videos/ntsc175.html HTTP/1.1" 206 808
Additional information may be kept in the referer
and user agent log.
The referer log may have some interesting
information on who links to your pages.
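
A log line of this kind can be taken apart with a regular expression. The Python sketch below assumes the usual punctuation of the common log format, with the time in square brackets and the request in double quotes; the name parse_clf is made up.

```python
import re
from datetime import datetime

CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf(line):
    """Parse one line in common log format into a dictionary, or None."""
    match = CLF.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A dash in the size field means that no body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry

entry = parse_clf('209.73.164.50 - - [26/Jan/2003:09:19:51 -0500] '
                  '"GET /ramon/videos/ntsc175.html HTTP/1.1" 206 808')
print(entry["host"], entry["request"], entry["status"], entry["size"])
```

The status 206 in this line shows that the client made a partial GET and received 808 bytes.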

72
Alias

Is a directive to make links between things that
are seen at the URL level and the file structure
on the physical machine. The URL path comes first,
the file-system path second.
Example
Alias /stuff /home/krichel/stuff
Will show the content of /home/krichel/stuff at
the URL http://host/stuff, where host is the name of the server.
ScriptAlias works in the same way but allows for
scripts to be executed.
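
How an alias changes the path that the server looks up can be sketched in Python. The sketch assumes the usual argument order of Alias, URL path first and file-system path second. The function name resolve_alias and the document root /var/www are made up for this example.

```python
def resolve_alias(url_path, aliases, document_root):
    """Map a URL path to a file-system path, trying the aliases first."""
    for prefix, directory in aliases:
        # Match the prefix itself or anything below it, but not /stuffing.
        if url_path == prefix or url_path.startswith(prefix.rstrip("/") + "/"):
            return directory + url_path[len(prefix):]
    # No alias applies, so the path is looked up under the document root.
    return document_root + url_path

aliases = [("/stuff", "/home/krichel/stuff")]
print(resolve_alias("/stuff/notes.html", aliases, "/var/www"))
print(resolve_alias("/index.html", aliases, "/var/www"))
```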

73
Virtual hosts

Most apache directives can be wrapped in a
<VirtualHost> </VirtualHost> grouping.
This implies that they only hold for the virtual
host. Example, from wotan
<VirtualHost *>
    ServerAdmin krichel@openlib.org
    DocumentRoot /home/connect/public_html
    ServerName connections2003.liu.edu
    ErrorLog /var/log/apache/connections2003-error.log
    CustomLog /var/log/apache/connections2003-access.log common
</VirtualHost>
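
Name-based virtual hosts rely on the Host header that the client sends with each request. The Python sketch below shows the idea of choosing a document root by that header. The function name, the default root /var/www and the host other.example.org are made up; the server name and its document root are the ones from the example above.

```python
def pick_document_root(host_header, vhosts, default_root):
    """Choose a document root from the Host header of a request."""
    # Drop an optional port; host names are case-insensitive.
    name = host_header.split(":")[0].strip().lower()
    # Requests not handled by a virtual host go to the default server.
    return vhosts.get(name, default_root)

vhosts = {"connections2003.liu.edu": "/home/connect/public_html"}
print(pick_document_root("Connections2003.liu.edu:80", vhosts, "/var/www"))
print(pick_document_root("other.example.org", vhosts, "/var/www"))
```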