Title: Web Programming Course
1Web Programming Course
- Lecture 6: Distributed Programming 1
2Networking
- Early computers were highly centralized.
- Single point of failure
- User has to access the computer.
- Low-cost computers made it possible to get past these two primary disadvantages with a network.
- Network: a communication system for connecting end-systems
3Networking
- End-systems also known as hosts
- PCs, workstations
- dedicated computers
- network components
- Advantages of networking
- Sharing of resources
- Price/Performance
- Centralized administration
- Computers as communication tools
4Networking
- Mechanisms by which software running on two or more endpoints can exchange messages
- Java is a network-centric programming language
- Java abstracts details of network implementation behind a standard API
5Networking - Traditional Uses
- Communication (email)
- Resource Sharing
- File exchange, disk sharing
- Sharing peripherals (printers, tape drives)
- Remote execution
6Networking - New(er) Uses
- Information sharing
- Peer-to-Peer computing
- Entertainment, distributed games
- E-Commerce
- Collaborative computing
- Forums
- Chats
- WWW
7LAN - Local Area Network
- Connects computers that are physically close together (< 1 mile).
- high speed
- Technologies
- Ethernet: 10 Mbps, 100 Mbps
- Token Ring: 16 Mbps
8WAN - Wide Area Network
- Connects computers that are physically far apart (long-haul network).
- typically slower than a LAN.
- typically less reliable than a LAN.
- Technologies
- telephone lines
- satellite communications
9Client/Server Architecture
- Classical network architecture is a client/server (C/S) architecture
- A server is a process (not a machine!) waiting for requests from a client.
- A client is a process (not a machine!) sending requests to a server and waiting for a reply.
- Both client and server are software entities
10Client/Server Architecture
- Server examples
- finds a document.
- prints a file for client.
- records a transaction.
- Servers are generally more complex
- Two basic types of servers
- Iterative - handles one client at a time.
- Concurrent - handles many clients in parallel.
11Iterative Server
- Naïve server implementation is sequential.
- handles one request at a time.
- Consider a server that needs to read data from a disk
- Reading a file from disk takes a long time
- The server will be idle while it waits for the data to be read
- Other clients will be waiting
12Iterative Server
13Concurrent Server
- Threaded servers can process several requests at once.
- Each request is handled by a separate thread.
- This does not increase the overall amount of work done, but reduces the time wasted on waiting.
- Threaded operation is worthwhile when threads are expected to block, awaiting I/O operations
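The difference between the iterative and the concurrent server can be sketched without any real networking. In the hypothetical example below, `handle` only sleeps to imitate a blocking disk read, and `ConcurrentServerSketch`, `serve` and the 50 ms delay are illustrative assumptions, not part of the lecture. With one worker thread the requests are served one after another; with several threads the waiting periods overlap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: simulates request handling, no real sockets involved.
class ConcurrentServerSketch {

    // Stand-in for a handler that blocks on I/O (for example a disk read).
    static String handle(int requestId) throws InterruptedException {
        Thread.sleep(50); // simulated blocking I/O, not CPU work
        return "reply-" + requestId;
    }

    // Serves all requests with the given number of worker threads.
    // One thread behaves like an iterative server, several threads
    // behave like a concurrent server. Replies are returned in request order.
    static List<String> serve(int requests, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                final int id = i;
                Callable<String> task = () -> handle(id);
                pending.add(pool.submit(task));
            }
            List<String> replies = new ArrayList<>();
            for (Future<String> reply : pending) {
                replies.add(reply.get()); // waits until that request is done
            }
            return replies;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("request handling failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        serve(8, 1);
        long iterativeMs = (System.nanoTime() - start) / 1_000_000;

        start = System.nanoTime();
        serve(8, 8);
        long concurrentMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("1 thread:  " + iterativeMs + " ms");
        System.out.println("8 threads: " + concurrentMs + " ms");
    }
}
```

The total amount of simulated work is the same in both runs; only the idle time is overlapped, which is the point made on the slide.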
14Concurrent Server
15Networking Models
- Using a formal model allows us to deal with various aspects of networking abstractly.
- We will look at a popular model: the OSI reference model.
- ISO proposal for the standardization of the various networking protocols (1984)
- The OSI reference model is a layered model.
16Layering
- Divide a task into pieces and then solve each piece independently (or nearly so).
- Establish a well-defined interface between layers.
- Major advantages
- Independence
- Extensibility
17Layered System Example: Unix OS
18OSI 7-Layer Model
High level protocols
- 7 Application
- 6 Presentation
- 5 Session
- 4 Transport
- 3 Network
- 2 Data-Link
- 1 Physical
Low level protocols
19OSI Communication Model
20Communication Protocols
- Communication between two sides is defined through a protocol
- Protocol: an agreed-upon convention for communication
- Both sides need to understand the protocol.
- Examples: TCP/IP, UDP, others
- Protocols must be formally defined and unambiguous
- Tons of documentation
21Interface vs. Peer-to-Peer Protocols
- Interface protocols describe the communication between layers on the same side.
- Peer-to-peer protocols describe the communication between the sides at the same layer.
22Layers
- Physical Layer: transmission of raw bits over a communication channel
- Data Link Layer: divides data into packets and provides an error-free communication link
- Network Layer: selects the path between the two sides, handles fragmentation and reassembly, and connects different network types
23The Transport Layer
- Transport Layer: provides virtual end-to-end links between peer processes
- TCP: Transmission Control Protocol
- Connection oriented
- Reliable, keeps order
- UDP: User Datagram Protocol
- Connectionless
- Unreliable
- Fast
24Layers
- Session Layer: establishes, manages, and terminates sessions between applications
- Presentation Layer: responsible for data compression and encryption
- Application Layer: anything above the previous layers (specific applications)
25The Internet
- A worldwide network connecting millions of hosts
- WAN interconnecting many LANs of various types
- Applications
- World-Wide Web
- Email
- FTP
- more and more
26The Web
- The term World-Wide Web (or simply Web) describes a collection of pieces of information that
- are stored as files on particular hosts
- can be reached by other connected hosts
- These hosts are called Web servers
27Web or Internet?
- They are not the same thing.
- The Internet is a collection of computers and networking devices connected together.
- They communicate with each other.
- The Web is a collection of documents that are interconnected by hyperlinks.
- These documents are provided by Web servers and accessed through Web browsers.
28How Does the Web Work?
- Web information is stored in Web pages
- In HTML format.
- The Web pages are stored on hosts called Web servers
- In the Web server file system.
- The computers reading the pages are called Web clients; they use a Web browser
- Most commonly Internet Explorer or Netscape.
- The Web server waits for requests from the Web clients over the Internet
- Internet Information Server (IIS) or Apache.
29HTML
- Much of the information that is found on the Web is stored as HTML files.
- HTML is a markup language for storing formatted text.
- It allows other types of information (such as images) to be combined in the documents.
- It allows interconnection (links) between the documents.
30Browsers
- Are used to display HTML documents.
- The browser is responsible for
- fetching the documents
- displaying them according to the HTML rules.
- Browsing refers to the activity of viewing Web
documents through following the links.
31Addresses
- Each communication endpoint must have an address.
- Consider 2 computers communicating over a network
- the communication protocol must be specified
- the name of the host (end-system) must be specified
- the specific process on the host must be specified.
32URLs
- Each Web document has a unique identifying address called a URL (Uniform Resource Locator).
- A URL takes the following form
- http://cs.haifa.ac.il/courses/webp/index.htm
- URL structure: <scheme>://<user>:<password>@<host>:<port>/<path>;<params>?<query>#<frag>
- In the example, the protocol is http, the host is cs.haifa.ac.il and the file is /courses/webp/index.htm
33URL fields
- The protocol field specifies the way in which the information should be accessed.
- The host field specifies the host on which the information is found.
- The file field specifies the particular location on the host's disk (path) where the file is found and the name of the file
- There are more complex forms of URLs, but we do not discuss them
34IP Addresses
- Hostnames (as they appear in URLs) are used by people.
- Network mechanisms use IP addresses instead.
- Every host connected to the Web has a unique IP address that identifies it.
- IP addresses are
- 32-bit (4-byte) numbers
- usually written as four decimal numbers separated by dots, e.g. 135.17.98.240, where each number is the value of one of the 4 bytes.
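The relation between the dotted notation and the underlying 32-bit number can be made concrete with a little bit shifting. The sketch below is only an illustration under the slide's assumption of 4-byte addresses; the class name `Ipv4Address` and its two helper methods are hypothetical. A `long` is used because Java's `int` is signed and would show addresses above 127.255.255.255 as negative numbers.

```java
// Hypothetical helper: converts between dotted-decimal text and a 32-bit value.
class Ipv4Address {

    // Packs "a.b.c.d" into one number, most significant byte first.
    static long toLong(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected four numbers: " + dotted);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("byte out of range: " + part);
            }
            value = (value << 8) | octet; // shift earlier bytes up, append this one
        }
        return value;
    }

    // Unpacks the four bytes again and writes them separated by dots.
    static String toDotted(long value) {
        return ((value >> 24) & 0xFF) + "." + ((value >> 16) & 0xFF) + "."
                + ((value >> 8) & 0xFF) + "." + (value & 0xFF);
    }

    public static void main(String[] args) {
        long packed = toLong("135.17.98.240");
        System.out.println(packed + " = 0x" + Long.toHexString(packed));
        System.out.println(toDotted(packed));
    }
}
```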
35Ports
- As data traverses the Web, each packet carries not only the address of the host but also the port on that host to which it is aimed.
- 65,536 ports are available at each host.
- A port does not represent anything physical like a serial or parallel port.
- Hosts are responsible for reading the port number from the packets they receive to decide which program should process that data.
36Ports
- On Unix systems, ports between 1 and 1023 are reserved for the OS processes.
- Any process can listen for connections on ports 1024 to 65,535 as long as the port is not already occupied.
- In Windows and Mac-OS, any process can listen to any port.
37Well-Known Ports
- Many services run on well-known ports.
- Web HTTP servers listen for connections on port 80.
- SMTP servers listen on port 25.
- Echo servers listen on port 7.
- FTP servers listen on port 21.
- Telnet servers listen on port 23.
- DayTime servers listen on port 13.
- whois servers listen on port 43.
- finger servers listen on port 79.
38Client-Server Model
Server application
Client application
Port 5746
Server machine 144.12.34.99
Client machine 190.30.42.155
39Hostnames
- However, it is inconvenient for people to remember IP addresses and ports.
- Many hosts have, in addition to an IP address, a human-readable hostname.
- www.haifa.ac.il
- www.cnn.com
40Hostnames
- Hostnames have a hierarchical structure.
- The hostname www.cs.haifa.ac.il refers to the host www in the computer science (cs) department of Haifa University, which is an academic institution (ac) in Israel (il).
- The rightmost part describes the main domain of the host. To its left is a sub-domain, and further left are more specific sub-domains.
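The right-to-left reading of a hostname can be shown by splitting it at the dots and reversing the parts. This is a minimal sketch; `HostnameLabels` and `generalToSpecific` are hypothetical names chosen for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper that lists the parts of a hostname from the
// main domain (rightmost) down to the host itself (leftmost).
class HostnameLabels {

    static List<String> generalToSpecific(String hostname) {
        List<String> labels = new ArrayList<>(Arrays.asList(hostname.split("\\.")));
        Collections.reverse(labels); // rightmost label is the most general one
        return labels;
    }

    public static void main(String[] args) {
        // prints [il, ac, haifa, cs, www]
        System.out.println(generalToSpecific("www.cs.haifa.ac.il"));
    }
}
```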
41Domains
- There are generic domains
- com: commercial organizations
- edu: educational institutions
- gov: U.S. governmental organizations
- Most countries use country domains
- il: Israel
- uk: United Kingdom
- jp: Japan
42DNS Servers
- The mapping between hostnames and the corresponding IP addresses is done by DNS.
- It is not feasible for the Web browser to hold a table mapping all the hostnames to their IP addresses.
- New hosts are added to the Web every day
- Hosts change their names and IP addresses.
43DNS Servers
- Web applications use DNS (Domain Name System) servers to map hostnames to IP addresses.
- DNS servers also use a hierarchical scheme for naming hosts
- DNS hierarchy is right-to-left
44Web Protocols
- A protocol is a special set of rules that endpoints (both clients and servers) in the Web use to handle communication.
- Transmission Control Protocol (TCP): to exchange messages with other endpoints at the information packet level.
- Internet Protocol (IP): to send and receive messages at the address level.
- Hypertext Transfer Protocol (HTTP): to deliver HTML, audio, and other files on the World Wide Web.
45HTTP Protocol
- HyperText Transfer Protocol
- Used between Web clients (e.g., browsers) and Web servers
- Text based
- Built on top of the TCP protocol
- Stateless protocol
- No data about the communicating sides is stored
46HTTP Transaction - Request
- Client sends a request that looks like
- GET /index.html HTTP/1.0
- GET is a keyword
- /index.html is the requested document
- HTTP/1.0 is the protocol version that the client understands
- The request always terminates with \r\n\r\n.
- Client may send optional information
- For example, a list of <keyword>: <value> pairs
- User-Agent: browser name
- Accept: formats the browser understands
47HTTP Transaction - Request
- Request example
- GET /index.html HTTP/1.0
- User-Agent: Lynx/2.4 libwww/2.1.4
- Accept: text/html
- Accept: text/plain
- In addition to GET, clients can request
- HEAD: retrieve only the header for the file
- POST: send data to the server
- PUT: upload a file to the server
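The request format described above can be assembled as plain text. The following is a minimal sketch that only builds the request string and does not send it anywhere; `HttpRequestText` and `build` are hypothetical names, and the headers are passed in as ready-made "keyword: value" lines.

```java
// Hypothetical helper that builds the text of an HTTP/1.0 request.
class HttpRequestText {

    static String build(String method, String path, String... headerLines) {
        StringBuilder request = new StringBuilder();
        // Request line: method, requested document, protocol version.
        request.append(method).append(' ').append(path).append(" HTTP/1.0\r\n");
        // Optional "keyword: value" lines, one per line.
        for (String header : headerLines) {
            request.append(header).append("\r\n");
        }
        // An empty line ends the request, so the text ends with \r\n\r\n.
        request.append("\r\n");
        return request.toString();
    }

    public static void main(String[] args) {
        System.out.print(build("GET", "/index.html",
                "User-Agent: Lynx/2.4 libwww/2.1.4",
                "Accept: text/html",
                "Accept: text/plain"));
    }
}
```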
48HTTP Transaction - Response
- Server response
- sends status line
- HTTP/1.0 200 OK
- sends header information
- Content-type: text/html
- Content-length: 3022
- ...
- sends a blank line (\r\n)
- sends document contents (e.g., html file)
49HTTP Transaction - Response
HTTP/1.1 200 OK
Date: Fri, 16 Apr 2004 18:48:13 GMT
Server: Apache/1.3.29 (Darwin)
Last-Modified: Fri, 16 Apr 2004 10:15:59 GMT
ETag: "58db37-89-407fb25f"
Accept-Ranges: bytes
Content-Length: 137
Connection: close
Content-Type: text/html

<html>
<body>
<p>Welcome</p>
<img src="smiley.gif">
</body>
</html>
- HTTP Header: the status line and the header fields before the blank line
- Blank line: separates the header from the data
- Data: the HTML document after the blank line
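The three parts of a response (header, blank line, data) can be separated with simple string operations. The sketch below is an illustration only: `HttpResponseText` is a hypothetical class, it assumes the whole response is already available as one string with \r\n line endings, and it ignores details such as repeated header fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for a complete HTTP response held in a string.
class HttpResponseText {
    final String statusLine;
    final Map<String, String> headers = new LinkedHashMap<>();
    final String body;

    HttpResponseText(String raw) {
        // The first blank line separates the header from the data.
        int blank = raw.indexOf("\r\n\r\n");
        if (blank < 0) {
            throw new IllegalArgumentException("no blank line after the header");
        }
        String[] lines = raw.substring(0, blank).split("\r\n");
        statusLine = lines[0];
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':'); // first colon ends the field name
            if (colon > 0) {
                headers.put(lines[i].substring(0, colon).trim(),
                        lines[i].substring(colon + 1).trim());
            }
        }
        body = raw.substring(blank + 4);
    }

    // The status code is the second word of the status line.
    int statusCode() {
        return Integer.parseInt(statusLine.split(" ")[1]);
    }

    public static void main(String[] args) {
        HttpResponseText response = new HttpResponseText(
                "HTTP/1.1 200 OK\r\n"
                + "Date: Fri, 16 Apr 2004 18:48:13 GMT\r\n"
                + "Content-Type: text/html\r\n"
                + "\r\n"
                + "<html><body><p>Welcome</p></body></html>");
        System.out.println(response.statusCode());
        System.out.println(response.headers);
        System.out.println(response.body);
    }
}
```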
50HTTP Transaction Example
51HTTP 1.0 response codes
- 2xx Successful
- Response codes between 200 and 299 indicate that the request was received, understood and accepted
- 200 OK: the most common response, indicates success
- 201 Created: response to a successful POST request
- 202 Accepted: response to a POST request, meaning that processing is not over yet
- 204 No Content: the server successfully processed the request, but has no content to send back
52HTTP 1.0 response codes
- 3xx Redirection
- Response codes between 300 and 399 indicate that the Web browser needs to go to a different page
- 301 Moved Permanently: the page has moved to a new URL.
- 302 Moved Temporarily: the page has moved temporarily to a new URL.
- 304 Not Modified: response to a GET request carrying an If-Modified-Since header, sent if the requested file has not been changed.
53HTTP 1.0 response codes
- 4xx Client Error
- Response codes between 400 and 499 indicate that the server received an erroneous request
- 400 Bad Request: improper request
- 401 Unauthorized: unauthorized request (need username and password)
- 403 Forbidden: the server refuses to process the request
- 404 Not Found: the server cannot find the requested page
54HTTP 1.0 response codes
- 5xx Server Error
- Response codes between 500 and 599 indicate that something has gone wrong with the server that the client cannot fix
- 500 Internal Server Error: an unexpected error occurred at the server
- 501 Not Implemented: the server does not have the feature that is needed to fulfill the request.
- 502 Bad Gateway: applicable only to proxy servers
- 503 Service Unavailable: the server is temporarily unable to handle the request (due to overload or maintenance)
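Since the class of a response code is determined by its first digit, a client can classify any code with one integer division. A minimal sketch follows; `HttpStatusClass` and `describe` are hypothetical names, and codes outside the four classes listed on these slides are simply reported as "Other".

```java
// Hypothetical helper that names the class of an HTTP response code.
class HttpStatusClass {

    static String describe(int code) {
        if (code < 100 || code > 599) {
            return "Other";
        }
        switch (code / 100) { // the first digit selects the class
            case 2: return "Successful";
            case 3: return "Redirection";
            case 4: return "Client Error";
            case 5: return "Server Error";
            default: return "Other";
        }
    }

    public static void main(String[] args) {
        int[] samples = {200, 204, 301, 304, 404, 503};
        for (int code : samples) {
            System.out.println(code + " " + describe(code));
        }
    }
}
```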
55HTTP 1.1
- HTTP 1.1 has many more responses defined.
- HTTP 1.1 is an official standard (unlike 1.0).
- Primary improvement of version 1.1
- HTTP 1.0 opens a new connection for every request.
- HTTP 1.1 allows a client to send many requests over a single connection that remains open until explicitly closed. Thus, overhead is reduced.
- Requests and responses are asynchronous. Clients can send many requests without waiting for a response before sending the next request.
56HTTP Daemons
- How can servers recognize incoming requests?
- In order to recognize the incoming requests, each server runs an HTTP-daemon
- Constantly running on the server
- Clients request a service through the server's daemon
- Technically, any host connected to the Web can act as a server by running an HTTP-daemon
57Client - HTTPD interaction
- The user requests http://www.haifa.ac.il/index.html
- The browser contacts the HTTP-daemon running on the host www.haifa.ac.il and requests the document /index.html
- The HTTP-daemon translates the requested name to an access to a specific file in its local file-system.
- The server reads the file index.html from its disk and sends its content to the client.
- The client receives the document, parses it and the browser displays it graphically.
58Client - HTTPD interaction
user requests http://www.haifa.ac.il/index.html
GET /index.html
host www.haifa.ac.il
sends the content of index.html
HTTPD application
Browser
Disk
59Proxy Servers
- Act as delegates of Web browsers for accessing the Web.
- The browser transfers the requests for a document to the Proxy Server
- The Proxy Server contacts the relevant Web Server and fetches the document on behalf of the browser.
60Proxy server
proxy requests the document from the HTTPD
user requests a document
browser requests the document from the proxy
sends the contents of the document
Proxy server
Browser
Proxy application
Cache
61Proxy Servers
- Proxy servers have several advantages over direct data access
- Security
- Can be combined with a firewall to enable restricted access to the Web.
- Communication
- Enable caching of popular documents.
- Portability
- Perform mediation between different network protocols
62Dynamically Generated Documents
- Many Web documents should be generated dynamically, upon requests from clients
- News items
- Web-based email
- Personalized applications
- The contents of these pages cannot be prepared manually
- They are generated dynamically by Common Gateway Interface (CGI) programs
63Dynamically Generated Documents
- The HTTP request invokes a program on the server.
- The program creates a new page on the fly and sends it to the client as a response.
- This program may use details sent in the request in order to generate the page.
- The CGI programs may be written in any language
- Most popular are Perl and Java.
- An HTTP server that gets a request for a CGI program usually invokes the CGI program in a new, independent process.
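How a program can use details from the request to build a page on the fly is sketched below. This is not a real CGI program: `DynamicPageSketch`, `parseQuery` and `render` are hypothetical names, the query string is passed in as a plain argument, and the parameter name `what` is taken from the search example in these slides.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a program that generates a page from request details.
class DynamicPageSketch {

    // Splits "name=value&name2=value2" into decoded name/value pairs.
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    static String decode(String text) {
        try {
            return URLDecoder.decode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Replaces the characters that would otherwise be read as HTML markup.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds the page from the "what" parameter of the request.
    static String render(String query) {
        String what = parseQuery(query).getOrDefault("what", "");
        return "<html><body><p>Results for: " + escape(what) + "</p></body></html>";
    }

    public static void main(String[] args) {
        System.out.println(render("what=something"));
    }
}
```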
64Dynamically Generated Documents
user requests http://www.excite.com/search?what=something
GET /search?what=something
host www.excite.com
sends the contents of the document
HTTPD application
Browser
execution of a search program
65The java.net package
- The java.net package contains classes that allow your programs to send and receive data across the Internet.
- The java.net.InetAddress class represents an abstraction of network (IP) addresses.
- Encapsulates an address
- Contains methods to convert IP addresses to hostnames and vice versa.
66Parsing InetAddresses
- An InetAddress object can be represented by
- its host name as a string,
- its IP address as a string,
- its IP address as a byte array
- public String getHostName()
- public String getHostAddress()
- public byte[] getAddress()
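The string and byte-array representations can be tried out without any name lookup by creating the InetAddress directly from its four bytes with `InetAddress.getByAddress`. In this sketch the class `InetAddressForms` and its helper methods are hypothetical. Note that Java bytes are signed, so each byte is masked with 0xFF before printing.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper showing the representations of an InetAddress.
class InetAddressForms {

    // Builds an address from four numbers in the range 0..255, no DNS involved.
    static InetAddress fromOctets(int a, int b, int c, int d) {
        try {
            return InetAddress.getByAddress(
                    new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
        } catch (UnknownHostException e) {
            // Only thrown when the array has an illegal length.
            throw new IllegalArgumentException(e);
        }
    }

    // Prints the raw bytes as unsigned values separated by spaces.
    static String unsignedOctets(InetAddress address) {
        StringBuilder text = new StringBuilder();
        for (byte b : address.getAddress()) {
            if (text.length() > 0) {
                text.append(' ');
            }
            text.append(b & 0xFF); // a signed byte such as -48 becomes 208
        }
        return text.toString();
    }

    public static void main(String[] args) {
        InetAddress address = fromOctets(208, 201, 239, 37);
        System.out.println(address.getHostAddress());   // IP address as a string
        System.out.println(address.getAddress()[0]);    // first raw byte, signed
        System.out.println(unsignedOctets(address));    // raw bytes, unsigned
    }
}
```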
67Creating InetAddress objects
    try {
        // using a hostname
        InetAddress address = InetAddress.getByName("www.oreilly.com");
        System.out.println(address);
    } catch (UnknownHostException e) {
        System.out.println("Could not find!");
    }
- OR
    try {
        // using an IP address
        InetAddress address = InetAddress.getByName("208.201.239.37");
        System.out.println(address);
    } catch (UnknownHostException e) {
        System.out.println("Could not find!");
    }
68Given address, find a hostname
    try {
        InetAddress ia = InetAddress.getByName("152.2.22.3");
        System.out.println(ia.getHostName());
    } catch (Exception e) {
        e.printStackTrace();
    }
69Hosts with multiple addresses
    try {
        InetAddress[] addresses = InetAddress.getAllByName("www.microsoft.com");
        for (int i = 0; i < addresses.length; i++) {
            System.out.println(addresses[i]);
        }
    } catch (UnknownHostException e) {
        System.out.println("Could not find www.microsoft.com");
    }
70Local Host
    try {
        InetAddress address = InetAddress.getLocalHost();
        System.out.println(address);
    } catch (UnknownHostException e) {
        System.err.println(e);
    }
- getLocalHost() returns an InetAddress object that contains the address of the computer the program is running on.
- In addition, the local host may be accessed through the IP address 127.0.0.1.
71InetAddress.equals()
    try {
        InetAddress oreilly = InetAddress.getByName("www.oreilly.com");
        InetAddress helio = InetAddress.getByName("helio.ora.com");
        if (oreilly.equals(helio)) {
            System.out.println("www.oreilly.com is the same as helio.ora.com");
        } else {
            System.out.println("www.oreilly.com is not the same as helio.ora.com");
        }
    } catch (UnknownHostException e) {
        System.out.println("Host lookup failed.");
    }
72Parsing InetAddresses Example
    try {
        InetAddress me = InetAddress.getLocalHost();
        System.out.println("My name is " + me.getHostName());
        System.out.println("My address is " + me.getHostAddress());
        byte[] address = me.getAddress();
        // Java bytes are signed, so values above 127 print as negative numbers
        for (int i = 0; i < address.length; i++) {
            System.out.print(address[i] + " ");
        }
        System.out.println();
    } catch (UnknownHostException e) {
        System.err.println("Could not find local address");
    }
73The java.net.URL class
- The java.net.URL class represents a URL.
- Accessing a document through a URL object hides protocol-dependent operations.
- A protocol handler is responsible for communicating with the server
- handles any necessary negotiation with the server
- returns the actual contents of the requested file.
74The java.net.URL class
- When a URL object is constructed
- Java looks for the appropriate protocol handler (such as "http" or "mailto").
- The protocol is presumed to be a part of the URL.
- If no such handler is found, the constructor throws a MalformedURLException.
- JDK 1.1 supports 10 protocols
- file, ftp, gopher, http, mailto, appletresource, doc, netdoc, systemresource, verbatim
75Constructing URL Objects
- Java provides 4 constructors
- public URL(String u) throws MalformedURLException
- public URL(String protocol, String host, String file) throws MalformedURLException
- public URL(String protocol, String host, int port, String file) throws MalformedURLException
- public URL(URL context, String u) throws MalformedURLException
76Constructing URL Objects
    URL u = null;
    try {
        u = new URL("http://cs.haifa.ac.il/courses/webp/index.html#Info");
    } catch (MalformedURLException e) {
        e.printStackTrace();
    }
- OR
    URL u = null;
    try {
        u = new URL("http", "cs.haifa.ac.il", "/courses/webp/index.html#Info");
    } catch (MalformedURLException e) {
        e.printStackTrace();
    }
77Constructing URL Objects
    URL u = null;
    try {
        u = new URL("http", "cs.haifa.ac.il", 80, "/courses/webp/index.html#Info");
    } catch (MalformedURLException e) {
        e.printStackTrace();
    }
- OR
    URL u1 = null, u2 = null;
    try {
        u1 = new URL("http", "cs.haifa.ac.il", "/courses/webp/index.html#Info");
        u2 = new URL(u1, "hw1.doc");
    } catch (MalformedURLException e) {
        e.printStackTrace();
    }
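The constructor that takes a context URL resolves the second argument relative to the first, much as a browser resolves a relative link. The sketch below wraps it in a hypothetical helper (`RelativeUrlSketch.resolve`) so the result can be inspected as a string. No connection is opened. Recent JDK releases flag the URL constructors as deprecated, but they remain the ones these slides describe.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper around the URL(URL context, String spec) constructor.
class RelativeUrlSketch {

    static String resolve(String base, String relative) {
        try {
            URL context = new URL(base);
            // The file name of the context is replaced by the relative name.
            return new URL(context, relative).toExternalForm();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(
                "http://cs.haifa.ac.il/courses/webp/index.html#Info", "hw1.doc"));
    }
}
```

With the addresses used on the slide, the relative name hw1.doc ends up in the same directory as index.html.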
78Parsing URLs
- The java.net.URL class has 5 methods to split a URL into its component parts.
    try {
        URL u = new URL("http://cs.haifa.ac.il/courses/webp/index.html#Info");
        System.out.println("Protocol is " + u.getProtocol());
        System.out.println("Host is " + u.getHost());
        System.out.println("Port is " + u.getPort());
        System.out.println("File is " + u.getFile());
        System.out.println("Anchor is " + u.getRef());
    } catch (MalformedURLException e) {
        e.printStackTrace();
    }
79Parsing URLs
- If a port is not explicitly specified in the URL, getPort() returns -1.
- This does not mean that the connection is attempted on port -1 (which does not exist)
- This means that the default port (80 for HTTP) is used.
- If the anchor does not exist, getRef() returns null, so watch out for NullPointerExceptions.
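Both special cases, the -1 port and the null anchor, can be handled explicitly when a URL is taken apart. This is a minimal sketch with a hypothetical helper class `UrlParts`. It relies on `URL.getDefaultPort()`, which reports the default port of the URL's protocol, and it only parses the text without contacting the host.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper that summarizes the parts of a URL in one line.
class UrlParts {

    static String describe(String spec) {
        try {
            URL u = new URL(spec);
            // getPort() is -1 when the URL names no port; fall back to the default.
            int port = u.getPort() == -1 ? u.getDefaultPort() : u.getPort();
            // getRef() is null when the URL has no anchor.
            String anchor = u.getRef() == null ? "(none)" : u.getRef();
            return u.getProtocol() + " | " + u.getHost() + " | " + port
                    + " | " + u.getFile() + " | " + anchor;
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("http://cs.haifa.ac.il/courses/webp/index.html#Info"));
        System.out.println(describe("http://cs.haifa.ac.il:8080/"));
    }
}
```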
80Reading Data from a URL
- public final InputStream openStream() throws IOException
- The openStream() method opens a connection to the specified URL
- This allows data to be downloaded from the URL
- Any headers coming before the actual data are stripped off as the stream is opened
81Reading Data from a URL
    try {
        URL u = new URL(args[0]);
        InputStream in = u.openStream();
        in = new BufferedInputStream(in);
        Reader r = new InputStreamReader(in);
        int c;
        while ((c = r.read()) != -1) {
            System.out.print((char) c);
        }
    } catch (MalformedURLException e) {
        System.err.println("unparseable URL");
    } catch (IOException e) {
        e.printStackTrace();
    }
82openConnection()
- openConnection() opens a socket (to be defined later) to the server
- The socket facilitates direct communication with the server.
- In particular, it gives access to everything sent by the server: document, protocol headers, etc.
83Reading Data from a URL
    try {
        URL u = new URL(args[0]);
        URLConnection uc = u.openConnection();
        InputStream in = uc.getInputStream();
        Reader r = new InputStreamReader(in);
        int c;
        while ((c = r.read()) != -1) {
            System.out.print((char) c);
        }
    } catch (MalformedURLException e) {
        System.err.println("unparseable URL");
    } catch (IOException e) {
        e.printStackTrace();
    }
84Using the Connection
    try {
        URL u = new URL(args[0]);
        URLConnection uc = u.openConnection();
        System.out.println("Content-type: " + uc.getContentType());
        System.out.println("Content-encoding: " + uc.getContentEncoding());
        System.out.println("Content-length: " + uc.getContentLength());
        System.out.println("Date: " + new Date(uc.getDate()));
        System.out.println("Last modified: " + new Date(uc.getLastModified()));
        System.out.println("Expiration date: " + new Date(uc.getExpiration()));
    } catch (Exception e) {
        e.printStackTrace();
    }
85getContent()
- getContent() returns the downloaded data as an object.
- An HTML or text file will usually become some sort of InputStream object.
- An image such as a GIF or JPEG will become some sort of java.awt.image.ImageProducer object.
- The result can be cast to the appropriate type.
- getContent() uses the content-type field in the header of the data received from the server.
86getContent()
    try {
        URL u = new URL(args[0]);
        try {
            Object o = u.getContent();
            System.out.println("I got a " + o.getClass().getName());
        } catch (IOException e) {
            e.printStackTrace();
        }
    } catch (MalformedURLException e) {
        System.err.println("unparseable URL");
    }
87class URLEncoder
- The class contains a utility method for converting a String into the "x-www-form-urlencoded" format.
- To convert a String, each character is examined
- Characters 'a' to 'z', 'A' to 'Z', and '0' to '9' remain the same.
- The space character is converted into a plus sign '+'.
- Other characters are converted into a 3-character string %xx, where xx is the two-digit hexadecimal representation of the character.
- String encode(String s) translates a string into x-www-form-urlencoded format.
88class URLDecoder
- The class is the counterpart of URLEncoder.
- String decode(String s) translates a string in x-www-form-urlencoded format back into plain text.
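The encoding rules and their reversal can be checked with a short round trip. The sketch below uses the two-argument forms `URLEncoder.encode(String, String)` and `URLDecoder.decode(String, String)`, which take the name of a character encoding, rather than the one-argument forms listed on the slides. The wrapper class `FormEncodingSketch` is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical wrappers around URLEncoder and URLDecoder using UTF-8.
class FormEncodingSketch {

    static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    static String decode(String text) {
        try {
            return URLDecoder.decode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = encode("web programming & java");
        System.out.println(encoded);          // spaces become +, & becomes %26
        System.out.println(decode(encoded));  // back to the original text
    }
}
```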
89Using Lycos Search Engine
    import java.net.*;
    import java.io.*;

    public class LycosUser {
        public static void main(String[] args) {
            // join the command-line arguments into one search phrase
            String querystring = "";
            for (int i = 0; i < args.length; i++) {
                querystring += args[i] + " ";
            }
            querystring = querystring.trim();
            querystring = "query=" + URLEncoder.encode(querystring);
90Using Lycos Search Engine
            try {
                String thisLine;
                URL u = new URL("http://www.lycos.com/cgi-bin/pursuit?" + querystring);
                DataInputStream retHTML = new DataInputStream(u.openStream());
                // print the returned HTML line by line
                while ((thisLine = retHTML.readLine()) != null) {
                    System.out.println(thisLine);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }