Title: URL Programming
1URL Programming
2Agenda
- What is a URL
- How to use URLs in Java
- Encoding and decoding
- Accessing data from a URL through InputStream and OutputStream objects
- Precise control with URLConnection and HttpURLConnection
3The java.net.URL class
- A URL object represents a URL.
- The URL class contains methods to
  - create new URLs
  - parse the different parts of a URL
  - get an input stream from a URL so you can read data from a server
  - get content from the server as a Java object
4Content and Protocol Handlers
- Content and protocol handlers separate the data being downloaded from the protocol used to download it.
- The protocol handler negotiates with the server and parses any headers. It gives the content handler only the actual data of the requested resource.
- The content handler translates those bytes into a Java object like an InputStream or ImageProducer.
5Finding Protocol Handlers
- When the virtual machine creates a URL object, it looks for a protocol handler that understands the protocol part of the URL, such as "http" or "mailto".
- If no such handler is found, the constructor throws a MalformedURLException.
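The handler lookup can be observed from ordinary code. The following is a minimal sketch, assuming a helper class named ProtocolProbe (hypothetical, not from the slides), a placeholder host, and a made-up "foo" scheme; it only constructs URL objects and never opens a connection.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper, not part of the slides: asks the running VM whether
// it can find a protocol handler for a given scheme.
class ProtocolProbe {
    static boolean isSupported(String protocol) {
        try {
            // Host and path are placeholders; constructing a URL opens no connection.
            new URL(protocol + "://example.invalid/");
            return true;
        } catch (MalformedURLException e) {
            // Thrown when no handler understands the protocol part.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {"http", "file", "mailto", "foo"};
        for (String p : candidates) {
            System.out.println(p + " supported: " + isSupported(p));
        }
    }
}
```

Which schemes report true depends on the Java implementation, as the next slide explains.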
6Supported Protocols
- The exact protocols that Java supports vary from implementation to implementation, though http and file are supported almost everywhere. Sun's JDK 1.1 understands ten:
  - file
  - ftp
  - gopher
  - http
  - mailto
  - appletresource
  - doc
  - netdoc
  - systemresource
  - verbatim
7URL Constructors
- There are four (six in 1.2) constructors in the java.net.URL class:
  - public URL(String u) throws MalformedURLException
  - public URL(String protocol, String host, String file) throws MalformedURLException
  - public URL(String protocol, String host, int port, String file) throws MalformedURLException
  - public URL(URL context, String url) throws MalformedURLException
  - public URL(String protocol, String host, int port, String file, URLStreamHandler handler) throws MalformedURLException
  - public URL(URL context, String url, URLStreamHandler handler) throws MalformedURLException
8Constructing URL Objects (1)
- An absolute URL like http://entry.hit.edu.tw/jameschen/aaa.html#bk1 is passed to the constructor as a single string:

    try {
        URL u = new URL("http://entry.hit.edu.tw/jameschen/aaa.html#bk1");
    }
    catch (MalformedURLException e) {
        // take some action
    }
9Constructing URL Objects (2)
- You can also construct the URL by passing its pieces to the constructor, like this:

    URL u = null;
    try {
        u = new URL("http", "entry.hit.edu.tw", "/jameschen/aaa.html#bk1");
    }
    catch (MalformedURLException e) {
        // take some action
    }
10Constructing URL Objects (3) -- including the Port

    URL u = null;
    try {
        u = new URL("http", "entry.hit.edu.tw", 8000, "/jameschen/aaa.html#bk1");
    }
    catch (MalformedURLException e) {
        // take some action
    }
11Relative URLs
- Many HTML files contain relative URLs.
12Constructing Relative URLs
- The fourth constructor creates URLs relative to a given URL. For example,

    try {
        URL u1 = new URL("http://metalab.unc.edu/index.html");
        URL u2 = new URL(u1, "books.html");
    }
    catch (MalformedURLException e) {
        // take some action
    }

- This is particularly useful when parsing HTML.
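As a rough illustration of how links found in a page resolve against the page's own URL, here is a small sketch. The class name RelativeUrls and the sample links are assumptions for illustration; only the URL(URL context, String url) constructor comes from the slides.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper: resolves a link found in a page against that page's URL.
class RelativeUrls {
    static String resolve(String base, String link) throws MalformedURLException {
        URL context = new URL(base);
        // The context supplies whatever the link leaves out (protocol, host, directory).
        return new URL(context, link).toString();
    }

    public static void main(String[] args) throws MalformedURLException {
        String page = "http://metalab.unc.edu/index.html";
        System.out.println(resolve(page, "books.html"));          // same directory
        System.out.println(resolve(page, "/javafaq/books.html")); // host-relative path
        System.out.println(resolve(page, "http://www.w3c.org/")); // already absolute
    }
}
```

An absolute link ignores the context entirely, so the same call works for every href an HTML parser might encounter.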
13Parsing URLs
- The java.net.URL class has five methods to split a URL into its component parts. These are
  - public String getProtocol()
  - public String getHost()
  - public int getPort()
  - public String getFile()
  - public String getRef()
14For example,
    try {
        URL u;
        u = new URL("http://entry.hit.edu.tw/jameschen/aaa.html#bk1");
        System.out.println("The protocol is " + u.getProtocol());
        System.out.println("The host is " + u.getHost());
        System.out.println("The port is " + u.getPort());
        System.out.println("The file is " + u.getFile());
        System.out.println("The anchor is " + u.getRef());
    }
    catch (MalformedURLException e) {
        // handle the malformed URL
    }
15Parsing URLs
- JDK 1.3 adds three more
- public String getAuthority()
- public String getUserInfo()
- public String getQuery()
16Missing Pieces
- If a port is not explicitly specified in the URL, it's set to -1. This means the default port.
- If the ref doesn't exist, it's just null, so watch out for NullPointerExceptions. Better yet, test to see that it's non-null before using it.
- If the file is left off completely, e.g. http://java.sun.com, then it's set to "/" (later JDK versions return an empty string instead).
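A defensive way to handle the missing pieces might look like the sketch below. The class UrlParts and its method names are hypothetical. It also relies on URL.getDefaultPort(), which is a real method but was added in a later JDK (1.4) than the ones these slides cover.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper showing null-safe and default-aware access to URL parts.
class UrlParts {
    // The explicit port if the URL names one, otherwise the protocol's default.
    static int effectivePort(URL u) {
        int port = u.getPort();               // -1 when the URL has no explicit port
        return port != -1 ? port : u.getDefaultPort();
    }

    // The ref (anchor), or an empty string when the URL has none.
    static String refOrEmpty(URL u) {
        String ref = u.getRef();              // null when there is no '#' part
        return ref != null ? ref : "";
    }

    public static void main(String[] args) throws MalformedURLException {
        URL bare = new URL("http://java.sun.com");
        URL full = new URL("http://entry.hit.edu.tw:8000/jameschen/aaa.html#bk1");
        System.out.println(effectivePort(bare) + " [" + refOrEmpty(bare) + "]");
        System.out.println(effectivePort(full) + " [" + refOrEmpty(full) + "]");
    }
}
```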
17Reading Data from a URL Object
- The openStream() method connects to the server specified in the URL and returns an InputStream object fed by the data from that connection.
  - public final InputStream openStream() throws IOException
- Any headers that precede the actual data are stripped off before the stream is opened.
- Network connections are less reliable and slower than files. Buffer with a BufferedReader or a BufferedInputStream.
18Example Webcat v1
    import java.net.*;
    import java.io.*;

    public class Webcat {
        public static void main(String[] args) {
            for (int i = 0; i < args.length; i++) {
                try {
                    URL u = new URL(args[i]);
                    InputStream in = u.openStream();
                    InputStreamReader isr = new InputStreamReader(in);
                    BufferedReader br = new BufferedReader(isr);
                    String theLine;
                    // print the document one line at a time
                    while ((theLine = br.readLine()) != null) {
                        System.out.println(theLine);
                    }
                }
                catch (IOException e) {
                    System.err.println(e);
                }
            }
        }
    }
19The Bug in readLine()
- What readLine() does
  - Sees a carriage return, then waits to see if the next character is a line feed before returning
- What readLine() should do
  - See a carriage return, return immediately, and throw away the next character if it's a linefeed
20Example Webcat v2
    import java.net.*;
    import java.io.*;

    public class Webcat {
        public static void main(String[] args) {
            for (int i = 0; i < args.length; i++) {
                try {
                    URL u = new URL(args[i]);
                    InputStream in = u.openStream();
                    InputStreamReader isr = new InputStreamReader(in);
                    BufferedReader br = new BufferedReader(isr);
                    int c;
                    // copy one character at a time, so readLine() is never called
                    while ((c = br.read()) != -1) {
                        System.out.write(c);
                    }
                }
                catch (IOException e) {
                    System.err.println(e);
                }
            }
        }
    }
21URL Encoding
- Alphanumeric ASCII characters (a-z, A-Z, and 0-9) and the -_.!'(), punctuation symbols are left unchanged.
- The space character is converted into a plus sign (+).
- Other characters (e.g. &, =, #, %, and so on) are translated into a percent sign (%) followed by the two hexadecimal digits corresponding to their numeric value.
22For example,
- The comma (,) is ASCII character 44 (decimal) or 2C (hex). Therefore if the comma appears as part of a URL it is encoded as %2C.
- The query string
  "Author=Sadie, Julie&Title=Women Composers"
- is encoded as
  "Author=Sadie%2C+Julie&Title=Women+Composers"
23The URLEncoder class
- The java.net.URLEncoder class contains a single static method which encodes strings in x-www-form-urlencoded format:
  - URLEncoder.encode(String s)
24For example, the wrong way!

    String qs = "Author=Sadie, Julie&Title=Women Composers";
    String eqs = URLEncoder.encode(qs);
    System.out.println(eqs);

- The output is
  Author%3dSadie%2c+Julie%26Title%3dWomen+Composers
- Is this output what you need? No: the = and & separators were encoded along with the values.
25For example, the correct way!

    String eqs = "Author=" + URLEncoder.encode("Sadie, Julie");
    eqs += "&";
    eqs += "Title=";
    eqs += URLEncoder.encode("Women Composers");

- This produces the properly encoded query string
  Author=Sadie%2c+Julie&Title=Women+Composers
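The piece-by-piece approach generalizes to any number of fields. The sketch below assumes a helper class QueryStrings (hypothetical) and uses the two-argument URLEncoder.encode(String, String) overload, a later addition to the JDK that takes an explicit character encoding. Note that current JDKs emit the hex digits in upper case (%2C rather than %2c).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: encodes names and values separately, then joins them
// with the literal '=' and '&' separators so the separators stay unencoded.
class QueryStrings {
    static String encode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> f : fields.entrySet()) {
                if (sb.length() > 0) sb.append('&');
                sb.append(URLEncoder.encode(f.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(f.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("Author", "Sadie, Julie");
        fields.put("Title", "Women Composers");
        System.out.println(encode(fields));
    }
}
```

A LinkedHashMap is used so the fields come out in insertion order.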
26GET URLs with Query String
    String eqs = "Author=" + URLEncoder.encode("Sadie, Julie");
    eqs += "&";
    eqs += "Title=";
    eqs += URLEncoder.encode("Women Composers");
    try {
        URL u = new URL("http://www.superbooks.com/search.cgi?" + eqs);
        InputStream in = u.openStream();
        //...
    }
    catch (IOException e) {
        //...
    }
27The URLDecoder class
- In Java 1.2 the java.net.URLDecoder class contains a single static method which decodes strings in x-www-form-urlencoded format:
  - URLDecoder.decode(String s)
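Decoding mirrors encoding: split on the separators first, then decode each piece, so that an encoded & or = inside a value is not mistaken for a separator. This is a sketch under assumptions; the class QueryDecoding is hypothetical, and it uses the two-argument URLDecoder.decode(String, String) overload from later JDKs.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns "name=value&name=value" back into decoded pairs.
class QueryDecoding {
    static Map<String, String> parse(String query) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (query.isEmpty()) return fields;
        try {
            for (String pair : query.split("&")) {
                int eq = pair.indexOf('=');
                // A pair without '=' is treated as a name with an empty value.
                String name = eq >= 0 ? pair.substring(0, eq) : pair;
                String value = eq >= 0 ? pair.substring(eq + 1) : "";
                fields.put(URLDecoder.decode(name, "UTF-8"),
                           URLDecoder.decode(value, "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("Author=Sadie%2C+Julie&Title=Women+Composers"));
    }
}
```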
28URLConnections
- The java.net.URLConnection class is an abstract class that handles communication with different kinds of servers like ftp servers and web servers.
- Protocol-specific subclasses of URLConnection handle different kinds of servers.
- By default, connections to HTTP URLs use the GET method.
29URLConnections vs. URLs
- Can send output as well as read input
- Can post data to CGIs
- Can read headers from a connection
30URLConnection five steps
- 1. The URL is constructed.
- 2. The URL's openConnection() method creates the URLConnection object.
- 3. The parameters for the connection and the request properties that the client sends to the server are set up.
- 4. The connect() method makes the connection to the server. (optional)
- 5. The response header information is read using getHeaderField().
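The five steps can be sketched as follows. The class FiveSteps, the choice of setUseCaches(false) as the connection parameter, and the Accept-Language request property are illustrative assumptions; the slides do not prescribe specific settings. The main method contacts a server only when a URL is passed on the command line.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical walk-through of the five steps, split so that the
// configuration part can be exercised without any network traffic.
class FiveSteps {
    // Steps 1-3: construct the URL, create the connection, configure it.
    static URLConnection prepare(String spec) throws IOException {
        URL u = new URL(spec);                          // step 1
        URLConnection uc = u.openConnection();          // step 2: object only, not yet connected
        uc.setUseCaches(false);                         // step 3: a connection parameter
        uc.setRequestProperty("Accept-Language", "en"); // step 3: a request property
        return uc;
    }

    // Steps 4-5: connect, then read a response header.
    static String contentType(URLConnection uc) throws IOException {
        uc.connect();                                   // step 4 (optional: step 5 would connect implicitly)
        return uc.getHeaderField("Content-type");       // step 5
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java FiveSteps <url>");
            return;
        }
        System.out.println(contentType(prepare(args[0])));
    }
}
```

Settings from step 3 must be applied before the connection is made; changing them afterwards raises an IllegalStateException.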
31I/O Across a URLConnection
- Data may be read from the connection in one of two ways
  - raw, by using the input stream returned by getInputStream()
  - through a content handler, with getContent()
- Data can be sent to the server using the output stream provided by getOutputStream()
32Example URLConnection
    try {
        URL u = new URL("http://www.w3c.org/");
        URLConnection uc = u.openConnection();
        uc.connect();
        InputStream in = uc.getInputStream();
        // read the data...
    }
    catch (IOException e) {
        //...
    }
33Reading Header Data
- The getHeaderField(String name) method returns the string value of a named header field.
- Names are case-insensitive.
- If the requested field is not present, null is returned.

    String lm = uc.getHeaderField("Last-modified");
34getHeaderFieldKey()
- The keys of the header fields are returned by the getHeaderFieldKey(int n) method.
- The first field is 1.
- If a numbered key is not found, null is returned.
- You can use this in combination with getHeaderField() to loop through the complete header.
35Example -- getHeaderField(key) and getHeaderFieldKey(i)

    String key = null;
    for (int i = 1; (key = uc.getHeaderFieldKey(i)) != null; i++) {
        System.out.println(key + ": " + uc.getHeaderField(key));
    }
36getHeaderFieldInt() and getHeaderFieldDate()
- Utility methods that read a named header and convert its value into an int and a long respectively. The second argument is the value returned when the header is missing or cannot be parsed.
  - public int getHeaderFieldInt(String name, int Default)
  - public long getHeaderFieldDate(String name, long Default)
37More about getHeaderFieldDate()
- The long returned by getHeaderFieldDate() can be converted into a Date object using a Date() constructor like this:

    long lm = uc.getHeaderFieldDate("Last-modified", 0);
    Date lastModified = new Date(lm);
38Six Convenience Methods
- These return the values of six particularly common header fields:
  - public int getContentLength()
  - public String getContentType()
  - public String getContentEncoding()
  - public long getExpiration()
  - public long getDate()
  - public long getLastModified()
39Example
    try {
        URL u = new URL("http://entry.hit.edu.tw/");
        URLConnection uc = u.openConnection();
        uc.connect();
        String key = null;
        // print every header field as "name: value"
        for (int n = 1; (key = uc.getHeaderFieldKey(n)) != null; n++) {
            System.out.println(key + ": " + uc.getHeaderField(key));
        }
    }
    catch (IOException e) {
        System.err.println(e);
    }
40Writing data to a URLConnection
- Similar to reading data from a URLConnection.
- First inform the URLConnection that you plan to use it for output.
- Before getting the connection's input stream, get the connection's output stream and write to it.
- Commonly used to talk to CGIs that use the POST method.
- You must construct both the header and the data parts.
41A POST request includes
- the POST line
- a MIME header, which must include
  - content type
  - content length
- a blank line that signals the end of the MIME header
- the actual data of the form, encoded in x-www-form-urlencoded format
42POST CGIs
- A typical POST request to a CGI looks like this (the header first, then a blank line, then the data):

    POST /cgi-bin/booksearch.pl HTTP/1.0
    Referer: http://www.macfaq.com/sampleform.html
    User-Agent: Mozilla/3.01 (Macintosh; I; PPC)
    Content-length: 60
    Content-type: application/x-www-form-urlencoded
    Host: utopia.poly.edu:56435

    username=Sadie%2C+Julie&realname=Women+Composers
43Eight Steps for Writing Data
- 1. Construct the URL.
- 2. Call the URL's openConnection() method to create the URLConnection object.
- 3. Pass true to the URLConnection's setDoOutput() method.
- 4. Create the data you want to send, preferably as a byte array.
- 5. Call getOutputStream() to get an output stream object.
- 6. Write the byte array created in step 4 onto the stream.
- 7. Close the output stream.
- 8. Call getInputStream() to get an input stream object. Read from it as usual. (optional)
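The eight steps above can be sketched as follows. The class FormPost, the localhost URL in the usage note, and the explicit setRequestMethod("POST") and Content-Type calls are assumptions added for illustration. The form data is expected to be already encoded in x-www-form-urlencoded format.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that follows the eight steps for an http URL.
class FormPost {
    // Steps 1-3: nothing is sent to the server yet.
    static HttpURLConnection prepare(String spec) throws IOException {
        URL u = new URL(spec);                                          // step 1
        HttpURLConnection uc = (HttpURLConnection) u.openConnection();  // step 2 (assumes an http URL)
        uc.setDoOutput(true);                                           // step 3
        uc.setRequestMethod("POST");                                    // made explicit here
        uc.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        return uc;
    }

    // Steps 4-8: send the already-encoded form data and return the reply body.
    static String send(HttpURLConnection uc, String encodedForm) throws IOException {
        byte[] data = encodedForm.getBytes(StandardCharsets.US_ASCII);  // step 4
        try (OutputStream out = uc.getOutputStream()) {                 // step 5
            out.write(data);                                            // step 6
        }                                                               // step 7: closed by try-with-resources
        try (InputStream in = uc.getInputStream()) {                    // step 8
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            for (int n; (n = in.read(buf)) != -1; ) {
                reply.write(buf, 0, n);
            }
            return reply.toString("UTF-8");
        }
    }
}
```

A call might look like FormPost.send(FormPost.prepare("http://localhost:8080/cgi-bin/booksearch.pl"), "username=Sadie%2C+Julie&realname=Women+Composers"), where the host and port are placeholders for a server you control.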
44Writing data to a URLConnection -- setDoOutput(true), setDoInput(true)
- A URLConnection for an http URL will set up the request line and the MIME header for you as long as you set its doOutput field to true by invoking setDoOutput(true).
- If you also want to read from the connection, you should set doInput to true with setDoInput(true) too.
45For example
    URLConnection uc = u.openConnection();
    uc.setDoOutput(true);
    uc.setDoInput(true);
46Writing data to a URLConnection --
getOutputStream()
- The request line and MIME header are sent as
soon as the URLConnection connects. Then
getOutputStream() returns an output stream on
which you can write the x-www-form-urlencoded
name-value pairs.
47HttpURLConnection
- java.net.HttpURLConnection is an abstract subclass of URLConnection that provides some additional methods specific to the HTTP protocol.
- URL connection objects that are returned by an http URL will be instances of java.net.HttpURLConnection.
48HttpURLConnection cont.
- Client
  - setRequestMethod()
  - connect()
  - Response info: getResponseCode(), getResponseMessage()
  - Redirect setting: setFollowRedirects(true), getFollowRedirects()
  - disconnect()
  - getRequestMethod()
- Server
  - retrieve form data
  - send back status info
49Recall HTTP Response
- A typical HTTP response from a web server begins like this:

    HTTP/1.0 200 OK
    Server: Netscape-Enterprise/2.01
    Date: Sat, 02 Aug 1997 07:52:46 GMT
    Accept-ranges: bytes
    Last-modified: Tue, 29 Jul 1997 15:06:46 GMT
    Content-length: 2810
    Content-type: text/html
50Get Response Codes and Messages
- The getHeaderField() and getHeaderFieldKey() methods don't return the HTTP response code.
- After you've connected, you can retrieve the numeric response code (200 in the above example) with the getResponseCode() method,
- and the message associated with it (OK in the above example) with the getResponseMessage() method.
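A small sketch of how the response code might be used follows. The class ResponseCheck and its method names are hypothetical; HttpURLConnection.HTTP_OK is one of the named constants the class defines for common codes.

```java
import java.io.IOException;
import java.net.HttpURLConnection;

// Hypothetical helper around getResponseCode() and getResponseMessage().
class ResponseCheck {
    // Codes from 200 through 299 indicate success.
    static boolean isSuccess(int code) {
        return code >= 200 && code < 300;
    }

    // Rebuilds the status part of the first response line, e.g. "200 OK".
    // Calling this connects to the server if that has not happened yet.
    static String status(HttpURLConnection huc) throws IOException {
        return huc.getResponseCode() + " " + huc.getResponseMessage();
    }

    public static void main(String[] args) {
        int[] samples = {HttpURLConnection.HTTP_OK, 302, 404};
        for (int code : samples) {
            System.out.println(code + " success: " + isSuccess(code));
        }
    }
}
```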
51HTTP Protocols
- Java 1.0 only supports GET and POST requests to HTTP servers.
- Java 1.1/1.2 supports GET, POST, HEAD, OPTIONS, PUT, DELETE, and TRACE.
- The request method is chosen with the setRequestMethod(String method) method.
- A java.net.ProtocolException, a subclass of IOException, is thrown if an unknown method is specified.
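The ProtocolException can be triggered without contacting any server, because setRequestMethod() validates its argument immediately. In this sketch the class MethodCheck, the localhost URL, and the "BOGUS" method name are all made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.ProtocolException;
import java.net.URL;

// Hypothetical helper: reports whether HttpURLConnection accepts a request method.
class MethodCheck {
    static boolean isAccepted(String method) {
        try {
            // Placeholder URL; openConnection() only creates the object.
            HttpURLConnection huc =
                (HttpURLConnection) new URL("http://localhost/").openConnection();
            huc.setRequestMethod(method);
            return true;
        } catch (ProtocolException e) {
            return false;                 // the method name was rejected
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] methods = {"GET", "POST", "HEAD", "OPTIONS", "PUT", "DELETE", "TRACE", "BOGUS"};
        for (String m : methods) {
            System.out.println(m + " accepted: " + isAccepted(m));
        }
    }
}
```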
52getRequestMethod()
- The getRequestMethod() method returns the string
form of the request method currently set for the
URLConnection. GET is the default method.
53disconnect()
- The disconnect() method of the HttpURLConnection class closes the connection to the web server.
- Needed for HTTP/1.1 Keep-alive.
54For example,
    try {
        URL u = new URL("http://www.metalab.unc.edu/javafaq/books.html");
        HttpURLConnection huc = (HttpURLConnection) u.openConnection();
        huc.setRequestMethod("PUT");
        huc.setDoOutput(true);   // required before asking for the output stream
        huc.connect();
        OutputStream os = huc.getOutputStream();
        // put the data...
        os.close();
        int code = huc.getResponseCode();
        if (code >= 200 && code < 300) {
            // the PUT succeeded
        }
        huc.disconnect();
    }
    catch (IOException e) {
        //...
    }
55Using Proxy ??
- The boolean usingProxy() method returns true if
web connections are being funneled through a
proxy server, false if they're not.
56Redirect Instructions
- Most web servers can be configured to automatically redirect browsers to the new location of a page that's moved.
- To redirect browsers, a server sends a 300-level response code and a Location header that specifies the new location of the requested page.
- HTML is returned for browsers that don't understand redirects, but most modern browsers jump straight to the page specified in the Location header instead.
- Because redirects can change the site a user is connecting to without their knowledge, URLConnections do not follow redirects arbitrarily. Redirect handling is available only in HttpURLConnection objects.
57- The client requests

    GET /elharo/macfaq/index.html HTTP/1.0

- and the server responds

    HTTP/1.1 302 Moved Temporarily
    Date: Mon, 04 Aug 1997 14:21:27 GMT
    Server: Apache/1.2b7
    Location: http://www.macfaq.com/macfaq/index.html
    Connection: close
    Content-type: text/html

    <HTML><HEAD>
    <TITLE>302 Moved Temporarily</TITLE>
    </HEAD><BODY>
    <H1>Moved Temporarily</H1>
    The document has moved <A HREF="http://www.macfaq.com/macfaq/index.html">here</A>.<P>
    </BODY></HTML>
58Following Redirects
- The static HttpURLConnection.setFollowRedirects(true) method says that connections will follow redirect instructions from the web server.
  - Untrusted applets are not allowed to set this.
- The static HttpURLConnection.getFollowRedirects() method returns true if redirect requests are honored, false if they're not.
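To inspect a redirect yourself rather than follow it, the setting can be turned off for a single connection. This sketch uses setInstanceFollowRedirects(), a real per-connection method added in JDK 1.3 that the slides do not cover; the class RedirectPolicy and its method names are hypothetical. The main method contacts a server only when a URL is passed on the command line.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper for examining redirects by hand.
class RedirectPolicy {
    // Opens an http connection that will not follow redirects automatically.
    static HttpURLConnection openManual(String spec) throws IOException {
        HttpURLConnection huc = (HttpURLConnection) new URL(spec).openConnection();
        huc.setInstanceFollowRedirects(false);   // affects this connection only
        return huc;
    }

    // Returns the Location header for a 300-level response, otherwise null.
    static String redirectTarget(HttpURLConnection huc) throws IOException {
        int code = huc.getResponseCode();        // connects if necessary
        return (code >= 300 && code < 400) ? huc.getHeaderField("Location") : null;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("class-wide setting: " + HttpURLConnection.getFollowRedirects());
        if (args.length > 0) {
            System.out.println("redirects to: " + redirectTarget(openManual(args[0])));
        }
    }
}
```

The static setFollowRedirects() changes the behavior of every HttpURLConnection in the VM, so the per-instance call is the less intrusive choice when only one request needs special handling.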