Tips for Downloading HTML and Webpages

 

Let’s see how to write a simple Java program that downloads the documents and web pages we are interested in by connecting to a web server, using the URL and URLConnection classes from the java.net package. Basic familiarity with the Internet, object-oriented programming, and Java, plus a Java SDK to compile and run the program, is enough to carry out the task.

The address that uniquely identifies any page on the World Wide Web is called a URL (Uniform Resource Locator). Here is an example of a URL pointing to the homepage of javawebster.net:

http://www.javawebster.net:80/index.php,

where “http” is the protocol identifier, “www.javawebster.net” is the hostname, “80” is the port number (optional; if it is omitted, the protocol’s default port is used), and “index.php” is the path to the file on the server.

Java encapsulates the concept of a URL in the URL class, which lives in the java.net package. Java programs use this class to represent a URL address: a URL string is wrapped in a java.net.URL object instance. The string follows this pattern:
URL string: protocol://host:port/filepath#ref
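
To make the pattern concrete, here is a minimal sketch of how a java.net.URL object splits a string into its parts. The class name UrlParts, the helper describe(), and the “#top” fragment are made up for this illustration; the accessor methods are the standard ones on java.net.URL. On Java 20 and later the URL(String) constructor compiles with a deprecation warning.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper class, named only for this illustration.
public class UrlParts {

    // Returns a one-line summary of the components of the given URL string.
    static String describe(String spec) throws MalformedURLException {
        URL url = new URL(spec); // parses the string only; no network access
        // getPort() returns -1 when the string carries no explicit port,
        // in which case we fall back to the protocol's default port.
        int port = url.getPort() != -1 ? url.getPort() : url.getDefaultPort();
        return "protocol=" + url.getProtocol()
                + " host=" + url.getHost()
                + " port=" + port
                + " path=" + url.getPath()
                + " ref=" + url.getRef(); // null when there is no #fragment
    }

    public static void main(String[] args) throws MalformedURLException {
        // "#top" is an assumed fragment, added to show the ref part.
        System.out.println(describe("http://www.javawebster.net:80/index.php#top"));
        System.out.println(describe("http://www.javawebster.net/index.php"));
    }
}
```

Both calls report port 80: the first because it is written in the string, the second because 80 is the default port for http.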

Note that the URL class only deals with the details of the address; it does not open a connection to it. Creating a URL object starts no network communication: the constructor merely parses its string argument.
The next step after obtaining the URL is to open the connection with the URLConnection class. Because URLConnection is abstract, it cannot be instantiated directly. Instead, you call the openConnection() method on a URL object, which returns an instance of a concrete subclass of URLConnection. Here is sample code demonstrating how it works:
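
A small sketch can illustrate the parse-only behaviour, assuming a host name under the reserved “.invalid” top-level domain, which never resolves. The class name UrlParseOnly and the helper parses() are hypothetical names chosen for this example.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical class, named only for this illustration.
public class UrlParseOnly {

    // Returns true if the string can be parsed into a URL object.
    // No lookup or connection is attempted; only the syntax is checked.
    static boolean parses(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The host does not exist, yet the constructor succeeds.
        System.out.println(parses("http://no-such-host.invalid/page.html")); // true
        // A string without a protocol is rejected at parse time.
        System.out.println(parses("www.javawebster.net/index.php"));         // false
    }
}
```

A nonexistent host only becomes a problem later, when the connection is actually used, which is where an UnknownHostException can appear.
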

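import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.net.UnknownHostException;

// The class name is illustrative only
public class PageDownloader {
    public static void main(String[] args) {
        try {
            // Parsing the URL string does not touch the network
            URL javacodingURL = new URL("http://www.javawebster.net:80/index.php");
            // After creating the URL object, open the connection
            URLConnection connection = javacodingURL.openConnection();
            // try-with-resources closes the reader even if reading fails
            try (BufferedReader br = new BufferedReader(
                    new InputStreamReader(connection.getInputStream()))) {
                String line;
                // Print the page line by line until the stream ends
                while ((line = br.readLine()) != null) {
                    System.out.println(line);
                }
            }
        } catch (UnknownHostException e) {
            System.out.println("Unknown Host");
        } catch (IOException e) {
            System.out.println("Error in opening URLConnection");
        }
    }
}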
Once the connection is open, you can read from it through its InputStream and write to it through its OutputStream (writing requires calling setDoOutput(true) on the connection first).
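
To show both directions in one place, here is a rough sketch that writes a message to a connection and reads the reply. So that it runs without any external site, it starts a throwaway echo server on the local machine using the JDK’s built-in com.sun.net.httpserver package. The class name EchoExchange, the methods send() and demo(), and the “/echo” path are all assumptions made for this example, not part of the program above. It needs Java 9 or later for readAllBytes().

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical class, named only for this illustration.
public class EchoExchange {

    // Writes 'message' to the connection's OutputStream and returns
    // whatever the server sends back on the InputStream.
    static String send(URL url, String message) throws IOException {
        URLConnection connection = url.openConnection();
        // Must be set before writing; for HTTP this makes the request a POST.
        connection.setDoOutput(true);
        try (OutputStream out = connection.getOutputStream()) {
            out.write(message.getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = connection.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Starts a local echo server on a free port, sends one message to it,
    // and returns the reply. The server is stopped afterwards.
    static String demo(String message) throws IOException {
        HttpServer server = HttpServer.create(
                new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 0), 0);
        server.createContext("/echo", exchange -> {
            byte[] body = exchange.getRequestBody().readAllBytes();
            // A length of -1 tells the server that no response body follows.
            exchange.sendResponseHeaders(200, body.length == 0 ? -1 : body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            return send(new URL("http://127.0.0.1:" + port + "/echo"), message);
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo("hello"));
    }
}
```

Against a real web server only send() would be needed; the local server exists here purely so the sketch can be run offline.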