Skolan för elektroteknik och datavetenskap

Assignment 6 - A simple web reader

Goals

  • To become acquainted with a few additional advanced Swing components:
    JEditorPane, JScrollPane, JTable and HTMLDocument.
  • To be able to read and understand documentation about the advanced components.
  • To be able to extract information from an HTML document.

Assignment

Write a simple web reader that can interpret only HTML code. Besides showing an HTML web page, the program must build a table of all the links on the page. The running web reader could look like this.

[Image: Lab 6, screenshot of the running web reader]

Use the following three Swing components: JEditorPane, JScrollPane and JTable. Another important class to use is HTMLDocument, which stores an HTML document.

It is very likely that you will need to import classes from the following packages.

   import java.io.*;
   import java.awt.*;
   import java.awt.event.*;
   import java.net.*;
   import javax.swing.*;
   import javax.swing.event.*;
   import javax.swing.text.html.*;
   import javax.swing.text.*;

The graphical interface

It is often a good idea to write a prototype of your program as a first step towards the final version. The prototype looks like the real program but has little of its functionality. Here, let the prototype show a JFrame with three parts: a text field in its NORTH field, a JEditorPane in its CENTER field and a JTable with two columns and fifty rows in its EAST field. Put the table in a scroll pane, since fifty rows are too many for one screen.
   JTable table = new JTable(50, 2);
   JScrollPane links = new JScrollPane(table);
It is even more important to put the web reader (center component) in a scroll pane. The web page shown will often be larger than the available space.

Run the program and try to write text in all three components. The user should be able to type the desired web address in the text field. When ENTER is pressed, the chosen web page should be displayed in the JEditorPane. Therefore a listener object must be attached to the text field.
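
The prototype described above could be sketched roughly as follows. This is only one possible layout, not a prescribed solution: the class name PrototypeSketch, the method buildContent(), the window title and the window size are assumptions made for the illustration, and the listener only prints the typed address.

```java
import java.awt.BorderLayout;
import javax.swing.JEditorPane;
import javax.swing.JFrame;
import javax.swing.JPanel;
import javax.swing.JScrollPane;
import javax.swing.JTable;
import javax.swing.JTextField;
import javax.swing.SwingUtilities;

// Hypothetical class name; the assignment does not prescribe one.
class PrototypeSketch {
    final JTextField addressField = new JTextField();
    final JEditorPane page = new JEditorPane();
    final JTable table = new JTable(50, 2);

    // Builds the three-part layout on a panel that can be put in a JFrame.
    JPanel buildContent() {
        JPanel content = new JPanel(new BorderLayout());
        content.add(addressField, BorderLayout.NORTH);
        content.add(new JScrollPane(page), BorderLayout.CENTER);
        content.add(new JScrollPane(table), BorderLayout.EAST);
        // Pressing ENTER in a JTextField fires an ActionEvent.
        addressField.addActionListener(e ->
            System.out.println("Address entered: " + addressField.getText()));
        return content;
    }

    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("Web reader prototype"); // assumed title
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.setContentPane(new PrototypeSketch().buildContent());
            frame.setSize(900, 600); // assumed size
            frame.setVisible(true);
        });
    }
}
```

In the real program, the body of the listener would load the page instead of printing the address.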

The web reader

Write a class Webreader that inherits from JEditorPane. Make sure that you cannot write in this window (method setEditable). Write a method in the class, e.g. showPage(webaddr). Read the documentation about JEditorPane to learn how to make it display a web page. If the chosen web address is not valid, give an error message with a JOptionPane. Find out how to use it from the API.
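
A minimal sketch of such a class is shown here, assuming that JEditorPane.setPage(String) is used to load the page. The names WebreaderSketch and tryShowPage, as well as the wording of the error message, are made up for the illustration. Note that some load errors may only appear later, since a page can be loaded asynchronously.

```java
import java.io.IOException;
import javax.swing.JEditorPane;
import javax.swing.JOptionPane;

// Sketch only; "WebreaderSketch" and "tryShowPage" are hypothetical names.
class WebreaderSketch extends JEditorPane {
    WebreaderSketch() {
        setEditable(false); // read-only: the user cannot type in the page
    }

    // Returns true if loading could be started, false if the address failed.
    boolean tryShowPage(String webaddr) {
        try {
            setPage(webaddr); // throws IOException, e.g. for a malformed address
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Shows the page, or an error dialog if the address is not usable.
    void showPage(String webaddr) {
        if (!tryShowPage(webaddr)) {
            JOptionPane.showMessageDialog(this,
                "Could not open: " + webaddr, // assumed wording
                "Error", JOptionPane.ERROR_MESSAGE);
        }
    }
}
```

Keeping the dialog in a separate method makes it possible to check the loading logic without a pop-up window appearing.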

Now use Webreader in your main class instead of JEditorPane.

Before proceeding, run the program and make it load a few different web pages. The page displayed in the above image works reasonably well. The following pages belong to staff members of CSC and are also simple enough to work in our web reader:

http://www.nada.kth.se/~orjan
http://www.nada.kth.se/~ala
http://www.nada.kth.se/~viggo
http://www.nada.kth.se/~vahid
http://www.nada.kth.se/~johanh
http://www.nada.kth.se/~ann

The link table – part of the basic assignment

The task here is to extract information for the table to the right in the above image. Write a new program (a class with only a main method is sufficient) that prints all links from a web page in the terminal window with System.out.println(...). When this program (main method) works, change it into a method that returns the information as a string matrix (String[][]). More instructions follow!

Extract information from a web page

The contents of a web page may be printed character by character like this.
   String webpage = "http://www.nada.kth.se/~henrik";
   InputStream in = new URL(webpage).openConnection().getInputStream();
   InputStreamReader reader = new InputStreamReader(in);
   int c;
   // read() returns -1 at the end of the stream; ready() may turn false earlier
   while ((c = reader.read()) != -1)
      System.out.print((char) c);
A try-catch must be added, since the internet connection (second line above) may fail. Instead of reading the input stream and printing it in the terminal window, create an empty HTMLDocument (doc) and let an HTMLEditorKit read the page into the document with
   doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
   new HTMLEditorKit().read(reader,doc,0);
If you omit the first statement, a ChangedCharSetException will be thrown for every Swedish page that properly declares its character encoding as iso-8859-1.
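
The two statements, together with the required try-catch, might be combined as in the sketch below. To keep the example runnable without a network connection it parses a small HTML string through a StringReader; the class name ReadDocumentSketch and the helper methods readDocument and plainText are assumptions made for the illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class, not part of the assignment text.
class ReadDocumentSketch {
    // Parses HTML from any Reader; returns null if reading fails.
    static HTMLDocument readDocument(Reader reader) {
        HTMLDocument doc = new HTMLDocument();
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            new HTMLEditorKit().read(reader, doc, 0);
            return doc;
        } catch (IOException | BadLocationException e) {
            System.err.println("Could not read page: " + e.getMessage());
            return null;
        }
    }

    // Returns the plain text of the document, or "" if it cannot be read.
    static String plainText(HTMLDocument doc) {
        try {
            return doc.getText(0, doc.getLength());
        } catch (BadLocationException e) {
            return "";
        }
    }

    public static void main(String[] args) {
        Reader reader = new StringReader("<html><body><p>Hello</p></body></html>");
        HTMLDocument doc = readDocument(reader);
        if (doc != null) {
            System.out.println(plainText(doc));
        }
    }
}
```

For a real page, the StringReader would be replaced by the InputStreamReader from the earlier example.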

An HTML page contains different kinds of tags; the tag used for links is the A-tag. The A-tag has an attribute HREF, and the value of this attribute is the web address of the link. E.g. the ninth row in the above link table may correspond to the following text in the HTML document:
<A HREF = "http://www.nada.kth.se/~gerd">hustru</A>

Iterate through the A-tags of the document. For each A-tag, read its HREF-attribute and print its value. Use a Tag.A iterator object and an HREF attribute object. When you have managed to print the HREF-attributes correctly, also print the text between the A-tag and its corresponding end tag (</A>). Use it.getStartOffset() and it.getEndOffset(). Read the documentation for HTMLDocument and find out how to extract text from the document. In the last exercise class ("övning 6"), a similar example will be shown. The code will be posted on the web page for "Övningar".
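
One way the iteration could look is sketched below, using HTMLDocument.getIterator(HTML.Tag.A) and the attribute key HTML.Attribute.HREF. This is not the example from the exercise class; the class name LinkListSketch, the method extractLinks and the sample HTML are assumptions, and trimming the link text is a choice made here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class, not part of the assignment text.
class LinkListSketch {
    // Returns one {address, text} pair per A-tag that has an HREF attribute.
    static List<String[]> extractLinks(Reader reader) {
        List<String[]> links = new ArrayList<>();
        try {
            HTMLDocument doc = new HTMLDocument();
            doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
            new HTMLEditorKit().read(reader, doc, 0);
            for (HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A);
                 it.isValid(); it.next()) {
                AttributeSet attrs = it.getAttributes();
                Object href = attrs.getAttribute(HTML.Attribute.HREF);
                if (href == null) {
                    continue; // an A-tag without HREF, e.g. a named anchor
                }
                int start = it.getStartOffset();
                int length = it.getEndOffset() - start;
                String text = doc.getText(start, length).trim();
                links.add(new String[] { href.toString(), text });
            }
        } catch (IOException | BadLocationException e) {
            System.err.println("Could not read links: " + e.getMessage());
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<html><body><p><a href=\"http://example.com/a\">first</a>"
            + " and <a href=\"b.html\">second</a> link</p></body></html>";
        for (String[] link : extractLinks(new StringReader(html))) {
            System.out.println(link[0] + "  " + link[1]);
        }
    }
}
```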

Put addresses and texts in a matrix

When you can successfully produce the links and texts, change the program into a method that returns a string matrix (String[][]) with all the information. The matrix should have a limited size, e.g. 50 x 2. If a web page has more than 50 links, only extract the first 50. Test the method by calling it from a small main method before using it in your main program.
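
The size limit can be handled by never copying more rows than the matrix holds. The sketch below assumes the links have already been collected as a list of {address, text} pairs; the names LinkMatrixSketch, MAX_LINKS and toMatrix are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the assignment text.
class LinkMatrixSketch {
    static final int MAX_LINKS = 50;

    // Copies at most MAX_LINKS {address, text} pairs into a 50 x 2 matrix.
    static String[][] toMatrix(List<String[]> links) {
        String[][] matrix = new String[MAX_LINKS][2];
        int rows = Math.min(links.size(), MAX_LINKS); // never index past row 49
        for (int i = 0; i < rows; i++) {
            matrix[i][0] = links.get(i)[0];
            matrix[i][1] = links.get(i)[1];
        }
        return matrix; // rows that were not filled stay null
    }

    public static void main(String[] args) {
        List<String[]> links = new ArrayList<>();
        for (int i = 0; i < 60; i++) {
            links.add(new String[] { "http://example.com/" + i, "link " + i });
        }
        String[][] matrix = toMatrix(links);
        System.out.println(matrix.length + " rows, last address: " + matrix[49][0]);
    }
}
```

Rows that are not filled remain null, which a JTable renders as empty cells.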

Use the matrix in the main program

The main program has a JTable with 50 x 2 cells but no contents. The simplest way to change the contents of a JTable is to replace its model. The model may be created from two String arrays, one for the table contents and one for its heading. Here is an example:
   table.setModel(new DefaultTableModel(..., header));
Instead of the dots, put a link matrix (or a call to the method that creates the matrix). DefaultTableModel belongs to the package javax.swing.table, which must also be imported. Read the JTable documentation if you need additional information!
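
A small, self-contained sketch of replacing the model follows. The column headings "Address" and "Text", the sample row, and the names TableModelSketch and showLinks are assumptions made for the illustration.

```java
import javax.swing.JTable;
import javax.swing.table.DefaultTableModel;

// Hypothetical helper class, not part of the assignment text.
class TableModelSketch {
    // Replaces the table's model with one built from the link matrix.
    static void showLinks(JTable table, String[][] links) {
        String[] header = { "Address", "Text" }; // assumed column headings
        table.setModel(new DefaultTableModel(links, header));
    }

    public static void main(String[] args) {
        JTable table = new JTable(50, 2); // empty table, as in the prototype
        String[][] links = { { "http://example.com/", "Example" } };
        showLinks(table, links);
        System.out.println(table.getColumnName(0) + ": " + table.getValueAt(0, 0));
    }
}
```

The new model has as many rows as the matrix it was given, so passing the full 50 x 2 matrix keeps the table at fifty rows.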

Demonstration

  • Demonstrate that your program can display web pages and their corresponding link tables by typing an address in the text field and pressing ENTER.
  • An erroneous URL must give an error message in a small "pop-up-window".
  • If a web page has more than 50 links, extract only the first 50. Exceptions such as ArrayIndexOutOfBoundsException are not accepted.
  • Show a UML class diagram with class names and the methods you have defined yourself (including any overridden methods).

 

Extra assignment for a higher mark, clickable links

The web reader so far does not react when you click its links as a "proper" web reader should. The task here is to make the new web page appear in the window when the user clicks a link.

First, make sure it is not possible to write in the window displaying the web page. Call setEditable(false) (perhaps you did this already). Otherwise, the click is interpreted as a starting point for editing. Also, add a HyperlinkListener. The call getURL().toString() on a HyperlinkEvent will return a complete web address to put into the address field of the web reader. Read the documentation about Hyperlink... and find out the rest!

Requirements

  • The web reader must respond to clicks on links, not merely to the cursor entering a link.

  • The chosen link must be displayed in the address field of the web reader and the new page shown.

The method postActionEvent() of JTextField may come in handy. It triggers an event (ActionEvent) through a method call in the program instead of in the usual way (user interaction in the GUI).
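
The pieces above might fit together as in the following sketch. It assumes that the address field already has an ActionListener that loads the page, so that postActionEvent() has the same effect as pressing ENTER; the names ClickableLinksSketch and connect are hypothetical.

```java
import java.net.URL;
import javax.swing.JEditorPane;
import javax.swing.JTextField;
import javax.swing.event.HyperlinkEvent;

// Hypothetical helper class, not part of the assignment text.
class ClickableLinksSketch {
    // Makes a click on a link update the address field and trigger loading.
    static void connect(JEditorPane page, JTextField addressField) {
        page.setEditable(false); // hyperlink events require a read-only pane
        page.addHyperlinkListener(e -> {
            // ACTIVATED means a click; ENTERED and EXITED only follow the cursor.
            if (e.getEventType() != HyperlinkEvent.EventType.ACTIVATED) {
                return;
            }
            URL target = e.getURL();
            if (target == null) {
                return; // the link could not be turned into a complete URL
            }
            addressField.setText(target.toString());
            addressField.postActionEvent(); // same effect as pressing ENTER
        });
    }
}
```

A typical call would be connect(webreader, addressField) once both components have been created, where the variable names are again only examples.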


Copyright © Page maintainer: Ann Bengtsson <ann@nada.kth.se>
Updated 2014-05-11