In his first article for Builder AU, Java expert Michael Geisler shows one reader how HTML files can be converted to XML using JTidy.

One of the great things about Java is the extensive set of libraries that ship with the standard platform, and those libraries certainly provide strong support for XML. However, they offer no direct support for your particular need: converting HTML to XML.
So really you have two options:

  1. Go and build something from scratch yourself. This is generally painful and time consuming.
  2. Check the 'community' and see if someone else has already encountered the problem (highly likely) and been kind enough to share it.

In this case there is a useful little project on SourceForge called JTidy. The JTidy web site can be found at http://sourceforge.net/projects/jtidy/

JTidy provides HTML syntax checking and "pretty printing" of HTML, but for our purposes here it also allows you to take an HTML file as input and convert it into XML. JTidy reads through the input file, corrects any mismatched or missing end tags it finds, and outputs a well-formed XML document.

As you can see from the sample code below, it is quite straightforward to use. Simply set the JTidy instance to output XML, supply an input URL, output file and error file, start up the conversion and you are pretty much done.

import java.io.BufferedInputStream;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PrintWriter;
import java.net.URL;
import org.w3c.tidy.Tidy;

public class TestHTML2XML {
    private final String url;
    private final String outFileName;
    private final String errOutFileName;

    public TestHTML2XML(String url, String outFileName, String errOutFileName) {
        this.url = url;
        this.outFileName = outFileName;
        this.errOutFileName = errOutFileName;
    }

    public void convert() {
        Tidy tidy = new Tidy();

        // Tell Tidy to convert HTML to XML
        tidy.setXmlOut(true);

        // try-with-resources closes all three streams, even if the conversion fails
        try (PrintWriter errOut = new PrintWriter(new FileWriter(errOutFileName), true);
             InputStream in = new BufferedInputStream(new URL(url).openStream());
             OutputStream out = new FileOutputStream(outFileName)) {

            // Set file for error messages
            tidy.setErrout(errOut);

            // Convert files
            tidy.parse(in, out);

        } catch (IOException e) {
            System.err.println("Conversion of " + url + " failed: " + e);
        }
    }

    public static void main(String[] args) {
        /*
         * Parameters are:
         *   URL of HTML file
         *   Filename of output file
         *   Filename of error file
         */
        if (args.length != 3) {
            System.err.println("Usage: java TestHTML2XML <url> <outFile> <errFile>");
            return;
        }
        TestHTML2XML t = new TestHTML2XML(args[0], args[1], args[2]);
        t.convert();
    }
}
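
Once the conversion has run, you may want to confirm that the output really is well-formed XML. The sketch below is one possible way to do that using only the XML parser in the standard libraries. The class name WellFormedCheck and the method isWellFormed are hypothetical names chosen for illustration; they are not part of JTidy. It also assumes the output does not rely on entities declared in an external DTD, because any external DTD is deliberately resolved to an empty document to avoid network access.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of JTidy or the sample above.
class WellFormedCheck {

    // Returns true if the stream parses as well-formed XML, false otherwise.
    static boolean isWellFormed(InputStream in) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Assumption: resolve any external DTD to an empty document
            // so that the check never needs network access.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            // DefaultHandler ignores warnings and throws on fatal errors
            // instead of printing them to the console.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(in);
            return true;
        } catch (SAXException e) {
            // The parser rejected the input, so it is not well-formed.
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Parameter: filename of the XML file to check.
    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(isWellFormed(in) ? "well-formed" : "not well-formed");
        }
    }
}
```

Run it with the output filename you passed to TestHTML2XML as the only argument. A strict XML parser rejects a missing end tag, which is exactly the kind of problem JTidy is meant to repair, so this check should fail on most raw HTML and pass on the converted file.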

Do you have a Java-related question that you want Michael to answer? Forward your questions to builder@zdnet.com.au.

Michael Geisler is a senior systems engineer with Sun Microsystems and has more than 14 years of experience in the IT and telecommunications industry. He has been working with Java since the first public beta and is currently the vice-president of the Australian Java Users Group (AJUG).


Comments

1

K.ganesan - 13/02/07

Thanks! Your code helped me understand XML in Java.


2

Rashmi - 14/04/08

Hi

I am having problems converting a web page to XML using JTidy on Windows XP. I have downloaded the latest version of JTidy, but I get errors when I run the above code because this line is not recognised:
Tidy tidy = new Tidy();

I am new to JTidy. Can anybody give me detailed steps for taking a web page and converting it into the corresponding XML?
Any help would be greatly appreciated.


3

woutboeing - 17/04/08

hello,

At school I'm learning Java with ConTEXT, a text editor for Java. We save the files as java, javac or txt. My question is: if I have a working program, how do I make it work for people who don't use ConTEXT, so that I can run it from any machine (as some sort of EXE file, or HTML)?

Could you tell me what format to use, and how to do that?
(Please mail me.)

import javax.swing.*;
import java.io.*; //import dialog boxes

public class session2opgave2 { //intro
public static void main(String [] args) throws IOException {
//nomination and definition of used variables.
double time; //time
double distance; //distance
double cost; //cost
String str; //string definition


BufferedReader in1 = new BufferedReader( //creation bufferreaderobject
new InputStreamReader(System.in));
//time dialog
str = JOptionPane.showInputDialog( null,
"Enter the starting time (e.g. 1640 for 16:40 (4:40pm)): " ); //readline
time = Double.parseDouble(str); //assigning time

//distance dialog
str = JOptionPane.showInputDialog( null,
"Enter the travel distance (km): " ); //readline
distance = Double.parseDouble(str); //assigning distance

if ( distance <= 50 ) { //00-50
if ( time >= 700 && time <= 900 || time >= 1600 && time <= 1800 ){
if ( time >= 700 && time <= 900 ){
cost = 0.25 * distance;
JOptionPane.showMessageDialog( null,
"the total ticket price will be " + cost + " euro's.");}

if ( time >= 1600 && time <= 1800 ){
cost = 0.2 * distance;
System.out.println("the total ticket price will be " + cost + "Euro's");}}

else{
cost = 0.15 * distance;
JOptionPane.showMessageDialog( null,
"the total ticket price will be " + cost + " euro's.");}
}

if ( distance <= 100 && distance > 50 ) { //50-100
if ( time >= 700 && time <= 900 || time >= 1600 && time <= 1800 ){
if ( time >= 700 && time <= 900 ){
cost = 0.22 * 50 + 0.18 * distance;
JOptionPane.showMessageDialog( null,
"the total ticket price will be " + cost + " euro's.");}

if ( time >= 1600 && time <= 1800 ){
cost = 0.18 * 50 + 0.15 * distance;
System.out.println("the total price will be " + cost + "Euro's");}}

else{
cost = 0.12 * 50 + 0.1 * distance;
JOptionPane.showMessageDialog( null,
"the total ticket price will be " + cost + " euro's.");}
}

if ( distance > 100 ) { //100+
if ( time >= 700 && time <= 900 || time >= 1600 && time <= 1800 ){
if ( time >= 700 && time <= 900 ){
cost = 0.2 * 50 + 0.12 * distance;
JOptionPane.showMessageDialog( null,
"the total ticket price will be " + cost + " euro's.");}

if ( time >= 1600 && time <= 1800 ){
cost = 0.17 * 50 + 0.1 * distance;
System.out.println("the total price will be " + cost + "Euro's");}}

else{
cost = 0.1 * 50 + 0.08 * distance;
JOptionPane.showMessageDialog( null,
"the total ticket price will be " + cost + " euro's.");}
}
} //outro
}


4

karthi - 10/08/08

5

Nike - 17/11/08

I have a problem printing an HTML file (as it looks in the browser) through a printer.

I tried coding it with document rendering and so on, but nothing worked.
Please help me out and provide code that will print an HTML file with images directly through a printer.


6

sengalvarayan - 27/09/09

I have a question: I have written an HTML file. How do I call the Java file from the HTML file?


7

Hassan - 18/02/10

Sometimes your source HTML page is so messed up that you can't convert it to XML directly. What I did worked for me and may work for others: convert to XHTML first, then convert that output to XML. Look at the code:

Tidy tidy = new Tidy();
tidy.setXHTML(true);
tidy.parse(new FileReader("file.html"), new FileWriter("file_xhtml.html"));
/*Now use the generated xhtml file as source and output xml from it*/
tidy.setXmlOut(true);
tidy.parse(new FileReader("file_xhtml.html"), new FileWriter("file_xml.xml"));


8

vinay - 22/02/10

I have all the code that is written above, but I couldn't run it properly. Please tell me how to run this code, step by step. Thanks.


9

abhijeet - 24/02/11

I want to save my HTML data into an XML file.
E.g. if I write text in a textbox, it should be added to the respective XML tag.
Please help.


10

abhijeet - 24/02/11

I want to save my HTML data into an XML file.
E.g. if I write text in a textbox, it should be added to the respective XML tag. I want to do this using JS.
Please help.

