de.stefanfrings.utils.SimpleXMLParser

public abstract class SimpleXMLParser extends Object

Very efficient parser for large XML documents, based on SAX. This class may be used to read huge files because it parses the XML document line by line and needs very little memory.

The main benefit over plain SAX is that SimpleXMLParser provides the hierarchy of parent elements and concatenates the text fragments between start and end tag into a single string.
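The following is a minimal sketch of how such a layer on top of SAX could track the element hierarchy and join text fragments. It is not the SimpleXMLParser source: the class PathTrackingSketch and its methods trace and path are hypothetical names, it uses only the JDK's SAX API, and it applies a plain trim() instead of the library's whitespace clean-up.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; names and behaviour are assumptions.
final class PathTrackingSketch {

    /** Parses the XML text and returns one line per start/end event. */
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        Deque<String> names = new ArrayDeque<>();        // open elements, innermost first
        Deque<StringBuilder> texts = new ArrayDeque<>(); // one text buffer per open element

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                names.push(qName);
                texts.push(new StringBuilder());
                events.add("start " + path(names));
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver the text of one element in several chunks.
                if (!texts.isEmpty()) {
                    texts.peek().append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String text = texts.pop().toString().trim();
                events.add("end " + path(names) + " [" + text + "]");
                names.pop();
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return events;
    }

    /** Joins the open element names from the root down to the current element. */
    static String path(Deque<String> names) {
        StringBuilder sb = new StringBuilder();
        Iterator<String> it = names.descendingIterator(); // outermost element first
        while (it.hasNext()) {
            if (sb.length() > 0) {
                sb.append('/');
            }
            sb.append(it.next());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        trace("<shops><shop><name>Mc Donald</name></shop></shops>").forEach(System.out::println);
    }
}
```

Keeping one text buffer per open element means the text is complete only when the end tag arrives, which is why the collected characters can only be offered to an end callback.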

You have to override the start(XMLElement) method, the end(XMLElement) method, or both, to process the data. Example:

 SimpleXMLParser parser = new SimpleXMLParser()
 {
     protected void start(XMLElement element)
     {
         System.out.print("start()  ");
         System.out.println(element.toString());
     }

     protected void end(XMLElement element)
     {
         System.out.print("end()    ");
         System.out.println(element.toString());
     }
 };
 parser.parse(new FileInputStream("test.xml"), "test", false);

Example input:

 <shops>
     <shop type="fast food" favorite="false">
         <name language="en">Mc Donald</name>
         <description language="en">
             Well known for burgers
             and salads
         </description>
     </shop>
     <shop type="books" favorite="true">
         <name language="en">Readers Place</name>
         <description language="en">
             They really know what they sell.
             Ask the employees for recommendations.
         </description>
     </shop>
 </shops>

For this document, the start() and end() methods would each be called 7 times, with the following names and attributes in {}:

 start()  shops {}
 start()  shops/shop {type=fast food, favorite=false}
 start()  shops/shop/name {language=en}
 end()    shops/shop/name {language=en, __characters=Mc Donald}
 start()  shops/shop/description {language=en}
 end()    shops/shop/description {language=en, __characters=Well known for burgers\nand salads}
 end()    shops/shop {type=fast food, favorite=false}
 start()  shops/shop {type=books, favorite=true}
 start()  shops/shop/name {language=en}
 end()    shops/shop/name {language=en, __characters=Readers Place}
 start()  shops/shop/description {language=en}
 end()    shops/shop/description {language=en, __characters=They really know what they sell.\nAsk the employees for recommendations.}
 end()    shops/shop {type=books, favorite=true}
 end()    shops {}

The collected text characters between start and end tags are returned like the other XML attributes but with the special name "__characters". While parsing the XML, the following parts get removed from these characters: leading and trailing whitespace, leading and trailing line feeds, duplicate whitespace, duplicate line feeds, indentation and all other control characters.
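The clean-up described above could look roughly like the sketch below. It approximates the documented rules and is not the library's code: the class CharactersCleanupSketch and its normalize method are hypothetical, and replacing control characters such as tabs with a space (rather than deleting them) is an assumption.

```java
// Hypothetical illustration only; an approximation of the documented clean-up rules.
final class CharactersCleanupSketch {

    /** Trims every line, drops empty lines and collapses repeated spaces. */
    static String normalize(String raw) {
        StringBuilder out = new StringBuilder();
        for (String line : raw.split("\n")) {
            String cleaned = line
                    .replaceAll("\\p{Cntrl}", " ") // assumption: control characters become a space
                    .replaceAll(" {2,}", " ")      // collapse duplicate spaces
                    .trim();                       // remove indentation and trailing spaces
            if (cleaned.isEmpty()) {
                continue;                          // skip leading, trailing and duplicate line feeds
            }
            if (out.length() > 0) {
                out.append('\n');
            }
            out.append(cleaned);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "\n            Well known for burgers\n            and salads\n        ";
        System.out.println(normalize(raw));
    }
}
```

For the description text of the first shop in the example input, this yields the two lines "Well known for burgers" and "and salads" separated by a single line feed, which matches the __characters value listed for that element.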

XML namespaces and DTDs are supported as well but have no effect on the output. An element "s:shop" would be called "shop" in the output.
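One way to obtain prefix-free names is a namespace-aware SAX parser, which reports the local part of each element name separately. The sketch below shows this with the JDK alone; whether SimpleXMLParser works this way internally is an assumption, and the class LocalNameSketch, its method localNames and the namespace URI urn:example are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not taken from the SimpleXMLParser source.
final class LocalNameSketch {

    /** Returns the element names of the document without namespace prefixes. */
    static List<String> localNames(String xml) {
        List<String> result = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // makes SAX report localName without the prefix
        try {
            factory.newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String localName, String qName, Attributes atts) {
                            result.add(localName); // qName would still be "s:shop"
                        }
                    });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(localNames("<s:shops xmlns:s=\"urn:example\"><s:shop/></s:shops>"));
    }
}
```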

Author: Stefan Frings, http://stefanfrings.de/javautils

Constructor Summary

- SimpleXMLParser()

Method Summary

- protected void end(XMLElement element): This method is called whenever the end of an element is reached.
- void parse(InputStream byteStream, String name, boolean validate): Parses an XML document.
- protected void start(XMLElement element): This method is called whenever the start of a new element is reached.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- SimpleXMLParser
  
  public SimpleXMLParser()

Method Details
- parse
  
  public void parse(InputStream byteStream, String name, boolean validate) throws XMLParseException
  
  Parses an XML document. Every time the start of an XML element has been read, the method start(XMLElement) will be called. Every time the end of an XML element has been read, the method end(XMLElement) will be called. Data between start and end tags are only available to the latter method.
  
  Parameters:
  
  byteStream - Source of the XML document
  
  name - A symbolic name for the source, used in log messages
  
  validate - Whether to validate DTD schema and XML namespaces (costs time!)
  
  Throws:
  
  XMLParseException - If the XML is invalid.
- start
  
  protected void start(XMLElement element) throws Exception
  
  This method is called whenever the start of a new element is reached. The default implementation does nothing.
  All attributes of the XML element are available via element.getAttribute(name).
  To access the text content of an XML element, you have to override the end() method instead. The characters are not available at this stage.
  
  Parameters:
  
  element - The current XML element
  
  Throws:
  
  Exception - In case of any exception
- end
  
  protected void end(XMLElement element) throws Exception
  
  This method is called whenever the end of an element is reached. The default implementation does nothing.
  All attributes of the XML element are available via element.getAttribute(name).
  The text content of the XML element is made available via element.getAttribute("__characters").
  
  Parameters:
  
  element - The current XML element
  
  Throws:
  
  Exception - In case of any exception
