Class SimpleXMLParser


  • public abstract class SimpleXMLParser
    extends Object
    A very efficient parser for large XML documents, based on SAX. This class can be used to read huge files because it parses the XML document incrementally and needs very little memory.

    Compared to SAX, the following features have been added:

    • Provides hierarchical names for sub-elements.
    • Combines all character fragments of an element into one trimmed string.
    • Does not overwrite attributes of parent-elements when processing sub-elements.
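
    To illustrate the first two features, here is a minimal sketch built on the plain JDK SAX API. It is not the source code of this class: the class name HierarchicalSaxSketch, the "." separator in the hierarchical name, and the "name=text" result format are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class, not part of the documented library.
public class HierarchicalSaxSketch extends DefaultHandler {
    // Names of the currently open elements, outermost first.
    private final Deque<String> path = new ArrayDeque<>();
    // One text buffer per open element, so text of nested elements does not mix.
    private final Deque<StringBuilder> text = new ArrayDeque<>();
    // Collected "hierarchical.name=trimmed text" entries, in end-of-element order.
    public final List<String> results = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        path.addLast(qName);
        text.addLast(new StringBuilder());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver the text of one element in several fragments; append them all.
        if (!text.isEmpty()) {
            text.peekLast().append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Assumed naming scheme: ancestors joined with "." (the library may use another separator).
        String name = String.join(".", path);
        results.add(name + "=" + text.removeLast().toString().trim());
        path.removeLast();
    }

    // Parses an XML string and returns the collected entries.
    public static List<String> parse(String xml) throws Exception {
        HierarchicalSaxSketch handler = new HierarchicalSaxSketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.results;
    }

    public static void main(String[] args) throws Exception {
        parse("<order><item> pen </item><item>ink</item></order>")
                .forEach(System.out::println);
    }
}
```

    In this sketch the text of an element is only complete when its end tag is reached, which is why results are recorded in endElement rather than in startElement.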

    To process the data, override the start(XMLElement) and/or end(XMLElement) methods. In most cases, only end(XMLElement) needs to be overridden.

    The following example dumps the data that was read from a file:

     SimpleXMLParser parser = new SimpleXMLParser()
     {
         protected void end(XMLElement element) throws Exception
         {
             System.out.println(element.toString());
         }
     };
     parser.parse(new FileInputStream("/home/stefan/test.xml"), "test", false);
     
    Author:
    Stefan Frings, http://stefanfrings.de/javautils
    • Constructor Detail

      • SimpleXMLParser

        public SimpleXMLParser()
    • Method Detail

      • parse

        public void parse(InputStream byteStream,
                          String name,
                          boolean validate)
                   throws XMLParseException
        Parses an XML document. Every time the start of an XML element has been read, the method start(XMLElement) is called. Every time the end of an XML element has been read, the method end(XMLElement) is called. The character data between the start and end tags is only available to the latter method.
        Parameters:
        byteStream - Source of the XML document
        name - A symbolic name for the source, used in log messages
        validate - If true, the DTD and namespaces are validated
        Throws:
        XMLParseException - If the XML is invalid.
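
        The effect of a validate flag can be sketched with the plain JDK SAX API. This is only an assumption about how such a flag might map onto SAX, not the implementation of this method: the class ValidatingParseSketch and its parses helper are hypothetical, and mapping the flag to both setValidating and setNamespaceAware is a guess based on the parameter description.

```java
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class, not part of the documented library.
public class ValidatingParseSketch {

    // Returns true if the document was parsed without a reported error.
    public static boolean parses(InputStream byteStream, String name, boolean validate) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(validate);      // assumption: flag enables DTD validation
            factory.setNamespaceAware(validate);  // assumption: flag enables namespace processing
            factory.newSAXParser().parse(byteStream, new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    // DefaultHandler ignores validation errors; rethrow them so they abort parsing.
                    throw e;
                }
            });
            return true;
        } catch (Exception e) {
            // The symbolic name identifies the source in the log message.
            System.err.println(name + ": " + e.getMessage());
            return false;
        }
    }
}
```

        With this sketch, a document that is well-formed but violates its DTD is accepted when validate is false and rejected when validate is true, while a document that is not well-formed is rejected in both cases.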
      • start

        protected void start(XMLElement element)
                      throws Exception
        This method is called whenever the start of a new element is reached. The default implementation does nothing.

        To access the collected characters of an element, override the end(XMLElement) method. They are not available at this stage.

        Parameters:
        element - The current XML element
        Throws:
        Exception - In case of any exception
      • end

        protected void end(XMLElement element)
                    throws Exception
        This method is called whenever the end of an element is reached. The default implementation does nothing.

        The collected characters of the element are available under the attribute name "__characters".

        Parameters:
        element - The current XML element
        Throws:
        Exception - In case of any exception