Class SimpleXMLParser
The main benefit over plain SAX is that SimpleXMLParser keeps track of the hierarchy of parent elements and concatenates the text fragments between a start tag and its end tag into a single string.
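To illustrate what this saves you, here is a minimal sketch of the bookkeeping that plain SAX requires for the same result. It uses only the JDK's SAX classes. The names SaxPathSketch, PathHandler and collect are hypothetical, and the code is not taken from SimpleXMLParser's implementation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxPathSketch {

    /** Hypothetical handler: records "path=text" for every closed element. */
    static class PathHandler extends DefaultHandler {
        private final Deque<String> names = new ArrayDeque<>();        // open elements, root first
        private final Deque<StringBuilder> texts = new ArrayDeque<>(); // one text buffer per open element
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            names.addLast(qName);
            texts.addLast(new StringBuilder());
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // SAX may deliver the text of one element in several chunks, so append them.
            if (!texts.isEmpty()) {
                texts.peekLast().append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String path = String.join("/", names); // e.g. shops/shop/name
            String text = texts.removeLast().toString().trim();
            names.removeLast();
            events.add(path + "=" + text);
        }
    }

    /** Parses the given XML string and returns the recorded end-of-element events. */
    static List<String> collect(String xml) throws Exception {
        PathHandler handler = new PathHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : collect("<shops><shop><name>Readers Place</name></shop></shops>")) {
            System.out.println(event);
        }
    }
}
```

The path stack and the per-element text buffers are exactly the kind of state that a SAX handler has to maintain by hand.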
To process the data, override the start(XMLElement) method, the end(XMLElement) method, or both. Example:
SimpleXMLParser parser = new SimpleXMLParser() {

    // Called when an opening tag has been read; text content is not yet available.
    protected void start(XMLElement element) {
        System.out.print("start() ");
        System.out.println(element.toString());
    }

    // Called when a closing tag has been read; "__characters" holds the text content.
    protected void end(XMLElement element) {
        System.out.print("end() ");
        System.out.println(element.toString());
    }
};
parser.parse(new FileInputStream("test.xml"), "test", false);
Example input:
<shops>
    <shop type="fast food" favorite="false">
        <name language="en">Mc Donald</name>
        <description language="en">
            Well known for burgers
            and salads
        </description>
    </shop>
    <shop type="books" favorite="true">
        <name language="en">Readers Place</name>
        <description language="en">
            They really know what they sell.
            Ask the employees for recommendations.
        </description>
    </shop>
</shops>

For this document, the start() and end() methods would each be called 7 times, with the following names and attributes in {}:
start() shops {}
start() shops/shop {type=fast food, favorite=false}
start() shops/shop/name {language=en}
end() shops/shop/name {language=en, __characters=Mc Donald}
start() shops/shop/description {language=en}
end() shops/shop/description {language=en, __characters=Well known for burgers\nand salads}
end() shops/shop {type=fast food, favorite=false}
start() shops/shop {type=books, favorite=true}
start() shops/shop/name {language=en}
end() shops/shop/name {language=en, __characters=Readers Place}
start() shops/shop/description {language=en}
end() shops/shop/description {language=en, __characters=They really know what they sell.\nAsk the employees for recommendations.}
end() shops/shop {type=books, favorite=true}
end() shops {}

The text collected between the start and end tags is returned like the other XML attributes, but under the special name "__characters". While parsing the XML, the following parts are removed from this text: leading and trailing whitespace, leading and trailing line feeds, duplicate whitespace, duplicate line feeds, indentation and all other control characters.
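The cleanup rules can be approximated as follows. This is only a sketch of the documented behaviour, not the parser's actual code. The class CharacterCleanupSketch and the method normalize are hypothetical, and treating tabs as ordinary whitespace is an assumption.

```java
public class CharacterCleanupSketch {

    /** Hypothetical helper that approximates the documented cleanup of "__characters". */
    static String normalize(String raw) {
        StringBuilder out = new StringBuilder();
        for (String line : raw.split("\\R")) {             // split at any line break
            String cleaned = line.replace('\t', ' ')       // assumption: tabs count as whitespace
                    .replaceAll("\\p{Cntrl}", "")          // drop remaining control characters
                    .replaceAll(" {2,}", " ")              // collapse duplicate spaces
                    .trim();                               // strip indentation and trailing spaces
            if (cleaned.isEmpty()) {
                continue; // skipping blank lines removes leading, trailing and duplicate line feeds
            }
            if (out.length() > 0) {
                out.append('\n');
            }
            out.append(cleaned);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "\n            Well known for burgers\n            and salads\n        ";
        // Prints two lines: "Well known for burgers" and "and salads"
        System.out.println(normalize(raw));
    }
}
```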
XML namespaces and DTD are supported as well but have no effect on the output. An element "s:shop" would be called "shop" in the output.
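The prefix stripping corresponds to what SAX calls the local name of an element. The sketch below demonstrates it with the JDK's namespace-aware SAX parser. It is not SimpleXMLParser's implementation. The class LocalNameSketch, the method localNames and the namespace URI urn:example are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class LocalNameSketch {

    /** Returns the local (prefix-free) name of every element in document order. */
    static List<String> localNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // without this, localName is not populated
        List<String> names = new ArrayList<>();
        factory.newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        names.add(localName); // "shop" for an element written as "s:shop"
                    }
                });
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<s:shops xmlns:s=\"urn:example\"><s:shop/></s:shops>";
        System.out.println(localNames(xml)); // prints [shops, shop]
    }
}
```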
- Author:
- Stefan Frings, http://stefanfrings.de/javautils
Constructor Summary
Method Summary
Modifier and Type, Method, Description:
- protected void end(XMLElement element): This method is called whenever the end of an element is reached.
- void parse(InputStream byteStream, String name, boolean validate): Parses an XML document.
- protected void start(XMLElement element): This method is called whenever the start of a new element is reached.
Constructor Details
SimpleXMLParser
public SimpleXMLParser()
Method Details
parse
Parses an XML document. Every time the start of an XML element has been read, the method start(XMLElement) will be called. Every time the end of an XML element has been read, the method end(XMLElement) will be called. Data between start and end tags is only available to the latter method.
- Parameters:
byteStream - Source of the XML document
name - A symbolic name for the source, used in log messages
validate - Whether to validate DTD schema and XML namespaces (costs time!)
- Throws:
XMLParseException - If the XML is invalid.
start
This method is called whenever the start of a new element is reached. The default implementation does nothing.
All attributes of the XML element are available via element.getAttribute(name).
To access the text content of an XML element, override the end() method instead. The characters are not yet available at this stage.
- Parameters:
element - The current XML element
- Throws:
Exception - In case of any exception
end
This method is called whenever the end of an element is reached. The default implementation does nothing.
All attributes of the XML element are available via element.getAttribute(name).
The text content of the XML element is available via element.getAttribute("__characters").
- Parameters:
element - The current XML element
- Throws:
Exception - In case of any exception