
Parsing XML fetched from the internet is a common requirement on any platform. Most languages offer several ways to do it, and the Android platform is no exception. One way to do it on Android is SAXParser, a serial access parser API for XML. SAXParser works as a stream parser with an event-driven API: it invokes callback methods every time an event occurs during reading.
The majority of the work is done by a SAX handler. The SAXParser walks through the XML file from beginning to end (hence parsing is always unidirectional) and calls the appropriate handler methods along the way. For this exercise, we will create a handler that extends org.xml.sax.helpers.DefaultHandler and overrides the necessary methods.
At the start and end of the document, the following methods get called:
public void startDocument() throws SAXException {}
public void endDocument() throws SAXException {}
When the parser reaches an opening tag, like <exampletag name="labs">, the following method gets called:
public void startElement(String namespaceURI, String localName, String qName,
Attributes atts) throws SAXException {}
In this case, localName will be "exampletag". The atts variable will hold any associated attribute information: atts.getValue("name") will return "labs".
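As a minimal sketch of reading an attribute in startElement, the snippet below parses a one-line document from a string instead of a URL. The class name AttributeDemo and the method firstName are made up for illustration. One assumption to flag: on a desktop JVM the default factory is not namespace-aware and localName can come back empty, so the sketch calls setNamespaceAware(true) explicitly.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class AttributeDemo {
    /** Returns the "name" attribute of the first <exampletag>, or null if none is found. */
    public static String firstName(String xml) throws Exception {
        final String[] result = new String[1];
        SAXParserFactory spf = SAXParserFactory.newInstance();
        // Assumption: ask for namespace awareness so localName is populated.
        spf.setNamespaceAware(true);
        spf.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String namespaceURI, String localName, String qName,
                                     Attributes atts) {
                if (localName.equals("exampletag") && result[0] == null) {
                    // getValue returns null when the attribute is absent.
                    result[0] = atts.getValue("name");
                }
            }
        });
        return result[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstName("<exampletag name=\"labs\"/>"));
    }
}
```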
When we reach a closing tag, like </exampletag>, the equivalent closing method gets called:
public void endElement(String namespaceURI, String localName, String qName)
throws SAXException {}
In the same manner, localName will be "exampletag".
Between an opening and a closing tag there can be text, like <exampletag>here is some content</exampletag>. The SAXParser reads the text in and hands it to the handler in buffered chunks rather than one character at a time:
public void characters(char ch[], int start, int length) {}
The ch[] array holds a buffer of characters that the SAXParser has read in, but the only relevant chunk is the one given by the start and length values. With large enough strings, the characters() method may be called multiple times within a single block of character data. This is where I personally stumbled: many tutorials ignore this fact, assume the entire block arrives in one call, and end up with only partial data.
Now that I've explained the basics of how this all works, here is some very basic example code that parses an XML doc for the content of "qwerasdf" elements:
Creating a SAXParser and giving it a handler:
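To see the chunking for yourself, here is a small sketch that records every characters() call as a separate string. The class name ChunkDemo and the method chunks are hypothetical. How the text gets split is up to the parser implementation, so the sketch makes no promise about the number of chunks, only that joining them gives back the full text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ChunkDemo {
    /** Returns the text delivered by each individual characters() call, in order. */
    public static List<String> chunks(String xml) throws Exception {
        final List<String> parts = new ArrayList<>();
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // Copy only the relevant slice; the rest of ch[] is not ours to read.
                parts.add(new String(ch, start, length));
            }
        });
        return parts;
    }

    public static void main(String[] args) throws Exception {
        List<String> parts = chunks("<exampletag>fish &amp; chips</exampletag>");
        System.out.println("characters() calls: " + parts.size());
        System.out.println("joined text: " + String.join("", parts));
    }
}
```

A handler that keeps only the last chunk would silently lose part of the text whenever the parser splits it, which is why the example handler further down appends to a buffer instead.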
/* Create a URL we want to load some xml-data from. */
URL url = new URL("http://example.com/example.xml");
/* Get a SAXParser from the SAXParserFactory. */
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser sp = spf.newSAXParser();
/* Get the XMLReader of the SAXParser we created. */
XMLReader xr = sp.getXMLReader();
/* Create a new ContentHandler and apply it to the XMLReader. */
ExampleHandler myExampleHandler = new ExampleHandler();
xr.setContentHandler(myExampleHandler);
/* Parse the xml-data from our URL. */
xr.parse(new InputSource(url.openStream()));
/* Parsing has finished. */
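The setup above throws several checked exceptions and leaves the stream open. One possible way to wrap it is sketched below, assuming Java 7 or later for try-with-resources and multi-catch. The class ParseSetup and its parse helper are hypothetical names, and the sketch reads from an in-memory stream so it can be tried without a network connection.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ParseSetup {
    /** Hypothetical helper: parses the stream with the handler, returns false on any failure. */
    public static boolean parse(InputStream in, DefaultHandler handler) {
        // try-with-resources closes the stream whether or not parsing succeeds.
        try (InputStream stream = in) {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            XMLReader xr = spf.newSAXParser().getXMLReader();
            xr.setContentHandler(handler);
            xr.parse(new InputSource(stream));
            return true;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Malformed XML surfaces as a SAXException, I/O trouble as an IOException.
            System.err.println("Parsing failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<root><item/></root>".getBytes(StandardCharsets.UTF_8);
        System.out.println(parse(new ByteArrayInputStream(xml), new DefaultHandler()));
    }
}
```

For real use, the stream from url.openStream() would be passed in place of the ByteArrayInputStream.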
Definition of the ExampleHandler:
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class ExampleHandler extends DefaultHandler {
    // Collects the text of the current "qwerasdf" element across characters() calls
    StringBuilder buff = null;
    // True while we are inside a "qwerasdf" element
    boolean buffering = false;

    @Override
    public void startDocument() throws SAXException {
        // Some sort of setting up work
    }

    @Override
    public void endDocument() throws SAXException {
        // Some sort of finishing up work
    }

    @Override
    public void startElement(String namespaceURI, String localName, String qName,
            Attributes atts) throws SAXException {
        if (localName.equals("qwerasdf")) {
            buff = new StringBuilder();
            buffering = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (buffering) {
            // Append rather than assign: this may be called several times per element
            buff.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String namespaceURI, String localName, String qName)
            throws SAXException {
        if (localName.equals("qwerasdf")) {
            buffering = false;
            String content = buff.toString();
            // Do something with the full text content that we've just parsed
        }
    }
}
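The handler leaves "do something with the content" open. One option, sketched here with hypothetical names (QwerasdfDemo, CollectingHandler, extract), is to store each completed string in a list and read the list once parsing has finished. The sketch parses from a string and enables namespace awareness so that localName is populated on a desktop JVM; both are assumptions made to keep it self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class QwerasdfDemo {
    /** Variant of the example handler that keeps every "qwerasdf" text it sees. */
    static class CollectingHandler extends DefaultHandler {
        private final List<String> contents = new ArrayList<>();
        private StringBuilder buff;
        private boolean buffering;

        @Override
        public void startElement(String namespaceURI, String localName, String qName,
                                 Attributes atts) {
            if (localName.equals("qwerasdf")) {
                buff = new StringBuilder();
                buffering = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (buffering) {
                buff.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String namespaceURI, String localName, String qName) {
            if (localName.equals("qwerasdf")) {
                buffering = false;
                contents.add(buff.toString());
            }
        }

        List<String> getContents() {
            return contents;
        }
    }

    /** Parses the given XML text and returns the content of every "qwerasdf" element. */
    public static List<String> extract(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // assumption: needed for localName on a desktop JVM
        CollectingHandler handler = new CollectingHandler();
        spf.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.getContents();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><qwerasdf>one</qwerasdf><other>skip</other>"
                + "<qwerasdf>two</qwerasdf></root>";
        System.out.println(extract(xml)); // prints [one, two]
    }
}
```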