Approaches to XML – Part 2 – What about SAX?

Part 1 introduced the idea that there are different ways to approach XML parsing and highlighted the point that XML is NOT A STRING; rather, it’s an object-oriented document model that can be represented using a string. Today’s blog continues this discussion using my Pete’s Perfect Pizza scenario. If you remember, Pete’s just popped his head around the door and asked you to enhance the system so that the front desk can send orders for multiple pizzas in a single XML message. You know that your simple string parsing code is flawed and won’t cut the mustard, so you Google[1] more on XML and come up with the idea of using SAX.

SAX, or Simple API for XML, has been around for many years and, as far as I can recollect, was originally a development led by David Megginson before the turn of the millennium. In those days, you had to download the Java version of SAX from David’s personal web site. This developed into the SAX Project before finally being added to Java Standard Edition 1.4.

SAX is a streaming interface for XML, which means that applications using SAX receive event notifications about the XML document as it is processed, one element (together with its attributes) at a time, in sequential order, starting at the top of the document and ending with the closing of the root element. This means that it’s extremely efficient, processing XML in linear time without placing too many demands upon system memory.
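
To make the idea of sequential event notifications concrete, here is a minimal sketch that simply records each event as the parser delivers it. It isn’t part of Pete’s system: the class name SaxEventTrace and its trace(...) method are made up for illustration, and it assumes the parser bundled with the JDK, obtained through SAXParserFactory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only: records SAX events in arrival order.
final class SaxEventTrace {

  /** Parses the XML and returns a description of each event, in the order received. */
  static List<String> trace(String xml) throws Exception {
    final List<String> events = new ArrayList<String>();

    DefaultHandler handler = new DefaultHandler() {
      // Text may arrive in several chunks, so buffer it until the next element event
      private final StringBuilder text = new StringBuilder();

      private void flushText() {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
          events.add("text:" + value);
        }
        text.setLength(0);
      }

      @Override
      public void startDocument() {
        events.add("startDocument");
      }

      @Override
      public void startElement(String uri, String localName, String qName, Attributes attributes) {
        flushText();
        events.add("start:" + qName);
      }

      @Override
      public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
      }

      @Override
      public void endElement(String uri, String localName, String qName) {
        flushText();
        events.add("end:" + qName);
      }

      @Override
      public void endDocument() {
        events.add("endDocument");
      }
    };

    SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(xml)), handler);
    return events;
  }

  public static void main(String[] args) throws Exception {
    for (String event : trace("<pizza><name>Capricciosa</name><base>thin</base></pizza>")) {
      System.out.println(event);
    }
  }
}
```

The events arrive strictly in document order: the start of pizza, then the start, text and end of name, then the same for base, and finally the end of pizza. At no point does the parser hold the whole document in memory, which is where the memory efficiency comes from.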

Back at Pete’s, you work hard and come up with the following SAX parser-based class:

import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

public class PizzaParser {

  public List<PizzaOrder> order(InputStream xml) {

    PizzaContentHandler handler = new PizzaContentHandler();

    // do the parsing
    try {
      // Construct the parser by bolting together an XMLReader
      // and the ContentHandler
      XMLReader parser = XMLReaderFactory.createXMLReader();
      parser.setContentHandler(handler);

      // create an input source from the XML input stream
      InputSource source = new InputSource(xml);
      // Do the actual work
      parser.parse(source);

      return handler.getPizzaOrder();
    } catch (Exception ex) {
      throw new RuntimeException("Exception parsing xml message. Message: " + ex.getMessage(), ex);
    }
  }

  static class PizzaOrder {

    private final String pizzaName;
    private final String base;
    private final String quantity;

    PizzaOrder(String pizzaName, String base, String quantity) {
      this.pizzaName = pizzaName;
      this.base = base;
      this.quantity = quantity;
    }

    public String getPizzaName() {
      return pizzaName;
    }

    public String getBase() {
      return base;
    }

    public String getQuantity() {
      return quantity;
    }
  }

  /**
   * Use this class to handle the SAX events
   */
  class PizzaContentHandler extends DefaultHandler {

    private String[] pizzaInfo;
    private int index;
    private List<PizzaOrder> outList;
    private boolean capture;

    /**
     * Set things up at the start of the document.
     */
    @Override
    public void startDocument() {
      outList = new ArrayList<PizzaOrder>();
    }

    /**
     * Handle the startElement event
     */
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {

      capture = true;
      if ("pizzas".equals(qName)) {
        capture = false;
      } else if ("pizza".equals(qName)) {
        pizzaInfo = new String[3];
        capture = false;
      } else if ("name".equals(qName)) {
        index = 0;
      } else if ("base".equals(qName)) {
        index = 1;
      } else if ("quantity".equals(qName)) {
        index = 2;
      } else {
        // Ignore the text of any element we don't recognise
        capture = false;
      }
    }

    /**
     * Handle the endElement event
     */
    @Override
    public void endElement(String uri, String localName, String qName) {

      // Stop capturing so that white-space between elements is ignored
      capture = false;
      if ("pizza".equals(qName)) {
        outList.add(new PizzaOrder(pizzaInfo[0], pizzaInfo[1], pizzaInfo[2]));
      }
    }

    /**
     * Grab hold of incoming character data
     */
    @Override
    public void characters(char[] ch, int start, int length) {

      if (capture) {
        // The parser may deliver one piece of text in several chunks,
        // so append to what we already have rather than overwrite it
        String text = new String(ch, start, length);
        pizzaInfo[index] = (pizzaInfo[index] == null) ? text : pizzaInfo[index] + text;
      }
    }

    List<PizzaOrder> getPizzaOrder() {
      return outList;
    }
  }
}

This blog isn’t here to demonstrate how to use SAX (there are lots of examples available if you look around), but let’s take a critical look at the code. The first thing to notice is that the order(...) method now takes an input stream rather than a string, as befits a stream-based API:

  public List<PizzaOrder> order(InputStream xml) 
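
Taking an InputStream means the same entry point can be fed from a file, a socket or an in-memory string. As an aside, XMLReaderFactory has been deprecated since Java 9 in favour of SAXParserFactory, so on a newer JDK the parser could be built along the following lines. This is only a sketch: the class SaxStreamParsing and its countElements(...) method are hypothetical names used for illustration and are not part of the original source code.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
final class SaxStreamParsing {

  /** Feeds the stream through a SAX parser created by SAXParserFactory. */
  static void parse(InputStream xml, DefaultHandler handler) throws Exception {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    // Ask the implementation to apply its secure processing limits
    factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
    factory.newSAXParser().parse(xml, handler);
  }

  /** Counts the elements with the given name, showing that any handler can be plugged in. */
  static int countElements(InputStream xml, final String name) throws Exception {
    final int[] count = { 0 };
    parse(xml, new DefaultHandler() {
      @Override
      public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (name.equals(qName)) {
          count[0]++;
        }
      }
    });
    return count[0];
  }

  public static void main(String[] args) throws Exception {
    InputStream xml = new ByteArrayInputStream(
        "<pizzas><pizza/><pizza/></pizzas>".getBytes("UTF-8"));
    System.out.println(countElements(xml, "pizza"));
  }
}
```

A caller holding a String only has to wrap its bytes in a ByteArrayInputStream, which is what the unit tests further down do.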

The next thing to note is that the PizzaParser uses a nested class, PizzaContentHandler, that extends the SAX helper class DefaultHandler. This means that all you need to do to get hold of the SAX events is to override handler methods such as startElement(...), endElement(...) etc. The PizzaContentHandler class captures a list of PizzaOrder beans and passes them back to the enclosing class for return to the caller.

If you take a closer look at the code, you’ll realise that it’s pretty complex. All it has to do is to create an output list, yet there are multiple if() statements, temporary arrays and boolean switches that are used to grab hold of the right bit of information from the right point in the document. This is the downside to SAX: its complexity places more of a burden on the programmer and makes your code more error prone.
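
One common way of taming some of this bookkeeping is to buffer the character data in a StringBuilder and decide what to do with it when the element closes. This also copes with the fact that a SAX parser is allowed to deliver a single piece of text across several characters(...) calls. The handler below is a sketch of that idea only, not the approach used in Pete’s code: BufferingPizzaHandler is a hypothetical class, and it assumes the same pizza, name, base and quantity element names, storing each order as a simple map.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical alternative handler, for illustration only.
final class BufferingPizzaHandler extends DefaultHandler {

  private final List<Map<String, String>> orders = new ArrayList<Map<String, String>>();
  private final StringBuilder text = new StringBuilder();
  private Map<String, String> current;

  @Override
  public void startElement(String uri, String localName, String qName, Attributes attributes) {
    text.setLength(0); // discard any white-space seen between elements
    if ("pizza".equals(qName)) {
      current = new LinkedHashMap<String, String>();
    }
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    text.append(ch, start, length); // may be called more than once per text node
  }

  @Override
  public void endElement(String uri, String localName, String qName) {
    if ("pizza".equals(qName)) {
      orders.add(current);
      current = null;
    } else if (current != null) {
      // A child of <pizza> has closed: store its text under the element name
      current.put(qName, text.toString().trim());
    }
    text.setLength(0);
  }

  List<Map<String, String>> getOrders() {
    return orders;
  }

  /** Convenience method that parses a complete XML string. */
  static List<Map<String, String>> parse(String xml) throws Exception {
    BufferingPizzaHandler handler = new BufferingPizzaHandler();
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")), handler);
    return handler.getOrders();
  }
}
```

The index variable and the capture flag disappear, at the cost of buffering text for every element. It is still hand-written state tracking though, so the general point about SAX placing a burden on the programmer stands.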

It is, however, more resilient than the previous string based attempt as the unit tests below demonstrate:

import static org.junit.Assert.assertEquals;

import java.io.ByteArrayInputStream;
import java.util.List;

import org.junit.Before;
import org.junit.Test;

public class PizzaParserTest {

  private static final String ORDER_XML = //
  "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" + //
      "<pizza>\n" + //
      "    <name>Capricciosa</name>\n" + //
      "    <base>thin</base>\n" + //
      "    <quantity>2</quantity>\n" + //
      "</pizza>\n";

  private static final String ORDER_XML_2 = //
  "<?xml version=\"1.0\" encoding=\"UTF-8\"?><pizza><name>Capricciosa</name><base>thin</base><quantity>2</quantity></pizza>";

  private static final String ORDER_XML_3 = //
  "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" + //
      "<pizzas>\n" + //
      "    <pizza>\n" + //
      "        <name>Capricciosa</name>\n" + //
      "        <base>thin</base>\n" + //
      "        <quantity>2</quantity>\n" + //
      "    </pizza>\n" + //
      "    <pizza>\n" + //
      "        <name>Margherita</name>\n" + //
      "        <base>thin</base>\n" + //
      "        <quantity>1</quantity>\n" + //
      "    </pizza>\n" + //
      "</pizzas>";

  private PizzaParser instance;

  @Before
  public void setUp() {
    instance = new PizzaParser();
  }

  @Test
  public void readOrderFromXML() {

    List<PizzaParser.PizzaOrder> results = instance.order(new ByteArrayInputStream(ORDER_XML.getBytes()));

    assertEquals(1, results.size());

    PizzaParser.PizzaOrder result = results.get(0);
    assertEquals("Capricciosa", result.getPizzaName());
    assertEquals("thin", result.getBase());
    assertEquals("2", result.getQuantity());
  }

  @Test
  public void readOrderFromModifiedXML() {

    List<PizzaParser.PizzaOrder> results = instance.order(new ByteArrayInputStream(ORDER_XML_2.getBytes()));

    assertEquals(1, results.size());

    PizzaParser.PizzaOrder result = results.get(0);
    assertEquals("Capricciosa", result.getPizzaName());
    assertEquals("thin", result.getBase());
    assertEquals("2", result.getQuantity());
  }

  @Test
  public void readOrderForMultiplePizza() {

    List<PizzaParser.PizzaOrder> results = instance.order(new ByteArrayInputStream(ORDER_XML_3.getBytes()));

    PizzaParser.PizzaOrder result = results.get(0);
    assertEquals("Capricciosa", result.getPizzaName());
    assertEquals("thin", result.getBase());
    assertEquals("2", result.getQuantity());

    result = results.get(1);
    assertEquals("Margherita", result.getPizzaName());
    assertEquals("thin", result.getBase());
    assertEquals("1", result.getQuantity());
  }
}

These tests demonstrate the scenarios of processing XML messages with and without whitespace (fixing yesterday’s problem), together with a message that includes an order for multiple pizzas.

It’s all working really well, but Pete’s big ideas are coming to fruition. He’s now expanding into a worldwide concern with multiple kitchens around the world and an online presence. Pete hires some rinky-dinky business consultants who create a new pizza order XML schema and combine it with their existing customer schema. This is dropped into your email inbox and you wonder what to do next…

[1] Other search engines are available.

The source code is available from GitHub at:

git://github.com/roghughe/captaindebug.git

Continue to Part 3 of the series.

Reference: Approaches to XML – Part 2 – What about SAX? from our JCG partner Roger Hughes at the Captain Debug’s Blog blog.

Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy | Contact
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.