
Announcing EAXY: Making XML easier in Java

XML libraries in Java are a minefield. The amount of code required to manipulate and read XML is staggering, the risk of getting classpath problems with different libraries is substantial, and the handling of namespaces invites a lot of confusion and errors. The worst thing is that the situation doesn't seem to be improving.

A colleague made me aware of the JOOX library some time back. It's a very good attempt to patch these problems. I found a few shortcomings in JOOX that made me want to explore alternatives, and naturally I ended up writing my own library (as you do). I want the library to allow for Easy manipulation of XML, and in an episode of insufficient judgement, I named the library EAXY. It's a really bad name, so I'd appreciate suggestions for improvement.

Here is what I set out to solve:

  • It should be easy to create fairly complex XML trees with Java code
  • It should be straightforward and fool-proof to use namespaces. (This is where JOOX failed me)
  • It should be easy to read values out of the XML structure.
  • It should be easy to work with existing XML documents in the file system or on the classpath
  • The library should prefer throwing an exception over silently failing.
  • As a bonus, I wanted to make it even easier to deal with (X)HTML, by adding convenience functions for this.

1. Creating an XML document

An XML document is just a tree. What if we align the XML tree with the Java syntax tree? For example, let's say you wanted to programmatically construct some feedback on this article:

Element email = Xml.el("message",
        Xml.el("recipients",
            Xml.el("recipent",
                    Xml.attr("type", "email"),
                    Xml.attr("role", "to"),
                    Xml.text("mailto:johannes@brodwall.com")),
            Xml.el("recipent",
                    Xml.attr("type", "email"),
                    Xml.attr("role", "cc"),
                    Xml.text("mailto:contact@brodwall.com"))),
        Xml.el("subject", "EAXY feedback"),
        Xml.el("content", "I think this is an interesting library"));

Each element (Xml.el) has a tag name and can nest other elements, attributes (Xml.attr) or text (Xml.text). If the element contains only text, we don't even need to call Xml.text. The syntax is designed so that, with a static import of Xml.*, you can write code like this:

Element email = el("message",
        el("recipients",
            el("recipent",
                    attr("type", "email"),
                    attr("role", "to"),
                    text("mailto:johannes@brodwall.com")),
            el("recipent",
                    attr("type", "email"),
                    attr("role", "cc"),
                    text("mailto:contact@brodwall.com"))),
        el("subject", "EAXY feedback"),
        el("content", "I think this is an interesting library"));
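For readers curious how this nested-call style can work at all, here is a minimal sketch of a hypothetical `MiniXml` builder on top of the JDK's standard DOM API. It is not EAXY's implementation; the class and record names are made up for illustration, and it assumes Java 16 or later (records and `instanceof` patterns). A varargs `el(String, Object...)` accepts any mix of attributes, text and child elements, while a separate `el(String, String)` overload provides the text-only shortcut.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical builder for illustration only; NOT the EAXY implementation.
final class MiniXml {
    // Marker types so el(...) can tell attributes, text and child elements apart.
    record Attr(String name, String value) {}
    record Text(String value) {}
    record El(String name, Object[] children) {}

    static Attr attr(String name, String value) { return new Attr(name, value); }

    static Text text(String value) { return new Text(value); }

    // General form: any mix of attributes, text and nested elements.
    static El el(String name, Object... children) { return new El(name, children); }

    // Shortcut for elements whose only content is text.
    static El el(String name, String text) { return new El(name, new Object[] { new Text(text) }); }

    // Turns the nested El values into a real DOM tree.
    static Document toDocument(El root) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            doc.appendChild(build(doc, root));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    private static Element build(Document doc, El node) {
        Element element = doc.createElement(node.name());
        for (Object child : node.children()) {
            if (child instanceof Attr a) {
                element.setAttribute(a.name(), a.value());
            } else if (child instanceof Text t) {
                element.appendChild(doc.createTextNode(t.value()));
            } else if (child instanceof El nested) {
                element.appendChild(build(doc, nested));
            } else {
                throw new IllegalArgumentException("Unsupported child: " + child);
            }
        }
        return element;
    }

    // Serializes without the XML declaration, for easy inspection.
    static String serialize(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Java resolves fixed-arity overloads before varargs ones, so `el("subject", "EAXY feedback")` picks the text shortcut instead of treating the string as an unsupported child.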

2. Reading XML

Reading XML with Java code can be a challenge. The DOM API makes it extremely wordy to do anything at all. You can use XPath, but it can be a bit too compact, and when you do something wrong, the result is simply an empty collection or a null value. I think we can improve on this.
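To make the silent failure concrete, here is a small sketch using the JDK's own `javax.xml.xpath` API on a trimmed-down version of the message. The class name `SilentXPath` is made up for this example; only the standard JDK calls are real.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Illustrates plain JDK XPath behaviour; SilentXPath is a hypothetical name.
final class SilentXPath {
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Number of matching nodes: a typo in an element name gives 0, not an error.
    static int count(Document doc, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    // String value of the first match: a typo gives "", not an error.
    static String text(Document doc, String expression) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }
}
```

Only a syntactically invalid expression raises `XPathExpressionException`; a misspelled element name is a perfectly valid path that simply matches nothing.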

Consider the following:

System.out.println(email.find("recipients", "recipient").texts());

I step down the XML tree structure and get all the recipient email addresses of the previous message. But wait: running this code returns an empty list. EAXY saves us from scratching our heads over this:

System.out.println(email.find("recipients", "recipient").check().texts());

Now I get the following exception:

org.eaxy.NonMatchingPathException: Can't find 
	{recipient} below [message, recipients].
	Actual elements: [Element{recipent}, Element{recipent}]

As you can see, we misspelled “recipient” as “recipent” in the message. We'll get back to this problem later, but for now, let's work around it to do something meaningful:

for (Element recipient : email.find("recipients", "recipent")) {
    if ("to".equals(recipient.attr("role"))) {
        System.out.println(recipient.text());
    }
}

Again, I think this is about as fluent as Java’s syntax allows.
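The fail-fast idea behind `check()` does not depend on any particular library. Here is a minimal sketch of a step-by-step finder over plain DOM that throws, naming the elements it actually found, as soon as a path step matches nothing. `StrictFind` and its message format are hypothetical, loosely modelled on the exception shown above; this is not how EAXY implements `find` or `check`. It assumes Java 16 or later.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical fail-fast finder over plain DOM; not EAXY's implementation.
final class StrictFind {
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Walks child elements one path step at a time; throws as soon as a step
    // matches nothing, listing the element names that were actually present.
    static List<Element> find(Element root, String... path) {
        List<Element> current = List.of(root);
        List<String> walked = new ArrayList<>(List.of(root.getTagName()));
        for (String step : path) {
            List<Element> next = new ArrayList<>();
            Set<String> actual = new LinkedHashSet<>();
            for (Element parent : current) {
                for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
                    if (n instanceof Element child) {
                        actual.add(child.getTagName());
                        if (child.getTagName().equals(step)) {
                            next.add(child);
                        }
                    }
                }
            }
            if (next.isEmpty()) {
                throw new NoSuchElementException("Can't find {" + step + "} below " + walked
                        + ". Actual elements: " + actual);
            }
            current = next;
            walked.add(step);
        }
        return current;
    }
}
```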

3. Validation and namespaces

So, we had a message where one of the element names was misspelled. If you have an XSD document for the XML you're using, you can validate the document against it. However, as is typical of Java XML libraries, the act of performing this validation is well hidden behind complex APIs. So I've provided a little help:

Xml.validatorFromResource("mailmessage.xsd").validate(email);

This reads the mailmessage.xsd from the classpath, which is the most common use case for me.
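For comparison, here is roughly what validation takes with the standard `javax.xml.validation` API. This is a sketch, not what EAXY does internally; `SchemaCheck` is a made-up name, and the schema is passed as a string so the example runs without a `mailmessage.xsd` file. Loading from the classpath instead would pass the URL from `getResource` to `SchemaFactory.newSchema(URL)`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Plain-JDK schema validation, for comparison; SchemaCheck is a hypothetical name.
final class SchemaCheck {
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns all validation error messages; an empty list means the document is valid.
    static List<String> errors(String xsd, Document doc) {
        List<String> errors = new ArrayList<>();
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // From the classpath: factory.newSchema(SchemaCheck.class.getResource("/mailmessage.xsd"))
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            // A Schema is thread-safe and reusable; a Validator is not.
            Validator validator = schema.newValidator();
            // Collect errors instead of stopping at the first one.
            validator.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) {}
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) { errors.add(e.getMessage()); }
            });
            validator.validate(new DOMSource(doc));
        } catch (SAXException | IOException e) {
            errors.add(e.getMessage());
        }
        return errors;
    }
}
```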

Of course, most schemas don’t refer to elements in the empty namespace. When using validation, it’s common that we have to construct elements in a specific namespace. In most Java libraries for dealing with XML, this is hard and easy to get wrong, especially when namespaces are mixed. I’ve made namespaces into a primary feature of the Eaxy library:

Namespace MSG_NS = new Namespace("http://eaxy.org/test/mailmessage", "msg");
Element email = MSG_NS.el("message",
        MSG_NS.el("recipients",
            MSG_NS.el("recipient",
                    MSG_NS.attr("type", "email"),
                    attr("role", "cc"),
                    text("mailto:contact@brodwall.com"))));

Notice that the “type” and “role” attributes belong to different namespaces: “type” is in the msg namespace, while “role” is in no namespace at all. This scenario is especially hard to handle with other libraries.
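To show what mixing namespaces involves without a helper library, here is a sketch of just the innermost recipient element built with the standard DOM Level 2 namespace methods (`createElementNS`, `setAttributeNS`). The class name `NamespacedAttrs` is hypothetical; the namespace URI and prefix are the ones from the example above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Plain-DOM version of the namespaced recipient; NamespacedAttrs is a hypothetical name.
final class NamespacedAttrs {
    static final String MSG_NS = "http://eaxy.org/test/mailmessage";

    static Element recipient() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            // Element in the msg namespace, written with the "msg" prefix.
            Element recipient = doc.createElementNS(MSG_NS, "msg:recipient");
            // "type" is namespaced, so it needs both the URI and a prefixed name...
            recipient.setAttributeNS(MSG_NS, "msg:type", "email");
            // ...while "role" lives in no namespace (null URI, no prefix).
            recipient.setAttributeNS(null, "role", "cc");
            recipient.setTextContent("mailto:contact@brodwall.com");
            doc.appendChild(recipient);
            return recipient;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Unprefixed attributes never inherit the element's namespace, which is why “role” has to be looked up with a null namespace URI rather than with `MSG_NS`.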

4. Templating

Reading the XSD from the classpath inspired another use case: what if we keep an XML document as a template on the classpath and use Java code to manipulate it? This would be especially handy for XHTML:

Document doc = Xml.readResource("testdocument.html");
Element peopleElement = doc.select("#peopleForm");

peopleElement.add(el("input",
        attr("type", "text"),
        attr("name", "firstName"),
        attr("value", "Johannes")));
peopleElement.add(el("input",
        attr("type", "text"),
        attr("name", "lastName"),
        attr("value", "Brodwall")));

This code reads the file testdocument.html from the classpath, selects the element with id “peopleForm” and adds two input elements to it.
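For comparison, here is a sketch of the same template edit with plain DOM. `TemplateEdit` and its methods are hypothetical, and the template is passed as a string so the example is self-contained; an application would read it from the classpath instead. Note that the DOM's own `getElementById` only finds attributes declared as IDs by a DTD or schema, so the sketch scans `id` attributes itself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.NoSuchElementException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical template helper over plain DOM; not EAXY's API.
final class TemplateEdit {
    // An application would typically use
    // TemplateEdit.class.getResourceAsStream("/testdocument.html") instead of a string.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse template", e);
        }
    }

    // Scans every element's "id" attribute and fails loudly if nothing matches.
    static Element byId(Document doc, String id) {
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element element = (Element) all.item(i);
            if (id.equals(element.getAttribute("id"))) {
                return element;
            }
        }
        throw new NoSuchElementException("No element with id '" + id + "'");
    }

    // Appends <input type=... name=... value=...>. For a template with a default
    // XHTML namespace you would need createElementNS instead of createElement.
    static Element addInput(Element parent, String type, String name, String value) {
        Element input = parent.getOwnerDocument().createElement("input");
        input.setAttribute("type", type);
        input.setAttribute("name", name);
        input.setAttribute("value", value);
        parent.appendChild(input);
        return input;
    }
}
```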

5. HTML convenience

In the code above, we set the type, name and value attributes of HTML input elements. These are among the most frequently used attributes in HTML manipulation. To make this easier, I’ve added some convenience methods to Eaxy:

peopleElement.add(el("input")
    .type("text").name("firstName").val("Johannes"));
peopleElement.add(el("input")
    .type("text").name("lastName").val("Brodwall"));
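These convenience methods follow the familiar fluent-setter pattern: each call sets one attribute and returns the same object so calls can be chained. Here is a sketch of that pattern over a plain DOM element; `HtmlInput` and its methods are hypothetical stand-ins, not EAXY's own types.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical fluent wrapper illustrating the convenience-setter pattern; not EAXY's classes.
final class HtmlInput {
    private final Element element;

    private HtmlInput(Element element) {
        this.element = element;
    }

    static HtmlInput create(Document doc) {
        return new HtmlInput(doc.createElement("input"));
    }

    // Each setter writes one attribute and returns this, so calls can be chained.
    HtmlInput type(String type) { element.setAttribute("type", type); return this; }

    HtmlInput name(String name) { element.setAttribute("name", name); return this; }

    HtmlInput val(String value) { element.setAttribute("value", value); return this; }

    Element element() { return element; }

    // Small helper so callers do not have to handle ParserConfigurationException.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```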

A final case I wanted to optimize for is dealing with forms in HTML. Here's some code that manipulates a form before it is sent to the user:

HtmlForm form = new HtmlForm(peopleElement);
form.set("firstName", "Johannes");
form.set("lastName", "Brodwall");

// resp is the servlet's HttpServletResponse (the request has no writer)
doc.writeTo(resp.getWriter());

Here, I set the form contents directly. The code will throw an exception if a parameter name is misspelled, so it’s easy to ensure that you use it correctly.
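The fail-fast behaviour described here can be sketched in a few lines of plain DOM. `StrictForm` is a hypothetical class, not EAXY's `HtmlForm`, and for simplicity it only looks at `<input>` elements and sets the first one with a matching name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical fail-fast form filler over plain DOM; not EAXY's HtmlForm.
final class StrictForm {
    private final Element form;

    StrictForm(Element form) {
        this.form = form;
    }

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse form", e);
        }
    }

    // Sets the value of the first <input> with the given name, or throws if no
    // such input exists, so a misspelled parameter name fails loudly.
    void set(String name, String value) {
        NodeList inputs = form.getElementsByTagName("input");
        for (int i = 0; i < inputs.getLength(); i++) {
            Element input = (Element) inputs.item(i);
            if (name.equals(input.getAttribute("name"))) {
                input.setAttribute("value", value);
                return;
            }
        }
        throw new IllegalArgumentException("No input named '" + name + "' in form");
    }
}
```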

Conclusion

I have shown five examples of how Eaxy can be used to easily do what's hard to do with most XML libraries for Java: creating a document tree with pure Java code, reading and manipulating individual parts of the XML tree, using namespaces and validation, templating, and manipulating (X)HTML documents and forms.

The library is not stable yet, but an unstable XML library may not be a very risky proposition, as most errors will be easy to detect long before production.

I hope you will find it useful to try this library in your code for XML and (X)HTML manipulation. I'm hoping for some users who can help me iron out the bugs and make Eaxy even easier to use.

Oh, and do let me know if you come up with a better name.
 

Johannes Brodwall

Johannes works as a programmer, software architect and provocateur for Sopra Steria Norway. He loves writing code in Java, C# and JavaScript and making people think.
Shadow Caster
10 years ago

Looks awesome but a link in the article to the EAXY code base would be nice.

Johannes Brodwall
10 years ago
Reply to  Shadow Caster

Thanks for pointing out this rather basic shortcoming!

I’ve updated the original article on my blog with a link to the Github project and also will try and have JavaCodeGeeks update the article.

At any rate, the code base is here: https://github.com/jhannes/eaxy

Padma
10 years ago

Great Article.

I feel that EasyXml would be the appropriate name.

Thanks.
Padma.

Johannes Brodwall
10 years ago
Reply to  Padma

Hi Padma

Good suggestion. “EasyXml” is pretty good, but it’s a bit plain.
