Hacking Jasper to Get Object Model of a JSP Page

To perform some checks and statistical analysis on my JSPs, I needed a DOM-like, hierarchical model of the elements they contain. But parsing JSP pages isn’t trivial and is best left to a tool that excels at it: the Jasper JSP compiler used by Tomcat, Jetty, GlassFish and likely by other containers too.

There is an easy way to tweak it to produce whatever output you need and to transform a JSP into whatever form you want, including an object model of the page:

  1. Define a Node.Visitor subclass for handling the nodes (tags etc.) of a JSP
  2. Write a simple subclass of Compiler, overriding its generateJava() to invoke the visitor
  3. Subclass the compiler executor JspC, overriding its method getCompilerClassName() to return the class name of your Compiler

Let’s see the code.

Implementation

1. Custom Visitor

A Visitor is invoked by the compiler to process a tree object model of a parsed JSP. This implementation just prints information about an interesting subset of nodes in the page, indented to make their nesting clear.

package org.apache.jasper.compiler;

import java.util.LinkedList;
import org.apache.jasper.JasperException;
import org.apache.jasper.compiler.Node.CustomTag;
import org.apache.jasper.compiler.Node.ELExpression;
import org.apache.jasper.compiler.Node.IncludeDirective;
import org.apache.jasper.compiler.Node.Visitor;
import org.xml.sax.Attributes;

public class JsfElCheckingVisitor extends Visitor {

    private String indent = "";

    @Override
    public void visit(ELExpression n) throws JasperException {
        logEntry("ELExpression", n, "EL: " + n.getEL());
        super.visit(n);
    }

    @Override
    public void visit(IncludeDirective n) throws JasperException {
        logEntry("IncludeDirective", n, toString(n.getAttributes()));
        super.visit(n);
    }

    @Override
    public void visit(CustomTag n) throws JasperException {
        logEntry("CustomTag", n, "Class: " + n.getTagHandlerClass().getName() + ", attrs: "
                + toString(n.getAttributes()));

        doVisit(n);

        indent += " ";
        visitBody(n);
        indent = indent.substring(0, indent.length() - 1);
    }

    private String toString(Attributes attributes) {
        if (attributes == null || attributes.getLength() == 0) return "";
        LinkedList<String> details = new LinkedList<String>();

        for (int i = 0; i < attributes.getLength(); i++) {
            details.add(attributes.getQName(i) + "=" + attributes.getValue(i));
        }

        return details.toString();
    }

    private void logEntry(String what, Node n, String details) {
        System.out.println(indent + n.getQName() + " at line:"
                + n.getStart().getLineNumber() + ": " + details);
    }

}

Notes:

  • The Visitor must be in the org.apache.jasper.compiler package because the essential class org.apache.jasper.compiler.Node is package-private
  • The method visitBody triggers processing of the nested nodes
  • There are more visit methods I could have overridden (as well as the catch-all method doVisit), but I’ve selected only those that interest me
  • The node’s attributes are of the type …sax.Attributes, which contains attribute names and values as strings
    • attributes.getType(i) is usually CDATA
  • The Node structure holds the parent node, the tag name, the tag handler class, the name of the source file and the corresponding line in it, and other useful information
  • CustomTag is likely the most interesting node type, e.g. all the JSF tags are of this type
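
As a small illustration of working with these attributes, the sketch below copies them into an ordered map of names to values, which is often handier for checks than the index-based SAX interface. The AttributeMaps class and its toMap method are hypothetical helpers, not part of Jasper; AttributesImpl from the JDK merely stands in for the attributes a visitor would receive from a node.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

/** Hypothetical helper (not part of Jasper): copies SAX attributes into an ordered map. */
class AttributeMaps {

    /** Returns qualified name -> value in document order; an empty map for null input. */
    static Map<String, String> toMap(Attributes attributes) {
        Map<String, String> result = new LinkedHashMap<String, String>();
        if (attributes == null) {
            return result;
        }
        for (int i = 0; i < attributes.getLength(); i++) {
            result.put(attributes.getQName(i), attributes.getValue(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // AttributesImpl is only a stand-in for what n.getAttributes() would return.
        AttributesImpl attrs = new AttributesImpl();
        attrs.addAttribute("", "id", "id", "CDATA", "inputForm");
        attrs.addAttribute("", "rendered", "rendered", "CDATA",
                "#{bookingHandler.flowersAvailable}");
        System.out.println(toMap(attrs));
    }
}
```

Inside the visitor, such a helper could take the place of the private toString(Attributes) method when the values are needed for more than logging.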

Example Output (for a JSF Page)

jsp:directive.include at line:5: [file=includes/stdjsp.jsp]
jsp:directive.include at line:6: [file=includes/ssoinclude.jsp]
f:verbatim at line:14: Class: com.sun.faces.taglib.jsf_core.VerbatimTag, attrs:
htm:div at line:62: Class: com.exadel.htmLib.tags.DivTag, attrs: [style=width:100%;]
 h:form at line:64: Class: com.sun.faces.taglib.html_basic.FormTag, attrs: [id=inputForm]
  htm:table at line:66: Class: com.exadel.htmLib.tags.TableTag, attrs: [cellpadding=0, width=100%, border=0, styleClass=clear box_main]
   htm:tr at line:71: Class: com.exadel.htmLib.tags.TrTag, attrs:
    htm:td at line:72: Class: com.exadel.htmLib.tags.TdTag, attrs:
    f:subview at line:73: Class: com.sun.faces.taglib.jsf_core.SubviewTag, attrs: [id=cars]
      jsp:directive.include at line:74: [file=/includes/cars.jsp]
      h:panelGroup at line:8: Class: com.sun.faces.taglib.html_basic.PanelGroupTag, attrs: [rendered=#{bookingHandler.flowersAvailable}]
...
   htm:tr at line:87: Class: com.exadel.htmLib.tags.TrTag, attrs: [style=height:5px]
    htm:td at line:87: Class: com.exadel.htmLib.tags.TdTag, attrs:

(I do not print “closing tags” because it is clear that a tag ends when another node with the same or smaller indentation appears or the output ends.)
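
The nesting that the indentation conveys can also be captured in an actual object model instead of printed text. The following is a minimal sketch under my own assumptions: PageElement and PageModelBuilder are hypothetical classes, not part of Jasper, and the builder simply keeps a stack of currently open elements.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical model element: one tag of the page together with its nested tags. */
class PageElement {
    final String qName;
    final int line;
    final List<PageElement> children = new ArrayList<PageElement>();

    PageElement(String qName, int line) {
        this.qName = qName;
        this.line = line;
    }
}

/** Hypothetical builder that a visitor could drive instead of printing. */
class PageModelBuilder {
    private final PageElement root = new PageElement("#page", 0);
    private final Deque<PageElement> open = new ArrayDeque<PageElement>();

    PageModelBuilder() {
        open.push(root);
    }

    /** To be called when a tag is encountered, before its body is visited. */
    void enter(String qName, int line) {
        PageElement element = new PageElement(qName, line);
        open.peek().children.add(element);
        open.push(element);
    }

    /** To be called once the body of the current tag has been visited. */
    void leave() {
        if (open.size() == 1) {
            throw new IllegalStateException("leave() without matching enter()");
        }
        open.pop();
    }

    PageElement getRoot() {
        return root;
    }

    public static void main(String[] args) {
        PageModelBuilder builder = new PageModelBuilder();
        builder.enter("htm:div", 62);
        builder.enter("h:form", 64);
        builder.leave();
        builder.leave();
        PageElement div = builder.getRoot().children.get(0);
        System.out.println(div.qName + " contains " + div.children.get(0).qName);
    }
}
```

In visit(CustomTag n), the idea would be to call enter(n.getQName(), n.getStart().getLineNumber()) where logEntry is called now and leave() right after visitBody(n), in place of the indent bookkeeping.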

2. Compiler Subclass

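The important part is generateJava, which I copied from the parent class, removing some code and adding an invocation of my Visitor. So only three lines in the listing below are actually new: the return in generateClass, the pageNodes.visit(...) call in generateJava, and the return true in isOutDated.

package org.apache.jasper.compiler; // same package as the package-private Node

import java.io.FileNotFoundException;
import org.apache.jasper.JasperException;

public class OnlyReadingJspPseudoCompiler extends Compiler {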

    /** We're never compiling .java to .class. */
    @Override protected void generateClass(String[] smap) throws FileNotFoundException,
            JasperException, Exception {
        return;
    }

    /** Copied from {@link Compiler#generateJava()} and adjusted */
    @Override protected String[] generateJava() throws Exception {

        // Setup page info area
        pageInfo = new PageInfo(new BeanRepository(ctxt.getClassLoader(),
                errDispatcher), ctxt.getJspFile());

        // JH: Skipped processing of jsp-property-group in web.xml for the current page

        if (ctxt.isTagFile()) {
            try {
                double libraryVersion = Double.parseDouble(ctxt.getTagInfo()
                        .getTagLibrary().getRequiredVersion());
                if (libraryVersion < 2.0) {
                    pageInfo.setIsELIgnored("true", null, errDispatcher, true);
                }
                if (libraryVersion < 2.1) {
                    pageInfo.setDeferredSyntaxAllowedAsLiteral("true", null,
                            errDispatcher, true);
                }
            } catch (NumberFormatException ex) {
                errDispatcher.jspError(ex);
            }
        }

        ctxt.checkOutputDir();

        try {
            // Parse the file
            ParserController parserCtl = new ParserController(ctxt, this);

            // Pass 1 - the directives
            Node.Nodes directives =
                parserCtl.parseDirectives(ctxt.getJspFile());
            Validator.validateDirectives(this, directives);

            // Pass 2 - the whole translation unit
            pageNodes = parserCtl.parse(ctxt.getJspFile());

            // Validate and process attributes - don't re-validate the
            // directives we validated in pass 1
            /**
             * JH: The code above has been copied from Compiler#generateJava() with some
             * omissions and with using our own Visitor.
             * The code that used to follow was just deleted.
             * Note: The JSP's name is in ctxt.getJspFile()
             */
            pageNodes.visit(new JsfElCheckingVisitor());

        } finally {}

        return null;
    }

    /**
     * The parent's implementation, in our case, checks whether the target file
     * exists and returns true if it doesn't. However it is expensive so
     * we skip it by returning true directly.
     * @see org.apache.jasper.JspCompilationContext#getServletJavaFileName()
     */
    @Override public boolean isOutDated(boolean checkClass) {
        return true;
    }

}

Notes:

  • I have deleted quite a lot of code that was unimportant to me from generateJava; for a different type of analysis than the one I intend, some of that code could be useful, so look into the original Compiler class and decide for yourself.
  • I do not really care about JSP ELs so it might be possible to optimize the compiler to need only one pass.

3. Compiler Executor

It is difficult to use a Compiler directly because it depends on quite a number of complex settings and objects. The easiest thing is thus to reuse the Ant task JspC, which has the additional benefit of finding the JSPs to process. As mentioned, the key thing is overriding getCompilerClassName to return the name of my compiler’s class.

import org.apache.jasper.JasperException;
import org.apache.jasper.JspC;
import org.apache.jasper.compiler.OnlyReadingJspPseudoCompiler;

/** Extends JspC to use the compiler of our choice; Jasper version 6.0.29. */
public class JspCParsingToNodesOnly extends JspC {

    /** Overridden to return the class of ours (default = null => JdtCompiler) */
    @Override public String getCompilerClassName() {
        return OnlyReadingJspPseudoCompiler.class.getName();
    }

    public static void main(String[] args) {
        JspCParsingToNodesOnly jspc = new JspCParsingToNodesOnly();

        jspc.setUriroot("web"); // where to search for JSPs
        //jspc.setVerbose(1);     // 0 = false, 1 = true
        jspc.setJspFiles("helloJSFpage.jsp"); // leave unset to process all; comma-separated

        try {
            jspc.execute();
        } catch (JasperException e) {
            throw new RuntimeException(e);
        }
    }
}

Notes:

  • JspC normally finds all files under the specified Uriroot but you can tell it to ignore all but some selected ones by passing their comma-separated names into setJspFiles.
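
When the selection of pages is not known in advance, the comma-separated list can be computed. The sketch below is only an illustration: JspFileLister and listJsps are hypothetical names, and it assumes that relative paths in the list are resolved against the Uriroot, as the single-file example in the listing above suggests.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical helper (not part of Jasper): builds a comma-separated list of JSPs. */
class JspFileLister {

    /**
     * Returns the paths, relative to root and joined by commas, of all .jsp files
     * under root whose file name starts with the given prefix.
     */
    static String listJsps(Path root, String prefix) throws IOException {
        List<String> names = new ArrayList<String>();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile)
                 .filter(p -> p.getFileName().toString().endsWith(".jsp"))
                 .filter(p -> p.getFileName().toString().startsWith(prefix))
                 // Use forward slashes so the result looks the same on every platform.
                 .forEach(p -> names.add(root.relativize(p).toString().replace('\\', '/')));
        }
        Collections.sort(names);
        return String.join(",", names);
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(listJsps(root, ""));
    }
}
```

The resulting string could then be passed to setJspFiles in the main method of JspCParsingToNodesOnly.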

Compile Dependencies

In Ivy form:

<dependency name="jasper" org="org.apache.tomcat" rev="6.0.29"/>
<dependency name="jasper-jdt" org="org.apache.tomcat" rev="6.0.29"/>
<dependency name="ant" org="org.apache.ant" rev="1.8.2"/>

License

All the code here is directly derived from Jasper and thus falls under the same license, i.e. the Apache License, Version 2.0.

Conclusion

Jasper wasn’t really designed for extension and modularity, as shown by the fact that the crucial Node class is package-private and by its API being so complex that reusing just a part of it is very hard. Fortunately, the Ant task JspC makes it usable outside of a servlet container by providing some “fake” objects, and there is a way to tweak it to our needs with very little work, though it wasn’t easy to figure out. I had to apply some dirty tricks, namely using a package-private class and overriding a method not intended to be overridden (generateJava), but it works and provides very valuable output, which makes it possible to do just about anything you might want with a JSP.

Happy coding …
Byron

Jakub Holy

Jakub is an experienced Java[EE] developer working for a lean & agile consultancy in Norway. He is interested in code quality, developer productivity, testing, and in how to make projects succeed.

2 Comments
Jay
10 years ago

Hi,

Thanks for sharing knowledge
could u pls print here content of helloJSFpage.jsp ?
I am trying to parse my JSP page but it is not recognizing div tags.

any other method shall I override from Visitor class ?
also how to visit nodes inside div tag?

Jakub Holy
10 years ago
Reply to  Jay

Hi Jay,

Sorry, it is quite a long time since this was published. You might want to look into the project https://github.com/jakubholynet/static-jsfexpression-validator (esp. static-jsfexpression-validator-core/src/main/java/net/jakubholy/jeeutils/jsfelcheck/expressionfinder/impl/jasper/JspCParsingToNodesOnly.java). The repo contains also tests so I guess you could run the test and set debugger in the class to find out where and how is it called from. I guess https://github.com/jakubholynet/static-jsfexpression-validator/blob/master/test-webapp-jsf12/src/main/webapp/helloWorld.jsp is one of the pages being tested – parsed.