In this post we’ll survey some options for evaluating XPath expressions with Java in a more industrial, performance-sensitive fashion. Although not exhaustive our analysis will provide a framework for comparing the solutions and expose some tradeoffs.
This exploration was motivated by a client who needed run a zillion XPaths against a set of fairly large xml documents.
The Contenders
The Code
The following groovy code can be pasted in the Groovy console. You’ll need to add the following jars to the Groovy console classpath via the Script/Add Jar to Classpath menu option.
- commons-jxpath-1.3.jar
- jaxen-1.1.1.jar
- xom-1.2.7.jar
import javax.xml.namespace.*
import javax.xml.parsers.*
import javax.xml.xpath.*
import org.apache.commons.jxpath.JXPathContext
import org.jaxen.dom.DOMXPath
class Timer {
String name
long start = System.currentTimeMillis()
Timer(String name) { this.name = name }
def mark() { println "$name\t\t${System.currentTimeMillis() - start}" }
}
def createDoc = { path ->
InputStream is = new FileInputStream(path)
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance()
factory.setNamespaceAware(true)
factory.newDocumentBuilder().parse(is)
}
def createXomDoc = { path ->
nu.xom.Builder builder = new nu.xom.Builder()
builder.build(path)
}
def nsMap = [
"atom": "http://www.w3.org/2005/Atom",
"xlink": "http://www.w3.org/1999/xlink",
]
NamespaceContext nsCtx = [
getNamespaceURI: {
nsMap.get(it)
},
getPrefix: {},
getPrefixes: { nsMap.keySet().iterator() }
] as NamespaceContext
def jdk = { doc, xp, ctx ->
XPath xpath = XPathFactory.newInstance().newXPath()
xpath.setNamespaceContext(ctx)
XPathExpression expr = xpath.compile(xp)
expr.evaluate(doc, XPathConstants.NODESET)
}
def jxpath = { doc, xp, ctx ->
JXPathContext jpc = JXPathContext.newContext(doc)
jpc.setLenient(true)
ctx.getPrefixes().each { pre ->
jpc.registerNamespace(pre, ctx.getNamespaceURI(pre))
}
jpc.selectNodes(xp)
}
def jaxen = { doc, xp, ctx ->
DOMXPath xpath = new DOMXPath(xp)
ctx.getPrefixes().each { pre ->
xpath.addNamespace(pre, ctx.getNamespaceURI(pre))
}
xpath.selectNodes(doc)
}
def xom = { xomDoc, xp, ctx, Closure clo ->
nu.xom.XPathContext xpc = new nu.xom.XPathContext();
ctx.getPrefixes().each { pre ->
xpc.addNamespace(pre, ctx.getNamespaceURI(pre))
}
nu.xom.Nodes nodes = xomDoc.query(xp, xpc)
for (int i = 0; i < nodes.size(); i++) {
clo.call(nodes.get(i))
}
}
// now test the above
//def path = "/home/John/backup/projects/xpath2/usgs_feed.xml"
def path = "/home/John/backup/projects/xpath2/sam.xml"
def doc = createDoc(path)
def xomDoc = createXomDoc(path)
//def xpath = "//atom:entry/atom:id/text()"
//def xpath = "//atom:entry/atom:link[not(@rel)]/@href/text()"
def xpath = "//atom:entry/atom:link[string-length(@rel) = 0]/@href"
def count = 3000 // number of iterations for each solution
//def show = { println it }
def show = { }
2.times {
Timer t = new Timer("jdk")
count.times {
jdk(doc, xpath, nsCtx).each {
show it.nodeValue
}
}
t.mark()
t = new Timer("jxpath")
count.times {
jxpath(doc, xpath, nsCtx).each {
show it.nodeValue
}
}
t.mark()
t = new Timer("jaxen")
count.times {
jaxen(doc, xpath, nsCtx).each {
show it.nodeValue
}
}
t.mark()
t = new Timer("xom")
count.times {
xom(xomDoc, xpath, nsCtx) {
show it.getValue()
}
}
t.mark()
}
The input xml referenced by the path variable was a snapshot of the atom feed from Sam Ruby’s blog.
The Explanation
The create*Doc() closures in lines 21 and 28 create node trees from XML text. All but XOM use vanilla DOM trees.
The nsMap and nsCtx variables defined in lines 33 and 38 provide XML namespace information to the four eponymous XPath evaluators defined in lines 48, 55, 64, and 72, respectively. These four closures take a node tree, an XPath expression, and a NamespaceContext as arguments and return a collection – except for xom() which takes a fourth closure argument to process each matching node.
After creating node trees from the XML text in lines 90-91 (XOM is again the exception) the code in lines 103-135 calls each processor count times with the same XPath expression and prints the elapsed time. The entire process is then repeated to account for start up and caching effects.
The upfront creation of the DOM trees followed by “intense” processing was inspired by our client’s workflow.
If you’d like to examine the actual results change the count variable to 1 and transpose the commenting on the show() closure in lines 100 and 101. You should see something like this:
/blog/2012/02/25/Hearing-Aid /blog/2012/02/22/Your-Next-Desktop-Could-be-a-Phone /blog/2012/02/20/Mulligan /blog/2012/02/19/OpenID-upgrades /blog/2012/02/18/WunderWiki /blog/2012/02/15/Twelve-Steps /blog/2012/02/13/On-The-Move /blog/2012/02/09/Dominoes /blog/2012/02/04/Default-to-Incognito /blog/2012/02/02/Wunderbar /blog/2012/01/23/Port-Forwarding /blog/2012/01/20/AWDwR-updated-for-Rails-3-2 /blog/2012/01/17/The-Presidents-challenge /blog/2012/01/04/Bootstrapping-Debian-Unstable /blog/2011/12/24/Ubuntu-vs-Ruby /blog/2011/11/24/Experience-with-Git /blog/2011/10/10/Building-Dart /blog/2011/10/08/Thunderbird-add-ons-in-Flux /blog/2011/10/03/No-more-XML-parsing-failed-errors /blog/2011/08/31/AWDwR-updated-for-Rails-3-1 jdk 10
The Results
Here are the times in milliseconds reported on my 8GB 2.4GHz dual processor Fedora box:
jdk 5094 jxpath 1981 jaxen 2275 xom 2298 jdk 4949 jxpath 2076 jaxen 2236 xom 2198
That’s pretty much consistent with our results at the client: JXPath was the fastest and ahead of the JDK by a wide margin for our particular workload. Kudos to my colleague RC for suggesting it.
The Caveats
JXPath implements a large subset of XPath 1.0. If your XPath expressions are wild enough you’ll find some holes. I’ll try to update this post with a few.
Other Options
If your XPath expressions are simple and stable you could dodge the entire node tree creation step by implementing a custom SAX DefaultHandler or XML pull parser.
VTD-XML takes a novel approach to implement “the world’s fastest XPath 1.0 Processor”.


