Monday, October 11, 2010

pure Java Nokogiri - XSLT extension function -

Here's a memo about implementing XSLT extension functions in pure Java Nokogiri. In short, I concluded that the pure Java version can't support Nokogiri-style XSLT extension functions. I tried every approach I could think of, but for an unavoidable reason I settled on this conclusion. This part might be reconsidered in the future if the XML libraries and APIs are replaced with others. For that future version of pure Java Nokogiri, I'm writing down what I did and what the problem was. Hopefully, this memo will help when someone retries the implementation.


1. What is an XSLT extension function?

XSLT extensions are defined in section 14, "Extensions," of XSL Transformations (XSLT) Version 1.0 (http://www.w3.org/TR/xslt). They let users delegate part of the XSLT processing to a function or method written in a programming language such as Ruby, Java, or JavaScript. Here is an excerpt from Nokogiri's test case:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:f="http://e.org/functions"
    extension-element-prefixes="f">

  <xsl:template match="text()">
    <xsl:copy-of select="f:capitalize(.)"/>
  </xsl:template>
....

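Here the f prefix is bound to the namespace http://e.org/functions, so f:capitalize(.) calls a function in that namespace. The extension-element-prefixes attribute also lists f, although in XSLT 1.0 that attribute is strictly about extension elements; a function name with a prefix is already treated as an extension function call. Ideally, this XSL file should stay the same no matter which language the function is written in.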


2. Nokogiri-style function mapping

Nokogiri maps the namespace to the function as follows:

foo = Class.new do
  def capitalize nodes
    nodes.first.content.upcase
  end
end

XSLT.register "http://e.org/functions", foo

Thus, a receiver object is registered with the XSLT processor under the namespace URI. It's a nice, Ruby-friendly design: <xsl:copy-of select="f:capitalize(.)"/> above executes the capitalize method defined in foo.


3. How does Java handle this?

When I searched for a Java approach, I found few documents, blogs, or articles on the subject. The best explanation was probably "Extending XSLT with Java" (Chapter 17, XSLT), which shows how to delegate processing to a Java method tied to a namespace. Xalan also has a document, Xalan-Java Extensions; however, it uses BSF (Bean Scripting Framework: http://jakarta.apache.org/bsf/) to execute a function or method written inside the XSL file. So only the first approach could possibly realize the Nokogiri style.


After trying a couple of patterns, I found that the extension-element-prefixes attribute didn't seem to matter much. What did work was a namespace declaration such as xmlns:java="http://xml.apache.org/xslt/java" or xmlns:foo="xalan://[fully qualified class name]". So the pure Java version of Nokogiri would need a specific rule for using XSLT extension functions. That is still better than no support at all. Next, I wrote the Java class below to see whether it worked:

package Canna;

public class ExtensionFoo {
    public static Object exec(String method, Object value) {
        .....
    }
}

The method is static so that the XSLT processor can call it without an instance. The first argument is a method name, which makes the style resemble the Nokogiri way. I thought that

foo = Class.new do
  def capitalize nodes
    nodes.first.content.upcase
  end
end

XSLT.register "http://e.org/functions", foo

xsl = Nokogiri.XSLT(<<-EOXSL)
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:foo="xalan://nokogiri.internals.XsltExtensionFunction"
    extension-element-prefixes="foo">

  <xsl:template match="text()">
    <xsl:copy-of select="foo:exec('capitalize', .)"/>
  </xsl:template>
...

would not be a bad substitute. Users would only need to follow one small rule, and only in the XSL file.
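The exec(method, value) convention leaves the Java side to turn the method name into an actual call. Below is a minimal sketch of how a class like the hypothetical nokogiri.internals.XsltExtensionFunction might do that with reflection, roughly the way Ruby's send works. The register method, the single-receiver design, the String-typed argument (assuming the stylesheet converts nodes with XPath's string() function first), and the Foo receiver are all illustrative assumptions, not Nokogiri's API; the package declaration is omitted so the sketch compiles on its own.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Locale;

// Hypothetical sketch: the stylesheet calls the static exec(), which forwards
// to a public method of the same name on a registered receiver object.
public class XsltExtensionFunction {
    // Single receiver for simplicity; a real bridge would key receivers by namespace URI.
    private static volatile Object receiver;

    // Hypothetical stand-in for XSLT.register (the namespace URI is not modeled here).
    public static void register(Object handler) {
        receiver = handler;
    }

    // Entry point for a call like foo:exec('capitalize', string(.)) in a stylesheet.
    public static Object exec(String method, String value) {
        Object target = receiver;
        if (target == null) {
            throw new IllegalStateException("no extension receiver registered");
        }
        try {
            // Look up a public one-argument method named after the first XSLT argument.
            // The receiver's class should be public; otherwise invoke() may throw
            // IllegalAccessException.
            Method m = target.getClass().getMethod(method, String.class);
            return m.invoke(target, value);
        } catch (NoSuchMethodException | IllegalAccessException e) {
            throw new IllegalArgumentException("cannot call " + method + "(String)", e);
        } catch (InvocationTargetException e) {
            // Unwrap the exception thrown by the receiver method itself.
            throw new RuntimeException(e.getCause());
        }
    }

    // Example receiver, playing the role of the Ruby foo class.
    public static class Foo {
        public String capitalize(String text) {
            return text.toUpperCase(Locale.ROOT);
        }
    }
}
```

With a Foo instance registered, foo:exec('capitalize', string(.)) would end up in Foo.capitalize. In pure Java Nokogiri the receiver would be a Ruby object reached through JRuby rather than a plain Java class, which this sketch does not try to model.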


Here are the complete files I used to try out an XSLT extension function in Java.

[extension.xsl]
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:foo="xalan://Canna.ExtensionFoo"
    extension-element-prefixes="foo"
    version="1.0">
  <xsl:template match="text()">
    <xsl:copy-of select="foo:exec('capitalize', .)"/>
  </xsl:template>
</xsl:stylesheet>

[extension.xml]
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN"
    "http://www.w3.org/TR/MathML2/dtd/xhtml-math11-f.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
  <head>
    <meta http-equiv="Content-type" content="application/xhtml+xml"/>
    <title>Foo</title>
  </head>
  <body>
    <h1>Foo</h1>
    <p>Lorem ipsum.</p>
  </body>
</html>

[ExtensionFoo.java]
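package Canna;

public class ExtensionFoo {
    // Called from the stylesheet as foo:exec('capitalize', ...).
    // The method name argument is ignored in this test.
    public static Object exec(String method, Object value) {
        if (value instanceof String) {
            // Only reached when the stylesheet passes a string value
            return ((String) value).toUpperCase();
        } else {
            // Any other argument type (including null or a node) ends up here
            return "hello?";
        }
    }
}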

[TransformSample.java]
package Canna;

import java.io.File;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TransformSample {
    private static String userdir = System.getProperty("user.dir");
    private static String templateName = "extension.xsl";
    private static String documentName = "extension.xml";

    private TransformSample() throws TransformerConfigurationException, TransformerException {
        Source templateSource = new StreamSource(new File(userdir + "/ext/java/Canna/" + templateName));
        Source documentSource = new StreamSource(new File(userdir + "/ext/java/Canna/" + documentName));
        Result result = new StreamResult(System.out);
        TransformerFactory factory = TransformerFactory.newInstance();
        Templates templates = factory.newTemplates(templateSource);
        Transformer transformer = templates.newTransformer();
        transformer.transform(documentSource, result);
    }

    public static void main(String[] args) throws TransformerConfigurationException, TransformerException {
        new TransformSample();
    }
}

What was the result? The program printed a bunch of "hello?"s. Why? The object passed as the method argument wasn't a String but a DTMNodeProxy. What's DTMNodeProxy? It's an XSLT processor's internal node type: com.sun.org.apache.xml.internal.dtm.ref.DTMNodeProxy in the copy of Xalan bundled with Sun's JDK, org.apache.xml.dtm.ref.DTMNodeProxy in Apache Xalan, or some other class in another XSLT processor. Relying on the com.sun class would tie pure Java Nokogiri to Sun's JDK, so it would lose portability. The most affordable choice would be Xalan's org.apache.xml.dtm.ref.DTMNodeProxy, but then Nokogiri would have to add xalan.jar to its jar list, which would definitely make it fatter. Thankfully, there is another option: the stylesheet can convert a value before handing it to the Java method, so the method receives the type it expects. So I changed one line in the XSL file:

<xsl:copy-of select="foo:exec('capitalize', string(.))"/>

OK, this worked: every text node was converted to upper case.
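The string() conversion keeps processor-specific types out of the Java side. Another option, not tried in this memo, would be to depend only on the standard DOM interfaces: DTMNodeProxy implements org.w3c.dom.Node, so a method that checks for Node (or NodeList) instead of a concrete proxy class should not need xalan.jar or the com.sun internals at compile time. This sketch assumes the processor passes the node as something implementing those interfaces; the class name PortableExtensionFoo and the handling of unknown method names are hypothetical.

```java
import java.util.Locale;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical variant of ExtensionFoo that relies only on java.lang and
// org.w3c.dom types, never on a processor's internal node class.
public class PortableExtensionFoo {
    // Reduces whatever the XSLT processor passed in to a plain String, or null.
    static String textOf(Object value) {
        if (value instanceof String) {
            return (String) value;
        }
        // Check Node before NodeList: some DOM implementations (e.g. Xerces)
        // make every node implement NodeList as well.
        if (value instanceof Node) {
            return ((Node) value).getTextContent();
        }
        if (value instanceof NodeList) {
            NodeList list = (NodeList) value;
            return list.getLength() == 0 ? "" : list.item(0).getTextContent();
        }
        return null;
    }

    public static Object exec(String method, Object value) {
        String text = textOf(value);
        if (text == null) {
            return "hello?"; // same fallback as ExtensionFoo
        }
        // Only "capitalize" is emulated; other names return the text unchanged.
        return "capitalize".equals(method) ? text.toUpperCase(Locale.ROOT) : text;
    }
}
```

With this version, the original foo:exec('capitalize', .) call could in principle stay unchanged, though how each processor converts a node-set passed to an Object parameter would still need to be checked.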


4. Inevitable API conflict

Despite the pure-Java-specific rules, Nokogiri-style XSLT extension functions seemed workable. However, they did NOT work inside Nokogiri. Puzzled, I moved the sample code above into the Nokogiri source tree and found the culprit.

When xercesImpl.jar or jing.jar is on the classpath, the sample code fails to parse the XSL file.

Sigh... I haven't figured out exactly what goes wrong yet, but that is where the conflict lies. Pure Java Nokogiri uses Xerces internal APIs for SAX and Jing for RELAX NG processing, so both xercesImpl.jar and jing.jar are required.
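Since the failure appears only when xercesImpl.jar or jing.jar is on the classpath, one thing worth checking when this is revisited is which JAXP implementations actually get picked up. JAXP looks up providers through system properties and META-INF/services entries on the classpath, so an extra jar can silently replace the TransformerFactory or SAX parser that worked before. The sketch below only reports what it finds; the class name JaxpProviderCheck is hypothetical, and whether provider lookup explains this conflict is an open question.

```java
import java.io.IOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

// Hypothetical diagnostic: reports which JAXP implementations are in effect
// and which jars on the classpath register providers for them.
public class JaxpProviderCheck {
    // Returns where a class was loaded from; JDK bootstrap classes have no code source.
    static String origin(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        return src == null ? "(JDK built-in)" : String.valueOf(src.getLocation());
    }

    // Lists every META-INF/services file on the classpath for the given service name.
    static List<String> serviceFiles(String service) throws IOException {
        List<String> found = new ArrayList<String>();
        for (URL url : Collections.list(
                ClassLoader.getSystemClassLoader().getResources("META-INF/services/" + service))) {
            found.add(url.toString());
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        Class<?> tf = TransformerFactory.newInstance().getClass();
        Class<?> sp = SAXParserFactory.newInstance().getClass();
        System.out.println("TransformerFactory: " + tf.getName() + " from " + origin(tf));
        System.out.println("SAXParserFactory:   " + sp.getName() + " from " + origin(sp));
        System.out.println("TransformerFactory providers: "
                + serviceFiles("javax.xml.transform.TransformerFactory"));
        System.out.println("SAXParserFactory providers: "
                + serviceFiles("javax.xml.parsers.SAXParserFactory"));
    }
}
```

Running it once with and once without xercesImpl.jar and jing.jar on the classpath would show whether either jar swaps in its own provider. If one does, TransformerFactory.newInstance(String, ClassLoader) (Java 6 and later) can request a specific implementation explicitly.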

For now, the best choice is probably not to support Nokogiri-style XSLT extension functions. In the future, pure Java Nokogiri might switch to other XML APIs, or someone might give me good advice on avoiding the conflict. So there is still a chance to make it happen later. When that time comes, I hope this memo helps restart the implementation of the XSLT extension feature.
