Friday, October 22, 2010

Gems in a Jar with RedBridge

Using the JRuby embed API (RedBridge), it is easy to use Ruby gems from Java. Packaging, however, is not always simple, and people occasionally struggle to create a "portable" package. Portability matters for a Java app: everything the app needs should be packaged in a jar archive that works on another PC, or even on a different OS. This is the good side of Java. This post illustrates one solution for packaging.

I intentionally didn't use the jruby command, since it hides what is going on behind the scenes. You might get tired of all the "java -jar ..." lines, sorry, but they help in understanding the packaging process.

Directories


Since I used bundler to install gems, I created the directories below to match the layout bundler expects.

Linden -+- lib -+- jruby -+- 1.8
        +- src -+- linden
        +- build

"Linden" is the top directory; the name doesn't have any special meaning, so you can name it whatever you like. "lib/jruby/1.8" is the directory where gems will be installed. "src" is for Java code, and "build" is for compiled *.class files.

Bundler Installation


I intentionally used jruby-complete.jar to ensure gems wouldn't be installed in the default location. jruby-complete.jar can live anywhere, since typing its full path works, but I like a short path to type, so I copied jruby-complete.jar under "Linden/lib." Now, the directories/files were as below:

Linden -+- lib -+- jruby -+- 1.8
                +- jruby-complete.jar
        +- src -+- linden
        +- build


Then, the bundler installation went:

cd lib
java -jar jruby-complete.jar -S gem install bundler jruby-openssl --no-ri --no-rdoc -i jruby/1.8

The bundler and jruby-openssl gems were installed, and the directories became as below:

Linden -+- lib -+- jruby -+- 1.8 -+- bin -+- bundle
                |                 +- cache -+- ...
                |                 +- doc
                |                 +- gems -+- bouncy-castle-java-1.5.0145.2 -+- ...
                |                 |        +- bundler-1.0.3 -+- ...
                |                 |        +- jruby-openssl-0.7.1 -+- ...
                |                 +- specifications -+- ...
                +- jruby-complete.jar
        +- src -+- linden
        +- build

Let's see whether gems are really installed.

export GEM_PATH=`pwd`/jruby/1.8
java -jar jruby-complete.jar -S gem list

This command should print out the three newly installed gems along with the pre-installed gems.

*** LOCAL GEMS ***

bouncy-castle-java (1.5.0145.2)
bundler (1.0.3)
columnize (0.3.1)
jruby-openssl (0.7.1)
rake (0.8.7)
rspec (1.3.0)
ruby-debug (0.10.3)
ruby-debug-base (0.10.3.2)
sources (0.0.1)

OK. Bundler was installed successfully.

Installation of other gems


The next step was to install other gems using bundler, so I used the bundle command. In theory, the PATH environment variable should let ruby find the bundle command. Unfortunately, this didn't work in my "java -jar ..." usage. Instead, I typed the path to the bundle command after the -S option.

java -jar jruby-complete.jar -S jruby/1.8/bin/bundle init

Then, I added the "twitter" gem to the Gemfile.

# A sample Gemfile
source "http://rubygems.org"

# gem "rails"

gem "twitter"

The installation went on:

java -Xmx500m -jar jruby-complete.jar -S jruby/1.8/bin/bundle install --path=.

The java command option "-Xmx500m" raises the maximum heap size to 500 MB, which helps performance and avoids running out of memory. Now, 6 more gems were installed and the directories/files became:

Linden -+- lib -+- jruby -+- 1.8 -+- bin -+- bundle
                |                 |       +- httparty
                |                 |       +- oauth
                |                 +- cache -+- ...
                |                 +- doc
                |                 +- gems -+- bouncy-castle-java-1.5.0145.2 -+- ...
                |                 |        +- bundler-1.0.3 -+- ...
                |                 |        +- crack-0.1.8 -+- ...
                |                 |        +- hashie-0.4.0 -+- ...
                |                 |        +- httparty-0.6.1 -+- ...
                |                 |        +- jruby-openssl-0.7.1 -+- ...
                |                 |        +- multi_json-0.0.4 -+- ...
                |                 |        +- oauth-0.4.3 -+- ...
                |                 |        +- twitter-0.9.12 -+- ...
                |                 +- specifications -+- ...
                +- jruby-complete.jar
                +- Gemfile
                +- Gemfile.lock
        +- src -+- linden
        +- build


Java code to use twitter gem


Since all gems were ready, I wrote Java code to see that the gems worked. This time, I relied on the GEM_PATH environment variable to find gems. The first code had the Java package, linden, and the class name, SearchSample, so I created the SearchSample.java file under the "src/linden" directory.

package linden;

import org.jruby.embed.LocalContextScope;
import org.jruby.embed.ScriptingContainer;

public class SearchSample {
    private String jarname = "sample.jar";

    private SearchSample() {
        String basepath = System.getProperty("user.dir");
        System.out.println("basepath: " + basepath);
        ScriptingContainer container = new ScriptingContainer(LocalContextScope.SINGLETHREAD);
        System.out.println("jrubyhome: " + container.getHomeDirectory());
        // Point RubyGems at the directory where bundler installed the gems.
        container.runScriptlet("ENV['GEM_PATH']='" + basepath + "/lib/jruby/1.8'");

        // Ruby code that searches Twitter for #jruby and prints the first result.
        String script =
            "require 'rubygems'\n" +
            "require 'twitter'\n" +
            "require 'pp'\n" +
            "pp Twitter::Search.new('#jruby').fetch.results.first";
        container.runScriptlet(script);
    }

    public static void main(String[] args) {
        new SearchSample();
    }
}

Linden -+- lib -+- jruby -+- 1.8 -+- bin -+- bundle
                |                 |       +- httparty
                |                 |       +- oauth
                |                 +- cache -+- ...
                |                 +- doc
                |                 +- gems -+- bouncy-castle-java-1.5.0145.2 -+- ...
                |                 |        +- bundler-1.0.3 -+- ...
                |                 |        +- crack-0.1.8 -+- ...
                |                 |        +- hashie-0.4.0 -+- ...
                |                 |        +- httparty-0.6.1 -+- ...
                |                 |        +- jruby-openssl-0.7.1 -+- ...
                |                 |        +- multi_json-0.0.4 -+- ...
                |                 |        +- oauth-0.4.3 -+- ...
                |                 |        +- twitter-0.9.12 -+- ...
                |                 +- specifications -+- ...
                +- jruby-complete.jar
                +- Gemfile
                +- Gemfile.lock
        +- src -+- linden -+- SearchSample.java
        +- build



Compile and run using rake-ant integration



Everything was ready. OK, how do I compile and run it? Of course, the classic "javac" and "java" commands were options, but they are not very convenient for repeated compile/run cycles. So, I used JRuby's rake-ant integration: I wrote a "Rakefile" to compile and run the Java code. Isn't it nice? Here's the Rakefile:

require 'ant'

namespace :ant do
  task :compile => :clean do
    ant.javac :srcdir => "src", :destdir => "build"
  end
end

namespace :ant do
  task :java => :compile do
    ant.java :classname => "linden.SearchSample" do
      classpath do
        pathelement :location => "lib/jruby-complete.jar"
        pathelement :path => "build"
      end
    end
  end
end

require 'rake/clean'

CLEAN.include '*.class', '*.jar'

I created the Rakefile under the Linden directory, so now the directories/files were:

Linden -+- lib -+- jruby -+- 1.8 -+- bin -+- bundle
                |                 |       +- httparty
                |                 |       +- oauth
                |                 +- cache -+- ...
                |                 +- doc
                |                 +- gems -+- bouncy-castle-java-1.5.0145.2 -+- ...
                |                 |        +- bundler-1.0.3 -+- ...
                |                 |        +- crack-0.1.8 -+- ...
                |                 |        +- hashie-0.4.0 -+- ...
                |                 |        +- httparty-0.6.1 -+- ...
                |                 |        +- jruby-openssl-0.7.1 -+- ...
                |                 |        +- multi_json-0.0.4 -+- ...
                |                 |        +- oauth-0.4.3 -+- ...
                |                 |        +- twitter-0.9.12 -+- ...
                |                 +- specifications -+- ...
                +- jruby-complete.jar
                +- Gemfile
                +- Gemfile.lock
        +- src -+- linden -+- SearchSample.java
        +- build
        +- Rakefile


When I typed "rake ant:java", the Java code above worked and printed out one tweet.

java -jar lib/jruby-complete.jar -S rake ant:java
(in /Users/yoko/Works/tmp/Linden)
basepath: /Users/yoko/Works/tmp/Linden
jrubyhome: file:/Users/yoko/Works/tmp/Linden/lib/jruby-complete.jar!/META-INF/jruby.home
{"profile_image_url"=>
"http://a3.twimg.com/profile_images/76084835/web-profile_normal.jpg",
"created_at"=>"Fri, 22 Oct 2010 15:38:09 +0000",
"from_user"=>"mccrory",
"metadata"=>{"result_type"=>"recent"},
"to_user_id"=>nil,
"text"=>
"RT @carlosqt: #IronRuby 1.1.1 a released! download from: http://ironruby.codeplex.com/ #programming #JRuby #Ruby #Rails #dotnet #code",
"id"=>28415816774,
"from_user_id"=>1757577,
"geo"=>nil,
"iso_language_code"=>"en",
"source"=>"<a href="http://twitter.com/">web</a>"}


Packaging


This time I couldn't rely on the GEM_PATH environment variable. It would have been nice if GEM_PATH had also worked for a path inside a jar, but unfortunately it didn't. Instead of GEM_PATH, I set the paths to all gems as load paths on the ScriptingContainer. This was not complicated since all gems were installed in the same directory. I hard-coded the jar name, "sample.jar," so this name appears in the Java code; it could have been given via a command line argument instead. The edited SearchSample.java is below:

package linden;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

import org.jruby.embed.LocalContextScope;
import org.jruby.embed.ScriptingContainer;

public class SearchSample {
    private String jarname = "sample.jar";

    private SearchSample() throws IOException {
        String basepath = System.getProperty("user.dir");
        System.out.println("basepath: " + basepath);
        ScriptingContainer container = new ScriptingContainer(LocalContextScope.SINGLETHREAD);
        System.out.println("jrubyhome: " + container.getHomeDirectory());
        // Add every gem's lib directory inside the jar to the Ruby load path.
        container.setLoadPaths(getGemPaths(jarname, basepath));

        String script =
            "require 'rubygems'\n" +
            "require 'twitter'\n" +
            "require 'pp'\n" +
            "pp Twitter::Search.new('#jruby').fetch.results.first";
        container.runScriptlet(script);
    }

    private List<String> getGemPaths(String jarname, String basepath) throws IOException {
        String gempath = "lib/jruby/1.8/gems/";
        Set<String> gemnames = new HashSet<String>();
        JarFile jarFile = new JarFile(basepath + "/" + jarname);
        try {
            // Collect the distinct directory names directly under gems/.
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                String entryName = entries.nextElement().getName();
                if (entryName.startsWith(gempath) && entryName.length() > gempath.length()) {
                    String n = entryName.substring(gempath.length());
                    int slash = n.indexOf("/");
                    // Skip plain files sitting directly under gems/.
                    if (slash > 0) {
                        gemnames.add(n.substring(0, slash));
                    }
                }
            }
        } finally {
            jarFile.close();
        }
        List<String> gemPaths = new ArrayList<String>();
        for (String gem : gemnames) {
            gemPaths.add("file:" + basepath + "/" + jarname + "!/lib/jruby/1.8/gems/" + gem + "/lib");
        }
        return gemPaths;
    }

    public static void main(String[] args) throws IOException {
        new SearchSample();
    }
}
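The gem-name extraction in getGemPaths can be tried without a real jar. The following is only a sketch of that step, working on a plain list of entry names; the class name GemLoadPathSketch, the jar URL "file:/tmp/sample.jar" and the sample entries are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class GemLoadPathSketch {
    // Directory inside the jar that holds the gems (same prefix as in the post).
    static final String GEM_PREFIX = "lib/jruby/1.8/gems/";

    /** Collects the distinct gem directory names found among jar entry names. */
    public static Set<String> gemNames(List<String> entryNames) {
        Set<String> names = new TreeSet<String>(); // sorted, so the order is predictable
        for (String entryName : entryNames) {
            if (!entryName.startsWith(GEM_PREFIX)) {
                continue;
            }
            String rest = entryName.substring(GEM_PREFIX.length());
            int slash = rest.indexOf('/');
            if (slash > 0) { // skips the gems/ entry itself and files directly in gems/
                names.add(rest.substring(0, slash));
            }
        }
        return names;
    }

    /** Builds one "<jarUrl>!/lib/jruby/1.8/gems/<gem>/lib" load path per gem. */
    public static List<String> loadPaths(String jarUrl, List<String> entryNames) {
        List<String> paths = new ArrayList<String>();
        for (String gem : gemNames(entryNames)) {
            paths.add(jarUrl + "!/" + GEM_PREFIX + gem + "/lib");
        }
        return paths;
    }

    public static void main(String[] args) {
        // Hypothetical entry names, shaped like the ones in sample.jar.
        List<String> entries = Arrays.asList(
            "linden/SearchSample.class",
            "lib/jruby/1.8/gems/",
            "lib/jruby/1.8/gems/twitter-0.9.12/lib/twitter.rb",
            "lib/jruby/1.8/gems/hashie-0.4.0/lib/hashie.rb");
        for (String path : loadPaths("file:/tmp/sample.jar", entries)) {
            System.out.println(path);
        }
    }
}
```

A sorted set is used here only to make the printed order stable; the order of the load paths should not matter as long as each gem's lib directory is listed.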

To make the jar archive, I added a jar task to my Rakefile.

require 'ant'

namespace :ant do
  task :compile => :clean do
    ant.javac :srcdir => "src", :destdir => "build"
  end
end

namespace :ant do
  task :java => :compile do
    ant.java :classname => "linden.SearchSample" do
      classpath do
        pathelement :location => "lib/jruby-complete.jar"
        pathelement :path => "build"
      end
    end
  end
end

namespace :ant do
  task :jar => :compile do
    ant.jar :basedir => ".", :destfile => "sample.jar" do
      fileset :dir => "build" do
        include :name => "**/*.class"
      end
      include :name => "lib/jruby/1.8/gems/**/*"
      manifest do
        attribute :name => "Main-Class", :value => "linden.SearchSample"
      end
    end
  end
end

require 'rake/clean'

CLEAN.include '*.class', '*.jar'

All right, let's create the jar archive.

java -jar lib/jruby-complete.jar -S rake ant:jar

The ant:jar task created the sample.jar archive in the Linden directory. To test that this archive really had the gems in it, I unset the GEM_PATH environment variable first. Then, I typed the java command and got one tweet. Yay!

unset GEM_PATH; echo $GEM_PATH
java -cp lib/jruby-complete.jar:sample.jar linden.SearchSample

To make sure that the gems were really loaded from the jar archive, I copied sample.jar to a different directory and typed the java command there with the path to jruby-complete.jar. It worked.

cp sample.jar ../.
cd ..
java -cp Linden/lib/jruby-complete.jar:sample.jar linden.SearchSample
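Note that SearchSample finds sample.jar through the "user.dir" system property, so it expects to be started from the directory that holds the jar. As a rough sketch, not part of the sample above, the jar could instead be taken from a command line argument, falling back to the location the class was loaded from. The class name JarLocationSketch and both method names are hypothetical.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.security.CodeSource;

public class JarLocationSketch {
    /**
     * Returns the jar (or class directory) the given class was loaded from,
     * or null when the JVM exposes no code source for it.
     */
    public static File locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return null; // e.g. classes that belong to the JDK itself
        }
        try {
            return new File(source.getLocation().toURI());
        } catch (URISyntaxException e) {
            throw new IllegalStateException("unexpected code source: " + source.getLocation(), e);
        }
    }

    /** Prefers an explicit command line argument, then the code source. */
    public static File resolveJar(String[] args, Class<?> cls) {
        if (args.length > 0) {
            return new File(args[0]).getAbsoluteFile();
        }
        return locationOf(cls);
    }

    public static void main(String[] args) {
        System.out.println("jar: " + resolveJar(args, JarLocationSketch.class));
    }
}
```

The resulting File could then replace the basepath + "/" + jarname concatenation when opening the JarFile and when building the "file:...!/" load paths.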



This might be one answer to packaging gems in a jar and using them from RedBridge.

Monday, October 11, 2010

pure Java Nokogiri - XSLT extension function -

Here's a memo about implementing XSLT extension functions in pure Java Nokogiri. In short, I concluded that the pure Java version is unable to support Nokogiri style XSLT extension functions. I tried every way I could think of to make it happen, but for an unavoidable reason, I settled on this conclusion. However, this part might be reconsidered in the future, when the XML libraries and APIs are replaced with others. For a future version of pure Java Nokogiri, I'm going to write down what I did and what the problem was. Hopefully, this memo will help when retrying the implementation later.


1. What is XSLT extension function?

XSLT extensions are defined in "14 Extensions" of XSL Transformations (XSLT) Version 1.0 (http://www.w3.org/TR/xslt), which allows users to delegate XSLT processing to a specified function/method written in a programming language such as Ruby, Java, or JavaScript. As in Nokogiri's test case,

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:f="http://e.org/functions"
                extension-element-prefixes="f">

  <xsl:template match="text()">
    <xsl:copy-of select="f:capitalize(.)"/>
  </xsl:template>
  ....

the attribute, "extension-element-prefixes," indicates the function is tied to this namespace. This XSL file should be common to all languages used to write the function.


2. Nokogiri style function mapping

Nokogiri maps the namespace to the function as below:

foo = Class.new do
  def capitalize nodes
    nodes.first.content.upcase
  end
end

XSLT.register "http://e.org/functions", foo

Thus, a receiver object is registered to the XSLT processor with the namespace URL as its tag. It is a nice, Ruby friendly design. <xsl:copy-of select="f:capitalize(.)"/> above executes the "capitalize" method of the "foo" object.


3. How Java handles this?

As far as I googled for a Java way, not many documents, blogs, or articles were out there. Probably, "Extending XSLT with Java - Chapter 17. XSLT" is the best-described one. It explains how to delegate the processing to a Java method tied to the namespace. Xalan has its own document, "Xalan-Java Extensions"; however, it uses BSF (Bean Scripting Framework: http://jakarta.apache.org/bsf/) to execute a function/method written in an XSL file. So, the first approach looked like the one that could realize the Nokogiri style.


While I tried a couple of patterns, the "extension-element-prefixes" attribute didn't seem to have much meaning. Instead, xmlns:java="http://xml.apache.org/xslt/java" and xmlns:foo="xalan://[fully qualified class name]" worked. OK, so the pure Java version of Nokogiri needs a specific rule to use an XSLT extension function. This might be better than no support at all. Then, I wrote the Java class below to see whether it worked or not:

package Canna;

public class ExtensionFoo {
    public static Object exec(String method, Object value) {
        .....
    }
}

The method should be static so that it can be called from the XSLT processor. The first argument is a method name, which makes the style resemble the Nokogiri way. I chose this since I thought

foo = Class.new do
  def capitalize nodes
    nodes.first.content.upcase
  end
end

XSLT.register "http://e.org/functions", foo

xsl = Nokogiri.XSLT(<<-EOXSL)
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:foo="xalan://nokogiri.internals.XsltExtensionFunction"
                extension-element-prefixes="foo">

  <xsl:template match="text()">
    <xsl:copy-of select="foo:exec('capitalize', .)"/>
  </xsl:template>
  ...

would not be a bad substitution. Users would only need to follow one small rule, and only in the xsl file.
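How a static exec(method, value) entry point might hand the call over to a registered receiver can be sketched with plain reflection. This is only an illustration of the idea, not Nokogiri code: the class ExtensionDispatchSketch, its setReceiver method, and the Foo receiver are all hypothetical names.

```java
import java.lang.reflect.Method;
import java.util.Locale;

public class ExtensionDispatchSketch {
    // The object whose methods the stylesheet may call; stands in for the
    // receiver that XSLT.register takes on the Ruby side.
    private static volatile Object receiver;

    public static void setReceiver(Object newReceiver) {
        receiver = newReceiver;
    }

    /** Static entry point for the stylesheet: foo:exec('capitalize', string(.)) */
    public static Object exec(String method, Object value) {
        Object target = receiver;
        if (target == null) {
            throw new IllegalStateException("no receiver registered");
        }
        try {
            // Look up a public one-argument method with the requested name.
            Method m = target.getClass().getMethod(method, Object.class);
            return m.invoke(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot call " + method, e);
        }
    }

    /** Hypothetical receiver playing the role of the Ruby "foo" object. */
    public static class Foo {
        public Object capitalize(Object value) {
            return String.valueOf(value).toUpperCase(Locale.ROOT);
        }
    }

    public static void main(String[] args) {
        setReceiver(new Foo());
        System.out.println(exec("capitalize", "lorem ipsum."));
    }
}
```

A single static receiver keeps the sketch short; mapping several namespaces to several receivers would need a lookup key in addition to the method name.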


Here are the entire files with which I tried the XSLT extension function in action from Java.

[extension.xsl]
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:foo="xalan://Canna.ExtensionFoo"
                extension-element-prefixes="foo"
                version="1.0">
  <xsl:template match="text()">
    <xsl:copy-of select="foo:exec('capitalize', .)"/>
  </xsl:template>
</xsl:stylesheet>

[extension.xml]
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN"
          "http://www.w3.org/TR/MathML2/dtd/xhtml-math11-f.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
  <head>
    <meta http-equiv="Content-type" content="application/xhtml+xml"/>
    <title>Foo</title>
  </head>
  <body>
    <h1>Foo</h1>
    <p>Lorem ipsum.</p>
  </body>
</html>

[ExtensionFoo.java]
package Canna;

public class ExtensionFoo {
    public static Object exec(String method, Object value) {
        if (value != null && (value instanceof String)) {
            return ((String)value).toUpperCase();
        } else {
            return "hello?";
        }
    }
}

[TransformSample.java]
package Canna;

import java.io.File;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TransformSample {
    private static String userdir = System.getProperty("user.dir");
    private static String templateName = "extension.xsl";
    private static String documentName = "extension.xml";

    private TransformSample() throws TransformerConfigurationException, TransformerException {
        Source templateSource = new StreamSource(new File(userdir + "/ext/java/Canna/" + templateName));
        Source documentSource = new StreamSource(new File(userdir + "/ext/java/Canna/" + documentName));
        Result result = new StreamResult(System.out);
        TransformerFactory factory = TransformerFactory.newInstance();
        Templates templates = factory.newTemplates(templateSource);
        Transformer transformer = templates.newTransformer();
        transformer.transform(documentSource, result);
    }

    public static void main(String[] args) throws TransformerConfigurationException, TransformerException {
        new TransformSample();
    }
}

What was the result? The program output a bunch of "hello?"s. Why? The object passed as the method argument wasn't a String but a DTMNodeProxy. What's DTMNodeProxy? It is com.sun.org.apache.xml.internal.dtm.ref.DTMNodeProxy (the processor bundled with the JDK), org.apache.xml.dtm.ref.DTMNodeProxy (Apache Xalan), or another XSLT processor's internal type. The most reasonable choice would be Xalan's org.apache.xml.dtm.ref.DTMNodeProxy, but then Nokogiri would need to add xalan.jar to its jar list, and Nokogiri would definitely get fat. Otherwise, pure Java Nokogiri would lose portability. Thankfully, there is another option: users can convert the value to the desired type in XSL before handing it to the method. So, I changed one line in the XSL file:

<xsl:copy-of select="foo:exec('capitalize', string(.))"/>

OK, this worked. All text became upper case.
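Another direction, sketched here under an unverified assumption, would be to convert the argument on the Java side through the standard org.w3c.dom interfaces rather than a processor-specific class. This assumes the proxy object that the processor passes implements org.w3c.dom.Node and answers getNodeValue() or getTextContent() sensibly; the class NodeArgumentSketch and its helper methods are hypothetical.

```java
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class NodeArgumentSketch {
    /** Converts the argument handed over by the XSLT processor into a String. */
    public static String asText(Object value) {
        if (value == null) {
            return "";
        }
        if (value instanceof String) {
            return (String) value;
        }
        if (value instanceof Node) {
            return textOf((Node) value);
        }
        if (value instanceof NodeList) {
            NodeList list = (NodeList) value;
            return list.getLength() > 0 ? textOf(list.item(0)) : "";
        }
        return value.toString();
    }

    private static String textOf(Node node) {
        String text = node.getNodeValue();   // set for text and attribute nodes
        if (text == null) {
            text = node.getTextContent();    // concatenated text of an element
        }
        return text == null ? "" : text;
    }

    /** Same shape as ExtensionFoo.exec; the method name is ignored here as well. */
    public static Object exec(String method, Object value) {
        return asText(value).toUpperCase(Locale.ROOT);
    }

    /** Builds a standalone DOM text node, only to exercise exec without an XSLT run. */
    public static Node sampleTextNode(String text) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument().createTextNode(text);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(exec("capitalize", sampleTextNode("Lorem ipsum.")));
        System.out.println(exec("capitalize", "Lorem ipsum."));
    }
}
```

The main method only feeds a plain DOM text node to exec; whether a given XSLT processor's node proxy behaves the same way would still have to be checked against that processor.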


4. Inevitable API conflict

Although there were pure Java specific rules, the Nokogiri style XSLT extension function seemed to work. However, inside Nokogiri it did NOT. Puzzled, I moved the sample code above under the Nokogiri source tree and then figured out the culprit.

When xercesImpl.jar or jing.jar is on the classpath, the sample code fails to parse the XSL file.

Sigh... I haven't found what's wrong yet, but the conflict lies there. Pure Java Nokogiri uses an internal API of Xerces for SAX and Jing for RELAX NG processing, so both xercesImpl.jar and jing.jar are required.
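A possible first step for investigating, which was not tried in this memo, is to print which JAXP implementations are actually picked up with and without those jars on the classpath. Whether factory lookup has anything to do with the failure is only a guess; the class name JaxpProviderSketch is hypothetical, while the factory calls are the standard JAXP ones.

```java
import java.security.CodeSource;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

public class JaxpProviderSketch {
    /** Describes the implementation class of an object and where it was loaded from. */
    public static String describe(Object factory) {
        Class<?> cls = factory.getClass();
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        String from = (source == null || source.getLocation() == null)
                ? "(no code source, typically the JDK itself)"
                : source.getLocation().toString();
        return cls.getName() + " from " + from;
    }

    public static void main(String[] args) {
        // Run once with and once without xercesImpl.jar / jing.jar on the classpath
        // and compare the two outputs.
        System.out.println("TransformerFactory: " + describe(TransformerFactory.newInstance()));
        System.out.println("SAXParserFactory:   " + describe(SAXParserFactory.newInstance()));
    }
}
```

If the reported classes differ between the two runs, that would at least narrow down which library replaces which part of the XML stack.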

Probably, the best choice is not to support Nokogiri style XSLT extensions right now. In the future, pure Java Nokogiri might choose other XML APIs, or someone might give me good advice on how to avoid the conflict. So, there is still a possibility of making it happen later. When that time comes, this memo will hopefully help to restart implementing the XSLT extension feature.