Monday, April 25, 2011

Attempt to get Nokogiri working on Android


Conclusion


As a result, Nokogiri loaded on Android successfully but didn't work. When I tried to parse an XML document, I got tons of errors like these:
W/dalvikvm(  374): Unable to resolve superclass of Lorg/apache/xerces/dom/DeferredDocumentImpl; (2008)
W/dalvikvm( 374): Link of class 'Lorg/apache/xerces/dom/DeferredDocumentImpl;' failed

I'm pretty sure these error messages are complaining that the Android SDK doesn't define enough of the org.w3c interfaces. In fact, the Android SDK's org.w3c API is a subset of the JDK's, and that is the problem. Xerces needs the full set of org.w3c packages to work. Pure Java Nokogiri relies heavily on Xerces and on nekoHTML/nekoDTD, which are built on top of Xerces, so it also needs the full set of org.w3c packages to stay compatible with the libxml2-backed CRuby version. This is why Nokogiri ended up raising the exception below:
W/dalvikvm(  374): threadid=10: thread exiting with uncaught exception (group=0x40014760)
E/AndroidRuntime( 374): FATAL EXCEPTION: runWithLargeStack
E/AndroidRuntime( 374): java.lang.NoClassDefFoundError: org.apache.xerces.dom.DeferredDocumentImpl
E/AndroidRuntime( 374): at org.apache.xerces.parsers.AbstractDOMParser.startDocument(Unknown Source)
E/AndroidRuntime( 374): at org.apache.xerces.impl.dtd.XMLDTDValidator.startDocument(Unknown Source)
(snip)
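When logcat is this noisy, it can help to pull out just the class names Dalvik failed to resolve or link, to see which packages are missing. The snippet below is a small sketch, not part of Ruboto or the Android SDK: `unresolved_classes` is a hypothetical helper, and its filters only cover the two message shapes quoted above. It converts JVM-style descriptors such as `Lorg/apache/xerces/dom/DeferredDocumentImpl;` into dotted class names.

```ruby
# Matches a JVM type descriptor: "L" + slash-separated name + ";"
DESCRIPTOR = /L([\w\/$]+);/

# Hypothetical helper: returns the unique class names mentioned in
# "Unable to resolve ..." and "Link of class ... failed" logcat lines.
def unresolved_classes(log_text)
  names = log_text.each_line.flat_map do |line|
    next [] unless line.include?("Unable to resolve") || line.include?("Link of class")
    # Turn "org/apache/xerces/..." into "org.apache.xerces...."
    line.scan(DESCRIPTOR).map { |(internal)| internal.tr("/", ".") }
  end
  names.uniq
end

log = <<~LOG
  W/dalvikvm(  374): Unable to resolve superclass of Lorg/apache/xerces/dom/DeferredDocumentImpl; (2008)
  W/dalvikvm( 374): Link of class 'Lorg/apache/xerces/dom/DeferredDocumentImpl;' failed
LOG

p unresolved_classes(log)
# prints: ["org.apache.xerces.dom.DeferredDocumentImpl"]
```

Feeding it a longer capture from `adb logcat` gives a deduplicated list of classes to check against the Android SDK's org.w3c subset.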

Is this avoidable? Maybe. Googling led me to some discussions about replacing org.w3c and other related packages. If I could include Xerces' xml-apis.jar (which defines org.w3c.*, javax.xml.*, and org.xml.*) in my Android app and override some of the core packages, Nokogiri would start working exactly as it does in a Rails web app. But that doesn't seem like a good workaround: surgery on the SDK might break other applications that use the replaced packages.


Probably the best answer is to create a subset of Nokogiri for Android. I'm not sure such a limited version of Nokogiri would still attract users, but I think it's better than nothing.



Thoughts on Ruboto and Android

Although my small Nokogiri app didn't work, I'm going to write about what I learned and did. This might help people who want to get Ruby gems working on Ruboto.

  • JDK should be 1.6.0_24 on OS X
Ruboto people might not develop JRuby on Rails apps on Google App Engine, but I do. Just before I tried Ruboto, I had to downgrade my JDK for the Google App Engine gem, so when I started, my JDK was 1.6.0_22. I spent quite a lot of time figuring out why Ruboto didn't work on my machine at all. Once the JDK was back to the latest version, Ruboto worked like magic. Make sure you know which JDK version you are using.


  • Android API level should be 11
Not all Ruboto samples need API level 11. For example, the samples in https://www.ibm.com/developerworks/web/library/wa-ruby/ worked on level 8. But Nokogiri needs level 11. I'm not sure why, but the ActiveRecord (and JDBC) sample, https://github.com/ruboto/ruboto-core/wiki/Tutorial%3A-Using-an-SQLite-database-with-ActiveRecord-and-RubyGems, which uses Java-backed rubygems like Nokogiri, was also tested on level 11.


  • Jar archives should be moved to project's libs directory
This is needed in environments that use a custom classloader, Google App Engine for example. So I put all the jars in my project's libs directory, https://github.com/yokolet/cranberry/tree/master/libs, so that the custom classloader can load them. If those jars fail to load, Nokogiri raises a mysterious "undefined method `next_sibling' for class `Nokogiri::XML::Node'" error. I didn't get that error, so the jars must have been loaded.

Also, I commented out lines 18-24 of nokogiri.rb (https://github.com/yokolet/cranberry/blob/master/assets/vendor/gems/1.8/gems/nokogiri-1.5.0.beta.4-java/lib/nokogiri.rb) so that Nokogiri doesn't try to load those jars again.


  • Configuration and setup are key to load gems
Loading gems on Ruboto was tricky. In the article https://www.ibm.com/developerworks/web/library/wa-ruby/, the author moved all the Ruby files into a single directory. That might work for small rubygems, but it never works for Nokogiri. For example, Nokogiri has both nokogiri/html/document.rb and nokogiri/xml/document.rb, so flattening them into one directory would make one overwrite the other. Instead, the approach described in https://github.com/ruboto/ruboto-core/wiki/Tutorial%3A-Using-an-SQLite-database-with-ActiveRecord-and-RubyGems worked well. It looks complicated, but while I was trying other things, I realized that the thread-based gem loading was really necessary. My config.rb is https://github.com/yokolet/cranberry/blob/master/assets/scripts/config.rb if you want to look at it. I also edited src/org/ruboto/Script.java (https://github.com/yokolet/cranberry/blob/master/src/org/ruboto/Script.java) to add the "vendor" directory.

When I clicked the "Cranberry" Ruboto icon right after "rake install" reported "Success," all the Nokogiri files were copied to the /data/data/.... directory. To cut down the copying time, I deleted Nokogiri's test and ext directories, which aren't needed to run Nokogiri.


  • Needs threads to become a nifty app
Android expects apps to be responsive (http://developer.android.com/guide/practices/design/responsiveness.html). According to that document, database or network access should not be performed on the main thread. In my Nokogiri sample, I first tried to fetch the RSS feed on the main thread, so I got this error:
W/System.err(  343): org.jruby.exceptions.RaiseException: Native Exception: 'class android.os.NetworkOnMainThreadException'; Message: null; StackTrace: android.os.NetworkOnMainThreadException
W/System.err( 343): at android.os.StrictMode$AndroidBlockGuardPolicy.onNetwork(StrictMode.java:1077)
W/System.err( 343): at java.net.InetAddress.lookupHostByName(InetAddress.java:481)
(snip)

This is why config.rb uses threads to require rubygems.


  • No need to reinstall app when scripts are updated
"rake update_scripts" updates the Ruby scripts of the installed app, so you don't need to reinstall it. This was a great help for me, since installing took many, many minutes.


  • ... but, it doesn't work. What's going on ???
As an Android newbie, I very often ran into trouble getting the Android SDK and the app to work. Sometimes app icons didn't show up, or rake install failed. The troubleshooting page, https://github.com/ruboto/ruboto-core/wiki/Troubleshooting, was very helpful; in particular, the "adb kill-server; adb start-server" commands were the best. I also made it a rule to type "ruby -v" before starting anything. If you are accidentally on CRuby, the rake tasks will start running, but they won't complete as you expect.

I'd like to add "uninstall the app" to the troubleshooting list. You can uninstall the app either on the emulator or with the adb uninstall command. On the emulator, long-click the icon you want to uninstall; a trash bin and the word "uninstall" appear. Dragging the icon onto the trash bin deletes the app. Alternatively, "adb uninstall [package name of app]" deletes the app. For example, my app's package name is com.servletgarden.ruboto.cranberry, so "adb uninstall com.servletgarden.ruboto.cranberry" deleted my app from the emulator. If you forget the package name, look at the path to the XXXActivity.java file; that path corresponds to the package hierarchy.
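To see why flattening a gem into one directory breaks Nokogiri (the "Configuration and setup" item above), a quick check is to group a gem's files by basename. This is only a sketch: `collisions` is a hypothetical helper, and the file list below is illustrative rather than Nokogiri's full lib tree.

```ruby
# Hypothetical helper: maps each basename to the paths that share it,
# keeping only basenames that appear more than once (i.e. would collide
# if every file were copied into a single flat directory).
def collisions(relative_paths)
  relative_paths
    .group_by { |path| File.basename(path) }
    .select { |_name, paths| paths.size > 1 }
end

# In a real project the list could come from something like
#   Dir.glob("**/*.rb", base: "path/to/nokogiri/lib")
files = %w[
  nokogiri.rb
  nokogiri/html/document.rb
  nokogiri/xml/document.rb
  nokogiri/version.rb
]

p collisions(files)
```

Any non-empty result means a flat layout would lose files, which is why the directory-preserving setup from the ActiveRecord tutorial is the safer route.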
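The "Needs threads" item above boils down to one pattern: start slow work (requiring gems, fetching a feed) on a background thread so the main thread stays free. The sketch below shows that pattern in plain Ruby, assuming a plain `Thread` is a fair stand-in; it is not the code in my config.rb, and `in_background` is a hypothetical helper, not a Ruboto API. Handing the result back to Android's UI thread is left out.

```ruby
# Hypothetical helper: runs the given block on a new thread and captures
# either [:ok, value] or [:error, exception] so failures don't vanish.
def in_background
  Thread.new do
    begin
      [:ok, yield]
    rescue StandardError => e
      [:error, e]
    end
  end
end

worker = in_background do
  sleep 0.1 # stands in for network I/O such as fetching an RSS feed
  "feed body"
end

# Thread#value waits for the thread to finish; it is called here only to
# show the outcome. A real app would hand the result to the UI instead.
status, result = worker.value
puts "#{status}: #{result}"
# prints: ok: feed body
```

Capturing the exception inside the thread is a design choice: otherwise an error raised during a background `require` is easy to miss in logcat.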
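The tip about recovering the package name from the activity's path can be turned into a tiny helper. This is a sketch assuming the standard layout where Java sources live under src/; both `package_from_activity_path` and the file name CranberryActivity.java are hypothetical stand-ins for "XXXActivity.java".

```ruby
# Hypothetical helper: "src/com/example/FooActivity.java" -> "com.example"
def package_from_activity_path(path)
  parts = path.split("/")
  src = parts.index("src") or raise ArgumentError, "no src/ directory in #{path}"
  # Everything between src/ and the file name is the package hierarchy
  parts[(src + 1)...-1].join(".")
end

activity = "src/com/servletgarden/ruboto/cranberry/CranberryActivity.java"
puts "adb uninstall #{package_from_activity_path(activity)}"
# prints: adb uninstall com.servletgarden.ruboto.cranberry
```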



How I made this app

Finally, I'm going to write down how I made this app and how to start it. The app doesn't work yet, but I'm leaving this memo for myself so that I can try it again in the future.

1. install ruboto-core gem
  $ ruby -v    (double-check that I'm on JRuby)
  $ gem install ruboto-core

2. set path to android tools
  $ cd path/to/android-sdk-mac_x86
  $ PATH=`pwd`/tools:`pwd`/platform-tools:$PATH

3. create emulator image
  $ android -s create avd -f -n cranberry-11 -t android-11
This command probably creates it. Actually, I created my virtual device using Eclipse's ADT, which is much easier. Before using the android command, I installed the platforms, also with Eclipse's ADT.


4. create ruboto app
  $ ruboto gen app --package com.servletgarden.ruboto.cranberry --target android-11
This generated the cranberry directory and everything under it.


5. add emulator task to Rakefile
  $ cd cranberry
  $ [edit Rakefile]
(lines 42-45 of https://github.com/yokolet/cranberry/blob/master/Rakefile)
Since my virtual device is named cranberry-11 (step 3), "-avd cranberry-11" is there. If the app is small, you don't need the -partition-size option.


6. install nokogiri gem
  $ mkdir -p assets/vendor/gems/1.8
  $ gem install --install-dir assets/vendor/gems/1.8 nokogiri -v 1.5.0.beta.4
  $ rm -rf assets/vendor/gems/1.8/cache
  $ rm -rf assets/vendor/gems/1.8/doc
  $ pushd assets/vendor/gems/1.8/gems/nokogiri-1.5.0.beta.4-java
  $ rm -rf ext test
  $ popd

7. add config.rb, add one line to assets/scripts/cranberry_activity.rb, and edit Script.java
line 2 of https://github.com/yokolet/cranberry/blob/master/assets/scripts/cranberry_activity.rb
lines 186-188 of https://github.com/yokolet/cranberry/blob/master/src/org/ruboto/Script.java


8. start emulator
  $ rake emulator
The emulator took many minutes to boot on my MacBook. Occasionally, it came up without the dark blue hexagons; in that case, the emulator didn't work correctly. I tried "adb kill-server" and "adb start-server" a couple of times. When that didn't work, I shut the emulator down, ran "adb kill-server," and then restarted the emulator.


9. start log monitor
  $ adb logcat

This prints verbose info messages, errors, and more. It is a bit noisy, but a great help for figuring out what's going on.


10. install app
  $ rake install
Be patient.


11. click the Cranberry Ruboto icon
Be patient again. JRuby takes a long time to start up.
The Ruboto default app, as in Figure 4 of https://www.ibm.com/developerworks/web/library/wa-ruby/, will show up.



12. edit Ruby files and run "rake update_scripts"
Then go back to the Apps view and click the Ruboto icon. The updated version should work, or troubleshooting time starts.
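For step 5, here is a sketch of how the emulator command line could be assembled in Ruby so a Rake task can run it. `-avd` and `-partition-size` (in MB) are real emulator options, but `emulator_args` is a hypothetical helper and the 1024 MB value is illustrative; neither is copied from the cranberry Rakefile.

```ruby
# Hypothetical helper: argument list for starting the emulator on a given
# AVD, optionally with a larger system/data partition (size in MB).
def emulator_args(avd, partition_size_mb = nil)
  args = ["emulator", "-avd", avd]
  args += ["-partition-size", partition_size_mb.to_s] if partition_size_mb
  args
end

p emulator_args("cranberry-11")
p emulator_args("cranberry-11", 1024)

# In a Rakefile, a task could run it with Rake's `sh`:
#   task :emulator do
#     sh(*emulator_args("cranberry-11", 1024))
#   end
```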
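The clean-up in step 6 can also live in Ruby (for example, in the Rakefile) instead of a shell session. This sketch removes the same directories as the `rm -rf` commands: the RubyGems cache/ and doc/ directories plus each listed gem's ext/ and test/ directories. `prune_vendored_gems` is a hypothetical helper, not part of Ruboto or RubyGems.

```ruby
require "fileutils"

# Hypothetical helper: deletes directories that aren't needed at runtime
# and returns the paths it actually removed.
def prune_vendored_gems(gem_home, gem_dirs, unneeded = %w[ext test])
  candidates = %w[cache doc].map { |dir| File.join(gem_home, dir) }
  gem_dirs.each do |gem_dir|
    unneeded.each { |dir| candidates << File.join(gem_home, "gems", gem_dir, dir) }
  end

  removed = []
  candidates.each do |path|
    next unless File.exist?(path)
    FileUtils.rm_rf(path)
    removed << path
  end
  removed
end

# Usage, mirroring step 6:
#   prune_vendored_gems("assets/vendor/gems/1.8", ["nokogiri-1.5.0.beta.4-java"])
```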


Whew...!

Thursday, April 14, 2011

Nokogiri on Google App Engine

Nokogiri 1.5.0 is on its way right now; it should be out soonish. This version is also the first release of pure Java Nokogiri. We call it *pure Java*, but the name might not be precise. Since it is written half in Ruby and half in Java, *pure JRuby* (as pragdave called it) would be a better name. For the methods that the CRuby version implements in C on top of libxml and libxslt, this pure JRuby version uses Xerces, nekoHTML, Jing, and a couple more Java tools. When people use Nokogiri 1.5.0 on JRuby, they get the pure Java version.
What's the beauty of pure Java Nokogiri? It works smoothly on any platform where Java runs. On OS X, Linux, Windows, and even Google App Engine, Nokogiri starts working painlessly. The most frequently asked questions about Nokogiri are "I can't install Nokogiri" and "Nokogiri doesn't work." Pure Java Nokogiri doesn't have these problems.


To check that pure Java Nokogiri works fine, I gave it a try on Google App Engine (GAE). GAE supports only Python and Java, so using libxml is out of the question. In short, pure Java Nokogiri just worked. Easy. (Unexpectedly, I struggled to get GAE itself working, so I'll write down how I did it.) Although I don't have much to write about, I'm going to note what I did for people who don't know they can use Nokogiri on GAE.


First, I installed the gems following the instructions at https://gist.github.com/825451. The instructions say "Do not use rvm," but I used rvm anyway. Using rvm is not the problem; the RubyGems version is. After installing Ruby 1.8.7 with rvm, I downgraded RubyGems to 1.3.7. Don't forget: the google-appengine gem needs RubyGems 1.3.7 or earlier. Otherwise, bundler08 will fail to install the gem command *bundle*, which ends up raising an error when the appengine gem tries to install gems into the .gems/bundler_gems/jruby/1.8/gems directory. Make sure *bundle* is listed when you type "gem help commands." See http://groups.google.com/group/appengine-jruby/browse_thread/thread/2db62b1a51896098 for details.
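Because the RubyGems version is what breaks the setup, it can be worth checking it before installing google-appengine. The sketch below encodes the "1.3.7 or earlier" rule stated above; `rubygems_ok_for_appengine?` is a hypothetical helper. It uses `Gem::Version` rather than string comparison, since "1.3.10" sorts before "1.3.7" as a string but is actually newer.

```ruby
require "rubygems"

# Upper bound taken from the requirement described above.
MAX_RUBYGEMS_FOR_APPENGINE = Gem::Version.new("1.3.7")

# Hypothetical helper: true if the given (or running) RubyGems version
# is 1.3.7 or earlier.
def rubygems_ok_for_appengine?(version = Gem::VERSION)
  Gem::Version.new(version) <= MAX_RUBYGEMS_FOR_APPENGINE
end

unless rubygems_ok_for_appengine?
  warn "RubyGems #{Gem::VERSION} is newer than 1.3.7; " \
       "downgrade it before installing google-appengine."
end
```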

You do need CRuby, but you don't need to install JRuby. The google-appengine gem installs the jruby-jars gem when it is needed. The jruby-jars gem packages JRuby's stdlib in a jar archive, and JRuby is started from that jar. So google-appengine mostly runs on CRuby and uses jruby-jars only when JRuby is needed. Therefore, all gems should be installed on CRuby. Below is what I did.

rvm 1.8.7
sudo gem install google-appengine (Since I installed rvm to /usr/local, I need *sudo*)
sudo gem install rails -v 2.3.11
sudo gem install rails_dm_datastore
sudo gem install activerecord-nulldb-adapter
mkdir rails_app; cd rails_app
curl -O http://appengine-jruby.googlecode.com/hg/demos/rails2/rails2311_appengine.rb
ruby rails2311_appengine.rb

Then the Rails app is ready to run. To start it on the development server:

./script/server.sh

This should start Jetty with the Rails app running on it.

However, I was among the unlucky people. I got a segmentation fault because my Java was Java SE 6 Update 4 for Mac OS X. After some googling, I followed "Comment 39" of http://code.google.com/p/googleappengine/issues/detail?id=4712. I didn't want to downgrade the JDK, but there seemed to be no better choice. Anyway, the Rails app worked on Update 3.


Next, I added Nokogiri to the Gemfile. Currently, 1.5.0.beta.4 is the latest.

gem 'nokogiri', '1.5.0.beta.4'

One more thing. The latest version of the jruby-jars gem is 1.6.1, but sadly, the jar archive in that gem is too big to upload; JRuby 1.6.1 grew bigger. As far as I remember, 1.6.0 is also too big to upload. Again, a downgrade came in. I used version 1.5.6, and my Gemfile became:

# Critical default settings:
disable_system_gems
disable_rubygems
bundle_path '.gems/bundler_gems'

# List gems to bundle here:
gem 'rails_dm_datastore'
gem 'jruby-jars', '1.5.6'
gem 'jruby-openssl'
gem 'jruby-rack', '1.0.5'
gem 'rails', '2.3.11'
gem 'nokogiri', '1.5.0.beta.4'
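Since the jruby-jars archive was rejected for being too big, a quick way to spot such files before running ./script/publish.sh is to list everything above a size threshold. This is a sketch: `oversized_files` is a hypothetical helper, and the limit is a parameter because the exact per-file limit GAE enforces isn't stated here; the 10 MB in the usage comment is only an illustrative value.

```ruby
require "find"

# Hypothetical helper: [path, size] pairs for files larger than
# limit_bytes under root, largest first.
def oversized_files(root, limit_bytes)
  hits = []
  Find.find(root) do |path|
    next unless File.file?(path)
    size = File.size(path)
    hits << [path, size] if size > limit_bytes
  end
  hits.sort_by { |(_path, size)| -size }
end

# Usage (root directory and limit are illustrative):
#   oversized_files(".", 10 * 1024 * 1024).each do |path, size|
#     puts format("%8.1f MB  %s", size / 1024.0 / 1024, path)
#   end
```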



OK, my platform was ready. Let's create a simple Nokogiri sample. In this sample, I fetched the RSS feed from cnn.com (http://rss.cnn.com/rss/cnn_topstories.rss), parsed it using Nokogiri, and displayed a list of news items. Since this is just a simple Nokogiri sample, I generated only a controller.

./script/generate controller newsfeeds index

The RSS feed I used looked like https://gist.github.com/921058. From this XML document, I collected the item elements using XPath. Then, also using XPath, I extracted the pubDate, title, link, and description child elements of each item.

# newsfeeds_controller.rb
require 'nokogiri'
require 'open-uri'

# Simple value object holding one RSS item
class Entry
  attr_reader :title, :url, :description, :pubdate

  def initialize(title, url, description, pubdate)
    @title = title
    @url = url
    @description = description
    @pubdate = pubdate
  end
end

class NewsfeedsController < ApplicationController
  def index
    # Fetch the feed via open-uri and parse it as XML
    doc = Nokogiri::XML(open("http://rss.cnn.com/rss/cnn_topstories.rss"))
    # Select every <item> element in the document
    items = doc.xpath("//item")
    @entries = []
    items.each do |item|
      # Relative XPath: child elements of the current <item>
      title = item.xpath("title").text
      url = item.xpath("link").text
      description = item.xpath("description").text
      pubdate = item.xpath("pubDate").text
      @entries << Entry.new(title, url, description, pubdate)
    end
  end
end

# newsfeeds/index.html.erb
<h1>Newsfeeds#index</h1>
<% @entries.each do |entry| %>
  <dl>
    <dt><%= entry.pubdate %></dt>
    <dt><b><%= entry.title %></b> [<%= link_to("Read", entry.url) %>]</dt>
    <dt><%= entry.description %></dt>
  </dl>
<% end %>

When I restarted the server with ./script/server.sh and requested http://localhost:8080/newsfeeds/, I could see the list of news items.



The last thing I did was upload the app. I set my application id on the "application:" line in WEB-INF/app.yaml, then uploaded it with ./script/publish.sh. Now my Nokogiri sample is working at http://4.latest.servletgarden-in-red.appspot.com/newsfeeds/.


Finally, here is a link to a blog post that talks about Nokogiri on Google App Engine. It might be helpful, too.

- Google App Engine, JRuby, Sinatra and some fun!


So far, pure Java Nokogiri worked just fine on Google App Engine. Give it a try!