Here is what I did to enable the Unicode flag and get correct output.
1. Check out jcodings from http://svn.codehaus.org/jruby/jcodings/, because joni needs it.
2. cd jcodings; mvn clean install
3. Check out joni-1_0 from http://svn.codehaus.org/jruby/joni/branches/joni-1_0/ (exactly this version is needed).
4. cd joni-1_0
5. Edit src/org/joni/Config.java and set USE_UNICODE_PROPERTIES to true.
6. mvn clean package
7. cp target/joni.jar <somewhere>/jruby-1.1.4/build_lib/.
8. cd <somewhere>/jruby-1.1.4
9. ant clean jar
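Step 5 can also be scripted instead of edited by hand. The following is only a minimal sketch: enable_unicode_properties is a hypothetical helper, and it assumes the flag is declared in Config.java in the form "USE_UNICODE_PROPERTIES = false;", which I have not verified against the joni-1_0 source.

```ruby
# Hypothetical helper: flips the flag inside the text of Config.java.
# ASSUMPTION: the declaration looks like "USE_UNICODE_PROPERTIES = false;".
# If the real file is laid out differently, the text is returned unchanged.
def enable_unicode_properties(java_source)
  java_source.sub(/(\bUSE_UNICODE_PROPERTIES\s*=\s*)false\b/, '\1true')
end

# Illustrative sample line, not copied from the real Config.java.
sample = "    final boolean USE_UNICODE_PROPERTIES = false;\n"
puts enable_unicode_properties(sample)
```

To apply it to the real file, the result of enable_unicode_properties(File.read(path)) could be written back with File.write(path, ...), where path points at src/org/joni/Config.java.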
With that, I could build a customized version of JRuby, which should be Unicode regular expression compliant. When I ran this UTF-8 encoded Ruby script,
p 'abcアイウαβγ'.scan(/[a-z]/)
p "abcアイウαβγ".scan(/\p{Katakana}/u)
print "abcアイウαβγ".scan(/\p{Katakana}/u), "\n\n"
p "abcアイウαβγ".scan(/\p{^Greek}/u)
print "abcアイウαβγ".scan(/\p{^Greek}/u), "\n\n"
p "abcアイウαβγ".scan(/[\u0370-\u30FF]/u)
print "abcアイウαβγ".scan(/[\u0370-\u30FF]/u), "\n"
$KCODE="utf8"
p "abcアイウαβγ".scan(/\p{Greek}/)
it printed out:
["a", "b", "c"]
["\343\202\242", "\343\202\244", "\343\202\246"]
アイウ
["a", "b", "c", "\343\202\242", "\343\202\244", "\343\202\246"]
abcアイウ
["a", "b", "c"]
abc
["α", "β", "γ"]
Although the Unicode codepoint range from Greek to Katakana ([\u0370-\u30FF]) didn't work, the others were good. (Ruby 1.9 showed readable characters with both p and print, but JRuby's p didn't.)
Of course, I got the error "unicode_regex.rb:2: invalid character property name {Katakana}: /\p{Katakana}/u (RegexpError)" when I ran this script on regular JRuby 1.1.4.
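The escaped strings in the p output are just the UTF-8 bytes of the matched characters, written in octal. Here is a small sketch to confirm that, assuming Ruby 2.0 or later with a UTF-8 source file; octal_escape is a hypothetical helper written for this illustration, not something JRuby provides.

```ruby
# Hypothetical helper: renders every byte of a string as a backslash-octal
# escape, which is how the p output above displays non-ASCII characters.
def octal_escape(str)
  str.bytes.map { |byte| format('\\%03o', byte) }.join
end

%w[ア イ ウ].each do |ch|
  puts "#{ch} => #{octal_escape(ch)}"
end
```

So p and print matched the same characters, and only the way they were displayed differed.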
Following lopex's comment, I wrote this Ruby script in EUC-JP encoding and ran it on regular JRuby 1.1.4.
p "abcアイウαβγ".scan(/\p{Katakana}/e)
print "abcアイウαβγ".scan(/\p{Katakana}/e),"\n"
print "abcアイウαβγ".scan(/\p{Greek}/e),"\n"
Naturally, the last line caused the error "unicode_regexp_eucjp.rb:6: invalid character property name {Greek}: /\p{Greek}/e (RegexpError)" whatever the encoding option of the regular expression was. However, the first two lines worked and printed:
["\245\242", "\245\244", "\245\246"]
アイウ
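The byte pairs printed by p here are the EUC-JP encodings of the same three Katakana characters. As a quick check, this sketch decodes them, assuming Ruby 2.0 or later with a UTF-8 source file; decode_euc_jp is a hypothetical helper name.

```ruby
# Byte pairs copied from the p output above, written as octal literals.
pairs = [[0245, 0242], [0245, 0244], [0245, 0246]]

# Hypothetical helper: treats raw bytes as EUC-JP and transcodes them to UTF-8.
def decode_euc_jp(bytes)
  bytes.pack('C*').force_encoding('EUC-JP').encode('UTF-8')
end

decoded = pairs.map { |bytes| decode_euc_jp(bytes) }.join
puts decoded
```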
JRuby already has the ability to handle Unicode regular expressions in a Ruby way, but this feature is just turned off. Since Unicode regular expressions are useful for speakers of non-ASCII languages, I hope this feature will be turned on in the near future.