Thursday, March 03, 2011

RedBridge's Sharing Variables, How It Works

RedBridge's sharing variables feature is a convenient way to pass objects back and forth between Java and Ruby. The feature makes things easy, but it can be a bit hard to understand. The idea itself is simple. On the Java side, all variables to be shared are saved in an internal map. When Ruby code is parsed or evaluated, all variables in the map are injected into the Ruby runtime. All variables used in the Ruby code (local variables are slightly different) are eligible to be retrieved after the evaluation. Either a variable's value is grabbed from the Ruby runtime when it is requested (lazy mode), or, when the evaluation finishes, all variables used in the Ruby code are retrieved and saved in the map on the Java side (non-lazy mode). However, the details depend on the choice of both local context type and local variable behavior. The feature is also aware of Ruby's receivers and scopes. Besides, there were bugs, which confused people. (I've fixed several bugs in this area in JRuby 1.6.0RC1/RC2/RC3.) So, for clarity, I'll write about how it should work.


1. Lazy or non-Lazy mode

This option controls how variables are retrieved from the Ruby runtime and is independent of the local context type and local variable behavior. It specifies whether only the requested variable is retrieved on demand or all variables are retrieved every time an evaluation finishes. As you may easily guess, the former performs better; in fact, this option was added to improve performance. By default, lazy mode is on for embed core (ScriptingContainer) and off for JSR223. (I changed the JSR223 default between RC2 and RC3; on RC3 and later, non-lazy is the default for JSR223.) JSR223 uses javax.script.SimpleBindings/SimpleScriptContext as variable holders, which makes on-demand retrieval very hard. For JSR223 users, it is important that JRubyEngine works in the same way as other ScriptEngines such as Rhino, BeanShell, etc. So, I chose compatibility over performance. JSR223 users can still choose lazy mode, but they need to know a trick to get a variable's value from the Ruby runtime.

This figure illustrates how sharing variables works in lazy mode. The *BiVariableMap* in the figure is the internal map that saves variables. It is responsible for type coercion from Java to Ruby and from Ruby to Java. Moreover, based on a given name, BiVariableMap chooses the right variable type, such as a global or instance variable, along with the logic to get it from or set it to the Ruby runtime. As the figure shows, a variable is grabbed from the Ruby runtime on demand, only when a user wants to get it. While the variable is returned to the user, it is also created in the BiVariableMap for possible later use.
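To picture the name-based choice, here is a minimal, stdlib-only sketch. It only encodes Ruby's own naming rules ($ for globals, @ for instance variables, a leading capital for constants, anything else a local) and is an assumption about what such a decision could look like, not BiVariableMap's actual code; VariableKindSketch and Kind are hypothetical names. Under the global local variable behavior described later, a plain name would map to a global instead.

```java
// Hypothetical sketch: mirrors Ruby's naming rules, not BiVariableMap's real code.
class VariableKindSketch {

    // Only the variable kinds discussed in this post.
    enum Kind { GLOBAL, INSTANCE, CONSTANT, LOCAL }

    static Kind classify(String name) {
        if (name.startsWith("$")) {
            return Kind.GLOBAL;      // $message
        }
        if (name.startsWith("@")) {
            return Kind.INSTANCE;    // @message
        }
        if (!name.isEmpty() && Character.isUpperCase(name.charAt(0))) {
            return Kind.CONSTANT;    // MESSAGE or Message
        }
        return Kind.LOCAL;           // message
    }

    public static void main(String[] args) {
        for (String name : new String[] { "$tmp", "@tmp", "TMP", "tmp" }) {
            System.out.println(name + " -> " + classify(name));
        }
    }
}
```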



The next figure illustrates a non-lazy retrieval sequence on JSR223, which would be the typical way to use JSR223. Users are allowed to instantiate the JDK-bundled classes javax.script.SimpleBindings/SimpleScriptContext. Unfortunately, RedBridge has no way to hook into these classes to make sharing variables work. Besides, there is no way to know which variable a user wants to retrieve from the Ruby runtime after the evaluation. So, when the evaluation finishes, JRubyEngine tries to get all variables/constants that look user defined. In some cases, BiVariableMap grows pretty fat. But when the evaluation ends, all possible variables must exist in SimpleBindings/SimpleScriptContext. To implement JSR223 faithfully, JRubyEngine sacrifices performance and memory.
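The difference between the two sequences can be modeled in plain Java. In this hypothetical sketch, FakeRuntime stands in for the Ruby runtime and VarMap for BiVariableMap; neither is a JRuby class. In lazy mode, get() pulls only the requested variable and keeps a copy; in non-lazy mode, afterEval() copies everything, whether or not anyone asks for it.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of lazy vs. non-lazy retrieval. FakeRuntime and VarMap are
// hypothetical stand-ins, not JRuby classes.
class LazinessSketch {

    // Stand-in for the Ruby runtime: the variables a script has set.
    static class FakeRuntime {
        final Map<String, Object> variables = new HashMap<>();
    }

    // Stand-in for the Java-side variable map.
    static class VarMap {
        private final Map<String, Object> values = new HashMap<>();
        private final FakeRuntime runtime;
        private final boolean lazy;

        VarMap(FakeRuntime runtime, boolean lazy) {
            this.runtime = runtime;
            this.lazy = lazy;
        }

        // Called once an evaluation finishes.
        void afterEval() {
            if (!lazy) {
                // Non-lazy: copy every variable up front, needed or not.
                values.putAll(runtime.variables);
            }
        }

        Object get(String name) {
            if (lazy && runtime.variables.containsKey(name)) {
                // Lazy: fetch only the requested variable and keep it
                // in the map for possible later use.
                values.put(name, runtime.variables.get(name));
            }
            return values.get(name);
        }

        int size() {
            return values.size();
        }
    }

    public static void main(String[] args) {
        FakeRuntime runtime = new FakeRuntime();
        runtime.variables.put("weather", "snow");
        runtime.variables.put("temperature", "17F");

        VarMap lazy = new VarMap(runtime, true);
        lazy.afterEval();
        System.out.println(lazy.size());             // 0
        System.out.println(lazy.get("temperature")); // 17F
        System.out.println(lazy.size());             // 1

        VarMap eager = new VarMap(runtime, false);
        eager.afterEval();
        System.out.println(eager.size());            // 2
    }
}
```

The trade-off is the one described above: the lazy map stays small, while the non-lazy map holds everything the moment the evaluation ends, which is what a plain SimpleBindings holder needs.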




So, how can people switch to lazy mode? I'll show only a JSR223 example, since ScriptingContainer users are happy with the default lazy mode and won't find it inconvenient.

For comparison, here is an example with the default settings:

// non-lazy mode; default
ScriptEngine engine = new ScriptEngineManager().getEngineByExtension("rb");
SimpleBindings bindings = new SimpleBindings();
engine.eval("$weather = 'freezing rain'; $temperature = '28F'", bindings);
System.out.println("It should be '28F': " + bindings.get("temperature"));

This code prints:

It should be '28F': 28F

The following is a lazy setting example:

// lazy mode on
System.setProperty("org.jruby.embed.laziness", "true");
engine = new ScriptEngineManager().getEngineByExtension("rb");
bindings = new SimpleBindings();
engine.eval("$weather = 'snow'; $temperature = '17F'", bindings);
System.out.println("It should be null: " + bindings.get("temperature"));
System.out.println("It should be '17F': " + engine.get("temperature"));

The above prints:

It should be null: null
It should be '17F': 17F

So, when the engine's get method is used, retrieving a variable on demand works. A more Ruby-like way also works: Ruby can return more than one value at a time. The returned values are saved in an Array, which is converted to java.util.List:

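// "$a = 'x', $b = 'y'" would assign an Array to $a; return an explicit Array instead
List<?> list = (List<?>) engine.eval("$weather = 'sleet'; $temperature = '32F'; [$weather, $temperature]");
System.out.println("It should be 'sleet': " + list.get(0));
System.out.println("It should be '32F': " + list.get(1));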

When this snippet runs, it prints:

It should be 'sleet': sleet
It should be '32F': 32F



2. Singleton Type

Before going further, let's review the local context types one by one. To make things clear, I drew figures that illustrate the structure of each type in terms of sharing variables. The figures will help you understand what is going on.


The first type is singleton. This is the default type for both ScriptingContainer and JRubyEngine. The singleton type has only one Ruby runtime on the JVM, which is a *singleton*, as the name expresses. There is also only one BiVariableMap ("Var Map" in the figure) on the JVM. No matter how many instances of ScriptingContainer / JRubyEngine you create, there is only one set of runtime and variable map. With this type, thread safety is the user's responsibility; RedBridge's policy is not to synchronize in the API.


3. SingleThreaded Type

The second type is the naive singlethread. This is the simplest type and is good for testing something simple. This singlethreaded model is probably the typical one that other JSR223 engines adopt. The singlethreaded type can have multiple sets of Ruby runtimes and BiVariableMaps on a single JVM. If you instantiate three ScriptingContainers / JRubyEngines, you'll have three sets of runtime and variable map on the JVM. Again, thread safety is the user's responsibility.
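The difference between singleton and singlethread comes down to where the set of runtime and variable map lives. A rough sketch, assuming a JVM-wide static map for singleton and a per-instance map for singlethread; Container, Scope, and SHARED_MAP are hypothetical names, and the maps only stand in for BiVariableMap. As with the real types, nothing here is synchronized.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the singleton and singlethread local context types.
// None of these names are JRuby classes.
class ContextScopeSketch {

    enum Scope { SINGLETON, SINGLETHREAD }

    // Shared by every container created with SINGLETON scope.
    private static final Map<String, Object> SHARED_MAP = new HashMap<String, Object>();

    static class Container {
        final Map<String, Object> varMap;

        Container(Scope scope) {
            // SINGLETON: every container points at the same JVM-wide map.
            // SINGLETHREAD: each container gets a private map of its own.
            this.varMap = (scope == Scope.SINGLETON) ? SHARED_MAP : new HashMap<String, Object>();
        }
    }

    public static void main(String[] args) {
        Container a = new Container(Scope.SINGLETON);
        Container b = new Container(Scope.SINGLETON);
        a.varMap.put("$tmp", "Atlanta");
        System.out.println(b.varMap.get("$tmp")); // Atlanta: b sees a's variable

        Container c = new Container(Scope.SINGLETHREAD);
        Container d = new Container(Scope.SINGLETHREAD);
        c.varMap.put("$tmp", "Atlanta");
        System.out.println(d.varMap.get("$tmp")); // null: d has its own map
    }
}
```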


4. Threadsafe Type

The third type is threadsafe. In this type, a set of runtime and variable map is a thread-local value, so each *thread* has its own set of runtime and variable map. This type allows us to isolate internal state by creating a thread. A single instance of ScriptingContainer / JRubyEngine creates multiple sets of runtime and variable map along with threads. Users don't need to worry about thread safety as long as the threads concerned are created in Java. Thread safety here doesn't cover Ruby threads.

Here's an example of the threadsafe type isolating internal state:

import java.util.Map;

import org.jruby.embed.LocalContextScope;
import org.jruby.embed.LocalVariableBehavior;
import org.jruby.embed.ScriptingContainer;

public class TransientThreadsafe {

    private TransientThreadsafe() {
        // THREADSAFE: every thread gets its own runtime and variable map
        ScriptingContainer container =
            new ScriptingContainer(LocalContextScope.THREADSAFE, LocalVariableBehavior.TRANSIENT);
        Runner runner1 = new Runner(container);
        Runner runner2 = new Runner(container);
        new Thread(runner1, "Runner-1").start();
        new Thread(runner2, "Runner-2").start();
        // each put goes into the variable map that belongs to that runner's thread
        runner1.getVarMap().put("$tmp", "Atlanta");
        runner2.getVarMap().put("$tmp", "Los Angeles");
    }

    public static void main(String[] args) {
        new TransientThreadsafe();
    }

    // Thread.sleep is static, so there is no need to go through currentThread()
    private static void pause() {
        try {
            Thread.sleep(1000L);
        } catch (InterruptedException e) {
            // no-op
        }
    }

    class Runner implements Runnable {
        ScriptingContainer container;
        // volatile: written on the runner thread, read on the main thread
        volatile Map<String, Object> varMap = null;

        Runner(ScriptingContainer container) {
            this.container = container;
        }

        @Override
        public void run() {
            // called on this runner's thread, so this is that thread's own map
            varMap = container.getProvider().getVarMap();
            // wait until the main thread has put $tmp into this thread's map
            while (varMap == null || varMap.get("$tmp") == null) {
                pause();
            }
            container.runScriptlet("puts \"" + Thread.currentThread().getName() + " ran in #{$tmp}\"");
        }

        Map<String, Object> getVarMap() {
            // wait until run() has picked up this thread's variable map
            while (varMap == null) {
                pause();
            }
            return varMap;
        }
    }
}

From the output below, we can see that two different sets of runtime and variable map exist (the order of the lines may vary between runs):

Runner-2 ran in Los Angeles
Runner-1 ran in Atlanta


5. Concurrent Type

The last type is concurrent. This type was added in 1.6.0RC1 and is a mixture of singleton and threadsafe: it has a singleton runtime and a thread-local variable map. It is probably the most complicated type, but it works well in some cases. For example, gems are evaluated in a Java Servlet's init() method, and then the classes and methods of those gems can be used in the doGet()/doPost()/etc. methods. On a Servlet container, each HTTP request runs on its own thread, so each HTTP request can have an isolated state.
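Threadsafe and concurrent differ in what is kept per thread. This sketch uses java.lang.ThreadLocal to model both, under the assumption that a plain Object can stand in for a Ruby runtime and a HashMap for the variable map; the class and field names are hypothetical and not part of JRuby. Two threads play the role of two HTTP requests.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the threadsafe and concurrent local context types. Plain
// Objects stand in for Ruby runtimes; nothing here is a JRuby class.
class ConcurrentScopeSketch {

    // Concurrent type: one runtime shared by the whole JVM...
    static final Object SHARED_RUNTIME = new Object();
    // ...but a separate variable map for each thread.
    static final ThreadLocal<Map<String, Object>> VAR_MAP =
            ThreadLocal.withInitial(HashMap::new);

    // Threadsafe type: the runtime itself is per thread, too.
    static final ThreadLocal<Object> THREAD_RUNTIME = ThreadLocal.withInitial(Object::new);

    // Runs two "request" threads; each records {shared runtime, its map, its own runtime}.
    static Object[][] probe() throws InterruptedException {
        Object[][] seen = new Object[2][];
        Thread[] threads = new Thread[2];
        for (int i = 0; i < threads.length; i++) {
            final int slot = i;
            threads[i] = new Thread(() -> {
                VAR_MAP.get().put("request", "request-" + slot);
                seen[slot] = new Object[] { SHARED_RUNTIME, VAR_MAP.get(), THREAD_RUNTIME.get() };
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join(); // join() also makes each thread's write to seen[] visible here
        }
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        Object[][] seen = probe();
        System.out.println(seen[0][0] == seen[1][0]); // true: concurrent shares the runtime
        System.out.println(seen[0][1] == seen[1][1]); // false: but not the variable map
        System.out.println(seen[0][2] == seen[1][2]); // false: threadsafe shares neither
    }
}
```

In the servlet scenario above, the gems loaded in init() would live in the one shared runtime, while each request thread writes only to its own map.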


6. Transient Local Variable Behavior

OK, let's look at the local variable behavior types one by one. We have transient (the default for ScriptingContainer), persistent, global (the default for JSR223), and bsf. I'm not going to talk about the bsf type; it exists only for the BSF engine, which is almost obsolete.

The first local variable behavior is transient. The transient type is natural to Ruby. When we put the value "hello" under the name "$message" on the Java side, it becomes $message = "hello" in Ruby. We can use Ruby's global, instance, and local variables and constants to share between Java and Ruby.

When you use global variables to share, you need to care about which local context type you are using. Because global variables are global on a runtime, each one is unique on a single runtime. When the runtime is a singleton (the singleton and concurrent types), a global variable, say $tmp, is shared globally. There are two more things to care about. One is that the embedding API doesn't check what type of variable is given. We can put a global variable with a receiver object, but the given receiver is ignored when the global variable gets pushed onto the runtime. The other is that global variables persist in BiVariableMap unless they are deleted. As long as global variables are in BiVariableMap, they are re-injected into the runtime. When global variables are removed from BiVariableMap, nil is set as their value on the Ruby side. (There's no way to delete global variables in JRuby.)
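The persistence rules for globals can be summarized with a small model. In this hypothetical sketch, varMap plays the Java-side map and runtimeGlobals the runtime's global table; because a global can't be deleted, removal is modeled as storing null for nil. Receivers are not modeled at all, in line with the point that a receiver given with a global is ignored.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of how shared global variables persist. Not JRuby code.
class GlobalVarSketch {

    // Java side: survives across evaluations until removed.
    final Map<String, Object> varMap = new HashMap<>();
    // Stand-in for the runtime's global variable table.
    final Map<String, Object> runtimeGlobals = new HashMap<>();

    void put(String name, Object value) {
        varMap.put(name, value);
    }

    // Before each evaluation, everything still in the map is injected again.
    void beforeEval() {
        runtimeGlobals.putAll(varMap);
    }

    // A global can't be deleted from the runtime, so removing it from the map
    // leaves it defined with nil (null here) on the Ruby side.
    void remove(String name) {
        varMap.remove(name);
        if (runtimeGlobals.containsKey(name)) {
            runtimeGlobals.put(name, null);
        }
    }

    public static void main(String[] args) {
        GlobalVarSketch sketch = new GlobalVarSketch();
        sketch.put("$tmp", "Atlanta");
        sketch.beforeEval();
        sketch.beforeEval(); // second evaluation: $tmp is injected again
        System.out.println(sketch.runtimeGlobals.get("$tmp"));         // Atlanta
        sketch.remove("$tmp");
        System.out.println(sketch.runtimeGlobals.containsKey("$tmp")); // true
        System.out.println(sketch.runtimeGlobals.get("$tmp"));         // null
    }
}
```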

How about instance variables? When you use instance variables, you need to care whether they are at the top level or belong to some object. A top-level instance variable will be unique on a single runtime: because the "top self" object is the only one on the runtime, its instance variable is the only one. When you put instance variables using the embedding API without a receiver, they are injected as top-level variables. This behavior is the same for all four local context types. The persistence of instance variables on the Java side is the same as for global variables: unless they are deleted from the map, they keep being injected into the runtime.

Constants work exactly the same as instance variables, so I won't add anything about constants.

The behavior of transient local variables is remarkable. Local variables on the Java side vanish from the variable map after each evaluation. Because they are local variables, they should not survive across evaluations. Suppose the 'orange' gem is evaluated right after the 'red' gem. What if both gems use a local variable named "tmp"? The 'red' gem might change the value of the 'tmp' local variable, but the 'orange' gem's code never expects 'tmp' to arrive with a value set by another gem. Thus, the local variables in the map are removed when the evaluation finishes, so they don't cause unexpected results. If you want to use the same local variable again, put it into ScriptingContainer / JRubyEngine (or SimpleBindings/SimpleScriptContext) again. So far, sharing a *local* variable is available only for top-level local variables.


7. Persistent Local Variable Behavior

The next local variable behavior is persistent. As the name shows, this type makes local variables persistent: they survive across evaluations like global and instance variables. This behavior was added mainly for ex-BSF users. BSF specifies that both local and global variables persist on an engine, and ex-BSF users find this behavior essential. This type might look useful, but users themselves need to avoid local variable collisions.

Apart from local variables, global variables, instance variables, and constants work exactly the same as in the transient type.
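Transient and persistent local variables differ only in what happens to the map after an evaluation. Here is a minimal sketch of that, with hypothetical names throughout; eval() does not run Ruby, it just returns the variables a script would see and then cleans up. The 'red'/'orange' gem scenario is reduced to two calls to eval().

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of transient vs. persistent local variables. Not JRuby code.
class LocalVarBehaviorSketch {

    enum Behavior { TRANSIENT, PERSISTENT }

    final Behavior behavior;
    final Map<String, Object> varMap = new HashMap<>();

    LocalVarBehaviorSketch(Behavior behavior) {
        this.behavior = behavior;
    }

    // Ruby local variable names start with a lowercase letter or an underscore.
    static boolean isLocal(String name) {
        return !name.isEmpty()
                && (Character.isLowerCase(name.charAt(0)) || name.charAt(0) == '_');
    }

    // Pretends to evaluate a script: returns what the script could see, then
    // cleans up the map according to the behavior.
    Map<String, Object> eval() {
        Map<String, Object> visibleToScript = new HashMap<>(varMap);
        if (behavior == Behavior.TRANSIENT) {
            // Locals vanish after each evaluation; $globals, @ivars and Constants stay.
            varMap.keySet().removeIf(LocalVarBehaviorSketch::isLocal);
        }
        return visibleToScript;
    }

    public static void main(String[] args) {
        for (Behavior b : Behavior.values()) {
            LocalVarBehaviorSketch sketch = new LocalVarBehaviorSketch(b);
            sketch.varMap.put("tmp", "red");
            sketch.varMap.put("$tmp", "red");
            sketch.eval(); // first script, e.g. the 'red' gem
            // second script, e.g. the 'orange' gem: does it still see "tmp"?
            System.out.println(b + ": " + sketch.eval().containsKey("tmp"));
        }
    }
}
```

Running main prints "TRANSIENT: false" and then "PERSISTENT: true"; in both cases "$tmp" is still visible to the second script.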


8. Global Local Variable Behavior

The last local variable behavior is global. The name, global local variable, might sound weird, but it actually expresses the behavior. If we put the key-value pair container.put("tmp", 100) on the Java side, it becomes $tmp = 100 on the Ruby side. The variable looks like a local variable or constant in Java, but everything is a global variable in Ruby. So we can use only global variables, and we need to care which local context type is used. As I wrote in "6. Transient Local Variable Behavior," global variables are shared globally on a single runtime.

This type is for JSR223. The JSR223 reference implementation released by Sun used this type, so the embedding API has the same local variable behavior for compatibility. Moreover, to fulfill the JSR223 specification, the values of global local variables are set to nil on the Ruby side when the evaluation finishes, but they are kept on the Java side.
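The name mapping and the nil reset can be sketched together. This is a hypothetical model, not JRubyEngine's code: javaSide plays the role of SimpleBindings, rubyGlobals the runtime's globals, and eval() only injects, takes a snapshot of what a script would see, and resets.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the "global" local variable behavior. Not JRuby code.
class GlobalLocalSketch {

    final Map<String, Object> javaSide = new HashMap<>();    // what the user put()
    final Map<String, Object> rubyGlobals = new HashMap<>(); // stand-in for the runtime

    // Every Java-side name becomes a Ruby global: "tmp" -> "$tmp".
    static String rubyName(String javaName) {
        return "$" + javaName;
    }

    void put(String name, Object value) {
        javaSide.put(name, value);
    }

    // Injects every variable as a global, snapshots what the script sees,
    // then sets the Ruby side back to nil while leaving the Java side alone.
    Map<String, Object> eval() {
        for (Map.Entry<String, Object> e : javaSide.entrySet()) {
            rubyGlobals.put(rubyName(e.getKey()), e.getValue());
        }
        Map<String, Object> visibleToScript = new HashMap<>(rubyGlobals);
        for (String name : javaSide.keySet()) {
            rubyGlobals.put(rubyName(name), null);
        }
        return visibleToScript;
    }

    public static void main(String[] args) {
        GlobalLocalSketch sketch = new GlobalLocalSketch();
        sketch.put("tmp", 100);
        System.out.println(sketch.eval().get("$tmp"));      // 100
        System.out.println(sketch.rubyGlobals.get("$tmp")); // null
        System.out.println(sketch.javaSide.get("tmp"));     // 100
    }
}
```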

This global local type is really weird in terms of Ruby programming. But the design of the Ruby language is very different from that of other JVM languages, and not all JSR223 users are dedicated Ruby programmers. This type might be JSR223's answer to avoiding unexpected results.


9. No Sharing Variables

In the end, you might not need the sharing variables feature at all. You are probably a big fan of ScriptingContainer's callMethod() or JSR223's invokeMethod() / invokeFunction(), where every value the Ruby code needs is passed through method arguments. In fact, this performs better than repeating runScriptlet() or eval(). The embedding API does have an option to turn the feature off.

[ScriptingContainer]
container.setAttribute(AttributeName.SHARING_VARIABLES, false);

[JSR223]
System.setProperty("org.jruby.embed.sharing.variables", "false");




This is how RedBridge's sharing variables feature works. Users in various fields have requested various kinds of features for the embedding API. To cover as many of those as possible, the embedding API has become a bit complicated. Once you understand how it works, you'll feel comfortable using it in the way that fits you best. Happy coding with the embedding API.

3 comments:

N. Nelson said...

That's a lot to digest. Bookmarking this for later because it is almost 1 o'clock in the morning.

Joel McNary said...

I would go so far as to say that not only do JSR223 developers not need this, this feature goes against the JSR itself. When we specify a variable to have ENGINE_SCOPE, it should not be present in other engines that we create. However, what I am now seeing is that variables that are explicitly set to have ENGINE_SCOPE are bleeding over to new engines, which was quite unexpected behavior.

Not only that, the call

System.setProperty("org.jruby.embed.sharing.variables", "false");

does not work, as there does not appear to be any code that actually reads this property. The only entries put into the attribute map were the three IO streams:

private void initialize(RubyInstanceConfig config, LocalVariableBehavior behavior, boolean lazy) {
    this.config = config;
    this.behavior = behavior;
    this.lazy = lazy;

    attribute = new HashMap();
    attribute.put(AttributeName.READER, new InputStreamReader(System.in));
    attribute.put(AttributeName.WRITER, new PrintWriter(System.out, true));
    attribute.put(AttributeName.ERROR_WRITER, new PrintWriter(System.err, true));
}

I was able to solve my problem by using the singlethread model, but it took some time to figure out what was going on.

yokolet said...

Local context models and variable behaviors aren't against JSR223. JSR223 doesn't mention this sort of thing at all. Ruby is a sophisticated language and goes well beyond what JSR223 assumed when defining its API. For example, some other languages have no idea of global/instance/local variables and their scopes.

Besides, JRuby embedding API users use it in various environments: on Servlet containers, on Android, etc. Some people want to go Ruby -> Java -> Ruby -> Java and so on.

The idea described here has come from various requests to embedding API.

However, ENGINE_SCOPE variables won't be present in other non-Ruby engines even though the model is a singleton runtime. The variables are only on the *Ruby* runtime. Please read https://github.com/jruby/jruby/wiki/RedBridge. For historical reasons, singleton is the default on JRuby's JSR223 engine. We discussed this on the jruby-users ML.

I fixed the bug you reported. As you said, turning off sharing variables didn't work on JSR223. The next release won't have the bug.

Thanks for using JRuby.