Sunday, January 26, 2020

Using arrays with parallel

OpenAF is a mix of JavaScript and Java, but "pure" JavaScript isn't "thread-safe" in the Java world. Nevertheless, being able to use Java threads with JavaScript is very useful, starting with performance.

One common pitfall is not paying attention to what is "thread-safe" and what isn't, that is, what is "aware" that it may be running in a multi-threaded environment, where several threads might be competing for access to a resource, and what isn't.

Let's see a practical example of a common pitfall with arrays.

Example of what should NOT be done

In this example we will create a simple array and push new elements to it in parallel. We would expect it to always end up with the same number of elements as the source:

var targetArray = [];
// making up a source array with 10 elements to push to targetArray
var sourceArray = repeat(9, '-').split(/-/);
parallel4Array(sourceArray, v => {
    targetArray.push(v);
    return v;
});

print(sourceArray.length);  // 10
print(targetArray.length);  // 10

Great, so the source array and the target array have the same length. It works, right? Let's increase the "thread competition" to 10000 elements and compare again:

var targetArray = [];
var sourceArray = repeat(9999, '-').split(/-/);
parallel4Array(sourceArray, v => {
    targetArray.push(v);
    return v;
});

print(sourceArray.length);  // 10000
print(targetArray.length);  // 9961 (varies between runs)

They are no longer equal. And if you run it again with more elements in the source array, the difference will tend to increase.

This is what happens when you forget that a JavaScript array is not thread-safe (which is actually a good thing, because thread-safe structures are usually slower when you are not using threads).
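
To see why elements go missing, it helps to think of an append on an unsynchronized array as separate steps: read the current length, store the value at that index, and write back the new length. The sketch below is a deterministic, single-threaded simulation of two threads interleaving those steps. It does not use real threads, and the names fakeArray and makeAppend are hypothetical, not part of OpenAF. The internals of the actual JavaScript engine are more involved, but the lost-update pattern is the same idea.

```javascript
// Simulated array: storage plus an explicit length, so each step is visible.
var fakeArray = { items: {}, length: 0 };

// Hypothetical helper that splits one "push" into the two steps a thread performs.
function makeAppend(target, value) {
    var seenLength;
    return {
        // Step 1: read the current length.
        read: function() { seenLength = target.length; },
        // Step 2: store the value and write back the new length.
        write: function() {
            target.items[seenLength] = value;
            target.length = seenLength + 1;
        }
    };
}

var threadA = makeAppend(fakeArray, "a");
var threadB = makeAppend(fakeArray, "b");

// Both simulated threads read the length before either one writes.
threadA.read();   // A sees length 0
threadB.read();   // B also sees length 0
threadA.write();  // A stores "a" at index 0, length becomes 1
threadB.write();  // B overwrites index 0 with "b", length stays 1

var lostUpdateLength = fakeArray.length;  // 1, although two values were appended
var survivingValue = fakeArray.items[0];  // "b"
```

Two appends happened, but only one element survived. With thousands of elements and several real threads, a few of these collisions are enough to explain the missing entries.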

Examples of what should be done

Using sync

The first immediate way to solve this is to ensure that just one thread will access the targetArray at a time. You can do this in OpenAF with the sync function:

var targetArray = [];
var sourceArray = repeat(9999, '-').split(/-/);
parallel4Array(sourceArray, v => {
    sync(() => {
        targetArray.push(v);
    }, targetArray);
    return v;
});

print(sourceArray.length);  // 10000
print(targetArray.length);  // 10000

The sync function uses Java synchronization underneath (synchronized) to ensure that, among the threads accessing targetArray, only one at a time will execute the provided function.

So the problem is solved, right? Well, if you measure the performance with and without sync, you will notice that the sync version can be up to twice as slow.

Why? Because while one thread is inserting a value into targetArray, all other threads have to stop and wait. That slows everything down and may make it less effective than running it sequentially (depending on how much processing happens outside the sync function).
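
One way to limit that cost is to keep the expensive work outside the synchronized function, so each thread only holds the lock for the push itself. The following is a minimal sketch of that idea, reusing the parallel4Array and sync calls exactly as they appear above. The runParallel and runSync names are hypothetical stand-ins that fall back to plain sequential code when the script is not running inside OpenAF, and Math.sin is just a placeholder for real per-element work.

```javascript
// Hypothetical stand-ins: use the OpenAF functions when present,
// otherwise fall back to sequential equivalents (no real threads).
var runParallel = (typeof parallel4Array === "function")
    ? parallel4Array
    : function(anArray, aFunction) {
        return anArray.map(function(v) { return aFunction(v); });
    };
var runSync = (typeof sync === "function")
    ? sync
    : function(aFunction, anObject) { return aFunction(); };

var targetArray = [];
var sourceArray = [];
for (var i = 0; i < 10000; i++) sourceArray.push(i);

runParallel(sourceArray, function(v) {
    // Expensive part: runs without holding any lock.
    var result = Math.sin(v);

    // Cheap part: only the shared push is synchronized on targetArray.
    runSync(function() {
        targetArray.push(result);
    }, targetArray);

    return result;
});
```

Keep in mind that with real threads the order of the elements in targetArray is not guaranteed to match the order in sourceArray.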

Using syncArray

In the OpenAF library ow.obj there is a wrapper that lets you use a "thread-safe" Java array: ow.obj.syncArray

ow.loadObj();
var targetArray = new ow.obj.syncArray();
var sourceArray = repeat(99999, '-').split(/-/);
parallel4Array(sourceArray, v => {
    targetArray.add(v);
    return v;
});

print(sourceArray.length);            // 100000
print(targetArray.toArray().length);  // 100000

The Java array version is optimized to be faster in these conditions. So the bigger the sourceArray, the bigger the benefit of using ow.obj.syncArray.

The ow.obj.syncArray has more methods that you can explore in the OpenAF help, including: addAll, clear, get, indexOf, length, remove and even getJavaObject, which lets you interact with the original Java object directly.

Comparison

If you change the above examples to perform something "harder" for each array element, like calculating the Math.sin of each value, and then compare the performance, you would get something similar to:

Strategy          Source size  Target size  Average time
Parallel          100000       <100000      2.11s
Sync              100000       100000       2.44s
ow.obj.syncArray  100000       100000       2.06s

But what's the time if it's done sequentially? In this simple example: a lot better (~1.1s). Keep in mind that you only gain a performance advantage when the time spent dealing with threads and concurrent access is much lower than the time spent processing each array element.

So, in conclusion, there is no right or wrong answer. You need to test to get the best for your case.
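
As a starting point for that kind of testing, here is a small sketch of a timing helper. The averageMs function is hypothetical (it is not an OpenAF function) and only relies on the standard Date.now() clock. The example times the sequential baseline with Math.sin as the per-element work; the absolute numbers will depend on your machine.

```javascript
// Hypothetical helper: runs aFunction `runs` times and returns the average
// elapsed time in milliseconds, measured with Date.now().
function averageMs(aFunction, runs) {
    var total = 0;
    for (var r = 0; r < runs; r++) {
        var start = Date.now();
        aFunction();
        total += Date.now() - start;
    }
    return total / runs;
}

var sourceArray = [];
for (var i = 0; i < 100000; i++) sourceArray.push(i);

// Sequential baseline: the same per-element work without any threads.
var sequentialResult;
var sequentialMs = averageMs(function() {
    sequentialResult = sourceArray.map(function(v) { return Math.sin(v); });
}, 5);
```

The same helper can wrap the parallel, sync and ow.obj.syncArray variants shown earlier, so that every strategy is timed under the same conditions before you pick one.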
