Tuesday, November 12, 2019

oJob Check for stall

When running an oJob, there may be situations where you want to ensure that the entire process won't stall (e.g. getting stuck at a "dead end" waiting for some service, lock or other resource).

oJob has a feature that ensures, no matter what, that your oJob won't run past a specific timeout, or that lets a function be executed to determine whether the oJob is stalled.

Killing after x seconds

The easiest configuration is ensuring that there is a general timeout for the entire oJob:

ojob:
  checkStall:
    # check for a stall every x seconds (default 60)
    everySeconds    : 1
    # kill the entire process after x seconds
    killAfterSeconds: 4

todo:
  - Test job

jobs:
  #---------------
  - name: Test job
    exec: |
      args.wait = _$(args.wait).default(5000);

      log("Waiting for " + args.wait + "ms...");
      sleep(args.wait, true);
      log("Done");

Executing this oJob gives different results depending on how long the "Test job" takes. It's configured to kill itself if it takes longer than 4 seconds, and it checks for that every second (in real situations you should use the default of 60 seconds).

For 1.5 seconds:

$ ojob test.yaml wait=1500
>> [Test job] | STARTED | 2019-11-11T12:13:17.199Z ------------------------
2019-11-11 12:13:17.230 | INFO | Waiting for 1500ms...
2019-11-11 12:13:18.736 | INFO | Done

<< [Test job] | Ended with SUCCESS | 2019-11-11T12:13:18.740Z ============

For 5 seconds:

$ ojob test.yaml wait=5000
>> [Test job] | STARTED | 2019-11-11T12:22:00.058Z -----------------------
2019-11-11 12:22:00.085 | INFO | Waiting for 5000ms...
oJob: Check stall over 4000
2019-11-11 12:22:03.878 | ERROR | oJob: Check stall over 4000
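
The relation between everySeconds and killAfterSeconds matters because a timeout can only be noticed when a check actually runs. The following is a minimal model of a polling watchdog in plain JavaScript, not oJob's actual implementation. The function name firstCheckOverTimeout is hypothetical, and the sketch assumes that checks fire at exact multiples of everySeconds and that the kill happens once the elapsed time is strictly over the limit.

```javascript
// Hypothetical model of a polling watchdog; not oJob's actual code.
// Assumes checks fire at exact multiples of everySeconds and that the
// process is killed once the elapsed time is strictly over the limit.
function firstCheckOverTimeout(everySeconds, killAfterSeconds) {
  if (!(everySeconds > 0)) {
    throw new Error("everySeconds must be a positive number");
  }
  var t = everySeconds;
  while (t <= killAfterSeconds) {
    t += everySeconds;
  }
  return t; // seconds elapsed at the check that notices the timeout
}

var withOneSecondChecks = firstCheckOverTimeout(1, 4);  // 5
var withDefaultChecks = firstCheckOverTimeout(60, 4);   // 60
```

Under these assumptions, a short limit combined with a long check interval is only enforced at the first check, so the two values are best chosen together.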

Killing depending on a function

If there are conditions that can be easily checked to determine whether the oJob is stalled, you can use a function:

ojob:
  checkStall:
    everySeconds    : 1
    checkFunc       : |
      print("checking for stall...");
      if (global.canDie) {
        print("should die.");
        return true;
      }

todo:
  - Init
  - Test job

jobs:
  #-----------
  - name: Init
    exec: |
      global.canDie = false;

  #---------------
  - name: Test job
    exec: |
      log("Waiting for 2500ms...");
      sleep(2500, true);

      log("Setting canDie to true...");
      global.canDie = true;

      log("Waiting for another 2500ms...");
      sleep(2500, true);

      log("Done");

In this case a global variable canDie is only set to true after the first 2.5 seconds of execution of the "Test job" job. As soon as the checkFunc is executed and confirms the condition by returning true, the oJob is immediately stopped.

$ ojob test2.yaml
checking for stall...
>> [Init] | STARTED | 2019-11-11T12:32:02.828Z -----------------------------
<< [Init] | Ended with SUCCESS | 2019-11-11T12:32:02.857Z ==================
>> [Test job] | STARTED | 2019-11-11T12:32:02.893Z -------------------------
2019-11-11 12:32:02.924 | INFO | Waiting for 2500ms...
checking for stall...
checking for stall...
2019-11-11 12:32:05.429 | INFO | Setting canDie to true...
2019-11-11 12:32:05.430 | INFO | Waiting for another 2500ms...
checking for stall...
should die.

You can see the several checkFunc executions by the output "checking for stall..." and, once the global variable canDie was true, the entire oJob stopped its execution.
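
A flag like canDie is the simplest possible condition. As a sketch of a slightly more realistic one, the function below treats the oJob as stalled when no progress has been recorded for a while. This is only an illustration in plain JavaScript: global.lastProgress, MAX_IDLE_MS, markProgress and isStalled are hypothetical names, not part of oJob, and the logic of isStalled is what would go into the checkFunc body.

```javascript
// Hypothetical names: global.lastProgress and MAX_IDLE_MS are illustrative only.
var MAX_IDLE_MS = 30000;

// A job would call this each time it completes a unit of work.
function markProgress(nowMs) {
  global.lastProgress = nowMs;
}

// Possible logic for a checkFunc: returns true when no progress
// was recorded for longer than MAX_IDLE_MS.
function isStalled(nowMs) {
  return (nowMs - global.lastProgress) > MAX_IDLE_MS;
}

markProgress(Date.now());
var stalledNow = isStalled(Date.now()); // false right after progress is recorded
```

Passing the current time as a parameter keeps the check easy to try out with fixed values before wiring it into an oJob.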

Checking at the job level

All the previous options check for a stall across the entire oJob execution, but you can do the same at the job level using typeArgs.timeout and typeArgs.stopWhen, which are available for all types of jobs in oJob.

Example with typeArgs.timeout

In this example the "Test job" job is set to time out after 1.5 seconds:

todo:
  - Init
  - Test job
  - Done

jobs:
  #-----------
  - name: Init
    exec: |
      global.canDie = false;

  #-----------
  - name: Done
    exec: |
      log("Everything is done.");

  #-------------------
  - name    : Test job
    typeArgs:
      timeout: 1500
    exec    : |
      log("Waiting for 2500ms...");
      sleep(2500, true);

      log("Setting canDie to true...");
      global.canDie = true;

      log("Waiting for another 2500ms...");
      sleep(2500, true);

      log("Done");

When executed, the job ends in error after the specified timeout:

>> [Init] | STARTED | 2019-11-11T12:47:25.568Z ----------------------------
<< [Init] | Ended with SUCCESS | 2019-11-11T12:47:25.624Z =================
>> [Test job] | STARTED | 2019-11-11T12:47:25.662Z ------------------------
2019-11-11 12:47:25.684 | INFO | Waiting for 2500ms...

!! [Test job] | Ended in ERROR | 2019-11-11T12:47:27.197Z =================
- id: 8ebbf961-822d-3b95-ca09-8dfb335ab6cb
  error: Job exceeded timeout of 1500ms

===========================================================================
>> [Done] | STARTED | 2019-11-11T12:47:27.277Z ----------------------------
2019-11-11 12:47:27.297 | INFO | Everything is done.

<< [Done] | Ended with SUCCESS | 2019-11-11T12:47:27.300Z =================

Example with typeArgs.stopWhen

In this example the "Test job" job is set to stop whenever the stopWhen function returns true:

todo:
  - Init
  - Test job
  - Done

jobs:
  #-----------
  - name: Init
    exec: |
      global.canDie = false;

  #-----------
  - name: Done
    exec: |
      log("Everything is done.");

  #-------------------
  - name    : Test job
    typeArgs:
      stopWhen: |
        if (global.canDie) {
           print("should die...");
           return true;
        }
    exec    : |
      log("Waiting for 2500ms...");
      sleep(2500, true);

      log("Setting canDie to true...");
      global.canDie = true;

      log("Waiting for another 2500ms...");
      sleep(2500, true);

      log("Done");

When executed, the job stops without any error once the stopWhen function returns true. To end the job with an error instead, simply throw an exception in the stopWhen function.

>> [Init] | STARTED | 2019-11-11T12:38:54.232Z -----------------------------
<< [Init] | Ended with SUCCESS | 2019-11-11T12:38:54.263Z ==================
>> [Test job] | STARTED | 2019-11-11T12:38:54.298Z -------------------------
2019-11-11 12:38:54.330 | INFO | Waiting for 2500ms...
2019-11-11 12:38:56.837 | INFO | Setting canDie to true...
should die...
2019-11-11 12:38:56.838 | INFO | Waiting for another 2500ms...

<< [Test job] | Ended with SUCCESS | 2019-11-11T12:38:56.857Z ==============
>> [Done] | STARTED | 2019-11-11T12:38:56.012Z =============================
2019-11-11 12:38:56.025 | INFO | Everything is done.

<< [Done] | Ended with SUCCESS | 2019-11-11T12:38:56.098Z ==================
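
The error variant mentioned above (throwing from stopWhen instead of returning true) could look like the sketch below. It is wrapped in a function only so it can run outside oJob; in the YAML, just the body would go under typeArgs.stopWhen. The message text is illustrative, and how oJob reports the thrown value is not shown here.

```javascript
// Sketch of a stopWhen body that ends the job in error instead of stopping quietly.
// Wrapped in a function here only so it can be executed outside oJob.
function stopWhen() {
  if (global.canDie) {
    throw "Test job stopped because canDie was set"; // illustrative message
  }
  return false; // condition not met, keep running
}

global.canDie = false;
var keepsRunning = stopWhen(); // false while canDie is not set
```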

Thursday, November 7, 2019

Handling oJob deps failure

When executing an oJob, each job can depend on the successful execution of other jobs. That means that if a job it depends on fails, the current job will either fail or stall, depending on the ojob.sequential setting.

Dependency timeout

If ojob.sequential != true, any failed job dependency will keep the oJob waiting for a successful execution of the job that is depended on. You can avoid that by using a dependency timeout:

ojob:
  # timeout of 2500ms
  depsTimeout: 2500 

In this case, whenever a job doesn't execute because another one failed, it will wait only the specified number of milliseconds for a successful execution of that dependency. Otherwise it will terminate with an error indicating that a dependency timeout has occurred.

Individual dependency

You can also execute code to decide what should be done if a dependent job fails when ojob.sequential != true:

todo:
  - Init
  - Test 1
  - Test 2

jobs:
  #-----------
  - name: Init
    exec: |
      // Initialize error flag
      global.inError = false;

  #-------------
  - name: Test 1
    exec: |
      print("Test 1");
      // Throw an error if args.error is defined
      if (args.error) throw("Problem with test 1");

  #-------------
  - name: Test 2
    deps:
      - name  : Init
      - name  : Test 1
        onFail: |
          // if the dependent job fails, 
          // change the global error flag and proceed
          global.inError = true;
          return true;
    exec: |
      if (!global.inError)
        print("Test 2");
      else
        printErr("Can not execute Test 2");

In this example there are three jobs:

  • Init - initializes a global inError flag.
  • Test 1 - generates an error if the error argument is defined during this oJob execution.
  • Test 2 - depends on the Init and Test 1 jobs. If the Test 1 job fails, the onFail code sets the global inError flag and execution proceeds, with the exec code checking that same flag.

The onFail entry on the list of dependencies for job Test 2 is actually the code of a function that receives three parameters:

  • args - the current job arguments
  • job - the current job definition
  • id - the current job execution id

If this onFail function returns true, the job execution will proceed. If it returns false, the job execution stalls, as it does by default.
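
To make the three parameters and the return value concrete, here is a sketch of an onFail body written as a standalone JavaScript function. It is not taken from oJob itself: args.optionalDeps is a hypothetical argument invented for this illustration, and in the YAML only the function body would be placed under onFail.

```javascript
// Sketch of an onFail body written as a function with its three parameters.
// args.optionalDeps is a hypothetical argument, not something oJob defines.
function onFail(args, job, id) {
  // Record the failure so the job's exec code can react to it.
  global.inError = true;
  // Proceed only when the caller declared the dependency optional;
  // otherwise return false and keep the default behaviour.
  return args.optionalDeps === true;
}

var proceed = onFail({ optionalDeps: true }, { name: "Test 2" }, "example-id"); // true
```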

Wednesday, November 6, 2019

How to convert arrays to maps and vice-versa

When working with JavaScript maps of maps or arrays of maps, you might sometimes think: "if only it was an array instead of an object" or "if only it was an object instead of an array".

Usually this happens when you know the perfect method/library but it would only work with arrays or objects.

In OpenAF there are two functions to try to make it easier to convert arrays to maps and vice-versa: ow.obj.fromArray2Obj and ow.obj.fromObj2Array.

Converting a map into an array

Let's take, for example, the map returned by getRemoteOPackDB():

{
    "OpenAF-Templates": {
        "version": "20190906",
        "files": [
            // ...
        ],
        "description": "..."
    },
    "OpenAFLambdaLayers": {
        "version": "20190809",
        "files": [
            //...
        ],
        "description": "..."
    },
    ...
}

You can quickly convert it into an array:

> ow.loadObj();
> var ar = ow.obj.fromObj2Array(getRemoteOPackDB(), "name")
[
    {
        "name": "OpenAF-Templates",
        "version": "20190906",
        "files": [
            // ...
        ],
        "description": "..."
    },
    {
        "name": "OpenAFLambdaLayers",
        "version": "20190809",
        "files": [
            // ...
        ],
        "description": "..."
    }
]
> printTable(mapArray(ar, ["name", "version"]));
       name        |version
-------------------+--------
OpenAF-Templates   |20190906
OpenAFLambdaLayers |20190809
// ...

The second parameter of ow.obj.fromObj2Array is optional: it names the attribute under which each original key is stored in the corresponding map of the array (in this case, name).
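
For readers who want to see the transformation spelled out, the following plain JavaScript sketch reproduces the conversion shown above. It is not the OpenAF implementation; objToArray is a hypothetical helper written only to illustrate what happens to the keys.

```javascript
// Illustrative equivalent of the map-to-array conversion; not OpenAF's code.
function objToArray(obj, keyName) {
  return Object.keys(obj).map(function (key) {
    var entry = {};
    // Store the original key under keyName when one is given.
    if (keyName !== undefined) entry[keyName] = key;
    // Copy the remaining fields of the inner map.
    Object.keys(obj[key]).forEach(function (field) {
      entry[field] = obj[key][field];
    });
    return entry;
  });
}

var packs = {
  "OpenAF-Templates": { version: "20190906" },
  "OpenAFLambdaLayers": { version: "20190809" }
};
var packList = objToArray(packs, "name");
// packList[0] is { name: "OpenAF-Templates", version: "20190906" }
```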

Converting an array into a map

Let's take the example of an array with the list of files:

> var files = io.listFiles("/my/folder").files;
> var obj = ow.obj.fromArray2Obj(files, "canonicalPath");
{
    "/my/folder/a.js": {
        "isDirectory": false,
        "isFile": true,
        "filename": "a.js",
        "filepath": "/my/folder/a.js",
        "lastModified": 1534542464558,
        "createTime": 1534541960321,
        "lastAccess": 1562153514605,
        "size": 16384,
        "permissions": "xrw"
    },
    "/my/folder/b.js": {
        // ...
    }
}

The second argument of ow.obj.fromArray2Obj is the entry of each map whose value will be used as the key in the resulting map.

Optionally you can also add a third boolean argument to indicate whether or not that entry should be removed from each map.
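
The reverse conversion can be sketched in plain JavaScript as well. Again, this is not the OpenAF implementation: arrayToObj and its keepKey parameter are hypothetical. Since the sample output above shows no canonicalPath entry inside each map, the sketch assumes the key entry is removed unless the third argument is true; check the OpenAF documentation for the exact meaning of the real flag.

```javascript
// Illustrative equivalent of the array-to-map conversion; not OpenAF's code.
// Assumption: the key entry is dropped from each map unless keepKey is true.
function arrayToObj(arr, keyName, keepKey) {
  var result = {};
  arr.forEach(function (item) {
    var copy = {};
    Object.keys(item).forEach(function (field) {
      if (keepKey || field !== keyName) copy[field] = item[field];
    });
    // The value of the key entry becomes the key of the resulting map.
    result[item[keyName]] = copy;
  });
  return result;
}

var fileList = [
  { canonicalPath: "/my/folder/a.js", filename: "a.js" },
  { canonicalPath: "/my/folder/b.js", filename: "b.js" }
];
var byPath = arrayToObj(fileList, "canonicalPath");
// byPath["/my/folder/a.js"] is { filename: "a.js" }
var byPathKept = arrayToObj(fileList, "canonicalPath", true);
// byPathKept["/my/folder/a.js"] still includes canonicalPath
```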

Using arrays with parallel

OpenAF is a mix of Javascript and Java, but "pure" javascript isn't "thread-safe" in the Java world. Nevertheless be...