
This is quite a hard problem to describe. I have a koajs app with a function that is created in multiple instances (10-1000 range) every 2 minutes. This scheduled job is created on app startup. I use koajs because I need a few simple API endpoints for this app. It runs well for the first 3-5 hours, and then the count of created instances starts to decrease and some of the log output disappears.

Here is the minimal sample based on actual code:

server.ts

import Koa from "koa";         // import added so the sample compiles
import { Main } from "./main"; // assumed path for the Main class below

const port = process.env.PORT || 3000; // port was undeclared in the original sample

const bootstrap = async () => {
    process.setMaxListeners(0); //(node:7310) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 uncaughtException listeners added to [process]. Use emitter.setMaxListeners() to increase limit
                                //appears on app startup (however it seems like this setMaxListeners(0) doesn't affect anything, since the warning persists)
    const app = new Koa();
    app.use(async ctx => {
        ctx.body = "Welcome to my Server!";
    });
    app.listen(port);
    new Main().run();
};

bootstrap(); 

main.ts (to run scheduledJobWrapper I tried the cron npm package, node-scheduler, setInterval, and a recursive setTimeout):

export class Main {
    isStarting: boolean = false;

    async run() {
        logger.info(`running the app, every 2 minutes`);

        //let that = this;
        // new CronJob(`*/2 * * * *`, function () {
        //     that.scheduledJobWrapper();
        // }, null, true, 'America/Los_Angeles');

        const interval = 2 * 60 * 1000;
        setInterval(() => {
            this.scheduledJobWrapper();
        }, interval);
    }

    async scheduledJobWrapper() {
        logger.info("here scheduledJobWrapper");

        let args = {};
        //some irrelevant logic to set the arguments
        await this.scheduledJob(args);
    }

    async scheduledJob(args: any) {
        try {
            logger.info("starting");
            if (!this.isStarting) {
                this.isStarting = true;

                const runningCount = Executor.tasks.length; //Executor.tasks is a singleton containing some info about tasks. Details are irrelevant; the point is it contains the active tasks.
                const tasksLimit = 100;

                if (runningCount < tasksLimit) {
                    for await (const i of Array(tasksLimit - runningCount).keys()) {
                        if (Executor.tasks.length > 20)
                            await global.sleep(5 * 1000);

                        this.startWrapper(args); //calling main task here
                    }
                }
                this.isStarting = false;

                logger.info(`Started: ${Executor.tasks.length - runningCount}`);
            }
        } catch (e) {
            logger.error("Error running scheduled job: " + e.toString());
        }
    }
}
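One detail in the sample worth flagging: `isStarting` is set to `true` inside the `try`, and the `catch` never resets it, so a single throw after the flag is set would silently disable every subsequent run. A minimal sketch of a safer guard (illustrative names, not the poster's actual classes), assuming the batch is represented by an async function:

```typescript
// Illustrative sketch only: NonOverlappingScheduler is a made-up name, and
// runBatch stands in for scheduledJob. The guard skips a tick while the
// previous batch is still in flight, and releases the flag in `finally`
// so a thrown error cannot leave it stuck.
class NonOverlappingScheduler {
  private running = false;

  constructor(private runBatch: () => Promise<void>) {}

  start(intervalMs: number): ReturnType<typeof setInterval> {
    return setInterval(async () => {
      if (this.running) return; // skip this tick instead of re-entering
      this.running = true;
      try {
        await this.runBatch();
      } finally {
        this.running = false; // reset even when runBatch throws
      }
    }, intervalMs);
  }
}
```

The `finally` is the important part: unlike a flag reset placed mid-`try`, it runs on both success and failure.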

In this example the problem manifests as follows: everything works as expected for the first 3-5 hours; later, each time the scheduled function is called:

  1. logger.info("here scheduledJobWrapper"); does not show any output.
  2. logger.info("starting"); is not in the output.
  3. this.startWrapper does run and the code inside it is being executed.

Despite the code inside this.startWrapper still running, the count of newly created jobs is slowly decreasing. Hardware (RAM/CPU) is not under any significant load (CPU under 10%, RAM under 20%).

Any clue on possible reason?

nodejs: 12.6.0

Thanks!

UPDATE: it seems that with setInterval the app runs OK for a longer time (6-24 hours), but after that the problem still starts.

user1935987
  • Probably not the issue, but I'd recommend not using `arguments` as a variable name, because the identifier is pre-defined in all JS functions. [more info](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/arguments) – Fabian Lauer Jul 19 '19 at 11:15
  • it's not the actual variable name, just for example – user1935987 Jul 19 '19 at 13:00
  • Tough to say what could be wrong. Perhaps you can use something like NSolid and see what is going on? https://dzone.com/articles/nodejs-performance-monitoring-part-3-debugging-the – Tarun Lalwani Jul 19 '19 at 14:06
  • trying to monitor it now with `NSolid` for a few hours - all seems right. High event loop idle, low lag, stable memory & CPU usage. The only suspicious thing is the infinitely growing `gc count` & `loop count` metrics; however, I guess that's a total count. – user1935987 Jul 20 '19 at 11:04
  • 5
    I suggest to put a `logger.info` statement before `await global.sleep(5 * 1000);`. Use it to print the value of `Executor.tasks.length`. – Roland Weber Jul 20 '19 at 13:00
  • installed pm2 - actually found out that the logs do not disappear, they're just not shown through the koajs logs endpoint. I can see those missing lines in the pm2 nodejs logs monitoring. Also there is certainly a memory leak, but for some reason only `pm2` shows it. I guess that's the problem. – user1935987 Jul 21 '19 at 08:57
  • `setInterval` used to have a bug: https://stackoverflow.com/questions/51835331/node-js-setinterval-stops-executing-after-25-days. This is almost certainly not this bug (this one only appeared after 25 days), but you may still profit from upgrading your node version. – Sandro Jul 23 '19 at 15:16
  • yeah, using `12.6.0` node, should've mentioned – user1935987 Jul 24 '19 at 10:22
  • 1
    I think it's a leaky async/await logic problem, not directly nodejs limits. Unfortunately it's broken code and hard to tell. A working example would be better, since the async/await has problems as-is. 1) Why use `for await` to iterate a simple array? 2) You read three potentially different values of `Executor.tasks.length` depending on when the tasks array is modified relative to checks, so 3) Why assume Executor.tasks management isn't part of the problem? You add 5s delay for over 20 tasks with 100 max, which guarantees batches exceeding the 2m setInterval interval, making it reentrant. – rob3c Jul 25 '19 at 09:24
  • `for await` to start the tasks asynchronously -> faster. Regarding it being reentrant - yes, it is, but the `isStarting` flag prevents that. – user1935987 Jul 27 '19 at 07:29
  • 1
    That's not how `for await` works. – rob3c Jul 29 '19 at 02:43
  • oh, just in case - i did try with the normal `for` cycle, it doesn't make any difference. surely not related to `for await` – user1935987 May 29 '20 at 12:41
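To illustrate the `for await` point raised in the comments (hypothetical helper names, not the question's code): `for await` over a plain array awaits each value strictly in order, so it adds no concurrency; starting tasks concurrently means collecting the promises and awaiting them together.

```typescript
// `for await` over a sync iterable of plain numbers: each value is awaited
// one after another, exactly like a normal for...of here -- no parallelism.
async function sequentialStarts(): Promise<number[]> {
  const started: number[] = [];
  for await (const i of Array(3).keys()) {
    started.push(i); // runs strictly sequentially: 0, then 1, then 2
  }
  return started;
}

// What actually starting tasks concurrently would look like:
// kick off every task, then wait for all of them at once.
async function concurrentStarts(tasks: Array<() => Promise<void>>): Promise<void> {
  await Promise.all(tasks.map((t) => t()));
}
```

`for await` is meant for async iterables (streams, paginated APIs); on an array of plain values it buys nothing.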

2 Answers


The issue is with the setInterval function. It slows down over time and has weird behavior too. You can create a custom setInterval using setTimeout, or use a third-party module and give that a try.

Sample setInterval implementation:

const intervals = new Map();
function setInterval(fn, time, context, ...args) {
  const id = new Date().getTime() + "" + Math.floor(Math.random() * 10000);
  intervals.set(
    id,
    setTimeout(function next() {
      intervals.set(id, setTimeout(next, time));
      fn.apply(context, args);
    }, time)
  );
  return id;
}
function clearInterval(id) {
  clearTimeout(intervals.get(id));
}

setInterval(console.log, 100, console, "hi");

You can also enhance this by compensating for lost time in the next setTimeout: if the callback fired late, schedule the next setTimeout correspondingly earlier.
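That drift compensation could look like the following sketch (illustrative only; `setDriftlessInterval` is a made-up name):

```typescript
// Each tick measures how late it fired relative to the intended schedule
// and shortens the next delay by that amount, so lag does not accumulate.
// Returns a cancel function instead of an id.
function setDriftlessInterval(fn: () => void, time: number): () => void {
  let expected = Date.now() + time; // when the first tick should fire
  let handle: ReturnType<typeof setTimeout>;
  const step = () => {
    const drift = Date.now() - expected; // how late this tick actually was
    fn();
    expected += time;
    handle = setTimeout(step, Math.max(0, time - drift));
  };
  handle = setTimeout(step, time);
  return () => clearTimeout(handle);
}
```

Usage: `const cancel = setDriftlessInterval(() => doWork(), 2 * 60 * 1000);` and later `cancel();` to stop it.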

xdeepakv
  • I'm currently doing something similar - my own `setInterval` implementation. The issue persists. The only workaround is still using `pm2 --max-memory-restart` - it restarts once in about 12 hours. – user1935987 Apr 04 '20 at 21:44
  • If it's a memory issue then that is the solution. You didn't highlight a memory issue! – xdeepakv Apr 04 '20 at 21:46

First of all, it would be better to move the instantiation of Main() into the listen callback:

app.listen(port, () => {
   new Main().run();
});

I don't know how good an idea it is to run a setInterval loop on the backend side. It's better to extract this logic and move it into a cron job.

Are we sure the machine can run 100 tasks? Count the tasks in order and see when the problem starts. Perhaps you cannot schedule 100 tasks and there is a limit somewhere.

xargr
  • machine can run even 1000 of such tasks. all is running well, but the memory leak exists. yeah unforunately can't run it in cron job. – user1935987 Jul 27 '19 at 07:24
  • Please realize, @user1935987, that this code can't keep running like this. You should set a cron job to call one function every 2 minutes. As more tasks get scheduled, at some point your app will crash because of that. – xargr Jul 27 '19 at 13:55
  • 1
    it certainly can run and is running, but there is a memory leak somewhere. for now just restarting the app by `pm2` once memory threshold is reached. too much complex logic involved - will have to rewrite really a lot to run it by cron, not a solution in this particular case. Finding memory leak is the solution. – user1935987 Jul 29 '19 at 12:54
  • Can the memory leak come from the setInterval function running on the server? Probably it's somehow getting over a limit of cron jobs, right? – xargr Jul 29 '19 at 13:04