Memory monitor, and a bit of cleanup. (#180)
This PR adds a new service, `MemoryMonitor`, to… well… monitor memory.
If memory usage goes beyond defined limits, the service will initiate
(hopefully) graceful shutdown. The idea is that one configures the
limits in the service to be lower than the actual `--max-old-space-size`
(or similar), so that shutdown happens well before Node will give up and
abruptly exit.

In addition, I took the opportunity here to tidy up the code of the
other builtin services, just a bit.
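
One way to pick a limit that sits below Node's own ceiling, as described above, is to derive it from the value passed to `--max-old-space-size` (which Node interprets in MiB). The following is only a rough sketch: `suggestMaxHeapBytes` and its default headroom factor of `0.75` are hypothetical and not part of this PR. Also note that the monitored heap figure includes `external` memory, which that flag does not bound, so the result is best treated as a starting point.

```javascript
// Hypothetical helper (not part of this PR): derive a `maxHeapBytes` value
// that leaves headroom below Node's `--max-old-space-size` setting (in MiB).
function suggestMaxHeapBytes(maxOldSpaceMiB, headroomFraction = 0.75) {
  if (!(maxOldSpaceMiB > 0)) {
    throw new Error('maxOldSpaceMiB must be a positive number.');
  }
  if (!(headroomFraction > 0 && headroomFraction < 1)) {
    throw new Error('headroomFraction must be strictly between 0 and 1.');
  }

  const bytes = Math.floor(maxOldSpaceMiB * 1024 * 1024 * headroomFraction);

  // `MemoryMonitor` rejects limits under one megabyte, so clamp to that.
  return Math.max(bytes, 1024 * 1024);
}

// Example: Node was started with `--max-old-space-size=256`.
const services = [
  {
    name: 'memory',
    class: 'MemoryMonitor',
    maxHeapBytes: suggestMaxHeapBytes(256)
  }
];
```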
danfuzz authored Apr 5, 2023
2 parents 37d0850 + f782993 commit e51e70f
Showing 12 changed files with 365 additions and 59 deletions.
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -1,6 +1,18 @@
Changelog
=========

### v0.5.9 -- 2023-04-05

Notable changes:

* During reload (e.g. `kill -HUP`), endpoint sockets (server sockets) are no
longer immediately closed. Instead, they're held open for several seconds, and
the reloaded configuration is given an opportunity to take them over. This
makes it possible for endpoints that use incoming FDs to actually be reloaded.
* New service `MemoryMonitor`, to induce graceful shutdown if memory usage goes
beyond defined limits, with an optional grace period to ignore transient
spikes.

### v0.5.8 -- 2023-03-29

Notable changes:
38 changes: 38 additions & 0 deletions doc/configuration.md
@@ -277,6 +277,44 @@ the ones that are used by `save`:
Note that at least one of the `on*` bindings needs to be provided for a `save`
to have any meaning.

### `MemoryMonitor`

A service which occasionally checks the system's memory usage, and will force a
(hopefully) clean shutdown if memory usage is too high, with an optional grace
period to allow momentary usage spikes. It accepts the following configuration
bindings:

* `checkSecs` — How often to check for memory usage being over the
defined limit, in seconds. Optional. Minimum `1` (which is frankly way too
often). Default `5 * 60` (once every five minutes).
* `gracePeriodSecs` — Once a memory limit has been reached, how long, in
seconds, it is allowed to remain at or beyond the maximum before this service
takes action. `0` (or `null`) to not have a grace period at all. Default `0`.
**Note:** When in the middle of a grace period, the service will check
memory usage more often than `checkSecs` so as not to miss a significant dip.
* `maxHeapBytes` — How many bytes of heap is considered "over limit," or
`null` for no limit on this. The amount counted is `heapTotal + external` from
`process.memoryUsage()`. Defaults to `null`. **Note:** In order to catch
probably-unintentional misconfiguration, if a number, must be at least one
megabyte.
* `maxRssBytes` — How many bytes of RSS is considered "over limit," or
`null` for no limit on this. Defaults to `null`. **Note:** In order to catch
probably-unintentional misconfiguration, if non-`null`, must be at least one
megabyte.

```js
const services = [
  {
    name: 'memory',
    class: 'MemoryMonitor',
    checkSecs: 5 * 60,
    gracePeriodSecs: 60,
    maxHeapBytes: 100 * 1024 * 1024,
    maxRssBytes: 150 * 1024 * 1024
  }
];
```
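
To make the "over limit" test concrete, here is a minimal sketch of the comparison the bindings above describe. It is illustrative only and is not the service's own code: `isOverLimit` is a hypothetical name, and the function assumes that a `null` limit means "no limit" and that usage "at or beyond" a limit counts as over.

```javascript
// Hypothetical helper: reports whether a `process.memoryUsage()`-style result
// is at or beyond either configured limit. A `null` limit is never exceeded.
function isOverLimit(usage, { maxHeapBytes = null, maxRssBytes = null }) {
  // The documented heap figure is `heapTotal + external`.
  const heap = usage.heapTotal + usage.external;

  return ((maxHeapBytes !== null) && (heap >= maxHeapBytes))
    || ((maxRssBytes !== null) && (usage.rss >= maxRssBytes));
}

const MIB = 1024 * 1024;
const limits = { maxHeapBytes: 100 * MIB, maxRssBytes: 150 * MIB };

// Check the current process against the example limits.
console.log(isOverLimit(process.memoryUsage(), limits));
```

With a grace period configured, a `true` result here would only start the clock; the service acts if usage is still over limit once the grace period has elapsed.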

### `ProcessIdFile`

A service which writes a simple text file containing the process ID (number) of
8 changes: 8 additions & 0 deletions etc/example-setup/config/config.mjs
@@ -24,6 +24,14 @@ const hosts = [

// Service definitions.
const services = [
{
name: 'memory',
class: 'MemoryMonitor',
checkSecs: 10 * 60,
gracePeriodSecs: 60,
maxHeapBytes: 100 * 1024 * 1024,
maxRssBytes: 150 * 1024 * 1024
},
{
name: 'process',
class: 'ProcessInfoFile',
2 changes: 2 additions & 0 deletions src/builtin-services/export/BuiltinServices.js
@@ -3,6 +3,7 @@

import { BaseService } from '@this/app-framework';

import { MemoryMonitor } from '#x/MemoryMonitor';
import { ProcessIdFile } from '#x/ProcessIdFile';
import { ProcessInfoFile } from '#x/ProcessInfoFile';
import { RateLimiter } from '#x/RateLimiter';
@@ -21,6 +22,7 @@ export class BuiltinServices {
*/
static getAll() {
return [
MemoryMonitor,
ProcessIdFile,
ProcessInfoFile,
RateLimiter,
244 changes: 244 additions & 0 deletions src/builtin-services/export/MemoryMonitor.js
@@ -0,0 +1,244 @@
// Copyright 2022-2023 the Lactoserv Authors (Dan Bornstein et alia).
// SPDX-License-Identifier: Apache-2.0

import { memoryUsage } from 'node:process';
import { setTimeout } from 'node:timers/promises';

import { ServiceConfig } from '@this/app-config';
import { BaseService } from '@this/app-framework';
import { Threadlet } from '@this/async';
import { Moment } from '@this/data-values';
import { Host } from '@this/host';
import { MustBe } from '@this/typey';


/**
* Service which monitors the system's memory usage and can initiate shutdown
* before a memory problem becomes dire. Configuration object details:
*
* * `{?number} checkSecs` -- How often to check things, in seconds, or `null`
* to use the default frequency. Minimum `1`. Defaults to `5 * 60` (once every
* five minutes).
* * `{?number} gracePeriodSecs` -- Once a memory limit has been reached, how
* long it is allowed to remain at or beyond the maximum before this service
* takes action, or `null` not to have a grace period at all (equivalent to
* `0`). When in the middle of a grace period, the system checks more often
* than `checkSecs` so as not to miss a significant dip. Defaults to `null`.
* * `{?number} maxHeapBytes` -- How many bytes of heap is considered "over
* limit," or `null` for no limit on this. The amount counted is `heapTotal +
* external` from `process.memoryUsage()`. Defaults to `null`. **Note:** In
* order to catch probably-unintentional misconfiguration, if a number, must
* be at least one megabyte.
* * `{?number} maxRssBytes` -- How many bytes of RSS is considered "over
* limit," or `null` for no limit on this. Defaults to `null`. **Note:** In
* order to catch probably-unintentional misconfiguration, if non-`null`, must
* be at least one megabyte.
*/
export class MemoryMonitor extends BaseService {
/** @type {Threadlet} Threadlet which runs this service. */
#runner = new Threadlet(() => this.#run());

/**
* @type {?{ heap: number, rss: number, at: Moment, troubleAt: ?Moment,
* actionAt: ?Moment }} Last memory snapshot (including trouble indicators), if
* any.
*/
#lastSnapshot = null;

// Note: Default constructor is fine for this class.

/** @override */
async _impl_start(isReload_unused) {
await this.#runner.start();
}

/** @override */
async _impl_stop(willReload_unused) {
await this.#runner.stop();
}

/**
* Takes a memory snapshot, including figuring out if we're in an "over limit"
* situation or not.
*
* @returns {object} The current snapshot.
*/
#takeSnapshot() {
const rawUsage = memoryUsage();
const now = Moment.fromMsec(Date.now());

// Note: Per Node docs, `external` includes the `arrayBuffers` value in it.
const usage = {
heap: rawUsage.heapTotal + rawUsage.external,
rss: rawUsage.rss
};

this.logger?.usage(usage);

const snapshot = {
...usage,
at: now,
troubleAt: this.#lastSnapshot?.troubleAt ?? null,
actionAt: this.#lastSnapshot?.actionAt ?? null
};

const { maxHeapBytes, maxRssBytes } = this.config;

if ( (maxHeapBytes && (snapshot.heap >= maxHeapBytes))
|| (maxRssBytes && (snapshot.rss >= maxRssBytes))) {
if (!snapshot.troubleAt) {
// We just transitioned to an "over limit" situation.
const actionAt = now.addSecs(this.config.gracePeriodSecs);
snapshot.troubleAt = now;
snapshot.actionAt = actionAt;
this.logger?.overLimit({ actionAt });
}
} else {
if (snapshot.troubleAt) {
// We just transitioned back to a "within limit" situation.
snapshot.troubleAt = null;
snapshot.actionAt = null;
this.logger?.withinLimit();
}
}

this.#lastSnapshot = snapshot;
return snapshot;
}

/**
* Runs the service thread.
*/
async #run() {
const checkMsec = this.config.checkSecs * 1000;

while (!this.#runner.shouldStop()) {
const snapshot = this.#takeSnapshot();

if (snapshot.actionAt && (snapshot.actionAt.atSecs < snapshot.at.atSecs)) {
this.logger?.takingAction();
// No `await`, because then the shutdown handler would end up deadlocked
// with the stopping of this threadlet.
Host.exit(1);
break;
}

let timeoutMsec = checkMsec;
if (snapshot.actionAt) {
const msecUntilAction = snapshot.actionAt.subtract(snapshot.at).secs * 1000;
const msecUntilCheck = Math.min(
checkMsec,
Math.max(
msecUntilAction * MemoryMonitor.#TROUBLE_CHECK_FRACTION,
MemoryMonitor.#MIN_TROUBLE_CHECK_MSEC));
timeoutMsec = msecUntilCheck;
}

await this.#runner.raceWhenStopRequested([
setTimeout(timeoutMsec)
]);
}
}


//
// Static members
//

/**
* @type {number} Minimum amount of time in msec between checks, when dealing
* with an "over limit" situation.
*/
static #MIN_TROUBLE_CHECK_MSEC = 1000;

/**
* @type {number} Fraction of time between "now" and when action needs to
* happen, when the next check should take place in an "over limit" situation.
*/
static #TROUBLE_CHECK_FRACTION = 0.4;

/** @override */
static get CONFIG_CLASS() {
return this.#Config;
}

/**
* Configuration item subclass for this (outer) class.
*/
static #Config = class Config extends ServiceConfig {
/** @type {number} How often to check, in seconds. */
#checkSecs;

/** @type {number} Grace period before triggering an action, in seconds. */
#gracePeriodSecs;

/**
* @type {?number} Maximum allowed size of heap usage, in bytes, or `null`
* for no limit.
*/
#maxHeapBytes;

/**
* @type {?number} Maximum allowed size of RSS, in bytes, or `null` for no
* limit.
*/
#maxRssBytes;

/**
* Constructs an instance.
*
* @param {object} config Configuration object.
*/
constructor(config) {
super(config);

const {
checkSecs = null,
gracePeriodSecs = null,
maxHeapBytes = null,
maxRssBytes = null
} = config;

this.#checkSecs = (checkSecs === null)
? 5 * 60
: MustBe.number(checkSecs, { finite: true, minInclusive: 1 });
this.#gracePeriodSecs = (gracePeriodSecs === null)
? 0
: MustBe.number(gracePeriodSecs, { finite: true, minInclusive: 0 });
this.#maxHeapBytes = (maxHeapBytes === null)
? null
: MustBe.number(maxHeapBytes, { finite: true, minInclusive: 1024 * 1024 });
this.#maxRssBytes = (maxRssBytes === null)
? null
: MustBe.number(maxRssBytes, { finite: true, minInclusive: 1024 * 1024 });
}

/** @returns {number} How often to check, in seconds. */
get checkSecs() {
return this.#checkSecs;
}

/**
* @returns {number} Grace period before triggering an action, in seconds.
*/
get gracePeriodSecs() {
return this.#gracePeriodSecs;
}

/**
* @returns {?number} Maximum allowed size of heap usage, in bytes, or
* `null` for no limit.
*/
get maxHeapBytes() {
return this.#maxHeapBytes;
}

/**
* @returns {?number} Maximum allowed size of RSS, in bytes, or `null` for
* no limit.
*/
get maxRssBytes() {
return this.#maxRssBytes;
}
};
}
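
The check scheduling in `#run()` above can be isolated as a pure function, which makes the back-off easier to reason about. This is a sketch rather than the service's code: `nextCheckDelayMsec` is a hypothetical name, and the two constants are copied from the private statics `#TROUBLE_CHECK_FRACTION` and `#MIN_TROUBLE_CHECK_MSEC`.

```javascript
// Values copied from the private statics in `MemoryMonitor`.
const MIN_TROUBLE_CHECK_MSEC = 1000;
const TROUBLE_CHECK_FRACTION = 0.4;

// Hypothetical helper: how long to wait before the next check. Pass `null` for
// `msecUntilAction` when not in an "over limit" situation.
function nextCheckDelayMsec(checkMsec, msecUntilAction = null) {
  if (msecUntilAction === null) {
    return checkMsec;
  }

  // Wait a fraction of the remaining grace period, but never less than the
  // minimum and never more than the regular check interval.
  return Math.min(
    checkMsec,
    Math.max(msecUntilAction * TROUBLE_CHECK_FRACTION, MIN_TROUBLE_CHECK_MSEC));
}

// Print the successive delays for a 60-second grace period.
let remainingMsec = 60 * 1000;
while (remainingMsec > 0) {
  const delayMsec = nextCheckDelayMsec(5 * 60 * 1000, remainingMsec);
  console.log(`remaining ${Math.round(remainingMsec)} msec, wait ${Math.round(delayMsec)} msec`);
  remainingMsec -= delayMsec;
}
```

The effect is that checks cluster toward the end of the grace period, so a dip back under the limit shortly before the deadline is less likely to be missed.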