Release Improved memory profiling, new features, bugfixes · plasma-umass/scalene

Overhauled memory attribution logic:

uses Python's custom memory management APIs to efficiently disambiguate native vs. Python memory allocations, supplanting the prior approach that employed periodic call stack sampling.
performs immediate lookup of the location in source code responsible for allocation/deallocation, reducing the "smearing" effect in attributions previously caused by delayed attribution.
computes average memory consumption (rather than total) for each line of code (using the novel technique of "one-shot" tracing); lines executed many times no longer appear to have consumed large amounts of memory.
no longer reports negative memory growth in its output (previously shown for lines that freed more memory than they allocated), which had been a source of confusion for some users.
this release also resolves a memory leak.
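The difference between total and average attribution can be illustrated with a toy calculation. This is only a sketch of the two statistics, not Scalene's implementation and not the "one-shot" tracing technique itself; the `attribute` function and the `(lineno, megabytes)` event format are hypothetical.

```python
def attribute(events):
    """Return {lineno: (total_mb, average_mb)} for a stream of allocations.

    `events` is a hypothetical list of (lineno, megabytes) pairs, one pair
    per execution of a line that allocated memory.
    """
    totals, counts = {}, {}
    for lineno, mb in events:
        totals[lineno] = totals.get(lineno, 0.0) + mb
        counts[lineno] = counts.get(lineno, 0) + 1
    return {ln: (totals[ln], totals[ln] / counts[ln]) for ln in totals}


# Line 7 allocates 10 MB on each of 100 loop iterations;
# line 12 allocates 250 MB exactly once.
events = [(7, 10.0)] * 100 + [(12, 250.0)]
report = attribute(events)

for lineno, (total, average) in sorted(report.items()):
    print(f"line {lineno}: total={total:.0f} MB, average={average:.0f} MB")
```

Under total attribution, line 7 looks four times as expensive as line 12 simply because it ran many times; under average attribution, line 12 is correctly shown as the larger single allocation.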

Overhauled internal signal handling:

uses signal actors, an actor-based approach that decouples signal handling logic from the main thread, avoiding the risk of races and deadlocks and simplifying the logic.
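The general idea of an actor that takes work off the signal handler can be sketched as follows. This is a minimal illustration under stated assumptions, not Scalene's actual code: the `SignalActor` class and its methods are hypothetical names. The handler only enqueues the signal number, and a dedicated worker thread does the real processing.

```python
import queue
import threading


class SignalActor:
    """Hypothetical actor: the signal handler enqueues, a worker thread processes."""

    def __init__(self, process):
        # SimpleQueue.put is reentrant, so it is safe to call from a handler.
        self._inbox = queue.SimpleQueue()
        self._process = process
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def handler(self, signum, frame):
        # Runs in the main thread when the signal arrives: do the minimum and return.
        self._inbox.put(signum)

    def _run(self):
        # Runs in the worker thread: messages are handled one at a time, in order.
        while True:
            signum = self._inbox.get()
            if signum is None:  # sentinel: shut down
                break
            self._process(signum)

    def stop(self):
        # Ask the worker to finish the queued messages and exit.
        self._inbox.put(None)
        self._worker.join()


# Possible use on macOS or Linux (left as comments so the block has no side effects):
#   import signal
#   actor = SignalActor(lambda signum: print("got signal", signum))
#   signal.signal(signal.SIGUSR1, actor.handler)
```

Because the handler never takes locks or performs slow work, it cannot deadlock against code it interrupted; all state changes happen sequentially in the worker.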

Bug fixes:

fixed missing handling of pynvml.NVMLError_NotSupported exception (issue #262);
fixed an issue with cleaning up after profiling multiprocessor and multithreaded programs;
fixed an issue where elapsed time was not accounted for when zero frames were recorded (issue #269).

New features:

added JSON output option (--json);
added programmatic profile control (scalene_profiler.start() and scalene_profiler.stop()).
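As one way to use the programmatic control, the start/stop pair can be wrapped in a context manager so that profiling is always turned back off, even if the profiled code raises. The `profiled_region` helper below is a hypothetical convenience, not part of Scalene; it accepts any object with `start()` and `stop()` methods and is meant to be passed `scalene_profiler` when the program is run under Scalene.

```python
from contextlib import contextmanager


@contextmanager
def profiled_region(profiler):
    """Enable profiling only for the enclosed block (hypothetical helper)."""
    profiler.start()
    try:
        yield
    finally:
        profiler.stop()  # always turn profiling back off, even on error


# Intended use, assuming the program was launched under Scalene:
#   from scalene import scalene_profiler
#   with profiled_region(scalene_profiler):
#       expensive_function()
```

Passing the profiler in as an argument keeps the helper importable and testable when Scalene is not installed or the script is run with the plain Python interpreter.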

Miscellaneous:

Note: this release is for macOS and Linux only.
