process_collector: fill in most statistics on macOS #1600

Merged: 3 commits, Sep 4, 2024

mharbison72
Contributor

Unfortunately, the virtual memory, resident memory, and network stats will require access to undocumented C functions. I was warned off of cgo in IRC because it would then have to be enabled in a bunch of different projects that use this module, but I was already against it because that would break the ability to cross-compile. There is no interface to `dlopen` built into Go. The `github.com/ebitengine/purego` module looks promising (I can cross-compile and call these methods), but I'm currently getting unexpected results. I'll follow up with that separately if I can get it working, but hopefully this stuff is pretty uncontroversial.

Tested on macOS 10.14.6 (amd64), macOS 14.6.1 (amd64), and macOS 15.0 (arm64) by spawning `/usr/bin/ulimit -a -S` and `/usr/sbin/lsof -c $my_process` from the test exporter process, and `ps -o lstart,vsize,rss,utime,stime,command` from the shell, and comparing results with the exported metrics.

I can't find documentation for `RLIMIT_AS` on macOS (specifically whether it's in bytes or pages). It's currently being reported back as `RLIM_INFINITY`, which seems reasonable, because I've come across reports that the value is ignored anyway[1]. The bash 3.2 code for the built-in `ulimit` divides the value reported by `getrlimit(2)` by 1024 when printing, as it does for `RLIMIT_DATA`, which is documented as being in bytes in `getrlimit(2)`. The help for `ulimit` indicates it prints both in kbytes, so it's reasonable to assume this value is already in bytes.

[1] https://issues.chromium.org/issues/40581251#comment3

Signed-off-by: Matt Harbison <mharbison72@gmail.com>
@mharbison72
Contributor Author

For context about the memory stats future work, it looks like ps does this:

https://github.com/apple-oss-distributions/adv_cmds/blob/8744084ea0ff41ca4bb96b0f9c22407d0e48e9b7/ps/tasks.c#L109

No idea if the "try to determine if this task has the split libraries mapped in..." extra processing around line 130 is relevant to Go processes. Most blog and SO references I've seen for getting virtual memory counts from task_info(TASK_BASIC_INFO) leave out this extra processing.

Member

@SuperQ SuperQ left a comment

Great, minor nit.

prometheus/process_collector_darwin.go (review thread, outdated, resolved)
Co-authored-by: Ben Kochie <superq@gmail.com>
Signed-off-by: Matt Harbison <57785103+mharbison72@users.noreply.github.com>
Member

@SuperQ SuperQ left a comment

LGTM, will leave the final review up to the maintainers.

@mharbison72
Contributor Author

I guess I forgot to tag a maintainer.

@bwplotka

Member

@bwplotka bwplotka left a comment

Amazing starting point, thanks! Tiny suggestions, but not blocking, up to you. LGTM

prometheus/process_collector_darwin.go (2 review threads, resolved)
@bwplotka bwplotka merged commit a5e1340 into prometheus:main Sep 4, 2024
10 checks passed
@bwplotka
Member

bwplotka commented Sep 4, 2024

Thanks!

mharbison72 added a commit to mharbison72/client_golang that referenced this pull request Sep 17, 2024
prometheus#1600)

Unfortunately, these values aren't available from getrusage(2), or any other
builtin Go API.  Using cgo is one alternative.  It's possible to conditionalize
everything such that cgo can remain disabled on non-Darwin platforms, or even
when cross-compiling Darwin executables on a non-Darwin platform (and stub in
code that causes the metrics to not be exported).  `CGO_ENABLED=1` is the
default for native builds on macOS, but unfortunately cgo is off by default for
the non-host architecture, even when gcc supports cross-compiling (e.g. building
with `GOARCH=amd64` on an M2 Mac skipped the cgo code).  I think that's too
subtle of a distinction to rely on cgo.
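Because the default differs between native and cross builds, it can help to check after the fact how a given binary was built. The sketch below reads the build settings that the Go toolchain embeds in binaries (Go 1.18 and later) via `runtime/debug.ReadBuildInfo`; the `buildSetting` helper is a hypothetical name, and settings may be absent for binaries built without build information.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// buildSetting looks up a key such as "CGO_ENABLED" or "GOARCH" in the build
// settings recorded in the running binary. The second result is false when
// build information or the key is unavailable.
func buildSetting(key string) (string, bool) {
	info, ok := debug.ReadBuildInfo()
	if !ok {
		return "", false
	}
	for _, s := range info.Settings {
		if s.Key == key {
			return s.Value, true
		}
	}
	return "", false
}

func main() {
	for _, key := range []string{"GOOS", "GOARCH", "CGO_ENABLED"} {
		if v, ok := buildSetting(key); ok {
			fmt.Printf("%s=%s\n", key, v)
		} else {
			fmt.Printf("%s not recorded\n", key)
		}
	}
}
```

Running this from a native build and from a cross build on the same machine makes the difference in the cgo default visible without inspecting the build environment.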

Go provides no builtin macOS equivalent of the `syscall.NewLazyDLL()` and
`.NewProc()` functions it offers on Windows, so we're stuck with a 3rd party
dependency.  But it seems stable, maintained, and is getting a fair amount of
usage.  I'm avoiding their struct deserialization because these native structs
are packed differently than the equivalent Go structs, which was causing bad
values to be returned.

The code is heavy with inline comments, and I tried keeping the type names the
same as the C code to make it easier to search for them.  I'm not sure that we
need to do the `mach_vm_region()` call to adjust the `task_info()` values,
because I've never seen that conditional evaluate to true on amd64, on arm64,
or when amd64 is run under Rosetta.  But this is what `ps(1)` does, and I think
it's reasonable to try to match that unless somebody knows it's dead code.

Signed-off-by: Matt Harbison <mharbison72@gmail.com>
mharbison72 added a commit to mharbison72/client_golang that referenced this pull request Sep 24, 2024
prometheus#1600)

Unfortunately, these values aren't available from getrusage(2), or any other
builtin Go API.  Go itself doesn't provide a mechanism (like on Windows) to call
into system libraries.  Using a 3rd party package[1] to dynamically call system
libraries was proposed and rejected, to avoid adding to the number of
dependencies.  That leaves using cgo, which is used here when available.  When
not available (either because of cross compiling or explicitly disabling it), a
stub function is linked instead, and the metrics are not exported.  That way,
cross compiling of other platforms is unaffected (and can also still be done
with Darwin too, but at the cost of not exporting these metrics).

Note that building an amd64 image on an arm64 mac or vice-versa is cross
compiling, and will use the stub method by default.  This can be avoided by
setting `CGO_ENABLED=1` in the environment to force the use of cgo for both
architectures.
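The stub arrangement described above could look roughly like the following. This is a single-file sketch for illustration only: in a real layout the cgo implementation and the stub would presumably live in separate files selected by build constraints such as `//go:build darwin && cgo` and `//go:build darwin && !cgo`. The names `errNotImplemented`, `memoryInfo`, `getMemory`, and `collectMemory` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotImplemented is a hypothetical sentinel returned by the stub build.
var errNotImplemented = errors.New("not implemented")

// memoryInfo holds the two values that need a native call on Darwin.
type memoryInfo struct {
	vsize uint64 // virtual memory size in bytes
	rss   uint64 // resident set size in bytes
}

// getMemory stands in for the stub that is linked when cgo is unavailable.
func getMemory() (*memoryInfo, error) {
	return nil, errNotImplemented
}

// collectMemory returns the metric lines to export. When the provider
// reports errNotImplemented, nothing is exported and no error is raised.
func collectMemory(get func() (*memoryInfo, error)) ([]string, error) {
	info, err := get()
	if errors.Is(err, errNotImplemented) {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}
	return []string{
		fmt.Sprintf("process_virtual_memory_bytes %d", info.vsize),
		fmt.Sprintf("process_resident_memory_bytes %d", info.rss),
	}, nil
}

func main() {
	// Stub build: the memory metrics are skipped.
	lines, err := collectMemory(getMemory)
	fmt.Println("stub build exported", len(lines), "metrics, err:", err)

	// Simulated cgo build: a provider that returns real-looking values.
	lines, err = collectMemory(func() (*memoryInfo, error) {
		return &memoryInfo{vsize: 1 << 30, rss: 1 << 20}, nil
	})
	if err != nil {
		fmt.Println("collect failed:", err)
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

Keeping the "not available" case as a sentinel error rather than a zero value means a stub build omits the metrics instead of exporting misleading zeros.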

I'm unsure of the usefulness of the potential adjustment made to the virtual
memory value after calling `mach_vm_region()`.  I've not seen that code get run
with a native amd64 or arm64 image, or with an amd64 image running under
Rosetta.  But that's what the `ps(1)` command does, and I think we should report
what the system tools do.

When I was testing this on a beta of macOS 15 with Go 1.21.13 (the current
minimum supported version for this module), the amd64 image ran fine under
Rosetta, but the arm64 image immediately printed a message that it was killed,
even prior to the cgo call.  This seems to be a recurring issue on
macOS[2][3], and passing `-ldflags -s` to `go build` avoided the issue.
Go 1.23.1 worked out of the box, without fiddling with linker flags, so I don't
think this is an issue: Go 1.21 is simply too old to support macOS 15, but I
thought it was worth noting.  I suppose we could gate the cgo code with an
additional build flag, if anyone is concerned about this.

[1] https://github.com/ebitengine/purego
[2] golang/go#19841 (comment)
[3] golang/go#11887 (comment)