-
-
Notifications
You must be signed in to change notification settings - Fork 251
PCP: PMDA: Add collection of more metrics to InfiniBand PMDA #2301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Authored-by: Mohith Kumar Thummaluru <mohith.k.kumar.thummaluru@oracle.com> Signed-off-by: Pradyumn Rahar <pradyumn.rahar@oracle.com>
|
@Hannibal404 @mohith-kumar-thummaluru hi! Thanks for contributing these changes. It looks like the existing Infiniband code has no tests :( ... would you be able to add some along with these changes? Looking at the code for the first time just now, one possible approach would be to create "dummy" IB libraries that provide "fake" data to the PMDA allowing all the code paths to be tested on any hardware (esp. for CI etc, without Infiniband, which is relatively exotic). A second, simpler test could also be written that tests for the presence of the required kernel and hardware feature, and does a basic agent Install/Remove cycle. |
|
Looks like the use of <infinibands/verbs.h> is problematic too - not present on some distros? (this is the reason for the CI failures here) I expect configure.ac will need some additional macro(s) in the pmda_infiniband section that can toggle aspects of the build. |
| CFILES = ib.c pmda.c | ||
| HFILES = ibpmda.h | ||
| LLDLIBS = $(IB_LIBS) $(PCP_LIBS) -lpcp_pmda -lpcp | ||
| LLDLIBS = $(IB_LIBS) $(PCP_LIBS) -lpcp_pmda -lpcp -libverbs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Hannibal404 this library should arrive via the IB_LIBS macro (which originates from the configure.ac file), and not hard-coded like this.
|
|
||
| #include <pcp/pmapi.h> | ||
| #include <pcp/libpcp.h> | ||
| #include "libpcp.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm? this should not be necessary - how are you building this?
| /* infiniband.port.in.ucast_packets */ | ||
| { (void *)IBPMDA_RCV_UPKTS, | ||
| {PMDA_PMID(1,METRIC_ib_port_in_upkts), | ||
| PM_TYPE_64, IB_PORT_INDOM, PM_SEM_DISCRETE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The semantics and units look incorrect for this and a number of other metrics - this one should be PM_SEM_COUNTER and the units should be more like those used with the network interface metrics, e.g.
$ pminfo -d network.interface.in.packets
network.interface.in.packets
Data Type: 64-bit unsigned int InDom: 60.3 0xf000003
Semantics: counter Units: count
Implement complete testing framework for the InfiniBand PMDA enabling automated QA testing without physical hardware, following the NVIDIA GPU testing pattern. * Mock InfiniBand libraries (libibumad, libibverbs) with deterministic data * DSO support for InfiniBand PMDA (dual-mode: daemon + DSO) * Sysfs path injection via IB_STATSPATH environment variable * QA test 1996 using mock libraries and sysfs tarball Replaces #2301 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
|
I've addressed all the issues and added some QA. I had to cherry-pick the commit from here to resolve the branch conflicts that had developed, but should be all good to go now. |
|
Thanks! I hadn't gotten a chance to work on it. |
|
@Hannibal404 no problem - thanks for starting the effort & turned out I needed it too. :) |
This commit expands upon the existing InfiniBand PMDA to collect more metrics to enhance diagnostic and monitoring capabilities.
Authored-by: Mohith Kumar Thummaluru mohith.k.kumar.thummaluru@oracle.com