Skip to content

Conversation

@Hannibal404
Copy link
Contributor

This commit expands upon the existing InfiniBand PMDA to collect more metrics to enhance diagnostic and monitoring capabilities.

Authored-by: Mohith Kumar Thummaluru mohith.k.kumar.thummaluru@oracle.com

Authored-by: Mohith Kumar Thummaluru <mohith.k.kumar.thummaluru@oracle.com>

Signed-off-by: Pradyumn Rahar <pradyumn.rahar@oracle.com>
@Hannibal404 Hannibal404 changed the title Add collection of more metrics to InfiniBand PMDA PCP: PMDA: Add collection of more metrics to InfiniBand PMDA Aug 25, 2025
@natoscott
Copy link
Member

@Hannibal404 @mohith-kumar-thummaluru hi! Thanks for contributing these changes. It looks like the existing Infiniband code has no tests :( ... would you be able to add some along with these changes?

Looking at the code for the first time just now, one possible approach would be to create "dummy" IB libraries that provide "fake" data to the PMDA allowing all the code paths to be tested on any hardware (esp. for CI etc, without Infiniband, which is relatively exotic). A second, simpler test could also be written that tests for the presence of the required kernel and hardware feature, and does a basic agent Install/Remove cycle.

@natoscott
Copy link
Member

natoscott commented Aug 25, 2025

Looks like the use of <infinibands/verbs.h> is problematic too - not present on some distros? (this is the reason for the CI failures here)

nathans@debian:~/git/pcp$ ls /usr/include/infiniband/
mad.h      umad_cm.h  umad_sa.h      umad_sm.h   umad_types.h
mad_osd.h  umad.h     umad_sa_mcm.h  umad_str.h
nathans@debian:~/git/pcp$ cat /etc/debian_version 
12.11
nathans@debian:~/git/pcp$ 

I expect configure.ac will need some additional macro(s) in the pmda_infiniband section that can toggle aspects of the build.

CFILES = ib.c pmda.c
HFILES = ibpmda.h
LLDLIBS = $(IB_LIBS) $(PCP_LIBS) -lpcp_pmda -lpcp
LLDLIBS = $(IB_LIBS) $(PCP_LIBS) -lpcp_pmda -lpcp -libverbs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Hannibal404 this library should arrive via the IB_LIBS macro (which originates from the configure.ac file), and not hard-coded like this.


#include <pcp/pmapi.h>
#include <pcp/libpcp.h>
#include "libpcp.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm? this should not be necessary - how are you building this?

/* infiniband.port.in.ucast_packets */
{ (void *)IBPMDA_RCV_UPKTS,
{PMDA_PMID(1,METRIC_ib_port_in_upkts),
PM_TYPE_64, IB_PORT_INDOM, PM_SEM_DISCRETE,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantics and units look incorrect for this and a number of other metrics - this one should be PM_SEM_COUNTER and the units should be more like those used with the network interface metrics, e.g.

$ pminfo -d network.interface.in.packets 

network.interface.in.packets
    Data Type: 64-bit unsigned int  InDom: 60.3 0xf000003
    Semantics: counter  Units: count

natoscott added a commit that referenced this pull request Oct 29, 2025
Implement complete testing framework for the InfiniBand PMDA enabling
automated QA testing without physical hardware, following the NVIDIA
GPU testing pattern.

* Mock InfiniBand libraries (libibumad, libibverbs) with deterministic data
* DSO support for InfiniBand PMDA (dual-mode: daemon + DSO)
* Sysfs path injection via IB_STATSPATH environment variable
* QA test 1996 using mock libraries and sysfs tarball

Replaces #2301

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@natoscott
Copy link
Member

I've addressed all the issues and added some QA. I had to cherry-pick the commit from here to resolve the branch conflicts that had developed, but should be all good to go now.

@natoscott natoscott closed this Oct 30, 2025
@Hannibal404
Copy link
Contributor Author

Thanks! I hadn't gotten a chance to work on it.

@natoscott
Copy link
Member

@Hannibal404 no problem - thanks for starting the effort & turned out I needed it too. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants