Cannot build pcm-sensor-server for macOS #612

Closed
MatteoBax opened this issue Nov 25, 2023 · 31 comments · Fixed by #623

@MatteoBax
Contributor

Hi,
if I try to compile pcm-sensor-server by running the following commands inside the build folder:

cmake ..  && make -j8 pcm-sensor-server

I receive:

make: *** No rule to make target `pcm-sensor-server'.  Stop.

Is pcm-sensor-server supported for macOS?

@opcm
Contributor

opcm commented Nov 26, 2023

No, it is not. Patches welcome.

@gogohaja

I found that building pcm-sensor-server on macOS is excluded in src/CMakeLists.txt. I would also like to build pcm-sensor-server on a Mac.

@opcm
Contributor

opcm commented Dec 5, 2023

@gogohaja @MatteoBax Could you try this branch? https://github.com/intel/pcm/tree/opcm-patch-pcm-sensor-server-osx
If it works, we can merge it into the mainline.

@opcm opcm linked a pull request Dec 5, 2023 that will close this issue
@MatteoBax
Contributor Author

Infinite loop of:

WARNING: Core 0 IA32_PERFEVTSEL0_ADDR is not zeroed 18446744073709551615
Warning: PMU appears to be busy, do you want to reset it? (y/n)

@gogohaja

gogohaja commented Dec 6, 2023

'Segmentation Fault' error occurs due to unknown reasons.

sudo ./pcm-sensor-server -r
password:

===== Processor Information =====
Hybrid processor: No
IBRS and IBPB support: Yes
STIBP Support: Yes
Specifications Arch Cap Support: Yes
Maximum CPUID level: 22
CPU Model Number: 158
Number of physical cores: 1
Number of logical cores: 12
Number of online logical cores: 12
Threads per physical core (logical cores): 8
Number of sockets: 4
Physical cores per socket: 0
Last level cache fragment per socket: 0
Core PMU (perfmon) version: 4
Number of core PMU typical (programmable) counters: 4
Typical (programmable) counter width: 48 bits
Number of core PMU fixed counters: 3
Fixed counter width: 48 bits
Nominal core frequency: 3700000000Hz
Enable IBRS in Kernel: No
Enable STIBP in Kernel: No
The processor is not susceptible to bad data cache loads.
The processor supports enhanced IBRS.
Package Thermal Specifications Power: 95W; Minimum package power: 0 watts; Package maximum power: 0W;

Info: 0 UBOX devices detected.
Socket 0: 0 PCU devices detected. 0 IIO device detected. 0 IRP unit detected. 0 CHA/CBO units detected. 0 MDF units detected. 0 CXL devices detected.
Socket 1: 0 PCU devices detected. 0 IIO device detected. 0 IRP unit detected. 0 CHA/CBO units detected. 0 MDF units detected. 0 CXL devices detected.
Socket 2: 0 PCU devices detected. 0 IIO device detected. 0 IRP unit detected. 0 CHA/CBO units detected. 0 MDF units detected. 0 CXL devices detected.
Socket 3: 0 PCU devices detected. 0 IIO device detected. 0 IRP unit detected. 0 CHA/CBO units detected. 0 MDF units detected. 0 CXL devices detected.

WARNING: Custom counter 0 is in use. MSR_PERF_GLOBAL_INUSE on core 0: 0x800000070000000f
WARNING: Custom counter 1 is in use. MSR_PERF_GLOBAL_INUSE on core 0: 0x800000070000000f
WARNING: Custom counter 2 is in use. MSR_PERF_GLOBAL_INUSE on core 0: 0x800000070000000f
WARNING: Custom counter 3 is in use. MSR_PERF_GLOBAL_INUSE on core 0: 0x800000070000000f
WARNING: Core 0 IA32_PERFEVTSEL0_ADDR is not set to 0. 4399313
    Zeroed PMU registers
Start a regular HTTP server at http://localhost:9738/
[1] 43000 Segmentation Error sudo ./pcm-sensor-server -r
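
The MSR_PERF_GLOBAL_INUSE warnings in this log print a raw bitmask. The following is a minimal sketch of how such a value could be decoded. It assumes the IA32_PERF_GLOBAL_INUSE layout as I understand it from the Intel SDM (bit n: programmable counter n in use, bit 32+i: fixed counter i in use, bit 63: PMI in use). `InUseInfo` and `decodeGlobalInUse` are hypothetical names for illustration and are not part of pcm.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper type for illustration; not part of pcm.
struct InUseInfo {
    std::vector<int> programmable; // indices of programmable counters flagged as in use
    std::vector<int> fixed;        // indices of fixed counters flagged as in use
    bool pmi;                      // bit 63: PMI flagged as in use
};

// Decodes a raw IA32_PERF_GLOBAL_INUSE value.
// Assumed layout: bit n = programmable counter n, bit 32+i = fixed counter i, bit 63 = PMI.
InUseInfo decodeGlobalInUse(std::uint64_t raw, int numProgrammable, int numFixed) {
    assert(numProgrammable >= 0 && numProgrammable <= 32);
    assert(numFixed >= 0 && numFixed <= 31);
    InUseInfo info;
    for (int i = 0; i < numProgrammable; ++i) {
        if ((raw >> i) & 1ULL) info.programmable.push_back(i);
    }
    for (int i = 0; i < numFixed; ++i) {
        if ((raw >> (32 + i)) & 1ULL) info.fixed.push_back(i);
    }
    info.pmi = ((raw >> 63) & 1ULL) != 0;
    return info;
}
```

Under that assumed layout, calling `decodeGlobalInUse(0x800000070000000fULL, 4, 3)` flags all four programmable counters, all three fixed counters, and the PMI bit, which lines up with the counter numbers (4 programmable, 3 fixed) printed earlier in the same log.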

@opcm
Contributor

opcm commented Dec 6, 2023

Infinite loop of:

WARNING: Core 0 IA32_PERFEVTSEL0_ADDR is not zeroed 18446744073709551615
Warning: PMU appears to be busy, do you want to reset it? (y/n)

Did you run into this issue (signing/SIP)? #608 (comment)

@opcm
Contributor

opcm commented Dec 6, 2023

'Segmentation Fault' error occurs due to unknown reasons.

Thanks for testing. Could you please run it in gdb and provide the call stack of the crash?

@MatteoBax
Contributor Author

'Segmentation Fault' error occurs due to unknown reasons.

Me too

@MatteoBax
Contributor Author

Callstack of the crash:

Warning: PMU appears to be busy, do you want to reset it? (y/n)
y
 Zeroed PMU registers
Starting plain HTTP server on http://localhost:9738/
Process 939 stopped
* thread #19, stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
    frame #0: 0x000000010005dc15 pcm-sensor-server`pcm::BasicCounterState::readAndAggregate(std::__1::shared_ptr<pcm::SafeMsrHandle>) + 165
pcm-sensor-server`pcm::BasicCounterState::readAndAggregate:
->  0x10005dc15 <+165>: movq   0x8(%rax), %rax
    0x10005dc19 <+169>: testq  %rax, %rax
    0x10005dc1c <+172>: je     0x10005ead2               ; <+3938>
    0x10005dc22 <+178>: movq   %rsi, %r15
  thread #20, stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
    frame #0: 0x000000010005dc15 pcm-sensor-server`pcm::BasicCounterState::readAndAggregate(std::__1::shared_ptr<pcm::SafeMsrHandle>) + 165
pcm-sensor-server`pcm::BasicCounterState::readAndAggregate:
->  0x10005dc15 <+165>: movq   0x8(%rax), %rax
    0x10005dc19 <+169>: testq  %rax, %rax
    0x10005dc1c <+172>: je     0x10005ead2               ; <+3938>
    0x10005dc22 <+178>: movq   %rsi, %r15
  thread #21, stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
    frame #0: 0x000000010005dc15 pcm-sensor-server`pcm::BasicCounterState::readAndAggregate(std::__1::shared_ptr<pcm::SafeMsrHandle>) + 165
pcm-sensor-server`pcm::BasicCounterState::readAndAggregate:
->  0x10005dc15 <+165>: movq   0x8(%rax), %rax
    0x10005dc19 <+169>: testq  %rax, %rax
    0x10005dc1c <+172>: je     0x10005ead2               ; <+3938>
    0x10005dc22 <+178>: movq   %rsi, %r15
  thread #22, stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
    frame #0: 0x000000010005dc15 pcm-sensor-server`pcm::BasicCounterState::readAndAggregate(std::__1::shared_ptr<pcm::SafeMsrHandle>) + 165
pcm-sensor-server`pcm::BasicCounterState::readAndAggregate:
->  0x10005dc15 <+165>: movq   0x8(%rax), %rax
    0x10005dc19 <+169>: testq  %rax, %rax
    0x10005dc1c <+172>: je     0x10005ead2               ; <+3938>
    0x10005dc22 <+178>: movq   %rsi, %r15
  thread #23, stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
    frame #0: 0x000000010005dc15 pcm-sensor-server`pcm::BasicCounterState::readAndAggregate(std::__1::shared_ptr<pcm::SafeMsrHandle>) + 165
pcm-sensor-server`pcm::BasicCounterState::readAndAggregate:
->  0x10005dc15 <+165>: movq   0x8(%rax), %rax
    0x10005dc19 <+169>: testq  %rax, %rax
    0x10005dc1c <+172>: je     0x10005ead2               ; <+3938>
    0x10005dc22 <+178>: movq   %rsi, %r15
  thread #24, stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
    frame #0: 0x000000010005dc15 pcm-sensor-server`pcm::BasicCounterState::readAndAggregate(std::__1::shared_ptr<pcm::SafeMsrHandle>) + 165
pcm-sensor-server`pcm::BasicCounterState::readAndAggregate:
->  0x10005dc15 <+165>: movq   0x8(%rax), %rax
    0x10005dc19 <+169>: testq  %rax, %rax
    0x10005dc1c <+172>: je     0x10005ead2               ; <+3938>
    0x10005dc22 <+178>: movq   %rsi, %r15
  thread #25, stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
    frame #0: 0x000000010005dc15 pcm-sensor-server`pcm::BasicCounterState::readAndAggregate(std::__1::shared_ptr<pcm::SafeMsrHandle>) + 165
pcm-sensor-server`pcm::BasicCounterState::readAndAggregate:
->  0x10005dc15 <+165>: movq   0x8(%rax), %rax
    0x10005dc19 <+169>: testq  %rax, %rax
    0x10005dc1c <+172>: je     0x10005ead2               ; <+3938>
    0x10005dc22 <+178>: movq   %rsi, %r15
Target 3: (pcm-sensor-server) stopped.

@MatteoBax
Contributor Author

MatteoBax commented Dec 6, 2023

Infinite loop of:

WARNING: Core 0 IA32_PERFEVTSEL0_ADDR is not zeroed 18446744073709551615
Warning: PMU appears to be busy, do you want to reset it? (y/n)

did you run into this issue (signing/sip): #608 (comment)

I turned off SIP and the loop no longer occurs.
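
The value in the looping warning, 18446744073709551615, is 2^64 - 1 (every bit set), while the logs from the runs that got further show 4399313 (0x4320D1) for the same register. The sketch below decodes such a value, assuming the architectural IA32_PERFEVTSELx field layout (event select in bits 7:0, unit mask in bits 15:8, USR bit 16, OS bit 17, enable bit 22). Treating an all-ones value as a sign of a failed MSR read is my interpretation, consistent with the loop disappearing once SIP was turned off, and is not something the thread states. `EventSelect`, `isAllOnes`, and `decodeEventSelect` are hypothetical names, not pcm functions.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper type for illustration; not part of pcm.
struct EventSelect {
    std::uint8_t event;  // bits 7:0, event select
    std::uint8_t umask;  // bits 15:8, unit mask
    bool usr;            // bit 16, count while in user mode
    bool os;             // bit 17, count while in kernel mode
    bool enabled;        // bit 22, counter enabled
};

// True when every bit is set, i.e. the value equals 2^64 - 1.
bool isAllOnes(std::uint64_t raw) {
    return raw == std::numeric_limits<std::uint64_t>::max();
}

// Splits a raw IA32_PERFEVTSELx value into a few of its fields.
// Callers are expected to reject all-ones values first.
EventSelect decodeEventSelect(std::uint64_t raw) {
    assert(!isAllOnes(raw));
    EventSelect sel;
    sel.event = static_cast<std::uint8_t>(raw & 0xFF);
    sel.umask = static_cast<std::uint8_t>((raw >> 8) & 0xFF);
    sel.usr = ((raw >> 16) & 1ULL) != 0;
    sel.os = ((raw >> 17) & 1ULL) != 0;
    sel.enabled = ((raw >> 22) & 1ULL) != 0;
    return sel;
}
```

With these assumptions, 4399313 decodes to event 0xD1 with unit mask 0x20, counting in both user and kernel mode with the counter enabled, which looks like a leftover programmed event rather than a read error.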

@opcm
Contributor

opcm commented Dec 6, 2023

Callstack of the crash:

Thanks. Could you please type "bt" to see the full call stack of the crashing thread (with all frames)?

@MatteoBax
Contributor Author

MatteoBax commented Dec 6, 2023

Thanks. Could you please type "bt" to see the full call stack of the crashing thread (with all frames)?

Full call stack of the crashing thread:

* thread #19, stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
  * frame #0: 0x000000010005dc15 pcm-sensor-server`pcm::BasicCounterState::readAndAggregate(std::__1::shared_ptr<pcm::SafeMsrHandle>) + 165
    frame #1: 0x0000000100096e18 pcm-sensor-server`pcm::Aggregator::dispatch(pcm::HyperThread*)::'lambda'(pcm::HyperThread*)::operator()(pcm::HyperThread*) const + 520
    frame #2: 0x0000000100096c06 pcm-sensor-server`std::__1::__packaged_task_func<std::__1::__bind<pcm::Aggregator::dispatch(pcm::HyperThread*)::'lambda'(pcm::HyperThread*)&, pcm::HyperThread*&>, std::__1::allocator<std::__1::__bind<pcm::Aggregator::dispatch(pcm::HyperThread*)::'lambda'(pcm::HyperThread*)&, pcm::HyperThread*&>>, pcm::CoreCounterState ()>::operator()() + 22
    frame #3: 0x0000000100097144 pcm-sensor-server`std::__1::packaged_task<pcm::CoreCounterState ()>::operator()() + 100
    frame #4: 0x0000000100094479 pcm-sensor-server`pcm::ThreadPool::execute(pcm::ThreadPool*) + 41
    frame #5: 0x00000001000381b0 pcm-sensor-server`void* std::__1::__thread_proxy[abi:v160006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, std::__1::__bind<void (*)(pcm::ThreadPool*), pcm::ThreadPool*>>>(void*) + 48
    frame #6: 0x00007ff8120e9202 libsystem_pthread.dylib`_pthread_start + 99
    frame #7: 0x00007ff8120e4bab libsystem_pthread.dylib`thread_start + 15

@ogbrugge
Contributor

ogbrugge commented Dec 6, 2023

That is very clearly a null pointer access... Not sure how that is possible with all the shared_ptrs, will need to look into that.
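
For context on why the trace points to a null pointer: the faulting instruction `movq 0x8(%rax), %rax` with fault address 0x8 is what a load from offset 8 of a null base pointer looks like. A `std::shared_ptr` does not rule this out, because an empty `shared_ptr` holds `nullptr` and dereferencing it is undefined behavior. Below is a minimal sketch of a guarded read. `FakeMsrHandle` is a hypothetical stand-in (the real layout of `pcm::SafeMsrHandle` is not shown in this thread), and `readValue` is likewise an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for a handle object; not the real pcm::SafeMsrHandle.
struct FakeMsrHandle {
    std::uint64_t id;     // offset 0
    std::uint64_t value;  // offset 8, mirroring the 0x8 displacement in the trace
};

static_assert(offsetof(FakeMsrHandle, value) == 8, "value is expected at offset 8");

// Reads the value through the handle. Returns false instead of dereferencing
// an empty shared_ptr, which would otherwise fault at address 0x8.
bool readValue(const std::shared_ptr<FakeMsrHandle>& handle, std::uint64_t& out) {
    if (!handle) {
        return false;
    }
    out = handle->value;
    return true;
}
```

A guard like this only turns the crash into a detectable error. It does not explain why the pointer was empty in the first place, which is the question the rest of the thread pursues.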

@ogbrugge
Contributor

Would it be possible to run pcm-sensor-server with all debug output enabled? The debug level is a documented command-line switch (see --help). Please set it to 5, redirect the output to a file, and attach the file here.

@MatteoBax
Contributor Author

Would it be possible to run pcm-sensor-server with all debug output enabled? It's a documented command line switch (--help), set it to 5, redirect the output to a file and attach it here please?

debug.log

@ogbrugge
Contributor

Is this the full log before the errors come and that is it? Oh my...

@MatteoBax
Contributor Author

Is this the full log before the errors come and that is it? Oh my...

This is the entire log from the sudo ./pcm-sensor-server -D 5 -r command

@ogbrugge
Contributor

Thanks. This means the problem happens very early, if not immediately after startup. @opcm, I'm not sure what the cause is, but it could be related to something not being properly initialized. What fix did you make for the other macOS problem?

@opcm
Contributor

opcm commented Dec 13, 2023

Thanks, this means the problem happens quite soon, if not immediately after startup, @opcm, I'm not sure what the cause is but this could be related to things not being properly initialized. What fix did you make for the other MacOSX problem?

The other problem, as far as I remember, did not require any fix in pcm: #608 (comment)

@opcm
Contributor

opcm commented Dec 13, 2023

I believe there is an issue with the identification of the CPU topology. @MatteoBax, would it be possible to run pcm as root with the environment variable PCM_PRINT_TOPOLOGY=1 set? (Note: https://unix.stackexchange.com/questions/202383/how-to-pass-environment-variable-to-sudo-su)

@MatteoBax
Contributor Author

I believe there is an issue with identification of CPU topology. @MatteoBax , would it be possible to run as root and set this environment variable: PCM_PRINT_TOPOLOGY=1 and run pcm? (Note: https://unix.stackexchange.com/questions/202383/how-to-pass-environment-variable-to-sudo-su)

@opcm you are right, there was a problem with identifying the CPU topology:

=====  Processor topology  =====
OS_Processor    Thread_Id       Core_Id         Tile_Id         Package_Id      Core_Type   Native_CPU_Model
0               0               0               0               0               unknown         0               
1               0               0               0               0               unknown         0               
2               0               1               0               0               unknown         0               
3               0               1               0               0               unknown         0               
4               0               2               0               0               unknown         0               
5               0               2               0               0               unknown         0               
6               0               3               0               0               unknown         0               
7               0               3               0               0               unknown         0               
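
One inconsistency that can be read directly off this table: OS processors 0 and 1 both report Thread_Id 0 on Core_Id 0 of Package_Id 0, and the same holds for every other pair. The sketch below checks a topology for that kind of collision. The thread does not say whether this exact collision caused the crash, so treat it as an illustration of a sanity check, not as the fix that was applied. `TopologyEntry` and `hasUniqueThreadSlots` are hypothetical names, not pcm types.

```cpp
#include <cassert>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical row of the topology table; not a pcm type.
struct TopologyEntry {
    int osId;      // OS_Processor
    int threadId;  // Thread_Id
    int coreId;    // Core_Id
    int packageId; // Package_Id
};

// Returns true when no two logical processors share the same
// (package, core, thread) triple.
bool hasUniqueThreadSlots(const std::vector<TopologyEntry>& entries) {
    std::set<std::tuple<int, int, int>> seen;
    for (const TopologyEntry& e : entries) {
        const bool inserted =
            seen.insert(std::make_tuple(e.packageId, e.coreId, e.threadId)).second;
        if (!inserted) {
            return false; // two logical processors claim the same hardware thread slot
        }
    }
    return true;
}
```

Fed with the eight rows above, the check fails on the second row already, because processor 1 repeats the (0, 0, 0) triple of processor 0.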

@opcm
Contributor

opcm commented Dec 14, 2023

@MatteoBax the https://github.com/intel/pcm/tree/opcm-patch-pcm-sensor-server-osx branch has been updated with the new topology code for OSX. Could you please

  1. download the new version from https://github.com/intel/pcm/tree/opcm-patch-pcm-sensor-server-osx
  2. rebuild it (both user space and the MacMsr driver)
  3. load the new MacMsr driver version
  4. run the new version of pcm-sensor-server with the PCM_PRINT_TOPOLOGY=1 variable

Does the new version crash? Please share the complete output from pcm-sensor-server with all warning and information messages.

@MatteoBax
Contributor Author

MatteoBax commented Dec 14, 2023

It crashes anyway:

=====  Processor information  =====
Hybrid processor         : no
IBRS and IBPB supported  : yes
STIBP supported          : yes
Spec arch caps supported : yes
Max CPUID level          : 22
CPU model number         : 142
Number of physical cores: 4
Number of logical cores: 8
Number of online logical cores: 8
Threads (logical cores) per physical core: 2
Num sockets: 1
Physical cores per socket: 4
Last level cache slices per socket: 4
Core PMU (perfmon) version: 4
Number of core PMU generic (programmable) counters: 4
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 3
Width of fixed counters: 48 bits
Nominal core frequency: 2100000000 Hz
IBRS enabled in the kernel   : no
STIBP enabled in the kernel  : no
The processor is not susceptible to Rogue Data Cache Load: yes
The processor supports enhanced IBRS                     : yes

=====  Processor topology  =====
OS_Processor    Thread_Id       Core_Id         Module_Id       Tile_Id         Die_Id          Die_Group_Id    Package_Id      Core_Type       Native_CPU_Model
0               0               0               0               0               0               0               0               unknown         0               
1               1               0               0               0               0               0               0               unknown         0               
2               0               1               0               1               0               0               0               unknown         0               
3               1               1               0               1               0               0               0               unknown         0               
4               0               2               0               2               0               0               0               unknown         0               
5               1               2               0               2               0               0               0               unknown         0               
6               0               3               0               3               0               0               0               unknown         0               
7               1               3               0               3               0               0               0               unknown         0               
=====  Placement on packages  =====
Package Id.    Core Id.     Processors
0              0,1,2,3

=====  Core/Tile sharing  =====
Level      Processors
Core       (0,1)(2,3)(4,5)(6,7)
Tile / L2$ (0,1)(2,3)(4,5)(6,7)

Package thermal spec power: 15 Watt; Package minimum power: 0 Watt; Package maximum power: 0 Watt;

Info: 0 UBOX units detected.
Socket 0: 0 PCU units detected. 0 IIO units detected. 0 IRP units detected. 0 CHA/CBO units detected. 0 MDF units detected. 0 CXL units detected.

WARNING: Custom counter 0 is in use. MSR_PERF_GLOBAL_INUSE on core 0: 0x70000000f
WARNING: Custom counter 1 is in use. MSR_PERF_GLOBAL_INUSE on core 0: 0x70000000f
WARNING: Custom counter 2 is in use. MSR_PERF_GLOBAL_INUSE on core 0: 0x70000000f
WARNING: Custom counter 3 is in use. MSR_PERF_GLOBAL_INUSE on core 0: 0x70000000f
WARNING: Core 0 IA32_PERFEVTSEL0_ADDR is not zeroed 4399313
 Zeroed PMU registers
Starting plain HTTP server on http://localhost:9738/

@opcm
Contributor

opcm commented Dec 17, 2023

Thank you for testing, @MatteoBax. I found an issue that should be directly related to the crash and pushed a fix to the https://github.com/intel/pcm/tree/opcm-patch-pcm-sensor-server-osx branch. Could you please download it again and test?

@MatteoBax
Contributor Author

MatteoBax commented Dec 17, 2023

thank you for testing @MatteoBax I found an issue which should directly relate to the crash. I pushed a fix into https://github.com/intel/pcm/tree/opcm-patch-pcm-sensor-server-osx branch. Could you please download it again and test?

It works! A thousand thanks @opcm.

The only problem I have is that when I run the /usr/local/sbin/pcm utility I get this:

dyld[3356]: Library not loaded: @rpath/libPcmMsr.dylib
  Referenced from: <E938B611-4170-3DD4-81F1-B72C7CEB5EF4> /usr/local/sbin/pcm
  Reason: no LC_RPATH's found

I've had this problem before. Should I open another issue?

@ogbrugge
Contributor

Woohoo!! Glad @opcm found the issue!

@opcm
Contributor

opcm commented Dec 18, 2023

thank you for testing @MatteoBax I found an issue which should directly relate to the crash. I pushed a fix into https://github.com/intel/pcm/tree/opcm-patch-pcm-sensor-server-osx branch. Could you please download it again and test?

It works! A thousand thanks @opcm.

Thank you for testing.

The only problem I have is that when I run /usr/local/sbin/pcm utility I get this:

dyld[3356]: Library not loaded: @rpath/libPcmMsr.dylib
  Referenced from: <E938B611-4170-3DD4-81F1-B72C7CEB5EF4> /usr/local/sbin/pcm
  Reason: no LC_RPATH's found

I've had this problem before. Should I open another issue?

Do you remember how you resolved that issue?
Does copying libPcmMsr.dylib as described in

2) copy build/lib/libPcmMsr.dylib to a location on your path (auto-install uses /usr/lib)

help?

You might also want to try setting the DYLD_LIBRARY_PATH environment variable to point to the directory containing libPcmMsr.dylib:
https://stackoverflow.com/questions/3146274/is-it-ok-to-use-dyld-library-path-on-mac-os-x-and-whats-the-dynamic-library-s

@MatteoBax
Contributor Author

MatteoBax commented Dec 18, 2023

If I run pcm from /usr/local/sbin the error is generated, while if I run it from pcm/build/bin the error is not generated.
If I point the DYLD_LIBRARY_PATH environment variable to /usr/local/lib/, this problem does not occur.
This is a problem with the pcm executable and not with pcm-sensor-server.

@opcm
Contributor

opcm commented Dec 18, 2023

"point the DYLD_LIBRARY_PATH environment variable to /usr/local/lib/" looks like a solution. Thank you. Perhaps there should be a pull request (MAC HOW TO) documenting it

@MatteoBax
Contributor Author

MatteoBax commented Dec 18, 2023

"point the DYLD_LIBRARY_PATH environment variable to /usr/local/lib/" looks like a solution. Thank you. Perhaps there should be a pull request (MAC HOW TO) documenting it

Isn't it possible to specify the path that the DYLD_LIBRARY_PATH environment variable points to during building?

Regarding my previous statement, I stand corrected. All pcm executables fail to find the dynamic library when run from the /usr/local/sbin directory (i.e. the directory they are installed in).

@opcm
Contributor

opcm commented Dec 19, 2023

"point the DYLD_LIBRARY_PATH environment variable to /usr/local/lib/" looks like a solution. Thank you. Perhaps there should be a pull request (MAC HOW TO) documenting it

Isn't it possible to specify the path that the DYLD_LIBRARY_PATH environment variable points to during building?

I need to research whether and how that is possible.

Regarding my previous statement, I stand corrected. All pcm executables fail to find the dynamic library when run from the /usr/local/sbin directory (i.e. the directory they are installed in).

Good to know.

Please open a new issue.

@rdementi rdementi linked a pull request Dec 19, 2023 that will close this issue