plfit fails some tests on the s390x, powerpc, and ppc64 architectures #38

jgmbenoit · 2021-12-19T13:57:06Z

see the Build status of the plfit Debian package for more details.

szhorvat · 2021-12-27T20:55:19Z

Travis supports these. Could set up a Travis-based CI to debug.

ntamas · 2022-01-04T10:17:22Z

I have added a .travis.yml config file but apparently Travis is not willing to recognize my repo. All my other repos are visible in Travis but not this one. @szhorvat have you seen anything like this before?

Edit: nevermind, got it, apparently I had to "migrate" the repo to travis-ci.com first.

ntamas · 2022-01-04T11:37:06Z

The build fails on s390x with Travis, but apparently only in release mode.

szhorvat · 2022-01-04T11:39:33Z

When this issue was opened, I attempted to compile it for x87, but even with SSE disabled, I get an error that this is not possible. If that's easily fixable, it might be worth a try as it would make debugging simpler.

ntamas · 2022-01-04T12:28:12Z

I can't test this easily for x87 as I am on Apple Silicon and -mfpmath=387 fails there; gcc 11 from Homebrew says unrecognized command-line option '-mfpmath=387' whle Clang returns compiler errors like SSE register return with SSE disabled. What error messages did you see when compiling for x87?

szhorvat · 2022-01-06T17:32:17Z

This was my mistake. It doesn't work with Clang on macOS, but it does work with GCC. Of course this is Intel-specific so it won't work on your M1 machine.

The test suite passes.

For reference, I used:

CC=gcc-mp-11 cmake .. -DPLFIT_USE_SSE=OFF -DCMAKE_C_FLAGS=-mfpmath=387

where gcc-mp-11 is GCC 11.2.0 from MacPorts.

ntamas · 2022-01-06T21:03:12Z

The ppc64le build seems to work but I just realized that this is probably identical to ppc64el on Debian, which also passes. Travis does not offer the "pure" ppc64 or powerpc architectures. Still, there is a genuine failure on s390x so that's what we should focus on, hoping that solving it will also solve the problem on the other two platforms.

ntamas · 2022-01-06T21:37:29Z

@jgmbenoit Something weird is going on with the hsl_sf_hzeta() function on s390x. Take a look at this recent test log from Travis. I have added debug output (search for DEBUG:) that prints the components of the formula that calculates the log-likelihood of a discrete test data set. On my machine the correct output is:

DEBUG: xmin = 2.0000, alpha = 2.5800, result = 6010.4751, m = 5457
DEBUG: hsl_sf_lnhzeta(alpha, xmin) = -1.1639

The output from the test log on s390x is:

DEBUG: xmin = 2.0000, alpha = 2.5800, result = 6010.4751, m = 5457
DEBUG: hsl_sf_lnhzeta(alpha, xmin) = -0.2244

Note that the difference between the expected and the observed log-likelihood in the test is m * (-1.1639 + 0.2244) so the math checks out; hsl_sf_lnhzeta() is returning an invalid value for alpha=2.58 and xmin=2. However, the same alpha and xmin tested in isolation in a newly added test_hzeta.c file yields the correct result on s390x as well!

Do you have any ideas about what may be going on here? Or, alternatively, do you have a detailed test suite for hsl_sf_lnhzeta() that I could plug into the plfit tests to get more insights?

jgmbenoit · 2022-01-07T08:03:37Z

I have indeed a test suite for the zeta related functions. I confess that I only run them on my amd64 boxes. The tests are against values obtained with Maple. The tests fit well in my HSL library source (my own Home Scientific Library based on GSL). I can adapt it to plfit source, but I need some time to make this. I will try to submit a patch by the end of the coming week-end.

My first guess is that one of the used native C math functions does not work as expected on this architecture. If so, my fist move will be to check how this is managed in GSL (upstream and at packaging level).

Note that, as Debian Developer, I have actually access to these architectures. I will try to have a look and submit a fix by the end of this week-end. This sounds as a good challenge.

ntamas · 2022-01-07T09:48:34Z

Thanks a lot for your help! Looking forward to the patched version.

ntamas · 2022-01-07T09:51:05Z

I have reverted master to the pre-debugging state and branched off into a debug/hzeta branch, so if you want to investigate the debug statements yourself, please check out debug/hzeta.

jgmbenoit · 2022-01-08T20:00:33Z

The numerical tests for zeta functions works. However the issue persists.
I could trace back the issue up to the inline function hsl_sf_hZeta0_zed which play with long double.
Commenting the inline does not solve the issue. It appears that the long double math functions gives totally foolish values. Any hints ?

szhorvat · 2022-01-08T21:42:31Z

long double is not available on every platform, is it? Or if it is, it may not work the same way. On Intel, it is 80 bits.

I would avoid relying on long double when possible ...

jgmbenoit · 2022-01-09T21:03:39Z

My understanding is that gcc ruins some tricky numeric codes by optimizing them: gcc is too smart or not enough. These codes involve long double numbers and some arithmetic tricks to keep last digits. The initial formula involves cancellations that are hard to manage when we compute in fixed precision. I also agree that long double computation must be avoided. So this forces me to revisit this key part of the computation: this is a numerical issue not a coding issue. This comeback on the subject was profitable since I can see now how to transform the formula into a more sympathetic form. I think now that I can manage better the cancellations and I can stay with double numbers. I working on it, and it may take some time. Thanks to exotic architectures for helping us for better programming :)

plugwash · 2022-01-10T16:07:53Z

long double is not available on every platform, is it?

long double exists on every platform, but it's characteristics are all over the place. It might be the same as double, it might be Intel x87 80 bit extended precision, it might be powerpc "double double" or it might be IEEE quad precision. It may also be implemented in software even when regular floating point is implemented in hardware. So it really doesn't make sense to use in portable code.

xypron · 2022-02-25T13:40:56Z

gcc on s390x has build flags -mlong-double-64 and -mlong-double-128. When building with -mlong-double-64 the unit tests are passed. With -mlong-double-128:

4: Test command: /<<PKGBUILDDIR>>/_BUILD_SHARED/test/test_sampling
4: Test timeout computed to be: 10000000
1: ./test/test_discrete.c:39 : expected -9155.62809000, got -14282.71356713, difference: -5127.09
1/8 Test #1: test_discrete ....................***Failed    0.00 sec
./test/test_discrete.c:39 : expected -9155.62809000, got -14282.71356713, difference: -5127.09

test 5
    Start 5: test_underflow_handling

5: Test command: /<<PKGBUILDDIR>>/_BUILD_SHARED/test/test_underflow_handling
5: Test timeout computed to be: 10000000
3: ./test/test_real.c:41 : expected -245.14869000, got -313.14725000, difference: -67.9986
2/8 Test #3: test_real ........................***Failed    0.00 sec
./test/test_real.c:41 : expected -245.14869000, got -313.14725000, difference: -67.9986

Either the unit test for log-likelihood should be removed or long double should not be used in the package.

jgmbenoit · 2022-02-25T14:37:03Z

Either the unit test for log-likelihood should be removed or long double should not be used in the package.

The second option is the one to choose: long double should not be used in the package .
And I am working on it.

jgmbenoit · 2023-04-11T07:32:01Z

This comeback on the subject was profitable since I can see now how to transform the formula into a more sympathetic form. I think now that I can manage better the cancellations and I can stay with double numbers.

With the help of my favourite Computer Algebra System I could finally isolate and manage properly the faulty cancellations. I will now focus on the implementation of the analytical result. This is the easy part.

ntamas · 2023-04-11T08:21:35Z

Thanks, looking forward to this!

sanvila · 2024-08-03T15:46:17Z

Hello. This is to note that the issue was apparently fixed with this commit, which is already in version 0.9.5:

0559e46

In fact, I logged to a s390x machine running Debian 12 (bookworm), checked that version 0.9.4 failed to build (because of the tests), then applied the above patch over version 0.9.4, and then the tests started to pass.

For this reason, when we (Debian) packaged version 0.9.5, the package started to build ok without doing anything else:

https://buildd.debian.org/status/logs.php?pkg=plfit&arch=s390x
https://buildd.debian.org/status/logs.php?pkg=plfit&arch=powerpc
https://buildd.debian.org/status/logs.php?pkg=plfit&arch=ppc64

Thanks.

szhorvat · 2024-08-03T16:01:17Z

Thanks for the feedback! This is very valuable since we can't easily test on these platforms.

Getting rid of long double would still be preferable for the reasons stated above.

jgmbenoit · 2024-08-03T20:29:44Z

Getting rid of long double would still be preferable for the reasons stated above.

This is absolutely true. The issue is a numeric issue. I am still working on it. Since my last message here, I have done some progress. It has been far harder than I thought, but I went somewhere. I am now trying to articulate the analysis. Thanks for your patience.

szhorvat · 2024-11-16T22:39:38Z

@jgmbenoit No pressure, just checking if there's any update on this, since a new release may be imminent.

jgmbenoit · 2024-11-16T23:02:40Z

Thanks for asking. Unfortunately I have no update. I just need some peaceful time before, what for me it hard to get right now.

barracuda156 · 2024-12-03T01:56:59Z

At least now on powerpc everything passes:

--->  Testing plfit
Executing:  cd "/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_math_plfit/plfit/work/build" && /usr/bin/make test 
Running tests...
/opt/local/bin/ctest --force-new-ctest-process 
Test project /opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_math_plfit/plfit/work/build
    Start 1: test_discrete
1/9 Test #1: test_discrete ....................   Passed    0.19 sec
    Start 2: test_continuous
2/9 Test #2: test_continuous ..................   Passed    0.46 sec
    Start 3: test_real
3/9 Test #3: test_real ........................   Passed    0.10 sec
    Start 4: test_sampling
4/9 Test #4: test_sampling ....................   Passed    0.07 sec
    Start 5: test_underflow_handling
5/9 Test #5: test_underflow_handling ..........   Passed    0.02 sec
    Start 6: test_xmin_too_low
6/9 Test #6: test_xmin_too_low ................   Passed    0.01 sec
    Start 7: test_hzeta
7/9 Test #7: test_hzeta .......................   Passed    0.01 sec
    Start 8: test_kolmogorov
8/9 Test #8: test_kolmogorov ..................   Passed    0.01 sec
    Start 9: test_gss
9/9 Test #9: test_gss .........................   Passed    0.02 sec

100% tests passed, 0 tests failed out of 9

Total Test time (real) =   0.92 sec

36-92% port -v installed plfit | grep active
  plfit @1.0.0_0+openmp (active) requested_variants='' platform='darwin 10' archs='ppc' date='2024-12-03T09:56:34+0800'

ntamas · 2024-12-03T09:14:08Z

Closing this as it seems like the issue has magically resolved itself. Feel free to reopen if it starts to fail again.

xypron · 2024-12-03T10:02:00Z

Just for reference: Debian and Ubuntu added the following to resolve the issue on s390x:

ifneq (,$(filter s390x,$(DEB_HOST_ARCH)))
export DEB_CFLAGS_APPEND = -mlong-double-64
endif

barracuda156 · 2024-12-03T10:05:52Z

Just for reference: Debian and Ubuntu added the following to resolve the issue on s390x:
ifneq (,$(filter s390x,$(DEB_HOST_ARCH)))
export DEB_CFLAGS_APPEND = -mlong-double-64
endif

@xypron I do not know what s390x uses and got no hardware either. PowerPC uses IBM long double (composed of two doubles), that seems to work as-is.

szhorvat · 2024-12-03T11:10:45Z

Let's leave it to @jgmbenoit to decide, since he's been working on a fix for a long time and has the best understanding. I think long double still has the potential to cause trouble.

jgmbenoit · 2024-12-03T11:38:02Z

Just for reference: Debian and Ubuntu added the following to resolve the issue on s390x:
ifneq (,$(filter s390x,$(DEB_HOST_ARCH)))
export DEB_CFLAGS_APPEND = -mlong-double-64
endif

At the time of writing, this snippet is not present in the Debian rules file of the pffit package. However, it is added in the Ubuntu version indeed.

jgmbenoit · 2024-12-03T11:45:06Z

Let's leave it to @jgmbenoit to decide, since he's been working on a fix for a long time and has the best understanding. I think long double still has the potential to cause trouble.

I agree that long double is still a cause of trouble. My comment in the Debian report 1003395 is still pertinent. I would keep the issue open and tag its severity as import (as in the Debian at the time of writing)

szhorvat mentioned this issue Sep 8, 2022

test_igraph_power_law_fit fails on arm64 in Fedora igraph/igraph#2189

Closed

ntamas closed this as completed Dec 3, 2024

ntamas reopened this Dec 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

plfit fails some tests on the s390x, powerpc, and ppc64 architectures #38

plfit fails some tests on the s390x, powerpc, and ppc64 architectures #38

jgmbenoit commented Dec 19, 2021

szhorvat commented Dec 27, 2021

ntamas commented Jan 4, 2022 •

edited

Loading

ntamas commented Jan 4, 2022

szhorvat commented Jan 4, 2022

ntamas commented Jan 4, 2022

szhorvat commented Jan 6, 2022

ntamas commented Jan 6, 2022

ntamas commented Jan 6, 2022 •

edited

Loading

jgmbenoit commented Jan 7, 2022 •

edited

Loading

ntamas commented Jan 7, 2022

ntamas commented Jan 7, 2022

jgmbenoit commented Jan 8, 2022 •

edited

Loading

szhorvat commented Jan 8, 2022

jgmbenoit commented Jan 9, 2022 •

edited

Loading

plugwash commented Jan 10, 2022

xypron commented Feb 25, 2022

jgmbenoit commented Feb 25, 2022

jgmbenoit commented Apr 11, 2023

ntamas commented Apr 11, 2023

sanvila commented Aug 3, 2024 •

edited

Loading

szhorvat commented Aug 3, 2024

jgmbenoit commented Aug 3, 2024 •

edited

Loading

szhorvat commented Nov 16, 2024

jgmbenoit commented Nov 16, 2024

barracuda156 commented Dec 3, 2024

ntamas commented Dec 3, 2024

xypron commented Dec 3, 2024

barracuda156 commented Dec 3, 2024

szhorvat commented Dec 3, 2024

jgmbenoit commented Dec 3, 2024

jgmbenoit commented Dec 3, 2024

plfit fails some tests on the s390x, powerpc, and ppc64 architectures #38

plfit fails some tests on the s390x, powerpc, and ppc64 architectures #38

Comments

jgmbenoit commented Dec 19, 2021

szhorvat commented Dec 27, 2021

ntamas commented Jan 4, 2022 • edited Loading

ntamas commented Jan 4, 2022

szhorvat commented Jan 4, 2022

ntamas commented Jan 4, 2022

szhorvat commented Jan 6, 2022

ntamas commented Jan 6, 2022

ntamas commented Jan 6, 2022 • edited Loading

jgmbenoit commented Jan 7, 2022 • edited Loading

ntamas commented Jan 7, 2022

ntamas commented Jan 7, 2022

jgmbenoit commented Jan 8, 2022 • edited Loading

szhorvat commented Jan 8, 2022

jgmbenoit commented Jan 9, 2022 • edited Loading

plugwash commented Jan 10, 2022

xypron commented Feb 25, 2022

jgmbenoit commented Feb 25, 2022

jgmbenoit commented Apr 11, 2023

ntamas commented Apr 11, 2023

sanvila commented Aug 3, 2024 • edited Loading

szhorvat commented Aug 3, 2024

jgmbenoit commented Aug 3, 2024 • edited Loading

szhorvat commented Nov 16, 2024

jgmbenoit commented Nov 16, 2024

barracuda156 commented Dec 3, 2024

ntamas commented Dec 3, 2024

xypron commented Dec 3, 2024

barracuda156 commented Dec 3, 2024

szhorvat commented Dec 3, 2024

jgmbenoit commented Dec 3, 2024

jgmbenoit commented Dec 3, 2024

ntamas commented Jan 4, 2022 •

edited

Loading

ntamas commented Jan 6, 2022 •

edited

Loading

jgmbenoit commented Jan 7, 2022 •

edited

Loading

jgmbenoit commented Jan 8, 2022 •

edited

Loading

jgmbenoit commented Jan 9, 2022 •

edited

Loading

sanvila commented Aug 3, 2024 •

edited

Loading

jgmbenoit commented Aug 3, 2024 •

edited

Loading