
Conversation

khwilliamson
Contributor

@khwilliamson khwilliamson commented Aug 24, 2025

proto.h contains a generated PERL_ARGS_ASSERT macro for every function. It asserts that each parameter that isn't allowed to be NULL actually isn't (except that there are no asserts for the aTHX parameter).

These asserts are disabled when not DEBUGGING. But many compilers allow a compile-time assertion to be made for this situation, so we can add an extra measure of protection for free. And this gives hints to the compiler for optimizations when the asserts() aren't there.

In addition it adds a compile time assertion for the aTHX parameter.

  • This set of changes does not require a perldelta entry.

@bulk88
Contributor

bulk88 commented Aug 25, 2025

Doing this showed a small issue

util.c: In function ‘void Perl_set_context(void*)’:
perl.h:6412:26: warning: ‘nonnull’ argument ‘t’ compared to NULL [-Wnonnull-compare]
 6412 |             STMT_START { if (i) PERL_SET_LOCALE_CONTEXT(i); } STMT_END
      |                          ^~
util.c:3665:5: note: in expansion of macro ‘PERL_SET_NON_tTHX_CONTEXT’
 3665 |     PERL_SET_NON_tTHX_CONTEXT((PerlInterpreter *) t);

in which the compiler now catches that t can't be NULL. I don't know the best way to resolve this.

Have you tried PTR2nat() or the macro NUM2PTR(size_t,ptr)?

#  define PERL_SET_CONTEXT(t)                                               \
    STMT_START {                                                            \
        int _eC_;                                                           \
        if ((_eC_ = pthread_setspecific(PL_thr_key,                         \
                                        PL_current_context = (void *)(t)))) \
            Perl_croak_nocontext("panic: pthread_setspecific (%d) [%s:%d]", \
                                 _eC_, __FILE__, __LINE__);                 \
        PERL_SET_NON_tTHX_CONTEXT(t);                                       \
    } STMT_END
    /* In some Configurations there may be per-thread information that is
     * carried in a library instead of perl's tTHX structure.  This macro is to
     * be used to handle those when tTHX is changed.  Only locale handling is
     * currently known to be affected. */
#  define PERL_SET_NON_tTHX_CONTEXT(i)                                      \
            STMT_START { if (i) PERL_SET_LOCALE_CONTEXT(i); } STMT_END

This looks like a bug in the internals of POSIX Perl's PERL_SET_CONTEXT(t), specifically in the macro PERL_SET_NON_tTHX_CONTEXT(i).
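To make the warning concrete, here is a minimal sketch of the same shape of code outside perl. All names (`SET_EXTRA_CONTEXT`, `SET_EXTRA_CONTEXT_CHECKED`, `set_context`, `last_context`) are hypothetical stand-ins, not perl's macros; the point is only that a general-purpose macro with an `if (i)` guard, expanded inside a function whose parameter is declared nonnull, is the pattern gcc reports under -Wnonnull-compare.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-thread hook such as
 * PERL_SET_LOCALE_CONTEXT(): it just records the pointer. */
static void *last_context = NULL;
#define SET_EXTRA_CONTEXT(i)  (last_context = (void *)(i))

/* General-purpose wrapper that tolerates NULL, like the "if (i)" guard
 * in PERL_SET_NON_tTHX_CONTEXT(). */
#define SET_EXTRA_CONTEXT_CHECKED(i) \
    do { if (i) SET_EXTRA_CONTEXT(i); } while (0)

/* Declaring the parameter nonnull promises that callers never pass NULL. */
#if defined(__GNUC__)
__attribute__((nonnull(1)))
#endif
void set_context(void *t);

void set_context(void *t)
{
    /* The guard hidden in the macro now compares a nonnull parameter
     * against NULL, which gcc can flag with -Wnonnull-compare. */
    SET_EXTRA_CONTEXT_CHECKED(t);
}
```

Compiling this with gcc -Wall should produce a diagnostic similar to the one quoted above; the code still behaves correctly for non-NULL arguments.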

"Imagine" is the wrong word for what follows, because I've done this before for private business XS.

So let's imagine I am a CPAN XS module running on ithread-ed WinPerl, with only one Perl thread (my_perl) in the process, and I am using the native OS Thread Pool feature (Windows 2000 and later) with Perl.

I am an XS->PP event handler executing inside a TP thread. The root thread is frozen (blocked until I manually release control back to it). At the end of my TP thread runner, after the call_sv() but before I return control to the OS and send the kernel event to wake up the root thread, I would do PERL_SET_CONTEXT((PerlInterpreter*)NULL); to detach the sleeping/frozen my_perl pointer from my current random OS TP thread. I don't want that my_perl pointer to stay in the OS's Thread Local Storage for a thread with a random TID and a random lifespan after I return control of my temporary thread to the OS.

I'm avoiding an accident if some enumeration API gets smart (not really) and runs my C callback function pointer asynchronously, in parallel, on multiple cores and multiple OS TP threads. It is bad hygiene to leave deallocated void pointers in TLS.

So I would definitely want PERL_SET_CONTEXT(NULL); to work.

Or this is an optimization to constant-fold away all machine code associated with PERL_SET_CONTEXT() on single-threaded perl builds.

But then the question is: why was PERL_SET_CONTEXT() left compiled in, and not #if 0'ed away in that XS module? Why is the Perl C API compatible with CPAN XS modules that are ithread-aware but not no-threads-aware, and that crash, hang, or hit C syntax errors on no-thread perls?

You also wrote it, or were the last person to clean it up:

6e13fe3

But the null test existed before the commit above; I'll stop git blaming at this point.

diff --git a/locale.c b/locale.c
index 617119fdb8..20d49395fc 100644
--- a/locale.c
+++ b/locale.c
@@ -8538,19 +8538,38 @@ S_my_setlocale_debug_string_i(pTHX_
 #ifdef USE_PERL_SWITCH_LOCALE_CONTEXT
 
 void
-Perl_switch_locale_context()
+Perl_switch_locale_context(pTHX)
 {
     /* libc keeps per-thread locale status information in some configurations.
      * So, we can't just switch out aTHX to switch to a new thread.  libc has
      * to follow along.  This routine does that based on per-interpreter
-     * variables we keep just for this purpose */
-
-    /* Can't use pTHX, because we may be called from a place where that
-     * isn't available */
-    dTHX;
+     * variables we keep just for this purpose.
+     *
+     * There are two implementations where this is an issue.  For the other
+     * implementations, it doesn't matter because libc is using global values
+     * that all threads know about.
+     *
+     * The two implementations are where libc keeps thread-specific information
+     * on its own.  These are
+     *
+     * POSIX 2008:  The current locale is kept by libc as an object.  We save
+     *              a copy of that in the per-thread PL_cur_locale_obj, and so
+     *              this routine uses that copy to tell the thread it should be
+     *              operating with that object
+     * Windows thread-safe locales:  A given thread in Windows can be being run
+     *              with per-thread locales, or not.  When the thread context
+     *              changes, libc doesn't automatically know if the thread is
+     *              using per-thread locales, nor does it know what the new
+     *              thread's locale is.  We keep that information in the
+     *              per-thread variables:
+     *                  PL_controls_locale  indicates if this thread is using
+     *                                      per-thread locales or not
+     *                  PL_cur_LC_ALL       indicates what the the locale
+     *                                      should be if it is a per-thread
+     *                                      locale.
+     */
 
-    if (UNLIKELY(   aTHX == NULL
-                 || PL_veto_switch_non_tTHX_context
+    if (UNLIKELY(   PL_veto_switch_non_tTHX_context
                  || PL_phase == PERL_PHASE_CONSTRUCT))
     {
         return;

regen/embed.pl Outdated
}
else {
push @asserts, "PERL_ASSUME_NON_NULL($argname)";
push @attrs, "__attribute__nonnull__($n)";
Contributor

I think this can make the compiler optimise away the assert()s we generate, and from looking at the generated code that appears to be the case.

I had a look at Perl_mg_magical (the first name I saw in the proto.h diff), built with ./Configure -des -Dusedevel -DDEBUGGING && make mg.s.

blead:

Perl_mg_magical:
        subq    $8, %rsp
        testq   %rdi, %rdi  ; check sv
        je      .L136    ; jump to assert code if it is NULL
...
.L136:
        leaq    __PRETTY_FUNCTION__.110(%rip), %rcx
        movl    $136, %edx
        leaq    .LC0(%rip), %rsi
        leaq    .LC1(%rip), %rdi
        call    __assert_fail@PLT

This PR:

Perl_mg_magical:
        movl    12(%rdi), %eax  ; note directly fetches from the sv flags without the check
        movl    %eax, %edx
        andl    $-14680065, %edx
        movl    %edx, 12(%rdi)

I think we'd need to only generate the __attribute__nonnull__ for non-DEBUGGING builds.

I'll admit to being a little uncomfortable with automatically generated ASSUME()s, since they produce runtime UB if the assumption is false, but I think it's reasonable here, assuming we fix the assert() problem.

Contributor Author

I think we'd need to only generate the __attribute__nonnull__ for non-DEBUGGING builds.

I don't follow this. What's the downside of generating it for all builds?

Contributor

What's the downside of generating it for all builds?

It turns all the asserts into no-ops.

void foo(int *) __attribute__((__nonnull__(1)));

void foo(int *ptr) {
    assert(ptr);
    ...
}

Since ptr is marked "nonnull", the compiler "knows" that ptr cannot be null in the function body, so it turns assert(ptr) into a no-op.

Or as Tony wrote:

this can make the compiler optimise away the assert()s we generate

Contributor

@Leont Leont Aug 29, 2025

Yeah, that was exactly my concern. This attribute having both internal and an external effect is most unfortunate. We only want one of them here.

Contributor Author

I still don't see the problem. Part of the motivation here is to optimize away those asserts. Since these are known at compile time, shouldn't compilations fail if called wrongly, so the asserts aren't needed?

Member

I generally agree with you, but the nonnull information is available across translation units because it's in the prototype. GCC can and does make (limited) use of it, especially with -fanalyzer, but even -Wnonnull will catch some cases, particularly with optimisations enabled. Optimisations are important here: the compiler does more analysis when they're enabled.

For example, this warns with -fanalyzer:

#include <stdlib.h>
 
__attribute__((nonnull(1)))
int foo(char *foo);
 
int getint();
 
char *getstring(int arg) {
    if (arg)
       return "abc";
    else
       return NULL;
}
 
int main () {
    foo(getstring(getint()));
    return 0;
}

This simpler case warns with just -O2 -Wall:

#include <stdlib.h>
 
__attribute__((nonnull(1)))
int foo(char *foo);
 
int getint() {
   return 0;
}
 
char *getstring(int arg) {
    if (arg)
       return "abc";
    else
       return NULL;
}
 
int main () {
    foo(getstring(getint()));
    return 0;
}

Contributor

I did over-react, sorry.

I do think we don't want assert()s optimised away.

Contributor Author

What about optimizing them away for functions that aren't visible outside the perl core? If that is ok, what about those that are visible only in perl extensions?

Contributor

@tonycoz tonycoz Sep 2, 2025

Why optimise them away at all? Debugging builds (and I often use -O0 for debugging builds) aren't that painfully slow.

Why take the risk?

For xenu's example, if you move the definitions of getstring() and foo() to another CU, you'll only get a warning from -fanalyzer -flto builds; the detection isn't perfect:

tony@venus:.../perl/git$ cat 23641c.c
#include <stdlib.h>
 
__attribute__((nonnull(1)))
int foo(char *foo);

char *getstring(int arg);
 
int getint() {
   return 0;
}
 
int main () {
    foo(getstring(getint()));
    return 0;
}
tony@venus:.../perl/git$ cat 23641d.c
#include <stdlib.h>
#include <stdio.h>

__attribute__((nonnull(1)))
int foo(char *foo);
 
char *getstring(int arg) {
    if (arg)
       return "abc";
    else
       return NULL;
}
 
int foo(char *s) {
  puts(s);
  return 1;
}
tony@venus:.../perl/git$ gcc -Wall -O2 23641c.c 23641d.c
tony@venus:.../perl/git$ gcc -flto -Wall -O2 23641c.c 23641d.c
tony@venus:.../perl/git$ gcc -fanalyzer -Wall -O2 23641c.c 23641d.c
tony@venus:.../perl/git$ gcc -flto -fanalyzer -Wall -O2 23641c.c 23641d.c
23641c.c: In function ‘main’:
23641c.c:13:5: warning: use of NULL where non-null expected [CWE-476] [-Wanalyzer-null-argument]
   13 |     foo(getstring(getint()));
      |     ^
  ‘main’: events 1-2
    |
...

I tried a build with -Dcc='gcc -fanalyzer -flto', but I killed the link step for miniperl after several minutes because it was using over 55GB of resident space (around which point the machine started to swap).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'll note that nonnull is used by UBSAN as well to trap (not just the path isolation mentioned).)

@tonycoz
Contributor

tonycoz commented Aug 28, 2025

Sort of related: https://www.youtube.com/watch?v=3zQ4zw4GNV0 talks about static analysis of the clang nullability attributes.

@khwilliamson
Contributor Author

I changed this to use __attribute__nonnull__ only for non-DEBUGGING builds. I also removed the change to use ASSUME. On DEBUGGING builds, the asserts give clues to the compiler; on non-DEBUGGING ones, the __attribute__nonnull__ lines give the same clues.
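A minimal sketch of that arrangement, assuming a gcc-compatible compiler. The names `MY_NONNULL`, `MY_ARGS_ASSERT`, and `my_len` are hypothetical and are not what regen/embed.pl generates; `DEBUGGING` stands in for perl's build flag.

```c
#include <assert.h>
#include <string.h>

/* In DEBUGGING builds the attribute is omitted, so the compiler has no
 * licence to discard the assert(); elsewhere the attribute carries the
 * same "never NULL" information at no runtime cost. */
#if defined(DEBUGGING) || !defined(__GNUC__)
#  define MY_NONNULL(n)
#else
#  define MY_NONNULL(n) __attribute__((nonnull(n)))
#endif

#ifdef DEBUGGING
#  define MY_ARGS_ASSERT(p) assert(p)
#else
#  define MY_ARGS_ASSERT(p) ((void)0)
#endif

/* The attribute goes on the prototype, so callers in other files see it. */
size_t my_len(const char *s) MY_NONNULL(1);

size_t my_len(const char *s)
{
    MY_ARGS_ASSERT(s);      /* runtime check only when DEBUGGING */
    return strlen(s);
}
```

With this split, exactly one of the two mechanisms is active in any given build, which avoids the case discussed above where the attribute lets the compiler remove the assert.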

There are other assertions besides the non-NULL ones that go away in non-DEBUGGING, but they are insignificant in comparison with the NULL issues.

@khwilliamson khwilliamson changed the title Change ARGS_ASSERT to use ASSUME(), and __attribute__nonnull() Add __attribute__nonnull() for non-DEBUGGING buils Sep 3, 2025
@khwilliamson khwilliamson changed the title Add __attribute__nonnull() for non-DEBUGGING buils Add __attribute__nonnull__() for non-DEBUGGING buils Sep 3, 2025
@tonycoz
Contributor

tonycoz commented Sep 4, 2025

Because of complications, this commit only does this for functions that
don't have a thread context.

Are you aware of the pTHX_1 .. pTHX_9 macros?

@khwilliamson
Contributor Author

I wasn't aware of those macros. That could help.

But I wonder: could this result in the compiler optimizing in dangerous ways, like what @xenu and @tonycoz mentioned?

@khwilliamson khwilliamson changed the title Add __attribute__nonnull__() for non-DEBUGGING buils Add __attribute__nonnull__() for non-DEBUGGING builds Sep 7, 2025
@tonycoz
Contributor

tonycoz commented Sep 7, 2025

But I wonder. Could this result in the compiler optimizing in dangerous ways? like what @xenu and @tonycoz mentioned

All they do is let you use __attribute__nonnull__ for functions that accept a pTHX. If using __attribute__nonnull__ is safe(ish) for functions that don't need pTHX, then it's just as safe for functions that do.
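The idea behind such index macros can be sketched as follows. This is an illustration with hypothetical names (`my_ctx`, `CTX_PARAM_`, `CTX_1`, `CTX_2`, `NO_CTX`, `add_to`), not perl's definitions: when a hidden context parameter is prepended to the argument list, the position of every real argument shifts by one, so the index handed to the nonnull attribute has to shift with it.

```c
#include <assert.h>

/* Hypothetical context type standing in for the interpreter pointer. */
typedef struct my_ctx { int depth; } my_ctx;

#ifndef NO_CTX                  /* hypothetical switch: context present */
#  define CTX_PARAM_  my_ctx *ctx,
#  define CTX_1 2               /* first real argument is parameter 2 */
#  define CTX_2 3
#else
#  define CTX_PARAM_
#  define CTX_1 1               /* no hidden parameter: no shift */
#  define CTX_2 2
#endif

#if defined(__GNUC__)
#  define MY_NONNULL(n) __attribute__((nonnull(n)))
#else
#  define MY_NONNULL(n)
#endif

/* "total" is the first real argument, wherever it lands positionally. */
int add_to(CTX_PARAM_ int *total, int n) MY_NONNULL(CTX_1);

int add_to(CTX_PARAM_ int *total, int n)
{
#ifndef NO_CTX
    ctx->depth++;               /* touch the context so it is used */
#endif
    *total += n;
    return *total;
}
```

The same prototype text then works in both configurations, because the macro rather than the prototype knows whether the hidden parameter exists.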

@tonycoz
Contributor

tonycoz commented Sep 7, 2025

Interestingly, there are already two functions that use __attribute__nonnull__: S_is_dup_mode() in op.c, and set_regex_charset() in op_reg_common.h.

I expect the nonnull makes no optimisation difference to either for gcc/clang: they both dereference the pointer without a check.

It would optimise away the assert() in S_is_dup_mode() though. There's only one caller to S_is_dup_mode() so I expect it's inlined anyway, and that caller ensures the pointer is nonnull.

@khwilliamson
Contributor Author

khwilliamson commented Sep 11, 2025

I also added an __attribute__nonnull__ for the aTHX parameter, even for DEBUGGING builds. That had better not be NULL or else things really break, so I don't see any optimization problem arising as a result of this.

@khwilliamson khwilliamson force-pushed the attribute_nonnull branch 2 times, most recently from ed8616a to 85a1136 Compare September 11, 2025 03:27
@khwilliamson
Contributor Author

This code now generates several hundred lines of warnings for compiling the code base. Some are for comparing NULL vs non-NULL. Some are more serious, where a function is being called with a NULL but expects it to not be so. This is a real bug that our asserts haven't caught, which tells me it is code that doesn't get exercised in the test suite. It could be unreachable, I suppose.

@jkeenan
Contributor

jkeenan commented Sep 11, 2025

This code now generates several hundred lines of warnings for compiling the code base. Some are for comparing NULL vs non-NULL. Some are more serious, where a function is being called with a NULL but expects it to not be so. This is a real bug that our asserts haven't caught, which tells me it is code that doesn't get exercised in the test suite. It could be unreachable, I suppose.

Indeed! The warnings appear to have begun to be emitted with this commit:

commit bfb9271a54aba6bc363b751e5c12cb658bf20595
Author: Karl Williamson <khw@cpan.org>
Date:   Sun Aug 24 11:53:36 2025 -0600

    proto.h: Use __attribute__nonnull__ if available
    
    proto.h contains a generated PERL_ARGS_ASSERT macro for every function.
    It asserts that each parameter that isn't allowed to be NULL actually
    isn't.

It would be better to silence them before merging than wait until later.

@tonycoz
Contributor

tonycoz commented Sep 15, 2025

This code now generates several hundred lines of warnings for compiling the code base. Some are for comparing NULL vs non-NULL. Some are more serious, where a function is being called with a NULL but expects it to not be so. This is a real bug that our asserts haven't caught, which tells me it is code that doesn't get exercised in the test suite. It could be unreachable, I suppose.

I think most of the warnings are from macros (PM_GETRE for example) testing for validity rather than asserting it, so PM_GETRE() returns NULL on a non-PMOP.

So the compiler sees that PM_GETRE() has a conditional that can return NULL and complains when its value (it is only called on a PMOP, where we always return non-NULL) is supplied to a nonnull parameter.

Similarly for the complaints about various /^Gv(SV|IO|HV|AV)n?$/ macros testing that their gv is non-null; these could be handled by adding GvXXn_NN macros that don't do this check.

There are also some complaints where non-pointer parameters have meaningless NN markers (regtail() for one).

@khwilliamson
Contributor Author

-Wnonnull-compare
Warn when comparing an argument marked with the nonnull function attribute against null inside the function.
-Wnonnull-compare is included in -Wall. It can be disabled with the -Wno-nonnull-compare option.

Many of the warnings are from this. My reaction to seeing this warning is "so what?". I don't want to complicate our code by adding macros just to get around this. If we add the nonnull function attribute to a function, I think this warning should be turned off.
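For comparison, gcc's diagnostic pragmas offer a narrower way to silence this warning than a global -Wno-nonnull-compare. The following is a sketch of that alternative, not something proposed in this thread; `MY_NONNULL` and `first_or_zero` are hypothetical names, and whether the pragma covers every instance of the warning may depend on the gcc version.

```c
#include <stddef.h>

#if defined(__GNUC__)
#  define MY_NONNULL(n) __attribute__((nonnull(n)))
#else
#  define MY_NONNULL(n)
#endif

int first_or_zero(const int *p) MY_NONNULL(1);

/* -Wnonnull-compare is a gcc warning, so the pragma is limited to gcc
 * to avoid an unknown-option diagnostic from clang. */
#if defined(__GNUC__) && !defined(__clang__)
#  pragma GCC diagnostic push
#  pragma GCC diagnostic ignored "-Wnonnull-compare"
#endif

int first_or_zero(const int *p)
{
    if (p == NULL)      /* redundant given nonnull; may be optimised out */
        return 0;
    return *p;
}

#if defined(__GNUC__) && !defined(__clang__)
#  pragma GCC diagnostic pop
#endif
```

The trade-off is more clutter at each site in exchange for keeping the warning active everywhere else.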

I have no idea about other compilers.

@khwilliamson
Contributor Author

The latest push generates no warnings with either gcc or clang on my box.

The main source of the warnings can be turned off with -Wno-nonnull-compare on gcc. clang does not raise anything similar. I view these warnings as basically useless for our purposes.

Some of the warnings resulted from careless coding, where the same macro was called multiple times in a row instead of storing the result of the first call, and using that instead of re-calling the macro.

In one case, there already existed a macro to use when it is known that the input is sane; I just changed to use that instead of the more general macro that has a check.

And in another case, I added a macro that assumes the input is sane and called that in the three cases where warnings about the general version were raised. The general version was changed to check, then call the non-general version.
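The pairing described above, a general macro that checks and then defers to an unchecked one, might look like this in miniature. The `node` type and the `NODE_NEXT`, `NODE_NEXT_nocheck`, `value_of`, and `next_value` names are hypothetical and only echo the style of the macros discussed here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; not perl's OP structure. */
typedef struct node {
    struct node *next;
    int          has_next;
    int          value;
} node;

/* Unchecked form: the caller guarantees that n has a successor. */
#define NODE_NEXT_nocheck(n)  ((n)->next)

/* General form: checks first, then calls the unchecked form. */
#define NODE_NEXT(n)  ((n)->has_next ? NODE_NEXT_nocheck(n) : NULL)

#if defined(__GNUC__)
__attribute__((nonnull(1)))
#endif
static int value_of(const node *n)
{
    return n->value;
}

/* Here a successor is known to exist, so the unchecked form is used and
 * no "possibly NULL" value reaches the nonnull parameter. */
static int next_value(const node *n)
{
    return value_of(NODE_NEXT_nocheck(n));
}
```

The unchecked form trades the check for the caller's guarantee, so it is only appropriate where that guarantee is evident from the surrounding code.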

@thesamesam

thesamesam commented Sep 16, 2025

Many of the warnings are from this. My reaction to seeing this warning is "so what?". I don't want to complicate our code by adding macros just to get around this. If we add the nonnull function attribute to a function, I think this warning should be turned off.

It's saying any such check would be optimised out anyway, because you promised any such args will be nonnull (unless UBSAN is used, in which case it'll trap when entering the function).

That is, it's again the annoying dual meaning of the nonnull attribute (it serves both for optimisation and as a description of an API that you want diagnostics for).

peep.c Outdated
Comment on lines 78 to 80

     assert(OpSIBLING(kid));
-    name = op_varname(OpSIBLING(kid));
+    name = op_varname(OpSIBLING_nocheck(kid));
Contributor

Instead of removing the duplicate check but keeping the duplicate fetch, why not use an intermediate variable:

    OP *varop = OpSIBLING(kid);
    assert(varop);
    name = op_varname(varop);

This saves the reader from having to check that both OpSIBLING(kid) calls are the same (and, with this change, that OpSIBLING(kid) and OpSIBLING_nocheck(kid) are the same).

Contributor

My suggested change suppresses the warning.

Comment on lines 756 to 758
         op_null(kid);
-        op_null(OpSIBLING(kid)); /* const */
+        op_null(OpSIBLING_nocheck(kid)); /* const */
         if (o != topop) {
Contributor

How do we know kid has a sibling here? There's no logic here to indicate it does.

Note that if the op doesn't have a sibling OpSIBLING_nocheck() will return the wrong OP, not NULL, and in a DEBUGGING build the op_null() parameter assertion will pass, modifying the wrong OP instead of throwing an assertion failure.

Now we do know that there is currently always a sibling, since the code doesn't currently throw, but this change means that if some bug leads to there not being a sibling this is going to fail much later, rather than on entry to op_null().

Contributor

        op_null(kid);
        OP * const const_op = OpSIBLING(kid);
        ASSUME(const_op);
        op_null(const_op);       /* const */
        if (o != topop) {

suppresses the warning

Comment on lines 320 to 322
effect. */
#define r1_ PERL_UNIQUE_NAME(r)
#define PM_SETRE(o,r) STMT_START { \
Contributor

This replaces any instance of r1_ globally, which seems heavily name polluting.


     /* handle the empty pattern */
-    if (!RX_PRELEN(PM_GETRE(pm)) && PL_curpm) {
+    if (!RX_PRELEN(PM_GETRE_raw(pm)) && PL_curpm) {
Contributor

Couldn't this just be

   if (!RX_PRELEN(new_re) && PL_curpm) {

which saves someone checking the code from having to trace that?

op.c Outdated
Comment on lines 2093 to 2094
     assert(OpSIBLING(kid));
-    name = op_varname(OpSIBLING(kid));
+    name = op_varname(OpSIBLING_nocheck(kid));
Contributor

Similarly to the change in peep.c this could use an intermediate variable.

This thought the string '0' meant empty; it should be testing that the
string isn't empty.

This got duplicated in recent rebasing

The next commit will want this to be available earlier.

Though code later strips this off, it's best to not put it in in the
first place

Instead store the result of the first one in a variable, and use that
going forward.

Instead, store the result of the first one in a variable, and use that
going forward

There are two sets of macros here, the raw version does no checking; and
the non-raw, which does.  Instead of duplicating the complicated
expressions, use the raw version inside the non-raw one to make it a bit
easier to read

This allows a variable name in a macro to be used that doesn't have much
extra verbiage that otherwise would be required to make it distinct from
other variables.

There are two versions of this macro; the more general one which checks
for sanity; and another, to be used when it's already known that things
are sane.  This commit converts to use the latter, as we do know it's
sane here.

gcc emits a bunch of warnings under -Wall for functions that have
declared that parameter 'n' is never going to be NULL, and then have the
temerity to make sure that that is true.

The problem is that we have many macros that are generalized enough to
handle the case where the parameter is NULL.  It just happens that
sometimes they get called as well from code where it is known to be not
NULL.  We could write additional macros that are known to take a
non-NULL parameter and don't do the test.  The existing macros would be
rewritten to just call the new ones after checking for non-NULL.

But that would complicate our code, and a main point of compilers doing
optimization is to figure out and remove impossible cases, without the
programmers having to be concerned with that.

Just turn off these warnings

This makes sure these macros exist for every current usage.

proto.h contains a generated PERL_ARGS_ASSERT macro for every function.
It asserts that each parameter that isn't allowed to be NULL actually
isn't.

These asserts are disabled when not DEBUGGING.  But many compilers allow
a compile-time assertion to be made for this situation, so we can add an
extra measure of protection for free.  And this gives hints to the
compiler for optimizations when the asserts() aren't there.

This parameter is always non-null when built with MULTIPLICITY.  At no
runtime cost, make sure the compiler knows that.

@tonycoz
Contributor

tonycoz commented Sep 30, 2025

I did the following to suppress most of the warnings:

  • make GvIO(), isNAME_C_OR_POSIX() into inline functions
  • used an intermediate variable for the op_varname() call in op.c and the first one in peep.c
  • ASSUME() the intermediate variable in the second peep.c warning
  • use new_re instead of calling PM_GETRE() in pp_regcomp (pp_ctl.c)

I expect the remaining warnings could be resolved similarly.

8 participants