
i#5383: macOS a64 client threads and private TLS #7300

Open: wants to merge 7 commits into master


@ndrewh (Contributor) commented Feb 24, 2025

Adds macOS ARM64 support for client threads and private TLS (under -private_loader -- though we don't implement a full private loader yet).

  • Splits macOS x86 and A64 private TLS into separate files. The new A64 private TLS uses TLS_TYPE_SLOT in a similar manner to the Linux RISC-V TLS implementation.
  • Adds an A64 handler for new_bsdthread_intercept.
  • Fixes the wrong exit syscall in client_thread_run (it currently exits the entire process rather than just the thread).

The end result is that client threads can start and terminate properly, and multithreaded applications can also terminate without crashing. I did not test attach/detach, but I'm guessing it's still broken (there is no injector implementation anyway, if I understand correctly).

cat_spin:
-CALLC2(GLOBAL_REF(atomic_swap), x26, #1)
+CALLC2(GLOBAL_REF(atomic_swap), x0, #1)

@ndrewh (Contributor Author):

Had to change this because x26 is used on line 365.

@derekbruening (Contributor):

OK, I think this works out because x0 is already in the first argument slot, but it can be fragile to pass calling-convention parameter registers as arguments to the CALL* macros, because they can be clobbered by the setup of the other arguments.
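A small C model can make the clobbering hazard concrete. This is a hedged sketch, not DynamoRIO's macro: it assumes a two-argument call macro that emits "mov x0, p1" and then "mov x1, p2" over a simulated register file, and the names regs, operand_t, reg_op, imm_op, and setup_args2 are hypothetical. The real CALLC2 expansion may order its moves differently, which changes which operand is at risk but not whether the risk exists.

```c
#include <stdint.h>

/* Simulated argument registers: regs[0] models x0 (ARG1), regs[1] models x1 (ARG2). */
static uint64_t regs[2];

/* An operand is either a register (val is the register index) or an immediate. */
typedef struct {
    int is_reg;
    uint64_t val;
} operand_t;

static operand_t
reg_op(unsigned idx)
{
    operand_t o = { 1, idx };
    return o;
}

static operand_t
imm_op(uint64_t v)
{
    operand_t o = { 0, v };
    return o;
}

/* Reads an operand's value at the moment the move executes. */
static uint64_t
read_operand(operand_t op)
{
    return op.is_reg ? regs[op.val] : op.val;
}

/* Hypothetical model of a two-argument call macro: "mov x0, p1" then
 * "mov x1, p2". Each move reads its source only when it executes, so a p2
 * that names x0 sees the value p1 has just written into x0. */
static void
setup_args2(operand_t p1, operand_t p2)
{
    regs[0] = read_operand(p1);
    regs[1] = read_operand(p2);
}
```

In the PR's case p1 is x0 itself, so the first move is a no-op and the call is safe under either move order. Passing x0 in the second slot instead, as in setup_args2(imm_op(1), reg_op(0)), leaves x1 holding 1 rather than the caller's original x0 value.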

@@ -1246,7 +1246,7 @@ signal_thread_inherit(dcontext_t *dcontext, void *clone_record)
* FIXME: are current pending or blocked inherited?
*/
#ifdef MACOS
-if (record->app_thread_xsp != 0) {
+if (record->app_thread_xsp == 0) {

@ndrewh (Contributor Author):

I don't really understand this code, but you get heap out-of-bounds/use-after-free asserts if this condition is !=.

@derekbruening (Contributor):

This is freeing the clone record.
See the comment in clone_record_t:

#ifdef MACOS
    /* XXX i#1403: once we have lower-level, earlier thread interception we can
     * likely switch to something closer to what we do on Linux.
     * This is used for bsdthread_create, where app_thread_xsp is NULL;
     * for vfork, app_thread_xsp is non-NULL and this is unused.
     */

This change from != to == looks suspect. Please double-check it against the pasted comment: if bsdthread_create ends up behaving differently on AArch64, please update that comment. Does it need separate handling for A64 vs. x86?

@ndrewh (Contributor Author) commented Feb 25, 2025

Some messy parts where I'm not sure whether there's a better approach:

  • In dynamo_thread_init, we allocate temporary TLS until os_tls_init runs, because acquiring a lock sometimes SEGVs in mig_get_reply_port when the thread register is NULL (this happens in client threads, which do not inherit TLS).
  • In dynamo_thread_exit_common, we cannot use the app's TLS for a similar reason: pthread_terminate frees the app TLS before we intercept the call to bsdthread_terminate.

You'll get a backtrace like this one if we do not maintain a valid thread register during thread entry and exit:

* thread #12, stop reason = EXC_BAD_ACCESS (code=1, address=0x10)
  * frame #0: 0x00000001983c24e4 libsystem_kernel.dylib`mig_get_reply_port + 24
    frame #1: 0x00000001983c5144 libsystem_kernel.dylib`semaphore_create + 52
    frame #2: 0x000000010121b59c libdynamorio.dylib`mutex_get_contended_event [inlined] ksynch_init_var(synch=0x0000000300e73ad8) at ksynch_macos.c:87:9 [opt]
    frame #3: 0x000000010121b580 libdynamorio.dylib`mutex_get_contended_event(lock=0x00000001012481c0) at ksynch_macos.c:162:14 [opt]
    frame #4: 0x000000010120cf6c libdynamorio.dylib`mutex_wait_contended_lock(lock=0x00000001012481c0, mc=0x0000000000000000) at os.c:10477:30 [opt]
    frame #5: 0x000000010109ff14 libdynamorio.dylib`d_r_mutex_lock [inlined] d_r_mutex_lock_app(lock=<unavailable>, mc=0x0000000000000000) at utils.c:884:9 [opt]
    frame #6: 0x000000010109fe88 libdynamorio.dylib`d_r_mutex_lock(lock=<unavailable>) at utils.c:897:5 [opt]
    frame #7: 0x0000000101082530 libdynamorio.dylib`dynamo_thread_init(dstack_in="", mc=0x0000000000000000, os_data=0x0000000300e73c00, client_thread=true) at dynamo.c:2290:5 [opt]
    frame #8: 0x000000010120795c libdynamorio.dylib`client_thread_run at os.c:4110:5 [opt]
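The ordering constraint behind both bullets can be modeled in portable C. This is an illustrative sketch, not DynamoRIO code: the global thread_reg stands in for the thread register, model_lock_acquire stands in for a lock path that dereferences it (the backtrace above faults in mig_get_reply_port at address 0x10, consistent with reading an offset from a NULL thread pointer), and every model_* name is hypothetical. The exit path assumes the remedy is to point the register at runtime-owned TLS before taking any lock.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_TLS_SLOTS 16

/* Models the thread register: NULL in a fresh client thread, which does not
 * inherit TLS. */
static void *thread_reg;
/* Counts how many times the modeled lock path has run. */
static int lock_acquires;

/* Models a lock path that reads a per-thread slot through the thread
 * register. The assert stands in for the SEGV seen when it is NULL. */
static void
model_lock_acquire(void)
{
    assert(thread_reg != NULL);
    void **slots = (void **)thread_reg;
    slots[0] = &lock_acquires; /* touch a TSD-like slot */
    lock_acquires++;
}

/* Models dynamo_thread_init for a client thread. real_tls is the block that
 * os_tls_init would install permanently. */
static void
model_thread_init(void **real_tls)
{
    void *tmp_tls = NULL;
    if (thread_reg == NULL) {
        /* Temporary block so locks taken before os_tls_init are safe. */
        tmp_tls = calloc(MODEL_TLS_SLOTS, sizeof(void *));
        assert(tmp_tls != NULL);
        thread_reg = tmp_tls;
    }
    model_lock_acquire();  /* a lock taken before os_tls_init */
    thread_reg = real_tls; /* models os_tls_init installing permanent TLS */
    free(tmp_tls);         /* no longer referenced (no-op if NULL) */
    model_lock_acquire();
}

/* Models dynamo_thread_exit_common: the app's TLS is already freed, so the
 * register is pointed at runtime-owned TLS before any lock is taken. */
static void
model_thread_exit(void **runtime_tls)
{
    thread_reg = runtime_tls;
    model_lock_acquire();
    thread_reg = NULL; /* the thread is gone */
}
```

The invariant the sketch maintains is that the thread register is never NULL while a lock is taken: the temporary block covers the window before os_tls_init, and it is freed only after the permanent block is installed.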

@ndrewh marked this pull request as ready for review February 26, 2025 18:16
@derekbruening (Contributor):

> under -private_loader -- though we don't implement a full private loader yet

Probably orthogonal to this PR, but do you think it is possible to load private copies of libraries on OSX? #1285 has some discussion about whether that will ever work well, and #7312 (xref) has some discussion of alternatives.

@derekbruening (Contributor) left a comment

Thank you for contributing. Most of my comments are about style, but some points need clarification.


@@ -325,48 +325,6 @@ new_thread_setup(priv_mcontext_t *mc)
ASSERT_NOT_REACHED();
}

# if defined(MACOS) && defined(X86)
/* Called from new_bsdthread_intercept for targeting a bsd thread user function.

@derekbruening (Contributor):

Despite the legacy name, this file "x86_code.c" is used with the asm code on all architectures, not just x86. I don't think it should be moved if the file name was the only reason. If you want to rename it to asm_aux.c or something like that, that seems reasonable.

@@ -85,6 +85,11 @@
# include "vmkuw.h"
#endif

#if defined(MACOS) && defined(AARCH64)
# include "unix/tls.h"

@derekbruening (Contributor):

Generally we avoid including private headers: os_public.h and os_exports.h are the headers that are meant to be public.

@@ -2246,6 +2257,19 @@ dynamo_thread_init(byte *dstack_in, priv_mcontext_t *mc, void *os_data,
return SUCCESS;
}

/* macOS aarch64 will sometimes crash when acquiring locks if TLS is NULL */
#if defined(MACOS) && defined(AARCH64)
void *tmp_tls = NULL;

@derekbruening (Contributor):

Please move this inside a unix/ file and export a function to call here.

@@ -2300,6 +2324,13 @@ dynamo_thread_init(byte *dstack_in, priv_mcontext_t *mc, void *os_data,
}

os_tls_init();
#if defined(MACOS) && defined(AARCH64)
if (tmp_tls) {

@derekbruening (Contributor):

Same here: let's isolate TLS details like this inside unix/
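One way to follow this request, sketched under assumptions: the core file calls a pair of hooks with no platform #ifdefs, and only a unix/ file knows about the macOS A64 case. The hook names os_thread_early_tls_setup and os_thread_early_tls_teardown are hypothetical, not existing DynamoRIO functions, and the platform-specific step that installs the block in the thread register is deliberately left out.

```c
#include <stdlib.h>

/* Hypothetical hooks; in DynamoRIO they would be declared in an exported
 * header such as os_exports.h and defined under unix/. Names are illustrative. */
void *os_thread_early_tls_setup(void);
void os_thread_early_tls_teardown(void *tmp);

/* ---- unix/ side: the only code that knows about the macOS A64 quirk ---- */
#if defined(MACOS) && defined(AARCH64)
void *
os_thread_early_tls_setup(void)
{
    /* Real code would also install this block in the thread register when
     * the register is NULL; that step is omitted from this sketch. */
    return calloc(16, sizeof(void *));
}

void
os_thread_early_tls_teardown(void *tmp)
{
    free(tmp); /* called once os_tls_init has installed permanent TLS */
}
#else
void *
os_thread_early_tls_setup(void)
{
    return NULL; /* other platforms need no temporary TLS */
}

void
os_thread_early_tls_teardown(void *tmp)
{
    (void)tmp;
}
#endif

/* ---- core side (dynamo.c): the call site carries no platform #ifdefs ---- */
static int
model_dynamo_thread_init(void)
{
    void *tmp = os_thread_early_tls_setup();
    /* ... early initialization that may acquire locks ... */
    /* os_tls_init(); */
    os_thread_early_tls_teardown(tmp);
    return 0;
}
```

The design choice is that dynamo.c sees only an opaque handle, so the TLS details (and any future per-platform variations) stay inside unix/.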

return new_thread;
# else
/* NYI */
return -1;

@derekbruening (Contributor):

Add ASSERT_NOT_IMPLEMENTED and put an XXX comment with an issue number.
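A sketch of the requested shape, assuming ASSERT_NOT_IMPLEMENTED takes a condition (as in ASSERT_NOT_IMPLEMENTED(false)). To keep it runnable, the macro below is a local stand-in that only counts and reports, model_nyi_branch is a hypothetical function, and i#NNNN is a placeholder for whichever issue tracks the missing support.

```c
#include <stdbool.h>
#include <stdio.h>

/* Local stand-in for DynamoRIO's ASSERT_NOT_IMPLEMENTED(cond) so the sketch
 * runs: it reports and counts instead of using DynamoRIO's assert machinery. */
static int not_implemented_hits;
#define ASSERT_NOT_IMPLEMENTED(cond)                                          \
    do {                                                                      \
        if (!(cond)) {                                                        \
            not_implemented_hits++;                                           \
            fprintf(stderr, "not implemented: %s:%d\n", __FILE__, __LINE__); \
        }                                                                     \
    } while (0)

/* Hypothetical stand-in for the "# else" branch quoted above. */
static long
model_nyi_branch(void)
{
    /* XXX i#NNNN: not yet implemented on this architecture.
     * (i#NNNN is a placeholder for the real tracking issue.) */
    ASSERT_NOT_IMPLEMENTED(false);
    return -1;
}
```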

# ifdef X64
/* Also update the pthread->fun and pthread->arg fields, since _pthread_start uses
* them instead of the syscall arg0 on some macOS versions */
ASSERT(sys_param(dcontext, 3) &&

@derekbruening (Contributor):

Style: use explicit bools (e.g., sys_param(dcontext, 3) != 0) rather than relying on implicit conversion.

/* Also update the pthread->fun and pthread->arg fields, since _pthread_start uses
* them instead of the syscall arg0 on some macOS versions */
ASSERT(sys_param(dcontext, 3) &&
"bsdthread_create pthread argument should not be NULL");

@derekbruening (Contributor):

Use ASSERT_MESSAGE

# ifdef X64
/* Also update the pthread->fun and pthread->arg fields, since _pthread_start uses
* them instead of the syscall arg0 on some macOS versions */
ASSERT(sys_param(dcontext, 3) &&

@derekbruening (Contributor):

I don't think we should ever assert on an app-supplied value: we want to be able to run any app. If the parameter is null, just skip this code gracefully. Use ASSERT_CURIOSITY if you want a non-fatal notification, or SYSLOG_INTERNAL_WARNING for a less visible one.
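Taken together, the comments on this check suggest code shaped roughly like the following. It is a hedged sketch: model_pthread_t is a hypothetical two-field stand-in (the real pthread structure is private to the system library and laid out differently), and model_curiosity stands in for ASSERT_CURIOSITY or SYSLOG_INTERNAL_WARNING. A NULL app-supplied argument is skipped with a non-fatal notice instead of asserted on, and the check uses an explicit comparison against NULL.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the two fields the comment mentions. */
typedef struct {
    void *fun;
    void *arg;
} model_pthread_t;

/* Stand-in for a non-fatal notice such as ASSERT_CURIOSITY. */
static int curiosity_hits;
static void
model_curiosity(const char *msg)
{
    curiosity_hits++;
    fprintf(stderr, "curiosity: %s\n", msg);
}

/* Redirects the new thread's start routine without asserting on app data.
 * Returns true if the fields were updated, false if the app passed NULL. */
static bool
model_update_pthread_fields(model_pthread_t *pthread, void *new_fun, void *new_arg)
{
    if (pthread == NULL) { /* explicit comparison, per the style note */
        model_curiosity("bsdthread_create pthread argument is NULL; skipping");
        return false;
    }
    pthread->fun = new_fun;
    pthread->arg = new_arg;
    return true;
}
```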


2 participants