Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add image cache to avoid calling dladdr() and add libbacktrace elf image list refresh #674

1 change: 1 addition & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,7 @@ set_option(TRACY_NO_CRASH_HANDLER "Disable crash handling" OFF)
set_option(TRACY_TIMER_FALLBACK "Use lower resolution timers" OFF)
set_option(TRACE_CLIENT_LIBUNWIND_BACKTRACE "Use libunwind backtracing where supported" OFF)
set_option(TRACY_SYMBOL_OFFLINE_RESOLVE "Instead of full runtime symbol resolution, only resolve the image path and offset to enable offline symbol resolution" OFF)
set_option(TRACY_ENABLE_IMAGE_CACHE "On glibc platforms, when doing offline symbol resolution, use an cache to determine the image path and offset instead of dladdr()" OFF)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for keeping this as an option?

Copy link
Author

@tiago-rodrigues tiago-rodrigues Dec 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description of that option should have included the runtime resolve case. For offline symbol resolving, having the image cache be used by default sounds ok to me.
I wasn't sure what to do with the runtime resolve case, where the image cache is used to detect new images were loaded and trigger libbacktrace state recreation. Maybe the option should be renamed something like "TRACY_ENABLE_LIBBACKTRACE_DYNAMIC_LOADED_IMAGES" and only control enabling that code for the runtime resolve case. It could then be disable by default as maybe not too many users needed (if they don't do dynamic loading of shared libs) and also knowing that it comes with libbacktrace state leaking?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every program, even helloworldware, will do dynamic loading of shared libs. For example, when I put a breakpoint on elf_add in libbacktrace in the tracy test application, I see hits on /proc/self/exe, /usr/lib/libdebuginfod.so.1, /usr/lib/libstdc++.so.6, /usr/lib/libm.so.6, /usr/lib/libgcc_s.so.1, /usr/lib/libc.so.6, /usr/lib/libcurl.so.4, /usr/lib/libelf.so.1, /lib64/ld-linux-x86-64.so.2, and so on.

The call stack for each of these library loads is:

#0  tracy::elf_add (state=0x7fffee270080, filename=0x7fffee260120 "/usr/lib/libgcc_s.so.1", descriptor=232, memory=0x0, memory_size=0, base_address=140737350017024, error_callback=0x555555603dd0 <tracy::CallstackErrorCb(void*, char const*, int)>, data=0x0, fileline_fn=0x7fffef27e608, found_sym=0x7fffef27e754, found_dwarf=0x7fffef27e604, fileline_entry=0x0, exe=0, debuginfo=0, with_buildid_data=0x0, with_buildid_size=0) at ../public/libbacktrace/elf.cpp:6578
#1  0x000055555560f2a1 in tracy::phdr_callback (info=0x7fffee3c00d0, pdata=0x7fffef27e708) at ../public/libbacktrace/elf.cpp:7411
#2  0x000055555560c80c in tracy::backtrace_initialize (state=0x7fffee270080, filename=0x555555633d60 "/proc/self/exe", descriptor=232, error_callback=0x555555603dd0 <tracy::CallstackErrorCb(void*, char const*, int)>, data=0x0, fileline_fn=0x7fffef27e7f8) at ../public/libbacktrace/elf.cpp:7459
#3  0x0000555555609bc8 in tracy::fileline_initialize (state=0x7fffee270080, error_callback=0x555555603dd0 <tracy::CallstackErrorCb(void*, char const*, int)>, data=0x0) at ../public/libbacktrace/fileline.cpp:264
#4  0x000055555560340d in tracy::backtrace_pcinfo (state=0x7fffee270080, pc=93824992316766, callback=0x555555603780 <tracy::CallstackDataCb(void*, unsigned long, unsigned long, char const*, int, char const*)>, error_callback=0x555555603dd0 <tracy::CallstackErrorCb(void*, char const*, int)>, data=0x0) at ../public/libbacktrace/fileline.cpp:298
#5  0x00005555555f2eea in tracy::DecodeCallstackPtr (ptr=93824992316766) at ../public/client/TracyCallstack.cpp:1075
#6  0x00005555555ede59 in tracy::Profiler::HandleSymbolQueueItem (this=0x555555644700 <tracy::s_profiler>, si=...) at ../public/client/TracyProfiler.cpp:3266
#7  0x00005555555f4b3a in tracy::Profiler::SymbolWorker (this=0x555555644700 <tracy::s_profiler>) at ../public/client/TracyProfiler.cpp:3388
#8  0x000055555562afc5 in tracy::Profiler::LaunchSymbolWorker (ptr=0x555555644700 <tracy::s_profiler>) at ../public/common/../client/TracyProfiler.hpp:796
#9  0x000055555562c6ad in tracy::Thread::Launch (ptr=0x7ffff63600e0) at ../public/common/../client/TracyThread.hpp:80

Note that DecodeCallstackPtr is on the stack here, so I'm not sure of your reasoning in that sentence:

This cache can also be used in the runtime symbol resolution case to detect we should recreate "backtrace_state" when new images have been loaded, as libbacktrace symbol resolution will only consider images loaded before the first symbol resolve operation.

Can you tell me why you think the libbacktrace state should be reset when a new library appears? The trace above show that the library should handle this out of the box.

Copy link
Author

@tiago-rodrigues tiago-rodrigues Dec 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum, that is strange. I'll unoptimize the code and do some stepping again in my case to make sure what I was seeing was actually correct (I'll hopefully get to it later today). But after the first symbol resolve was hit, I was missing symbol resolving for shared libraries loaded after that. Looking through the libbacktrace code I could see it was using backtrace_state::fileline_fn as the control value to already called backtrace_initialize() (which was what was responsible for the shared library indexing), and that was set to something non-null after the call. That doesn't appear to be the case for you though, maybe I misinterpreted the results in the debugger with optimized code and something else was causing the issues.

Copy link
Author

@tiago-rodrigues tiago-rodrigues Dec 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to preface this comment, by admitting that the code path I've been testing with out main app is still using a 0.9.1 client code branch with extra cherry-picks on top (we haven't been able to move all workflows to use 0.10 just yet), so it's not really the same test with the HEAD code you did (I'll try to do this with HEAD code tomorrow), however I don't see many changes on the libbacktrace code that would have changed this behavior.
I only see it hit fileline_initialize then backtrace_initialize once on the first resolve, because it will then set state->fileline_fn at the end of fileline_initialize() and that will prevent it from calling backtrace_initialize on any subsequent calls.
Maybe you recall if something was changed between 0.9.1 and HEAD that would have allowed backtrace_initialize be called multiple times?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So as I understand it, the problem you have is that shared objects that are dlopened at a later time are not visible because libbacktrace has already collected a list of shared objects that the executable is linking to and will not look for more

exactly.

This feels like something libbacktrace should handle itself (in an ideal world).

I guess before I started looking more closely at the code I didn't realize you had already modified that libbacktrace code quite a bit and it wasn't just a dump from upstream.

you already know when you want to reset libbacktrace state. Have you tried running dl_iterate_phdr to call phdr_callback instead?

Yes, libacktrace code is already doing pretty much what I was doing once, so I can try to expose an update method I can call from outside - I'll try to do this this week and update this PR.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess before I started looking more closely at the code I didn't realize you had already modified that libbacktrace code quite a bit and it wasn't just a dump from upstream.

There have been some changes that add some sorely missing features, but there have been no substantial changes. The changes to dl_iterate_phdr are technical: 7e8961d

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, libacktrace code is already doing pretty much what I was doing once, so I can try to expose an update method I can call from outside - I'll try to do this this week and update this PR.

Please check if it would be feasible to implement this update functionality within libbacktrace itself.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to dl_iterate_phdr are technical: 7e8961d

I think this was needed because elf_add can now call debuginfod, which might issue network requests, and this didn't play well with the requirements of dl_iterate_phdr.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check if it would be feasible to implement this update functionality within libbacktrace itself.

I made an attempt at it in the latest commit


if(NOT TRACY_STATIC)
target_compile_definitions(TracyClient PRIVATE TRACY_EXPORTS)
Expand Down
163 changes: 157 additions & 6 deletions public/client/TracyCallstack.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -90,6 +90,124 @@ extern "C" const char* ___tracy_demangle( const char* mangled )
#endif
#endif

#if defined( TRACY_ENABLE_IMAGE_CACHE) && (TRACY_HAS_CALLSTACK == 3)
# define TRACY_USE_IMAGE_CACHE
# include <link.h>
#endif

#ifdef TRACY_USE_IMAGE_CACHE
// when we have access to dl_iterate_phdr(), we can build a cache of address ranges to image paths
// so we can quickly determine which image an address falls into.
// We refresh this cache only when we hit an address that doesn't fall into any known range.
class ImageCache
{
public:
struct ImageEntry
{
ImageEntry( void* startAddress, void* endAddress, const char* name )
: m_startAddress( startAddress ), m_endAddress( endAddress ), m_name( name ) {}

void* m_startAddress;
void* m_endAddress;
const char* m_name;
};

ImageCache()
{
m_images = (tracy::FastVector<ImageEntry>*)tracy::tracy_malloc( sizeof( tracy::FastVector<ImageEntry> ) );
new(m_images) tracy::FastVector<ImageEntry>( 512 );

Refresh();
}
const ImageEntry* GetImageForAddress( void* address )
{
const ImageEntry* entry = GetImageEntryForAddress( address );
if( !entry )
{
//printf("* addr not found: %p (%d entries) refreshing (m_numberOfRefreshes: %d)\n",
// address, m_images->size(), m_numberOfRefreshes);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove debug code or convert to TracyDebug().

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Refresh();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a mechanism that will prevent constant reloads when a large set of unmapped addresses is queried?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't one atm. The expectation would be that it wouldn't actually be called that often in real scenarios. In my case with a few hundred dynamic loaded shared libs, and instrumenting all memory allocations with a callstack captures I only get it called ~5 times until it actually has the full map as symbol resolution is relatively slow in itself.
What would be your suggestion, a "min ms between refreshes" and either skip (and return unknown symbols for that period) or "stall" the resolution thread for a period or time?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stalling to allow batch processing of a larger set may be a solution. But I was more interested in whether you have considered this case. I don't think this is immediately actionable.

return GetImageEntryForAddress( address );
}
return entry;
}

typedef void (*RefreshCallback)();
void SetOnRefreshCallback( RefreshCallback callback )
{
m_onRefreshCallback = callback;
}

private:
tracy::FastVector<ImageEntry>* m_images;
const char* m_imageName = nullptr;
RefreshCallback m_onRefreshCallback;
uint32_t m_numberOfRefreshes = 0;

static int Callback( struct dl_phdr_info* info, size_t size, void* data )
{
ImageCache* cache = reinterpret_cast<ImageCache*>( data );

ImageEntry* image = cache->m_images->push_next();
image->m_startAddress = reinterpret_cast<void*>( info->dlpi_addr );
const uint32_t headerCount = info->dlpi_phnum;
assert( headerCount > 0);
image->m_endAddress = reinterpret_cast<void*>( info->dlpi_addr +
info->dlpi_phdr[info->dlpi_phnum - 1].p_vaddr + info->dlpi_phdr[info->dlpi_phnum - 1].p_memsz);

// the base executable name isn't provided when iterating with dl_iterate_phdr,
// so we must get it in an alternative way and cache it
if( info->dlpi_name && info->dlpi_name[0] != '\0' )
{
image->m_name = info->dlpi_name;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this assignment?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair point - copied it there as well.
I now free and rebuild images names with every refresh for simplicity - would probably be better if I just recreated the new image entries, although it would require searching and comparing the paths to be sure so not sure it's worth it.

}
else
{
if( !cache->m_imageName )
{
Dl_info dlInfo;
if( dladdr( (void *)info->dlpi_addr, &dlInfo ) )
{
cache->m_imageName = dlInfo.dli_fname;
}
}
image->m_name = cache->m_imageName;
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is image->m_name valid after this function returns? My guess would be that the info argument is only valid inside this function, and the dlInfo provided filename will be invalidated after the next call to dladdr.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point - fixed.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this memory released?


//printf("\tdl_iterate_phdr:'%s', start:%p, end:%p (headerCount: %d)'\n",
// image->m_name, image->m_startAddress, image->m_endAddress, headerCount);

return 0;
}

void Refresh()
{
m_images->clear();

dl_iterate_phdr( Callback, this );

std::sort( m_images->begin(), m_images->end(),
[]( const ImageEntry& lhs, const ImageEntry& rhs ) { return lhs.m_startAddress > rhs.m_startAddress; } );

if( m_onRefreshCallback ) m_onRefreshCallback();

m_numberOfRefreshes++;
}

const ImageEntry* GetImageEntryForAddress( void* address ) const
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to GetImageForAddressImpl.

{
auto it = std::lower_bound( m_images->begin(), m_images->end(), address,
[]( const ImageEntry& lhs, const void* rhs ) { return lhs.m_startAddress > rhs; } );

if( it != m_images->end() && address < it->m_endAddress )
{
return &(*it);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not return it?

}
return nullptr;
}
};
#endif //#ifdef TRACY_USE_IMAGE_CACHE

namespace tracy
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for keeping the above code outside the tracy namespace?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really, moved it under

{

Expand Down Expand Up @@ -607,6 +725,9 @@ struct backtrace_state* cb_bts = nullptr;
int cb_num;
CallstackEntry cb_data[MaxCbTrace];
int cb_fixup;
#ifdef TRACY_USE_IMAGE_CACHE
static ImageCache* s_imageCache = nullptr;
#endif //#ifdef TRACY_USE_IMAGE_CACHE

#ifdef TRACY_DEBUGINFOD
debuginfod_client* s_debuginfod;
Expand Down Expand Up @@ -779,6 +900,13 @@ void InitCallstackCritical()

void InitCallstack()
{
InitRpmalloc();

#ifdef TRACY_USE_IMAGE_CACHE
s_imageCache = (ImageCache*)tracy::tracy_malloc( sizeof( ImageCache ) );
new(s_imageCache) ImageCache();
#endif //#ifdef TRACY_USE_IMAGE_CACHE

#ifndef TRACY_SYMBOL_OFFLINE_RESOLVE
s_shouldResolveSymbolsOffline = ShouldResolveSymbolsOffline();
#endif //#ifndef TRACY_SYMBOL_OFFLINE_RESOLVE
Expand All @@ -789,7 +917,15 @@ void InitCallstack()
}
else
{
#ifdef TRACY_USE_IMAGE_CACHE
s_imageCache->SetOnRefreshCallback([]()
{
// FIXME: there is no backtrace_destroy_state(), so we are forced to leak...
cb_bts = backtrace_create_state( nullptr, 0, nullptr, nullptr );
});
#else
cb_bts = backtrace_create_state( nullptr, 0, nullptr, nullptr );
#endif // #ifdef TRACY_USE_IMAGE_CACHE
}

#ifndef TRACY_DEMANGLE
Expand Down Expand Up @@ -869,6 +1005,9 @@ debuginfod_client* GetDebuginfodClient()

void EndCallstack()
{
#ifdef TRACY_USE_IMAGE_CACHE
tracy::tracy_free(s_imageCache);
#endif //#ifdef TRACY_USE_IMAGE_CACHE
#ifndef TRACY_DEMANGLE
___tracy_free_demangle_buffer();
#endif
Expand Down Expand Up @@ -1041,12 +1180,12 @@ void SymInfoError( void* /*data*/, const char* /*msg*/, int /*errnum*/ )
cb_data[cb_num-1].symAddr = 0;
}

void GetSymbolForOfflineResolve(void* address, Dl_info& dlinfo, CallstackEntry& cbEntry)
void GetSymbolForOfflineResolve(void* address, uint64_t imageBaseAddress, CallstackEntry& cbEntry)
{
// tagged with a string that we can identify as an unresolved symbol
cbEntry.name = CopyStringFast( "[unresolved]" );
// set .so relative offset so it can be resolved offline
cbEntry.symAddr = (uint64_t)address - (uint64_t)(dlinfo.dli_fbase);
cbEntry.symAddr = (uint64_t)address - imageBaseAddress;
cbEntry.symLen = 0x0;
cbEntry.file = CopyStringFast( "[unknown]" );
cbEntry.line = 0;
Expand All @@ -1057,17 +1196,29 @@ CallstackEntryData DecodeCallstackPtr( uint64_t ptr )
InitRpmalloc();
if( ptr >> 63 == 0 )
{
const char* symloc = nullptr;
const char* imageName = nullptr;
uint64_t imageBaseAddress = 0x0;

#ifdef TRACY_USE_IMAGE_CACHE
const auto* image = s_imageCache->GetImageForAddress((void*)ptr);
if( image )
{
imageName = image->m_name;
imageBaseAddress = uint64_t(image->m_startAddress);
}
#else
Dl_info dlinfo;
if( dladdr( (void*)ptr, &dlinfo ) )
{
symloc = dlinfo.dli_fname;
imageName = dlinfo.dli_fname;
imageBaseAddress = uint64_t( dlinfo.dli_fbase );
}
#endif

if( s_shouldResolveSymbolsOffline )
{
cb_num = 1;
GetSymbolForOfflineResolve( (void*)ptr, dlinfo, cb_data[0] );
GetSymbolForOfflineResolve( (void*)ptr, imageBaseAddress, cb_data[0] );
}
else
{
Expand All @@ -1078,7 +1229,7 @@ CallstackEntryData DecodeCallstackPtr( uint64_t ptr )
backtrace_syminfo( cb_bts, ptr, SymInfoCallback, SymInfoError, nullptr );
}

return { cb_data, uint8_t( cb_num ), symloc ? symloc : "[unknown]" };
return { cb_data, uint8_t( cb_num ), imageName ? imageName : "[unknown]" };
}
#ifdef __linux
else if( s_kernelSym )
Expand Down
Loading