Fix a leak of open Blob files #13106
Summary: ...

Fixes facebook#13066

Important follow-up (FIXME): The added check discovered some apparent cases of leaked (into table_cache) SST file readers that would stick around until DB::Close(). Need to enable that check, diagnose, and fix.

Test Plan: added a check that is called during DB::Close in ASAN builds (to minimize paying the cost in all unit tests). Without the fix, the check failed in at least these tests:

```
db_blob_basic_test DBBlobBasicTest.DynamicallyWarmCacheDuringFlush
db_blob_compaction_test DBBlobCompactionTest.CompactionReadaheadMerge
db_blob_compaction_test DBBlobCompactionTest.MergeBlobWithBase
db_blob_compaction_test DBBlobCompactionTest.CompactionDoNotFillCache
db_blob_compaction_test DBBlobCompactionTest.SkipUntilFilter
db_blob_compaction_test DBBlobCompactionTest.CompactionFilter
db_blob_compaction_test DBBlobCompactionTest.CompactionReadaheadFilter
db_blob_compaction_test DBBlobCompactionTest.CompactionReadaheadGarbageCollection
```
LGTM, thanks for the fix @pdillinger !
db/db_impl/db_impl_debug.cc
Outdated
cfd->current()->AddLiveFiles(&live_table_files, &live_blob_files);
}

std::set<uint64_t> live_files;
Sorting `live_table_files` and `live_blob_files` and just looking in the appropriate vector based on file type would be probably a bit more efficient (but no big deal since this is test/debug code)
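The suggestion above can be sketched roughly as follows. This is a minimal, standalone illustration rather than the code in the PR: `LiveFileIndex` and `CachedFileKind` are hypothetical names invented here, and the real check works on RocksDB's own types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical file-kind tag for this sketch; not RocksDB's FileType enum.
enum class CachedFileKind { kTable, kBlob };

// Keeps live SST and blob file numbers in two sorted vectors, so a lookup
// is a binary search in the vector that matches the file kind.
class LiveFileIndex {
 public:
  LiveFileIndex(std::vector<uint64_t> table_files,
                std::vector<uint64_t> blob_files)
      : table_files_(std::move(table_files)),
        blob_files_(std::move(blob_files)) {
    // Sort once up front; every later lookup is O(log n).
    std::sort(table_files_.begin(), table_files_.end());
    std::sort(blob_files_.begin(), blob_files_.end());
  }

  bool Contains(CachedFileKind kind, uint64_t file_number) const {
    const std::vector<uint64_t>& files =
        kind == CachedFileKind::kTable ? table_files_ : blob_files_;
    assert(std::is_sorted(files.begin(), files.end()));
    return std::binary_search(files.begin(), files.end(), file_number);
  }

 private:
  std::vector<uint64_t> table_files_;
  std::vector<uint64_t> blob_files_;
};
```

Compared with one merged `std::set<uint64_t>`, this avoids per-node allocations and keeps SST and blob numbers separate, at the cost of needing the file kind at lookup time.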
db/version_builder.cc
Outdated
if (vs) {
assert(ioptions);
assert(!ioptions->cf_paths.empty());
assert(shared_meta);
assert(bc);
Looks like `cfd_` and thus `bc` can be `nullptr`. Should this be a check instead (`if (bc) { bc->Evict(...); }`)?
Will do
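A minimal sketch of the null check discussed in this thread, under stated assumptions: `FakeBlobCache` and `EvictIfCached` are hypothetical stand-ins written for illustration, not RocksDB's `BlobFileCache` or the code that landed.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical stand-in for a blob file cache; tracks open file numbers.
class FakeBlobCache {
 public:
  void Insert(uint64_t file_number) { open_.insert(file_number); }
  void Evict(uint64_t file_number) { open_.erase(file_number); }
  bool Has(uint64_t file_number) const { return open_.count(file_number) > 0; }

 private:
  std::set<uint64_t> open_;
};

// Evicts only when a cache is present, instead of asserting that it is.
// Returns true if an eviction was attempted.
bool EvictIfCached(FakeBlobCache* bc, uint64_t blob_file_number) {
  if (bc == nullptr) {
    return false;  // No cache to clean up (e.g. no column family data).
  }
  bc->Evict(blob_file_number);
  return true;
}
```

With a runtime check, a missing cache is treated as "nothing to evict" rather than as a programming error, which is the behavior the reviewer asks about.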
// contains entries for live files, to check for effective leaks of open
// files. This can only be called when purging of obsolete files has
// "settled," such as during parts of DB Close().
void TEST_VerifyNoObsoleteFilesCached(bool db_mutex_already_held) const;
We might have to move this declaration to the `#ifdef NDEBUG` block above and `#ifdef` the definition+call too to fix those linker errors.
Didn't see that db_impl_debug.cc is already in a giant ifdef
@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pdillinger has updated the pull request. You must reimport the pull request before landing.
@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
continue;
}
// Sneakily add both SST and blob files to the same list
cfd->current()->AddLiveFiles(&live_files, &live_files);
Sorry, one more question here: unlike the previous logic, this only adds the live files from the current Version; is it possible that when this method is called, we still have e.g. iterators or compactions holding on to earlier Versions? (Obsolete files would be the ones that are not needed by any Versions that are in use, not just the latest.)
> This can only be called when purging of obsolete files has "settled," such as during parts of DB Close().

You can't close with open iterators, but I should be more specific about multiple live versions outside of Close()
Added to T206372901
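The concern raised in this thread, that a file is obsolete only if no Version still in use references it, can be illustrated with a small standalone sketch. `FakeVersion` and `LiveFilesAcrossVersions` are hypothetical names; RocksDB's real `Version` and `VersionSet` bookkeeping is considerably more involved.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical simplified Version: the file numbers it references plus a
// reference count held by iterators, compactions, or the DB itself.
struct FakeVersion {
  std::vector<uint64_t> files;
  int refs = 0;
};

// Union of the files referenced by every Version that is still in use,
// not just the most recent one.
std::set<uint64_t> LiveFilesAcrossVersions(
    const std::vector<FakeVersion>& versions) {
  std::set<uint64_t> live;
  for (const FakeVersion& v : versions) {
    assert(v.refs >= 0);
    if (v.refs > 0) {
      live.insert(v.files.begin(), v.files.end());
    }
  }
  return live;
}
```

If only the newest Version were consulted, a file pinned by an older Version (for example via an open iterator) would be misreported as obsolete while it is still legitimately cached.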
@pdillinger merged this pull request in a28cc4a.
Summary: An earlier change (facebook@b34cef5) removed apparently unused functionality where an obsolete blob file number is passed for removal from TableCache, which manages SST files. This was actually relying on broken/fragile abstractions wherein TableCache and BlobFileCache share the same Cache and using the TableCache interface to manipulate blob file caching. No unit test was actually checking for removal of obsolete blob files from the cache (which is somewhat tricky to check and a second order correctness requirement).

Here we fix the leak and add a DEBUG+ASAN-only check in DB::Close() that no obsolete files are lingering in the table/blob file cache.

Fixes facebook#13066

Important follow-up (FIXME): The added check discovered some apparent cases of leaked (into table_cache) SST file readers that would stick around until DB::Close(). Need to enable that check, diagnose, and fix.

Pull Request resolved: facebook#13106

Test Plan: added a check that is called during DB::Close in ASAN builds (to minimize paying the cost in all unit tests). Without the fix, the check failed in at least these tests:

```
db_blob_basic_test DBBlobBasicTest.DynamicallyWarmCacheDuringFlush
db_blob_compaction_test DBBlobCompactionTest.CompactionReadaheadMerge
db_blob_compaction_test DBBlobCompactionTest.MergeBlobWithBase
db_blob_compaction_test DBBlobCompactionTest.CompactionDoNotFillCache
db_blob_compaction_test DBBlobCompactionTest.SkipUntilFilter
db_blob_compaction_test DBBlobCompactionTest.CompactionFilter
db_blob_compaction_test DBBlobCompactionTest.CompactionReadaheadFilter
db_blob_compaction_test DBBlobCompactionTest.CompactionReadaheadGarbageCollection
```

Reviewed By: ltamasi
Differential Revision: D65296123
Pulled By: pdillinger
fbshipit-source-id: 2276d76482beb2c75c9010bc1bec070bb23a24c0
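The core idea of the check described in this commit message (no obsolete file should remain in the table/blob file cache once purging has settled) might be sketched as a set difference. This is an assumption-laden illustration: `FindObsoleteCachedFiles` is a hypothetical helper, and plain sets of file numbers stand in for the real cache and Version state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <set>
#include <vector>

// Returns the cached file numbers that are not live, i.e. the readers that
// would count as leaked. Both inputs are std::set, so they are already
// sorted as std::set_difference requires.
std::vector<uint64_t> FindObsoleteCachedFiles(
    const std::set<uint64_t>& cached, const std::set<uint64_t>& live) {
  std::vector<uint64_t> leaked;
  std::set_difference(cached.begin(), cached.end(), live.begin(), live.end(),
                      std::back_inserter(leaked));
  return leaked;
}
```

An empty result means every cached reader belongs to a live file. A debug-only check could assert that and report the offending file numbers otherwise.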
Summary: Follow-up to #13106 which revealed that some SST file readers (in addition to blob files) were being essentially leaked in TableCache (until DB::Close() time). Patched sources of leaks:

* Flush that is not committed (builder.cc)
* Various obsolete SST files picked up by directory scan but not caught by SubcompactionState::Cleanup() cleaning up from some failed compactions. Dozens of unit tests fail without the "backstop" TableCache::Evict() call in PurgeObsoleteFiles().

We also needed to adjust the check for leaks as follows:

* Ok if DB::Open never finished (see comment)
* Ok if deletions are disabled (see comment)
* Allow "quarantined" files to be in table_cache because (presumably) they might become live again.
* Get live files from all live Versions.

Suggested follow-up:

* Potentially delete more obsolete files sooner with a FIXME in db_impl_files.cc. This could potentially be high value because it seems to gate deletion of any/all newer obsolete files on all older compactions finishing.
* Try to catch obsolete files in more places using the VersionSet::obsolete_files_ pipeline rather than relying on them being picked up with directory scan, or deleting them outside of normal mechanisms.

Pull Request resolved: #13117

Test Plan: updated check used in most all unit tests in ASAN build

Reviewed By: hx235
Differential Revision: D65502988
Pulled By: pdillinger
fbshipit-source-id: aa0795a8a09d9ec578d25183fe43e2a35849209c
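The adjusted leak check listed above could look roughly like the following sketch. Everything here is hypothetical and simplified: `LeakCheckState` and `NoObsoleteFilesCached` are invented names, and the flags only model the exemptions the commit message describes, not how RocksDB actually tracks them.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical snapshot of the state the check needs.
struct LeakCheckState {
  bool open_finished = true;       // DB::Open completed
  bool deletions_enabled = true;   // file deletions are not disabled
  std::set<uint64_t> live;         // files from all live Versions
  std::set<uint64_t> quarantined;  // files that might become live again
};

// Returns true if the cache holds no obsolete files, or if the check does
// not apply because obsolete files could not have been purged yet.
bool NoObsoleteFilesCached(const LeakCheckState& state,
                           const std::set<uint64_t>& cached) {
  if (!state.open_finished || !state.deletions_enabled) {
    return true;  // Exempt: purging never had a chance to settle.
  }
  for (uint64_t file_number : cached) {
    const bool is_live = state.live.count(file_number) > 0;
    const bool is_quarantined = state.quarantined.count(file_number) > 0;
    if (!is_live && !is_quarantined) {
      return false;  // Cached but neither live nor quarantined: a leak.
    }
  }
  return true;
}
```

The early return keeps the check from flagging states where lingering cache entries are expected, which is why the commit message treats an unfinished open and disabled deletions as acceptable.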