feat(fracmanager): implement fraction snapshots with wait group reference counting #277

Open
eguguchkin wants to merge 1 commit into main from 275-fractions-snapshot

@eguguchkin
Contributor

Fixes #275


  • I have read and followed all requirements in CONTRIBUTING.md;
  • I used LLM/AI assistance to make this pull request;

@eguguchkin eguguchkin marked this pull request as draft November 24, 2025 15:07
@eguguchkin eguguchkin force-pushed the 275-fractions-snapshot branch 4 times, most recently from d7cfd54 to 6edb32d Compare November 25, 2025 08:16
@eguguchkin eguguchkin force-pushed the 156-new-lifecycle-manager branch 3 times, most recently from 79ea9fe to 4054ad1 Compare January 16, 2026 15:08
Base automatically changed from 156-new-lifecycle-manager to main January 16, 2026 15:46
@eguguchkin eguguchkin marked this pull request as ready for review January 19, 2026 10:08
@eguguchkin eguguchkin modified the milestone: v0.67.0 Jan 26, 2026
@eguguchkin eguguchkin force-pushed the 275-fractions-snapshot branch from 6edb32d to 38a7993 Compare January 27, 2026 14:30
@codecov-commenter

Codecov Report

❌ Patch coverage is 80.23952% with 33 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.67%. Comparing base (1744059) to head (38a7993).

Files with missing lines             Patch %   Lines
fracmanager/fraction_registry.go     80.00%    10 Missing ⚠️
fracmanager/lifecycle_manager.go     62.50%    8 Missing and 1 partial ⚠️
fracmanager/fractions_snapshot.go    81.81%    5 Missing and 1 partial ⚠️
fracmanager/sync_appender.go         75.00%    4 Missing and 2 partials ⚠️
fracmanager/fracmanager.go           85.71%    0 Missing and 1 partial ⚠️
storeapi/grpc_async_search.go        75.00%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #277      +/-   ##
==========================================
+ Coverage   71.65%   71.67%   +0.02%     
==========================================
  Files         204      205       +1     
  Lines       14770    14735      -35     
==========================================
- Hits        10583    10562      -21     
+ Misses       3435     3423      -12     
+ Partials      752      750       -2     

@github-actions
Contributor

🔴 Performance Degradation

Some benchmarks have degraded compared to the previous run.

Name                          Previous (174405)   Current (3d34b5)   Ratio   Verdict
FindSequence_Random/small-4   10280.15 MB/s       7073.38 MB/s       0.69    🔴
                              24.90 ns/op         36.19 ns/op        1.45    🔴
GenerateDocs-4                952.20 ns/op        1142.00 ns/op      1.20    🔴

@dkharms dkharms requested review from cheb0 and forshev February 2, 2026 11:03
searchParams,
tr,
)
release()
Member

If search panics, then release() is never called.
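One way to address this, sketched under assumptions: call release through defer so it runs even when the search panics. The names searchWithRelease and the local release closure are hypothetical stand-ins and are not taken from this PR; the recover is there only so the demo can observe the result.

```go
package main

import "fmt"

// searchWithRelease is a hypothetical wrapper: it runs search and guarantees
// that release is called, even if search panics.
func searchWithRelease(search func()) (released bool, recovered interface{}) {
	// Stand-in for the release() returned when the snapshot is acquired.
	release := func() { released = true }

	func() {
		// Deferred calls run in LIFO order: release first, then recover.
		defer func() { recovered = recover() }()
		defer release()
		search()
	}()
	return released, recovered
}

func main() {
	released, rec := searchWithRelease(func() { panic("boom") })
	fmt.Println(released, rec) // prints "true boom"
}
```

In real code the panic would normally keep propagating; the point is only that the deferred release still runs on the way out.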


// Collect fractions in correct chronological order: from oldest (remote) to newest (active)
for _, r := range remotes {
r.wg = &fs.wg
Member

What happens in the following scenario:

  • we already have a snapshot built: S1
  • a search request arrives
  • while the search request is executing, a snapshot rebuild is triggered. Let's call the new snapshot S2
  • now we are overwriting the wait group inside the ...SyncDestroyable struct with S2's wait group.
  • at the same time, if we want to destroy a fraction, we call wg.Wait(). But we might end up waiting on S2's wait group. Since nobody has called wg.Add on S2's wait group, Wait returns immediately. However, the search request might still be running and touching this fraction.
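The scenario above can be replayed deterministically in a single goroutine. This is a minimal sketch, not the PR's code: the fraction type and the destroyedWhileInUse function are hypothetical, and the "search" is just a flag.

```go
package main

import (
	"fmt"
	"sync"
)

// fraction is a hypothetical stand-in for the ...SyncDestroyable wrapper.
type fraction struct {
	wg *sync.WaitGroup // points at the wait group of the latest snapshot
}

// destroyedWhileInUse replays the steps and reports whether the destroy path
// got past Wait while the search was still running.
func destroyedWhileInUse() bool {
	var s1, s2 sync.WaitGroup
	f := &fraction{wg: &s1}

	// A search acquires the fraction through snapshot S1.
	f.wg.Add(1)
	searchRunning := true

	// A rebuild creates S2 and overwrites the pointer.
	f.wg = &s2

	// Destroy waits on whatever the pointer refers to now. Nobody called
	// Add on s2, so Wait returns immediately.
	f.wg.Wait()
	destroyedEarly := searchRunning

	// Only now does the search finish and release its S1 reference.
	searchRunning = false
	s1.Done()
	return destroyedEarly
}

func main() {
	fmt.Println(destroyedWhileInUse()) // prints "true"
}
```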

Member

@cheb0 cheb0 Feb 19, 2026

Besides, we assign s.wg = &fs.wg under the fraction registry lock (only a single snapshot can be under construction at a time).
However, we read this wait group via d.wg.Wait() in Destroy, without that lock.

I.e. technically it's a data race, if I understand correctly: an unsynchronized read/write of a non-atomic variable. The whole question is what guarantees Go gives us here.
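For the pointer itself, one option is atomic.Pointer from sync/atomic (Go 1.19+), which makes the store and the load well-defined under the Go memory model. A rough sketch with hypothetical names (syncDestroyable, attach, waitIdle); note that it only removes the data race on the pointer and does not fix the stale wait group scenario described in the previous comment.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// syncDestroyable is a hypothetical wrapper; only the wg field matters here.
type syncDestroyable struct {
	wg atomic.Pointer[sync.WaitGroup]
}

// attach publishes the wait group of the snapshot that now owns the fraction.
func (d *syncDestroyable) attach(wg *sync.WaitGroup) { d.wg.Store(wg) }

// waitIdle loads the pointer atomically and waits on it, if one is attached.
func (d *syncDestroyable) waitIdle() {
	if wg := d.wg.Load(); wg != nil {
		wg.Wait()
	}
}

// attachAndWait exercises both paths: waiting with no wait group attached,
// and waiting until an outstanding reference is released.
func attachAndWait() bool {
	var d syncDestroyable
	d.waitIdle() // nothing attached yet, returns immediately

	var wg sync.WaitGroup
	d.attach(&wg)
	wg.Add(1)

	done := make(chan struct{})
	go func() {
		d.waitIdle()
		close(done)
	}()

	wg.Done() // release the reference so waitIdle can return
	<-done
	return d.wg.Load() == &wg
}

func main() {
	fmt.Println(attachAndWait())
}
```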

all []frac.Fraction // all fractions in creation order (read-only view)
muAll sync.RWMutex // protects active, all, and oldestTotal fields
appender *syncAppender // currently active writable fraction
all *fractionsSnapshot // all fractions in creation order (read-only view)
Member

@cheb0 cheb0 Feb 19, 2026

If search requests arrive constantly (let's say 5 requests per second) and all take the same fraction snapshot, can we potentially run into any problems?
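One possible concern behind this question, as a back-of-the-envelope sketch: if each search holds the snapshot longer than the gap between arrivals, the holds overlap and the reference count never drops to zero, so a Wait on a shared wait group would not return while traffic continues. The function minHolders and the hold durations are hypothetical; it models the counter with plain integers rather than a real sync.WaitGroup.

```go
package main

import "fmt"

// minHolders replays n requests that arrive every intervalMs milliseconds and
// each hold the snapshot for holdMs milliseconds. It returns the lowest number
// of holders observed just before each later arrival, which is where the
// count is at its minimum.
func minHolders(n, intervalMs, holdMs int) int {
	lowest := n
	for i := 1; i < n; i++ {
		t := i * intervalMs // arrival time of request i
		holders := 0
		for j := 0; j < i; j++ {
			if j*intervalMs+holdMs > t { // request j has not released yet
				holders++
			}
		}
		if holders < lowest {
			lowest = holders
		}
	}
	return lowest
}

func main() {
	// 5 requests per second means one arrival every 200 ms.
	fmt.Println(minHolders(5, 200, 300)) // prints 1: the count never reaches zero
	fmt.Println(minHolders(5, 200, 150)) // prints 0: there are idle gaps
}
```

Separately, the sync.WaitGroup documentation requires that Add calls made while the counter is zero happen before Wait, so adding new references concurrently with a Wait is worth checking against that rule.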

Development

Successfully merging this pull request may close these issues.

Refactoring Fraction Lifecycle Management System

3 participants