Fix: Stratified Splitting to Prevent Calibration Crash #58

Open
thechillbasu wants to merge 2 commits into ruxailab:main from thechillbasu:fix/stratified-split-logic

@thechillbasu

Description

The calibration system was crashing for specific configurations with low sample counts (e.g., 5 points × 10 samples).

Technical Analysis:
The issue was triggered when clicking the "FINISH" button. The console showed: TypeError: Cannot read property 'PrecisionSD' of undefined.

This occurred because the backend's data splitting logic was random and non-stratified:

X_train_x, X_test_x, y_train_x, y_test_x = train_test_split(
    X_x, X_y, test_size=0.2, random_state=42
)

Note: Configurations with fewer than 4 points may also hit the frontend rendering bug (points not showing up). See the corresponding frontend Issue #110 and PR #111.

For these specific combinations, the random split over the pooled dataset deterministically excludes entire points from the test set. Verified failures with random_state=42:

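| Points | Samples per Point | Missing Point (Excluded from Test Set) |
|--------|-------------------|----------------------------------------|
| 3      | 12                | Point 0                                |
| 4      | 15                | Point 1                                |
| 5      | 10                | Point 0                                |
| 6      | 10                | Point 2                                |
| 8      | 10                | Point 5                                |
| 8      | 17                | Point 5                                |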
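The configurations listed above can be re-checked with a short script along these lines. It is only a sketch: the helper name points_missing_from_test is hypothetical, and it assumes the pooled samples are ordered contiguously by point (all samples of Point 0 first, then Point 1, and so on), which this PR does not confirm. If the real data is ordered differently, the reported points will differ.

```python
from sklearn.model_selection import train_test_split


def points_missing_from_test(n_points, samples_per_point, test_size=0.2, seed=42):
    """Return the point IDs that have no sample in the test set.

    Assumption: samples are laid out contiguously by point ID.
    """
    point_ids = [p for p in range(n_points) for _ in range(samples_per_point)]
    # Non-stratified split, mirroring the original call's test_size and seed.
    _, ids_test = train_test_split(
        point_ids, test_size=test_size, random_state=seed
    )
    present = {int(p) for p in ids_test}
    return [p for p in range(n_points) if p not in present]


if __name__ == "__main__":
    # Configurations reported as failing in this PR.
    for n_points, samples in [(3, 12), (4, 15), (5, 10), (6, 10), (8, 10), (8, 17)]:
        missing = points_missing_from_test(n_points, samples)
        print(f"{n_points} points x {samples} samples -> missing: {missing}")
```

An empty list means every point has at least one test sample for that configuration and seed.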

Reasoning: With the fixed random seed (42), the indices the sampler picks for the test set happen, at these total sample sizes, to skip the entire index range of the missing point. The test set therefore contains no samples for that point, so the backend returns predictions with that point's key missing, and the frontend crashes when it reads PrecisionSD from the undefined entry.
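To get a feel for how often a non-stratified split can miss a point, here is a rough estimate. It assumes the test set is a uniformly random subset of ceil(test_size × N) of the N pooled samples, so the number of test samples from one point follows a hypergeometric distribution. The function names are illustrative and are not part of gaze_tracker.py.

```python
from math import ceil, comb


def prob_point_missing(n_points, samples_per_point, test_size=0.2):
    """P(one specific point has no sample in the test set)."""
    n_total = n_points * samples_per_point
    n_test = ceil(test_size * n_total)
    # All n_test test samples must come from the other points.
    return comb(n_total - samples_per_point, n_test) / comb(n_total, n_test)


def prob_any_point_missing(n_points, samples_per_point, test_size=0.2):
    """P(at least one point has no sample in the test set), by inclusion-exclusion."""
    n_total = n_points * samples_per_point
    n_test = ceil(test_size * n_total)
    total = comb(n_total, n_test)
    prob = 0.0
    for k in range(1, n_points + 1):
        # Ways to draw the test set while avoiding k chosen points entirely.
        avoiding = comb(n_total - k * samples_per_point, n_test)
        prob += (-1) ** (k + 1) * comb(n_points, k) * avoiding / total
    return prob


if __name__ == "__main__":
    print(prob_point_missing(5, 10))
    print(prob_any_point_missing(5, 10))
```

For 5 points × 10 samples this works out to about 0.08 for one specific point and about 0.38 for any point, so under these assumptions a miss is not a rare event for small configurations.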

Solution and Reasoning

I implemented stratified splitting in gaze_tracker.py.

Why this fixes it:
By stratifying the split based on the target class (Point IDs), we ensure that every calibration point is represented in both the Training and Test sets, even for small sample sizes.

train_test_split(..., stratify=y)

This guarantees that the backend always returns complete metrics for all points, preventing the frontend crash.
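A minimal sketch of the stratified call on synthetic data is shown below. The helper split_by_point and the stand-in features and targets are hypothetical and only illustrate the idea; the actual variable names and data layout in gaze_tracker.py may differ.

```python
from collections import Counter

from sklearn.model_selection import train_test_split


def split_by_point(features, targets, point_ids, test_size=0.2, seed=42):
    """Split so that every calibration point appears in both train and test."""
    return train_test_split(
        features,
        targets,
        point_ids,
        test_size=test_size,
        random_state=seed,
        stratify=point_ids,  # keep per-point proportions in both subsets
    )


# Synthetic stand-in data: 5 points x 10 samples each.
n_points, samples_per_point = 5, 10
point_ids = [p for p in range(n_points) for _ in range(samples_per_point)]
features = [[float(i), float(i) * 0.5] for i in range(len(point_ids))]
targets = [float(p) * 100.0 for p in point_ids]

X_train, X_test, y_train, y_test, ids_train, ids_test = split_by_point(
    features, targets, point_ids
)

# Count how many test samples each point received.
test_counts = Counter(int(p) for p in ids_test)
print({p: test_counts[p] for p in range(n_points)})
```

Note that stratification has its own limits: train_test_split raises a ValueError if any point has fewer than 2 samples, or if the test set would be smaller than the number of points.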

Before and After

Before (Bug)

The application crashes on "FINISH".

before-fix-stratified.mov

After (Fix)

The application successfully processes data and redirects.

after-fix-stratified.mov

Closes #57

@Nirvanjha2004

Hi @thechillbasu, is there a specific reason you changed the variable name, or am I missing something?

PS: X_train_x was changed to X_X_train

@thechillbasu

Hey @Nirvanjha2004! There wasn't any specific reason. To be honest, it was an oversight on my end. Thank you so much for pointing it out. I will change it back to the original naming convention. Do you have any other feedback I could learn from?


Development

Successfully merging this pull request may close these issues.

🐛[Bug] : Calibration Failure with Low Sample Counts (e.g., 5x10)