Description
The `predict()` function currently builds the output dictionary with a row loop:
```python
for index, row in df_data.iterrows():
```
Inside the loop, it filters the entire dataframe again to collect the values for each (True X, True Y) pair:
```python
df_data[
    (df_data["True X"] == row["True X"]) &
    (df_data["True Y"] == row["True Y"])
]
```
This causes:
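- O(n²) time complexity: each of the n rows triggers a scan of all n rows
- Repeated work for duplicated (True X, True Y) combinations
- Significant slowdown on medium and large datasets
- Unnecessary memory allocations (new boolean masks and a new filtered dataframe on every iteration)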
Root Cause
The algorithm iterates row by row and performs a full-dataframe filter on every iteration, even though grouping by truth coordinates is already available and would partition the rows in a single pass.
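As a rough illustration, count only the element-wise equality comparisons in the filter. Each of the n iterations builds two length-n masks, so for a hypothetical dataset of n = 10,000 rows:

$$
\underbrace{n}_{\text{iterations}} \times \underbrace{2n}_{\text{comparisons per filter}} = 2n^2,
\qquad n = 10^4 \;\Rightarrow\; 2n^2 = 2 \times 10^8 .
$$

A single `groupby` pass hashes each row's key once, so the grouping step is on the order of n operations (about 10⁴ here), plus one visit per distinct group and, unless `sort=False` is passed, a sort of the distinct keys.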
Proposed Solution
Refactor the dictionary-building logic to:
- Group the data once with `df_data.groupby("True XY")`.
- Iterate over each group exactly once.
- From each group, extract:
  - Predicted X
  - Predicted Y
  - the precomputed precision_xy
  - the precomputed accuracy_xy
- Build the nested output dictionary directly from the groups.
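A minimal sketch of the grouped approach follows. The column names "Predicted X", "Predicted Y", "precision_xy" and "accuracy_xy" (with the metric values constant within each truth pair), the tuple-keyed output layout, and the function name `build_output_grouped` are all hypothetical. The sketch groups on the two coordinate columns; grouping on a combined "True XY" key, as proposed, should behave the same as long as that column encodes exactly the (True X, True Y) pair.

```python
import pandas as pd


def build_output_grouped(df_data: pd.DataFrame) -> dict:
    """Build the per-target dictionary with one pass over groups.

    Hypothetical schema: "True X", "True Y", "Predicted X", "Predicted Y",
    plus "precision_xy" / "accuracy_xy" holding values precomputed per
    (True X, True Y) pair, i.e. constant within each group.
    """
    output = {}
    # groupby partitions the rows once; no dataframe filtering inside the loop.
    # sort=False keeps groups in order of first appearance.
    for (true_x, true_y), group in df_data.groupby(["True X", "True Y"], sort=False):
        output[(true_x, true_y)] = {
            "Predicted X": group["Predicted X"].tolist(),
            "Predicted Y": group["Predicted Y"].tolist(),
            # Metrics are precomputed, so read the group's (constant) value.
            "PrecisionSD": group["precision_xy"].iloc[0],
            "Accuracy": group["accuracy_xy"].iloc[0],
        }
    return output
```

Rows inside each group keep their original order, and `sort=False` visits groups in order of first appearance, so the prediction lists and the dictionary's insertion order should match a row loop that assigns entries as it encounters each pair.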
Expected Benefits
- Reduce time complexity from O(n²) to roughly O(n)
- Eliminate duplicated computations
- Improve scalability on large datasets
- Cleaner, more maintainable code
Acceptance Criteria
- No `iterrows()` usage in this section.
- No dataframe filtering inside loops.
- Output structure remains identical.
- Metric values (PrecisionSD, Accuracy) are unchanged.
- Performance improvement is confirmed on larger datasets.
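To check the last three criteria, a small harness along these lines could compare the current and refactored dictionary builders on synthetic data. `make_synthetic_data` and `check_refactor` are hypothetical helpers, and the schema (columns "True X", "True Y", "Predicted X", "Predicted Y", "precision_xy", "accuracy_xy") is an assumption. The exact `==` comparison assumes the output holds plain Python containers and scalars; NumPy arrays inside the dictionary would need `np.array_equal` instead.

```python
import time

import numpy as np
import pandas as pd


def make_synthetic_data(n_rows: int, n_targets: int, seed: int = 0) -> pd.DataFrame:
    """Random dataset with repeated (True X, True Y) targets (hypothetical schema)."""
    rng = np.random.default_rng(seed)
    target_ids = rng.integers(0, n_targets, size=n_rows)
    # Map each target id to a unique (x, y) grid point.
    true_x = (target_ids % 10).astype(float)
    true_y = (target_ids // 10).astype(float)
    return pd.DataFrame({
        "True X": true_x,
        "True Y": true_y,
        "Predicted X": true_x + rng.normal(0.0, 0.5, n_rows),
        "Predicted Y": true_y + rng.normal(0.0, 0.5, n_rows),
        # Per-target metric values, repeated on every row of that target.
        "precision_xy": rng.random(n_targets)[target_ids],
        "accuracy_xy": rng.random(n_targets)[target_ids],
    })


def check_refactor(old_fn, new_fn, df_data: pd.DataFrame, repeats: int = 3):
    """Assert identical output, then return best-of-`repeats` timings (old, new)."""
    expected = old_fn(df_data)
    actual = new_fn(df_data)
    assert actual == expected, "refactored output differs from the original"

    def best_time(fn):
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            fn(df_data)
            timings.append(time.perf_counter() - start)
        return min(timings)

    return best_time(old_fn), best_time(new_fn)
```

For example, `check_refactor(old_builder, new_builder, make_synthetic_data(50_000, 400))`, with the two placeholder names standing in for the current and refactored builders, would fail on any output mismatch and otherwise return the two timings for the performance comparison.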