
Optimize predict() to remove O(n²) dataframe iteration and filtering #74

@midaa1

Description


The predict() function currently builds the output dictionary using:

for index, row in df_data.iterrows():

Inside the loop, it repeatedly filters the entire dataframe to collect values for each (True X, True Y) pair:

df_data[
    (df_data["True X"] == row["True X"]) &
    (df_data["True Y"] == row["True Y"])
]
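Put together, the hot path presumably looks like the sketch below; exactly what gets stored per pair is an assumption made for illustration:

# Sketch of the current O(n²) pattern, assembled from the fragments above.
# The value stored per (True X, True Y) pair is an assumption.
output = {}
for index, row in df_data.iterrows():        # O(n) rows
    matches = df_data[                        # O(n) filter per row -> O(n²) overall
        (df_data["True X"] == row["True X"]) &
        (df_data["True Y"] == row["True Y"])
    ]
    output[(row["True X"], row["True Y"])] = matches["Predicted X"].tolist()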

This causes:

  • O(n²) time complexity

  • Repeated work for duplicated (True X, True Y) combinations

  • Significant slowdown on medium/large datasets

  • Unnecessary memory allocations

Root Cause

The algorithm iterates row by row and re-filters the full dataframe on every iteration, even though the rows could be grouped by their truth coordinates once up front.

Proposed Solution

Refactor the dictionary-building logic to:

Group data once using:

df_data.groupby("True XY")

Iterate over each group only once.

Extract:

  • Predicted X

  • Predicted Y

  • Precomputed precision_xy

  • Precomputed accuracy_xy

Build the nested output dictionary directly from groups.
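A minimal sketch of that refactor, assuming the column names already used in this issue ("True X", "True Y", "Predicted X", "Predicted Y", plus precomputed precision_xy / accuracy_xy columns) and an output keyed by the truth pair; the key layout should be adapted to the real output structure:

results = {}
for (true_x, true_y), group in df_data.groupby(["True X", "True Y"]):
    results[(true_x, true_y)] = {
        "Predicted X": group["Predicted X"].tolist(),
        "Predicted Y": group["Predicted Y"].tolist(),
        # precision_xy / accuracy_xy are assumed to be precomputed per truth
        # pair, so any row of the group carries the group's value
        "PrecisionSD": group["precision_xy"].iloc[0],
        "Accuracy": group["accuracy_xy"].iloc[0],
    }

If a combined "True XY" column already exists, grouping on that single column works the same way and still yields one key per truth pair.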

Expected Benefits

  • Reduce complexity from O(n²) → O(n)

  • Eliminate duplicated computations

  • Improve scalability for large datasets

  • Cleaner and more maintainable code

Acceptance Criteria

  • No iterrows() usage in this section.

  • No dataframe filtering inside loops.

  • Output structure remains identical.

  • Metrics (PrecisionSD, Accuracy) unchanged.

  • Performance improvement confirmed on larger datasets.
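One rough way to confirm the last criterion is a timing comparison on synthetic data; sizes and values below are illustrative, not taken from the real dataset:

import time
import numpy as np
import pandas as pd

# Synthetic data with many repeated (True X, True Y) pairs.
n = 5_000
rng = np.random.default_rng(0)
df_data = pd.DataFrame({
    "True X": rng.integers(0, 50, n),
    "True Y": rng.integers(0, 50, n),
    "Predicted X": rng.normal(size=n),
    "Predicted Y": rng.normal(size=n),
})

# Old path: iterrows + full-dataframe filter per row (O(n²)).
t0 = time.perf_counter()
slow = {}
for _, row in df_data.iterrows():
    matches = df_data[
        (df_data["True X"] == row["True X"]) &
        (df_data["True Y"] == row["True Y"])
    ]
    slow[(row["True X"], row["True Y"])] = matches["Predicted X"].tolist()
print(f"iterrows + filter: {time.perf_counter() - t0:.2f}s")

# New path: single groupby pass.
t0 = time.perf_counter()
fast = {
    key: grp["Predicted X"].tolist()
    for key, grp in df_data.groupby(["True X", "True Y"])
}
print(f"groupby:           {time.perf_counter() - t0:.2f}s")

# Same set of truth pairs either way.
assert slow.keys() == fast.keys()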
