Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove grouping trick #205

Merged
merged 1 commit into from
Feb 3, 2024
Merged

Remove grouping trick #205

merged 1 commit into from
Feb 3, 2024

Conversation

dafeda
Copy link
Collaborator

@dafeda dafeda commented Jan 2, 2024

Resolves: #200

@dafeda dafeda marked this pull request as ready for review January 3, 2024 10:52
@dafeda dafeda self-assigned this Jan 3, 2024
@dafeda
Copy link
Collaborator Author

dafeda commented Jan 3, 2024

Ran with and without trick on a gaussian field with 1 million parameters, 50 observations and 25 realizations. The sigma=10 controls the smoothness of the field.
It runs faster without the trick.

    p = 1 mill, m = 50, N = 25, sigma = 10
    With trick:    1.8s, 1.7s, 1.9s (Mean)
    Without trick: 1.3s, 1.2s, 1.3s (Mean)

image

I also ran Drogon with and without the trick and it runs much faster without the trick
My conclusion for now is that the trick is faster when the field is very smooth which it does not seem to be in practice (Drogon).
So, I suggest we remove it.

@dafeda dafeda force-pushed the remove_grouping_trick branch from 52445e4 to a977d57 Compare January 3, 2024 10:55
@dafeda dafeda force-pushed the remove_grouping_trick branch 2 times, most recently from 9d672d3 to 5ad992d Compare January 3, 2024 11:31
continue
# Loop only over rows with significant correlations
for indices_of_row in np.where(significant_rows)[0]:
unique_row = significant_corr_XY[indices_of_row]

# Get the parameters (X) that have identical significant responses (Y)
X_subset = X[indices_of_row, :]
Copy link
Collaborator

@tommyod tommyod Jan 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since indices_of_row is now an int, this variable X_subset is now a 1D array. I am surprised that an error is not raised or that the code does not fail? Since X or a subset should be a 2D array.

Perhaps look at this quickly before merging. Also, perhaps rename

indices_of_row -> parameter_num
unique_row -> correlated_responses

or something like that. No other comments from me. Looks good.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have not looked into it deeply but I'm assuming broadcasting is helping us out in this: X_subset + cov_XY_subset @ T.
Thanks for the better variable names 👍

@dafeda dafeda force-pushed the remove_grouping_trick branch from 5ad992d to ca1b6e0 Compare January 4, 2024 07:37
@dafeda
Copy link
Collaborator Author

dafeda commented Jan 4, 2024

I want to run a final test using Drogon before merging, but we have some problems with bleeding.

@dafeda dafeda force-pushed the remove_grouping_trick branch from ca1b6e0 to fee5737 Compare February 2, 2024 13:02
It seems to only be faster when the Gaussian Field is very smooth,
which is not realistic for real fields.
@dafeda dafeda force-pushed the remove_grouping_trick branch from fee5737 to 124fd46 Compare February 2, 2024 13:16
@dafeda dafeda merged commit 92a0758 into equinor:main Feb 3, 2024
10 checks passed
@dafeda dafeda deleted the remove_grouping_trick branch February 3, 2024 11:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Benchmark to make sure groupby_indices is necessary
3 participants