Speed up large-index mapping by avoiding repeated unique and using fast integer deduplicator #33

Merged
cmutel merged 5 commits into brightway-lca:main from romainsacchi:main on Feb 27, 2026

@romainsacchi
Contributor

This PR fixes a performance bottleneck in matrix_utils when building large mapped matrices (especially technosphere).

In my large case (~4.46M index entries), almost all runtime was spent inside index deduplication for ArrayMapper, with np.unique taking hundreds of seconds on unsorted integer arrays.

What changed

  • ArrayMapper: For large integer arrays, use np.sort(pd.unique(array)) instead of plain np.unique(array). np.unique remains the fallback and is still used for smaller arrays.
  • MappedMatrix: Collect raw row/col indices from groups and let ArrayMapper deduplicate once.
  • Avoid redundant per-group unique calls before mapper construction.
  • ResourceGroup: Added raw mapping index accessors (row_indices_for_mapping, col_indices_for_mapping) used by MappedMatrix.
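The two ideas above (hash-based deduplication for large integer arrays, and deduplicating once over the concatenated raw indices instead of once per group) can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the names `sorted_unique`, `collect_and_deduplicate`, and `LARGE_ARRAY_THRESHOLD`, as well as the threshold value itself, are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical cutoff: the PR text does not state the actual threshold.
LARGE_ARRAY_THRESHOLD = 100_000


def sorted_unique(array):
    """Return the sorted unique values of a 1-D array.

    Hypothetical helper: large integer arrays go through pd.unique,
    everything else falls back to np.unique.
    """
    array = np.asarray(array)
    is_integer = np.issubdtype(array.dtype, np.integer)
    if is_integer and array.size >= LARGE_ARRAY_THRESHOLD:
        # pd.unique is hash-based and keeps order of first appearance,
        # so an explicit sort is needed to match np.unique's output.
        return np.sort(pd.unique(array))
    # np.unique already returns sorted unique values.
    return np.unique(array)


def collect_and_deduplicate(index_arrays):
    """Concatenate raw per-group indices and deduplicate a single time.

    Hypothetical stand-in for gathering raw row or column indices from
    each group before handing them to the mapper.
    """
    raw = np.concatenate(list(index_arrays))
    return sorted_unique(raw)
```

For example, `collect_and_deduplicate([np.array([3, 1]), np.array([1, 2])])` yields the same values as `np.unique` would on the concatenated array, while only running one deduplication pass regardless of how many groups there are.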

Result

On the same large technosphere case, build_tech_mm dropped from ~500 s to ~0.65 s.

No other changes are intended. This should not break anything, but I have not tested that thoroughly.

@cmutel
Member

cmutel commented Feb 26, 2026

It's crazy that using pandas is faster than numpy for unique, but this is apparently well known...

@cmutel
Member

cmutel commented Feb 26, 2026

@jsvgoncalves This PR proposes including pandas in the core Brightway calculation path, with significant calculation speed benefits. I know you have strong opinions on this, care to weigh in?

@romainsacchi
Contributor Author

There still seems to be a test failing (tests/monte_carlo.py::test_distributions_without_uncertainties), but it does not appear to be caused by the proposed changes.

@cmutel
Member

cmutel commented Feb 27, 2026

The failing test is probabilistic: it is expected to fail a small percentage of the time, or at least that's the way it currently works.

@cmutel merged commit ec89c2f into brightway-lca:main on Feb 27, 2026

3 participants