PE, RPE and related optimisations #926

shawnlaffan · 2024-03-11T02:02:53Z

No description provided.

Most of the time _calc_pe will have been called previously so the cache will exist. However there are occasions where this is not done, for example the range weighted phylo turnover calcs. That calc has also now been simplified.

shawnlaffan added 27 commits February 27, 2024 18:00

delete_params: avoid some looping

5c30781

Indices: optimise RPE2

36facdf

Use a global precalc to get all the inverse range scores, setting each to zero where the branch length is zero. This avoids many operations when summing the local range weights.
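The following is a minimal Python sketch of the precalc idea, not Biodiverse's Perl code. It assumes each branch has a length and a global range (the count of elements where the branch occurs). The names `precalc_inverse_ranges` and `sum_local_range_weights` are hypothetical. Zero-length branches are zeroed once up front, so the summation loop needs no per-branch length test.

```python
def precalc_inverse_ranges(branch_lengths, global_ranges):
    """Compute 1/global_range once per branch (hypothetical helper).

    Zero-length branches get 0.0 so later loops can skip any length test.
    """
    return {
        branch: 0.0 if branch_lengths[branch] == 0 else 1.0 / global_ranges[branch]
        for branch in branch_lengths
    }


def sum_local_range_weights(local_ranges, inverse_ranges):
    """Sum local_range / global_range over the branches present locally."""
    return sum(count * inverse_ranges[branch] for branch, count in local_ranges.items())


lengths = {"a": 1.0, "b": 0.0, "c": 2.0}
ranges = {"a": 4, "b": 2, "c": 1}
inv = precalc_inverse_ranges(lengths, ranges)
print(inv)  # {'a': 0.25, 'b': 0.0, 'c': 1.0}
print(sum_local_range_weights({"a": 2, "b": 1, "c": 1}, inv))  # 1.5
```

Because the map is global, it can be built once and reused for every neighbourhood and randomisation.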

simplify code a little

fcb8315

remove one variable

Indices: optimise calc_rpe2

fa72208

Use refaliasing and postfix for loops, and remove a Biodiverse::Utils call.

Indices: calc_phylo_rpe2: re-order some early exit conditions

6904a8f

Move comment, clarify it

ad97510

Indices: move PE_RANGELIST out of _calc_pe

36eda2f

This takes time to calculate and is generally not needed by most calcs that depend on _calc_pe. Instead we can calculate it within calc_pe_lists, which is the user-facing calc that provides it.

fix some index metadata

d025dc9

calc_phylo_rpe2: handle undef PE null score

44f0b56

The CANAPE tests were triggering undef warnings.

Indices metadata: calc_phylo_rpe2: remove dependency on calc_pe_lists

88734e1

calc_phylo_rpe2 has now been weaned off calc_pe_lists.

Indices get_path_lengths_to_root_node: populate the path length hash with first path

1bd027d

Reduces the work done in the xsub.
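The following is a rough Python sketch of seeding the path length hash from the first path. It is not the actual xsub. It assumes each label has a precomputed list of nodes ordered from tip to root, plus a node-to-length map. The function name and the data layout are hypothetical. The early `break` is an extra illustration of a property of trees: once a node is already in the hash, so is the rest of its path to the root. It is not necessarily what the xsub does.

```python
def get_path_lengths_to_root(labels, paths_to_root, lengths):
    """Union of branch lengths on the paths from each label to the root.

    paths_to_root maps a label to its node list, ordered tip to root.
    lengths maps each node to its branch length.
    """
    labels = list(labels)
    if not labels:
        return {}
    # Seed the hash directly from the first path: no membership tests needed.
    result = {node: lengths[node] for node in paths_to_root[labels[0]]}
    for label in labels[1:]:
        for node in paths_to_root[label]:
            if node in result:
                # Paths run tip to root, so the remainder is already present.
                break
            result[node] = lengths[node]
    return result


# Root R has children I and C; I has children A and B.
paths = {"A": ["A", "I", "R"], "B": ["B", "I", "R"], "C": ["C", "R"]}
lengths = {"A": 1.0, "B": 2.0, "C": 3.0, "I": 0.5, "R": 0.0}
shared = get_path_lengths_to_root(["A", "B"], paths, lengths)
print(sorted(shared.items()))  # [('A', 1.0), ('B', 2.0), ('I', 0.5), ('R', 0.0)]
```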

reduce calls to TreeNode->get_length

cac1d78

The Tree class has a method to get a hash of node lengths. This hash is cached, so later calls are very cheap. Using it avoids many repeated calls to the get_length method. Fast as each call is, they add up across randomisations.
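The following is a Python analogue of the caching pattern described above, not the Biodiverse Tree API. The `node_length_hash` property and both class layouts are hypothetical. `functools.cached_property` (Python 3.8+) stands in for the Perl-side cache. The point is that a hot loop does plain dict lookups instead of one method call per node.

```python
from functools import cached_property


class TreeNode:
    def __init__(self, name, length):
        self.name = name
        self._length = length

    def get_length(self):
        # Cheap, but each call still costs a method dispatch.
        return self._length


class Tree:
    def __init__(self, nodes):
        self.nodes = {node.name: node for node in nodes}

    @cached_property
    def node_length_hash(self):
        # Built on first access, then reused by every later caller.
        return {name: node.get_length() for name, node in self.nodes.items()}


tree = Tree([TreeNode("a", 1.0), TreeNode("b", 2.5)])
lengths = tree.node_length_hash
# Inside a hot loop (e.g. across randomisations), use plain dict lookups.
total = sum(lengths[name] for name in ("a", "b", "a"))
print(total)  # 4.5
```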

Indices _calc_abc_dispatcher: reorder some logic, use an empty array state var

1666094

Minor optimisation, but it avoids sub call costs. The state var should rarely be needed, but we might as well not create a new empty array every time.

Indices: generalise early return condition in _calc_phylo_abc_lists

d30dfea

We have already calculated the count of labels unique to set 2, so if that is zero we can return early.
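The following Python sketch shows why the early return is safe. It is not the Biodiverse implementation, and the function name and data layout are hypothetical. The reasoning is that a branch can only be unique to set 2 if it lies on a path from a label that set 1 lacks. If every set 2 label is also in set 1, every set 2 branch is shared, so the result is empty and the path walking can be skipped.

```python
def branches_unique_to_set2(labels1, labels2, paths_to_root):
    """Return the branches found only on root paths of set 2 labels."""
    unique2 = set(labels2) - set(labels1)
    if not unique2:
        # Every set 2 label is also in set 1, so all of its branches are
        # shared with set 1: nothing can be unique to set 2.
        return set()
    branches1 = {node for label in labels1 for node in paths_to_root[label]}
    branches2 = {node for label in labels2 for node in paths_to_root[label]}
    return branches2 - branches1


paths = {"A": ["A", "I", "R"], "B": ["B", "I", "R"], "C": ["C", "R"]}
print(branches_unique_to_set2(["A", "C"], ["A"], paths))  # set()
print(branches_unique_to_set2(["A"], ["B"], paths))  # {'B'}
```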

Indices: set _calc_phylo_abc_lists precalc to _calc_abc_any

9091eee

This way we take advantage of any of the calc_abc variants.

Indices: simplify use_wts logic in _calc_phylo_mpd_mntd

65ad9ad

Avoid the triple negative.

Indices _calc_pe: refactor weights calcs

73d62d6

We can calculate the weighted branch lengths from the local ranges and the globally weighted branch lengths. This avoids much repeated summation and so saves time.
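The following Python sketch compares the two approaches under stated assumptions. It assumes PE adds length / global_range once for every neighbourhood element in which a branch occurs, and that a branch's local range is the number of those elements. Under those assumptions the per-element summation collapses to one multiplication per branch. All function and variable names are hypothetical.

```python
def weights_by_summation(element_branches, lengths, global_ranges):
    """Naive form: add length / global_range once per element holding the branch."""
    weights = {}
    for branches in element_branches:
        for b in branches:
            weights[b] = weights.get(b, 0.0) + lengths[b] / global_ranges[b]
    return weights


def weights_from_local_ranges(local_ranges, global_weighted_lengths):
    """Refactored form: local range times the precalculated length / global_range."""
    return {b: r * global_weighted_lengths[b] for b, r in local_ranges.items()}


lengths = {"a": 2.0, "b": 1.0}
global_ranges = {"a": 4, "b": 2}
element_branches = [{"a", "b"}, {"a"}]  # two elements in the neighbourhood

# Local ranges: how many neighbourhood elements contain each branch.
local_ranges = {}
for branches in element_branches:
    for b in branches:
        local_ranges[b] = local_ranges.get(b, 0) + 1
# Global precalc, reusable across neighbourhoods.
global_weighted = {b: lengths[b] / global_ranges[b] for b in lengths}

naive = weights_by_summation(element_branches, lengths, global_ranges)
fast = weights_from_local_ranges(local_ranges, global_weighted)
print(naive == fast)
```

The refactored form does one multiplication per branch rather than one addition per branch per element, and the globally weighted lengths need only be computed once.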

Indices _calc_pe: micro-optimise

6bb3e92

The element count is needed many times so store it in a scalar instead of repeatedly getting it from the array.

Indices calc_phylo_rpe2: follow _calc_pe caching approach

da2e198

The central variant currently needs special handling, which will be removed in a future commit.

Indices calc_phylo_rpe_central: process internally

58a4ed5

Call calc_phylo_rpe2 where possible, otherwise process the branches ourselves. This moves the special case logic out of calc_phylo_rpe2.

Indices calc_phylo_rpe_central: less nesting

b0197f8

Indices: calc_phylo_rpe2: cleanup if branch and less nesting

5239765

The deleted branch was only needed by calc_phylo_rpe_central, and it now contains the same logic.

_calc_abc_any: micro-optimise

80b033e

Avoid list overheads. We only have three options anyway.

Indices: list results from _calc_pe are now in _calc_pe_lists

711289f

These have large time overheads in some cases but are not always needed. For example the user might only need the PE scores.

remove some redundant commented code

9231550

Tests: change precision approach in Cluster2

f04a819

shawnlaffan merged commit ddc901f into master Mar 11, 2024
8 checks passed

shawnlaffan deleted the rpe_2024 branch March 11, 2024 02:16