Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PE, RPE and related optimisations #926

Merged
merged 27 commits into from
Mar 11, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
5c30781
delete_params: avoid some looping
shawnlaffan Feb 27, 2024
36facdf
Indices: optimise RPE2
shawnlaffan Feb 27, 2024
fcb8315
simplify code a little
shawnlaffan Feb 27, 2024
fa72208
Indices: optimise calc_rpe2
shawnlaffan Feb 27, 2024
6904a8f
Indices: calc_phylo_rpe2: re-order some early exit conditions
shawnlaffan Feb 27, 2024
ad97510
Move comment, clarify it
shawnlaffan Feb 27, 2024
36eda2f
Indices: move PE_RANGELIST out of _calc_pe
shawnlaffan Feb 27, 2024
d025dc9
fix some index metadata
shawnlaffan Feb 27, 2024
44f0b56
calc_phylo_rpe2: handle undef PE null score
shawnlaffan Feb 27, 2024
88734e1
Indices metadata: calc_phylo_rpe2: remove dependency on calc_pe_lists
shawnlaffan Feb 27, 2024
1bd027d
Indices get_path_lengths_to_root_node: populate the path length hash …
shawnlaffan Feb 28, 2024
cac1d78
reduce calls to TreeNode->get_length
shawnlaffan Feb 28, 2024
1666094
Indices _calc_abc_dispatcher: reorder some logic, use an empty array …
shawnlaffan Feb 28, 2024
d30dfea
Indices: generalise early return condition in _calc_phylo_abc_lists
shawnlaffan Mar 4, 2024
9091eee
Indices: set _calc_phylo_abc_lists precalc to _calc_abc_any
shawnlaffan Mar 4, 2024
65ad9ad
Indices: simplify use_wts logic in _calc_phylo_mpd_mntd
shawnlaffan Mar 6, 2024
73d62d6
Indices _calc_pe: refactor weights calcs
shawnlaffan Mar 6, 2024
6bb3e92
Indices _calc_pe: micro-optimise
shawnlaffan Mar 6, 2024
da2e198
Indices calc_phylo_rpe2: follow _calc_pe caching approach
shawnlaffan Mar 7, 2024
58a4ed5
Indices calc_phylo_rpe_central: process internally
shawnlaffan Mar 7, 2024
b0197f8
Indices calc_phylo_rpe_central: less nesting
shawnlaffan Mar 7, 2024
5239765
Indices: calc_phylo_rpe2: cleanup if branch and less nesting
shawnlaffan Mar 7, 2024
80b033e
_calc_abc_any: micro-optimise
shawnlaffan Mar 8, 2024
711289f
Indices: list results from _calc_pe are now in _calc_pe_lists
shawnlaffan Mar 8, 2024
9231550
remove some redundant commented code
shawnlaffan Mar 8, 2024
f04a819
Tests: change precision approach in Cluster2
shawnlaffan Mar 8, 2024
113cd2f
Indices: _calc_pe_lists: call _calc_pe if needed to populate cache
shawnlaffan Mar 10, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 1 addition & 10 deletions lib/Biodiverse/Common.pm
Original file line number Diff line number Diff line change
Expand Up @@ -447,16 +447,7 @@ sub weaken_param {
sub delete_params {
my $self = shift;

my $count = 0;
foreach my $param (@_) { # should only delete those that exist...
if (delete $self->{PARAMS}{$param}) {
$count ++;
print "Deleted parameter $param from $self\n"
if $self->get_param('PARAM_CHANGE_WARN');
}
} # inefficient, as we could use a hash slice to do all in one hit, but allows better feedback

return $count;
scalar delete @{$self->{PARAMS}}{@_};
}

# an internal apocalyptic sub. use only for destroy methods
Expand Down
35 changes: 14 additions & 21 deletions lib/Biodiverse/Indices/Indices.pm
Original file line number Diff line number Diff line change
Expand Up @@ -1703,22 +1703,13 @@ sub _calc_abc_any {
my $self = shift;

my $cache_hash = $self->get_param('AS_RESULTS_FROM_LOCAL');
my $cache_key
= List::Util::first {defined $cache_hash->{$_}}
(qw/calc_abc calc_abc2 calc_abc3/);

# say STDERR 'NO previous cache key'
# if !$cache_key;
my $results = $cache_hash->{calc_abc}
// $cache_hash->{calc_abc2}
// $cache_hash->{calc_abc3}
// $self->calc_abc(@_);

# fall back to calc_abc if nothing had an explicit abc dependency
my $cached = $cache_key
? $cache_hash->{$cache_key}
: $self->calc_abc(@_);

croak 'No previous calc_abc results found'
if !$cached;

return wantarray ? %$cached : $cached;
return wantarray ? %$results : $results;
}

sub get_metadata_calc_abc {
Expand Down Expand Up @@ -1813,20 +1804,22 @@ sub _calc_abc_dispatcher {
// $args{label_list2}
);

state $empty_array = [];

return $self->_calc_abc_pairwise_mode(%args)
if $self->get_pairwise_mode
&& @{$args{element_list1} // []} == 1
&& @{$args{element_list2} // []} == 1
&& !$have_lb_lists;
if @{$args{element_list1} // $empty_array} == 1
&& @{$args{element_list2} // $empty_array} == 1
&& $self->get_pairwise_mode
&& !$have_lb_lists;

return $self->_calc_abc_hierarchical_mode(%args)
if $args{current_node_details}
&& !$have_lb_lists
&& $self->get_hierarchical_mode;
&& !$have_lb_lists
&& $self->get_hierarchical_mode;

return $self->_calc_abc(%args)
if is_hashref($args{element_list1})
|| @{$args{element_list1} // []} != 1
|| @{$args{element_list1} // $empty_array} != 1
|| defined $args{element_list2}
|| $have_lb_lists;

Expand Down
2 changes: 1 addition & 1 deletion lib/Biodiverse/Indices/PhyloCom.pm
Original file line number Diff line number Diff line change
Expand Up @@ -347,7 +347,7 @@ sub _calc_phylo_mpd_mntd {
# Save some cycles if all the weights are the same.
# If we ever implement dissim then we can also check label_hash2.
if ($use_wts && $label_hashrefs_are_same) {
if (not List::Util::any {$_ != 1} values %$label_hash1) {
if (List::Util::all {$_ == 1} values %$label_hash1) {
$use_wts = undef;
}
}
Expand Down
67 changes: 41 additions & 26 deletions lib/Biodiverse/Indices/Phylogenetic.pm
Original file line number Diff line number Diff line change
Expand Up @@ -664,10 +664,18 @@ sub get_path_lengths_to_root_node {
# but first option is faster still
my $len_hash = $tree_ref->get_node_length_hash;
if (HAVE_BD_UTILS_108) {
# get keys and vals in one call
Biodiverse::Utils::XS::add_hash_keys_and_vals_until_exists_AoA (
$path_hash, \@collected_paths, $len_hash,
);
# sometimes there are no paths
if (@collected_paths) {
# direct assign the first (could do first few, or the longest but that needs another scan)
my $first = shift @collected_paths;
@$path_hash{@$first} = @$len_hash{@$first};
if (@collected_paths) {
# get keys and vals in one call
Biodiverse::Utils::XS::add_hash_keys_and_vals_until_exists_AoA(
$path_hash, \@collected_paths, $len_hash,
);
}
}
}
elsif (HAVE_BD_UTILS) {
Biodiverse::Utils::copy_values_from ($path_hash, $len_hash);
Expand Down Expand Up @@ -772,7 +780,8 @@ sub get_metadata_calc_pe_lists {
name => 'Phylogenetic Endemism lists',
reference => 'Rosauer et al (2009) Mol. Ecol. https://doi.org/10.1111/j.1365-294X.2009.04311.x',
type => 'Phylogenetic Endemism Indices',
pre_calc => ['_calc_pe'],
pre_calc => [qw /_calc_pe_lists/],
pre_calc_global => [ 'get_node_range_hash' ],
uses_nbr_lists => 1,
distribution => 'nonnegative',
indices => {
Expand Down Expand Up @@ -800,6 +809,12 @@ sub calc_pe_lists {

my @keys = qw /PE_WTLIST PE_RANGELIST PE_LOCAL_RANGELIST/;
my %results = %args{@keys};
# PE_RANGELIST used to be done in _calc_pe
if (!$results{PE_RANGELIST}) {
\my %ranges = $args{node_range};
my %h = %ranges{keys %{$results{PE_WTLIST}}};
$results{PE_RANGELIST} = \%h;
}

return wantarray ? %results : \%results;
}
Expand Down Expand Up @@ -830,7 +845,7 @@ END_PEC_DESC
name => 'Phylogenetic Endemism central',
reference => 'Rosauer et al (2009) Mol. Ecol. https://doi.org/10.1111/j.1365-294X.2009.04311.x',
type => 'Phylogenetic Endemism Indices',
pre_calc => [qw /_calc_pe _calc_phylo_abc_lists/],
pre_calc => [qw /_calc_pe _calc_pe_lists _calc_phylo_abc_lists/],
pre_calc_global => [qw /get_trimmed_tree/],
uses_nbr_lists => 1, # how many lists it must have
formula => $formula,
Expand Down Expand Up @@ -888,7 +903,7 @@ END_PEC_DESC
name => 'Phylogenetic Endemism central lists',
reference => 'Rosauer et al (2009) Mol. Ecol. https://doi.org/10.1111/j.1365-294X.2009.04311.x',
type => 'Phylogenetic Endemism Indices',
pre_calc => [qw /_calc_pe _calc_phylo_abc_lists/],
pre_calc => [qw /_calc_pe calc_pe_lists _calc_phylo_abc_lists/],
uses_nbr_lists => 1, # how many lists it must have
distribution => 'nonnegative',
indices => {
Expand Down Expand Up @@ -1100,7 +1115,7 @@ sub get_metadata_calc_pe_clade_contributions {
name => 'PE clade contributions',
reference => '',
type => 'Phylogenetic Endemism Indices',
pre_calc => ['_calc_pe', 'get_sub_tree_as_hash'],
pre_calc => [qw/_calc_pe _calc_pe_lists get_sub_tree_as_hash/],
pre_calc_global => ['get_trimmed_tree'],
uses_nbr_lists => 1,
distribution => 'nonnegative', # default
Expand Down Expand Up @@ -1417,7 +1432,7 @@ EOD
name => 'Phylogenetic Endemism single',
reference => 'Rosauer et al (2009) Mol. Ecol. https://doi.org/10.1111/j.1365-294X.2009.04311.x',
type => 'Phylogenetic Endemism Indices',
pre_calc => ['_calc_pe'],
pre_calc => [qw/_calc_pe calc_pe_lists/],
pre_calc_global => ['get_trimmed_tree'],
uses_nbr_lists => 1,
indices => {
Expand Down Expand Up @@ -1768,14 +1783,14 @@ sub get_inverse_range_weighted_path_lengths {
my %args = @_;

my $tree = $args{tree_ref};
my $node_ranges = $args{node_range};

\my %node_ranges = $args{node_range};
\my %node_length_hash = $tree->get_node_length_hash;

my %range_weighted;

foreach my $node ($tree->get_node_refs) {
my $name = $node->get_name;
next if !$node_ranges->{$name};
$range_weighted{$name} = $node->get_length / $node_ranges->{$name};

foreach my $name (keys %node_length_hash) {
next if !$node_ranges{$name};
$range_weighted{$name} = $node_length_hash{$name} / $node_ranges{$name};
}

my %results = (inverse_range_weighted_node_lengths => \%range_weighted);
Expand Down Expand Up @@ -2653,7 +2668,7 @@ sub get_metadata__calc_phylo_abc_lists {
name => 'Phylogenetic ABC lists',
description => 'Calculate the sets of shared and not shared branches between two sets of labels',
type => 'Phylogenetic Indices',
pre_calc => 'calc_abc',
pre_calc => '_calc_abc_any',
pre_calc_global => [qw /get_trimmed_tree get_path_length_cache set_path_length_cache_by_group_flag/],
uses_nbr_lists => 1, # how many sets of lists it must have
required_args => {tree_ref => 1},
Expand All @@ -2679,7 +2694,9 @@ sub _calc_phylo_abc_lists {
el_list => $args{element_list1},
);

if (!@{$args{element_list2}}) {
# $args{C} is the count of labels unique to set 2
# so if it is zero then we can short-circuit.
if ($args{C} == 0) {
my $res = {
PHYLO_A_LIST => {},
PHYLO_B_LIST => $nodes_in_path1,
Expand All @@ -2688,14 +2705,12 @@ sub _calc_phylo_abc_lists {
return wantarray ? %$res : $res;
}

my $nodes_in_path2 = @{$args{element_list2}}
? $self->get_path_lengths_to_root_node (
%args,
labels => $label_hash2,
tree_ref => $tree,
el_list => $args{element_list2},
)
: {};
my $nodes_in_path2 = $self->get_path_lengths_to_root_node (
%args,
labels => $label_hash2,
tree_ref => $tree,
el_list => $args{element_list2},
);

my %results;
# one day we can clean this all up
Expand Down
Loading
Loading