Type summary function #67

dosumis · 2021-05-19T10:01:23Z

Aim: Given a list of Individuals or Classes, provide a summary of the classes present

from vfb_connect.cross_server_tools import VfbConnect
vc = VfbConnect()

visPN2DC = vc.get_connected_neurons_by_type(upstream_type='visual projection neuron',
                                            downstream_type='adult descending neuron',
                                            weight=10).sort_values('weight', ascending=False)

The histogram of returned types includes both leaf nodes and subsuming classes, e.g. LC4, LC14, LC9 and LC10 are all subsumed by 'lobula columnar neuron'.

In this case, all returned types are subclasses of 'visual projection neuron', so mapping up to that class would tell us nothing, but mapping up to a class below 'adult visual projection neuron' would be useful.
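A minimal sketch of what "mapping up" could look like, assuming the subclass links are already available locally. The `parents` dict, the `ancestors` and `roll_up` helpers, and the 'other type' label are all hypothetical illustrations on toy data; none of them are part of vfb_connect.

```python
from collections import Counter

# Hypothetical toy taxonomy: class label -> list of direct superclasses.
parents = {
    "LC4": ["lobula columnar neuron"],
    "LC9": ["lobula columnar neuron"],
    "LC10": ["lobula columnar neuron"],
    "lobula columnar neuron": ["adult visual projection neuron"],
    "other type": ["adult visual projection neuron"],  # made-up label
    "adult visual projection neuron": ["visual projection neuron"],
}

def ancestors(term, parents):
    """Return the term itself plus every class reachable through `parents`."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

def roll_up(types, targets, parents):
    """Count each type under the target classes that subsume it.

    Types that no target subsumes keep their own label.
    """
    counts = Counter()
    for t in types:
        hits = ancestors(t, parents) & set(targets)
        for label in (hits or {t}):
            counts[label] += 1
    return counts

observed = ["LC4", "LC4", "LC9", "LC10", "other type"]
summary = roll_up(observed, ["lobula columnar neuron"], parents)
print(summary)
```

Because the hierarchy allows multiple inheritance, a type with two target ancestors is counted once under each, so the totals can exceed the number of input rows.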

The problem, of course, is how to specify which classes to map up to. We could let users supply such classes, but I think that would be asking too much of them. Is there an algorithm we can apply that selects an informative, representative set of subsuming classes to map up to? Maybe something that takes a tuning variable specifying the degree of abstraction?

hkir-dev · 2021-05-19T13:15:42Z

This problem looks similar to the "where to cut" problem in hierarchical clustering. Our query results generate a sub-taxonomy. Thinking top-down, we should stop at a level that is neither too specific (over-fitting) nor too generic.
The problem is that we don't have a metric for evaluating whether a level of abstraction is sufficient, so the choice seems a little subjective.
I will try scipy hierarchical clustering to see whether it provides meaningful abstractions.
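For reference, a rough sketch of the "where to cut" idea using `scipy.cluster.hierarchy`. The one-dimensional `points` array is made-up data standing in for whatever distance between types we end up using; how to derive such distances from the taxonomy is an open question here, not something this sketch answers.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical feature values for five types: two well-separated groups.
points = np.array([[0.0], [0.1], [0.2], [5.0], [5.1]])

# Build the dendrogram with average linkage.
tree = linkage(points, method="average")

# Cutting the dendrogram at a distance threshold yields flat clusters.
# The threshold acts as a "degree of abstraction" knob.
fine = fcluster(tree, t=1.0, criterion="distance")
coarse = fcluster(tree, t=10.0, criterion="distance")

print(fine)    # cluster label per input row at the lower cut
print(coarse)  # cluster label per input row at the higher cut
```

A low threshold keeps the two groups apart, while a high one merges everything into a single cluster, which is the over-specific versus over-generic trade-off described above.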

dosumis · 2021-05-19T13:39:03Z

Agreed, this is underspecified, so we just need to play. Is it a potential problem for dendrogram-based approaches that our class hierarchy is multi-inheritance?

hkir-dev · 2021-05-19T14:11:58Z

Yes, I was thinking about the same issue. Alternatively, we can represent our multi-inheritance taxonomy as an adjacency matrix and apply a graph clustering approach (such as spectral clustering).
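A small sketch of that idea, assuming a toy set of (child, parent) edges with hypothetical labels. It builds a symmetric adjacency matrix and splits the graph in two using the sign of the Fiedler vector (the eigenvector of the second-smallest Laplacian eigenvalue). This is only the simplest two-way form of spectral clustering, written with numpy alone, and is not necessarily the variant that will be tried.

```python
import numpy as np

# Hypothetical (child, parent) edges; a1, a2, b1 and b2 each have two parents.
edges = [
    ("a1", "A"), ("a1", "A_alt"), ("a2", "A"), ("a2", "A_alt"),
    ("b1", "B"), ("b1", "B_alt"), ("b2", "B"), ("b2", "B_alt"),
    ("A", "root"), ("B", "root"),
]
nodes = sorted({name for edge in edges for name in edge})
index = {name: i for i, name in enumerate(nodes)}

# Undirected adjacency matrix: direction of the subclass link is ignored.
adjacency = np.zeros((len(nodes), len(nodes)))
for child, parent in edges:
    adjacency[index[child], index[parent]] = 1.0
    adjacency[index[parent], index[child]] = 1.0

# Unnormalised graph Laplacian L = D - A.
laplacian = np.diag(adjacency.sum(axis=1)) - adjacency

# eigh returns eigenvalues in ascending order; column 1 is the Fiedler vector.
eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
fiedler = eigenvectors[:, 1]

# Assign each node to a side by the sign of its Fiedler component.
# The shared root sits on the boundary (component close to 0),
# so its side is not meaningful.
side = {name: bool(fiedler[index[name]] > 0) for name in nodes}
print(side)
```

Multiple inheritance is not a problem for this representation, since a node with two parents simply has two entries in its row of the matrix.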

hkir-dev · 2021-05-24T08:23:04Z

Spectral clustering didn't provide the expected abstraction points.
Using the returned leaf nodes and subsuming classes, I rebuilt the sub-tree. After visualising the resulting taxonomy, I tried a heuristic approach. It recommends 'adult visual projection neuron' (FBbt_00048286) and 'lobula columnar neuron' (FBbt_00003870) as abstraction points.

I used three metrics:

node's out degree: higher is better
node's depth in the taxonomy tree: higher is better
node's descendant count: closer to the average number of descendants is better (not too low or too high)
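The three metrics above could be combined roughly as follows. This is an illustrative sketch on a hypothetical toy tree, not the code from the gist: the normalisation, the equal default weights, the choice to average descendants over all nodes, and the restriction to non-leaf candidates are all assumptions.

```python
from collections import deque

# Hypothetical toy taxonomy: class -> direct subclasses.
children = {
    "root": ["A", "B"],
    "A": ["a1", "a2", "a3", "a4"],
    "B": ["b1"],
}
root = "root"

def descendant_count(node):
    """Number of distinct classes below `node`."""
    seen = set()
    stack = list(children.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, []))
    return len(seen)

def depths_from(start):
    """Shortest distance from `start` to every reachable class (BFS)."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in depth:
                depth[child] = depth[current] + 1
                queue.append(child)
    return depth

depth = depths_from(root)
nodes = list(depth)
out_degree = {n: len(children.get(n, [])) for n in nodes}
n_desc = {n: descendant_count(n) for n in nodes}
mean_desc = sum(n_desc.values()) / len(nodes)
max_dev = max(abs(d - mean_desc) for d in n_desc.values()) or 1

def score(node, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three metrics, each scaled to the range 0..1."""
    w_out, w_depth, w_desc = weights
    return (w_out * out_degree[node] / max(out_degree.values())
            + w_depth * depth[node] / max(depth.values())
            + w_desc * (1 - abs(n_desc[node] - mean_desc) / max_dev))

# Only classes with subclasses can act as abstraction points.
candidates = [n for n in nodes if out_degree[n] > 0]
ranking = sorted(candidates, key=score, reverse=True)
print(ranking)
```

Changing the `weights` tuple is one way to expose the tuning variable discussed earlier, for example by weighting depth less to favour more general classes.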

After the get_connected_neurons_by_type query, tree construction and metric evaluation add 4.5 seconds of execution time.
The related code is in a gist. We can test the algorithm with further queries, fine-tune the metric weights or add new metrics.

dosumis assigned hkir-dev May 19, 2021
