Type summary function #67

Open
dosumis opened this issue May 19, 2021 · 4 comments

@dosumis
Member

dosumis commented May 19, 2021

Aim: given a list of individuals or classes, provide a summary of the classes present.

Example:

from vfb_connect.cross_server_tools import VfbConnect
vc = VfbConnect()

visPN2DC = vc.get_connected_neurons_by_type(
    upstream_type='visual projection neuron',
    downstream_type='adult descending neuron',
    weight=10,
).sort_values('weight', ascending=False)

A histogram of the types returned includes both leaf nodes and subsuming classes, e.g. LC4, LC14, LC9 and LC10 are all subsumed by 'lobula columnar neuron':

[image: histogram of returned types]

In this case, all cells are subclasses of 'visual projection neuron', so mapping up to that class would tell us nothing, but mapping up to a class below 'adult visual projection neuron' would be useful:

[image]

The problem, of course, is how to specify which classes we should map up to. We could allow user input of such classes, but I think that would be asking too much of our users. Is there an algorithm we can apply that selects some informative/representative set of subsuming classes to map up to? Maybe something that could take a tuning variable specifying the degree of abstraction?

@hkir-dev

This problem looks similar to the "where to cut" problem in hierarchical clustering. Our query results generate a sub-taxonomy. Thinking top-down, we should stop at a level that is neither too specific (over-fitting) nor too generic.
Our problem is that we don't have a metric for judging whether a level of abstraction is sufficient, so the choice seems a little subjective.
I will try SciPy hierarchical clustering to see whether it provides meaningful abstractions.
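As a rough illustration of the "where to cut" idea, here is a minimal sketch using SciPy's `linkage` and `fcluster`. The item names `type_x` and `type_y` and all pairwise distances are invented for illustration; nothing here comes from a real VFB query, and how to derive distances from the taxonomy is left open. The distance threshold plays the role of a "degree of abstraction" knob.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Toy items; the pairwise distances below are invented for illustration only.
labels = ["LC4", "LC9", "type_x", "type_y"]
# Condensed distance vector in SciPy's pair order:
# (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
condensed = np.array([1.0, 4.0, 4.0, 4.0, 4.0, 1.0])

# Build the dendrogram with average linkage.
tree = linkage(condensed, method="average")


def cut(tree, threshold):
    """Cut the dendrogram at a distance threshold; return the cluster count."""
    flat = fcluster(tree, t=threshold, criterion="distance")
    return len(set(flat))


n_fine = cut(tree, 0.5)    # low threshold: every item stays separate
n_mid = cut(tree, 2.0)     # intermediate threshold: close pairs merge
n_coarse = cut(tree, 5.0)  # high threshold: everything merges
```

A dendrogram built this way is strictly a tree, so it cannot directly represent a class with more than one parent.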

@dosumis
Member Author

dosumis commented May 19, 2021

Agreed, this is underspecified, so we just need to experiment. Is it a potential problem for dendrogram-based approaches that our class hierarchy is multi-inheritance?

@hkir-dev

Yes, I was thinking about the same issue. Alternatively, we can represent our multi-inheritance taxonomy as an adjacency matrix and apply a graph clustering approach (like spectral clustering).
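A minimal sketch of the adjacency-matrix representation, assuming the taxonomy is available as `(child, parent)` pairs. The node names are placeholders and `adjacency_matrix` is a hypothetical helper, not part of vfb_connect. The matrix is made symmetric on the assumption that it will be used as an undirected affinity matrix for a graph clustering routine.

```python
def adjacency_matrix(edges):
    """Build a symmetric 0/1 adjacency matrix from (child, parent) pairs.

    Returns the sorted node list and the matrix as a list of rows, so that
    matrix[i][j] == 1 when nodes[i] and nodes[j] share a subclass edge.
    """
    nodes = sorted({node for edge in edges for node in edge})
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for child, parent in edges:
        i, j = index[child], index[parent]
        matrix[i][j] = 1
        matrix[j][i] = 1  # undirected: direction of the subclass edge is dropped
    return nodes, matrix


# Placeholder taxonomy in which "C" has two parents (multi-inheritance).
toy_edges = [("A", "root"), ("B", "root"), ("C", "A"), ("C", "B")]
nodes, matrix = adjacency_matrix(toy_edges)
```

Unlike a dendrogram, this representation has no difficulty with a class that has several parents: "C" simply gets two non-zero entries in its row.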

@hkir-dev

Spectral clustering didn't provide the expected abstraction points.
Using the returned leaf nodes and subsuming classes, I rebuilt the sub-tree. After visualising the resulting taxonomy, I tried a heuristic approach. This approach recommends 'adult visual projection neuron' (FBbt_00048286) and 'lobula columnar neuron' (FBbt_00003870) as abstraction points.
[image: cut_off2]

I used three metrics:

  • node out-degree: a higher out-degree is better
  • node's depth in the taxonomy tree: a greater depth is better
  • node's descendant count: closer to the average number of descendants is better (not too low or too high)
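The three metrics above can be combined along these lines. This is only a sketch under stated assumptions, not the code from the gist: the min-max scaling, the equal default weights, the use of shortest-path depth, and the helper names are all my own choices, and the toy taxonomy includes a made-up node (`toy other type`) alongside class names from this thread.

```python
from collections import defaultdict, deque


def build_children(edges):
    """Map each parent to its direct children from (child, parent) pairs."""
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)
    return children


def depths_from(root, children):
    """Shortest-path depth of every node reachable from root (BFS)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in depth:
                depth[child] = depth[node] + 1
                queue.append(child)
    return depth


def descendants(node, children):
    """All nodes below node; the set handles multi-inheritance duplicates."""
    seen, stack = set(), list(children.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, ()))
    return seen


def scale(values):
    """Min-max scale a dict of numbers to the range [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    span = (hi - lo) or 1
    return {key: (value - lo) / span for key, value in values.items()}


def score_nodes(edges, root, weights=(1.0, 1.0, 1.0)):
    """Score internal nodes; assumes every node is reachable from root."""
    children = build_children(edges)
    depth = depths_from(root, children)
    internal = [node for node, kids in children.items() if kids]
    out_degree = {n: len(children[n]) for n in internal}
    n_desc = {n: len(descendants(n, children)) for n in internal}
    mean_desc = sum(n_desc.values()) / len(n_desc)
    # Negated distance from the mean, so "closer to average" scores higher.
    closeness = {n: -abs(count - mean_desc) for n, count in n_desc.items()}
    s_deg = scale(out_degree)
    s_depth = scale({n: depth[n] for n in internal})
    s_close = scale(closeness)
    w_deg, w_depth, w_close = weights
    return {n: w_deg * s_deg[n] + w_depth * s_depth[n] + w_close * s_close[n]
            for n in internal}


# Toy taxonomy as (child, parent) pairs; 'toy other type' is invented.
toy_edges = [
    ("adult visual projection neuron", "visual projection neuron"),
    ("lobula columnar neuron", "adult visual projection neuron"),
    ("toy other type", "adult visual projection neuron"),
    ("LC4", "lobula columnar neuron"),
    ("LC9", "lobula columnar neuron"),
    ("LC10", "lobula columnar neuron"),
    ("LC14", "lobula columnar neuron"),
]
scores = score_nodes(toy_edges, root="visual projection neuron")
ranked = sorted(scores, key=scores.get, reverse=True)
```

On this toy input the two highest-ranked nodes are 'lobula columnar neuron' and 'adult visual projection neuron', with the root ranked last. Changing `weights` shifts the balance between the three metrics, which is one way to expose a tunable degree of abstraction.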

On top of the get_connected_neurons_by_type query, tree construction and metric evaluation add 4.5 seconds of execution time.
The related code is in a gist. We can test the algorithm with further queries, fine-tune the metric weights or add new metrics.
