Adding more quantification metrics to connected components #64

LukasHats · 2024-12-19T12:51:05Z

LukasHats
Dec 19, 2024

As you can probably tell, I am obsessed with your method :D While playing around with the shape features, I realized that the connected components could hold additional value for users. I will try to explain my thoughts:

When looking at a tissue with neighborhoods displayed, people usually quantify the frequencies of cells belonging to specific neighborhoods and then try to summarize and compare these across cohorts. But I feel this is only part of the bigger picture.
When calculating the connected components, if I understood correctly, we basically get the cells that belong to one connected neighborhood component, so to speak a real connected neighborhood blob. As the component label is automatically stored in .obs, one could quantify the number or frequency of cells of a specific neighborhood that actually sit inside such a connected component. Could this serve as a measure of a neighborhood's tendency to form actual connected structures rather than showing dispersed behaviour?

I have already tried to calculate this with a first code draft (for my anndata format obviously):

import pandas as pd

def analyze_neighborhood_components(adata, neighborhood):
    # Collect the relevant per-cell annotations from adata.obs
    df = pd.DataFrame({
        'component': adata.obs['component'],
        'cellcharter_CN': adata.obs['cellcharter_CN'],
        'disease2': adata.obs['disease2'],
        'image_ID': adata.obs['image_ID']
    })

    # Keep only the cells assigned to the requested neighborhood
    neighborhood_df = df[df['cellcharter_CN'] == neighborhood].copy()

    # Summarize per image and disease group. observed=True skips empty
    # image/disease combinations when the grouping columns are categorical
    # (as .obs columns often are), which would otherwise yield 0-cell rows.
    analysis = (
        neighborhood_df.groupby(['image_ID', 'disease2'], observed=True)
        .agg({
            'component': [
                ('total_cells', 'size'),
                # Cells with a non-missing component label
                ('cells_in_components', lambda x: x.notna().sum()),
                # Number of distinct components of this neighborhood
                ('unique_components', lambda x: x.nunique(dropna=True)),
            ]
        })
        .reset_index()
    )

    # Percentage of the neighborhood's cells that lie inside a component
    analysis['percent_cells_in_components'] = (
        analysis[('component', 'cells_in_components')] /
        analysis[('component', 'total_cells')] * 100
    )

    # Flatten the two-level column names into plain strings
    analysis.columns = [
        col[0] if col[1] == '' else col[1]
        for col in analysis.columns
    ]

    return analysis

Although this goes in the direction of purity, I get different results.
Happy to hear your thoughts on this. Maybe you also have other ideas on how to use the connected components?

LukasHats · 2024-12-19T13:01:00Z

LukasHats
Dec 19, 2024
Author

Or, written more compactly:

import pandas as pd

def analyze_neighborhood_components(adata, neighborhood):
    return (
        pd.DataFrame({
            'component': adata.obs['component'],
            'cellcharter_CN': adata.obs['cellcharter_CN'],
            'disease2': adata.obs['disease2'],
            'image_ID': adata.obs['image_ID']
        })
        # Keep only the cells assigned to the requested neighborhood
        .query('cellcharter_CN == @neighborhood')
        # observed=True skips empty groups if the keys are categorical
        .groupby(['image_ID', 'disease2'], observed=True)['component']
        .agg(**{
            'total_cells': 'size',
            'cells_in_components': lambda x: x.notna().sum(),
            'unique_components': lambda x: x.nunique(dropna=True)
        })
        .assign(
            # Multiply by 100 so the value matches the "percent" column name
            percent_cells_in_components=lambda x: (
                x['cells_in_components'] / x['total_cells'] * 100
            )
        )
        .reset_index()
    )

LukasHats · 2024-12-19T13:40:14Z

LukasHats
Dec 19, 2024
Author

I feel it's also a kind of measure of how dispersed a neighborhood is. E.g., if I set min_cells=50 (I have smaller IMC images), I get a lot of connected components for some neighborhoods but not for others, meaning they seem to be distributed quite differently. I will also try to visualize this aspect somehow.
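
To make the dispersion idea concrete, here is a minimal, dependency-free sketch. It assumes each cell carries a neighborhood label and a component id, with None for cells outside any component (for example because their component was smaller than min_cells). The toy records and the helper name component_dispersion are hypothetical and not part of the package.

```python
from collections import Counter, defaultdict

# Toy per-cell records (hypothetical): (neighborhood label, component id or None).
# None marks a cell that was not assigned to any connected component.
cells = [
    ("CN1", 0), ("CN1", 0), ("CN1", 0), ("CN1", 0), ("CN1", None),
    ("CN2", 1), ("CN2", 1), ("CN2", 2), ("CN2", 2), ("CN2", 3),
    ("CN2", 3), ("CN2", None), ("CN2", None),
]

def component_dispersion(cells):
    """Summarize, per neighborhood, how many components exist and how large they are."""
    sizes = defaultdict(Counter)  # neighborhood -> {component id: number of cells}
    totals = Counter()            # neighborhood -> total number of cells
    for neighborhood, component in cells:
        totals[neighborhood] += 1
        if component is not None:
            sizes[neighborhood][component] += 1

    summary = {}
    for neighborhood, total in totals.items():
        comp_sizes = sorted(sizes[neighborhood].values(), reverse=True)
        in_components = sum(comp_sizes)
        summary[neighborhood] = {
            "n_components": len(comp_sizes),
            "largest_component": comp_sizes[0] if comp_sizes else 0,
            "mean_component_size": in_components / len(comp_sizes) if comp_sizes else 0.0,
            "fraction_in_components": in_components / total,
        }
    return summary

summary = component_dispersion(cells)
for neighborhood, stats in summary.items():
    print(neighborhood, stats)
```

In this toy data, CN1 forms one large component while CN2 splits into three small ones, even though a similar share of their cells sits inside components. Reporting the number and size of components next to the fraction may therefore help separate "one big blob" from "many small patches".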

marcovarrone · 2024-12-20T17:45:00Z

marcovarrone
Dec 20, 2024
Maintainer

Hi @LukasHats, first of all, there is no better compliment than what you wrote, so thank you very much :)

Yes, that's a very good point. In the past, I have measured something similar to the number of "components" for a niche/domain, but your approach is definitely better.
I also agree that it's related to purity in some way, but the two don't represent the same thing.

If you are up for it, after the holidays we can write a pull request together and decide whether to keep it as a separate measure or to find ways to combine it with purity to have a more comprehensive score.

LukasHats · 2024-12-21T08:36:13Z

LukasHats
Dec 21, 2024
Author

That sounds awesome @marcovarrone! As you can guess, I would love to contribute something. I already have an enhanced version of the function above that takes the component column from .obs and plots the fraction of cells from a neighborhood that are inside a connected component (per image, which also allows us to summarize across cohorts with one data point per image per neighborhood). We could call it cc.pl.frac_connected_components, cc.pl.neighborhood_connectivity or similar. Happy to open a first draft PR after the Christmas holidays!!
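
The "one data point per image per neighborhood" table could look roughly like the sketch below. It is only an illustration under assumptions: the toy obs table stands in for adata.obs, the column names follow the ones used earlier in the thread, and the helper fraction_in_components is a hypothetical name, not an existing function of the package.

```python
import pandas as pd

# Toy stand-in for adata.obs (hypothetical values); column names follow the thread.
obs = pd.DataFrame({
    "image_ID": ["img1"] * 4 + ["img2"] * 4,
    "disease2": ["A"] * 4 + ["B"] * 4,
    "cellcharter_CN": ["CN1", "CN1", "CN2", "CN2", "CN1", "CN1", "CN1", "CN2"],
    # Missing values mark cells that are not part of any connected component.
    "component": [0, 0, 1, None, None, None, 2, 3],
})

def fraction_in_components(obs, cluster_key="cellcharter_CN",
                           component_key="component",
                           group_keys=("image_ID", "disease2")):
    """Fraction of cells inside a component, per image and per neighborhood."""
    keys = list(group_keys) + [cluster_key]
    return (
        obs.assign(in_component=obs[component_key].notna())
        # observed=True avoids empty groups when the keys are categorical
        .groupby(keys, observed=True)["in_component"]
        .mean()  # the mean of a boolean column is the fraction of True values
        .rename("fraction_in_components")
        .reset_index()
    )

# One row per image and neighborhood
per_image = fraction_in_components(obs)

# Cohort-level summary: average the per-image fractions within each disease group
per_cohort = (
    per_image.groupby(["disease2", "cellcharter_CN"])["fraction_in_components"].mean()
)

print(per_image)
print(per_cohort)
```

The long-format per_image table has one row per image and neighborhood, so it could be handed to a box plot or strip plot grouped by disease2 without further reshaping.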

LukasHats · 2024-12-23T16:49:26Z

LukasHats
Dec 23, 2024
Author

@marcovarrone

How do you prefer to proceed? Would you open a new branch so that we can open the PR against it and work on this there? I already started to implement the plotting function in your shape.py module (see here). But I don't have an overview of the whole package, or of whether you want to implement tests etc. So let me know where to open the draft PR. Happy holidays!

marcovarrone · 2025-01-06T08:42:45Z

marcovarrone
Jan 6, 2025
Maintainer

Hi @LukasHats, sorry I was on holiday and decided not to check work stuff during that time :)

The best approach would be to create a pull request directly from the nhood_connectivity branch in the forked repository that you created.
In theory, there should be a way to allow the maintainers of the original repository (i.e., me) to make changes directly (as shown here).

2 replies

LukasHats Jan 6, 2025
Author

No worries, same for me! Hope you enjoyed your time off! Yes, that's what I planned to do; my question was rather which branch you want me to open the PR against. Your main?

marcovarrone Jan 6, 2025
Maintainer

Ah sorry, I misunderstood the question. Yes, main is OK!
