Updated draft for the volcanoplot function#167
Conversation
update with main
update with main
…ract some steps to helper functions and extend the docstring
update with main
There was a problem hiding this comment.
Pull request overview
This pull request introduces a volcano plot wrapper function and a general layered plotting system for alphapepttools. The implementation adds new data manipulation utilities and a flexible PlotConfig dataclass to support hierarchical scatterplot layering, where points can be colored and styled based on multiple overlapping criteria.
Changes:
- Added
layered_plotmethod to enable hierarchical plotting with automatic point assignment and deduplication - Added
volcanowrapper function that useslayered_plotinternally for differential expression visualization - Introduced
PlotConfigdataclass andmake_scatter_configfactory for flexible plot configuration - Added utility functions (
data_index_to_array,subset_data,_tolist) to support data manipulation - Enhanced scatter method with automatic limit calculation and renamed limit parameters from singular to plural
Reviewed changes
Copilot reviewed 4 out of 6 changed files in this pull request and generated 16 comments.
| File | Description |
|---|---|
| src/alphapepttools/pp/data.py | Added data utility functions for index extraction, subsetting, and list conversion; updated documentation |
| src/alphapepttools/pl/plots.py | Core implementation of layered plotting system, PlotConfig dataclass, volcano function, and helper utilities |
| src/alphapepttools/pl/init.py | Exported new PlotConfig and make_scatter_config for public API |
| docs/notebooks/studies/study_01_biomarker_csf.ipynb | Unintentional notebook output changes from re-execution |
Comments suppressed due to low confidence (1)
src/alphapepttools/pl/plots.py:1083
- The documentation still references the old parameter names 'xlim' and 'ylim', but the actual parameters have been renamed to 'xlims' and 'ylims' (plural). Update the documentation to match the actual parameter names.
xlim : tuple[float, float], optional
Limits for the x-axis. By default None.
ylim : tuple[float, float], optional
Limits for the y-axis. By default None.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #167 +/- ##
==========================================
- Coverage 73.61% 72.32% -1.30%
==========================================
Files 35 35
Lines 1823 1980 +157
==========================================
+ Hits 1342 1432 +90
- Misses 481 548 +67
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
I like the plots that are returned by the function a lot, they look really clean.
I commented on some specific things associated with the pull request.
I marked issues that were identified by Claude with [Claude] based on the following prompt:
You are a scientific software developer who is putting a lot of thought into user-friendliness of APIs etc. You are working on a software package for the analysis of proteomics data with anndata as data container. Critically review the changes in the current branch by comparing them to the main branch that aim to implement a convenience function for volcano plots
On a general note, I find the plots module increasingly confusing to review:
- the
Plotsclass does not provide a unifying config as its config is not used in any of the functions. This in combination with the@classmethodpattern means that individual functions would behave equivalently. This would also make the package more aligned with the other scverse packages (that use individual functions). - the number of utility functions + implicit contracts (e.g. expected behavior of configs) makes it complicated to understand the logic
| ----- | ||
| Uses pandas.isna() to handle both NaN and None values correctly. | ||
| """ | ||
| keep_mask = ~(pd.isna(x_values) | pd.isna(y_values)) |
There was a problem hiding this comment.
This action might lead to surprising results (imagine all data coords have a missing value and suddenly no point gets plotted) - should it emit a warning?
There was a problem hiding this comment.
I suppose this is the tradeoff, if a datapoint has a missing value in either the x or y axis, how would it be plottable at all? I am quite averse to imputing them with zeros, if there are nonstandard axis choices or only a single point is concerned, it could lead to false conclusions about points that have no actual values - I'd much rather handle this upstream with imputation.
As for warnings, this would then raise a warning if any point has a missing value, which could be seen as excessive/annoying.
The scenario where this causes (or rather exposes) a problem is when users expect to see e.g. a certain gene but it does not show up in the plot, which would likely cause them to search for it in the data where the missing values could be identified & addressed.
A pure discovery scenario would (in my opinion) not benefit from somehow showing/including points that have no values associated.
| >>> _get_plot_lims(values, 1.1, set_left=0) | ||
| (0, 3.3) | ||
| """ | ||
| series = pd.Series(values) |
There was a problem hiding this comment.
Why is this conversion necessary?
Wouldn't
max(abs(min(values)), abs(max(values)))
produce equivalent results?
| cls, | ||
| # Required data parameters | ||
| data: ad.AnnData | pd.DataFrame, | ||
| x_column: str, |
There was a problem hiding this comment.
could these values all default to the standardized output of the tl.diff_exp functions?
We would expect that the users just plug in our DEG results (and if they don't, they can change it)
There was a problem hiding this comment.
Nevermind, I see that the diff-exp outputs are not really standardized at the moment as they vary depending on which kinds of contrasts are passed (t_value__condition1_condition2). Should we update that?
There was a problem hiding this comment.
The diff exp outputs are standardized with respect to the necessary columns, but we keep the method-specific columns around as well in case users want them. The standardized columns are in tl.defaults.py, so we could set column defaults to those
| xlim: tuple[float, float] | None = None, | ||
| ylim: tuple[float, float] | None = None, | ||
| figure_kwargs: dict | None = None, | ||
| xlims: tuple[float, float] | None = None, |
There was a problem hiding this comment.
This is a breaking change - Is it necessary to rename the arguments in this PR?
| layers: list[tuple] | None = None, | ||
| color_dict: dict[str, str | tuple] | None = None, | ||
| # Volcano-specific thresholds | ||
| x_thresholds: tuple | None = (-1, 1), |
There was a problem hiding this comment.
I think the None option does not add value here
| max_labels: int | None = None, | ||
| x_label_anchors: list[float] | None = None, | ||
| y_display_start: float | None = 1, | ||
| y_padding_factor: float | None = 4, |
There was a problem hiding this comment.
Would drop None default
| display_id_column: str | None = None, | ||
| max_labels: int | None = None, | ||
| x_label_anchors: list[float] | None = None, | ||
| y_display_start: float | None = 1, |
There was a problem hiding this comment.
Would drop None default
| left = -abs_max * padding_factor | ||
| right = abs_max * padding_factor | ||
| else: | ||
| left = series.min() * padding_factor |
There was a problem hiding this comment.
[Claude] This assumes that the minimal value of series is negative (otherwise the limit is shifted to the right)
| data: ad.AnnData | pd.DataFrame | None = None | ||
| _extra: dict | None = None # Store additional fields | ||
|
|
||
| def __post_init__(self): |
There was a problem hiding this comment.
from dataclasses import dataclass, field
@dataclass
class Data:
extra: dict[str, int] = field(default_factory=dict)
"After archiving the initial volcanoplot branch, this is the new and improved volcanoplot branch"
This PR adds a volcanoplot wrapper
By popular and LLM demand, there should be a volcanoplot wrapper function in alphatools. As such, it should abstract the following functionalities:
Main difference to the previous PR
Previously, layering of plot features was handled by
volcano(), which created a suboptimal entanglement of general purpose plotting functionality (layering plots) and the specific application of a volcanoplot. Currently, two new methods + helpers have been added to the plots.py module:layered_plot: Calls a plotting function repeatedly for any number of specified data layers, ensuring that indices are only plotted once and that indices not used by any layer are plotted in a default color. It operates using aPlotConfiginstance and aCallablecapable of interpreting the parameters from thePlotConfigand layer-specific parameters. An example is added intutorials/tutorial_02_basic_plotting_workflow.ipynb.To avoid having to specify basic parameters for each layer, we can summarize them in a PlotConfig instance with flexible attributes to accomodate different plotting functions:
The main specification of 'layered_plot' is the
layerslist, which consists of tuples with 3-4 elements:color_dictfor that layer (here synonymous to the layer match values in 2.)The function is called like this on an
AnnDataorDataFrameobject:New
volcano():Using
layered_plotinternally, thevolcanofunction is drastically simplified and can be called as demonstrated intutorials/tutorial_04_volcanoplot.ipynb. It also incorporates labelling of different layers, which can be specified by theircolor_dictkey.volcanointernally constructs aPlotConfiginstance to handle default arguments, meaning that users do not have to interact withPlotConfig.As before, the first element refers to the column in data to be used for filtering, the second element specifies a match value or list for that column and the third provides the color lookup for that layer.
Generating the volcanoplot (with anchored labels):
To create something like this:

Open points:
label_plotmethod will takeAnnData/DataFrameinstances with specified columns instead of separate arrays.layered_plottheoretically allow for layering arbitraryCallableinstances (plotting functions) with arbitrary global and layer-specific arguments. Perhaps there is a good use case for that in future demonstration notebooks?