diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index 16f44a4..6386889 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -12,6 +12,16 @@ This repository contains two distinct Python packages for biomedical analysis: **Key frameworks**: NumPy/SciPy for numerical computing, Matplotlib for plotting, Gurobi for optimization (KoMbine only) **Target runtime**: Python 3.11+ (tested with 3.12) +## Code Style Guidelines + +**CRITICAL: Indentation and Formatting** +- **Use 2-space indentation** for all Python code (not 4 spaces or tabs) +- **No trailing whitespace** - remove all trailing spaces at the end of lines +- **Always run pylint after editing** and fix any errors before committing +- Run `python -m pylint ` to check for issues +- Nontrivial errors that would require major refactoring can be ignored via `# pylint: disable=error-name` comments +- Target pylint score: 10.00/10 + ## Package Structure ### Core Packages @@ -20,7 +30,7 @@ This repository contains two distinct Python packages for biomedical analysis: - `discrete_base.py`, `continuous_distributions.py` - Supporting classes - `command_line_interface.py` - ROC-specific CLI functions - `datacard.py` - Re-exports Datacard from kombine for compatibility - + - **`kombine/`**: Kaplan-Meier analysis functionality - `kaplan_meier*.py` - KM curve and likelihood methods - `discrete_optimization.py`, `utilities.py` - Optimization utilities (used by KM methods) @@ -32,11 +42,26 @@ This repository contains two distinct Python packages for biomedical analysis: - **`test/kombine/`**: KoMbine tests, datacards, and reference data - **Shared files**: `utility_testing_functions.py`, `test_continuous_distributions.py` (copied to both) -### Documentation Structure +### Documentation Structure - **`docs/roc_picker/`**: ROC Picker documentation, LaTeX files, and plotting scripts - **`docs/kombine/`**: KoMbine documentation, LaTeX files, and plotting scripts - Each has independent numbering starting from 01, with separate `compile_*_plots.sh` scripts +**KoMbine documentation files**: +- `01_table_of_contents.md` - Index of all documentation files (synced with Jupytext) +- `02_kombine.tex` - LaTeX paper with mathematical details (JSS submission) +- `03_kaplan_meier_example.md` - Jupyter notebook showing Python API usage examples +- `04_compare_to_lifelines.md` - Jupyter notebook comparing to `lifelines` package +- `05_command_line_interface.md` - **Pure Markdown** (no Python cells) documenting all CLI options for `kombine` and `kombine_twogroups` commands (synced with Jupytext) + +**Documentation style guidelines**: +- Files `03_*.md` and `04_*.md` are Jupytext notebooks with Python cells for interactive examples +- Files `01_*.md` and `05_*.md` are pure Markdown with Jupytext headers but no Python cells +- All documentation markdown files must have Jupytext headers to allow `jupytext --sync` to process them +- All CLI options must be documented in `05_command_line_interface.md` and verified by `test/kombine/test_documentation.py` +- When adding new CLI arguments, update `05_command_line_interface.md` and run the documentation test +- The table of contents (`01_table_of_contents.md`) must list all numbered documentation files + ## Critical Build Information ### Installation and Environment Setup diff --git a/.github/workflows/ci-cd.yml b/.github/workflows/ci-cd.yml index 63c0a6a..5f34323 100644 --- a/.github/workflows/ci-cd.yml +++ b/.github/workflows/ci-cd.yml @@ -150,6 +150,14 @@ jobs: run: | python -m test.kombine.test_km_likelihood + - name: KM plotting tests + run: | + python -m test.kombine.test_km_plotting + + - name: Documentation tests + run: | + python -m test.kombine.test_documentation + - name: convert KoMbine notebooks run: | jupytext --sync docs/kombine/*.md diff --git a/docs/kombine/01_table_of_contents.md b/docs/kombine/01_table_of_contents.md index 7e10ad5..221631f 100644 --- a/docs/kombine/01_table_of_contents.md +++ b/docs/kombine/01_table_of_contents.md @@ -22,6 +22,8 @@ jupyter: - an example of how to use the likelihood method for uncertainties on Kaplan-Meier curves - `04_compare_to_lifelines.html` - a comparison of our Kaplan-Meier likelihood method to the `lifelines` package + - `05_command_line_interface.html` + - documentation for the command line interface (`kombine` and `kombine_twogroups` commands) # Compilation instructions diff --git a/docs/kombine/05_command_line_interface.md b/docs/kombine/05_command_line_interface.md new file mode 100644 index 0000000..7ae1e34 --- /dev/null +++ b/docs/kombine/05_command_line_interface.md @@ -0,0 +1,168 @@ +--- +jupyter: + jupytext: + cell_metadata_filter: -all + formats: ipynb,md,py + main_language: python + text_representation: + extension: .md + format_name: markdown + format_version: '1.3' + jupytext_version: 1.18.1 +--- + +# KoMbine Command Line Interface + +This document describes the command line interface for KoMbine's Kaplan-Meier likelihood methods. + +## Commands + +KoMbine provides two command line tools: + +- `kombine` - Generate Kaplan-Meier likelihood plots for a single group +- `kombine_twogroups` - Generate Kaplan-Meier likelihood plots comparing two groups (high vs. low parameter values) + +## Common Arguments + +### Required Arguments + +Both commands require these positional arguments: + +- `datacard` - Path to the datacard file containing patient data +- `output_file` - Path to save the output plot (PDF format) + +### Error Band Options + +Control which error bands are displayed in the plot: + +- `--include-binomial-only` - Include error bands for the binomial error alone (mutually exclusive with `--include-patient-wise-only`) +- `--include-patient-wise-only` - Include error bands for the patient-wise error alone (mutually exclusive with `--include-binomial-only`) +- `--include-exponential-greenwood` - Include the binomial-only exponential Greenwood error band in the plot +- `--exclude-full-nll` - Exclude the full NLL (negative log-likelihood) from the plot +- `--exclude-nominal` - Exclude the nominal line from the plot + +### Plot Appearance Options + +Customize the visual appearance of the plot: + +- `--color {blue,red,green,purple,orange,teal,brown,pink}` - Color scheme for the plot (default: blue) [kombine only] +- `--high-color {blue,red,green,purple,orange,teal,brown,pink}` - Color scheme for the high group (default: blue) [kombine_twogroups only] +- `--low-color {blue,red,green,purple,orange,teal,brown,pink}` - Color scheme for the low group (default: red) [kombine_twogroups only] +- `--figsize WIDTH HEIGHT` - Figure size in inches (default: 10 7) +- `--no-tight-layout` - Do not use tight layout for the plot +- `--xmax XMAX` - Maximum time for x-axis range. Limits the plot to [0, xmax] + +### Legend and Labels + +Customize text elements in the plot: + +- `--title TITLE` - Title for the plot (default: "Kaplan-Meier Curves") +- `--xlabel XLABEL` - Label for the x-axis (default: "Time") +- `--ylabel YLABEL` - Label for the y-axis (default: "Survival Probability") +- `--legend-loc LEGEND_LOC` - Location of the legend in the plot (mutually exclusive with `--legend-saveas`) +- `--legend-saveas LEGEND_SAVEAS` - Path to save the legend separately. If provided, the legend will be left off the main plot (mutually exclusive with `--legend-loc`) +- `--include-median-survival` - Include the median survival time in the legend +- `--no-n-in-legend` - Do not include number of patients in each group in the legend labels [kombine_twogroups only] + +### Legend Suffixes + +Customize suffixes added to legend labels: + +- `--patient-wise-only-suffix PATIENT_WISE_ONLY_SUFFIX` - Suffix to add to the patient-wise-only label in the legend (default: "Patient-wise only") +- `--binomial-only-suffix BINOMIAL_ONLY_SUFFIX` - Suffix to add to the binomial-only label in the legend (default: "Binomial only") +- `--full-nll-suffix FULL_NLL_SUFFIX` - Suffix to add to the full NLL label in the legend (default: "") +- `--exponential-greenwood-suffix EXPONENTIAL_GREENWOOD_SUFFIX` - Suffix to add to the exponential Greenwood label in the legend (default: "Binomial only, exp. Greenwood") + +### Font Sizes + +Adjust font sizes for various text elements: + +- `--legend-fontsize LEGEND_FONTSIZE` - Font size for legend text (default: 10) +- `--label-fontsize LABEL_FONTSIZE` - Font size for axis labels (default: 12) +- `--title-fontsize TITLE_FONTSIZE` - Font size for the plot title (default: 14) +- `--tick-fontsize TICK_FONTSIZE` - Font size for the tick labels (default: 10) +- `--pvalue-fontsize PVALUE_FONTSIZE` - Font size for p value text (default: 12) [kombine_twogroups only] + +### Computational Options + +Control the likelihood calculation: + +- `--parameter-min PARAMETER_MIN` - Minimum parameter value for filtering patients +- `--parameter-max PARAMETER_MAX` - Maximum parameter value for filtering patients +- `--endpoint-epsilon ENDPOINT_EPSILON` - Endpoint epsilon for the likelihood calculation (default: 1e-6) +- `--log-zero-epsilon LOG_ZERO_EPSILON` - Log zero epsilon for the likelihood calculation (default: 1e-10) +- `--dont-collapse-consecutive-deaths` - Disable collapsing of consecutive death times with no intervening censoring (slower but may be more accurate) +- `--print-progress` - Print progress messages during the computation + +## kombine_twogroups Specific Arguments + +These arguments are specific to the `kombine_twogroups` command: + +### Group Separation + +- `--parameter-threshold PARAMETER_THRESHOLD` - **Required.** The parameter threshold for separating high and low groups + +### P-value Options + +Control p-value calculation and display: + +- `--exclude-p-value` - Exclude p value calculation and display from the plot +- `--include-logrank-pvalue` - Include conventional logrank p value for comparison with likelihood method +- `--p-value-tie-handling {breslow}` - Method for handling ties in p value calculation (default: breslow) +- `--pvalue-format PVALUE_FORMAT` - Format string for p value display (default: '.3g') + +## Examples + +### Basic single group plot + +``` +kombine datacard.txt output.pdf +``` + +### Single group with custom color and x-axis limit + +``` +kombine datacard.txt output.pdf --color green --xmax 50.0 +``` + +### Single group with patient-wise error only + +``` +kombine datacard.txt output.pdf --include-patient-wise-only +``` + +### Two group comparison + +``` +kombine_twogroups datacard.txt output.pdf --parameter-threshold 0.45 +``` + +### Two groups with custom colors + +``` +kombine_twogroups datacard.txt output.pdf --parameter-threshold 0.45 --high-color purple --low-color orange +``` + +### Two groups with custom parameter range and x-axis limit + +``` +kombine_twogroups datacard.txt output.pdf \ + --parameter-threshold 0.45 \ + --parameter-min 0.0 \ + --parameter-max 1.0 \ + --xmax 100.0 +``` + +### Advanced: Custom plot appearance + +``` +kombine datacard.txt output.pdf \ + --color teal \ + --xmax 75.0 \ + --title "Survival Analysis" \ + --xlabel "Time (months)" \ + --ylabel "Survival Rate" \ + --figsize 12 8 \ + --legend-fontsize 12 \ + --label-fontsize 14 +``` diff --git a/kombine/command_line_interface.py b/kombine/command_line_interface.py index 52f44ca..f74f1e5 100644 --- a/kombine/command_line_interface.py +++ b/kombine/command_line_interface.py @@ -14,6 +14,65 @@ from .utilities import LOG_ZERO_EPSILON_DEFAULT +# Color scheme definitions for plots +COLOR_SCHEMES = { + 'blue': { + 'main': 'blue', + 'shades': ['dodgerblue', 'skyblue'], + }, + 'red': { + 'main': 'red', + 'shades': ['orangered', 'lightcoral'], + }, + 'green': { + 'main': 'green', + 'shades': ['mediumseagreen', 'lightgreen'], + }, + 'purple': { + 'main': 'purple', + 'shades': ['mediumpurple', 'plum'], + }, + 'orange': { + 'main': 'orange', + 'shades': ['darkorange', 'lightsalmon'], + }, + 'teal': { + 'main': 'teal', + 'shades': ['darkturquoise', 'paleturquoise'], + }, + 'brown': { + 'main': 'saddlebrown', + 'shades': ['peru', 'tan'], + }, + 'pink': { + 'main': 'deeppink', + 'shades': ['hotpink', 'lightpink'], + }, +} + + +def _get_color_scheme(color_name: str) -> dict: + """ + Get the color scheme for a given color name. + + Parameters + ---------- + color_name : str + Name of the color scheme (e.g., 'blue', 'red', 'green'). + + Returns + ------- + dict + Dictionary with 'main' color and 'shades' list. + """ + if color_name not in COLOR_SCHEMES: + raise ValueError( + f"Invalid color '{color_name}'. " + f"Available colors: {', '.join(COLOR_SCHEMES.keys())}" + ) + return COLOR_SCHEMES[color_name] + + def _make_common_parser(description: str) -> argparse.ArgumentParser: # pylint: disable=line-too-long parser = argparse.ArgumentParser(description=description) @@ -29,6 +88,7 @@ def _make_common_parser(description: str) -> argparse.ArgumentParser: parser.add_argument("--print-progress", action="store_true", dest="print_progress", help="Print progress messages during the computation.") parser.add_argument("--endpoint-epsilon", type=float, dest="endpoint_epsilon", default=1e-6, help="Endpoint epsilon for the likelihood calculation.") parser.add_argument("--log-zero-epsilon", type=float, dest="log_zero_epsilon", default=LOG_ZERO_EPSILON_DEFAULT, help="Log zero epsilon for the likelihood calculation.") + parser.add_argument("--xmax", type=float, dest="xmax", default=None, help="Maximum time for x-axis range. Limits the plot to [0, xmax].") parser.add_argument("--figsize", nargs=2, type=float, metavar=("WIDTH", "HEIGHT"), help="Figure size in inches.", default=KaplanMeierPlotConfig.figsize) parser.add_argument("--no-tight-layout", action="store_false", help="Do not use tight layout for the plot.", default=True, dest="tight_layout") parser.add_argument("--legend-fontsize", type=float, help="Font size for legend text.", default=KaplanMeierPlotConfig.legend_fontsize) @@ -70,6 +130,7 @@ def _extract_common_plot_config_args(args: argparse.Namespace) -> dict: "include_nominal": args.__dict__.pop("include_nominal"), "include_median_survival": args.__dict__.pop("include_median_survival"), "print_progress": args.__dict__.pop("print_progress"), + "xmax": args.__dict__.pop("xmax"), "figsize": args.__dict__.pop("figsize"), "tight_layout": args.__dict__.pop("tight_layout"), "legend_fontsize": args.__dict__.pop("legend_fontsize"), @@ -88,16 +149,30 @@ def _extract_common_plot_config_args(args: argparse.Namespace) -> dict: } -def plot_km_likelihood(): +def _make_kombine_parser() -> argparse.ArgumentParser: """ - Run Kaplan-Meier likelihood method from a datacard. + Create the argument parser for the kombine command. + + Returns + ------- + argparse.ArgumentParser + Configured argument parser for kombine. """ # pylint: disable=line-too-long parser = _make_common_parser("Run Kaplan-Meier likelihood method from a datacard.") parser.add_argument("--parameter-min", type=float, dest="parameter_min", default=-np.inf) parser.add_argument("--parameter-max", type=float, dest="parameter_max", default=np.inf) parser.add_argument("--dont-collapse-consecutive-deaths", action="store_true", dest="dont_collapse_consecutive_deaths", help="Disable collapsing of consecutive death times with no intervening censoring (slower but may be more accurate)") + parser.add_argument("--color", type=str, dest="color", default=None, choices=list(COLOR_SCHEMES.keys()), help=f"Color scheme for the plot. Options: {', '.join(COLOR_SCHEMES.keys())}. Default is blue for single plots.") # pylint: enable=line-too-long + return parser + + +def plot_km_likelihood(): + """ + Run Kaplan-Meier likelihood method from a datacard. + """ + parser = _make_kombine_parser() args = parser.parse_args() _validate_plot_args(args, parser) @@ -110,10 +185,18 @@ def plot_km_likelihood(): collapse_consecutive_deaths=not args.__dict__.pop("dont_collapse_consecutive_deaths"), ) + # Handle color scheme + color = args.__dict__.pop("color") + if color is None: + color = 'blue' # Default color + color_scheme = _get_color_scheme(color) + plot_config = KaplanMeierPlotConfig( **_extract_common_plot_config_args(args), saveas=args.__dict__.pop("output_file"), legend_saveas=args.__dict__.pop("legend_saveas"), + best_color=color_scheme['main'], + CL_colors=color_scheme['shades'], ) kml.plot(config=plot_config) @@ -122,23 +205,39 @@ def plot_km_likelihood(): raise ValueError(f"Unused arguments: {args.__dict__}") -def plot_km_likelihood_two_groups(): # pylint: disable=too-many-locals, too-many-statements +def _make_kombine_twogroups_parser() -> argparse.ArgumentParser: """ - Run Kaplan-Meier likelihood method from a datacard, and plot Kaplan-Meier - curves for two groups separated into high and low values of the parameter. + Create the argument parser for the kombine_twogroups command. + + Returns + ------- + argparse.ArgumentParser + Configured argument parser for kombine_twogroups. """ # pylint: disable=line-too-long parser = _make_common_parser("Run Kaplan-Meier likelihood method from a datacard, and plot Kaplan-Meier curves for two groups separated into high and low values of the parameter.") parser.add_argument("--parameter-threshold", type=float, dest="parameter_threshold", required=True, help="The parameter threshold for separating high and low groups.") parser.add_argument("--parameter-min", type=float, dest="parameter_min", default=-np.inf, help="The minimum parameter value for the low group.") parser.add_argument("--parameter-max", type=float, dest="parameter_max", default=np.inf, help="The maximum parameter value for the high group.") + parser.add_argument("--high-color", type=str, dest="high_color", default=None, choices=list(COLOR_SCHEMES.keys()), help=f"Color scheme for the high group. Options: {', '.join(COLOR_SCHEMES.keys())}. Default is blue.") + parser.add_argument("--low-color", type=str, dest="low_color", default=None, choices=list(COLOR_SCHEMES.keys()), help=f"Color scheme for the low group. Options: {', '.join(COLOR_SCHEMES.keys())}. Default is red.") parser.add_argument("--exclude-p-value", action="store_false", dest="include_p_value", default=True, help="Exclude p value calculation and display from the plot.") parser.add_argument("--include-logrank-pvalue", action="store_true", dest="include_logrank_pvalue", help="Include conventional logrank p value for comparison with likelihood method.") parser.add_argument("--p-value-tie-handling", type=str, dest="p_value_tie_handling", choices=["breslow"], default="breslow", help="Method for handling ties in p value calculation: currently only option is 'breslow' (Breslow approximation).") parser.add_argument("--pvalue-fontsize", type=float, dest="pvalue_fontsize", default=KaplanMeierPlotConfig.pvalue_fontsize, help="Font size for p value text.") parser.add_argument("--pvalue-format", type=str, dest="pvalue_format", default=KaplanMeierPlotConfig.pvalue_format, help="Format string for p value display (e.g., '.3g', '.2f').") parser.add_argument("--dont-collapse-consecutive-deaths", action="store_true", dest="dont_collapse_consecutive_deaths", help="Disable collapsing of consecutive death times with no intervening censoring (slower but may be more accurate)") + parser.add_argument("--no-n-in-legend", action="store_false", dest="n_in_legend", help="Do not include number of patients in each group in the legend labels.", default=True) # pylint: enable=line-too-long + return parser + + +def plot_km_likelihood_two_groups(): # pylint: disable=too-many-locals, too-many-statements + """ + Run Kaplan-Meier likelihood method from a datacard, and plot Kaplan-Meier + curves for two groups separated into high and low values of the parameter. + """ + parser = _make_kombine_twogroups_parser() args = parser.parse_args() _validate_plot_args(args, parser) @@ -154,6 +253,7 @@ def plot_km_likelihood_two_groups(): # pylint: disable=too-many-locals, too-many p_value_tie_handling = args.__dict__.pop("p_value_tie_handling") pvalue_fontsize = args.__dict__.pop("pvalue_fontsize") pvalue_format = args.__dict__.pop("pvalue_format") + n_in_legend = args.__dict__.pop("n_in_legend") kml_low = datacard.km_likelihood( parameter_min=parameter_min, @@ -172,6 +272,21 @@ def plot_km_likelihood_two_groups(): # pylint: disable=too-many-locals, too-many common_plot_kwargs = _extract_common_plot_config_args(args) + # Handle color schemes + high_color = args.__dict__.pop("high_color") + low_color = args.__dict__.pop("low_color") + if high_color is None: + high_color = 'blue' # Default high color + if low_color is None: + low_color = 'red' # Default low color + high_color_scheme = _get_color_scheme(high_color) + low_color_scheme = _get_color_scheme(low_color) + + high_label = "High" + low_label = "Low" + if n_in_legend: + high_label += f" (n={len(kml_high.nominalkm.patients)})" + low_label += f" (n={len(kml_low.nominalkm.patients)})" config_high = KaplanMeierPlotConfig( **common_plot_kwargs, create_figure=True, @@ -179,9 +294,9 @@ def plot_km_likelihood_two_groups(): # pylint: disable=too-many-locals, too-many show=False, saveas=None, legend_saveas=os.devnull, - best_label=f"High (n={len(kml_high.nominalkm.patients)})", - best_color="blue", - CL_colors=["dodgerblue", "skyblue"], + best_label=high_label, + best_color=high_color_scheme['main'], + CL_colors=high_color_scheme['shades'], pvalue_fontsize=pvalue_fontsize, pvalue_format=pvalue_format, ) @@ -194,9 +309,9 @@ def plot_km_likelihood_two_groups(): # pylint: disable=too-many-locals, too-many show=False, saveas=None, legend_saveas=args.__dict__.pop("legend_saveas"), - best_label=f"Low (n={len(kml_low.nominalkm.patients)})", - best_color="red", - CL_colors=["orangered", "lightcoral"], + best_label=low_label, + best_color=low_color_scheme['main'], + CL_colors=low_color_scheme['shades'], pvalue_fontsize=pvalue_fontsize, pvalue_format=pvalue_format, ) diff --git a/kombine/kaplan_meier.py b/kombine/kaplan_meier.py index df66623..fa8545d 100644 --- a/kombine/kaplan_meier.py +++ b/kombine/kaplan_meier.py @@ -71,14 +71,40 @@ def patient_censored_times(self) -> frozenset[float]: """ Returns the survival times of the patients who were censored. """ - @functools.cached_property - def times_for_plot(self) -> list[float]: + def get_times_for_plot(self, xmax: float | None = None) -> list[float]: """ Returns the survival times for the Kaplan-Meier curve. The times are the unique survival times of the patients, - plus a point at 0 and a point beyond the last time. + plus a point at 0 and a point beyond the last time (or xmax). + + Parameters + ---------- + xmax : float, optional + Maximum time for the x-axis. If provided, includes all times <= xmax + and xmax itself for smooth curve termination. + + Returns + ------- + list[float] + List of time points for plotting. """ times_for_plot = sorted(self.patient_death_times) + + if xmax is not None: + # Include all times <= xmax + times_within_xmax = [t for t in times_for_plot if t <= xmax] + + # Build the time list: 0, times <= xmax, xmax + result = [0.] + result.extend(times_within_xmax) + + # Add xmax at the end if it's not already in the list + if xmax not in result: + result.append(xmax) + + return result + + # Original behavior: add 0 and a point beyond the last time times_for_plot = ( [0] + times_for_plot @@ -86,6 +112,18 @@ def times_for_plot(self) -> list[float]: ) return times_for_plot + @functools.cached_property + def times_for_plot(self) -> list[float]: + """ + Returns the survival times for the Kaplan-Meier curve. + The times are the unique survival times of the patients, + plus a point at 0 and a point beyond the last time. + + This is a cached property for backward compatibility. + For custom time ranges, use get_times_for_plot(xmax=...). + """ + return self.get_times_for_plot(xmax=None) + @staticmethod def get_points_for_plot(times_for_plot, survival_probabilities): """ diff --git a/kombine/kaplan_meier_likelihood.py b/kombine/kaplan_meier_likelihood.py index 09123cf..7595b44 100644 --- a/kombine/kaplan_meier_likelihood.py +++ b/kombine/kaplan_meier_likelihood.py @@ -33,6 +33,7 @@ class KaplanMeierPlotConfig: #pylint: disable=too-many-instance-attributes Attributes: times_for_plot: Sequence of time points for plotting the survival probabilities. + xmax: Maximum time for x-axis range. If provided, limits the plot to [0, xmax]. include_binomial_only: If True, include error bands for the binomial error alone. include_greenwood: If True, include error bands for the binomial error using the exponential Greenwood method. @@ -79,6 +80,7 @@ class KaplanMeierPlotConfig: #pylint: disable=too-many-instance-attributes pvalue_format: Format string for p-value display (e.g., '.3g', '.2f'). """ times_for_plot: typing.Sequence[float] | None = None + xmax: float | None = None include_binomial_only: bool = False include_exponential_greenwood: bool = False include_patient_wise_only: bool = False @@ -562,10 +564,10 @@ def plot(self, config: KaplanMeierPlotConfig | None = None, **kwargs) -> dict: elif kwargs: # If config is provided and kwargs are also given, update config with kwargs config = dataclasses.replace(config, **kwargs) - # Use config.times_for_plot, falling back to self.times_for_plot if None + # Use config.times_for_plot, falling back to self.get_times_for_plot(xmax) if None times_for_plot = config.times_for_plot if times_for_plot is None: - times_for_plot = self.times_for_plot + times_for_plot = self.get_times_for_plot(xmax=config.xmax) fig, ax = self._prepare_figure(config) @@ -895,6 +897,10 @@ def _finalize_plot( ax.grid(visible=config.show_grid) ax.set_ylim(0, 1.05) # Ensure y-axis is from 0 to 1.05 for survival probability + # Set x-axis limits if xmax is specified + if config.xmax is not None: + ax.set_xlim(0, config.xmax) + #set font sizes ax.tick_params(labelsize=config.tick_fontsize) if config.title is not None: diff --git a/test/kombine/test_documentation.py b/test/kombine/test_documentation.py new file mode 100644 index 0000000..7d6f23d --- /dev/null +++ b/test/kombine/test_documentation.py @@ -0,0 +1,235 @@ +""" +Test that all CLI options are documented in the CLI documentation file. +""" + +import argparse +import pathlib +import re + +from kombine.command_line_interface import ( + _make_kombine_parser, + _make_kombine_twogroups_parser, +) + + +def extract_cli_arguments(parser: argparse.ArgumentParser) -> set[str]: + """ + Extract all argument names from an ArgumentParser. + + Returns a set of argument names without the leading dashes. + """ + arguments = set() + for action in parser._actions: # pylint: disable=protected-access + if action.option_strings: + for opt in action.option_strings: + # Remove leading dashes and convert to the canonical form + arg_name = opt.lstrip('-') + arguments.add(arg_name) + return arguments + + +def extract_documented_arguments(doc_file: pathlib.Path) -> set[str]: + """ + Extract all documented CLI arguments from the documentation file. + + Returns a set of argument names found in code blocks or inline code. + """ + documented = set() + content = doc_file.read_text() + + # Find all instances of --argument-name in the documentation + # Matches both inline code and code blocks + # Matches full argument names (terminated by space, `, or other non-word char) + pattern = r'--([a-z0-9]+(?:-[a-z0-9]+)*)' + matches = re.findall(pattern, content) + + for match in matches: + documented.add(match) + + return documented + + +def test_cli_documentation_completeness(): + """ + Test that all CLI arguments are documented in 05_command_line_interface.md + and that all documented arguments actually exist in the CLI. + """ + here = pathlib.Path(__file__).parent.parent.parent + doc_file = here / "docs" / "kombine" / "05_command_line_interface.md" + + # Check that the documentation file exists + assert doc_file.exists(), f"CLI documentation file not found: {doc_file}" + + # Get all documented arguments + documented_args = extract_documented_arguments(doc_file) + + # Get arguments from kombine command + parser_kombine = _make_kombine_parser() + kombine_args = extract_cli_arguments(parser_kombine) + + # Get arguments from kombine_twogroups command + parser_twogroups = _make_kombine_twogroups_parser() + twogroups_args = extract_cli_arguments(parser_twogroups) + + # Combine all unique arguments from both commands + all_cli_args = kombine_args | twogroups_args + + # Remove 'h' and 'help' as these are standard argparse arguments + all_cli_args.discard('h') + all_cli_args.discard('help') + + # Check which arguments are missing from documentation + missing_from_docs = all_cli_args - documented_args + + # Check which documented arguments don't exist in CLI + # Filter out some common words that might appear in documentation + common_words = { + 'default', 'example', 'options', 'required', 'optional', 'note', + 'kombine', 'twogroups', 'kaplan', 'meier', 'format', + 'months', 'rate', 'analysis', 'survival', 'only', 'v', 'g', 'f', + 'x', 'y', 'all', 'and', 'or', 'the', 'in', 'to', 'a', 'for', + 'mutually', 'exclusive', 'with', 'blue', 'red', 'green', 'purple', + 'orange', 'teal', 'brown', 'pink', 'group', 'groups', 'high', + 'low', 'two', 'single', 'command', 'line', 'interface', 'this', + 'document', 'describes', 'provides', 'tools', 'generate', 'plots', + 'comparing', 'vs', 'values', 'path', 'save', 'output', 'plot', + 'containing', 'patient', 'data', 'control', 'which', 'error', + 'bands', 'are', 'displayed', 'alone', 'include', 'binomial', + 'patient-wise', 'exponential', 'greenwood', 'band', 'exclude', + 'full', 'nll', 'negative', 'log-likelihood', 'from', 'nominal', + 'customize', 'visual', 'appearance', 'of', 'scheme', 'size', + 'inches', 'do', 'not', 'use', 'tight', 'layout', 'time', + 'axis', 'range', 'limits', 'text', 'elements', 'title', + 'label', 'probability', 'location', 'legend', 'if', 'provided', + 'will', 'be', 'left', 'off', 'main', 'median', 'number', 'patients', + 'each', 'labels', 'suffixes', 'added', 'suffix', 'add', 'font', + 'sizes', 'adjust', 'various', 'computational', 'calculation', + 'value', 'filtering', 'epsilon', + 'disable', 'collapsing', 'consecutive', 'death', 'times', 'no', + 'intervening', 'censoring', 'slower', 'but', 'may', 'more', + 'accurate', 'print', 'progress', 'messages', 'during', 'computation', + 'specific', 'arguments', 'these', 'separation', 'separating', + 'threshold', 'p-value', 'conventional', + 'logrank', 'comparison', 'likelihood', 'method', 'handling', + 'ties', 'breslow', 'approximation', 'string', 'examples', 'basic', + 'custom', 'limit', 'advanced', 'tick', 'txt', 'datacard', + 'pdf', 'png', 'jpg', 'html', + } + + # Arguments in docs but not in CLI (potential typos or outdated docs) + extra_in_docs = documented_args - all_cli_args - common_words + + # Report findings + if missing_from_docs: + print(f"\n❌ CLI arguments missing from documentation: {sorted(missing_from_docs)}") + + if extra_in_docs: + print(f"\n❌ Documented arguments that don't exist in CLI: {sorted(extra_in_docs)}") + print(" These may be typos or outdated documentation.") + + # The test passes if: + # 1. All CLI arguments are documented + # 2. All documented arguments (except common words) exist in CLI + assert not missing_from_docs, ( + f"The following CLI arguments are not documented in {doc_file}: " + f"{', '.join(sorted(missing_from_docs))}" + ) + + assert not extra_in_docs, ( + f"The following documented arguments don't exist in the CLI: " + f"{', '.join(sorted(extra_in_docs))}. " + "These may be typos or outdated documentation." + ) + + print(f"\n✓ All {len(all_cli_args)} CLI arguments are documented in {doc_file.name}") + print("✓ All documented arguments exist in the CLI") + + +def test_table_of_contents_sync(): + """ + Test that the table of contents is synchronized with actual documentation files. + Bidirectional check: + 1. All documentation files are listed in the TOC + 2. All files referenced in the TOC actually exist + """ + here = pathlib.Path(__file__).parent.parent.parent + docs_dir = here / "docs" / "kombine" + toc_file = docs_dir / "01_table_of_contents.md" + + # Check that the table of contents exists + assert toc_file.exists(), f"Table of contents not found: {toc_file}" + + # Read the table of contents + toc_content = toc_file.read_text() + + # Find all documentation files in the directory (numbered markdown/tex files) + doc_files = sorted([ + f.name for f in docs_dir.glob("[0-9][0-9]_*.*") + if f.name != "01_table_of_contents.md" + and (f.name.endswith('.md') or f.name.endswith('.tex')) + ]) + + # Extract file references from the table of contents + # Looking for patterns like `02_kombine.pdf` or `03_kaplan_meier_example.html` + toc_pattern = r'`(\d\d_[^`]+\.(pdf|html))`' + toc_references = set(re.findall(toc_pattern, toc_content)) + + # Direction 1: Check which actual files are missing from TOC + missing_from_toc = set() + for doc_file in doc_files: + # Check if the file is mentioned in the TOC + # For .tex files, they should be listed as .pdf + # For .md files, they should be listed as .html + if doc_file.endswith('.tex'): + expected_ref = doc_file.replace('.tex', '.pdf') + else: + expected_ref = doc_file.replace('.md', '.html') + + if expected_ref not in [ref[0] for ref in toc_references]: + missing_from_toc.add(doc_file) + + # Direction 2: Check which TOC references don't have corresponding files + missing_files = set() + for ref, _ in toc_references: + # Skip the table of contents itself + if ref.startswith('01_table_of_contents'): + continue + + # Convert TOC reference back to actual file name + # .pdf references should have .tex files + # .html references should have .md files + if ref.endswith('.pdf'): + actual_file = ref.replace('.pdf', '.tex') + else: # .html + actual_file = ref.replace('.html', '.md') + + # Check if the actual file exists + if actual_file not in doc_files: + missing_files.add(ref) + + # Report findings + if missing_from_toc: + print(f"\n❌ Documentation files missing from table of contents: {sorted(missing_from_toc)}") + + if missing_files: + print(f"\n❌ Files referenced in TOC that don't exist: {sorted(missing_files)}") + + # Assert both directions + assert not missing_from_toc, ( + f"The following documentation files are not listed in {toc_file.name}: " + f"{', '.join(sorted(missing_from_toc))}" + ) + + assert not missing_files, ( + f"The following files are referenced in {toc_file.name} but don't exist: " + f"{', '.join(sorted(missing_files))}" + ) + + print(f"\n✓ All {len(doc_files)} documentation files are listed in the table of contents") + print(f"✓ All {len(toc_references)} TOC references point to existing files") + + +if __name__ == "__main__": + test_cli_documentation_completeness() + test_table_of_contents_sync() + print("\n✅ All documentation tests passed!") diff --git a/test/kombine/test_km_plotting.py b/test/kombine/test_km_plotting.py new file mode 100644 index 0000000..e1341be --- /dev/null +++ b/test/kombine/test_km_plotting.py @@ -0,0 +1,716 @@ +""" +Test Kaplan-Meier plotting functionality. + +This module contains comprehensive tests for KM plotting features including: + +1. X-axis range control (xmax): + - Basic functionality and time point selection + - Plot generation with xmax parameter + - X-axis limit setting + - Backward compatibility (no xmax) + - Edge cases: xmax beyond all times, at death time, before first death + - Multiple xmax values + +2. Plot configuration options: + - Custom title and axis labels + - Custom figure size + - Legend control (location, no legend) + - Tight layout control + - Custom font sizes (title, labels, ticks, legend) + +3. Error band options: + - Binomial-only error bands + - Patient-wise-only error bands + - Combining xmax with different error band configurations + +4. Basic plotting: + - Basic plot generation + - Include/exclude nominal curve + - Different output formats (PDF, PNG) + - Plot results structure validation + +5. Combined features: + - xmax with custom plot options + - xmax with different error bands +""" + +import pathlib +import numpy as np +import matplotlib.pyplot as plt + +import kombine.datacard +from kombine.kaplan_meier_likelihood import KaplanMeierPlotConfig + +here = pathlib.Path(__file__).parent +datacards = here / "datacards" / "simple_examples" + + +# ============================================================================ +# X-axis range control (xmax) tests +# ============================================================================ + +def test_xmax_times_for_plot(): + """ + Test that get_times_for_plot correctly includes times <= xmax and xmax itself. + """ + dcfile = datacards / "simple_km_few_deaths.txt" + datacard = kombine.datacard.Datacard.parse_datacard(dcfile) + kml = datacard.km_likelihood(parameter_min=-np.inf, parameter_max=np.inf) + + # Get the full times_for_plot (no xmax) + times_full = kml.times_for_plot + print(f"Full times_for_plot: {times_full}") + + # Test with xmax=3.5 + # Death times are at 2, 3, 4, 5 based on the datacard + # With xmax=3.5, we should get: 0, 2, 3, and 3.5 itself + times_xmax = kml.get_times_for_plot(xmax=3.5) + print(f"Times with xmax=3.5: {times_xmax}") + + # Verify that all times <= 3.5 are included + assert 0 in times_xmax, "Time 0 should be included" + assert 2 in times_xmax, "Time 2 should be included (death time <= xmax)" + assert 3 in times_xmax, "Time 3 should be included (death time <= xmax)" + + # Verify that xmax itself is included if not already present + assert 3.5 in times_xmax, "xmax (3.5) should be included" + + # Verify that times beyond xmax are not included + assert 4 not in times_xmax, "Time 4 should not be included (> xmax)" + assert 5 not in times_xmax, "Time 5 should not be included (> xmax)" + + print("✓ test_xmax_times_for_plot passed") + + +def test_xmax_plot_generation(): + """ + Test that KM plot can be generated with xmax parameter. + """ + dcfile = datacards / "simple_km_few_deaths.txt" + datacard = kombine.datacard.Datacard.parse_datacard(dcfile) + kml = datacard.km_likelihood(parameter_min=-np.inf, parameter_max=np.inf) + + # Test with xmax=3.5 + output_file = here / "test_output" / "test_xmax_plot.pdf" + output_file.parent.mkdir(parents=True, exist_ok=True) + + config = KaplanMeierPlotConfig( + xmax=3.5, + saveas=output_file, + show=False, + print_progress=False, + ) + + results = kml.plot(config=config) + + # Verify that the plot was created + assert output_file.exists(), "Plot file should be created" + + # Verify that results contain expected keys + assert "x" in results, "Results should contain 'x' key" + assert "nominal" in results, "Results should contain 'nominal' key" + + print(f"✓ test_xmax_plot_generation passed - plot saved to {output_file}") + + +def test_xmax_xlim_set(): + """ + Test that x-axis limits are correctly set when xmax is provided. + """ + dcfile = datacards / "simple_km_few_deaths.txt" + datacard = kombine.datacard.Datacard.parse_datacard(dcfile) + kml = datacard.km_likelihood(parameter_min=-np.inf, parameter_max=np.inf) + + # Create a plot with xmax + config = KaplanMeierPlotConfig( + xmax=3.5, + show=False, + create_figure=True, + close_figure=False, + ) + + kml.plot(config=config) + + # Get the current axes and check xlim + ax = plt.gca() + xlim = ax.get_xlim() + + # Verify x-axis limits are set to [0, xmax] + assert xlim[0] == 0, f"X-axis lower limit should be 0, got {xlim[0]}" + assert xlim[1] == 3.5, f"X-axis upper limit should be 3.5, got {xlim[1]}" + + plt.close() + + print("✓ test_xmax_xlim_set passed") + + +def test_no_xmax_backward_compatibility(): + """ + Test that omitting xmax results in full-range plot (backward compatibility). + """ + dcfile = datacards / "simple_km_few_deaths.txt" + datacard = kombine.datacard.Datacard.parse_datacard(dcfile) + kml = datacard.km_likelihood(parameter_min=-np.inf, parameter_max=np.inf) + + # Get times without xmax (should be same as before) + times_no_xmax = kml.get_times_for_plot(xmax=None) + times_cached = kml.times_for_plot + + # Should be identical + assert times_no_xmax == times_cached, "get_times_for_plot(None) should match cached property" + + # Create a plot without xmax + config = KaplanMeierPlotConfig( + show=False, + create_figure=True, + close_figure=False, + ) + + kml.plot(config=config) + + # Get the current axes + ax = plt.gca() + xlim = ax.get_xlim() + + # X-axis should NOT be limited to a specific value (matplotlib default behavior) + # The upper limit should be auto-scaled beyond the last data point + print(f"X-axis limits without xmax: {xlim}") + + plt.close() + + print("✓ test_no_xmax_backward_compatibility passed") + + +# ============================================================================ +# Edge cases for xmax +# ============================================================================ + +def test_xmax_beyond_all_times(): + """ + Test that xmax beyond all death times works correctly. + """ + dcfile = datacards / "simple_km_few_deaths.txt" + datacard = kombine.datacard.Datacard.parse_datacard(dcfile) + kml = datacard.km_likelihood(parameter_min=-np.inf, parameter_max=np.inf) + + # Death times are at 2, 3, 4, 5 + # Use xmax=10 which is beyond all death times + times_xmax = kml.get_times_for_plot(xmax=10.0) + print(f"Times with xmax=10.0 (beyond all deaths): {times_xmax}") + + # Should include all death times and xmax + assert 0 in times_xmax + assert 2 in times_xmax + assert 3 in times_xmax + assert 4 in times_xmax + assert 5 in times_xmax + assert 10.0 in times_xmax + + # Should not have any "extra" time beyond last death since there's no time > xmax + # The list should be: 0, 2, 3, 4, 5, 10 + assert len(times_xmax) == 6, f"Expected 6 times, got {len(times_xmax)}" + + print("✓ test_xmax_beyond_all_times passed") + + +def test_xmax_at_death_time(): + """ + Test that xmax exactly at a death time works correctly. + """ + dcfile = datacards / "simple_km_few_deaths.txt" + datacard = kombine.datacard.Datacard.parse_datacard(dcfile) + kml = datacard.km_likelihood(parameter_min=-np.inf, parameter_max=np.inf) + + # Death times are at 2, 3, 4, 5 + # Use xmax=4.0 which is exactly a death time + times_xmax = kml.get_times_for_plot(xmax=4.0) + print(f"Times with xmax=4.0 (at death time): {times_xmax}") + + # Should include 0, 2, 3, 4 (all times <= xmax) + assert 0 in times_xmax + assert 2 in times_xmax + assert 3 in times_xmax + assert 4 in times_xmax + + # Times beyond xmax should not be included + assert 5 not in times_xmax, "Times beyond xmax should not be included" + + # xmax=4 is already a death time, so it shouldn't be added again + # The list should be: 0, 2, 3, 4 + assert len(times_xmax) == 4, f"Expected 4 times, got {len(times_xmax)}" + + print("✓ test_xmax_at_death_time passed") + + +def test_xmax_before_first_death(): + """ + Test that xmax before the first death time works correctly. + """ + dcfile = datacards / "simple_km_few_deaths.txt" + datacard = kombine.datacard.Datacard.parse_datacard(dcfile) + kml = datacard.km_likelihood(parameter_min=-np.inf, parameter_max=np.inf) + + # Death times are at 2, 3, 4, 5 + # Use xmax=1.5 which is before the first death time + times_xmax = kml.get_times_for_plot(xmax=1.5) + print(f"Times with xmax=1.5 (before first death): {times_xmax}") + + # Should include 0 and xmax (1.5) + assert 0 in times_xmax + assert 1.5 in times_xmax + + # Should not include any death times since they're all > xmax + assert 2 not in times_xmax + assert 3 not in times_xmax + assert 4 not in times_xmax + assert 5 not in times_xmax + + print("✓ test_xmax_before_first_death passed") + + +def test_xmax_multiple_values(): + """ + Test xmax with multiple xmax values to ensure it works consistently. + """ + dcfile = datacards / "simple_km_few_deaths.txt" + datacard = kombine.datacard.Datacard.parse_datacard(dcfile) + kml = datacard.km_likelihood(parameter_min=-np.inf, parameter_max=np.inf) + + # Test with different xmax values + for xmax_val in [2.5, 3.5, 4.5]: + output_file = here / "test_output" / f"test_xmax_edge_{xmax_val}.pdf" + config = KaplanMeierPlotConfig( + xmax=xmax_val, + saveas=output_file, + show=False, + print_progress=False, + ) + + kml.plot(config=config) + + # Verify the plot was created + assert output_file.exists(), f"Plot file should be created for xmax={xmax_val}" + + print("✓ test_xmax_multiple_values passed - multiple plots created") + + +# ============================================================================ +# Basic plotting tests +# ============================================================================ + +def test_basic_plot_generation(): + """ + Test that basic KM plot can be generated without any special options. + """ + dcfile = datacards / "simple_km_few_deaths.txt" + datacard = kombine.datacard.Datacard.parse_datacard(dcfile) + kml = datacard.km_likelihood(parameter_min=-np.inf, parameter_max=np.inf) + + output_file = here / "test_output" / "test_basic_plot.pdf" + output_file.parent.mkdir(parents=True, exist_ok=True) + + config = KaplanMeierPlotConfig( + saveas=output_file, + show=False, + print_progress=False, + ) + + results = kml.plot(config=config) + + # Verify that the plot was created + assert output_file.exists(), "Plot file should be created" + + # Verify that results contain expected keys + assert "x" in results, "Results should contain 'x' key" + assert "nominal" in results, "Results should contain 'nominal' key" + + print(f"✓ test_basic_plot_generation passed - plot saved to {output_file}") + + +def test_plot_with_custom_title_labels(): + """ + Test that custom title and axis labels can be set. + """ + dcfile = datacards / "simple_km_few_deaths.txt" + datacard = kombine.datacard.Datacard.parse_datacard(dcfile) + kml = datacard.km_likelihood(parameter_min=-np.inf, parameter_max=np.inf) + + output_file = here / "test_output" / "test_custom_labels.pdf" + custom_title = "Custom Survival Analysis" + custom_xlabel = "Time (months)" + custom_ylabel = "Survival Probability" + + config = KaplanMeierPlotConfig( + saveas=output_file, + show=False, + print_progress=False, + create_figure=True, + close_figure=False, + title=custom_title, + xlabel=custom_xlabel, + ylabel=custom_ylabel, + ) + + kml.plot(config=config) + + # Verify the plot was created + assert output_file.exists(), "Plot file should be created" + + # Verify the labels are actually set + ax = plt.gca() + assert ax.get_title() == custom_title, f"Title should be '{custom_title}'" + assert ax.get_xlabel() == custom_xlabel, f"X-label should be '{custom_xlabel}'" + assert ax.get_ylabel() == custom_ylabel, f"Y-label should be '{custom_ylabel}'" + + plt.close() + + print("✓ test_plot_with_custom_title_labels passed") + + +def test_plot_with_custom_figsize(): + """ + Test that custom figure size can be set. + """ + dcfile = datacards / "simple_km_few_deaths.txt" + datacard = kombine.datacard.Datacard.parse_datacard(dcfile) + kml = datacard.km_likelihood(parameter_min=-np.inf, parameter_max=np.inf) + + output_file = here / "test_output" / "test_custom_figsize.pdf" + custom_figsize = (10, 6) + + config = KaplanMeierPlotConfig( + saveas=output_file, + show=False, + print_progress=False, + create_figure=True, + close_figure=False, + figsize=custom_figsize, + ) + + kml.plot(config=config) + + # Verify the plot was created + assert output_file.exists(), "Plot file should be created" + + # Verify the figure size is correct + fig = plt.gcf() + actual_figsize = fig.get_size_inches() + assert np.allclose(actual_figsize, custom_figsize), \ + f"Figure size should be {custom_figsize}, got {actual_figsize}" + + plt.close() + + print("✓ test_plot_with_custom_figsize passed") + + +def test_plot_exclude_nominal(): + """ + Test plotting without the nominal line. + """ + dcfile = datacards / "simple_km_few_deaths.txt" + datacard = kombine.datacard.Datacard.parse_datacard(dcfile) + kml = datacard.km_likelihood(parameter_min=-np.inf, parameter_max=np.inf) + + output_file = here / "test_output" / "test_exclude_nominal.pdf" + config = KaplanMeierPlotConfig( + saveas=output_file, + show=False, + print_progress=False, + create_figure=True, + close_figure=False, + include_nominal=False, + ) + + results = kml.plot(config=config) + + assert output_file.exists(), "Plot file should be created" + + # The nominal curve should still be in results even if not plotted + assert "nominal" in results, "Results should contain 'nominal' key" + + # Verify that the nominal line is not in the plot + # Check the number of lines - should have fewer without nominal + ax = plt.gca() + lines = ax.get_lines() + # With include_nominal=False, there should be no line labeled as nominal + line_labels = [str(line.get_label()).lower() for line in lines] + has_nominal = "kaplan-meier" in line_labels or any("nominal" in label for label in line_labels) + assert not has_nominal, "Nominal line should not be plotted when include_nominal=False" + + plt.close() + + print("✓ test_plot_exclude_nominal passed") + + +def test_plot_include_binomial_only(): + """ + Test plotting with binomial-only error bands. + """ + dcfile = datacards / "simple_km_few_deaths.txt" + datacard = kombine.datacard.Datacard.parse_datacard(dcfile) + kml = datacard.km_likelihood(parameter_min=-np.inf, parameter_max=np.inf) + + output_file = here / "test_output" / "test_binomial_only.pdf" + config = KaplanMeierPlotConfig( + saveas=output_file, + show=False, + print_progress=False, + include_binomial_only=True, + ) + + results = kml.plot(config=config) + + assert output_file.exists(), "Plot file should be created" + assert "x" in results, "Results should contain 'x' key" + print("✓ test_plot_include_binomial_only passed") + + +def test_plot_include_patient_wise_only(): + """ + Test plotting with patient-wise-only error bands. + """ + dcfile = datacards / "simple_km_few_deaths.txt" + datacard = kombine.datacard.Datacard.parse_datacard(dcfile) + kml = datacard.km_likelihood(parameter_min=-np.inf, parameter_max=np.inf) + + output_file = here / "test_output" / "test_patient_wise_only.pdf" + config = KaplanMeierPlotConfig( + saveas=output_file, + show=False, + print_progress=False, + include_patient_wise_only=True, + ) + + results = kml.plot(config=config) + + assert output_file.exists(), "Plot file should be created" + assert "x" in results, "Results should contain 'x' key" + print("✓ test_plot_include_patient_wise_only passed") + + +def test_plot_no_legend(): + """ + Test plotting with legend_loc=None (implementation may vary). + """ + dcfile = datacards / "simple_km_few_deaths.txt" + datacard = kombine.datacard.Datacard.parse_datacard(dcfile) + kml = datacard.km_likelihood(parameter_min=-np.inf, parameter_max=np.inf) + + output_file = here / "test_output" / "test_no_legend.pdf" + config = KaplanMeierPlotConfig( + saveas=output_file, + show=False, + print_progress=False, + legend_loc=None, # Request no legend + ) + + # Test that the config is accepted and plot is created + kml.plot(config=config) + + assert output_file.exists(), "Plot file should be created" + # Note: The actual legend behavior may depend on the implementation + # This test verifies that legend_loc=None is accepted as a valid config + + print("✓ test_plot_no_legend passed") + + +def test_plot_with_xmax_and_custom_options(): + """ + Test combining xmax with other custom plot options. + """ + dcfile = datacards / "simple_km_few_deaths.txt" + datacard = kombine.datacard.Datacard.parse_datacard(dcfile) + kml = datacard.km_likelihood(parameter_min=-np.inf, parameter_max=np.inf) + + output_file = here / "test_output" / "test_xmax_custom_options.pdf" + config = KaplanMeierPlotConfig( + saveas=output_file, + show=False, + print_progress=False, + xmax=4.0, + title="Survival Analysis (Limited Range)", + xlabel="Time (months)", + ylabel="Survival Rate", + figsize=(8, 6), + ) + + results = kml.plot(config=config) + + assert output_file.exists(), "Plot file should be created" + assert "x" in results, "Results should contain 'x' key" + print("✓ test_plot_with_xmax_and_custom_options passed") + + +def test_plot_different_output_formats(): + """ + Test that plots can be saved in different formats. + """ + dcfile = datacards / "simple_km_few_deaths.txt" + datacard = kombine.datacard.Datacard.parse_datacard(dcfile) + kml = datacard.km_likelihood(parameter_min=-np.inf, parameter_max=np.inf) + + # Test PDF (already tested elsewhere, but for completeness) + pdf_file = here / "test_output" / "test_format.pdf" + config_pdf = KaplanMeierPlotConfig( + saveas=pdf_file, + show=False, + print_progress=False, + ) + kml.plot(config=config_pdf) + assert pdf_file.exists(), "PDF file should be created" + + # Test PNG + png_file = here / "test_output" / "test_format.png" + config_png = KaplanMeierPlotConfig( + saveas=png_file, + show=False, + print_progress=False, + ) + kml.plot(config=config_png) + assert png_file.exists(), "PNG file should be created" + + print("✓ test_plot_different_output_formats passed") + + +def test_plot_returns_expected_structure(): + """ + Test that plot results contain all expected keys and data structures. + """ + dcfile = datacards / "simple_km_few_deaths.txt" + datacard = kombine.datacard.Datacard.parse_datacard(dcfile) + kml = datacard.km_likelihood(parameter_min=-np.inf, parameter_max=np.inf) + + config = KaplanMeierPlotConfig( + show=False, + print_progress=False, + ) + + results = kml.plot(config=config) + + # Check that essential keys are present + assert "x" in results, "Results should contain 'x' key" + assert "nominal" in results, "Results should contain 'nominal' key" + + # Check that x and nominal are array-like + assert len(results["x"]) > 0, "x should have data points" + assert len(results["nominal"]) > 0, "nominal should have data points" + + # Check that x and nominal have the same length + assert len(results["x"]) == len(results["nominal"]), \ + "x and nominal should have the same length" + + print("✓ test_plot_returns_expected_structure passed") + + +def test_plot_with_no_tight_layout(): + """ + Test plotting with tight_layout disabled. + """ + dcfile = datacards / "simple_km_few_deaths.txt" + datacard = kombine.datacard.Datacard.parse_datacard(dcfile) + kml = datacard.km_likelihood(parameter_min=-np.inf, parameter_max=np.inf) + + output_file = here / "test_output" / "test_no_tight_layout.pdf" + config = KaplanMeierPlotConfig( + saveas=output_file, + show=False, + print_progress=False, + tight_layout=False, + ) + + kml.plot(config=config) + + assert output_file.exists(), "Plot file should be created" + print("✓ test_plot_with_no_tight_layout passed") + + +def test_xmax_with_different_error_bands(): + """ + Test that xmax works correctly with different error band configurations. + """ + dcfile = datacards / "simple_km_few_deaths.txt" + datacard = kombine.datacard.Datacard.parse_datacard(dcfile) + kml = datacard.km_likelihood(parameter_min=-np.inf, parameter_max=np.inf) + + # Test xmax with binomial-only + output_file1 = here / "test_output" / "test_xmax_binomial.pdf" + config1 = KaplanMeierPlotConfig( + saveas=output_file1, + show=False, + print_progress=False, + xmax=3.5, + include_binomial_only=True, + ) + kml.plot(config=config1) + assert output_file1.exists(), "Plot with xmax and binomial-only should be created" + + # Test xmax with patient-wise-only + output_file2 = here / "test_output" / "test_xmax_patient_wise.pdf" + config2 = KaplanMeierPlotConfig( + saveas=output_file2, + show=False, + print_progress=False, + xmax=3.5, + include_patient_wise_only=True, + ) + kml.plot(config=config2) + assert output_file2.exists(), "Plot with xmax and patient-wise-only should be created" + + print("✓ test_xmax_with_different_error_bands passed") + + +def test_plot_with_custom_font_sizes(): + """ + Test plotting with custom font sizes. + """ + dcfile = datacards / "simple_km_few_deaths.txt" + datacard = kombine.datacard.Datacard.parse_datacard(dcfile) + kml = datacard.km_likelihood(parameter_min=-np.inf, parameter_max=np.inf) + + output_file = here / "test_output" / "test_custom_fonts.pdf" + config = KaplanMeierPlotConfig( + saveas=output_file, + show=False, + print_progress=False, + title="Test Plot", + title_fontsize=16, + label_fontsize=12, + tick_fontsize=10, + legend_fontsize=10, + ) + + kml.plot(config=config) + + assert output_file.exists(), "Plot file should be created" + print("✓ test_plot_with_custom_font_sizes passed") + + +if __name__ == "__main__": + # X-axis range control tests + test_xmax_times_for_plot() + test_xmax_plot_generation() + test_xmax_xlim_set() + test_no_xmax_backward_compatibility() + + # Edge case tests + test_xmax_beyond_all_times() + test_xmax_at_death_time() + test_xmax_before_first_death() + test_xmax_multiple_values() + + # Basic plotting tests + test_basic_plot_generation() + test_plot_with_custom_title_labels() + test_plot_with_custom_figsize() + test_plot_exclude_nominal() + test_plot_include_binomial_only() + test_plot_include_patient_wise_only() + test_plot_no_legend() + test_plot_with_xmax_and_custom_options() + test_plot_different_output_formats() + test_plot_returns_expected_structure() + test_plot_with_no_tight_layout() + test_xmax_with_different_error_bands() + test_plot_with_custom_font_sizes() + + print("\n✅ All KM plotting tests passed!")