
Commit 94afe8a

Merge pull request #2 from mwydmuch/fw-updates
Refactor the Frank-Wolfe method, add support for predictions without the k budget constraint, and implement some online experiments
2 parents ef7cf98 + a6640dc commit 94afe8a


63 files changed: +11342 −2304 lines

.gitignore

+103 −5
@@ -1,8 +1,106 @@
-experiments/datasets
-experiments/predictions
-experiments/results*
-lightning_logs
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class

-__pycache__
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+.pybuilder/
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+# VS Code
 .vscode
+
+# PyCharm
 .idea

.pre-commit-config.yaml

+7 −4
@@ -17,10 +17,13 @@ repos:
       - id: check-shebang-scripts-are-executable
       - id: detect-private-key
       - id: debug-statements
-  - repo: https://github.com/codespell-project/codespell
-    rev: v2.2.4
-    hooks:
-      - id: codespell
+  # - repo: https://github.com/codespell-project/codespell
+  #   rev: v2.2.4
+  #   hooks:
+  #     - id: codespell
+  #       name: codespell
+  #       entry: codespell
+  #       args: ["xcolumns"]
   # - repo: https://github.com/PyCQA/flake8
   #   rev: 6.0.0
   #   hooks:

README.md

+56 −16
@@ -1,42 +1,82 @@
-[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://pre-commit.com/) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+[![PyPI version](https://badge.fury.io/py/xcolumns.svg)](https://badge.fury.io/py/xcolumns)
+[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://pre-commit.com/)
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+

 <p align="center">
-  <img src="https://raw.githubusercontent.com/mwydmuch/xCOLUMNs/master/xCOLUMNs_logo.png" width="500px"/>
+  <img src="https://raw.githubusercontent.com/mwydmuch/xCOLUMNs/master/docs/_static/xCOLUMNs_logo.png" width="500px"/>
 </p>

 # x **Consistent Optimization of Label-wise Utilities in Multi-label classificatioN** s

-xCOLUMNs is a small Python library aims to implement different methods for optimization of general family of label-wise utilities (performance metrics) in multi-label classification, which scale to large (extreme) datasets.
+xCOLUMNs is a small Python library that aims to implement different methods for the optimization of a general family of
+metrics that can be defined on multi-label classification matrices.
+These include, but are not limited to, label-wise metrics.
+The library provides efficient implementations of the different optimization methods that easily scale to extreme multi-label classification (XMLC) - problems with a very large number of labels and instances.
+
+All the methods operate on conditional probability estimates of the labels, which are the output of multi-label classification models.
+Based on these estimates, the methods aim to find the optimal prediction for a given test set or to find the optimal population classifier as a plug-in rule on top of the conditional probability estimator.
+This makes the library very flexible and allows it to be used with any multi-label classification model that provides conditional probability estimates.
+The library directly supports numpy arrays, PyTorch tensors, and sparse CSR matrices from scipy as input/output data types.
+
+For more details, please see our short usage guide, the documentation, and/or the papers that describe the methods implemented in the library.


-## Installation
+## Quick start
+
+### Installation

 The library can be installed using pip:
 ```sh
 pip install xcolumns
 ```
-It should work on all major platforms (Linux, Windows, Mac) and with Python 3.8+.
+It should work on all major platforms (Linux, macOS, Windows) and with Python 3.8+.


-## Repository structure
+### Usage

-The repository is organized as follows:
-- `docs/` - Sphinx documentation (work in progress)
-- `experiments/` - a code for reproducing experiments from the papers
-- `xcolumns/` - Python package with the library
+We provide a short usage guide for the library in the [short_usage_guide.ipynb](https://github.com/mwydmuch/xCOLUMNs/blob/master/short_usage_guide.ipynb) notebook.
+You can also check the documentation for more details.


 ## Methods, usage, and how to cite

 The library implements the following methods:

-### Block Coordinate Ascent/Descent (BCA/BCD)
+### Instance-wise weighted prediction
+
+The library implements a set of methods for instance-wise weighted prediction, which include optimal prediction strategies for different metrics, such as:
+- Precision at k
+- Propensity-scored precision at k
+- Macro-averaged recall at k
+- Macro-averaged balanced accuracy at k
+- and others ...
+
+### Optimization of prediction for a given test set using Block Coordinate Ascent/Descent (BCA/BCD)

-The method is described in the paper:
-> [Erik Schultheis, Marek Wydmuch, Wojciech Kotłowski, Rohit Babbar, Krzysztof Dembczyński. Generalized test utilities for long-tail performance in
-extreme multi-label classification. NeurIPS 2023.](https://arxiv.org/abs/2311.05081)
+The method aims to optimize the prediction for a given test set using the block coordinate ascent/descent algorithm.
+
+The method was first introduced and described in the paper:
+> [Erik Schultheis, Marek Wydmuch, Wojciech Kotłowski, Rohit Babbar, Krzysztof Dembczyński. Generalized test utilities for long-tail performance in extreme multi-label classification. NeurIPS 2023.](https://arxiv.org/abs/2311.05081)
+
+### Finding the optimal population classifier via Frank-Wolfe (FW)
+
+The method was first introduced and described in the paper:
+> [Erik Schultheis, Wojciech Kotłowski, Marek Wydmuch, Rohit Babbar, Strom Borman, Krzysztof Dembczyński. Consistent algorithms for multi-label classification with macro-at-k metrics. ICLR 2024.](https://arxiv.org/abs/2401.16594)
+
+
+## Repository structure
+
+The repository is organized as follows:
+- `docs/` - Sphinx documentation (work in progress)
+- `experiments/` - code for reproducing experiments from the papers; see the README.md file in the directory for details
+- `xcolumns/` - Python package with the library
+- `tests/` - tests for the library (coverage is a bit limited at the moment, but these tests should guarantee that the main components of the library work as expected)


-### Frank-Wolfe (FW)
+## Development and contributing

-Description is work in progress.
+The library was created as part of our research projects.
+We are happy to share it with the community and we hope that someone will find it useful.
+If you have any questions or suggestions, or if you have found a bug, please open an issue.
+We are also happy to accept contributions in the form of pull requests.
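A minimal sketch of the workflow the new README describes: obtain conditional probability estimates from any multi-label model, then compute an optimized prediction on top of them. `predict_using_bc_with_0approx` is the function documented in `docs/api/block_coordinate.md` below, but the exact signature used here (a bare `y_proba` plus a `k` budget argument) is an assumption for illustration, not the confirmed API.

```python
import numpy as np

# Documented in docs/api/block_coordinate.md below; the arguments used
# here are assumed for illustration and may differ from the real API.
from xcolumns.block_coordinate import predict_using_bc_with_0approx

# Conditional probability estimates for 4 instances and 6 labels, e.g. the
# sigmoid outputs of any trained multi-label classifier.
y_proba = np.random.rand(4, 6)

# Hypothetical call: find a 0/1 prediction matrix that approximately
# maximizes a chosen metric under a budget of k labels per instance.
y_pred = predict_using_bc_with_0approx(y_proba, k=3)
print(y_pred.shape)  # expected: (4, 6)
```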

docs/Makefile

+20
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS    ?=
+SPHINXBUILD   ?= sphinx-build
+SOURCEDIR     = .
+BUILDDIR      = _build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

docs/README.md

+13
@@ -0,0 +1,13 @@
+# Documentation
+
+Documentation for xCOLUMNs is generated using [Sphinx](https://www.sphinx-doc.org/).
+After each commit on `master`, documentation is updated and published to [Read the Docs](https://xcolumns.readthedocs.io).
+
+You can build the documentation locally. Just install Sphinx and run in the ``docs`` directory:
+
+```
+pip install -r requirements.txt
+make html
+```
+
+Documentation will be created in the `docs/_build` directory.

docs/_static/favicon.png

2.26 KB

generate_logo.py renamed to docs/_static/generate_logo.py

+39 −13
@@ -55,14 +55,15 @@ def create_logo_image(grid, filled_color, column_gradients, cell_size):
     draw = ImageDraw.Draw(img)

     # Create a gradient for each column
-    for i in range(len(grid[0])):
-        gradient_image = create_gradient(
-            cell_size[0],
-            image_height,
-            column_gradients[i % len(column_gradients)],
-            pi / 2,
-        )
-        img.paste(gradient_image, (i * cell_size[0], 0), gradient_image)
+    if column_gradients is not None:
+        for i in range(len(grid[0])):
+            gradient_image = create_gradient(
+                cell_size[0],
+                image_height,
+                column_gradients[i % len(column_gradients)],
+                pi / 2,
+            )
+            img.paste(gradient_image, (i * cell_size[0], 0), gradient_image)

     for i, row in enumerate(grid):
         # Define the starting and ending y coordinates for the row
@@ -146,7 +147,7 @@ def color_mod_val():
 )

 # Logo with the same number of filled cells in each row (k=13)
-grid = """
+logo_grid = """
 ....................................
 .....XXX.XXX.X.......X...X.X..X..XX.
 .....X...X.X.X.....X.XX.XX.XX.X.X...
@@ -160,6 +161,20 @@ def color_mod_val():
     "\n"
 )

+favicon_grid = """
+.........
+.....XXX.
+.....X...
+.....X...
+.X.X.X...
+..X..X...
+.X.X.X...
+.X.X.XXX.
+.........
+""".strip().split(
+    "\n"
+)
+
 # Count the number of filled cells in each row
 # for i, row in enumerate(grid):
 #     print(f"Full cells in row {i}: {row.count('X')}")
@@ -171,8 +186,19 @@ def color_mod_val():
 )

 # Generate the gradient image
-logo_image = create_logo_image(grid, filled_color, columns_gradients, cell_size)
-
-# Save the image or display it
+logo_image = create_logo_image(
+    logo_grid, filled_color, columns_gradients, cell_size
+)
 logo_image.save("xCOLUMNs_logo.png")  # Save the image as 'xCOLUMNs_logo.png'
-logo_image.show()  # Show the image
+
+# Generate the logo image without the column gradient background
+logo_image = create_logo_image(logo_grid, filled_color, None, cell_size)
+logo_image.save(
+    "xCOLUMNs_logo_nobg.png"
+)  # Save the image as 'xCOLUMNs_logo_nobg.png'
+
+# Generate the favicon image
+favicon_image = create_logo_image(
+    favicon_grid, filled_color, columns_gradients, cell_size
+)
+favicon_image.save("favicon.png")  # Save the image as 'favicon.png'
File renamed without changes.

docs/_static/xCOLUMNs_logo_nobg.png

1.11 KB

docs/api/block_coordinate.md

+35
@@ -0,0 +1,35 @@
+# Block Coordinate-based prediction methods
+
+The `xcolumns.block_coordinate` module implements methods for finding the optimal prediction for a given test set using the Block Coordinate Ascent/Descent algorithm with a 0-th order approximation of the expected utility.
+The method was first introduced and described in the paper:
+> [Erik Schultheis, Marek Wydmuch, Wojciech Kotłowski, Rohit Babbar, Krzysztof Dembczyński. Generalized test utilities for long-tail performance in extreme multi-label classification. NeurIPS 2023.](https://arxiv.org/abs/2311.05081)
+
+Note: BCA/BCD with the 0-th order approximation uses the tp, fp, fn, tn parametrization of the confusion matrix,
+as opposed to the algorithms presented in the paper, which use the :math:`t, q, p` parametrization. However, both formulations are equivalent.
+
+The main function of the module is [**predict_using_bc_with_0approx**](#xcolumns.block_coordinate.predict_using_bc_with_0approx):
+
+```{eval-rst}
+.. autofunction:: xcolumns.block_coordinate.predict_using_bc_with_0approx
+```
+
+## Wrapper functions for specific metrics
+
+The module provides wrapper functions for specific metrics that can be used as arguments for the `predict_using_bc_with_0approx` function, as well as factory functions for creating such wrapper functions.
+
+```{eval-rst}
+.. automodule:: xcolumns.block_coordinate
+   :members:
+   :exclude-members: predict_using_bc_with_0approx, predict_optimizing_coverage_using_bc
+   :undoc-members:
+   :show-inheritance:
+```
+
+
+## Special function for optimization of coverage
+
+The module provides a special function for the optimization of the coverage metric, which uses a different way of estimating the expected value of the metric than the `predict_using_bc_with_0approx` function.
+
+```{eval-rst}
+.. autofunction:: xcolumns.block_coordinate.predict_optimizing_coverage_using_bc
+```
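To make the tp, fp, fn, tn parametrization in the note above concrete, here is a standard binary F1 written in terms of the four confusion-matrix entries. This is a self-contained sketch; whether the module's wrapper functions take exactly this callable form is an assumption based on the description above.

```python
import numpy as np

def binary_f1_on_conf_matrix(tp, fp, fn, tn):
    # F1 = 2*tp / (2*tp + fp + fn); tn does not enter F1, but the
    # parametrization described above carries all four entries.
    return (2 * tp) / np.maximum(2 * tp + fp + fn, 1e-12)

# Label-wise values; macro-averaging is then just the mean over labels.
tp = np.array([5.0, 2.0])
fp = np.array([1.0, 0.0])
fn = np.array([2.0, 3.0])
tn = np.array([10.0, 13.0])
print(binary_f1_on_conf_matrix(tp, fp, fn, tn).mean())  # ~0.670
```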

docs/api/confusion_matrix.md

+11
@@ -0,0 +1,11 @@
+# Confusion Matrix
+
+The `xcolumns.confusion_matrix` module implements a confusion matrix object and functions that can be used to calculate it.
+In xCOLUMNs, the confusion matrix is parametrized by four matrices: true positive (tp), false positive (fp), false negative (fn), and true negative (tn).
+
+```{eval-rst}
+.. automodule:: xcolumns.confusion_matrix
+   :members:
+   :undoc-members:
+   :show-inheritance:
+```
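For illustration, the four matrices named above can be computed from dense 0/1 arrays with plain numpy. This standalone sketch only mirrors the parametrization; it does not use the module's own helpers, whose names are not shown in this diff.

```python
import numpy as np

y_true = np.array([[1, 0, 1],
                   [0, 1, 1]])
y_pred = np.array([[1, 1, 0],
                   [0, 1, 1]])

# Label-wise (column-wise) counts, matching the tp/fp/fn/tn parametrization.
tp = (y_true * y_pred).sum(axis=0)              # predicted and relevant
fp = ((1 - y_true) * y_pred).sum(axis=0)        # predicted but not relevant
fn = (y_true * (1 - y_pred)).sum(axis=0)        # relevant but not predicted
tn = ((1 - y_true) * (1 - y_pred)).sum(axis=0)  # neither

print(tp, fp, fn, tn)  # -> [1 1 1] [0 1 0] [0 0 1] [1 0 0]
```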
