[![PyPI version](https://badge.fury.io/py/xcolumns.svg)](https://badge.fury.io/py/xcolumns)
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://pre-commit.com/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

<p align="center">
  <img src="https://raw.githubusercontent.com/mwydmuch/xCOLUMNs/master/docs/_static/xCOLUMNs_logo.png" width="500px"/>
</p>

# x **C**onsistent **O**ptimization of **L**abel-wise **U**tilities in **M**ulti-label classificatio**N** s

xCOLUMNs is a small Python library that implements different methods for optimizing a general family of metrics that can be defined on multi-label classification matrices.
These include, but are not limited to, label-wise metrics.
The library provides efficient implementations of these optimization methods that easily scale to extreme multi-label classification (XMLC), i.e., problems with a very large number of labels and instances.

All the methods operate on conditional probability estimates of the labels, which are the output of multi-label classification models.
Based on these estimates, the methods either find the optimal prediction for a given test set or find the optimal population classifier as a plug-in rule on top of the conditional probability estimator.
This makes the library very flexible: it can be used with any multi-label classification model that provides conditional probability estimates.
The library directly supports NumPy arrays, PyTorch tensors, and SciPy sparse CSR matrices as input/output data types.

For more details, please see our short usage guide, the documentation, and/or the papers that describe the methods implemented in the library.


## Quick start

### Installation

The library can be installed using pip:
```sh
pip install xcolumns
```
It should work on all major platforms (Linux, macOS, Windows) with Python 3.8+.


### Usage

We provide a short usage guide for the library in the [short_usage_guide.ipynb](https://github.com/mwydmuch/xCOLUMNs/blob/master/short_usage_guide.ipynb) notebook.
You can also check the documentation for more details.
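To give a feel for the input/output convention described above, here is a minimal, self-contained sketch in plain NumPy (an illustration with a hypothetical function name, not the library's API): given a matrix of conditional probability estimates, the prediction that optimizes instance-wise precision at k simply selects the k most probable labels for each instance.

```python
import numpy as np

def top_k_prediction(eta: np.ndarray, k: int) -> np.ndarray:
    """Return a binary n x m matrix selecting the k most probable labels per row.

    This is the optimal prediction rule for instance-wise precision at k;
    eta holds conditional probability estimates P(y_j = 1 | x_i).
    """
    n, m = eta.shape
    y_pred = np.zeros((n, m), dtype=np.int8)
    top_k = np.argpartition(-eta, k, axis=1)[:, :k]  # indices of the k largest entries per row
    np.put_along_axis(y_pred, top_k, 1, axis=1)
    return y_pred

rng = np.random.default_rng(0)
eta = rng.random((4, 10))          # 4 instances, 10 labels
y_pred = top_k_prediction(eta, 3)  # each row selects exactly 3 labels
```

The same shapes apply throughout: probability estimates come in as an n × m matrix and predictions go out as a binary n × m matrix with exactly k positives per row.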


## Methods, usage, and how to cite

The library implements the following methods:

### Instance-wise weighted prediction

The library implements a set of methods for instance-wise weighted prediction, which includes optimal prediction strategies for different metrics, such as:
- Precision at k
- Propensity-scored precision at k
- Macro-averaged recall at k
- Macro-averaged balanced accuracy at k
- and others
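The metrics above all admit a prediction rule of the same weighted top-k form: score each label by its probability estimate times a per-label weight and keep the k best labels per instance. A minimal sketch with hypothetical names (not the library's API), using inverse-propensity weights as in propensity-scored precision at k:

```python
import numpy as np

def weighted_top_k(eta: np.ndarray, weights: np.ndarray, k: int) -> np.ndarray:
    """Select, per instance, the k labels maximizing weights[j] * eta[i, j].

    weights = 1 recovers plain precision at k; inverse-propensity weights
    target propensity-scored precision at k; weights proportional to
    1 / label prior target macro-averaged recall at k.
    """
    gains = eta * weights[None, :]
    y_pred = np.zeros_like(eta, dtype=np.int8)
    top = np.argpartition(-gains, k, axis=1)[:, :k]
    np.put_along_axis(y_pred, top, 1, axis=1)
    return y_pred

rng = np.random.default_rng(1)
eta = rng.random((5, 8))                               # probability estimates
inv_propensity = 1.0 / rng.uniform(0.05, 1.0, size=8)  # hypothetical propensities
y_ps = weighted_top_k(eta, inv_propensity, 2)
```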

### Optimization of prediction for a given test set using Block Coordinate Ascent/Descent (BCA/BCD)

The method optimizes the prediction for a given test set using a block coordinate ascent/descent algorithm.

The method was first introduced and described in the paper:
> [Erik Schultheis, Marek Wydmuch, Wojciech Kotłowski, Rohit Babbar, Krzysztof Dembczyński. Generalized test utilities for long-tail performance in extreme multi-label classification. NeurIPS 2023.](https://arxiv.org/abs/2311.05081)
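To illustrate the idea, here is a simplified, self-contained sketch for expected macro-F1 at k with hypothetical names (not the library's implementation): start from any prediction with k labels per instance, then repeatedly revisit each instance and re-select its k labels to maximize the marginal metric gain, holding the confusion counts contributed by all other instances fixed. Because macro metrics decompose over labels, each update is a simple top-k step and never decreases the objective.

```python
import numpy as np

def expected_macro_f1(tp, fp, fn):
    # expected macro-averaged F1 from label-wise expected confusion counts
    return np.mean(2 * tp / np.maximum(2 * tp + fp + fn, 1e-12))

def bca_macro_f1_at_k(eta, k, iters=3):
    n, m = eta.shape
    # initialize with the plain top-k prediction
    y = np.zeros((n, m), dtype=np.int8)
    np.put_along_axis(y, np.argpartition(-eta, k, axis=1)[:, :k], 1, axis=1)
    # expected label-wise confusion counts under the estimates eta
    tp = (y * eta).sum(axis=0)
    fp = (y * (1 - eta)).sum(axis=0)
    fn = ((1 - y) * eta).sum(axis=0)
    for _ in range(iters):
        for i in range(n):
            # remove instance i's contribution from the counts
            tp -= y[i] * eta[i]
            fp -= y[i] * (1 - eta[i])
            fn -= (1 - y[i]) * eta[i]
            # marginal gain in F1_j from predicting label j for instance i
            gain = (2 * (tp + eta[i]) / (2 * (tp + eta[i]) + fp + (1 - eta[i]) + fn)
                    - 2 * tp / np.maximum(2 * tp + fp + fn + eta[i], 1e-12))
            y[i] = 0
            y[i, np.argpartition(-gain, k)[:k]] = 1
            # add instance i back with its updated prediction
            tp += y[i] * eta[i]
            fp += y[i] * (1 - eta[i])
            fn += (1 - y[i]) * eta[i]
    return y

rng = np.random.default_rng(0)
eta = rng.random((50, 10))
y_bca = bca_macro_f1_at_k(eta, 3)
```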

### Finding the optimal population classifier via Frank-Wolfe (FW)

The method was first introduced and described in the paper:
> [Erik Schultheis, Wojciech Kotłowski, Marek Wydmuch, Rohit Babbar, Strom Borman, Krzysztof Dembczyński. Consistent algorithms for multi-label classification with macro-at-k metrics. ICLR 2024.](https://arxiv.org/abs/2401.16594)
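In rough outline (a simplified sketch for expected macro-F1 at k with hypothetical names, not the library's implementation): Frank-Wolfe maximizes a concave utility of the label-wise confusion counts. Each iteration linearizes the utility at the current counts; the gradient yields per-label weights whose best response is an instance-wise weighted top-k classifier, and the counts move a step toward that classifier's counts. The resulting plug-in rule is the corresponding randomized mixture of the collected weighted top-k classifiers.

```python
import numpy as np

def frank_wolfe_macro_f1(eta, k, iters=20):
    n, m = eta.shape

    def best_response(g_tp, g_fp, g_fn):
        # per-(instance, label) gain of predicting under the linearized utility
        gains = g_tp * eta + g_fp * (1 - eta) - g_fn * eta
        y = np.zeros((n, m), dtype=np.int8)
        np.put_along_axis(y, np.argpartition(-gains, k, axis=1)[:, :k], 1, axis=1)
        return y

    def counts(y):
        return (y * eta).sum(0), (y * (1 - eta)).sum(0), ((1 - y) * eta).sum(0)

    # initialize with the plain top-k classifier
    tp, fp, fn = counts(best_response(np.ones(m), np.zeros(m), np.zeros(m)))
    for t in range(iters):
        d = np.maximum(2 * tp + fp + fn, 1e-12)
        # gradient of mean_j 2 tp_j / (2 tp_j + fp_j + fn_j)
        g_tp = 2 * (fp + fn) / d**2
        g_fp = -2 * tp / d**2
        g_fn = -2 * tp / d**2
        tp_b, fp_b, fn_b = counts(best_response(g_tp, g_fp, g_fn))
        gamma = 2.0 / (t + 2)  # standard Frank-Wolfe step size
        tp = (1 - gamma) * tp + gamma * tp_b
        fp = (1 - gamma) * fp + gamma * fp_b
        fn = (1 - gamma) * fn + gamma * fn_b
    # expected macro-F1 of the mixture's confusion counts
    return np.mean(2 * tp / np.maximum(2 * tp + fp + fn, 1e-12))

rng = np.random.default_rng(2)
eta = rng.random((200, 12))
score = frank_wolfe_macro_f1(eta, 2)
```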


## Repository structure

The repository is organized as follows:
- `docs/` - Sphinx documentation (work in progress)
- `experiments/` - code for reproducing experiments from the papers; see the README.md file in that directory for details
- `xcolumns/` - Python package with the library
- `tests/` - tests for the library (coverage is a bit limited at the moment, but the tests should guarantee that the main components of the library work as expected)


## Development and contributing

The library was created as part of our research projects.
We are happy to share it with the community and hope that someone will find it useful.
If you have any questions or suggestions, or if you find a bug, please open an issue.
We are also happy to accept contributions in the form of pull requests.