1. [Setup](#1-setup)
2. [Quickstart](#2-quickstart)
3. [Pipeline](#3-pipeline)
4. [Result](#4-result)
5. [Acknowledgement](#5-acknowledgement)
6. [License](#6-license)
7. [Citation](#7-citation)

## 1. Setup
`Adila` needs ``Python=3.8`` and other packages listed in [``requirements.txt``](requirements.txt):
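For example, with `pip` (a minimal setup sketch, assuming a working Python 3.8 environment):

```
pip install -r requirements.txt
```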
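A hypothetical end-to-end run over the toy data (the flag names below are inferred from the argument descriptions that follow; check `main.py` for the exact interface):

```
python -u main.py \
  -fteamsvecs ./data/preprocessed/dblp/toy.dblp.v12.json/teamsvecs.pkl \
  -fsplit ./output/toy.dblp.v12.json/splits.json \
  -fpred ./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/ \
  -ratio 0.5
```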

Where the arguments are:

> `fteamsvecs`: the sparse matrix representation of all teams in a pickle file, including the teams whose members are predicted in `--pred`. It should contain a dictionary of three `lil_matrix` with keys `[id]` of size `[#teams × 1]`, `[skill]` of size `[#teams × #skills]`, and `[member]` of size `[#teams × #experts]`. Simply, each row of a matrix is the occurrence vector of skills and experts in a team. For a toy example, try
```
import pickle
with open('./data/preprocessed/dblp/toy.dblp.v12.json/teamsvecs.pkl', 'rb') as f:
    teams = pickle.load(f)  # dict of lil_matrix with keys 'id', 'skill', 'member'
```

> `fsplit`: the `splits.json` file that indicates the indices (row ids) of the teams whose members are predicted in `--pred`. For a toy example, see [`output/toy.dblp.v12.json/splits.json`](output/toy.dblp.v12.json/splits.json)
> `fpred`: a file or folder that includes the prediction files of a neural team formation method, saved via `torch.save`. The file name(s) should be `*.pred` and the content is a `[#test × #experts]` matrix of probabilities that show the membership probability of an expert in a team of the test set. For a toy example, try
```
import torch
pred = torch.load('./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/f0.test.pred')  # [#test × #experts]
```

> `ratio`: the desired `nonpopular` ratio among members of predicted teams after the mitigation process by re-ranking algorithms, e.g., `0.5`.
## 3. Pipeline
1. Labeling: Based on the distribution of experts over teams, which follows a power law (long tail) as shown in the figure, we label those in the `tail` as `nonpopular` and those in the `head` as `popular`.
To find the cutoff between `head` and `tail`, we calculate the average number of teams per expert over the whole dataset. As seen in the table, this number is `62.45` and the popular/nonpopular ratio is `0.426/0.574`. The result is a Boolean value in `{popular: True, nonpopular: False}` for each expert and is saved in `{output}/popularity.csv` like [`./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/rerank/popularity.csv`](./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/rerank/popularity.csv)

`Future:` We will consider equal area under the curve for the cutoff.
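A minimal sketch of this labeling step on the toy `teamsvecs.pkl` above (whether the cutoff comparison is strict, and the output column name, are assumptions):

```
import pickle
import pandas as pd

with open('./data/preprocessed/dblp/toy.dblp.v12.json/teamsvecs.pkl', 'rb') as f:
    teams = pickle.load(f)

# summing each column of `member` counts how many teams an expert appears in
teams_per_expert = teams['member'].sum(axis=0).A1   # shape: [#experts]
avg = teams_per_expert.mean()                       # 62.45 on the full dblp.v12

# experts in the head of the long tail (at or above average) are labeled popular
popularity = pd.DataFrame({'popularity': teams_per_expert >= avg})
popularity.to_csv('./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/rerank/popularity.csv')
```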

2. Reranking: We apply rerankers from [`deterministic greedy re-ranking methods [Geyik et al. KDD'19]`](https://dl.acm.org/doi/10.1145/3292500.3330691), including `{'det_greedy', 'det_cons', 'det_relaxed'}`, to mitigate `popularity bias`. The reranker needs a cutoff `k_max`, which is set to `10` by default.
The predictions after reranking are saved in `{output}/rerank/{fpred}.rerank.{reranker}.{k_max}` like ***.
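For intuition, here is a simplified, standalone sketch of the `det_greedy` strategy from Geyik et al. for two groups; Adila itself relies on an existing library implementation (see the Acknowledgement), so names and defaults below are illustrative:

```
import math

def det_greedy(ranked, is_popular, ratio=0.5, k_max=10):
    # ranked: expert ids sorted by predicted score, best first
    # is_popular: expert id -> bool; ratio: desired nonpopular share
    pools = {'nonpop': [e for e in ranked if not is_popular[e]],
             'pop':    [e for e in ranked if is_popular[e]]}
    target = {'nonpop': ratio, 'pop': 1.0 - ratio}
    counts = {'nonpop': 0, 'pop': 0}
    rank = {e: i for i, e in enumerate(ranked)}
    out = []
    while len(out) < k_max and any(pools.values()):
        i = len(out) + 1
        below_min = [g for g in pools if pools[g] and counts[g] < math.floor(target[g] * i)]
        below_max = [g for g in pools if pools[g] and counts[g] < math.ceil(target[g] * i)]
        # prefer groups violating their minimum; break ties by the better-scored candidate
        feasible = below_min or below_max or [g for g in pools if pools[g]]
        g = min(feasible, key=lambda grp: rank[pools[grp][0]])
        out.append(pools[g].pop(0))
        counts[g] += 1
    return out

# toy usage: experts e1..e4 by score; e1, e2 are popular
print(det_greedy(['e1', 'e2', 'e3', 'e4'],
                 {'e1': True, 'e2': True, 'e3': False, 'e4': False},
                 ratio=0.5, k_max=4))  # -> ['e1', 'e3', 'e2', 'e4']
```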

3. Evaluations: We evaluate `fairness` and `utility` metrics `before` and `after` applying rerankers on team predictions to answer two research questions (RQs):

**`RQ1:`** Do state-of-the-art neural team formation models produce fair teams of experts in terms of popularity bias? To this end, we measure the fairness scores of predicted teams `before` applying rerankers.

**`RQ2:`** Do state-of-the-art deterministic greedy re-ranking algorithms improve the fairness of neural team formation models while maintaining their accuracy? To this end, we measure the `fairness` and `utility` metrics `before` and `after` applying rerankers.

The results of the `fairness` metrics `before` and `after` will be stored in `{output}.{algorithm}.{k_max}.{faireval}.csv` like ***.

The results of the `utility` metrics `before` and `after` will be stored in `{output}.{algorithm}.{k_max}.{utileval}.csv` like ***.

`Future:` We will consider other fairness metrics.
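As a concrete sketch of both passes: utility can be scored with `pytrec_eval` over true vs. predicted members, and one fairness measure for such rankings is NDKL from Geyik et al.; the metric choices and toy inputs below are illustrative, not necessarily Adila's final set:

```
import math
import pytrec_eval

def ndkl(labels, ratio):
    # labels: popularity flags of a reranked top-k (True = popular);
    # ratio: desired nonpopular share. 0 means every prefix matches the target.
    def kl(p, q, eps=1e-12):
        return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q))
    z = sum(1 / math.log2(i + 2) for i in range(len(labels)))
    total = 0.0
    for i in range(len(labels)):
        share = sum(1 for l in labels[:i + 1] if not l) / (i + 1)  # observed nonpopular share
        total += kl([share, 1 - share], [ratio, 1 - ratio]) / math.log2(i + 2)
    return total / z

print(ndkl([True, False, True, False], ratio=0.5))

# utility before/after reranking, via pytrec_eval
qrel = {'team0': {'e1': 1, 'e9': 1}}                  # ground-truth members per test team
run  = {'team0': {'e3': 0.9, 'e1': 0.7, 'e9': 0.2}}   # predicted membership scores
evaluator = pytrec_eval.RelevanceEvaluator(qrel, {'map', 'ndcg'})
print(evaluator.evaluate(run))
```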


## 4. Result
***

## 5. Acknowledgement
We benefit from [``pytrec_eval``](https://github.com/cvangysel/pytrec_eval), [``reranking``](https://github.com/yuanlonghao/reranking), and other libraries. We would like to thank the authors of these libraries and of other helpful resources.

## 6. License
©2023. This work is licensed under a [CC BY-NC-SA 4.0](license.txt) license.

Hamed Loghmani<sup>1</sup>, [Hossein Fani](https://hosseinfani.github.io/)<sup>1,2</sup>

<sup><sup>1</sup>School of Computer Science, Faculty of Science, University of Windsor, ON, Canada.</sup>
<sup><sup>2</sup>[hfani@uwindsor.ca](mailto:hfani@uwindsor.ca)</sup>

## 7. Citation
```
@inproceedings{DBLP:conf/bias/LoghmaniF23,
  author = {Hamed Loghmani and Hossein Fani},
  ...
}
```
