1. [Setup](#1-setup)
2. [Quickstart](#2-quickstart)
3. [Pipeline](#3-pipeline)
4. [Result](#4-result)
5. [Acknowledgement](#5-acknowledgement)
6. [License](#6-license)
7. [Citation](#7-citation)

## 1. Setup
`Adila` needs ``Python=3.8`` and other packages listed in [``requirements.txt``](requirements.txt):
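For example, with `pip` (a minimal setup sketch, assuming a working Python 3.8 environment):

```
pip install -r requirements.txt
```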
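A hypothetical end-to-end run over the toy data (the flag names below are inferred from the argument descriptions that follow; check `main.py` for the exact interface):

```
python -u main.py \
  -fteamsvecs ./data/preprocessed/dblp/toy.dblp.v12.json/teamsvecs.pkl \
  -fsplit ./output/toy.dblp.v12.json/splits.json \
  -fpred ./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/ \
  -ratio 0.5
```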

Where the arguments are:

> `fteamsvecs`: the sparse matrix representation of all teams in a pickle file, including the teams whose members are predicted in `--pred`. It should contain a dictionary of three `lil_matrix` with keys `[id]` of size `[#teams × 1]`, `[skill]` of size `[#teams × #skills]`, and `[member]` of size `[#teams × #experts]`. Simply, each row of a matrix is the occurrence vector of skills and experts in a team. For a toy example, try
```
import pickle
with open('./data/preprocessed/dblp/toy.dblp.v12.json/teamsvecs.pkl', 'rb') as f:
    teams = pickle.load(f)  # dict of lil_matrix with keys 'id', 'skill', 'member'
```

> `fsplit`: the `splits.json` file that indicates the indices (row ids) of the teams whose members are predicted in `--pred`. For a toy example, see [`output/toy.dblp.v12.json/splits.json`](output/toy.dblp.v12.json/splits.json)
> `fpred`: a file or folder that includes the prediction files of a neural team formation method, saved via `torch.save`. The file name(s) should be `*.pred` and the content is a `[#test × #experts]` matrix of probabilities that show the membership probability of an expert in a team of the test set. For a toy example, try
```
import torch
pred = torch.load('./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/f0.test.pred')  # [#test × #experts]
```

> `ratio`: the desired `nonpopular` ratio among members of predicted teams after the mitigation process by re-ranking algorithms, e.g., `0.5`.
## 3. Pipeline
1. Labeling: Based on the distribution of experts over teams, which follows a power law (long tail) as shown in the figure, we label those in the `tail` as `nonpopular` and those in the `head` as `popular`.
To find the cutoff between `head` and `tail`, we calculate the average number of teams per expert over the whole dataset. As seen in the table, this number is `62.45` and the popular/nonpopular ratio is `0.426/0.574`. The result is a Boolean value in `{popular: True, nonpopular: False}` for each expert and is saved in `{output}/popularity.csv` like [`./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/rerank/popularity.csv`](./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/rerank/popularity.csv)

`Future:` We will consider equal area under the curve for the cutoff.
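A minimal sketch of this labeling step on the toy `teamsvecs.pkl` above (whether the cutoff comparison is strict, and the output column name, are assumptions):

```
import pickle
import pandas as pd

with open('./data/preprocessed/dblp/toy.dblp.v12.json/teamsvecs.pkl', 'rb') as f:
    teams = pickle.load(f)

# summing each column of `member` counts how many teams an expert appears in
teams_per_expert = teams['member'].sum(axis=0).A1   # shape: [#experts]
avg = teams_per_expert.mean()                       # 62.45 on the full dblp.v12

# experts in the head of the long tail (at or above average) are labeled popular
popularity = pd.DataFrame({'popularity': teams_per_expert >= avg})
popularity.to_csv('./output/toy.dblp.v12.json/bnn/t31.s11.m13.l[100].lr0.1.b4096.e20.s1/rerank/popularity.csv')
```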

2. Reranking: We apply rerankers from [`deterministic greedy re-ranking methods [Geyik et al. KDD'19]`](https://dl.acm.org/doi/10.1145/3292500.3330691), including `{'det_greedy', 'det_cons', 'det_relaxed'}`, to mitigate `popularity bias`. The reranker needs a cutoff `k_max`, which is set to `10` by default.
The predictions after reranking are saved in `{output}/rerank/{fpred}.rerank.{reranker}.{k_max}` like ***.
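For intuition, here is a simplified, standalone sketch of the `det_greedy` strategy from Geyik et al. for two groups; Adila itself relies on an existing library implementation (see the Acknowledgement), so names and defaults below are illustrative:

```
import math

def det_greedy(ranked, is_popular, ratio=0.5, k_max=10):
    # ranked: expert ids sorted by predicted score, best first
    # is_popular: expert id -> bool; ratio: desired nonpopular share
    pools = {'nonpop': [e for e in ranked if not is_popular[e]],
             'pop':    [e for e in ranked if is_popular[e]]}
    target = {'nonpop': ratio, 'pop': 1.0 - ratio}
    counts = {'nonpop': 0, 'pop': 0}
    rank = {e: i for i, e in enumerate(ranked)}
    out = []
    while len(out) < k_max and any(pools.values()):
        i = len(out) + 1
        below_min = [g for g in pools if pools[g] and counts[g] < math.floor(target[g] * i)]
        below_max = [g for g in pools if pools[g] and counts[g] < math.ceil(target[g] * i)]
        # prefer groups violating their minimum; break ties by the better-scored candidate
        feasible = below_min or below_max or [g for g in pools if pools[g]]
        g = min(feasible, key=lambda grp: rank[pools[grp][0]])
        out.append(pools[g].pop(0))
        counts[g] += 1
    return out

# toy usage: experts e1..e4 by score; e1, e2 are popular
print(det_greedy(['e1', 'e2', 'e3', 'e4'],
                 {'e1': True, 'e2': True, 'e3': False, 'e4': False},
                 ratio=0.5, k_max=4))  # -> ['e1', 'e3', 'e2', 'e4']
```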

3. Evaluations: We evaluate `fairness` and `utility` metrics `before` and `after` applying rerankers on team predictions to answer two research questions (RQs):

**`RQ1:`** Do state-of-the-art neural team formation models produce fair teams of experts in terms of popularity bias? To this end, we measure the fairness scores of predicted teams `before` applying rerankers.

**`RQ2:`** Do state-of-the-art deterministic greedy re-ranking algorithms improve the fairness of neural team formation models while maintaining their accuracy? To this end, we measure the `fairness` and `utility` metrics `before` and `after` applying rerankers.

The results of the `fairness` metrics `before` and `after` will be stored in `{output}.{algorithm}.{k_max}.{faireval}.csv` like ***.

The results of the `utility` metrics `before` and `after` will be stored in `{output}.{algorithm}.{k_max}.{utileval}.csv` like ***.

`Future:` We will consider other fairness metrics.
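As a concrete sketch of both passes: utility can be scored with `pytrec_eval` over true vs. predicted members, and one fairness measure for such rankings is NDKL from Geyik et al.; the metric choices and toy inputs below are illustrative, not necessarily Adila's final set:

```
import math
import pytrec_eval

def ndkl(labels, ratio):
    # labels: popularity flags of a reranked top-k (True = popular);
    # ratio: desired nonpopular share. 0 means every prefix matches the target.
    def kl(p, q, eps=1e-12):
        return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q))
    z = sum(1 / math.log2(i + 2) for i in range(len(labels)))
    total = 0.0
    for i in range(len(labels)):
        share = sum(1 for l in labels[:i + 1] if not l) / (i + 1)  # observed nonpopular share
        total += kl([share, 1 - share], [ratio, 1 - ratio]) / math.log2(i + 2)
    return total / z

print(ndkl([True, False, True, False], ratio=0.5))

# utility before/after reranking, via pytrec_eval
qrel = {'team0': {'e1': 1, 'e9': 1}}                  # ground-truth members per test team
run  = {'team0': {'e3': 0.9, 'e1': 0.7, 'e9': 0.2}}   # predicted membership scores
evaluator = pytrec_eval.RelevanceEvaluator(qrel, {'map', 'ndcg'})
print(evaluator.evaluate(run))
```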


## 4. Result
***

## 5. Acknowledgement
We benefit from [``pytrec_eval``](https://github.com/cvangysel/pytrec_eval), [``reranking``](https://github.com/yuanlonghao/reranking), and other libraries. We would like to thank the authors of these libraries and of other helpful resources.

## 6. License
©2023. This work is licensed under a [CC BY-NC-SA 4.0](license.txt) license.

Hamed Loghmani<sup>1</sup>, [Hossein Fani](https://hosseinfani.github.io/)<sup>1,2</sup>

<sup><sup>1</sup>School of Computer Science, Faculty of Science, University of Windsor, ON, Canada.</sup>
<sup><sup>2</sup>[hfani@uwindsor.ca](mailto:hfani@uwindsor.ca)</sup>

## 7. Citation
```
@inproceedings{DBLP:conf/bias/LoghmaniF23,
  author = {Hamed Loghmani and Hossein Fani},
  ...
}
```
