
FA*IR Method for Neural Team Formation #80

Closed
Hamedloghmani opened this issue Aug 14, 2023 · 7 comments
Assignees
Labels
enhancement New feature or request experiment

Comments

@Hamedloghmani
Member

Since the paper summarized in #66 seemed sufficient for our problem in theory, I decided to prototype an implementation to see the results on a partial or even toy dataset.
The following diagram shows my proposed pipeline for Adila, based on the FA*IR algorithm:
[pipeline diagram]

Please note that this is an ongoing process and I will update each step in this thread.

@Hamedloghmani Hamedloghmani added the enhancement New feature or request label Aug 14, 2023
@Hamedloghmani Hamedloghmani self-assigned this Aug 14, 2023
Hamedloghmani added a commit that referenced this issue Aug 14, 2023
@hosseinfani
Member

@Hamedloghmani
I'll be in the lab around 3pm today to discuss the flow and the code.

@Hamedloghmani
Member Author

@hosseinfani
Thanks a lot, see you soon.

@Hamedloghmani
Member Author

@hosseinfani
The following table contains my early experiment results on the imdb dataset. I used unigram_b outputs for the bnn and bnn_emb baselines. The experiment was done with k (top-k) = 100 and a significance level of 0.08. The significance level is the threshold at which evidence is considered strong enough to reject the null hypothesis; choosing it involves a trade-off between being cautious about false claims (Type I errors) and being open to detecting true effects (avoiding Type II errors).
To the best of my knowledge, these early results demonstrate negligible changes in utility (in terms of MAP10 and NDCG10) while boosting fairness, which in our problem means lowering the NDKL metric. If you kindly approve and confirm the validity of my pipeline and methodology, I can proceed with the experiment on dblp as well.
One key part of this new pipeline is that we only rerank when a team is determined to be unfair (which is decided by the is_fair() function from the fa*ir library).
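To make the gating step concrete, here is a minimal, self-contained sketch of an FA*IR-style ranked-group-fairness test: every prefix of the ranking must contain at least the minimum number of protected candidates that survives a binomial test at significance level alpha. This is a simplified stand-in for the fa*ir library's is_fair() (the real library additionally applies a multiple-testing correction to alpha, which this sketch omits):

```python
import math

def binom_cdf(x: int, n: int, p: float) -> float:
    """P[X <= x] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(x + 1))

def min_protected(prefix: int, p: float, alpha: float) -> int:
    """Smallest count m of protected items in the first `prefix`
    positions that is NOT rejected at level alpha, i.e. the smallest
    m with Binom-CDF(m; prefix, p) > alpha."""
    m = 0
    while binom_cdf(m, prefix, p) <= alpha:
        m += 1
    return m

def is_fair(is_protected: list[bool], p: float, alpha: float) -> bool:
    """FA*IR-style check: every prefix of the ranking must contain at
    least the minimum required number of protected candidates."""
    count = 0
    for i, prot in enumerate(is_protected, start=1):
        count += prot
        if count < min_protected(i, p, alpha):
            return False
    return True
```

In the pipeline, a team's ranked member list would only be handed to the reranker when this check returns False; otherwise the color-blind ranking is kept as-is.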

  • In the literature, color-blind ranking means sorting by score or probability alone, ignoring fairness considerations.
  • I picked 0.08 as the significance level since it is a common choice in the literature.
| Dataset | Fairness Notion | Baseline | k | Significance Level | Reranking Algorithm | NDKL (Color Blind) | NDKL (After) | MAP10 (Color Blind) | MAP10 (After) | NDCG10 (Color Blind) | NDCG10 (After) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| imdb | Demographic Parity | random | 100 | 0.08 | fa*ir | 0.007230588657 | 0.06924057782 | 0.001588093065 | 0.0012440639 | 0.003662102291 | 0.003086464484 |
| imdb | Demographic Parity | bnn | 100 | 0.08 | fa*ir | 0.2316633978 | 0.1792892199 | 0.00466983802 | 0.004678485827 | 0.01057994689 | 0.01059885315 |
| imdb | Demographic Parity | bnn_emb | 100 | 0.08 | fa*ir | 0.2779183553 | 0.182014503 | 0.005727984715 | 0.005727984715 | 0.0126618403 | 0.0126618403 |
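Since NDKL is the fairness metric reported above, here is a short sketch of one common formulation (normalized discounted KL divergence over ranking prefixes, as in Geyik et al.); the exact normalization in Adila's implementation may differ:

```python
import math

def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(p || q) in bits, with the convention 0 * log(0/q) = 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def ndkl(groups: list[int], desired: list[float]) -> float:
    """Normalized Discounted KL divergence of a ranking.

    groups[i] -- group id (0-based) of the candidate at rank i
    desired   -- desired distribution over group ids
    Lower is fairer; 0 means every prefix matches `desired` exactly.
    """
    counts = [0] * len(desired)
    z = total = 0.0
    for i, g in enumerate(groups, start=1):
        counts[g] += 1
        top_i = [c / i for c in counts]          # group distribution in top i
        w = 1.0 / math.log2(i + 1)               # logarithmic position discount
        total += w * kl_divergence(top_i, desired)
        z += w
    return total / z
```

For example, with a desired 50/50 split, an alternating ranking [0, 1, 0, 1] scores a lower (fairer) NDKL than the blocked ranking [0, 0, 1, 1].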

@hosseinfani
Member

@Hamedloghmani
Thank you. As we discussed during office hour, please

  • choose 0.05 as the significance level, and after that repeat with 0.01
  • run experiments on equality of odds (the other notion of fairness)
  • consider a second fairness metric

Please continue to log your progress here as you perform the experiments.

@Hamedloghmani
Member Author

Hi, just wanted to share an update on the recent commits, although I have noted them in #47 as well.

  • The initial integration of fa*ir into the pipeline is finished.
  • The skew implementation will be pushed tonight or tomorrow (5 days ahead of our initial schedule). Afterwards, I'll dedicate a day to finding potential bugs and cleaning up the code written over the past few days.

I will start running the experiments after we finalize the above bullet points.
Thank you.

@hosseinfani
Member

@Hamedloghmani
let me know when we can review the code together. thanks.

@Hamedloghmani
Member Author

The last commit contains the initial implementation of Skew in the pipeline. I am writing this comment to explain how to interpret Skew, for ease of reference.
Skew is the logarithm of the ratio between the percentage of candidates with a particular attribute value among the top k ranked results and the desired percentage for that attribute value. A negative Skew for an attribute value a indicates under-representation of candidates with the sensitive attribute a in the top k results, while a positive Skew implies a preference for such candidates. The logarithm makes Skew values symmetric around zero with respect to ratios in favor of or against a specific attribute value a: for instance, ratios of 2 and 0.5 yield Skew values of the same magnitude but opposite signs.
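The definition above translates directly into a few lines of code. This is an illustrative sketch (the function name and signature are mine, not Adila's actual API):

```python
import math

def skew(groups: list[str], attr: str, desired: dict[str, float], k: int) -> float:
    """Skew of attribute value `attr` in the top-k of a ranking:
    log of (observed share of `attr` in the top k) over its desired
    share. Negative -> under-represented, positive -> over-represented."""
    top_k = groups[:k]
    observed = top_k.count(attr) / k
    return math.log(observed / desired[attr])
```

For example, if "f" makes up 25% of the top 4 against a desired 50% (ratio 0.5), and "m" makes up 75% against a desired 37.5% (ratio 2), the two Skew values are -log(2) and +log(2): equal magnitude, opposite signs.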

Hamedloghmani added a commit that referenced this issue Oct 5, 2023