(II. Mitigating the Bias) Learning to Rank (Rerank) the Recommended Team Members to Mitigate Popularity Bias #19
In order to have a fairer recommendation and form teams with members from different experience levels (for example, a new researcher with reasonable qualifications gets less chance to be recommended than a more experienced person), we need to solve the long-tail problem. That means we need to present more long-tail items (people, in the dblp dataset), since these are the ones who are less likely to be chosen as team members but might still be qualified. Otherwise, a few top researchers will be recommended most of the time.
I'm still not sure we can apply what they did in this paper to popularity bias in team formation, because they first measure gender and age bias in the recommendations and then present algorithms for computing fairness-aware re-rankings.
I read these two papers again and made changes to our proposed-method document accordingly.
Updating the readme file and completing it.
@Rounique where did you make these changes?
@hosseinfani We wanted to push it after discussing it with you, but it is added now.
@Rounique @yogeswarl I'm not sure I can see any changes related to what you said (skew). Can you?
Hi Dr. @hosseinfani, the skew and infeasible metrics have been implemented. They are in the commit from April 14th.
Good evening Dr. @hosseinfani, we are currently trying to run our code on the large dataset, but we keep running into errors. I also spoke with Karan to get some help with the predictions stored in OpeNTF so we can compare results across both our projects; this is currently being worked on. Thanks,
Hi @yogeswarl
As I said, I could not find any changed lines regarding the skew metric. The link I posted is for the April 14th change.
Thank you very much for your response, Dr. @hosseinfani. To explain further: the skew metric is the logarithmic ratio of the proportion of candidates having an attribute value among the top-k results to the desired proportion of that attribute value. A negative skew means less than desired; a positive skew means more than desired. The infeasible index and count are also implemented; they were available from the library. Infeasible index: the number of indices of a list that violate a given criterion.
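To make the definition above concrete, here is a minimal sketch of skew@k. The attribute and example values are hypothetical (I use popular `'P'` vs long-tail `'L'` members as the attribute, since that is the bias this project targets); the actual implementation in the repository may differ.

```python
import math

def skew_at_k(ranking, has_attribute, k, desired_proportion, eps=1e-12):
    """Logarithmic ratio of the observed proportion of candidates with a
    given attribute among the top-k results to the desired proportion.
    Negative => under-represented; positive => over-represented."""
    top_k = ranking[:k]
    observed = sum(1 for c in top_k if has_attribute(c)) / k
    # clamp with eps to avoid log(0) when a group is absent from the top-k
    return math.log(max(observed, eps) / max(desired_proportion, eps))

# hypothetical example: 'P' = popular member, 'L' = long-tail member
ranking = ["P", "P", "P", "L", "P", "L"]
s = skew_at_k(ranking, lambda c: c == "L", k=4, desired_proportion=0.5)
# long-tail proportion in the top-4 is 0.25 vs a desired 0.5, so s < 0
```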
The problem is indeed related to an out-of-memory error. Roonak and I are getting it fixed so we can give final results before the submission of the evaluation!
@yogeswarl So, can you explain, based on code lines (you can put links to the code lines to show where something is done):
When you're explaining, please define wording that I may not be familiar with. For example, does 'attribute value' mean 'skill'? If so, what does 'desired proportion of attribute value' mean? Likewise, when you explain 'infeasible', what does 'violate a given criterion' mean in our project?
Please post the error stack trace here.
@yogeswarl |
The code simply stops execution; there are no errors that show up.
I will indeed do that, with comments and better variable naming following our lab's standards.
Weird! How do you know it's due to OOM then? Do you monitor your memory usage? Take a screenshot while running your code, then.
you have to:
try to find the error source instead of making a random guess.
Hello Dr. Fani,
This was solved by reading the JSON file line by line and appending each record to the array used to create the objects and lists for the reranking.
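The line-by-line fix described above can be sketched roughly as follows, assuming the dataset is stored as one JSON record per line (JSON Lines); the function names are illustrative, not the ones in the repository.

```python
import json

def parse_json_lines(lines):
    """Parse an iterable of JSON-lines records, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        records.append(json.loads(line))
    return records

def load_teams_line_by_line(path):
    """Stream a large file one line at a time instead of json.load()-ing
    the whole file into memory at once, which caused the OOM crash."""
    with open(path, encoding="utf-8") as f:
        return parse_json_lines(f)
```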
@yogeswarl @Rounique Also, read the readme of the OpeNTF project for more info. I uploaded the pickle to your channel. It reduced the size to 20M!
@yogeswarl |
Dr. @hosseinfani, the commit to fix it can be found here: yogeswarl/fair-team-formation@a2828f1
Thank you for this! I will use it for comparison.
Not sure what you mean by comparison. Please change your code to work directly with the pickle file.
I meant comparing the pickle file with the result.
Hi Dr. @hosseinfani, Thanks. |
@yogeswarl have you had a look at fani-lab/OpeNTF#79?
Thank you very much for this! I will use it in the implementation.
Update: Dr. @hosseinfani, thank you for the issue suggestion and the pickle files posted on Teams. I have read the contents of the pickle file and converted them into a list to pass to our reranker. I will update tonight on the final output from the reranking. I made it work on the toy dataset and am now integrating it with my code so it can run on the final dataset!
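The pickle-to-list conversion mentioned above is essentially a two-step load; a minimal sketch, with hypothetical function names (the actual file path and structure of the pickled object are whatever was shared on Teams):

```python
import pickle

def to_rerank_input(data):
    """Convert whatever collection the pickle holds into a plain list,
    the shape our reranker expects."""
    return list(data)

def load_predictions(path):
    """Load the pickled predictions and hand them to the reranker as a list."""
    with open(path, "rb") as f:
        return to_rerank_input(pickle.load(f))
```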
Update: Dr. @hosseinfani |
not sure I understood "OpenTF values"? |
Using the OpeNTF test data from splits.json, I ran the NDKL on the vector of members. After reranking the vector of members, I ran the NDKL metric again; the results are posted above. Thank you, I will reach out to Arman.
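For reference, NDKL (as introduced in Geyik et al.'s fairness-aware re-ranking work) compares the attribute distribution of every top-i prefix of the ranking against a desired distribution, with a rank discount. A minimal sketch, again using the hypothetical popular/long-tail attribute; lower values mean a fairer ranking:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over aligned category proportions, smoothed to avoid log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def ndkl(ranking, attribute_of, desired):
    """Normalized Discounted KL divergence: rank-discounted average of the
    KL divergence between each top-i prefix's attribute distribution and
    the desired distribution. `desired` maps attribute value -> proportion."""
    values = list(desired.keys())
    q = [desired[v] for v in values]
    counts = {v: 0 for v in values}
    z = total = 0.0
    for i, candidate in enumerate(ranking, start=1):
        counts[attribute_of(candidate)] += 1
        p = [counts[v] / i for v in values]
        w = 1.0 / math.log2(i + 1)  # same discount as DCG
        z += w
        total += w * kl_divergence(p, q)
    return total / z

# hypothetical: popular ('P') vs long-tail ('L') members, 50/50 desired
biased = ["P", "P", "P", "P", "L", "L"]
mixed = ["P", "L", "P", "L", "P", "L"]
desired = {"P": 0.5, "L": 0.5}
# the alternating list should score closer to 0 (fairer) than the biased one
```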
Just one last query: since Roonak and I are running out of time, can we please write our evaluation for the NLP project based on this metric and continue working further later on? There is a lot to write in a short amount of time, and honestly I want enough time to write up everything we have found to get the maximum marks out of our evaluation phase. Please let us know your suggestions. Yours truly,
"Using the OpeNTF test data from splits.json, I ran the NDKL on the vector of members" => from where did you get the vector of members?
this is totally up to you. |
|
Thank you. |
This is the golden list of members. So you showed that your result is fairer than the original list of members. The OpeNTF predictions are in the pred file.
@hosseinfani Right now this is running on the toy test dataset. I will run it on the given predictions dataset now and provide the results.
@yogeswarl
Also, you are supposed to calculate the NDCG after reranking as well.
I have reranked the predicted list of members based on the rows provided from splits.json: https://github.com/yogeswarl/fair-team-formation/blob/fc29c7fe2eabf398928a8e2365dde3f581e3abef/main.py#L21. This is from the prediction, where I rerank by row, so every test column is reranked and its NDKL is returned before and after reranking. The problem with NDCG is that I ran into an error when I tried to run the pytrec_eval code after reranking. I have been trying to solve it since last night, only to be met by this error. Thank you.
@yogeswarl Regarding the error, you need to check what the inputs to the library are.
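While debugging the library inputs, a hand-rolled NDCG@k can serve as a sanity check against whatever pytrec_eval returns once its input-format issue is resolved. This is a minimal binary-relevance sketch (a recommended member is relevant iff they appear in the gold team); the member ids are hypothetical:

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k graded relevances."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_members, gold_members, k):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the DCG of
    the ideal ordering (all relevant members first)."""
    gold = set(gold_members)
    gains = [1.0 if m in gold else 0.0 for m in ranked_members]
    ideal = sorted(gains, reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0

# hypothetical example: a reranked list evaluated against the gold team
reranked = ["m3", "m1", "m7", "m2"]
gold = ["m1", "m2"]
```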
Hello Dr. @hosseinfani, sincere apologies for not being able to make it today; I just reached Windsor. Is there another time slot available today or tomorrow? Thank you for your consideration.
Dr. @hosseinfani, we are done with the code for the toy dataset, but the provided test set file has 0.0 as the value for everything. I am not sure what's wrong. I have created a fork of the project with all the available code pulled in. Please do a code review and let me know of all the changes required. Thanks,
Dr. @hosseinfani,
@yogeswarl and @Rounique the code is in our lab's GitHub. Please do the following by our next meeting:
@Rounique |
The problem: for recommendation, we actually order the whole list, but usually we need the top-k of those items/people. Given an already recommended list T', we aim to re-rank it so that fairness increases, while making sure that, compared to the list T (our gold standard), the metrics do not drop (or at least not considerably). To start, we need to show there is already bias in T or T' (which??), then recommend a list T'' that has less bias but is still close to the gold standard.
T or T' (which??) ==> Based on our base paper, we're dealing with algorithmic bias, so we don't need to show there is a bias in T; but we can show there is a bias in T'.
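The T' -> T'' re-ranking step discussed above can be sketched as a greedy fairness-aware selection in the spirit of Geyik et al.'s det-greedy: at each position, prefer the highest-ranked remaining candidate from a group that is below its desired share so far, falling back to the best remaining candidate otherwise. The attribute and ids below are hypothetical, not the project's actual code:

```python
def fairness_aware_rerank(ranked, attribute_of, desired, k):
    """Build T'' from the recommended list T': at position i, pick the
    highest-ranked remaining candidate whose attribute value is still
    below desired_proportion * i; otherwise take the best remaining one.
    `desired` maps attribute value -> desired proportion in the top-k."""
    remaining = list(ranked)
    counts = {v: 0 for v in desired}
    out = []
    for i in range(1, k + 1):
        # attribute values currently under-represented in the prefix
        below = [v for v in desired if counts[v] < desired[v] * i]
        pick = None
        for c in remaining:  # remaining is in original ranked order
            if attribute_of(c) in below:
                pick = c
                break
        if pick is None and remaining:  # all targets met: keep best ranked
            pick = remaining[0]
        out.append(pick)
        remaining.remove(pick)
        counts[attribute_of(pick)] += 1
    return out

# hypothetical: T' ranks popular members first; enforce 50/50 popular/long-tail
t_prime = ["p1", "p2", "p3", "l1", "l2", "p4"]
attr = lambda m: "L" if m.startswith("l") else "P"
t_doubleprime = fairness_aware_rerank(t_prime, attr, {"P": 0.5, "L": 0.5}, 6)
# interleaves long-tail members while preserving within-group order
```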
@Rounique |
@hosseinfani |
This is the second step of mitigating the popularity bias using a reranking method.