
(II. Mitigating the Bias) Learning to Rank (Rerank) the Recommended Team Members to Mitigate Popularity Bias #19

hosseinfani opened this issue Mar 18, 2022 · 54 comments
Labels: enhancement (New feature or request)

@hosseinfani (Member)

This is the second step of mitigating the popularity bias using a reranking method.

hosseinfani added the enhancement (New feature or request) label on Mar 18, 2022
@Rounique (Contributor)

In order to make the recommendations fairer and form teams with members at different experience levels (a new researcher with reasonable qualifications, for example, gets less chance of being recommended than a more experienced person), we need to solve the long-tail problem. That means we need to present more long-tail items (people, in the DBLP dataset), since these are the candidates who are less likely to be chosen as team members but might still be qualified. Otherwise, a few top researchers will be recommended most of the time.

@Rounique (Contributor)

I'm still not sure whether we can apply what they did in this paper to popularity bias in team formation, because they first measure gender and age bias in the recommendations and then present algorithms for computing fairness-aware re-ranking.
So I am trying to figure out how we can relate the two.

@Rounique (Contributor)

Rounique commented Apr 8, 2022

I read these two papers again and updated our proposed-method document accordingly:

  1. Controlling Popularity Bias in Learning-to-Rank Recommendation
  2. Managing Popularity Bias in Recommender Systems with Personalized Re-ranking

@Rounique (Contributor)

Adding the Skew measurement for evaluating bias in the recommendations.
A negative Skew_{a_i}@k indicates that candidates with attribute value a_i are underrepresented in the top-k results, whereas a positive Skew_{a_i}@k indicates that such candidates are favoured.
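For reference, a minimal sketch of how Skew@k could be computed; the 'popular'/'non-popular' labels and the desired proportion below are hypothetical, only for illustration:

```python
import math

def skew_at_k(ranked_attrs, attr_value, desired_prop, k):
    """Log ratio of the observed proportion of `attr_value` in the top-k
    over the desired proportion for that attribute value."""
    top_k = ranked_attrs[:k]
    observed_prop = sum(1 for a in top_k if a == attr_value) / k
    eps = 1e-12  # avoids log(0) when a group is absent from the top-k
    return math.log((observed_prop + eps) / (desired_prop + eps))

# hypothetical ranked list of member attributes (popular vs. non-popular)
ranking = ['popular', 'popular', 'non-popular', 'popular', 'non-popular']
print(skew_at_k(ranking, 'non-popular', desired_prop=0.5, k=5))  # negative => underrepresented
```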

@Rounique (Contributor)

Updating the readme file and completing it.

@hosseinfani (Member, Author)

@Rounique where are you making these changes?

@Rounique (Contributor)

@hosseinfani We wanted to push it after discussing it with you, but it has been added now.

@hosseinfani (Member, Author)

@Rounique @yogeswarl
you mean the changes here: yogeswarl/fair-team-formation@ee3a3ce

Screen Shot 2022-04-15 at 3 59 03 PM

I can't see any changes related to what you mentioned (skew). Can you?

@yogeswarl (Member)

Hi Dr. @hosseinfani, the skew and infeasible metrics were worked on; it is the commit made on April 14th.

@yogeswarl (Member)

Good evening Dr. @hosseinfani, we are currently trying to run our code on the large dataset, but we keep running into errors. I also spoke with Karan to get some help with the predictions stored in opeNTF so we can compare results across both our projects, which is currently in progress.
Could you help us by trying to run our code to see whether it runs on the DBLP dataset? Our latest work is stored in the dev branch located here. If that does not work, we will rework it to find our mistake in loading the files.

Thanks,
Yogeswar

@hosseinfani (Member, Author)

Hi @yogeswarl

  • Can you post here what errors you face? If the code works on the toy dataset, it should work on the main dataset, as they have the same file structure, unless it is related to the load (e.g., OOM).
  • I'm sorry, I'm busy with the last week of classes and exam preps.

@hosseinfani (Member, Author)

> Hi Dr. @hosseinfani, the skew and infeasible metrics were worked on; it is the commit made on April 14th.

As I said, I could not find any lines of change regarding the skew metric. The link I posted is for the April 14th change.

@yogeswarl (Member)

> As I said, I could not find any lines of change regarding the skew metric. The link I posted is for the April 14th change.

Thank you very much for your response, Dr. @hosseinfani.
Below I attach an image that shows what the skew and infeasible metrics are.

Explaining further, the Skew metric is the logarithmic ratio of the proportion of candidates having the attribute value among the top-k results over the desired proportion for that attribute value.

A negative skew means less than desired; a positive skew means more than desired.

The infeasible index and count have also been worked on; they were available from the library.

Infeasible index: the number of positions in a list that violate a given criterion.
Infeasible count: the number of (attribute value, position) pairs in a list that violate a given criterion (a toy sketch follows the screenshot below).

Screen Shot 2022-04-21 at 12 15 01 AM
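For reference, a minimal sketch of how the infeasible index and count could be computed against a desired distribution. The helper name and the way minimum required counts are derived (floor of position times desired share) are assumptions for illustration, not necessarily what the library does internally:

```python
import math
from collections import Counter

def infeasible_index_and_count(ranked_attrs, desired_props):
    """Count positions (and (attribute, position) pairs) where the running
    top-k contains fewer items of an attribute than floor(k * desired share)."""
    infeasible_index, infeasible_count = 0, 0
    seen = Counter()
    for k, attr in enumerate(ranked_attrs, start=1):
        seen[attr] += 1
        violated = [a for a, p in desired_props.items() if seen[a] < math.floor(k * p)]
        if violated:
            infeasible_index += 1              # this position violates at least one criterion
            infeasible_count += len(violated)  # one per violated (attribute, position) pair
    return infeasible_index, infeasible_count

# hypothetical usage: half of each group desired at every prefix
print(infeasible_index_and_count(['popular'] * 4 + ['non-popular'],
                                 {'popular': 0.5, 'non-popular': 0.5}))
```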

@yogeswarl (Member)

> Hi @yogeswarl
>
> • Can you post here what errors you face? If the code works on the toy dataset, it should work on the main dataset, as they have the same file structure, unless it is related to the load (e.g., OOM).
> • I'm sorry, I'm busy with the last week of classes and exam preps.

The problem is indeed related to the out-of-memory error. Roonak and I are getting it fixed so we can give the final results before the submission of the evaluation!

@hosseinfani (Member, Author)

@yogeswarl
Thank you. I can see it now.

So, can you explain, based on code lines (you can put links to code lines to show where something is done):

  1. Given a set of skills, e.g., S={s1, s4, s6}, we have the ground truth that there is a team T={m3, m10}.
  2. Based on our method, we find a team T'={m8, m10, m23}.
  3. Based on the method you're using, you find a team T''={m3, m8, m10}. How does the reranker make the prediction better?
  4. You calculate metrics for T, T', and T''. What are the metrics and what do they mean with respect to our project? (see the illustrative sketch below)

When you're explaining, please explain wording that I may not be familiar with. For example, does 'attribute value' mean 'skill'? If so, what does 'desired proportion of attribute value' mean?

Likewise, when you explain 'infeasible', what does 'violate a given criterion' mean in our project?
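Purely as an illustration of item 4 above (not the project's actual metric code), a set-based precision/recall for the example teams could be computed like this:

```python
def precision_recall(predicted, truth):
    """Set-based precision/recall of a predicted team against the ground-truth team."""
    hits = len(set(predicted) & set(truth))
    return hits / len(predicted), hits / len(truth)

T  = {'m3', 'm10'}             # ground-truth team
T1 = ['m8', 'm10', 'm23']      # our method's team (T')
T2 = ['m3', 'm8', 'm10']       # reranked team (T'')

print(precision_recall(T1, T))  # (0.33..., 0.5)
print(precision_recall(T2, T))  # (0.66..., 1.0) -> closer to the ground truth
```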

@hosseinfani (Member, Author)

> The problem is indeed related to the out-of-memory error. Roonak and I are getting it fixed so we can give the final results before the submission of the evaluation!

Please post the error stack trace here.

@hosseinfani (Member, Author)

@yogeswarl
please also refactor your code and make it readable/understandable to a general user without your direct consultation.

@yogeswarl (Member)

> Please post the error stack trace here.

The code simply stops execution; there are no errors that show up.

@yogeswarl (Member)

> please also refactor your code and make it readable/understandable to a general user without your direct consultation.

I will indeed do that, with comments and better variable naming following our lab's standards.

@hosseinfani (Member, Author)

> The code simply stops execution; there are no errors that show up.

Weird! How do you know that it's due to OOM then? Do you monitor your memory usage? Take a snapshot of your screen while running your code, then.

@yogeswarl (Member)

yogeswarl commented Apr 21, 2022

First screenshot:
Screen Shot 2022-04-21 at 2 31 54 AM
Second screenshot, of peak usage:
Screen Shot 2022-04-21 at 2 40 33 AM

Third screenshot, of the shell killing the program:
Screen Shot 2022-04-21 at 3 02 43 AM

I even asked Soroush to run this code, but it seems to behave the same way on his workstation too.

@hosseinfani (Member, Author)

You have to:

  1. Check the shell's log; usually there is a system-wide log that provides more explanation behind the kill.

  2. Put log lines in and see up to which part of your code everything is fine and where it becomes resource intensive.

Try to find the error source instead of making a random guess.

@yogeswarl (Member)

Hello Dr. Fani,
Thank you for the tip on debugging errors on the large dataset.
I have fixed the issue.
Explanation of the issue:
From my exploration of the error, there are apparently two problems:

  1. pandas' read_json expects the JSON file to be an array of flat JSON objects, i.e., one level of nesting, whereas the DBLP dataset is nested more deeply. This causes the issue, although I don't understand why it didn't happen on the test dataset but happens on the large dataset.
  2. The second is the cost of creating a single data frame from the JSON and looping over it. In our toy JSON file this was too small to even be an issue, but on the large dataset it is; creating multiple lists to append and push the data causes a lot of problems.

This was solved by reading the JSON file line by line and appending to the arrays that build the objects and lists for the reranking (see the sketch below).
Although I have run it, the results are yet to come back; I will update here once they are ready.
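A minimal sketch of the line-by-line approach described above, assuming the dump stores one JSON object per line; the file name and the fields pulled out are hypothetical:

```python
import json

teams, members = [], []
with open('dblp.v12.json', encoding='utf-8') as f:  # hypothetical path
    for line in f:
        line = line.strip().rstrip(',')              # tolerate array commas between records
        if not line or line in ('[', ']'):
            continue
        record = json.loads(line)
        teams.append(record)                         # keep the raw team/paper object
        members.extend(record.get('authors', []))    # collect candidates for the reranking
```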

@hosseinfani (Member, Author)

hosseinfani commented Apr 23, 2022

@yogeswarl @Rounique
There is no need to load the actual JSON file. We already have a sparse matrix of the whole dataset available; have a look at fani-lab/OpeNTF#79

Also, read the readme of the OpeNTF project for more info. I uploaded the pickle in your channel.

It reduces the size to 20M!

@hosseinfani (Member, Author)

@yogeswarl
Also, where did you fix the problem? There should be some code change, right?

@yogeswarl (Member)

Dr. @hosseinfani, the commit that fixes it can be found here: yogeswarl/fair-team-formation@a2828f1

@yogeswarl (Member)

> There is no need to load the actual JSON file. We already have a sparse matrix of the whole dataset available; have a look at fani-lab/OpeNTF#79
>
> Also, read the readme of the OpeNTF project for more info. I uploaded the pickle in your channel.
>
> It reduces the size to 20M!

Thank you for this! I will use it for comparison.

@hosseinfani (Member, Author)

Not sure what you mean by comparison. Please change your code to work directly with the pickle file.

@yogeswarl (Member)

I meant comparing the pickle file with the results.

@yogeswarl (Member)

Hi Dr. @hosseinfani,
I am having difficulties understanding the pickle file. After loading and reading it, I can see there are different objects mapped to candidates and skills.
Our fair team formation project requires reranking based on the number of times a candidate has published a paper on a skill, as a count to decide whether they are a popular author or not. I am not able to find any such thing in the pickle file.
Could you please advise how to decide in this scenario? Only this logic remains before I can feed the pickle file into the reranking.

Thanks.

@hosseinfani (Member, Author)

@yogeswarl have you had a look at fani-lab/OpeNTF#79?
In the pickle file we have sparse matrices; each row is a team (a sketch below shows one way to derive popularity counts from it).

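As an illustration of how a popularity signal could be derived from that representation, here is a minimal sketch under the assumption that the pickle holds a dict of scipy sparse matrices keyed by 'skill' and 'member', one row per team; the file name and the popularity threshold are hypothetical:

```python
import pickle
import numpy as np

with open('teamsvecs.pkl', 'rb') as f:   # hypothetical file name
    vecs = pickle.load(f)

member = vecs['member']                   # teams x members sparse matrix
# column sums = number of teams (papers) each member appears in
popularity = np.asarray(member.sum(axis=0)).ravel()

# e.g., call the top decile "popular" and the rest "non-popular" for the reranker
threshold = np.quantile(popularity, 0.9)
labels = np.where(popularity >= threshold, 'popular', 'non-popular')
```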
@yogeswarl (Member)

Thank you very much for this! I will use it in the implementation.

@yogeswarl (Member)

yogeswarl commented Apr 28, 2022

Update: Dr. @hosseinfani, thank you for the issue suggestion and the pickle files posted on Teams. I have extracted the contents from the pickle file and converted them into a list to pass to our reranker. I made it work on the toy dataset and am integrating it with my code so it can run on the final dataset; I will update tonight on the final output from the reranking.

@yogeswarl (Member)

yogeswarl commented May 2, 2022

Update: Dr. @hosseinfani
I am having great difficulty understanding the f1.test.pred file that was posted in our Teams group 'fair team formation'.
To get an understanding, I looked into the toy dataset, still to no avail. I was able to extract splits.json and work on reranking the test split alone, but could you please explain a bit more about the predictions, i.e., how to use them in the evaluations?
Below I attach a screenshot of our NDKL values. 'Before reranking' is the OpenTF values and 'after reranking' is fair team formation.
Screen Shot 2022-05-02 at 7 34 24 PM

@hosseinfani (Member, Author)

I'm not sure I understood "OpenTF values".
Ask @VaghehDashti about the format of f1.test.pred; he can help.

@yogeswarl (Member)

yogeswarl commented May 3, 2022

Using the OpenTF test data from splits.json, I computed NDKL from the vector of members; after reranking the vector of members, I ran the NDKL metric again. The results are posted above. Thank you, I will reach out to Arman.
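For reference, a minimal sketch of the NDKL computation as defined in the fairness-aware re-ranking literature (Geyik et al.); the attribute labels and desired distribution here are illustrative assumptions:

```python
import math
from collections import Counter

def kl(p, q, eps=1e-12):
    """KL divergence between two discrete distributions given as dicts."""
    return sum(pi * math.log((pi + eps) / (q.get(a, 0) + eps)) for a, pi in p.items() if pi > 0)

def ndkl(ranked_attrs, desired):
    """Normalized Discounted KL-divergence: KL of the top-i attribute
    distribution vs. the desired one, discounted by 1/log2(i+1), normalized."""
    z = sum(1 / math.log2(i + 1) for i in range(1, len(ranked_attrs) + 1))
    total, seen = 0.0, Counter()
    for i, attr in enumerate(ranked_attrs, start=1):
        seen[attr] += 1
        top_i = {a: c / i for a, c in seen.items()}
        total += kl(top_i, desired) / math.log2(i + 1)
    return total / z

print(ndkl(['popular', 'popular', 'non-popular'], {'popular': 0.5, 'non-popular': 0.5}))
```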

@yogeswarl (Member)

Just one last query: since Roonak and I are running out of time, can we please write our evaluation for the NLP project based on this metric and continue working further later on? There is a lot to write in a short amount of time, and honestly I want enough time to write up everything we have found to get the maximum marks out of our evaluation phase. Please let us know your suggestions.

Yours truly,

@hosseinfani (Member, Author)

> Using the OpenTF test data from splits.json, I computed NDKL from the vector of members.

Where did you get the vector of members from?

@hosseinfani (Member, Author)

> Just one last query: since Roonak and I are running out of time, can we please write our evaluation for the NLP project based on this metric and continue working further later on?

This is totally up to you.

@yogeswarl (Member)

> Where did you get the vector of members from?

Y_test = vecs['member'][splits['test']]

@yogeswarl (Member)

> This is totally up to you.

Thank you.

@hosseinfani (Member, Author)

> Y_test = vecs['member'][splits['test']]

This is the golden list of members. So, you showed that your result is fairer than the original list of members.

The OpenTF predictions are in the pred file.

@yogeswarl (Member)

yogeswarl commented May 3, 2022

@hosseinfani
Here is the image comparing the prediction file and the reranker. Now we completely understand the assignment and are working on completing it!
Screen Shot 2022-05-03 at 1 09 54 AM

Right now this is running on the toy test dataset. I will now run it on the given predictions dataset and provide the results.

@hosseinfani (Member, Author)

@yogeswarl
I am not sure I understood the numbers in the figure. Why do we have 4 rows? Ideally, we should have two rows:

  1. Golden list of members before and after reranking
  2. Pred list of members before and after reranking

Also, you are supposed to calculate the NDCG after reranking.
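For reference, a minimal sketch of computing NDCG with pytrec_eval on a reranked list; the qrel/run contents are hypothetical and only show the input format the library expects:

```python
import pytrec_eval

# qrel: ground-truth members per team (relevance 1); run: scores after reranking
qrel = {'team1': {'m3': 1, 'm10': 1}}
run  = {'team1': {'m3': 0.9, 'm8': 0.7, 'm10': 0.6}}

evaluator = pytrec_eval.RelevanceEvaluator(qrel, {'ndcg_cut'})
print(evaluator.evaluate(run))  # e.g., {'team1': {'ndcg_cut_5': ..., 'ndcg_cut_10': ..., ...}}
```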

@yogeswarl (Member)

I have reranked the predicted list of members based on the rows provided by splits.json: https://github.com/yogeswarl/fair-team-formation/blob/fc29c7fe2eabf398928a8e2365dde3f581e3abef/main.py#L21. This works off the predictions, where I rerank by row; every test row is reranked and its NDKL is returned before and after reranking.

The problem with NDCG is that I ran into an error when I tried to run the pytrec_eval code after reranking.
Screen Shot 2022-05-03 at 5 32 03 PM

I have been trying to solve it since last night, only to be met by this error.
I would really like more details on this reranking. If you are available any time during the week, I would be happy to drop by your office for half an hour based on your availability; otherwise I will wait until our meeting this week. I believe I have lost myself in confusion about understanding the requirements. Apologies for delaying the work.

Thank you.
Yogesh

@hosseinfani (Member, Author)

@yogeswarl
I'm available tomorrow 8-9am.

Re. the error, you need to see what the inputs to the library are.

@yogeswarl (Member)

Hello Dr. @hosseinfani, sincere apologies for not being able to make it today; I just reached Windsor. Is there another time slot available today or tomorrow?

Thank you for your consideration.
Yogeswar

@yogeswarl (Member)

Dr. @hosseinfani, we are done with the code for the toy dataset, but the provided test set file has 0.0 as the value for everything; I am not sure what's wrong. I have created a fork of the project that pulls in all the available code.

Please do a code review and let me know of all the changes required.
Attached code

Thanks,
Yogeswar

@yogeswarl (Member)

yogeswarl commented May 13, 2022


> The problem with NDCG is that I ran into an error when I tried to run the pytrec_eval code after reranking.

Dr. @hosseinfani,
This problem has been solved. The reason behind the error was that my Python version was 3.10 instead of 3.8. I reinstalled everything and used the opeNTF requirements.txt file to replicate all the package versions used in the project.

@hosseinfani (Member, Author)

hosseinfani commented May 14, 2022

@yogeswarl and @Rounique the code is in our lab's GitHub. Please do the following by our next meeting:

@yogeswarl

  1. In the data folder, clearly create/put the following files (a rough sketch of this pipeline follows the list):
  • fnn.f*.test.pred: the original prediction list of an external method, e.g., fnn or bnn. This file should already exist.
  • fnn.f*.test.pred.eval: the IR evaluation of the prediction including e.g., ndcg, pr, .... This file should already exist too.
  • fnn.f*.test.pred.fair: the fairness of the prediction including ndkl or any other fairness metric. This file should be generated.
  • fnn.f*.test.pred.rrk: the reranked version of the prediction list by any baseline. This file should be generated.
  • fnn.f*.test.pred.rrk.eval: the IR evaluation of the reranked prediction including e.g., ndcg, pr, .... This file should be generated.
  • fnn.f*.test.pred.rrk.fair: the fairness of the reranked prediction including ndkl or any other fairness metric. This file should be generated.

@Rounique
  2. Carefully read the code lines and make them readable by restructuring them into modular functions.
  3. Come up with the best figure/chart to show the comparative results of fnn (before) vs. rrk (after) reranking for the fairness-accuracy trade-off.
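A rough sketch of the pipeline that would produce the files listed above; `rerank`, `fairness_metrics`, and `ir_metrics` are hypothetical placeholders for the project's actual reranker and evaluators, and reading the .pred file as a table is also an assumption:

```python
import pandas as pd

def process_fold(pred_path, rerank, fairness_metrics, ir_metrics):
    """Take an existing *.test.pred file and emit .fair, .rrk, .rrk.eval, .rrk.fair."""
    pred = pd.read_csv(pred_path)                          # assumed tabular prediction list
    fairness_metrics(pred).to_csv(pred_path + '.fair')     # fairness of the original ranking
    rrk = rerank(pred)                                     # fairness-aware reranking
    rrk.to_csv(pred_path + '.rrk')
    ir_metrics(rrk).to_csv(pred_path + '.rrk.eval')        # ndcg, pr, ... after reranking
    fairness_metrics(rrk).to_csv(pred_path + '.rrk.fair')  # ndkl, skew, ... after reranking
```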

@Rounique (Contributor)

The problem:
Given a subset of skills, we want to recommend a subset of people who have those skills.
A naive algorithm could be to choose randomly among the subset of people with those skills.

For recommendation, we actually order the whole list, but usually we need the top-k of those items/people.

What we aim to do is, given an already computed T' (the list of recommended people), re-rank that list so that fairness increases, while making sure that, compared to the list T (our gold standard), the accuracy metrics do not drop (or at least not considerably).

To start, we need to show there is already bias in our T or T' (which one?), then recommend a list T'' that has less bias but is still close to the gold standard (a toy re-ranking sketch follows below).
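Purely as an illustration (not the project's actual baseline), a minimal greedy re-ranker in the spirit of fairness-aware ranking: it walks down the original ranking and, at each position, promotes the highest-scored candidate from any group that is still below its desired share in the prefix. All names and group labels are hypothetical.

```python
import math

def greedy_rerank(ranked, groups, desired, k):
    """ranked: candidate ids ordered by original score; groups: id -> group label;
    desired: group -> desired proportion. Returns a fairer top-k."""
    remaining, reranked, counts = list(ranked), [], {g: 0 for g in desired}
    while remaining and len(reranked) < k:
        pos = len(reranked) + 1
        # groups whose minimum share would be violated at this position
        needy = [g for g in desired if counts[g] < math.floor(pos * desired[g])]
        pick = next((c for c in remaining if groups[c] in needy), remaining[0])
        remaining.remove(pick)
        counts[groups[pick]] += 1
        reranked.append(pick)
    return reranked

# hypothetical usage
ranked = ['m1', 'm2', 'm3', 'm4', 'm5']
groups = {'m1': 'popular', 'm2': 'popular', 'm3': 'popular',
          'm4': 'non-popular', 'm5': 'non-popular'}
print(greedy_rerank(ranked, groups, {'popular': 0.5, 'non-popular': 0.5}, k=4))
# -> ['m1', 'm4', 'm2', 'm5']: the non-popular candidates are interleaved into the top-k
```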

@hosseinfani (Member, Author)

> T or T' (which one?)

Based on our base paper, we're dealing with algorithmic bias, so we don't need to show there is bias in T; but we can show there is bias in T'.

@hosseinfani (Member, Author)

@Rounique
Also, include the steps to calculate the skew and NDKL for T and the random baseline, based on our conversation in the meeting.

@Rounique (Contributor)

@hosseinfani
Thank you for the comment.
Yes, I am now writing those steps.
