
(II. Mitigating the Bias) Learning to Rank (Rerank) the Recommended Team Members to Mitigate Popularity Bias #19

hosseinfani opened this issue Mar 18, 2022 · 54 comments
Labels: enhancement (New feature or request)

@hosseinfani (Member)

This is the second step of mitigating the popularity bias using a reranking method.

hosseinfani added the enhancement (New feature or request) label on Mar 18, 2022
@Rounique (Contributor)

In order to make the recommendations fairer and form teams with members at different experience levels (a new researcher with reasonable qualifications, for example, gets less chance of being recommended than a more experienced person), we need to solve the long-tail problem. That means we need to present more long-tail items (people, in the DBLP dataset), since these are the candidates who are less likely to be chosen as team members but might still be qualified. Otherwise, a few top researchers will be recommended most of the time.

@Rounique (Contributor)

I'm still not sure whether we can apply what they did in this paper to popularity bias in team formation, because they first measure gender and age bias in the recommendations and then present algorithms for computing fairness-aware re-ranking.
So I am trying to figure out how we can relate the two.

@Rounique (Contributor)

Rounique commented Apr 8, 2022

I read these two papers again and updated our proposed-method document accordingly:

  1. Controlling Popularity Bias in Learning-to-Rank Recommendation
  2. Managing Popularity Bias in Recommender Systems with Personalized Re-ranking

@Rounique (Contributor)

Adding the Skew measurement for evaluating bias in the recommendations.
A negative Skew_{a_i}@k indicates that candidates with attribute value a_i are underrepresented in the top-k results, whereas a positive Skew_{a_i}@k indicates that such candidates are favoured.
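For reference, a minimal sketch of how Skew@k could be computed; the 'popular'/'non-popular' labels and the desired proportion below are hypothetical, only for illustration:

```python
import math

def skew_at_k(ranked_attrs, attr_value, desired_prop, k):
    """Log ratio of the observed proportion of `attr_value` in the top-k
    over the desired proportion for that attribute value."""
    top_k = ranked_attrs[:k]
    observed_prop = sum(1 for a in top_k if a == attr_value) / k
    eps = 1e-12  # avoids log(0) when a group is absent from the top-k
    return math.log((observed_prop + eps) / (desired_prop + eps))

# hypothetical ranked list of member attributes (popular vs. non-popular)
ranking = ['popular', 'popular', 'non-popular', 'popular', 'non-popular']
print(skew_at_k(ranking, 'non-popular', desired_prop=0.5, k=5))  # negative => underrepresented
```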

@Rounique (Contributor)

Updating the readme file and completing it.

@hosseinfani (Member, Author)

@Rounique where are you making these changes?

@Rounique (Contributor)

@hosseinfani We wanted to push it after discussing it with you, but it has been added now.

@hosseinfani (Member, Author)

@Rounique @yogeswarl
you mean the changes here: yogeswarl/fair-team-formation@ee3a3ce

Screen Shot 2022-04-15 at 3 59 03 PM

I can't see any changes related to what you mentioned (skew). Can you?

@yogeswarl (Member)

Hi Dr. @hosseinfani, the skew and infeasible metrics were worked on; it is the commit made on April 14th.

@yogeswarl (Member)

Good evening Dr. @hosseinfani, we are currently trying to run our code on the large dataset, but we keep running into errors. I also spoke with Karan to get some help with the predictions stored in opeNTF so we can compare results across both our projects, which is currently in progress.
Could you help us by trying to run our code to see whether it runs on the DBLP dataset? Our latest work is stored in the dev branch located here. If that does not work, we will rework it to find our mistake in loading the files.

Thanks,
Yogeswar

@hosseinfani (Member, Author)

Hi @yogeswarl

  • Can you post here what errors you face? If the code works on the toy dataset, it should work on the main dataset, as they have the same file structure, unless it is related to the load (e.g., OOM).
  • I'm sorry, I'm busy with the last week of classes and exam preps.

@hosseinfani (Member, Author)

> Hi Dr. @hosseinfani, the skew and infeasible metrics were worked on; it is the commit made on April 14th.

As I said, I could not find any lines of change regarding the skew metric. The link I posted is for the April 14th change.

@yogeswarl (Member)

> As I said, I could not find any lines of change regarding the skew metric. The link I posted is for the April 14th change.

Thank you very much for your response, Dr. @hosseinfani.
Below I attach an image that shows what the skew and infeasible metrics are.

Explaining further, the Skew metric is the logarithmic ratio of the proportion of candidates having the attribute value among the top-k results over the desired proportion for that attribute value.

A negative skew means less than desired; a positive skew means more than desired.

The infeasible index and count have also been worked on; they were available from the library.

Infeasible index: the number of positions in a list that violate a given criterion.
Infeasible count: the number of (attribute value, position) pairs in a list that violate a given criterion (a toy sketch follows the screenshot below).

Screen Shot 2022-04-21 at 12 15 01 AM
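For reference, a minimal sketch of how the infeasible index and count could be computed against a desired distribution. The helper name and the way minimum required counts are derived (floor of position times desired share) are assumptions for illustration, not necessarily what the library does internally:

```python
import math
from collections import Counter

def infeasible_index_and_count(ranked_attrs, desired_props):
    """Count positions (and (attribute, position) pairs) where the running
    top-k contains fewer items of an attribute than floor(k * desired share)."""
    infeasible_index, infeasible_count = 0, 0
    seen = Counter()
    for k, attr in enumerate(ranked_attrs, start=1):
        seen[attr] += 1
        violated = [a for a, p in desired_props.items() if seen[a] < math.floor(k * p)]
        if violated:
            infeasible_index += 1              # this position violates at least one criterion
            infeasible_count += len(violated)  # one per violated (attribute, position) pair
    return infeasible_index, infeasible_count

# hypothetical usage: half of each group desired at every prefix
print(infeasible_index_and_count(['popular'] * 4 + ['non-popular'],
                                 {'popular': 0.5, 'non-popular': 0.5}))
```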

@yogeswarl (Member)

> Hi @yogeswarl
>
> • Can you post here what errors you face? If the code works on the toy dataset, it should work on the main dataset, as they have the same file structure, unless it is related to the load (e.g., OOM).
> • I'm sorry, I'm busy with the last week of classes and exam preps.

The problem is indeed related to the out-of-memory error. Roonak and I are getting it fixed so we can give the final results before the submission of the evaluation!

@hosseinfani (Member, Author)

@yogeswarl
Thank you. I can see it now.

So, can you explain, based on code lines (you can put links to code lines to show where something is done):

  1. Given a set of skills, e.g., S={s1, s4, s6}, we have the ground truth that there is a team T={m3, m10}.
  2. Based on our method, we find a team T'={m8, m10, m23}.
  3. Based on the method you're using, you find a team T''={m3, m8, m10}. How does the reranker make the prediction better?
  4. You calculate metrics for T, T', and T''. What are the metrics and what do they mean with respect to our project? (see the illustrative sketch below)

When you're explaining, please explain wording that I may not be familiar with. For example, does 'attribute value' mean 'skill'? If so, what does 'desired proportion of attribute value' mean?

Likewise, when you explain 'infeasible', what does 'violate a given criterion' mean in our project?
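Purely as an illustration of item 4 above (not the project's actual metric code), a set-based precision/recall for the example teams could be computed like this:

```python
def precision_recall(predicted, truth):
    """Set-based precision/recall of a predicted team against the ground-truth team."""
    hits = len(set(predicted) & set(truth))
    return hits / len(predicted), hits / len(truth)

T  = {'m3', 'm10'}             # ground-truth team
T1 = ['m8', 'm10', 'm23']      # our method's team (T')
T2 = ['m3', 'm8', 'm10']       # reranked team (T'')

print(precision_recall(T1, T))  # (0.33..., 0.5)
print(precision_recall(T2, T))  # (0.66..., 1.0) -> closer to the ground truth
```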

@hosseinfani (Member, Author)

> The problem is indeed related to the out-of-memory error. Roonak and I are getting it fixed so we can give the final results before the submission of the evaluation!

Please post the error stack trace here.

@hosseinfani (Member, Author)

@yogeswarl
please also refactor your code and make it readable/understandable to a general user without your direct consultation.

@yogeswarl (Member)

> Please post the error stack trace here.

The code simply stops execution; there are no errors that show up.

@yogeswarl (Member)

> please also refactor your code and make it readable/understandable to a general user without your direct consultation.

I will indeed do that, with comments and better variable naming following our lab's standards.

@hosseinfani (Member, Author)

> The code simply stops execution; there are no errors that show up.

Weird! How do you know that it's due to OOM then? Do you monitor your memory usage? Take a snapshot of your screen while running your code, then.

@yogeswarl (Member)

yogeswarl commented Apr 21, 2022

First screenshot:
Screen Shot 2022-04-21 at 2 31 54 AM
Second screenshot, of peak usage:
Screen Shot 2022-04-21 at 2 40 33 AM

Third screenshot, of the shell killing the program:
Screen Shot 2022-04-21 at 3 02 43 AM

I even asked Soroush to run this code, but it seems to behave the same way on his workstation too.

@hosseinfani (Member, Author)

You have to:

  1. Check the shell's log; usually there is a system-wide log that provides more explanation behind the kill.

  2. Put log lines in and see up to which part of your code everything is fine and where it becomes resource intensive.

Try to find the error source instead of making a random guess.

@yogeswarl (Member)

Hello Dr. Fani,
Thank you for the tip on debugging errors on the large dataset.
I have fixed the issue.
Explanation of the issue:
From my exploration of the error, there are apparently two problems:

  1. pandas' read_json expects the JSON file to be an array of flat JSON objects, i.e., one level of nesting, whereas the DBLP dataset is nested more deeply. This causes the issue, although I don't understand why it didn't happen on the test dataset but happens on the large dataset.
  2. The second is the cost of creating a single data frame from the JSON and looping over it. In our toy JSON file this was too small to even be an issue, but on the large dataset it is; creating multiple lists to append and push the data causes a lot of problems.

This was solved by reading the JSON file line by line and appending to the arrays that build the objects and lists for the reranking (see the sketch below).
Although I have run it, the results are yet to come back; I will update here once they are ready.
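A minimal sketch of the line-by-line approach described above, assuming the dump stores one JSON object per line; the file name and the fields pulled out are hypothetical:

```python
import json

teams, members = [], []
with open('dblp.v12.json', encoding='utf-8') as f:  # hypothetical path
    for line in f:
        line = line.strip().rstrip(',')              # tolerate array commas between records
        if not line or line in ('[', ']'):
            continue
        record = json.loads(line)
        teams.append(record)                         # keep the raw team/paper object
        members.extend(record.get('authors', []))    # collect candidates for the reranking
```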

@hosseinfani (Member, Author)

hosseinfani commented Apr 23, 2022

@yogeswarl @Rounique
There is no need to load the actual JSON file. We already have a sparse matrix of the whole dataset available; have a look at fani-lab/OpeNTF#79

Also, read the readme of the OpeNTF project for more info. I uploaded the pickle in your channel.

It reduces the size to 20M!

@hosseinfani (Member, Author)

@yogeswarl
Also, where did you fix the problem? There should be some code change, right?

@yogeswarl (Member)

Dr. @hosseinfani, the commit that fixes it can be found here: yogeswarl/fair-team-formation@a2828f1

@yogeswarl (Member)

> There is no need to load the actual JSON file. We already have a sparse matrix of the whole dataset available; have a look at fani-lab/OpeNTF#79
>
> Also, read the readme of the OpeNTF project for more info. I uploaded the pickle in your channel.
>
> It reduces the size to 20M!

Thank you for this! I will use it for comparison.

@hosseinfani (Member, Author)

Not sure what you mean by comparison. Please change your code to work directly with the pickle file.

@yogeswarl (Member)

I meant comparing the pickle file with the results.

@yogeswarl (Member)

Hi Dr. @hosseinfani,
I am having difficulties understanding the pickle file. After loading and reading it, I can see there are different objects mapped to candidates and skills.
Our fair team formation project requires reranking based on the number of times a candidate has published a paper on a skill, as a count to decide whether they are a popular author or not. I am not able to find any such thing in the pickle file.
Could you please advise how to decide in this scenario? Only this logic remains before I can feed the pickle file into the reranking.

Thanks.

@hosseinfani (Member, Author)

@yogeswarl have you had a look at fani-lab/OpeNTF#79?
In the pickle file we have sparse matrices; each row is a team (a sketch below shows one way to derive popularity counts from it).

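As an illustration of how a popularity signal could be derived from that representation, here is a minimal sketch under the assumption that the pickle holds a dict of scipy sparse matrices keyed by 'skill' and 'member', one row per team; the file name and the popularity threshold are hypothetical:

```python
import pickle
import numpy as np

with open('teamsvecs.pkl', 'rb') as f:   # hypothetical file name
    vecs = pickle.load(f)

member = vecs['member']                   # teams x members sparse matrix
# column sums = number of teams (papers) each member appears in
popularity = np.asarray(member.sum(axis=0)).ravel()

# e.g., call the top decile "popular" and the rest "non-popular" for the reranker
threshold = np.quantile(popularity, 0.9)
labels = np.where(popularity >= threshold, 'popular', 'non-popular')
```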
@yogeswarl (Member)

Thank you very much for this! I will use it in the implementation.

@yogeswarl (Member)

yogeswarl commented Apr 28, 2022

Update: Dr. @hosseinfani, thank you for the issue suggestion and the pickle files posted on Teams. I have extracted the contents from the pickle file and converted them into a list to pass to our reranker. I made it work on the toy dataset and am integrating it with my code so it can run on the final dataset; I will update tonight on the final output from the reranking.

@yogeswarl (Member)

yogeswarl commented May 2, 2022

Update: Dr. @hosseinfani
I am having great difficulty understanding the f1.test.pred file that was posted in our Teams group 'fair team formation'.
To get an understanding, I looked into the toy dataset, still to no avail. I was able to extract splits.json and work on reranking the test split alone, but could you please explain a bit more about the predictions, i.e., how to use them in the evaluations?
Below I attach a screenshot of our NDKL values. 'Before reranking' is the OpenTF values and 'after reranking' is fair team formation.
Screen Shot 2022-05-02 at 7 34 24 PM

@hosseinfani (Member, Author)

I'm not sure I understood "OpenTF values".
Ask @VaghehDashti about the format of f1.test.pred; he can help.

@yogeswarl (Member)

yogeswarl commented May 3, 2022

Using the OpenTF test data from splits.json, I computed NDKL from the vector of members; after reranking the vector of members, I ran the NDKL metric again. The results are posted above. Thank you, I will reach out to Arman.
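For reference, a minimal sketch of the NDKL computation as defined in the fairness-aware re-ranking literature (Geyik et al.); the attribute labels and desired distribution here are illustrative assumptions:

```python
import math
from collections import Counter

def kl(p, q, eps=1e-12):
    """KL divergence between two discrete distributions given as dicts."""
    return sum(pi * math.log((pi + eps) / (q.get(a, 0) + eps)) for a, pi in p.items() if pi > 0)

def ndkl(ranked_attrs, desired):
    """Normalized Discounted KL-divergence: KL of the top-i attribute
    distribution vs. the desired one, discounted by 1/log2(i+1), normalized."""
    z = sum(1 / math.log2(i + 1) for i in range(1, len(ranked_attrs) + 1))
    total, seen = 0.0, Counter()
    for i, attr in enumerate(ranked_attrs, start=1):
        seen[attr] += 1
        top_i = {a: c / i for a, c in seen.items()}
        total += kl(top_i, desired) / math.log2(i + 1)
    return total / z

print(ndkl(['popular', 'popular', 'non-popular'], {'popular': 0.5, 'non-popular': 0.5}))
```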

@yogeswarl (Member)

Just one last query: since Roonak and I are running out of time, can we please write our evaluation for the NLP project based on this metric and continue working further later on? There is a lot to write in a short amount of time, and honestly I want enough time to write up everything we have found to get the maximum marks out of our evaluation phase. Please let us know your suggestions.

Yours truly,

@hosseinfani (Member, Author)

> Using the OpenTF test data from splits.json, I computed NDKL from the vector of members.

Where did you get the vector of members from?

@hosseinfani (Member, Author)

> Just one last query: since Roonak and I are running out of time, can we please write our evaluation for the NLP project based on this metric and continue working further later on?

This is totally up to you.

@yogeswarl (Member)

> Where did you get the vector of members from?

Y_test = vecs['member'][splits['test']]

@yogeswarl (Member)

> This is totally up to you.

Thank you.

@hosseinfani (Member, Author)

> Y_test = vecs['member'][splits['test']]

This is the golden list of members. So, you showed that your result is fairer than the original list of members.

The OpenTF predictions are in the pred file.

@yogeswarl (Member)

yogeswarl commented May 3, 2022

@hosseinfani
Here is the image comparing the prediction file and the reranker. Now we completely understand the assignment and are working on completing it!
Screen Shot 2022-05-03 at 1 09 54 AM

Right now this is running on the toy test dataset. I will now run it on the given predictions dataset and provide the results.

@hosseinfani (Member, Author)

@yogeswarl
I am not sure I understood the numbers in the figure. Why do we have 4 rows? Ideally, we should have two rows:

  1. Golden list of members before and after reranking
  2. Pred list of members before and after reranking

Also, you are supposed to calculate the NDCG after reranking.
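For reference, a minimal sketch of computing NDCG with pytrec_eval on a reranked list; the qrel/run contents are hypothetical and only show the input format the library expects:

```python
import pytrec_eval

# qrel: ground-truth members per team (relevance 1); run: scores after reranking
qrel = {'team1': {'m3': 1, 'm10': 1}}
run  = {'team1': {'m3': 0.9, 'm8': 0.7, 'm10': 0.6}}

evaluator = pytrec_eval.RelevanceEvaluator(qrel, {'ndcg_cut'})
print(evaluator.evaluate(run))  # e.g., {'team1': {'ndcg_cut_5': ..., 'ndcg_cut_10': ..., ...}}
```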

@yogeswarl (Member)

I have reranked the predicted list of members based on the rows provided by splits.json: https://github.com/yogeswarl/fair-team-formation/blob/fc29c7fe2eabf398928a8e2365dde3f581e3abef/main.py#L21. This works off the predictions, where I rerank by row; every test row is reranked and its NDKL is returned before and after reranking.

The problem with NDCG is that I ran into an error when I tried to run the pytrec_eval code after reranking.
Screen Shot 2022-05-03 at 5 32 03 PM

I have been trying to solve it since last night, only to be met by this error.
I would really like more details on this reranking. If you are available any time during the week, I would be happy to drop by your office for half an hour based on your availability; otherwise I will wait until our meeting this week. I believe I have lost myself in confusion about understanding the requirements. Apologies for delaying the work.

Thank you.
Yogesh

@hosseinfani (Member, Author)

@yogeswarl
I'm available tomorrow 8-9am.

Re. the error, you need to see what the inputs to the library are.

@yogeswarl (Member)

Hello Dr. @hosseinfani, sincere apologies for not being able to make it today; I just reached Windsor. Is there another time slot available today or tomorrow?

Thank you for your consideration.
Yogeswar

@yogeswarl (Member)

Dr. @hosseinfani, we are done with the code for the toy dataset, but the provided test set file has 0.0 as the value for everything; I am not sure what's wrong. I have created a fork of the project that pulls in all the available code.

Please do a code review and let me know of all the changes required.
Attached code

Thanks,
Yogeswar

@yogeswarl (Member)

yogeswarl commented May 13, 2022


> The problem with NDCG is that I ran into an error when I tried to run the pytrec_eval code after reranking.

Dr. @hosseinfani,
This problem has been solved. The reason behind the error was that my Python version was 3.10 instead of 3.8. I reinstalled everything and used the opeNTF requirements.txt file to replicate all the package versions used in the project.

@hosseinfani (Member, Author)

hosseinfani commented May 14, 2022

@yogeswarl and @Rounique the code is in our lab's GitHub. Please do the following by our next meeting:

@yogeswarl

  1. In the data folder, clearly create/put the following files (a rough sketch of this pipeline follows the list):
  • fnn.f*.test.pred: the original prediction list of an external method, e.g., fnn or bnn. This file should already exist.
  • fnn.f*.test.pred.eval: the IR evaluation of the prediction including e.g., ndcg, pr, .... This file should already exist too.
  • fnn.f*.test.pred.fair: the fairness of the prediction including ndkl or any other fairness metric. This file should be generated.
  • fnn.f*.test.pred.rrk: the reranked version of the prediction list by any baseline. This file should be generated.
  • fnn.f*.test.pred.rrk.eval: the IR evaluation of the reranked prediction including e.g., ndcg, pr, .... This file should be generated.
  • fnn.f*.test.pred.rrk.fair: the fairness of the reranked prediction including ndkl or any other fairness metric. This file should be generated.

@Rounique
  2. Carefully read the code lines and make them readable by restructuring them into modular functions.
  3. Come up with the best figure/chart to show the comparative results of fnn (before) vs. rrk (after) reranking for the fairness-accuracy trade-off.
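A rough sketch of the pipeline that would produce the files listed above; `rerank`, `fairness_metrics`, and `ir_metrics` are hypothetical placeholders for the project's actual reranker and evaluators, and reading the .pred file as a table is also an assumption:

```python
import pandas as pd

def process_fold(pred_path, rerank, fairness_metrics, ir_metrics):
    """Take an existing *.test.pred file and emit .fair, .rrk, .rrk.eval, .rrk.fair."""
    pred = pd.read_csv(pred_path)                          # assumed tabular prediction list
    fairness_metrics(pred).to_csv(pred_path + '.fair')     # fairness of the original ranking
    rrk = rerank(pred)                                     # fairness-aware reranking
    rrk.to_csv(pred_path + '.rrk')
    ir_metrics(rrk).to_csv(pred_path + '.rrk.eval')        # ndcg, pr, ... after reranking
    fairness_metrics(rrk).to_csv(pred_path + '.rrk.fair')  # ndkl, skew, ... after reranking
```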

@Rounique (Contributor)

The problem:
Given a subset of skills, we want to recommend a subset of people who have those skills.
A naive algorithm could be to choose randomly among the subset of people with those skills.

For recommendation, we actually order the whole list, but usually we need the top-k of those items/people.

What we aim to do is, given an already computed T' (the list of recommended people), re-rank that list so that fairness increases, while making sure that, compared to the list T (our gold standard), the accuracy metrics do not drop (or at least not considerably).

To start, we need to show there is already bias in our T or T' (which one?), then recommend a list T'' that has less bias but is still close to the gold standard (a toy re-ranking sketch follows below).
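Purely as an illustration (not the project's actual baseline), a minimal greedy re-ranker in the spirit of fairness-aware ranking: it walks down the original ranking and, at each position, promotes the highest-scored candidate from any group that is still below its desired share in the prefix. All names and group labels are hypothetical.

```python
import math

def greedy_rerank(ranked, groups, desired, k):
    """ranked: candidate ids ordered by original score; groups: id -> group label;
    desired: group -> desired proportion. Returns a fairer top-k."""
    remaining, reranked, counts = list(ranked), [], {g: 0 for g in desired}
    while remaining and len(reranked) < k:
        pos = len(reranked) + 1
        # groups whose minimum share would be violated at this position
        needy = [g for g in desired if counts[g] < math.floor(pos * desired[g])]
        pick = next((c for c in remaining if groups[c] in needy), remaining[0])
        remaining.remove(pick)
        counts[groups[pick]] += 1
        reranked.append(pick)
    return reranked

# hypothetical usage
ranked = ['m1', 'm2', 'm3', 'm4', 'm5']
groups = {'m1': 'popular', 'm2': 'popular', 'm3': 'popular',
          'm4': 'non-popular', 'm5': 'non-popular'}
print(greedy_rerank(ranked, groups, {'popular': 0.5, 'non-popular': 0.5}, k=4))
# -> ['m1', 'm4', 'm2', 'm5']: the non-popular candidates are interleaved into the top-k
```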

@hosseinfani (Member, Author)

> T or T' (which one?)

Based on our base paper, we're dealing with algorithmic bias, so we don't need to show there is bias in T; but we can show there is bias in T'.

@hosseinfani (Member, Author)

@Rounique
Also, include the steps to calculate the skew and NDKL for T and the random baseline, based on our conversation in the meeting.

@Rounique (Contributor)

@hosseinfani
Thank you for the comment.
Yes, I am now writing those steps.
