-
Notifications
You must be signed in to change notification settings - Fork 840
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Download LawSchool dataset directly from SEAPHE #492
Conversation
Signed-off-by: Ekta Bhaskar <bhaskar.ekta@ufl.edu Signed-off-by: Ekta <ekta12bhaskar@gmail.com>
"from sklearn.metrics import mean_absolute_error\n", | ||
"from sklearn.preprocessing import MinMaxScaler\n", | ||
"current_script_path=os.path.abspath('demo_grid_search_reduction_regression_sklearn.ipynb')\n", | ||
"print(current_script_path)\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we'd like to clean this part up for review 👍
Also, it looks like there are a few warnings noticed in https://github.com/EktaBhaskar/AIF360/blob/master/examples/sklearn/demo_grid_search_reduction_regression_sklearn.ipynb, do you also see those warnings locally as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yehee I'm not sure why but the notebook was giving error to me on simply importing like:-
from aif360.sklearn.datasets.seaphe_datasets import load_lawschool_data from aif360.sklearn.inprocessing import GridSearchReduction from aif360.sklearn.metrics import difference
I assumed it might be because of some dependancy issues.
So I had to include sys.path.
Do you want me to write import statements as above? It might work on proper setup, just won't work on mine.
Yes, I can see warnings but not sure why it's there as I did not make any changes related to AdversarialDebiasing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you replace the line in __init__.py
with the seaphe_datasets
file, it should import normally. The warning is fine but there's also an error later. It looks like you commented out
from aif360.sklearn.metrics import difference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried putting in init.py of sklearn folder as well. But it gives me error:-
this is init.py:-
from aif360.sklearn.datasets.utils import standardize_dataset, NumericConversionWarning
from aif360.sklearn.datasets.openml_datasets import fetch_adult, fetch_german, fetch_bank
from aif360.sklearn.datasets.compas_dataset import fetch_compas
from aif360.sklearn.datasets.meps_datasets import fetch_meps
from aif360.sklearn.datasets.tempeh_datasets import fetch_lawschool_gpa
from aif360.sklearn.datasets.seaphe_datasets import load_lawschool_data
error:-
ImportError: cannot import name 'load_lawschool_data' from 'aif360.sklearn.datasets' (/opt/homebrew/lib/python3.10/site-packages/aif360/sklearn/datasets/__init__.py)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to comment :-
from aif360.sklearn.metrics import difference
Because it was giving me error even before I made any changes and was just trying to run this notebook to see how it works. And later on also, did not make any changes in this method/class:-
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Cell In [28], line 14
12 from aif360.sklearn.inprocessing import GridSearchReduction
13 # from aif360.sklearn.datasets import load_lawschool_data
---> 14 from aif360.sklearn.metrics import difference
File /opt/homebrew/lib/python3.10/site-packages/aif360/sklearn/metrics/__init__.py:7
1 """
2 ``aif360.sklearn`` implements a number of fairness metrics for group fairness
3 and individual fairness. For guidance on which metric to use for a given
4 application, see our
5 `Guidance <http://aif360.mybluemix.net/resources#guidance>`_ page.
6 """
----> 7 from aif360.sklearn.metrics.metrics import *
File /opt/homebrew/lib/python3.10/site-packages/aif360/sklearn/metrics/metrics.py:14
11 from sklearn.utils.deprecation import deprecated
13 from aif360.sklearn.utils import check_inputs, check_groups
---> 14 from aif360.detectors.mdss.ScoringFunctions import BerkJones, Bernoulli
15 from aif360.detectors.mdss.MDSS import MDSS
17 __all__ = [
18 # meta-metrics
19 'difference', 'ratio', 'intersection', 'one_vs_rest',
(...)
38 'false_positive_rate_error'
39 ]
File /opt/homebrew/lib/python3.10/site-packages/aif360/detectors/mdss/ScoringFunctions/__init__.py:5
3 from aif360.detectors.mdss.ScoringFunctions.Poisson import Poisson
4 from aif360.detectors.mdss.ScoringFunctions.BerkJones import BerkJones
----> 5 from aif360.detectors.mdss.ScoringFunctions.Gaussian import Gaussian
File /opt/homebrew/lib/python3.10/site-packages/aif360/detectors/mdss/ScoringFunctions/Gaussian.py:1
----> 1 from turtle import pen
2 from aif360.detectors.mdss.ScoringFunctions.ScoringFunction import ScoringFunction
3 from aif360.detectors.mdss.ScoringFunctions import optim
File /opt/homebrew/Cellar/python@3.10/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/turtle.py:107
103 _ver = "turtle 1.1b- - for Python 3.1 - 4. 5. 2009"
105 # print(_ver)
--> 107 import tkinter as TK
108 import types
109 import math
File /opt/homebrew/Cellar/python@3.10/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/tkinter/__init__.py:37
34 import sys
35 import types
---> 37 import _tkinter # If this fails your Python may not be configured for Tk
38 TclError = _tkinter.TclError
39 from tkinter.constants import *
ModuleNotFoundError: No module named '_tkinter'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to install the code with pip install --editable .
from the project root
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the SEAPHE website is currently down so we can't download the data but they're looking into it. I'll comment here when it's resolved.
Once you change the name to fetch_lawschool_gpa
, I think the test case might fail since it checks if it exactly matches the version from tempeh so maybe you can double check that all the processing is the same.
from sklearn.model_selection import train_test_split | ||
|
||
|
||
def load_lawschool_data(target, subset="all", usecols=None, dropcols=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we keep this named fetch_lawschool_gpa
and have it replace the existing function in aif360/sklearn/datasets/tempeh_datasets.py
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have modified it's name to "fetch_lawschool_gpa" and have placed in "aif360/sklearn/datasets/tempeh_datasets.py" file. However, there's some issue in imports(which I have explained below) so, it's still referring to older "fetch_lawschool_gpa" function format and definition.
I tested multiple times in my local and "fetch_lawschool_gpa" always redircts to the updated method but when I run the notebook, it gives an error that "target is not a defined positional argument" which means it is referring to older "fetch_lawschool_gpa" method as older one didn't have "target" argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant keep the tempeh_datasets.py file as is just rename the function in seaphe_datasets.py as fetch_lawschool_gpa
then change the __init__.py
to import from seaphe_datasets instead of tempeh_datasets. Once we know it's working, you can delete the tempeh_datasets.py file.
Did you test this after installing with the --editable
flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh Sorry! I have reverted changes in" tempeh_datasets.py" and renamed the method in "seaphe_datasets" and changed the init.py file to import the method from "seaphe_datasets".
It is importing "fetch_lawschool_gpa" fine now, but I think the "seaphe_datasets" download issue is still not fixed. It is not returning zip file as expected in this call :-
requests.get("http://www.seaphe.org/databases/LSAC/LSAC_SAS.zip")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did "pip install --editable ." at root AIF360. And then launched jupyter-notebook from root.
But these 2 imports :-
from aif360.sklearn.metrics import difference
from aif360.sklearn.inprocessing import GridSearchReduction
are still giving this error:_
ModuleNotFoundError: No module named 'inFairness'
"from sklearn.metrics import mean_absolute_error\n", | ||
"from sklearn.preprocessing import MinMaxScaler\n", | ||
"current_script_path=os.path.abspath('demo_grid_search_reduction_regression_sklearn.ipynb')\n", | ||
"print(current_script_path)\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you replace the line in __init__.py
with the seaphe_datasets
file, it should import normally. The warning is fine but there's also an error later. It looks like you commented out
from aif360.sklearn.metrics import difference
"source": [ | ||
"Datasets are formatted as separate `X` (# samples x # features) and `y` (# samples x # labels) DataFrames. The index of each DataFrame contains protected attribute values per sample. Datasets may also load a `sample_weight` object to be used with certain algorithms/metrics. All of this makes it so that aif360 is compatible with scikit-learn objects.\n", | ||
"\n", | ||
"For example, we can easily load the law school gpa dataset from tempeh with the following line:" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you change the text to reflect that we're fetching it from SEAPHE now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
@@ -0,0 +1,66 @@ | |||
# Copyright (c) Microsoft Corporation. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a line explaining where the code is copied from and which parts this covers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used the code from :-
https://github.com/microsoft/tempeh/blob/main/tempeh/datasets/seaphe_datasets.py
This was mentioned on the issue assigned to me :-
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant write it in the source file :)
I just want it to be clear which parts are covered by that copyright and which are written by us
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Mentioned it in "seaphe_datasets" before the "fetch_lawschool_gpa" method.
@@ -144,12 +123,12 @@ | |||
"source": [ | |||
"Datasets are formatted as separate `X` (# samples x # features) and `y` (# samples x # labels) DataFrames. The index of each DataFrame contains protected attribute values per sample. Datasets may also load a `sample_weight` object to be used with certain algorithms/metrics. All of this makes it so that aif360 is compatible with scikit-learn objects.\n", | |||
"\n", | |||
"For example, we can easily load the law school gpa dataset from tempeh with the following line:" | |||
"For example, we can easily download and use the law school gpa dataset from SEAPHE direclty with the following line:" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"For example, we can easily download and use the law school gpa dataset from SEAPHE direclty with the following line:" | |
"For example, we can easily download and use the Law School GPA dataset from SEAPHE directly with the following line:" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
from sklearn.model_selection import train_test_split | ||
|
||
|
||
def load_lawschool_data(target, subset="all", usecols=None, dropcols=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant keep the tempeh_datasets.py file as is just rename the function in seaphe_datasets.py as fetch_lawschool_gpa
then change the __init__.py
to import from seaphe_datasets instead of tempeh_datasets. Once we know it's working, you can delete the tempeh_datasets.py file.
Did you test this after installing with the --editable
flag?
closing in favor of #510 |
Signed-off-by: Ekta Bhaskar bhaskar.ekta@ufl.edu
Closes #359