Merge pull request #38 from dccuchile/develop

Version 0.4.0
dccuchile · Sep 30, 2022 · e3193ef
2 parents ec75b4f + 4cc3722
commit e3193ef

Showing 90 changed files with 6,990 additions and 4,503 deletions.
diff --git a/.circleci/config.yml b/.circleci/config.yml
diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml
@@ -0,0 +1,37 @@
+name: Tests
+on:
+  push:
+    branches:
+      - "master"
+      - "develop"
+  pull_request:
+    branches:
+      - "master"
+      - "develop"
+jobs:
+  pytest:
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v3
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.7"
+          cache: "pip"
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install flake8 pytest
+          if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
+          if [ -f requirements-dev.txt ]; then pip install -r requirements-dev.txt; fi
+      - name: Lint with flake8
+        run: |
+          # stop the build if there are Python syntax errors or undefined names
+          flake8 wefe --count --select=E9,F63,F7,F82 --show-source --statistics
+          # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
+          flake8 wefe --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
+      - name: Test with pytest
+        run: |
+          pytest tests
diff --git a/.gitignore b/.gitignore
@@ -7,11 +7,12 @@ __pycache__/
 *.so
 
 # scikit-learn specific
-doc/_build/
-doc/auto_examples/
-doc/modules/generated/
-doc/datasets/generated/
-doc/api/generated/
+docs/_build/
+docs/_build/*
+docs/auto_examples/
+docs/modules/generated/
+docs/datasets/generated/
+docs/api/generated/
 
 # Distribution / packaging
 
@@ -62,8 +63,9 @@ coverage.xml
 *.log
 
 # Sphinx documentation
-doc/_build/
-doc/generated/
+docs/_build/
+docs/generated/
+docs/results/
 
 # PyBuilder
 target/
@@ -74,19 +76,21 @@ target/
 # jupyter
 .ipynb_checkpoints/
 
-.results/*
-.results
+# notebook execution results
+results/*
+results
+docs/user_guide/gender_debiased_glove.kv
 
+# mypy cache
 .mypy_cache
 
-./doc/results/
-
-develop.ipynb
-
+# conda deploy
 conda-deploy/
 conda_deploy/
 
 *.csv
 *.xls
 
-doc/user_guide/gender_debiased_glove.kv
+# coverage files
+cov.xml
+test-results/junit.xml
diff --git a/.readthedocs.yaml b/.readthedocs.yaml
@@ -0,0 +1,16 @@
+version: 2
+
+formats:
+  - epub
+  - pdf
+
+sphinx:
+  configuration: docs/conf.py
+
+python:
+  version: "3.7"
+  install:
+    - requirements: requirements.txt
+    - requirements: requirements-dev.txt
+    - method: pip
+      path: .
diff --git a/.readthedocs.yml b/.readthedocs.yml
diff --git a/LICENSE b/LICENSE
@@ -1,27 +1,21 @@
-Copyright (c) 2016, Vighnesh Birodkar and scikit-learn-contrib contributors
-All rights reserved.
+MIT License
 
-Redistribution and use in source and binary forms, with or without
-modification, are permitted provided that the following conditions are met:
+Copyright (c) 2022 WEFE Team
 
-* Redistributions of source code must retain the above copyright notice, this
-  list of conditions and the following disclaimer.
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
 
-* Redistributions in binary form must reproduce the above copyright notice,
-  this list of conditions and the following disclaimer in the documentation
-  and/or other materials provided with the distribution.
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
 
-* Neither the name of project-template nor the names of its
-  contributors may be used to endorse or promote products derived from
-  this software without specific prior written permission.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
-AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
-DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
-FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
-DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
-SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
-CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
-OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.rst b/README.rst
@@ -1,28 +1,24 @@
 .. -*- mode: rst -*-
 
-|ReadTheDocs|_ |CircleCI|_ |Conda|_ |CondaLatestRelease|_ |CondaVersion|_
+|License|_ |GithubActions|_ |ReadTheDocs|_ |Downloads|_ |Pypy|_ |CondaVersion|_
 
+.. |License| image:: https://img.shields.io/github/license/dccuchile/wefe
+.. _License: https://github.com/dccuchile/wefe/blob/master/LICENSE
 
 .. |ReadTheDocs| image:: https://readthedocs.org/projects/wefe/badge/?version=latest
 .. _ReadTheDocs: https://wefe.readthedocs.io/en/latest/?badge=latest
 
+.. |GithubActions| image:: https://github.com/dccuchile/wefe/actions/workflows/ci.yaml/badge.svg?branch=master
+.. _GithubActions: https://github.com/dccuchile/wefe/actions
 
-.. |CircleCI| image:: https://circleci.com/gh/dccuchile/wefe.svg?style=shield 
-.. _CircleCI: https://circleci.com/gh/dccuchile/wefe.svg?style=shield 
-
-
-.. |Conda| image:: https://anaconda.org/pbadilla/wefe/badges/installer/conda.svg
-.. _Conda: https://anaconda.org/pbadilla/wefe/badges/installer/conda.svg
-
-
-.. |CondaLatestRelease| image:: https://anaconda.org/pbadilla/wefe/badges/latest_release_date.svg
-.. _CondaLatestRelease: https://anaconda.org/pbadilla/wefe/badges/latest_release_date.svg
+.. |Downloads| image:: https://pepy.tech/badge/wefe
+.. _Downloads: https://pepy.tech/project/wefe
 
+.. |Pypy| image:: https://badge.fury.io/py/wefe.svg
+.. _Pypy: https://pypi.org/project/wefe/
 
 .. |CondaVersion| image:: https://anaconda.org/pbadilla/wefe/badges/version.svg
-.. _CondaVersion: https://anaconda.org/pbadilla/wefe/badges/version.svg
-
-
+.. _CondaVersion: https://anaconda.org/pbadilla/wefe
 
 
 WEFE: The Word Embedding Fairness Evaluation Framework
@@ -133,38 +129,57 @@ To compile the documentation, run:
 Changelog
 =========
 
-NEW DEVELOP VERSION
+Version 0.4.0
 -------------------
+- 3 new bias mitigation methods (debias) implemented: Double Hard Debias, Half
+  Sibling Regression and Repulsion Attraction Neutralization.
+- The library documentation has been restructured.
+  Now, the documentation is divided into user guide and theoretical framework.
+  The user guide does not contain theoretical information.
+  Instead, theoretical documentation can be found in the conceptual guides.
+- Improved API documentation and examples. Added multilingual examples contributed 
+  by the community.
+- The user guides are fully executable because they are now notebooks.
+- The API documentation and the metrics and debias examples were also
+  significantly improved.
+- Improved library testing mechanisms for metrics and debias methods.
 - Fixed wrong repr of query. Now the sets are in the correct order.
-- Greatly improved library testing mechanisms.
-- Improved project documentation. Now, the documentation is divided into user guide and
-  theoretical framework. In addition, the user guides are fully executable because they
-  are now on notebooks.
+- Implemented repr for WordEmbeddingModel.
+- Testing CI moved from CircleCI to GitHub Actions.
+- License changed to MIT.
 
 Version 0.3.2
 -------------
-- Fixed RNSB bug where the classification labels were interchanged and could produce erroneous results when the attributes are of different sizes.
+- Fixed RNSB bug where the classification labels were interchanged and could produce
+  erroneous results when the attributes are of different sizes.
 - Fixed RNSB replication notebook 
 - Update of WEFE case study scores. 
 - Improved documentation examples for WEAT, RNSB, RIPA.
-- Holdout parameter added to RNSB, which allows to indicate whether or not a holdout is performed when training the classifier.
+- Holdout parameter added to RNSB, which allows to indicate whether or not a holdout
+  is performed when training the classifier.
 - Improved printing of the RNSB evaluation.
 
 Version 0.3.1
 -------------
 - Update WEFE original case study
 - Hotfix: Several bug fixes for execute WEFE original Case Study.
 - fetch_eds top_n_race_occupations argument set to 10.
-- Preprocessing: get_embeddings_from_set now returns a list with the lost preprocessed words instead of the original ones.
+- Preprocessing: get_embeddings_from_set now returns a list with the lost
+  preprocessed words instead of the original ones.
 
 Version 0.3.0
 -------------
 - Implemented Bolukbasi et al. 2016 Hard Debias.
 - Implemented  Thomas Manzini et al. 2019 Multiclass Hard Debias.
 - Implemented a fetch function to retrieve gn-glove female-male word sets.
-- Moved the transformation logic of words, sets and queries to embeddings to its own module: preprocessing
-- Enhanced the preprocessor_args and secondary_preprocessor_args metric preprocessing parameters to an list of preprocessors `preprocessors` together with the parameter `strategy` indicating whether to consider all the transformed words (`'all'`) or only the first one encountered (`'first'`).
-- Renamed WordEmbeddingModel attributes ```model``` and ```model_name```  to ```wv``` and ```name``` respectively.
+- Moved the transformation logic of words, sets and queries to embeddings to its own
+  module: preprocessing
+- Enhanced the preprocessor_args and secondary_preprocessor_args metric
+  preprocessing parameters to a list of preprocessors `preprocessors` together with
+  the parameter `strategy` indicating whether to consider all the transformed words
+  (`'all'`) or only the first one encountered (`'first'`).
+- Renamed WordEmbeddingModel attributes ```model``` and ```model_name```  to
+  ```wv``` and ```name``` respectively.
 - Renamed every run_query ```word_embedding``` argument to ```model``` in every metric.
 
 
@@ -179,21 +194,30 @@ Version 0.2.1
 
 - Compatibility fixes.
 
-
 Version 0.2.0
 --------------
 
-- Renamed optional ```run_query``` parameter  ```warn_filtered_words``` to `warn_not_found_words`.
-- Added ```word_preprocessor_args``` parameter to ```run_query``` that allow specifying transformations prior to searching for words in word embeddings.
-- Added ```secondary_preprocessor_args``` parameter to ```run_query``` which allows specifying a second pre-processor transformation to words before searching them in word embeddings. It is not necessary to specify the first preprocessor to use this one.
-- Implemented ```__getitem__``` function in ```WordEmbeddingModel```. This method allows obtaining an embedding from a word from the model stored in the instance using indexers. 
+- Renamed optional ```run_query``` parameter  ```warn_filtered_words``` to 
+  `warn_not_found_words`.
+- Added ```word_preprocessor_args``` parameter to ```run_query``` that allows specifying
+  transformations prior to searching for words in word embeddings.
+- Added ```secondary_preprocessor_args``` parameter to ```run_query``` which allows 
+  specifying a second pre-processor transformation to words before searching them in
+  word embeddings. It is not necessary to specify the first preprocessor to use this
+  one.
+- Implemented ```__getitem__``` function in ```WordEmbeddingModel```. This method
+  allows obtaining an embedding from a word from the model stored in the instance
+  using indexers. 
 - Removed underscore from class and instance variable names.
-- Improved type and verification exception messages when creating objects and executing methods.
-- Fix an error that appeared when calculating rankings with two columns of aggregations with the same name.
+- Improved type and verification exception messages when creating objects and executing
+  methods.
+- Fix an error that appeared when calculating rankings with two columns of aggregations
+  with the same name.
 - Ranking correlations are now calculated using pandas ```corr``` method. 
 - Changed metric template, name and short_names to class variables.
 - Implemented ```random_state``` in RNSB to allow replication of the experiments.
-- run_query now returns as a result the default metric requested in the parameters and all calculated values that may be useful in the other variables of the dictionary.
+- run_query now returns as a result the default metric requested in the parameters
+  and all calculated values that may be useful in the other variables of the dictionary.
 - Fixed problem with api documentation: now it shows methods of the classes.
 - Implemented p-value for WEAT