Initial commit
Losik committed Mar 2, 2021
0 parents commit 5591e3d
Showing 64 changed files with 4,221 additions and 0 deletions.
25 changes: 25 additions & 0 deletions .github/workflows/release.yml
@@ -0,0 +1,25 @@
name: Release

on:
  release:
    types: [ published ]

jobs:
  release:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python
      uses: actions/setup-python@v2
      with:
        python-version: 3.8
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install tox tox-gh-actions
    - name: Run tox
      env:
        TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
        TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
      run: |
        python -m tox -e release
27 changes: 27 additions & 0 deletions .github/workflows/tests.yml
@@ -0,0 +1,27 @@
name: Tests

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.7, 3.8, 3.9]
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v2
      with:
        python-version: ${{ matrix.python-version }}
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install tox tox-gh-actions
    - name: Run tox with Python ${{ matrix.python-version }}
      run: |
        python -m tox
20 changes: 20 additions & 0 deletions .gitignore
@@ -0,0 +1,20 @@
# Ignore all
*

# Unignore dirs
!*/

# Unignore specific files without extensions
!AUTHORS
!LICENSE
!py.typed
!.gitignore

# Unignore useful extensions
!*.in
!*.ini
!*.md
!*.py
!*.pyi
!*.toml
!*.yml
6 changes: 6 additions & 0 deletions AUTHORS
@@ -0,0 +1,6 @@
The following authors have created the source code of "crowd-kit" published and distributed by YANDEX LLC as the owner:

Dmitry Ustalov dustalov@yandex-team.ru
Evgeny Tulin tulinev@yandex-team.ru
Nikita Pavlichenko pavlichenko@yandex-team.ru
Vladimir Losev losev@yandex-team.ru
35 changes: 35 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,35 @@
# Notice to external contributors


## General info

Hello! In order for us (YANDEX LLC) to accept patches and other contributions from you, you will have to adopt our Yandex Contributor License Agreement (the “**CLA**”). The current version of the CLA can be found here:
1) https://yandex.ru/legal/cla/?lang=en (in English) and
2) https://yandex.ru/legal/cla/?lang=ru (in Russian).

By adopting the CLA, you state the following:

* You obviously wish and are willingly licensing your contributions to us for our open source projects under the terms of the CLA,
* You have read the terms and conditions of the CLA and agree with them in full,
* You are legally able to provide and license your contributions as stated,
* We may use your contributions for our open source projects and for any other project too,
* We rely on your assurances concerning the rights of third parties in relation to your contributions.

If you agree with these principles, please read and adopt our CLA. By providing us your contributions, you hereby declare that you have already read and adopt our CLA, and we may freely merge your contributions with our corresponding open source project and use it further in accordance with terms and conditions of the CLA.

## Provide contributions

If you have already adopted terms and conditions of the CLA, you are able to provide your contributions. When you submit your first pull request, please add the following information into it:

```
I hereby agree to the terms of the CLA available at: [link].
```

Replace the bracketed text as follows:
* [link] is the link to the current version of the CLA: https://yandex.ru/legal/cla/?lang=en (in English) or https://yandex.ru/legal/cla/?lang=ru (in Russian).

It is enough to provide this notification only once.

## Other questions

If you have any questions, please mail us at opensource@yandex-team.ru.
13 changes: 13 additions & 0 deletions LICENSE
@@ -0,0 +1,13 @@
Copyright 2020 YANDEX LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
6 changes: 6 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,6 @@
# Legal
include LICENSE AUTHORS CONTRIBUTING.md

# Stubs
recursive-include src py.typed
recursive-include src *.pyi
27 changes: 27 additions & 0 deletions README.md
@@ -0,0 +1,27 @@
# Crowd-kit

[![GitHub Tests][github_tests_badge]][github_tests_link]

[github_tests_badge]: https://github.com/Toloka/crowdlib/workflows/Tests/badge.svg?branch=main
[github_tests_link]: https://github.com/Toloka/crowdlib/actions?query=workflow:Tests


`crowd-kit` is a Python module for crowdsourcing distributed under the Apache-2.0 license. We strive to implement functionality that eases working with crowd-sourced data. Currently, the module contains:
* Implementations of commonly used aggregation methods
* A set of metrics

The module is currently under heavy development, and its interfaces are subject to change.
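
To give a feel for what an aggregation method does, here is a minimal stand-alone sketch of majority voting over `(task, performer, label)` records. It uses only the standard library and does not call the `crowd-kit` API; the function name `majority_vote` and the sample data are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def majority_vote(rows):
    """Return the most frequent label for each task.

    `rows` is an iterable of (task, performer, label) tuples.
    """
    votes = defaultdict(Counter)
    for task, _performer, label in rows:
        votes[task][label] += 1
    # most_common(1) yields a one-element list of (label, count) pairs.
    return {task: counter.most_common(1)[0][0] for task, counter in votes.items()}


# Hypothetical sample data: three performers label two tasks.
rows = [
    ('t1', 'p1', 'cat'),
    ('t1', 'p2', 'cat'),
    ('t1', 'p3', 'dog'),
    ('t2', 'p1', 'dog'),
    ('t2', 'p2', 'dog'),
]

print(majority_vote(rows))
```

Each task receives the label chosen by most performers; more elaborate methods typically also weigh performers by an estimated skill.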

Install
--------------
Installing crowd-kit is as easy as `pip install crowd-kit`.


Questions and bug reports
--------------
To report bugs, please use the [Toloka/bugreport](https://github.com/Toloka/crowdlib/issues) page.


License
-------
© YANDEX LLC, 2020-2021. Licensed under the Apache License, Version 2.0. See LICENSE file for more details.
3 changes: 3 additions & 0 deletions pyproject.toml
@@ -0,0 +1,3 @@
[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"
27 changes: 27 additions & 0 deletions setup.py
@@ -0,0 +1,27 @@
#!/usr/bin/env python
# coding: utf8

from setuptools import setup, find_packages

PREFIX = 'crowdkit'

setup(
    name='crowd-kit',
    package_dir={PREFIX: 'src'},
    packages=[f'{PREFIX}.{package}' for package in find_packages('src')],
    version='0.0.1',
    description='Python libraries for crowdsourcing',
    license='Apache 2.0',
    author='Vladimir Losev',
    author_email='losev@yandex-team.ru',
    python_requires='>=3.7.0',
    install_requires=[
        'attrs',
        'numpy',
        'pandas',
        'tqdm',
        'scikit-learn',
        'nltk',
    ],
    include_package_data=True,
)
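
The `package_dir` and `packages` arguments above remap directories under `src/` to importable names under `crowdkit`. The following is a rough sketch of that mapping using only the standard library. The `found` list is a hypothetical stand-in for the result of `find_packages('src')`, and `resolve_package_dir` is a simplified illustration of the prefix lookup, not setuptools' actual code.

```python
PREFIX = 'crowdkit'

# Hypothetical result of find_packages('src'); the real list depends on
# which directories under src/ contain an __init__.py.
found = ['aggregation', 'metrics']

packages = [f'{PREFIX}.{package}' for package in found]
package_dir = {PREFIX: 'src'}


def resolve_package_dir(package, package_dir):
    """Find the longest dotted prefix listed in package_dir and append the
    remaining name components as sub-directories (simplified)."""
    parts = package.split('.')
    tail = []
    while parts:
        prefix = '.'.join(parts)
        if prefix in package_dir:
            return '/'.join([package_dir[prefix]] + tail)
        tail.insert(0, parts.pop())
    # No entry matched: fall back to the dotted name as a relative path.
    return '/'.join(tail)


for name in packages:
    print(name, '->', resolve_package_dir(name, package_dir))
```

Under these assumptions, `src/aggregation` would be installed as `crowdkit.aggregation`, which is why the source tree itself has no `crowdkit` directory.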
9 changes: 9 additions & 0 deletions src/aggregation/__init__.py
@@ -0,0 +1,9 @@
from .dawid_skene import DawidSkene
from .gold_majority_vote import GoldMajorityVote
from .majority_vote import MajorityVote
from .m_msr import MMSR
from .wawa import Wawa
from .zero_based_skill import ZeroBasedSkill
from .hrrasa import HRRASA, RASA

__all__ = ['DawidSkene', 'MajorityVote', 'MMSR', 'Wawa', 'GoldMajorityVote', 'ZeroBasedSkill', 'HRRASA', 'RASA']
124 changes: 124 additions & 0 deletions src/aggregation/annotations.py
@@ -0,0 +1,124 @@
"""
This module contains reusable annotations that encapsulate both typing
and description for commonly used parameters. These annotations are
used to automatically generate stub files with proper docstrings
"""

import inspect
import textwrap
from io import StringIO
from typing import ClassVar, Dict, Optional, Type, get_type_hints

import attr
import pandas as pd


@attr.s
class Annotation:
    type: Optional[Type] = attr.ib(default=None)
    title: Optional[str] = attr.ib(default=None)
    description: Optional[str] = attr.ib(default=None)

    def format_google_style_attribute(self, name: str) -> str:
        type_str = f' ({getattr(self.type, "__name__", str(self.type))})' if self.type else ''
        title = f' {self.title}\n' if self.title else '\n'
        description_str = textwrap.indent(f'{self.description}\n', ' ' * 4).lstrip('\n') if self.description else ''
        return f'{name}{type_str}:{title}{description_str}'

    def format_google_style_return(self):
        type_str = f'{getattr(self.type, "__name__", str(self.type))}' if self.type else ''
        title = f' {self.title}\n' if self.title else '\n'
        description_str = textwrap.indent(f'{self.description}\n', ' ' * 4).lstrip('\n') if self.description else ''
        return f'{type_str}:{title}{description_str}'


def manage_docstring(obj):

    attributes: Dict[str, Annotation] = {}
    new_annotations = {}

    for key, value in get_type_hints(obj).items():
        if isinstance(value, Annotation):
            attributes[key] = value
            if value.type is not None:
                new_annotations[key] = value.type
        else:
            new_annotations[key] = value

    return_section = attributes.pop('return', None)

    sio = StringIO()
    sio.write(inspect.cleandoc(obj.__doc__ or ''))

    if attributes:
        sio.write('\nArgs:\n' if inspect.isfunction(obj) else '\nAttributes:\n')
        for key, ann in attributes.items():
            sio.write(textwrap.indent(ann.format_google_style_attribute(key), ' ' * 4))

    if return_section:
        sio.write('Returns:\n')
        sio.write(textwrap.indent(return_section.format_google_style_return(), ' ' * 4))

    obj.__annotations__ = new_annotations
    obj.__doc__ = sio.getvalue()
    return obj


PERFORMERS_SKILLS = Annotation(
    type=pd.Series,
    title='Predicted skills for each performer',
    description=textwrap.dedent("A series of performers' skills indexed by performers"),
)

PROBAS = Annotation(
    type=pd.DataFrame,
    title='Estimated label probabilities',
    description=textwrap.dedent('''
        A frame indexed by `task` and a column for every label id found
        in `data` such that `result.loc[task, label]` is the probability of `task`'s
        true label to be equal to `label`.
    '''),
)

PRIORS = Annotation(
    type=pd.Series,
    title='A prior label distribution',
    description="A series of labels' probabilities indexed by labels",
)

TASKS_LABELS = Annotation(
    type=pd.DataFrame,
    title='Estimated labels',
    description=textwrap.dedent('''
        A pandas.DataFrame indexed by `task` with a single column `label` containing
        `task`'s most probable label for last fitted data, or None otherwise.
    '''),
)

ERRORS = Annotation(
    type=pd.DataFrame,
    title="Performers' error matrices",
    description=textwrap.dedent('''
        A pandas.DataFrame indexed by `performer` and `label` with a column for every
        label_id found in `data` such that `result.loc[performer, observed_label, true_label]`
        is the probability of `performer` producing an `observed_label` given that a task's
        true label is `true_label`
    '''),
)

DATA = Annotation(
    type=pd.DataFrame,
    title='Input data',
    description='A pandas.DataFrame containing `task`, `performer` and `label` columns',
)


def _make_optional_classlevel(annotation: Annotation):
    return attr.evolve(annotation, type=ClassVar[Optional[annotation.type]])


OPTIONAL_CLASSLEVEL_PERFORMERS_SKILLS = _make_optional_classlevel(PERFORMERS_SKILLS)
OPTIONAL_CLASSLEVEL_PROBAS = _make_optional_classlevel(PROBAS)
OPTIONAL_CLASSLEVEL_PRIORS = _make_optional_classlevel(PRIORS)
OPTIONAL_CLASSLEVEL_TASKS_LABELS = _make_optional_classlevel(TASKS_LABELS)
OPTIONAL_CLASSLEVEL_ERRORS = _make_optional_classlevel(ERRORS)
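
The idea behind this module, annotation objects that carry both a type and documentation text and a decorator that folds them into the docstring, can be sketched with the standard library alone. The names `SimpleAnnotation`, `describe_args`, `THRESHOLD`, and `classify` below are hypothetical and exist only for this illustration; the sketch is deliberately simpler than `Annotation` and `manage_docstring` above (no `attrs`, no `Returns:` section, no class support).

```python
import textwrap
from dataclasses import dataclass
from typing import Optional, Type


@dataclass
class SimpleAnnotation:
    """Hypothetical stdlib stand-in for the attrs-based Annotation class."""
    type_: Optional[Type] = None
    title: Optional[str] = None
    description: Optional[str] = None

    def format_attribute(self, name: str) -> str:
        type_str = f' ({self.type_.__name__})' if self.type_ else ''
        title = f' {self.title}\n' if self.title else '\n'
        description = (
            textwrap.indent(f'{self.description}\n', ' ' * 4)
            if self.description else ''
        )
        return f'{name}{type_str}:{title}{description}'


def describe_args(func):
    """Move SimpleAnnotation hints into the docstring and leave plain types behind."""
    lines = [(func.__doc__ or '').strip(), 'Args:']
    plain = {}
    for key, value in func.__annotations__.items():
        if isinstance(value, SimpleAnnotation):
            entry = textwrap.indent(value.format_attribute(key), ' ' * 4)
            lines.append(entry.rstrip('\n'))
            if value.type_ is not None:
                plain[key] = value.type_
        else:
            plain[key] = value
    func.__annotations__ = plain
    func.__doc__ = '\n'.join(lines)
    return func


# A reusable parameter description, defined once and shared between functions.
THRESHOLD = SimpleAnnotation(
    float, 'Decision threshold', 'Scores above this value count as positive.'
)


@describe_args
def classify(score: float, threshold: THRESHOLD = 0.5) -> bool:
    """Classify a score."""
    return score > threshold


print(classify.__doc__)
```

After decoration, `classify.__annotations__['threshold']` is the plain `float` type, so type checkers and stub generators would see ordinary hints while the docstring carries the shared description.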