Moving V1 example scripts to example/datasets folder (#369)
* Moving V1 example scripts to example/datasets folder

* Separate mypy pre-commit check for examples/datasets folder
y27choi authored Dec 20, 2023
1 parent 98aaa1c commit 90ccc2f
Showing 7 changed files with 72 additions and 3 deletions.
10 changes: 8 additions & 2 deletions .pre-commit-config.yaml
@@ -47,8 +47,14 @@ repos:
   - repo: https://github.com/pre-commit/mirrors-mypy
     rev: v0.910
     hooks:
-      - id: mypy
-        additional_dependencies: [types-all, "pydantic<2.0"]
+      - id: mypy
+        name: mypy-default
+        exclude: ^examples/datasets
+        additional_dependencies: [ types-all, "pydantic<2.0" ]
+      - id: mypy
+        name: mypy-examples-dataset
+        files: ^examples/datasets
+        additional_dependencies: [ types-all, "pydantic<2.0" ]
   - repo: meta
     hooks:
       - id: check-hooks-apply
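
The two `mypy` entries above split the repository between them: `mypy-default` skips anything matching `^examples/datasets`, and `mypy-examples-dataset` runs only on those paths. A minimal Python sketch of that partition, assuming pre-commit applies `files` and `exclude` as regular-expression searches against repository-relative paths. The sample paths are illustrative, not taken from the commit, and the sketch ignores the hook's file-type filtering:

```python
import re
from typing import List

# Patterns copied from the two hook entries in the diff.
DEFAULT_EXCLUDE = re.compile(r"^examples/datasets")  # mypy-default: exclude
EXAMPLES_FILES = re.compile(r"^examples/datasets")   # mypy-examples-dataset: files


def hooks_for(path: str) -> List[str]:
    """Return the names of the mypy hooks that would consider `path`."""
    hooks = []
    # mypy-default sets no `files` pattern, so only `exclude` filters it.
    if not DEFAULT_EXCLUDE.search(path):
        hooks.append("mypy-default")
    # mypy-examples-dataset sets no `exclude` pattern, so only `files` filters it.
    if EXAMPLES_FILES.search(path):
        hooks.append("mypy-examples-dataset")
    return hooks


# Illustrative paths (assumptions, not files named in the commit).
for path in [
    "kolena/dataset.py",
    "examples/question_answering/script.py",
    "examples/datasets/question_answering/script.py",
]:
    print(path, hooks_for(path))
```

Because both hooks use the same prefix pattern, every path is claimed by exactly one of them. The pattern has no trailing `/`, so a sibling directory such as `examples/datasets_v2` would also fall under `mypy-examples-dataset`.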
26 changes: 26 additions & 0 deletions examples/datasets/question_answering/README.md
@@ -0,0 +1,26 @@
# Example Integration: Question Answering

This example integration uses the [TruthfulQA (open-domain)](https://github.com/sylinrl/TruthfulQA) and the
[HaluEval (closed-domain)](https://github.com/RUCAIBox/HaluEval/tree/main/evaluation) datasets and OpenAI's GPT models
to demonstrate the question answering workflow in Kolena.

## Setup

This project uses [Poetry](https://python-poetry.org/) for packaging and Python dependency management. To get started,
install project dependencies from [`pyproject.toml`](./pyproject.toml) by running:

```shell
poetry update && poetry install
```

## Usage

The data for this example integration lives in the publicly accessible S3 bucket `s3://kolena-public-datasets`.

First, ensure that the `KOLENA_TOKEN` environment variable is populated in your environment. See our
[initialization documentation](https://docs.kolena.io/installing-kolena/#initialization) for details.

This project defines two scripts that perform the following operations:

1. [`register_dataset.py`](question_answering/register_dataset.py) registers both datasets by default. You can also
select the dataset to register by specifying `--datasets`.
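
The `--datasets` behavior described in step 1 could be implemented along the following lines. This is a hypothetical sketch: the contents of `register_dataset.py` are not shown in this commit, and the accepted values are assumed to be the `TRUTHFULQA` and `HALUEVALQA` names from the constants hunk. `parse_args` and `DATASETS` are illustrative names.

```python
import argparse
from typing import List, Optional

# Dataset names as they appear in the constants hunk of this commit.
TRUTHFULQA = "TruthfulQA"
HALUEVALQA = "HaluEval-QA"
DATASETS = [TRUTHFULQA, HALUEVALQA]  # hypothetical helper list


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse CLI arguments; with no --datasets flag, select every dataset."""
    parser = argparse.ArgumentParser(description="Register QA datasets (sketch).")
    parser.add_argument(
        "--datasets",
        nargs="+",          # accept one or more names after the flag
        choices=DATASETS,   # reject names that are not known datasets
        default=DATASETS,   # register both datasets by default
        help="Datasets to register; defaults to all of them.",
    )
    return parser.parse_args(argv)


print(parse_args([]).datasets)
print(parse_args(["--datasets", TRUTHFULQA]).datasets)
```

Under these assumptions, running the script with no arguments selects both datasets, while `--datasets TruthfulQA` narrows the run to one.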
20 changes: 20 additions & 0 deletions examples/datasets/question_answering/pyproject.toml
@@ -0,0 +1,20 @@
[tool.poetry]
name = "question_answering"
version = "0.1.0"
description = " Kolena Datasets Example integration for question answering"
authors = ["Kolena Engineering <eng@kolena.io>"]
license = "Apache-2.0"

[tool.poetry.dependencies]
python = ">=3.8,<3.11"
kolena = ">=0.99.0,<1"
s3fs = "^2022.7.1"

[tool.poetry.group.dev.dependencies]
pre-commit = "^2.17"
pytest = "^7"
pytest-depends = "^1.0.1"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
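
The `^` constraints above are Poetry caret requirements, which allow updates that leave the left-most non-zero version component unchanged. A small sketch of how those bounds expand, written in plain Python rather than with Poetry's own parser. `caret_bounds` is a hypothetical helper that handles only plain numeric versions:

```python
from typing import Tuple

Version = Tuple[int, int, int]


def caret_bounds(spec: str) -> Tuple[Version, Version]:
    """Return (inclusive lower bound, exclusive upper bound) for a caret spec."""
    parts = [int(p) for p in spec.lstrip("^").split(".")]
    # Bump the first non-zero component; if all are zero, bump the last one given.
    idx = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[:idx] + [parts[idx] + 1]

    def pad(xs):
        # Pad to three components so bounds compare as (major, minor, patch).
        return tuple(xs + [0] * (3 - len(xs)))

    return pad(parts), pad(upper)


# Constraints taken from the pyproject.toml in this commit.
for name, spec in [
    ("s3fs", "^2022.7.1"),
    ("pre-commit", "^2.17"),
    ("pytest", "^7"),
    ("pytest-depends", "^1.0.1"),
]:
    lower, upper = caret_bounds(spec)
    print(f"{name}: >= {lower}, < {upper}")
```

The `kolena` and `python` entries use explicit ranges instead, so they need no expansion: `>=0.99.0,<1` already states both bounds.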
@@ -0,0 +1,13 @@
# Copyright 2021-2023 Kolena Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
@@ -15,3 +15,7 @@
 BUCKET = "kolena-public-datasets"
 TRUTHFULQA = "TruthfulQA"
 HALUEVALQA = "HaluEval-QA"
+MODELS = [
+    "gpt-3.5-turbo",
+    "gpt-4-1106-preview",
+]
2 changes: 1 addition & 1 deletion examples/question_answering/pyproject.toml
@@ -7,7 +7,7 @@ license = "Apache-2.0"

 [tool.poetry.dependencies]
 python = ">=3.8,<3.11"
-kolena = ">=0.99.0,<1"
+kolena = ">=0.94.0,<1"
 s3fs = "^2022.7.1"
 torch = [
     {markers = "sys_platform == 'darwin' and platform_machine == 'arm64'", url = "https://download.pytorch.org/whl/cpu/torch-2.0.1-cp39-none-macosx_11_0_arm64.whl"},
