GitHub - cl-tohoku/ngo_hh_gender_bias_dataset: Dataset for NLP2025 Japan

This dataset contains English sentences of the form "I am [demonym]." together with their translations into German (deu), Spanish (spa), French (fra), and Italian (it). The files are organized by language in the data/ folder.

Source

The translations were sourced from the following references:

File Format

Each file contains the following columns:

| Column Name | Description |
| --- | --- |
| `eng` | The source sentence in English |
| `<lang>_m` | The masculine form of the translation (if applicable) |
| `<lang>_f` | The feminine form of the translation (if applicable) |
| `<lang>_n` | The neuter form of the translation (if applicable) |

Example

| eng | it_m | it_f | it_n |
| --- | --- | --- | --- |
| I am Austrian. | Sono austriaco. | Sono austriaca. | |
| I am Belgian. | | | Sono belga. |
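
The column layout in the example above can be read with a few lines of Python. This is a minimal sketch, not part of the repository: it assumes the files are tab-separated with a header row, and the helper name `read_rows` and the inline `SAMPLE` string are hypothetical, introduced only for illustration. Adjust the delimiter if the actual files in data/ use a different one.

```python
import csv
import io

# Inline sample mirroring the README example.
# Assumption: the real files are tab-separated with a header row.
SAMPLE = (
    "eng\tit_m\tit_f\tit_n\n"
    "I am Austrian.\tSono austriaco.\tSono austriaca.\t\n"
    "I am Belgian.\t\t\tSono belga.\n"
)

def read_rows(handle, lang):
    """Yield (english, {gender: translation}) pairs, skipping empty cells."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        forms = {}
        for gender in ("m", "f", "n"):
            # Missing trailing cells come back as None, so fall back to "".
            value = (row.get(f"{lang}_{gender}") or "").strip()
            if value:
                forms[gender] = value
        yield row["eng"], forms

rows = list(read_rows(io.StringIO(SAMPLE), "it"))
for eng, forms in rows:
    print(eng, forms)
```

To read a file instead of the inline sample, pass an open file handle (opened with `encoding="utf-8"` and `newline=""`) in place of the `io.StringIO` object, and set `lang` to the column prefix used in that file.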
