cl-tohoku/ngo_hh_gender_bias_dataset

This dataset contains English sentences of the form "I am [demonym]." together with their translations into German (deu), Spanish (spa), French (fra), and Italian (it). The files are organized by language in the data/ folder.

Source

The translations were sourced from the following references:

  1. English https://github.com/mledoze/countries/tree/master
  2. French https://github.com/mledoze/countries/tree/master
  3. German https://deutsch.lingolia.com/en/vocabulary/laender-nationalitaeten
  4. Italian https://www.theintrepidguide.com/nationalities-in-italian/?utm_source=chatgpt.com
  5. Spanish https://espanol.lingolia.com/en/vocabulary/countries

File Format

Each file contains the following columns:

| Column Name | Description |
| --- | --- |
| eng | The source sentence in English |
| <lang>_m | The masculine form of the translation (if applicable) |
| <lang>_f | The feminine form of the translation (if applicable) |
| <lang>_n | The neuter form of the translation (if applicable) |

Example

eng it_m it_f it_n
I am Austrian. Sono austriaco. Sono austriaca.
I am Belgian. Sono belga.
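
As a rough illustration of how the columns above might be consumed, the sketch below parses one row and keeps only the gendered forms that are actually filled in. It assumes a tab-separated layout with a header row, which this README does not confirm; the sample string, `available_forms`, and `read_rows` are hypothetical names introduced for the example and are not part of the dataset.

```python
import csv
import io

# Hypothetical sample mirroring the README example row.
# Assumption: tab-separated with a header; the real files in data/ may differ.
SAMPLE = (
    "eng\tit_m\tit_f\tit_n\n"
    "I am Austrian.\tSono austriaco.\tSono austriaca.\t\n"
)

GENDER_SUFFIXES = ("m", "f", "n")


def available_forms(row, lang):
    """Return {gender_suffix: sentence} for the non-empty <lang>_* cells."""
    forms = {}
    for suffix in GENDER_SUFFIXES:
        # Missing or empty cells mean the form is not applicable.
        cell = (row.get(f"{lang}_{suffix}") or "").strip()
        if cell:
            forms[suffix] = cell
    return forms


def read_rows(text, lang):
    """Parse tab-separated text into (english_sentence, forms) pairs."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [(row["eng"], available_forms(row, lang)) for row in reader]


rows = read_rows(SAMPLE, "it")
for eng, forms in rows:
    print(eng, forms)
```

Treating an empty cell as "form not applicable" matches the "(if applicable)" wording in the column descriptions; to read an actual file, replace the sample string with the file's contents and adjust the delimiter as needed.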

About

Dataset for NLP2025 Japan
