Code and data for: Temperature shapes language sonority: Revalidation from a large dataset

EL-CL/temperature-sonority

Temperature and Language Sonority

Code and data for Temperature shapes language sonority: Revalidation from a large dataset.

Usage

The following 6 steps can be run independently, because the output of each step is already provided in this repository (see Data below). Steps 1 and 2 require local copies of the ASJP and FLDAS datasets; you can skip these two steps if you do not want to download the full datasets. Step 6 also takes the ASJP [raw_path] as an argument.

1. Extract geometry and sonority data from ASJP

Run python get_sonority.py [raw_path], where [raw_path] is the path to the raw folder of the local ASJP dataset (e.g. python get_sonority.py C:/ASJP/raw/). Results will be saved as sonorities.csv, phones.csv, word_structures.csv, and word_lengths.csv in the data folder.

2. Extract temperature data from FLDAS

Run python get_temperature.py [FLDAS_path] to extract monthly temperature data for all doculects in sonorities.csv, where [FLDAS_path] is the path to the FLDAS_NOAH01_C_GL_M.001 folder of the local FLDAS dataset (e.g. python get_temperature.py C:/FLDAS/FLDAS_NOAH01_C_GL_M.001/). Results will be saved as data/temperatures.csv.

Run python get_temperature_global.py [FLDAS_path] to extract global monthly mean temperature data. The result will be saved as temperature_global.csv.

3. Plot global distribution of temperature and sonority

Run python plot_global.py. The plot will be saved as figure/global.png.

4. Combine and process temperature and linguistic data

Run python process.py. Results will be saved as data.csv, data_genus.csv, data_family.csv, and data_macroarea.csv in the data folder.
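Once data.csv exists, a quick sanity check of the temperature and sonority relationship can be done without R. The following is only a minimal sketch using the standard library: it assumes data.csv has a header row with the column names listed under Data below (T, Index0), and the helper names read_pairs and pearson are hypothetical. The analysis used for the paper lives in process.r and may differ (for example, it may use the transformed columns).

```python
import csv
import math


def read_pairs(csv_file, x_col="T", y_col="Index0"):
    """Read two numeric columns from an open CSV file.

    Rows where either value is empty, non-numeric, or NaN are skipped.
    A misspelled column name raises KeyError rather than being ignored.
    """
    pairs = []
    for row in csv.DictReader(csv_file):
        try:
            x, y = float(row[x_col]), float(row[y_col])
        except (TypeError, ValueError):
            continue  # missing or non-numeric cell
        if math.isnan(x) or math.isnan(y):
            continue
        pairs.append((x, y))
    return pairs


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete rows")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    # Raises ZeroDivisionError if either column is constant.
    return sxy / math.sqrt(sxx * syy)


# Example usage (assumes the file path data/data.csv):
# with open("data/data.csv", newline="") as f:
#     print(pearson(read_pairs(f, "T", "Index0")))
```

Passing an open file object rather than a path keeps the helper easy to try on a small in-memory sample before pointing it at the real file.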

5. Generate distribution and correlation plots, and more

Run the corresponding code blocks of process.r in R.

6. Compare vowel length solutions

Run python test_vowel_length_solutions.py [raw_path]. Results will be saved as data/vowel_length_solutions.csv. Then run the code block “Plot correlations between vowel length solutions” in process.r to plot the correlations.
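The Python steps above can also be chained from a small driver script. The sketch below is not part of the repository: build_commands and run_all are hypothetical helpers that assemble the commands exactly as written in steps 1 to 4 and 6, and skip any step whose dataset path is not supplied. Step 5 is left out because it is run interactively in R.

```python
import subprocess


def build_commands(raw_path=None, fldas_path=None):
    """Return the Python commands from the steps above as argv lists.

    Steps that need a local dataset are included only when the
    corresponding path is given.
    """
    commands = []
    if raw_path is not None:
        # Step 1: needs the ASJP raw folder.
        commands.append(["python", "get_sonority.py", raw_path])
    if fldas_path is not None:
        # Step 2: needs the FLDAS_NOAH01_C_GL_M.001 folder.
        commands.append(["python", "get_temperature.py", fldas_path])
        commands.append(["python", "get_temperature_global.py", fldas_path])
    # Steps 3 and 4: work from the files already in the repository.
    commands.append(["python", "plot_global.py"])
    commands.append(["python", "process.py"])
    if raw_path is not None:
        # Step 6: also reads the ASJP raw folder.
        commands.append(["python", "test_vowel_length_solutions.py", raw_path])
    return commands


def run_all(raw_path=None, fldas_path=None):
    """Run each command in order, stopping at the first failure."""
    for argv in build_commands(raw_path, fldas_path):
        subprocess.run(argv, check=True)
```

Calling run_all() with no arguments from the repository root would run only plot_global.py and process.py, which matches the note that steps 1 and 2 can be skipped.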

Data

All extracted data files are in the data folder.

Temperature Data

Linguistic Data

Combined Data

  • data.csv: Data for each filtered doculect, with temperature data and linguistic data combined
    • WL: Mean word length
    • Index0 to Index6: MSIs calculated with 7 methods
    • T: Mean annual temperature
    • T_max: Max of 41-year mean monthly temperatures
    • T_min: Min of 41-year mean monthly temperatures
    • T_sd: Standard deviation of monthly temperatures over 41 years
    • T_diff: Mean annual range of temperature
    • Index0_trans, etc.: Transformed versions of the above variables
  • data_genus.csv: Data for each language “genus” classified by WALS
  • data_family.csv: Data for each language family classified by WALS
  • data_macroarea.csv: Data for each macroarea (North America, South America, Eurasia, Africa, Greater New Guinea, and Australia)
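To make the temperature columns of data.csv concrete, here is a minimal sketch of how T, T_max, T_min, T_sd, and T_diff could be derived from a table of monthly temperatures for one doculect. It reflects one reading of the definitions above, not the repository's actual code: the function name summarize is hypothetical, the input layout (one list of 12 monthly values per year) is an assumption, T_sd is taken as the population standard deviation over all monthly values, and months are weighted equally.

```python
import statistics


def summarize(monthly):
    """Summarize monthly temperatures.

    monthly: list of years (41 in the dataset), each a list of 12
    monthly mean temperatures. This layout is an assumption.
    """
    n_years = len(monthly)
    # Mean of each calendar month across all years (12 values).
    month_means = [
        sum(year[m] for year in monthly) / n_years for m in range(12)
    ]
    all_values = [value for year in monthly for value in year]
    # Within-year range (warmest month minus coldest month), per year.
    annual_ranges = [max(year) - min(year) for year in monthly]
    return {
        "T": statistics.fmean(all_values),        # mean annual temperature
        "T_max": max(month_means),                # warmest multi-year monthly mean
        "T_min": min(month_means),                # coldest multi-year monthly mean
        "T_sd": statistics.pstdev(all_values),    # assumption: population SD
        "T_diff": statistics.fmean(annual_ranges),  # mean annual range
    }
```

Note that T_max and T_min are taken over the 12 multi-year monthly means, so they are less extreme than the single hottest or coldest month in the record.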

Figures

All saved figure files are in the figures folder.
