childes_sentences

Repository for parsing CHILDES transcriptions and preparing the data for speech act prediction. It also includes speech act prediction using a CRF.

Requirements

xmltodict
python-crfsuite

Generating data for classification

Data is downloaded from CHILDES and then converted from CHAT (.cha) to XML:

$ java -cp chatter.jar org.talkbank.chatter.App -inputFormat cha -outputFormat xml -tree -outputDir [outdirname] [inputdir]
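
When several corpora need converting, the same command can be driven from Python. The following is a minimal sketch, assuming `chatter.jar` sits in the working directory and reusing only the flags shown above; the function names and the folder names in the usage comment are illustrative, not part of the repository.

```python
import subprocess
from pathlib import Path


def chatter_command(input_dir, output_dir, jar="chatter.jar"):
    """Build the Chatter invocation shown above as an argument list."""
    return [
        "java", "-cp", str(jar), "org.talkbank.chatter.App",
        "-inputFormat", "cha",
        "-outputFormat", "xml",
        "-tree",
        "-outputDir", str(output_dir),
        str(input_dir),
    ]


def convert_corpus(input_dir, output_dir, jar="chatter.jar"):
    """Run the conversion; raises CalledProcessError if the command fails."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(chatter_command(input_dir, output_dir, jar), check=True)


# Usage (hypothetical folder names):
# convert_corpus("cha/NewEngland", "data/NewEngland")
```

Passing the arguments as a list avoids shell quoting problems when directory names contain spaces.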

Data from the MACANNOT annotation platform can also be used as input for the last steps.

Extraction pipelines:

1. raw XML to raw JSON - either in the same or a separate folder
2. raw (XML/JSON) to individual files (JSON) with extracted data
3. extracted data to individual DSV with selected features
4. extracted data to aggregated train/test/valid DSV with selected features
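
The aggregation into train/test/valid files can be pictured as follows. This is a minimal sketch, not the logic of `extract_data.py`: the 80/10/10 ratio, the seed, and the function names are assumptions. It splits at the transcript level so that each conversation stays in one piece, which is a design choice made here for illustration rather than something the repository states.

```python
import csv
import random


def split_items(items, train=0.8, test=0.1, seed=0):
    """Shuffle items (e.g. transcript files) reproducibly and cut them
    into train/test/valid parts; valid receives whatever remains."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train)
    n_test = int(len(items) * test)
    return {
        "train": items[:n_train],
        "test": items[n_train:n_train + n_test],
        "valid": items[n_train + n_test:],
    }


def write_dsv(path, header, rows, delimiter="\t"):
    """Write rows to a delimiter-separated file (TSV by default)."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter=delimiter)
        writer.writerow(header)
        writer.writerows(rows)
```

The rows extracted from the files of each part could then be written with `write_dsv` to one file per part, for example `newEngland_train.tsv` for the training part.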

Extracted features:

Uttered sentence (main words, no fillers, without correction)
Lemmas and POS tags
Speech act, if annotated
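
One way to hold these features for a single utterance is sketched below. The class name, field names, and the example values are hypothetical; they only illustrate how the three kinds of features listed above could end up as one DSV row.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Utterance:
    """Features extracted for one utterance (illustrative container)."""
    sentence: str                      # cleaned sentence, fillers removed
    lemmas: List[str]
    pos_tags: List[str]
    speech_act: Optional[str] = None   # None when no speech act is annotated

    def to_row(self):
        """Flatten the features into one row of strings for a DSV file."""
        return [
            self.sentence,
            " ".join(self.lemmas),
            " ".join(self.pos_tags),
            self.speech_act or "",
        ]


# Example values are made up for illustration.
example = Utterance(
    sentence="you want the ball",
    lemmas=["you", "want", "the", "ball"],
    pos_tags=["pro", "v", "det", "n"],
    speech_act=None,
)
print(example.to_row())
```

An empty last column marks utterances without a speech act, so annotated and unannotated data can share the same file layout.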

Organisation:

/data
    /NewEngland
    /Bates
    ... transcripts in xml format
/formatted
    /NewEngland
    /Bates
    ... json/xml individual files with extracted features
/ttv
    newEngland_train.tsv
    ... train/test/valid files
xml_to_json.py: raw XML to raw JSON (1)
format_data.py: raw to formatted JSON (2)
extract_data.py: formatted JSON to desired columnar format (3)
utils.py: useful functions for extraction from raw data
crf_train.py: training and testing of the CRF speech act annotation
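
To give an idea of what a CRF consumes, the sketch below turns one conversation into a sequence of per-utterance feature lists and a parallel sequence of labels. It is not the feature set used by `crf_train.py`: the feature names, the use of the speaker code as a feature, and the function names are assumptions.

```python
def utterance_features(tokens, pos_tags, speaker):
    """Describe one utterance as a list of string features."""
    features = ["bias", "speaker=" + speaker]
    features += ["word=" + token.lower() for token in tokens]
    features += ["pos=" + tag for tag in pos_tags]
    return features


def conversation_to_sequences(utterances):
    """Convert (tokens, pos_tags, speaker, speech_act) tuples into the
    feature sequence and label sequence of one conversation."""
    xseq = [utterance_features(t, p, s) for t, p, s, _ in utterances]
    yseq = [act for _, _, _, act in utterances]
    return xseq, yseq
```

With python-crfsuite, each such pair can be passed to `pycrfsuite.Trainer.append(xseq, yseq)` before calling `Trainer.train` with a model file name.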

Sources

CHILDES - download and convert to XML: https://talkbank.org/share/data.html
Speech Acts: https://talkbank.org/manuals/CHAT.pdf
