A Python class for parsing CLAN's CHA files.
Made by Leandro Garber from CIIPME-CONICET
- Utterances as a list of strings
- MOR tier as objects
- Easily add more custom tiers
- Count tokens and types of words, utterances, nouns, verbs and adjectives. Filter by child-directed, child-produced and overheard speech.
- Count main verbs, whether they refer to physical or mental actions. Auxiliary verbs that are part of periphrastic verbs are excluded. (Spanish only)
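The difference between tokens and types in the counts above can be illustrated with a minimal sketch. It does not use this library: `count_tokens_and_types` is a hypothetical helper, the sample utterances are made up, and the whitespace tokenization is deliberately naive (it ignores CHAT annotation codes).

```python
def count_tokens_and_types(utterances):
    """Return (tokens, types) for a list of utterance strings.

    Tokens are all word occurrences; types are the distinct words.
    Assumption: words are separated by whitespace and compared case-insensitively.
    """
    words = [word.lower() for utterance in utterances for word in utterance.split()]
    return len(words), len(set(words))


# Made-up utterances, for illustration only.
sample_utterances = ["mira el perro", "el perro corre"]
tokens, types = count_tokens_and_types(sample_utterances)
print(tokens, types)
```

Here "el" and "perro" each occur twice, so they add to the token count twice but to the type count only once.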
import sys
sys.path.insert(0, '<path_to_cloned_repo>')
from ChaFile import *
cha = ChaFile(<path_to_cha_file>)
Options
cha = ChaFile(<path_to_cha_file>)
lines = cha.getLines()
Each line is an object with:
- LINE_UTTERANCE : The text of the utterance
- LINE_NUMBER
- LINE_SPEAKER
- LINE_ADDRESSEE
- LINE_BULLET : Timestamp
- TIER_MOR : A list of objects with MOR data: MOR_UNIT_LEXEMA (the lexeme) and MOR_UNIT_CATEGORIA (the category)
- ... any other tier
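As a rough sketch of how lines shaped like this could be processed, the example below builds stand-in data by hand instead of calling `getLines()`. It assumes each line behaves like a dictionary keyed by the field names above; the constant values, the sample lines, the category labels and both helper functions are hypothetical and may not match what the library actually returns.

```python
from collections import Counter

# Assumption: stand-ins for the field names listed above. In the real module
# these may be constants with different values.
LINE_UTTERANCE = "LINE_UTTERANCE"
LINE_NUMBER = "LINE_NUMBER"
LINE_SPEAKER = "LINE_SPEAKER"
TIER_MOR = "TIER_MOR"
MOR_UNIT_LEXEMA = "MOR_UNIT_LEXEMA"
MOR_UNIT_CATEGORIA = "MOR_UNIT_CATEGORIA"


def utterances_per_speaker(lines):
    """Count how many lines each speaker produced."""
    return Counter(line[LINE_SPEAKER] for line in lines)


def lexemes_with_category(lines, category):
    """Collect the lexemes of all MOR units with the given category."""
    return [
        unit[MOR_UNIT_LEXEMA]
        for line in lines
        for unit in line.get(TIER_MOR, [])  # a line may lack a MOR tier
        if unit[MOR_UNIT_CATEGORIA] == category
    ]


# Hand-made lines imitating the structure described above (illustrative only).
sample_lines = [
    {
        LINE_NUMBER: 1,
        LINE_SPEAKER: "MOT",
        LINE_UTTERANCE: "mira el perro",
        TIER_MOR: [
            {MOR_UNIT_LEXEMA: "mirar", MOR_UNIT_CATEGORIA: "v"},
            {MOR_UNIT_LEXEMA: "el", MOR_UNIT_CATEGORIA: "det"},
            {MOR_UNIT_LEXEMA: "perro", MOR_UNIT_CATEGORIA: "n"},
        ],
    },
    {
        LINE_NUMBER: 2,
        LINE_SPEAKER: "CHI",
        LINE_UTTERANCE: "perro",
        TIER_MOR: [{MOR_UNIT_LEXEMA: "perro", MOR_UNIT_CATEGORIA: "n"}],
    },
    {LINE_NUMBER: 3, LINE_SPEAKER: "MOT", LINE_UTTERANCE: "sí"},
]

print(utterances_per_speaker(sample_lines))
print(lexemes_with_category(sample_lines, "n"))
```

With real data, `sample_lines` would be replaced by the result of `cha.getLines()`, provided the lines support the same key-based access.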
Garber, L. (2019). CHA file python parser. Zenodo. https://doi.org/10.5281/zenodo.3364020