mcfrank/lot-language-learning-2023

lot-language-learning-2023

Materials for LOT School 2023, "Language Learning: A Data-Driven Approach"

Instructor: Michael C. Frank (Stanford)

Course Description

In this course, we will examine early language learning through the lens of new data resources that facilitate quantitative studies. Our framework will be the "Standard Model" of Kachergis, Marchman, and Frank (2022), which links language input to processing and learning outcomes. We will consider the strengths and weaknesses of this model for describing vocabulary learning as well as the learning of some morphology and syntax. Our hands-on approach will involve learning to use CHILDES and childes-db for studying language input, Wordbank for studying language outcomes, and Peekbank for studying processing.

Prerequisite: Some knowledge of R sufficient to manipulate datasets from these resources.

Additional useful tools: familiarity with GitHub for version control and the tidyverse for data manipulation and visualization.

Learning Goals

  • Discuss the "Standard Model" framework for early word learning, focusing on input, processing, and uptake constructs,
  • Compare different instruments and approaches for measuring child language,
  • Learn a reproducible workflow for exploring language acquisition data in R, and
  • Explore data from Wordbank, CHILDES, and Peekbank as a source of insights into language learning.

Software

Before we start, please ensure you have installed a recent version of:

If these are not working on your computer, you won't be able to do any of the in-class assignments, which will make up the bulk of the course.
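One way to check your setup before class is sketched below. It is a minimal sketch, not an official course script: the helper name `missing_packages` is hypothetical, and `wordbankr` is an assumption (this README mentions Wordbank but does not name an R package for it).

```r
# Packages named in this README (tidyverse, childesr, peekbankr), plus
# wordbankr, which is an ASSUMPTION: the README mentions Wordbank but
# does not name the R package used to access it.
course_packages <- c("tidyverse", "childesr", "peekbankr", "wordbankr")

# Hypothetical helper: return the subset of `pkgs` that cannot be loaded.
# requireNamespace() returns TRUE/FALSE without attaching the package.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(course_packages)
if (length(to_install) == 0) {
  message("All course packages are installed.")
} else {
  message("Missing packages: ", paste(to_install, collapse = ", "))
}
```

The check only reports what is missing; installation itself is left to the instructions given for each day.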

Course Schedule

Day 1: Foundations and workflow

Readings:

Agenda:

  • Introduce the data-driven perspective on early language learning
  • Discuss the three instruments/data sources used in the course: the MacArthur-Bates CDI, CHILDES, and the Looking While Listening paradigm
  • Practice the toolset (GitHub, RMarkdown, and the tidyverse) that we will use for the remainder of the course

Day 2: Characterizing vocabulary growth with Wordbank

Readings:

Agenda:

  • Take an in-depth look at Wordbank and the use of CDI data
  • Explore how to create reproducible pipelines with Wordbank data
  • Reproduce analyses on grammar/lexicon correspondences from Bates & Goodman (1997)

Day 3: Accessing language input with CHILDES and childes-db

We'll be using CHILDES and accessing it via childes-db. You can install childesr, the R package for accessing childes-db, from CRAN via install.packages("childesr").

Readings:

Agenda:

  • Learn about CHILDES and the CHAT format
  • Discuss issues of frequency and frequency estimation from corpus data
  • Reproduce analyses of the development of disjunction from Jasbi, Jaggi, Clark, & Frank (2022).
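The frequency-estimation issue can be illustrated with a toy example. The sketch below computes the relative frequency of one word within each age bin using base R only. The data frame is made up for illustration, and its column names (`target_child_age`, `gloss`) are assumptions about what a token table might look like, not a guaranteed match to any particular query result.

```r
# Toy token table standing in for corpus output. The rows are invented and
# the column names (target_child_age in months, gloss) are ASSUMPTIONS.
tokens <- data.frame(
  target_child_age = c(24, 24, 24, 24, 36, 36, 36, 36, 36),
  gloss = c("or", "and", "the", "dog", "or", "or", "and", "cat", "the"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rate of `word` per `per` tokens within each age bin.
# Normalizing by the bin's token count matters because bins differ in size.
frequency_by_age <- function(tokens, word, per = 1000) {
  ages <- sort(unique(tokens$target_child_age))
  rows <- lapply(ages, function(age) {
    in_bin <- tokens$target_child_age == age
    n_total <- sum(in_bin)
    n_word <- sum(in_bin & tolower(tokens$gloss) == tolower(word))
    data.frame(age = age, n_word = n_word, n_total = n_total,
               rate = per * n_word / n_total)
  })
  do.call(rbind, rows)
}

or_freq <- frequency_by_age(tokens, "or")
print(or_freq)
```

With so few tokens per bin the rates are very noisy, which is the estimation problem in miniature: raw counts are not comparable across bins of different sizes, and normalized rates are unreliable when bins are small.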

Day 4: Exploring online processing using Peekbank

We'll be working with data from Peekbank and using the peekbankr package, which can be installed via remotes::install_github("langcog/peekbankr").

Readings:

Agenda:

  • Introduce the looking-while-listening paradigm
  • Discuss the role of online language processing in language learning
  • Reproduce and extend results from Swingley & Aslin (2002).

We will devote ten minutes at the end of class to talking about the group projects on Friday. By the end of the day, please form a group and send me an email with the names of the people in your group and a paragraph about what you hope to do; I'll try to send you comments.

Day 5: Group projects

On the final day of the course, we will primarily be doing group projects. The goal of a group project is to work together to develop some of the ideas we have discussed.

Groups will be 2-3 people (with more, it becomes impossible to code together while all looking at the same screen).

You are encouraged to come up with your own project idea, and I am happy to talk with students about how to use these resources to explore your own interests. Here are a few "starter ideas".

Easier:

  • Estimate how the frequencies of color terms (or some other interesting set of words) in speech to children change with age (CHILDES)
  • Explore cohort effects on vocabulary size using the date_of_test field (Wordbank)
  • Look at grammar/lexicon relationships within specific lexical subcategories, perhaps for languages beyond English (Wordbank) - this was the challenge problem for Day 2

Harder:

  • Explore effects of maternal education on the growth of vocabulary in different categories (Wordbank)
  • Characterize the developmental trajectory of children's lexical diversity (e.g., MTLD) and how it differs by gender (CHILDES)
  • Measure whether there are sex differences in vocabulary variability (MADM)
  • Check on the presence of a noun bias in the new ASL CDI
