# Week 7: Unsupervised learning (word embedding)
This week we discuss a second form of "unsupervised" learning: word embeddings. Where previous weeks allowed us to characterize the complexity of text, or to cluster texts by potential topical focus, word embeddings permit a more expansive form of measurement. In essence, we produce a matrix representation of an entire corpus, in which each word is represented as a vector of real numbers capturing its patterns of co-occurrence.
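
To make the matrix idea concrete, below is a minimal sketch of how one might estimate embeddings in R, using quanteda to build a feature co-occurrence matrix and text2vec to fit a GloVe model to it. This is an illustration rather than the method of any of this week's readings: the corpus (quanteda's built-in `data_corpus_inaugural`), the query word, and all hyperparameters are arbitrary choices for demonstration.

```{r, eval=FALSE}
library(quanteda)  # tokenization and feature co-occurrence matrix
library(text2vec)  # GloVe implementation

# Illustrative corpus: US inaugural addresses, bundled with quanteda
toks <- tokens(data_corpus_inaugural, remove_punct = TRUE) |>
  tokens_tolower()

# Keep words occurring at least 5 times; padding preserves word positions
feats <- featnames(dfm_trim(dfm(toks), min_termfreq = 5))
toks <- tokens_select(toks, feats, padding = TRUE)

# Count co-occurrences within a symmetric 5-word window
co_occur <- fcm(toks, context = "window", window = 5, tri = TRUE)

# Fit GloVe; rank (embedding dimension) and n_iter are arbitrary here
glove <- GlobalVectors$new(rank = 50, x_max = 10)
wv_main <- glove$fit_transform(co_occur, n_iter = 10)

# text2vec estimates target and context vectors; summing both is common
word_vectors <- wv_main + t(glove$components)

# Ten nearest neighbours of "freedom" by cosine similarity
sims <- sim2(word_vectors, word_vectors["freedom", , drop = FALSE],
             method = "cosine")
head(sort(sims[, 1], decreasing = TRUE), 10)
```

Nearest-neighbour lists like this offer a quick, informal check that the learned space encodes semantic similarity, the same property that @garg_word_2018 and @kozlowski_geometry_2019 exploit to measure bias in language over time.
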
The reading by @rodriguez_word_2022 provides an effective overview of the technical dimensions of this technique. The articles by @garg_word_2018 and @kozlowski_geometry_2019 are two substantive applications, using word embeddings to trace how prejudice and bias manifest in language over time.

**Required reading**:

- @garg_word_2018
- @kozlowski_geometry_2019
- @waller2021

**Further reading**:

- @rodriguez_word_2021
- @rodriguez_word_2022
- @osnabrugge_playing_2021
- @rheault_word_2020
- @jurafsky_speech_2021 [ch. 6]: <https://web.stanford.edu/~jurafsky/slp3/>

**Slides**:

- Week 7 [Slides](https://docs.google.com/presentation/d/1sS3xk0NqpaGLuvrrHVgYpW-yATxUwPuEBt-Ke_sK-Eo/edit?usp=sharing)