# Week 3: Dictionary-based techniques

So-called "dictionary-based" techniques extend the word frequency analyses we covered last week. In their most basic form, these analyses use an index of target terms and classify the documents in the corpus of interest based on the presence or absence of those terms. The technical dimensions of this type of analysis are covered in the chapter section by @krippendorff_content_2004, and some of the issues attending them are discussed in the article by @loughran_when_2011. The article by @brooke2021trouble provides an outstanding illustration of the use of text analysis techniques to make inferences about larger questions of bias.
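To make the basic logic concrete, here is a minimal sketch in base R of a presence/absence and term-share classification. The toy documents, the `prosocial_terms` list, and the helper names `tokenize()` and `dictionary_scores()` are all hypothetical and chosen for illustration only; they are not taken from any of the readings or from a published dictionary.

```r
# Hypothetical toy corpus and dictionary, for illustration only.
docs <- c(
  doc1 = "We thank our community for its generous support.",
  doc2 = "Profits fell and the outlook is bleak.",
  doc3 = "Volunteers help and support their neighbours."
)
prosocial_terms <- c("thank", "generous", "support", "help", "community")

# Lowercase the text, replace anything that is not a letter or space,
# and split on runs of spaces.
tokenize <- function(text) {
  cleaned <- gsub("[^a-z ]", " ", tolower(text))
  tokens <- strsplit(cleaned, " +")[[1]]
  tokens[tokens != ""]
}

# For each document: count exact matches against the dictionary,
# the share of tokens that match, and a presence/absence flag.
dictionary_scores <- function(docs, dictionary) {
  token_list <- lapply(docs, tokenize)
  hits <- vapply(token_list, function(tok) sum(tok %in% dictionary), integer(1))
  n_tokens <- vapply(token_list, length, integer(1))
  data.frame(
    doc = names(docs),
    hits = hits,
    n_tokens = n_tokens,
    share = hits / n_tokens,
    present = hits > 0,
    row.names = NULL
  )
}

scores <- dictionary_scores(docs, prosocial_terms)
print(scores)
```

Note that this sketch matches whole tokens exactly, so "volunteers" or "thanks" would not count unless they were listed. Choices of this kind (stemming, wildcards, which terms to include) are part of what determines whether a dictionary measures the phenomenon of interest.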

We will also be reading two examples of the application of these techniques by @martins_rise_2020 and @young_affective_2012. Here, we will discuss how successfully the authors measure the phenomenon of interest ("prosociality" and "tone" respectively). Questions about sampling and representativeness will again be relevant, and will naturally inform our assessments of this work.

Questions:

1.  Are *general* dictionaries possible, or do they have to be domain-specific?
2.  How do we know if our dictionary is accurate?
3.  How could we enhance/supplement dictionary-based techniques?

**Required reading**:

-   @martins_rise_2020
-   @voigt_language_2017
-   @brooke2021trouble

**Further reading**:

-   @tausczik_psychological_2010
-   @krippendorff_content_2004 (pp. 283-289)
-   @brier_computer_2011
-   @bonikowski2015
-   @barbera_automated_2021
-   @young_affective_2012

**Slides**:

-   Week 3 [Slides](https://docs.google.com/presentation/d/1rgYCYGtZ7resCd7oVsGnaCpwWKduunl8s1famjtGtBY/edit?usp=sharing)