---
title: ""
output: html_document
---
![](Text_As_Data_Cover.png)
<br>
<br>
## Welcome to Text as Data
Welcome to my course, "Text as Data." On this page, you will find an overview of the course, links to a tutorial on each topic it covers, and instructions for accessing the software and materials you will need.
## What is Text as Data?
The past decade has witnessed an explosion of data produced not only by websites such as Twitter, Facebook, Google, and Wikipedia, but also by the mass digitization of historical archives and administrative records. Though these new data sources hold enormous potential to address a range of pressing problems within industry and academia, collecting and analyzing text-based data presents unique challenges. Fortunately, the widespread availability of text-based data coincides with major advances in the fields of computer science and natural language processing. This course will provide students with an overview of popular techniques for collecting, processing, and analyzing text-based data, including screen-scraping, mining data from application programming interfaces (APIs), topic modeling, text networks, and advanced text classifiers.
## What Subjects are Covered in this Class?
This class covers a range of topics that build on one another. For example, in one of the early tutorials you will learn how to collect data from Twitter, and in subsequent tutorials you will learn how to analyze those data using automated text analysis techniques. For this reason, you may find it difficult to jump ahead to one of the more advanced topics before covering the basics.
<br>
**[Introduction: Strengths and Weaknesses of Text as Data](https://cbail.github.io/textasdata/strengths-weaknesses/rmarkdown/Strengths_and_Weaknesses.html)**
**[Application Programming Interfaces](https://cbail.github.io/textasdata/apis/rmarkdown/Application_Programming_interfaces.html)**
**[Screen-Scraping](https://cbail.github.io/textasdata/screenscraping/rmarkdown/Screenscraping_in_R.html)**
**[Basic Text Analysis](https://cbail.github.io/textasdata/basic-text-analysis/rmarkdown/Basic_Text_Analysis_in_R.html)**
**[Dictionary-Based Text Analysis](https://cbail.github.io/textasdata/dictionary-methods/rmarkdown/Dictionary-Based_Text_Analysis.html)**
**[Topic Modeling](https://cbail.github.io/textasdata/topic-modeling/rmarkdown/Topic_Modeling.html)**
**[Text Networks](https://cbail.github.io/textasdata/text-networks/rmarkdown/Text_Networks.html)**
**[Word Embeddings](https://cbail.github.io/textasdata/word2vec/rmarkdown/word2vec.html)**
<br>
## Who am I?
I am a Professor of Sociology, Public Policy, and Data Science at Duke University who studies political polarization on social media. You can learn more about my research [here](http://www.chrisbail.net) or follow me on Twitter [here](http://www.twitter.com/chris_bail). Much of the material in the tutorials above draws upon my own research and text analysis techniques I've developed. Yet I also draw heavily on a number of excellent tutorials by other people, whom I have tried to thank in each tutorial above. If I forgot to recognize your work, please email me!
## How can I Access the Course Materials?
All of the materials for this course are available on my [GitHub page](http://www.github.com/cbail). There you will find the datasets used in the tutorials above as well as all of the source files necessary to produce them. If you notice problems or other limitations in the tutorials, please submit a pull request (if you know how to use GitHub).
## How can I get started?
This course assumes basic familiarity with R. If you are new to R, I recommend the sequence of online courses described on [this website](https://compsocialscience.github.io/summer-institute/2018/#pre-arrival) to get you started.
## What if the Code in the Tutorials does not work?
I will do my best to update the tutorials above as often as possible, but in the world of open-source software it is inevitable that problems will arise that I may not be able to address quickly.