<!doctype html>
<html>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="chrome=1">
<h2>➡️ Data collection</h2>
<h4>Collecting Links</h4>
<ul>
<li>
<a href="https://www.are.na/">Are.na</a> (✨📍): "Mood board"-like. Has mobile app and browser extension. No desktop app.
</li>
<li>
<a href="https://getpocket.com/">Pocket</a> (💵): Desktop app and mobile app.
</li>
<li>
<a href="https://raindrop.io/">Raindrop.io</a> (💵📍✨): Desktop and mobile app.
</li>
<li>
<a href="https://tab-session-manager.sienori.com/">Tab Session Manager</a> (✨📍): Browser extension. Available for Chrome.
</li>
<li>
<a href="https://wallabag.org/en">Wallabag</a> (✨): Can highlight directly on saved pages. No app.
</li>
<li>
<a href="https://archivebox.io/">ArchiveBox.io</a> (🥨✨): Stores all data locally. Can read lists from other archiving apps.
</li>
</ul>
<h4>Web-scraping</h4>
<ul>
<li>
<a href="https://webscraper.io/">Webscraper.io</a> (💵📍): Chrome extension. No coding.
</li>
<li>
<a href="https://www.parsehub.com/">Parsehub</a>: Chrome extension. No coding.
</li>
<li>
<a href="https://www.parsehub.com/quickstart">Parsehub</a> (💵): Desktop app. No coding.
</li>
<li>
<a href="https://scrapy.org/">Scrapy</a> (✨): Some coding required.
</li>
<li>
<a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/">BeautifulSoup</a> (🥨📍✨): Python library. Good documentation.
</li>
<li>
<a href="https://www.octoparse.com/">Octoparse</a> (💰): Free demo.
</li>
<li>
<a href="https://tools.digitalmethods.net/beta/googleNews/">Google News Scraper</a> (🧭): Requires Firefox.
</li>
</ul>
<h4>Word Counting</h4>
<ul>
<li>
<a href="https://www.databasic.io/en/wordcounter/">Word Counter</a> (✨): No coding needed. Generates a list of frequently used words.
</li>
<li>
<a href="https://tools.digitalmethods.net/beta/tagcloud/">Tag Cloud Generator</a>: Similar function to Word Counter.
</li>
</ul>
<h4>Miscellaneous</h4>
<ul>
<li>
<a href="https://screenotate.com/">Screenotate</a> (📍): Automatically grabs text from screenshots, along with the URL and the title of the place where you took the screenshot (where possible).
</li>
<li>
<a href="https://grain.co/">Grain</a> (💵): Allows users to record, transcribe, and share highlights of Zoom calls.
</li>
<li>
<a href="https://breadboard.yale.edu/">Breadboard</a> (🥨): For online network experiments.
</li>
<li>
<a href="https://github.com/NCSU-Libraries/lentil">Lentil</a> (✨): Harvested photos from Instagram based on hashtags. No longer supported.
</li>
<li>
<a href="https://voyant-tools.org/">Voyant Tools</a> (✨): Environment for analysing digital texts.
</li>
<li>
<a href="https://www.juxtasoftware.org/about/">Juxta</a> (✨): Tool for comparing and collating multiple witnesses to a single textual work. Desktop app.
</li>
<li>
<a href="https://tools.digitalmethods.net/beta/internetArchiveWaybackMachineLinkRipper/">Internet Archive Wayback Machine Link Ripper</a> (✨): Grabs links from archive.
</li>
<li>
<a href="https://4cat.oilab.nl/login/?next=%2F">4CAT - Capture and Analysis Toolkit</a> (✨): Create datasets from web forums.
</li>
<li>
<a href="https://medialab.sciencespo.fr/en/tools/minet/">Minet</a> (✨): By Sciences Po MédiaLab. Mass web-mining software. Requires command line.
</li>
<li>
<a href="https://convertio.co/">Convertio</a> (💵): Converts HTML files to DOC files. Chrome extension and web app.
</li>
<li>
<a href="https://www.connectedpapers.com/">Connected Papers</a>: Visual tool for exploring connected papers and scholarship.
</li>
<li>
<a href="https://dribdat.cc/">Dribdat</a> (✨): Organize hackathons to crowdsource ideas, patches &amp; prototypes.
</li>
</ul>
<h4>Resources</h4>
<ul>
<li>
<a href="https://docs.google.com/document/d/1clGjGABB2h2qbduTgfqribHmog9B6P0NvMgVuiHZCl8/edit#heading=h.sgkfqaqunzq0">Doing Fieldwork in a Pandemic</a>: Collective Google Doc on methods, approaches, and ethical concerns during a pandemic.
</li>
<li>
<a href="https://buildinglltdm.org/">Building LLTDM</a>: Legal Literacies for Text Data Mining.
</li>
<li>
<a href="https://gwu-libraries.github.io/sfm-ui/">Social Feed Manager</a>: Tools to help researchers and archivists build social media collections.
</li>
<li>
<a href="https://wiki.digitalmethods.net/Dmi/DmiAbout">Digital Methods Initiative</a>: Has many tools for researchers. Runs a winter school and a summer school.
</li>
<li>
<a href="https://medialab.sciencespo.fr/">MédiaLab</a>: Research lab at Sciences Po in Paris, France. Has a list of tools.
</li>
<li>
<a href="https://www.wordsinspace.net/designingmethods/spring2018/2018/01/07/toolkit-digital-ethics">Toolkit: Digital Ethics</a>: Collected by Shannon Mattern.
</li>
<li>
<a href="https://aoir.org/ethics">AoIR Ethics</a>: Association of Internet Researchers’ ethics resources.
</li>
<li>
<a href="https://www.wordsinspace.net/designingmethods/spring2018/2018/01/07/toolkit-digital-ethnography/">Toolkit: Digital Ethnography</a>: Collected by Shannon Mattern.
</li>
<li>
<a href="https://www.angelechristin.com/?page_id=611">Critical Data Studies Resources</a>: Reading lists and organizations. By Angèle Christin.
</li>
<li>
<a href="https://tinyurl.com/fpy2c8hz">Digital Ethnography Collective Reading list</a> (✨): LSE's Digital Ethnography Collective Reading List, hosted by Zoë Glatt.
</li>
</ul>
</html>