1 file changed, +1 -18 lines changed
pombola/south_africa/data/members-interests
@@ -7,7 +7,7 @@ There are several files in this directory:
The scraper currently scrapes `.docx` files.
To prepare the file:

- 1. Split the `PDF` into separate files small enough to open in Google Docs. PDF Arranger works well https://github.com/pdfarranger/pdfarranger
+ 1. Split the `PDF` into separate files small enough to open in Google Docs. [PDF Arranger](https://github.com/pdfarranger/pdfarranger) works well
2. Open the files in Google Docs and download each in `.docx` format
3. Store these files in `./docx_files/`

@@ -20,23 +20,6 @@ Run the script `html_to_json.py` to scrape the HTML and compile into an easy to

The output should be `register.json`

- ## Raw data
-
- 2010.json
- 2011.json
- 2012.json
- 2013.json
- 2014.json
- 2015.json
- 2016.json
- 2017.json
- 2018.json
-
- These are the JSON files provided to us by Geoff. They are unchanged and are (I
- believe) generated by scraping code that he has from the PDFs mentioned in
- them. For me these PDF urls 404ed so I was not able to look at the original
- source material.
-

## Conversion script

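Step 3 of the preparation instructions leaves the exported files in `./docx_files/`. As a quick check before running the scraper, here is a minimal Python sketch. `list_docx_files` is a hypothetical helper, not something the repository provides; it only assumes the directory layout described in the README.

```python
from pathlib import Path


# Hypothetical helper (not part of the repository): the default directory
# comes from step 3 of the README's preparation instructions.
def list_docx_files(directory="./docx_files/"):
    """Return the sorted names of the .docx files found in `directory`."""
    path = Path(directory)
    if not path.is_dir():
        # The exports have not been stored yet (see step 3).
        raise FileNotFoundError(f"{directory} does not exist")
    # Only regular files ending in .docx count; other files are ignored.
    return sorted(p.name for p in path.glob("*.docx") if p.is_file())
```

Calling `list_docx_files()` with no argument inspects `./docx_files/` relative to the current working directory; an empty list means the `.docx` exports have not been downloaded into it yet.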