Commit

Moved result files to a directory
jvilaplana committed Apr 27, 2017
1 parent 3d79f17 commit 4683b83
Showing 29 changed files with 25 additions and 4 deletions.
14 changes: 14 additions & 0 deletions README.md
@@ -30,14 +30,28 @@ To execute the program run the following commands in your terminal:
source venv/bin/activate
python vademecum_export.py
```
If the execution is successful, you should see something like:
```
$ python vademecum_export.py
Going for letter a
Getting drug 614537 (a-1)
Getting drug 686580 (a-1)
Getting drug 614560 (a-2)
Getting drug 672905 (a-2)
Getting drug 712249 (a-3)
...
```

## Analyze results
To check how many drugs were successfully retrieved, run:
```
cd results/
find . -name 'vademecum-*' | xargs wc -l
```
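The `wc -l` command counts lines, so its totals match the number of drugs only if each CSV file holds exactly one line per drug and no header row. Whether the export script writes a header is not visible in this truncated diff. A minimal Python 3 sketch that counts CSV records instead, assuming a header row by default (`count_rows` and its `has_header` flag are hypothetical names, and UTF-8 encoding is an assumption):

```python
import csv
import glob
import os


def count_rows(results_dir="results", has_header=True):
    """Count drug records in each results/vademecum-*.csv file.

    has_header is an assumption about the export format: set it to False
    if the files contain data rows only.
    """
    counts = {}
    for path in sorted(glob.glob(os.path.join(results_dir, "vademecum-*.csv"))):
        # Assumption: the files are UTF-8 encoded.
        with open(path, newline="", encoding="utf-8") as f:
            # csv.reader yields one list per CSV record, not per physical line.
            n = sum(1 for _ in csv.reader(f))
        if has_header and n > 0:
            n -= 1  # do not count the header row
        counts[os.path.basename(path)] = n
    return counts
```

Run from the repository root, `count_rows()` returns a dict mapping each file name to its record count. Because `csv.reader` parses quoted fields, a drug name containing a line break counts once here but twice under `wc -l`.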

To combine all the CSV files into a single one, run:
```
cd results/
cat vademecum-* > vademecum.csv
```
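If every per-letter file starts with a header row (an assumption; the diff is cut off before any `writeheader()` call would appear), `cat` repeats that header once per file inside `vademecum.csv`. A sketch of a Python 3 alternative that keeps only the first header; `combine_csvs` is a hypothetical helper, not part of the repository:

```python
import csv
import glob
import os


def combine_csvs(results_dir="results", output_name="vademecum.csv",
                 has_header=True):
    """Concatenate results/vademecum-*.csv into a single CSV file.

    Unlike `cat`, the header row (assumed present when has_header is True)
    is written only once. Returns the number of data rows written.
    """
    paths = sorted(glob.glob(os.path.join(results_dir, "vademecum-*.csv")))
    out_path = os.path.join(results_dir, output_name)
    written = 0
    header_written = False
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                if has_header:
                    header = next(reader, None)  # None for an empty file
                    if header is not None and not header_written:
                        writer.writerow(header)
                        header_written = True
                for row in reader:
                    writer.writerow(row)
                    written += 1
    return written
```

Like the `cat` command, the `vademecum-*.csv` pattern does not match the output file `vademecum.csv`, so running the combiner twice does not feed its own output back in.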
27 files renamed without changes.
15 changes: 11 additions & 4 deletions vademecum_export.py
@@ -1,19 +1,26 @@
# -*- coding: utf-8 -*-

import urllib2
from bs4 import BeautifulSoup
import re
import string
import os
import csv
import time
import string
import urllib2
from bs4 import BeautifulSoup


# We are going to iterate through all letters (a - z).
letter_list = string.lowercase[:26]

# We check if the results directory exists
if not os.path.exists('results'):
    # If it's not there, we create it
    os.makedirs('results')

# Each letter has its own page with its drug list.
# We will be saving a CSV file for each starting letter.
for letter in letter_list:
    with open('vademecum-' + str(letter) + '.csv', 'wb') as csvfile:
    with open('results/vademecum-' + str(letter) + '.csv', 'wb') as csvfile:
        # We will be saving the drug code, name and URL.
        fieldnames = ['cod_nacion', 'nombre', 'url']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
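The change above creates `results/` before the loop and writes each letter's file inside it. A Python 3 sketch of the same pattern, assuming the rows have already been scraped (`write_letter_csv`, `LETTERS` and the `rows` argument are hypothetical; the scraping code is not part of this diff, and the header row is an assumption). It uses `os.makedirs(..., exist_ok=True)`, available from Python 3.2, which closes the gap between the `os.path.exists` check and the `makedirs` call, and `newline=""`, which the Python 3 `csv` documentation recommends in place of Python 2's `'wb'` mode:

```python
import csv
import os
import string

# Python 3 replacement for the script's string.lowercase[:26]:
# string.lowercase is Python 2 only and locale-dependent, while
# string.ascii_lowercase is always exactly the 26 letters a to z.
LETTERS = string.ascii_lowercase

FIELDNAMES = ["cod_nacion", "nombre", "url"]  # same columns as the script


def write_letter_csv(letter, rows, results_dir="results"):
    """Write one letter's drugs to <results_dir>/vademecum-<letter>.csv.

    `rows` is a hypothetical list of dicts keyed by FIELDNAMES; in the real
    script these values come from the scraped pages.
    """
    # Create the directory if needed; no error if it already exists.
    os.makedirs(results_dir, exist_ok=True)
    path = os.path.join(results_dir, "vademecum-" + letter + ".csv")
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=FIELDNAMES)
        writer.writeheader()  # assumption: not visible in the truncated diff
        writer.writerows(rows)
    return path
```

Calling `write_letter_csv(letter, rows)` for each letter in `LETTERS` reproduces the script's layout of one `results/vademecum-<letter>.csv` file per starting letter.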