Commit
Merge pull request #103 from Crinibus/dev
Rename scraping.py to scraper.py
Crinibus authored Oct 10, 2020

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
2 parents 923ddf4 + 269fd99 commit db1177b
Showing 4 changed files with 20 additions and 20 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -23,14 +23,14 @@ The tech scraper can scrape prices on products from Komplett.dk, Proshop.dk, Com
 ## Scrape products <a name="scrape-products"></a>
 To scrape prices of products run this in the terminal:
 
-    python3 scraping.py
+    python3 scraper.py
 
 ## Start from scratch <a name="start-scratch"></a>
 If you want to start from scratch with no data in the records.json file, then just delete all the content in records.json apart from two curly brackets:
 
     {}
 
-Then delete the lines under the last if-statement in scraping.py.
+Then delete the lines under the last if-statement in scraper.py.
 
 Then just add products like described [here](#add-products).

@@ -42,7 +42,7 @@ e.g.
 
     python3 add_product.py gpu https://www.komplett.dk/product/1135037/hardware/pc-komponenter/grafikkort/msi-geforce-rtx-2080-super-gaming-x-trio
 
-This adds the category (if new) and the product to the records.json file, and adds a line at the end of the scraping.py file so the script can scrape price of the new product.
+This adds the category (if new) and the product to the records.json file, and adds a line at the end of the scraper.py file so the script can scrape price of the new product.
 
 **OBS**: The category can only be one word, so add a underscore instead of a space if needed.<br/>
 **OBS**: The url must have the "https://www." part.<br/>
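To make the README step above concrete, here is a hedged sketch (not taken from the repository) of the nested layout records.json ends up with after `add_product.py` runs: top-level keys are categories, second-level keys are product names. The empty inner dict is a stand-in; the real per-product payload is built by the script's `check_arguments()`.

```python
import json

# Hypothetical sketch of how records.json is organised after adding a product:
# one object per category, one entry per product name.
records = {}
category = 'gpu'  # must be one word (use underscores instead of spaces)
product_name = 'msi-geforce-rtx-2080-super-gaming-x-trio'

# setdefault creates the category on first use, mirroring the script's behaviour
records.setdefault(category, {})[product_name] = {}  # stand-in for real product data

print(json.dumps(records, indent=2))
```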
6 changes: 3 additions & 3 deletions tech_scraping/README.md
@@ -19,14 +19,14 @@ If you want to start from scratch with no data in the records.json file, then ju
 
     {}
 
-Then delete the lines under the last if-statement in scraping.py.
+Then delete the lines under the last if-statement in scraper.py.
 
 Then just add products like described [here](#add-products).
 
 ## Scrape products <a name="scrape-products"></a>
 To scrape prices of products run this in the terminal:
 
-    python3 scraping.py
+    python3 scraper.py
 
 ## Add products <a name="add-products"></a>
 Before scraping a new product, run a similar line to this:
@@ -36,7 +36,7 @@ e.g.
 
     python3 add_product.py gpu https://www.komplett.dk/product/1135037/hardware/pc-komponenter/grafikkort/msi-geforce-rtx-2080-super-gaming-x-trio
 
-This adds the category (if new) and the product to the records.json file, and adds a line at the end of the scraping.py file so the script can scrape price of the new product.
+This adds the category (if new) and the product to the records.json file, and adds a line at the end of the scraper.py file so the script can scrape price of the new product.
 
 **OBS**: The category can only be one word, so add a underscore instead of a space if needed.<br/>
 **OBS**: The url must have the "https://www." part.<br/>
28 changes: 14 additions & 14 deletions tech_scraping/add_product.py
@@ -283,16 +283,16 @@ def check_arguments():
     return json_object
 
 
-def save_json(kategori, produkt_navn):
+def save_json(category, product_name):
     """Save (category and) product-name in JSON-file."""
     with open('records.json', 'r') as json_file:
         data = json.load(json_file)
 
     with open('records.json', 'w') as json_file:
-        if kategori not in data.keys():
-            data[kategori] = {}
+        if category not in data.keys():
+            data[category] = {}
 
-        data[kategori][produkt_navn] = check_arguments()
+        data[category][product_name] = check_arguments()
 
         json.dump(data, json_file, indent=2)
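The hunk above only renames variables, but the read-merge-write pattern in `save_json` is worth seeing run end to end. This is a minimal, self-contained sketch of that pattern: `check_arguments()` is stubbed out with an empty dict (its real return value is built from the command-line arguments), and a `path` parameter is added here so the demo can use a temporary file instead of the hard-coded `records.json`.

```python
import json
import os
import tempfile

def save_json(category, product_name, path):
    # Read the whole file first, so reopening in 'w' mode (which truncates)
    # does not lose the existing records.
    with open(path, 'r') as json_file:
        data = json.load(json_file)

    with open(path, 'w') as json_file:
        if category not in data:
            data[category] = {}

        data[category][product_name] = {}  # stand-in for check_arguments()

        json.dump(data, json_file, indent=2)

path = os.path.join(tempfile.mkdtemp(), 'records.json')
with open(path, 'w') as f:
    f.write('{}')  # the "start from scratch" state described in the README

save_json('gpu', 'rtx-2080-super', path)

with open(path) as f:
    print(json.load(f))  # {'gpu': {'rtx-2080-super': {}}}
```

Note the order matters: the file is fully read before being reopened for writing, because opening in `'w'` mode truncates it immediately.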

@@ -327,30 +327,30 @@ def find_domain(domain):
     return 'Sharkgaming'
 
 
-def add_to_scraper(kategori, link, url_domain):
+def add_to_scraper(category, link, url_domain):
     """Add line to scraping.py, so scraping.py can scrape the new product."""
     domain = find_domain(url_domain)
 
     with open('scraping.py', 'a+') as python_file:
-        python_file.write(f'    {domain}(\'{kategori}\', \'{link}\')\n')
-        print(f'{kategori}\n{link}')
+        python_file.write(f'    {domain}(\'{category}\', \'{link}\')\n')
+        print(f'{category}\n{link}')
 
 
-def main(kategori, link):
+def main(category, link):
     URL_domain = link.split('/')[2]
 
-    produkt_navn = get_product_name(link)
+    product_name = get_product_name(link)
 
-    if not produkt_navn:
+    if not product_name:
         print(f'Sorry, but I can\'t scrape from this domain: {URL_domain}')
         return
 
     # Change æ, ø and/or å
-    kategori = change_æøå(kategori)
-    produkt_navn = change_æøå(produkt_navn)
+    category = change_æøå(category)
+    product_name = change_æøå(product_name)
 
-    save_json(kategori, produkt_navn)
-    add_to_scraper(kategori, link, URL_domain)
+    save_json(category, product_name)
+    add_to_scraper(category, link, URL_domain)
 
 
 if __name__ == '__main__':
scraping.py → scraper.py: file renamed without changes.
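To make the `add_to_scraper` hunk concrete, this sketch shows the exact line the function appends to the scraper script. The domain, category, and URL values are illustrative (the real ones come from `find_domain()` and the command-line arguments); the four leading spaces match the indentation of the scraper's final if-block, which is what executes these calls on the next run.

```python
# Illustrative values; the real ones come from find_domain() and the CLI link.
domain = 'Komplett'
category = 'gpu'
link = 'https://www.komplett.dk/product/1135037'

# The line add_to_scraper() appends: an indented function call that the
# scraper's last if-statement picks up on the next run.
line = f'    {domain}(\'{category}\', \'{link}\')\n'
print(repr(line))
```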
