This repository contains two Python scripts that perform web scraping on HTML files and convert the extracted data to JSON and CSV formats.
- csv_output.csv: This file is the output of the app.py script. It contains the scraped data from a single HTML file in CSV format.
- json_output.json: This file is the output of the app(2).py script. It contains the scraped data from multiple HTML files in JSON format.
- app.py: This Python script performs web scraping on a single HTML file (index.html) and extracts specific information such as the image link, title, rating, and price of a product. The scraped data is then saved to a JSON file (data.json).
- app(2).py: This Python script performs web scraping on multiple HTML files stored in the Pages Snapshots directory. It extracts the same kind of information as app.py from each file and saves the scraped data to a JSON file (all_data.json).
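The extraction step described for app.py can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the sample markup, the CSS class names (product-image, product-title, product-rating, product-price), and the helper names are all assumptions, since the real structure of index.html is not shown here.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical markup; the real index.html may use different tags and classes.
SAMPLE_HTML = """
<div class="product">
  <img class="product-image" src="https://example.com/widget.png">
  <h2 class="product-title">Widget</h2>
  <span class="product-rating">4.5</span>
  <span class="product-price">$19.99</span>
</div>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text(strip=True) if tag else None


def extract_product(html):
    """Pull the image link, title, rating, and price out of one HTML page."""
    soup = BeautifulSoup(html, "html.parser")
    image = soup.select_one("img.product-image")  # assumed selector
    return {
        "image": image.get("src") if image else None,
        "title": text_or_none(soup.select_one(".product-title")),
        "rating": text_or_none(soup.select_one(".product-rating")),
        "price": text_or_none(soup.select_one(".product-price")),
    }


if __name__ == "__main__":
    # Print the record as JSON instead of writing data.json, to keep the sketch side-effect free.
    print(json.dumps(extract_product(SAMPLE_HTML), indent=2))
```

Returning None for missing elements keeps the function from raising on pages that lack one of the fields, which is likely to matter when the same logic is applied to many snapshots.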
- Python 3.x
- json library
- os library
- BeautifulSoup library
- Ensure that you have Python 3.x installed on your system.
- Install the required libraries by running the following command:
pip install beautifulsoup4
- Run the app.py script to perform web scraping on the index.html file and save the data to a JSON file (data.json):
python app.py
- Place the HTML files you want to scrape in the Pages Snapshots directory for the app(2).py script.
- Run the app(2).py script to perform web scraping on the HTML files and save the data to a JSON file (all_data.json). Quote the file name, because most shells treat unquoted parentheses as special characters:
python "app(2).py"
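The multi-file step could look something like the sketch below. It is only an assumption about how app(2).py might be organized: the function names scrape_directory and save_json, the parse_html callback, and the added source_file key are hypothetical and are not taken from the repository.

```python
import json
import os


def scrape_directory(directory, parse_html):
    """Apply parse_html to every .html file in directory and collect the results.

    parse_html is a hypothetical callback that takes an HTML string and
    returns a dict for that page.
    """
    records = []
    for name in sorted(os.listdir(directory)):
        if not name.lower().endswith(".html"):
            continue  # skip anything that is not an HTML snapshot
        path = os.path.join(directory, name)
        with open(path, encoding="utf-8") as handle:
            record = parse_html(handle.read())
        record["source_file"] = name  # remember which snapshot the record came from
        records.append(record)
    return records


def save_json(records, output_path):
    """Write the collected records to a JSON file."""
    with open(output_path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2, ensure_ascii=False)
```

A call such as `save_json(scrape_directory("Pages Snapshots", my_parser), "all_data.json")`, with `my_parser` standing in for whatever extraction function is used, would then produce one record per snapshot.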
- After running app.py, the scraped data from the index.html file will be saved as data.json.
- After running app(2).py, the scraped data from the HTML files in the Pages Snapshots directory will be saved as all_data.json.
- The data.json and all_data.json files can be further processed or used as desired.
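As one example of further processing, the JSON output could be converted to CSV with the standard library. This is a hedged sketch: the function name json_to_csv is hypothetical, and the field names in FIELDS are assumed from the description above (image link, title, rating, price) and may not match the actual keys in data.json or all_data.json.

```python
import csv
import json

# Assumed keys; adjust to match the actual JSON output.
FIELDS = ["image", "title", "rating", "price"]


def json_to_csv(json_path, csv_path, fields=FIELDS):
    """Convert a JSON file holding one record or a list of records to CSV.

    Returns the number of rows written.
    """
    with open(json_path, encoding="utf-8") as handle:
        records = json.load(handle)
    if isinstance(records, dict):
        records = [records]  # a single record becomes a one-row table
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        # Missing keys are written as empty cells; unknown keys are ignored.
        writer = csv.DictWriter(handle, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(records)
    return len(records)
```

For instance, `json_to_csv("all_data.json", "all_data.csv")` would write a header row followed by one row per scraped page.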
Yes, you can!
You'll find me:
- Posting memes or talking about data on Twitter
- Writing articles about complex data concepts and making them digestible on Medium
- Posting data visualization inspiration and data infographics on Instagram
Distributed under no license. See LICENSE.txt for more information.
Please ⭐️ this repository if this project helped you, or buy me a coffee!