Article Scraping and Processing System

Introduction

This tool scrapes web articles that match a user-supplied query and date range. It uses Python's Selenium library to interact with web pages and BeautifulSoup to parse the HTML content. The scraped articles are then processed and stored in both JSON and CSV formats for subsequent use.

Features

Unzip the driver files needed for web scraping.
Scrape articles that match the user's query within a specified date range.
Process and clean the scraped data.
Save the data in JSON and CSV formats for easy consumption.
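
The processing and saving steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the field names (title, url, date, text), the cleaning rules, and the function names clean_articles and save_articles are assumptions, and the sketch uses the standard-library json and csv modules instead of pandas to stay dependency-free.

```python
import csv
import json
from pathlib import Path

# Hypothetical article schema; the real scraper may use different fields.
FIELDS = ["title", "url", "date", "text"]


def clean_articles(articles):
    """Trim fields, drop entries missing a title or URL, and skip duplicate URLs."""
    seen_urls = set()
    cleaned = []
    for article in articles:
        title = (article.get("title") or "").strip()
        url = (article.get("url") or "").strip()
        if not title or not url or url in seen_urls:
            continue
        seen_urls.add(url)
        cleaned.append({
            "title": title,
            "url": url,
            "date": (article.get("date") or "").strip(),
            # Collapse runs of whitespace left over from HTML extraction.
            "text": " ".join((article.get("text") or "").split()),
        })
    return cleaned


def save_articles(articles, out_dir, stem="articles"):
    """Write the same records to <stem>.json and <stem>.csv inside out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    json_path = out_dir / f"{stem}.json"
    csv_path = out_dir / f"{stem}.csv"
    with json_path.open("w", encoding="utf-8") as fh:
        json.dump(articles, fh, ensure_ascii=False, indent=2)
    with csv_path.open("w", encoding="utf-8", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(articles)
    return json_path, csv_path
```

Writing both files from one cleaned list keeps the JSON and CSV outputs consistent with each other.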

Installation

Ensure you have the required Python environment and dependencies installed:

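pip install pandas requests beautifulsoup4 selenium

The logging and zipfile modules are part of the Python standard library and do not need to be installed separately.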

Usage

To use the script, run it from the command line and follow the interactive prompts:

python main_script.py

Replace main_script.py with the actual filename of the script.
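
The interactive prompts are not documented here, so the following is only a sketch of how a query and date range might be collected and validated. The YYYY-MM-DD format, the prompt wording, and the function names parse_date_range and prompt_for_search are assumptions, not the script's actual interface.

```python
from datetime import date, datetime

# Assumed input format; the real script may expect something different.
DATE_FORMAT = "%Y-%m-%d"


def parse_date_range(start_text, end_text):
    """Parse two date strings and check that the range is in order."""
    start = datetime.strptime(start_text.strip(), DATE_FORMAT).date()
    end = datetime.strptime(end_text.strip(), DATE_FORMAT).date()
    if start > end:
        raise ValueError(f"start date {start} is after end date {end}")
    return start, end


def prompt_for_search():
    """Ask for a query and a date range on the command line."""
    query = input("Search query: ").strip()
    if not query:
        raise ValueError("the search query must not be empty")
    start_text = input("Start date (YYYY-MM-DD): ")
    end_text = input("End date (YYYY-MM-DD): ")
    start, end = parse_date_range(start_text, end_text)
    return query, start, end
```

Validating the range before launching the browser avoids starting a scraping session that cannot return results.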

Configuration

Ensure you have the following setup before running the script:

A zip file named firefox.zip containing the Firefox driver for Selenium.
A drivers/ directory containing the executables for the required web drivers.
An optional cookies.pkl file, used to reuse an authenticated session and speed up the scraping process.
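
As a rough sketch of how these files might be prepared at startup, the snippet below extracts the archive and loads saved cookies when they exist. Extracting firefox.zip into drivers/ is an assumption about the layout, and the function names extract_driver_archive and load_cookies are hypothetical.

```python
import pickle
import zipfile
from pathlib import Path


def extract_driver_archive(archive_path="firefox.zip", target_dir="drivers"):
    """Extract the driver archive and return the names of the extracted members."""
    archive = Path(archive_path)
    if not archive.is_file():
        raise FileNotFoundError(f"driver archive not found: {archive}")
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
        return zf.namelist()


def load_cookies(cookie_path="cookies.pkl"):
    """Return the saved cookies, or an empty list when the file is absent."""
    path = Path(cookie_path)
    if not path.is_file():
        return []
    # Only unpickle files you created yourself: pickle can execute arbitrary code.
    with path.open("rb") as fh:
        return pickle.load(fh)
```

With Selenium, cookies loaded this way are typically restored by opening the target site first and then calling driver.add_cookie(cookie) for each entry.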

Contributing

We encourage contributions to this project. To contribute:

Fork the repository.
Create a new branch for your feature (git checkout -b feature/fooBar).
Commit your changes (git commit -am 'Add some fooBar').
Push to the branch (git push origin feature/fooBar).
Create a new Pull Request.

Acknowledgments

Heartfelt thanks to all contributors of the open-source packages used in this project.

OmarJabri7/news_scrapers
