News Article Scraper and Downloader

This Flask application scrapes news articles from the Hindustan Times website and provides a downloadable JSON file containing the article details, including title, summary, publish date, and image URL.

Features

Article Scraping: Scrapes articles from categories like Technology, Sports, Education, and Latest News.
Article Summarization: Uses the newspaper3k library to summarize articles.
JSON Export: Allows users to download the scraped articles as a JSON file.
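
The JSON export step can be pictured with a minimal sketch that uses only the standard library. The field names (`title`, `summary`, `publish_date`, `image_url`), the helper `export_articles`, and the sample record are assumptions chosen to match the details listed above; the keys and file name the application actually uses may differ.

```python
import json
from pathlib import Path
from typing import Dict, List


def export_articles(articles: List[Dict[str, str]], path: str) -> Path:
    """Write scraped article records to a UTF-8 JSON file and return its path."""
    target = Path(path)
    # ensure_ascii=False keeps non-ASCII headline text readable in the file.
    target.write_text(
        json.dumps(articles, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    return target


# Hypothetical record; real values would come from the scraper.
sample = [
    {
        "title": "Example headline",
        "summary": "Short summary of the article.",
        "publish_date": "2024-01-01T09:00:00",
        "image_url": "https://example.com/image.jpg",
    }
]
```

Calling `export_articles(sample, "articles.json")` would write the record list to `articles.json` in the current directory.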

Installation

Clone the repository:
```
git clone https://github.com/yourusername/news-article-scraper.git
cd news-article-scraper
```

Create and activate a virtual environment:
```
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
```

Install the required packages:
```
pip install requests flask newspaper3k beautifulsoup4 nltk
```

Download NLTK resources:
```
import nltk
nltk.download('punkt')
```
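
The `punkt` tokenizer matters because newspaper3k's `nlp()` step relies on it to split text into sentences when building a summary. The sketch below shows one way the article details could be collected with newspaper3k's `Article` class. The helper names `article_details` and `format_date` and the dictionary keys are hypothetical, and this is not necessarily how the application itself is written.

```python
from datetime import datetime
from typing import Optional


def format_date(value: Optional[datetime]) -> Optional[str]:
    """Return an ISO 8601 string, or None when no publish date was detected."""
    return value.isoformat() if value else None


def article_details(url: str) -> dict:
    """Download one article and return the fields this README lists."""
    # Imported here so the date helper above works without the dependency.
    from newspaper import Article

    article = Article(url)
    article.download()  # fetch the page HTML
    article.parse()     # extract title, publish date, and top image
    article.nlp()       # build the summary; needs the NLTK 'punkt' data
    return {
        "title": article.title,
        "summary": article.summary,
        "publish_date": format_date(article.publish_date),
        "image_url": article.top_image,
    }
```

If `nlp()` is called before the `punkt` data has been downloaded, NLTK raises a lookup error, which is why the download step comes first.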

Usage

Run the Flask application:
```
python app.py
```
Open your web browser and navigate to http://127.0.0.1:5000/ to view the scraped articles.
Download the JSON file containing article details by navigating to http://127.0.0.1:5000/download.
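
The download endpoint can also be read from a script instead of a browser. The following is a minimal sketch under the assumption that `/download` returns either a JSON array of article objects or a single object; the helper names `parse_articles` and `fetch_articles` are illustrative and not part of the project.

```python
import json
from urllib.request import urlopen

# Address taken from the usage steps above.
DOWNLOAD_URL = "http://127.0.0.1:5000/download"


def parse_articles(payload: bytes) -> list:
    """Decode a JSON payload and always return a list of article records."""
    data = json.loads(payload.decode("utf-8"))
    # Assumption: the file holds a list of objects; wrap a lone object.
    return data if isinstance(data, list) else [data]


def fetch_articles(url: str = DOWNLOAD_URL, timeout: float = 10.0) -> list:
    """Request the JSON file from the running Flask app and parse it."""
    with urlopen(url, timeout=timeout) as response:
        return parse_articles(response.read())
```

With the Flask application running, `fetch_articles()` would return the article records as Python dictionaries.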

Project Structure

```
news-article-scraper/
├── app.py            # Main Flask application script
├── requirements.txt  # List of dependencies
├── news.html         # HTML template for displaying articles
└── README.md         # This file
```

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

If you have suggestions or improvements, please fork the repository and create a pull request. For major changes, please open an issue first to discuss what you would like to change.

Contact

For any questions or feedback, please reach out to vishnuvardhanv046@gmail.com.

Dependencies

Flask
Requests
Newspaper3k
BeautifulSoup4
NLTK

To install these dependencies at the pinned versions, run `pip install -r requirements.txt`. The requirements.txt file contains:

Flask==2.2.3
requests==2.28.2
newspaper3k==0.2.8
beautifulsoup4==4.12.2
nltk==3.8.1
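
To confirm that an environment matches these pins, a small standard-library check can help. This is a sketch only: `parse_pins` and `find_mismatches` are hypothetical helpers, and the check assumes every line uses the exact `name==version` form shown above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple


def parse_pins(text: str) -> Dict[str, str]:
    """Parse 'name==version' lines, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, pinned = line.split("==", 1)
        pins[name.strip()] = pinned.strip()
    return pins


def find_mismatches(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs to (pinned, installed)."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

Passing the contents of requirements.txt through `parse_pins` and then `find_mismatches` returns an empty dictionary when every pinned package is installed at the listed version.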
