A Python command-line tool for scraping news articles from websites using the `newspaper3k` library. The tool can extract individual articles or all articles from a news website and export them to JSON or CSV format.

## Features

- **Single Article Scraping**: Extract content from a specific article URL
- **Bulk Article Scraping**: Scrape all articles linked from a news website homepage
- **Multiple Export Formats**: Export data as JSON or CSV
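
The features above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the function names, the `FIELDS` list, and the `limit` parameter are assumptions made for this example. Only `Article`, `download()`, `parse()`, and `newspaper.build()` come from `newspaper3k` itself.

```python
import csv
import json
from typing import Dict, List

# Hypothetical column set; the real tool may export different fields.
FIELDS = ["url", "title", "authors", "publish_date", "text"]


def scrape_article(url: str) -> Dict[str, str]:
    """Download and parse a single article with newspaper3k."""
    # Imported here so the export helpers work without newspaper3k installed.
    from newspaper import Article

    article = Article(url)
    article.download()
    article.parse()
    return {
        "url": url,
        "title": article.title or "",
        "authors": ", ".join(article.authors),
        "publish_date": str(article.publish_date) if article.publish_date else "",
        "text": article.text,
    }


def scrape_site(site_url: str, limit: int = 10) -> List[Dict[str, str]]:
    """Scrape up to `limit` articles linked from a site (limit is an assumption)."""
    import newspaper

    # memoize_articles=False asks newspaper3k not to skip previously seen URLs.
    source = newspaper.build(site_url, memoize_articles=False)
    return [scrape_article(a.url) for a in source.articles[:limit]]


def export_json(rows: List[Dict[str, str]], path: str) -> None:
    """Write the scraped rows as a JSON array."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(rows, fh, ensure_ascii=False, indent=2)


def export_csv(rows: List[Dict[str, str]], path: str) -> None:
    """Write the scraped rows as CSV with a header row."""
    with open(path, "w", encoding="utf-8", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Both export helpers take the same list of dictionaries, so the scraping step does not need to know which output format was requested.
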

- The tool uses the `newspaper3k` library which may not work with all websites, especially those with heavy JavaScript rendering or anti-scraping measures
- Some news sites may block automated scraping attempts
- The quality of extracted content depends on the website's structure and the `newspaper3k` library's parsing capabilities
- For sites with many articles, using `--all-articles` may take considerable time
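
One way to keep a long bulk run bounded is to cap both the number of articles and the total time spent. The sketch below is an assumption about how that could be done, not a description of the tool: `scrape_with_budget`, `max_articles`, `max_seconds`, and `delay` are hypothetical names, and `scrape_one` stands for whatever per-URL scraping function is in use.

```python
import time
from typing import Any, Callable, Iterable, Iterator, Tuple


def scrape_with_budget(
    urls: Iterable[str],
    scrape_one: Callable[[str], Any],
    max_articles: int = 50,
    max_seconds: float = 300.0,
    delay: float = 0.0,
) -> Iterator[Tuple[str, Any]]:
    """Yield (url, result) pairs until the article or time budget runs out."""
    started = time.monotonic()
    for count, url in enumerate(urls):
        # Stop before starting another article once either budget is exhausted.
        if count >= max_articles or time.monotonic() - started > max_seconds:
            break
        yield url, scrape_one(url)
        if delay:
            # Optional pause between requests to reduce load on the site.
            time.sleep(delay)
```

Because this is a generator, results can be written out as they arrive instead of waiting for the whole site to finish.
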

## Error Handling

- If scraping fails, the tool will display an error message
- Empty results will be indicated with appropriate messages
- Network issues and parsing errors are caught and reported
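
The behaviour listed above could look something like the following. This is a hedged sketch rather than the tool's real code: `scrape_safely` and the message wording are assumptions, `scrape_one` is a placeholder for the per-article scraper, and catching a broad `Exception` is a simplification that treats network and parsing failures alike.

```python
import sys
from typing import Callable, Dict, Iterable, List, Optional


def scrape_safely(
    urls: Iterable[str],
    scrape_one: Callable[[str], Optional[Dict[str, str]]],
) -> List[Dict[str, str]]:
    """Scrape each URL, reporting failures and empty results on stderr."""
    results: List[Dict[str, str]] = []
    for url in urls:
        try:
            item = scrape_one(url)
        except Exception as exc:  # broad on purpose: network and parsing errors
            print(f"Error scraping {url}: {exc}", file=sys.stderr)
            continue
        if not item or not item.get("text"):
            # The request succeeded but no article body was extracted.
            print(f"No content extracted from {url}", file=sys.stderr)
            continue
        results.append(item)
    if not results:
        print("No articles were scraped.", file=sys.stderr)
    return results
```

A failure on one URL does not stop the run; the remaining URLs are still attempted.
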

## License

This tool is provided for educational and personal use. Please respect website terms of service and robots.txt files when scraping.
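
As an illustration of respecting robots.txt, a check can be run before scraping using Python's standard `urllib.robotparser`. This is a sketch under assumptions and is not stated to be part of the tool: `load_robots` and `is_allowed` are hypothetical helper names, and the default user agent `"*"` is a placeholder.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def load_robots(site_url: str) -> RobotFileParser:
    """Fetch and parse the site's robots.txt (performs a network request)."""
    parser = RobotFileParser()
    parser.set_url(urljoin(site_url, "/robots.txt"))
    parser.read()
    return parser


def is_allowed(parser: RobotFileParser, url: str, user_agent: str = "*") -> bool:
    """Return True if the parsed rules permit `user_agent` to fetch `url`."""
    return parser.can_fetch(user_agent, url)
```

A caller would typically load the rules once per site and skip any article URL for which `is_allowed` returns `False`.
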