news-please - an integrated web crawler and information extractor for news that just works
Updated Oct 14, 2024 - Python
ChatWeb can crawl web pages and read PDF, DOCX, and TXT files, extract the main content, and then answer your questions based on that content or summarize its key points.
A fork of Dragnet that also extracts the author, headline, date, and keywords from context, as well as providing built-in metadata extraction, all in one package.
Extracts the main body text from HTML, intended for news web pages.
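The description above does not say how the main text is found. As an illustration only, here is a minimal sketch of one common heuristic, assuming article body text lives in `<p>` elements outside navigation and footer markup and that short paragraphs (buttons, bylines, "Share" links) are noise. The skip list, the 40-character threshold, and the names `ParagraphCollector` and `extract_main_text` are hypothetical choices for this example, not that project's method.

```python
from html.parser import HTMLParser

# Hypothetical heuristic: tags whose contents are rarely article body text.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class ParagraphCollector(HTMLParser):
    """Collects the text of <p> elements that are not nested in SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0      # > 0 while inside a SKIP_TAGS element
        self.in_paragraph = False
        self.current = []        # text fragments of the currently open <p>
        self.paragraphs = []

    def _flush(self):
        # Normalize whitespace and store the paragraph if it has any text.
        text = " ".join("".join(self.current).split())
        if text:
            self.paragraphs.append(text)
        self.current = []
        self.in_paragraph = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "p" and self.skip_depth == 0:
            if self.in_paragraph:    # an unclosed <p> ends at the next <p>
                self._flush()
            self.in_paragraph = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "p" and self.in_paragraph:
            self._flush()

    def handle_data(self, data):
        if self.in_paragraph and self.skip_depth == 0:
            self.current.append(data)


def extract_main_text(html, min_chars=40):
    """Return paragraphs of at least min_chars characters, one per line."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()               # processes any buffered trailing text
    if parser.in_paragraph:      # keep a trailing unclosed <p>
        parser._flush()
    return "\n".join(p for p in parser.paragraphs if len(p) >= min_chars)
```

For example, `extract_main_text("<nav><p>Home | World | Sports</p></nav><p>The city council approved the new transit budget on Tuesday.</p><p>Share</p>")` keeps only the council sentence. Real news pages vary widely, so production extractors typically combine several signals (text density, link density, DOM position) rather than a single length threshold.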
A Python-based web app that extracts and summarizes news using NewsAPI, newspaper3k, spaCy, and the Pegasus and T5 models from Hugging Face. It categorizes news articles and uses a graph-based summarization feature to summarize multiple documents. The app works with news in any language supported by NewsAPI.
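The description does not specify which graph algorithm the app uses. As an illustration only, here is a TextRank-style sketch of graph-based extractive summarization across several documents: every sentence from every document becomes a node, edges are weighted by word overlap, and a PageRank-style power iteration ranks the sentences. The naive sentence splitter, the overlap measure, the damping factor of 0.85, and the names `split_sentences`, `similarity`, and `summarize` are assumptions for this example, not the app's code.

```python
import math
import re


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    # TextRank-style overlap: shared words normalized by log sentence sizes.
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    if len(wa) < 2 or len(wb) < 2:       # avoid log(1) = 0 in the denominator
        return 0.0
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def summarize(documents, k=2, damping=0.85, iterations=50):
    """Return the k highest-ranked sentences, in their original order."""
    sentences = [s for doc in documents for s in split_sentences(doc)]
    n = len(sentences)
    if n <= k:
        return sentences
    # Weighted, undirected sentence graph as an adjacency matrix.
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each node receives score from neighbours in proportion to edge weight.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n) if out_sums[j] > 0
            )
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Because sentences from all input documents share one graph, a point repeated across several articles gathers weight from each of them, which is what makes this family of methods suited to multi-document summaries. The pairwise matrix is quadratic in the number of sentences, so larger inputs usually need sparser similarity graphs.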
News Extractor
Final Year Project. News Extraction and Summarization
An API for extracting large numbers of articles from any URL or website, with support for CSS selectors, documented with Swagger (OpenAPI 3).