article-extracting

Here are 28 public repositories matching this topic...

fivefilters / ftr-site-config

Site-specific article extraction rules to aid content extractors, feed readers, and 'read later' applications.

xpath article-extracting extract-rules

Updated Nov 10, 2024

Strumenta / SmartReader

SmartReader is a library to extract the main content of a web page, based on a port of the Readability library by Mozilla

csharp readability article-extracting readable article-extractor

Updated Oct 9, 2024
C#

artiomn / markdown_articles_tool

Parses a Markdown article, downloads its images, and replaces the image URLs with local paths.

html markdown pdf downloader article images python-library markdown-parser image-manipulation toolset markdown-to-html articles markdown-converter article-extracting md markdown-to-pdf article-extractor markdown-articles

Updated May 22, 2024
Python

myifeng / article-parser

Extracts an article or news story from a URL or HTML, parses the title and content, and outputs the result in Markdown format.

python news article extractor extract beautifulsoup article-extracting article-parser article-extractor extract-article

Updated Aug 12, 2024
Python

johnbumgarner / newshound

This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around the world in over 50 languages.

data-science text-mining data-mining news news-aggregator python3 datascience web-scraping data-extraction webscraping news-crawler article-extracting article-extractor newspaper-crawler python-newspaper

Updated Mar 14, 2023

woojubb / html-article-extractor

A web page content extractor

crawler extractor crawling extraction article-extracting article-extractor

Updated Aug 13, 2024
JavaScript

lord-alfred / dnlp

📚 A collection of useful Natural Language Processing tools: text language detection, sentence splitting, and extraction of the main content from an HTML document

nlp language-detection nltk readability text-processing fasttext nlp-parsing sentence-tokenizer article-extracting language-recognition article-extractor

Updated Mar 7, 2023
Python

Sathish-Vasudev / Article-Scraper

This program scrapes article content from the web, taking either a single URL or a text file containing a set of URLs as input. It uses the newspaper3k and python-docx libraries and produces a neatly formatted Word document ('.docx') containing the contents of the article.

python3 python-docx article-extracting article-extractor literature-mining newspaper3k article-scraper

Updated Aug 5, 2020
Python

mitica / ascrape-js

Extracts article content from a web page.

cheerio article-extracting

Updated Feb 25, 2017
JavaScript

EmailThis / readability

Readability is an Elixir library for extracting and curating articles.

elixir readability article-extracting

Updated Feb 18, 2017
Elixir

ghostdogpr / readability4s

Scala library to extract relevant content from an article's HTML

scala readability article-extracting

Updated Jun 20, 2018
Scala

KashmereLabs / permalink_web_archiver

Allows any article on the web to be parsed into a readable format and archived into the permanent web

storage dapp blockchain summarization article-extracting arweave-permaweb

Updated Dec 10, 2022
JavaScript

absingh31 / MercuryAPI_Client

Python wrapper for the Mercury API that returns JSON and HTML output, using your API key. With it, anyone can denoise an online article and view it without ads, external links, or extraneous content.

html api json json-serialization api-client python3 python-wrapper api-wrapper mercury article-extracting html-output mercury-api mercury-parser mercury-client mercuryapi-client

Updated Jan 9, 2018
HTML

ivanovishado / NewsScraper

Article scraper for Mexican news websites. My final degree project at Universidad de Guadalajara - CUCEI 2018.

flask news mongodb news-websites article-extracting

Updated Dec 8, 2022
Python

0x01h / yozdil-article-scraper-generator

Scrapes Yılmaz Özdil's articles and builds a Markov model to generate newspaper articles in his style. Also a Turkish text dataset creator for data science and NLP projects.

markov-model scraper markov-chain markov article-extracting article-extractor yilmaz-ozdil