nischaybikramthapa/medium-scraper

Medium article web scraper

This repository contains a Python engine for scraping archived Medium articles by tag. The final output is a CSV file containing the following fields.

{
    "id": "unique_id",
    "title": "title",
    "subtitle": "subtitle",
    "author": "author",
    "publication": "publication",
    "claps": "claps",
    "reading_time": "reading_time",
    "content": "content",
    "link": "article_link",
    "comments": "comments",
    "published_date": "published_date",
    "retrieved_date": "retrieved_date"
}
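
As an illustration of how records with these fields map onto a CSV file, here is a minimal sketch using Python's standard `csv` module. The helper name `write_articles`, the column order, and the sample row are assumptions made for this example; they are not taken from the scraper's own code.

```python
import csv
import io

# Column names come from the field list above; the order is an assumption.
FIELDS = [
    "id", "title", "subtitle", "author", "publication", "claps",
    "reading_time", "content", "link", "comments",
    "published_date", "retrieved_date",
]


def write_articles(rows, handle):
    """Write article dicts to an open text handle as CSV (hypothetical helper)."""
    # restval="" leaves a column empty when a row has no value for it.
    writer = csv.DictWriter(handle, fieldnames=FIELDS, restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Write one made-up row to an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_articles([{"id": "abc123", "title": "Example", "claps": 10}], buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, pass a file opened with `open(path, "w", newline="", encoding="utf-8")` as the handle.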

Project Structure

├── medium_scraper
│   ├── __init__.py
│   ├── article.py
│   ├── commons.py
│   ├── scraper.py
│   └── settings.py
├── Dockerfile
├── main.py
├── pyproject.toml
└── README.md

Prerequisites

Dependencies

poetry install

Instructions

Scraping articles

To retrieve data from Medium, run main.py and pass it a tag. The results are saved in your local directory as medium_{article_tag}_{date}.csv
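
The naming pattern above can be sketched as follows. The function name `output_filename` is hypothetical, and the ISO `YYYY-MM-DD` date format is an assumption, since the format that main.py uses for `{date}` is not stated here.

```python
from datetime import date


def output_filename(article_tag: str, retrieved: date) -> str:
    """Build a name following medium_{article_tag}_{date}.csv (hypothetical helper)."""
    # Assumption: the date part is rendered in ISO format (YYYY-MM-DD).
    return f"medium_{article_tag}_{retrieved.isoformat()}.csv"


example_name = output_filename("data-science", date(2021, 3, 14))
```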
