
New ways of scraping data #2

@hsfzxjy

Due to the odd design of MediaWiki, the iGEM Parts Registry may use non-semantic tags when rendering documentation (e.g. <table> for layout). This can cause html2markdown to parse the page incorrectly and mess up the final result. By contrast, accessing the Edit page directly (take BBa_K2042000 as an example) returns the raw page source in WikiText format, which makes parsing simpler and more precise.
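
A minimal sketch of the Edit-page approach. It assumes the Registry serves its MediaWiki edit form from `http://parts.igem.org/wiki/index.php` (a hypothetical path; check the real URL) and that the wikitext sits in MediaWiki's standard edit box, `<textarea id="wpTextbox1">`. The helpers `edit_page_url` and `extract_wikitext` are illustrative names. Downloading the HTML (e.g. with `urllib.request.urlopen`) is left out so the parsing can be tried offline. MediaWiki's `action=raw` would return the wikitext directly, if the Registry permits it.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Hypothetical MediaWiki entry point for the Registry; verify before use.
REGISTRY_INDEX = "http://parts.igem.org/wiki/index.php"


def edit_page_url(part_name: str) -> str:
    """Build the Edit page URL for a part such as BBa_K2042000."""
    query = urlencode({"title": f"Part:{part_name}", "action": "edit"})
    return f"{REGISTRY_INDEX}?{query}"


class _WikiTextExtractor(HTMLParser):
    """Collect the text inside MediaWiki's edit box <textarea id="wpTextbox1">."""

    def __init__(self):
        # convert_charrefs defaults to True, so entities like &lt; are decoded.
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "textarea" and dict(attrs).get("id") == "wpTextbox1":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "textarea":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_wikitext(edit_page_html: str) -> str:
    """Return the raw wikitext embedded in a downloaded Edit page."""
    parser = _WikiTextExtractor()
    parser.feed(edit_page_html)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)
```

Because `HTMLParser` decodes entities such as `&lt;table&gt;`, the returned string is the wikitext as the author typed it, and it can be parsed without going through html2markdown.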

Besides, the History page in wikitools lists the recent changes to a page, so there is no need to re-fetch every page in full when updating.
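
The update step could look like the sketch below. It assumes the Registry exposes MediaWiki's `api.php` (hypothetical URL) and uses the standard `action=query&prop=revisions` module to read each page's latest revision id, rather than scraping the History page itself. `revisions_query_url`, `latest_revisions` and `pages_to_refetch` are hypothetical helpers, and the cache of previously scraped revision ids is assumed to be a plain dict.

```python
import json
from urllib.parse import urlencode

# Hypothetical API endpoint; the Registry may not expose api.php.
REGISTRY_API = "http://parts.igem.org/wiki/api.php"


def revisions_query_url(titles):
    """Ask MediaWiki for the current revision id of each listed page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "ids|timestamp",
        "titles": "|".join(titles),
        "format": "json",
    }
    return f"{REGISTRY_API}?{urlencode(params)}"


def latest_revisions(response_json: str) -> dict:
    """Map page title -> latest revision id from an API response body."""
    pages = json.loads(response_json)["query"]["pages"]
    latest = {}
    for page in pages.values():
        revs = page.get("revisions")
        if revs:  # missing pages carry no "revisions" key
            latest[page["title"]] = revs[0]["revid"]
    return latest


def pages_to_refetch(cached: dict, latest: dict) -> list:
    """Titles whose current revision differs from the one scraped last time."""
    return sorted(t for t, rev in latest.items() if cached.get(t) != rev)
```

Only the titles returned by `pages_to_refetch` need their Edit page downloaded again; all others can keep their cached wikitext.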