🌐 Guide and tools to run a full offline mirror of Wikipedia.org with three different approaches: Nginx caching proxy, Kiwix + ZIM dump, and MediaWiki/XOWA + XML dump
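Of the three approaches above, the caching proxy is the easiest to demo. The repo itself uses Nginx; the snippet below is only a minimal sketch of the same idea in Python's standard library, and every name in it (`UPSTREAM`, `CACHE_DIR`, the port) is illustrative, not taken from the repo.

```python
# Minimal sketch of the caching-proxy approach, assuming we mirror
# en.wikipedia.org on localhost:8080. The real repo does this with Nginx;
# this toy version ignores response headers and caches everything forever.
import hashlib
import pathlib
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "https://en.wikipedia.org"   # assumption: mirroring English Wikipedia
CACHE_DIR = pathlib.Path("wiki-cache")
CACHE_DIR.mkdir(exist_ok=True)

class CachingProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # One cache file per request path, keyed by a hash of the path.
        key = CACHE_DIR / hashlib.sha256(self.path.encode()).hexdigest()
        if key.exists():
            body = key.read_bytes()                      # cache hit: serve stored copy
        else:
            with urllib.request.urlopen(UPSTREAM + self.path) as resp:
                body = resp.read()                       # cache miss: fetch once, keep
            key.write_bytes(body)
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), CachingProxy).serve_forever()
```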
A command-line toolkit to extract text content and category data from Wikipedia dump files
Corpus creator for Chinese Wikipedia
Reading the data from OPIEC - an Open Information Extraction corpus
Wikipedia-based Explicit Semantic Analysis, as described by Gabrilovich and Markovitch
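ESA represents a piece of text as a weighted vector of Wikipedia concepts, scoring each concept by TF-IDF overlap between the text and that concept's article. Below is a toy sketch of that idea using scikit-learn with two made-up "articles"; it illustrates the technique, not this repo's code.

```python
# Toy Explicit Semantic Analysis: map a text to affinities over "concepts"
# (Wikipedia articles). A real ESA model uses a large slice of Wikipedia.
from sklearn.feature_extraction.text import TfidfVectorizer

concepts = {
    "Cat": "the cat is a small domesticated feline often kept as a pet",
    "Computer": "a computer executes programs and stores data",
}
vectorizer = TfidfVectorizer()
concept_matrix = vectorizer.fit_transform(concepts.values())  # concepts x terms

def interpret(text):
    """Return {concept: affinity} for the given text fragment."""
    q = vectorizer.transform([text])
    scores = (concept_matrix @ q.T).toarray().ravel()
    return dict(zip(concepts, scores))

print(interpret("my pet cat"))   # nonzero only for the "Cat" concept
```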
Downloads and imports Wikipedia page histories to a git repository
Extracting useful metadata from Wikipedia dumps in any language.
Python package for working with MediaWiki XML content dumps
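The packages in this list expose different APIs, so rather than guessing any one of them, here is a generic standard-library way to stream (title, wikitext) pairs out of a MediaWiki XML dump; the dump filename in the demo line is an assumption.

```python
# Stream pages from a MediaWiki XML dump without loading it all into memory.
import bz2
import xml.etree.ElementTree as ET

def iter_pages(path):
    """Yield (title, wikitext) pairs from a (possibly bz2-compressed) dump."""
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rb") as f:
        for _, elem in ET.iterparse(f):
            if elem.tag.endswith("}page"):        # tags carry an export namespace
                ns = elem.tag[: elem.tag.index("}") + 1]
                title = elem.findtext(f"{ns}title")
                text = elem.findtext(f"{ns}revision/{ns}text") or ""
                yield title, text
                elem.clear()                      # drop the parsed subtree as we go

for title, text in iter_pages("enwiki-latest-pages-articles.xml.bz2"):
    print(title, len(text))
    break
```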
A simple utility to index Wikipedia dumps using Lucene.
Collects a multimodal dataset of Wikipedia articles and their images
A Python toolkit to generate a tokenized dump of Wikipedia for NLP
A library that assists in traversing and downloading from Wikimedia Data Dumps and their mirrors.
Node.js module for parsing the content of Wikipedia articles into JavaScript objects
Scripts to download the Wikipedia dumps (available at https://dumps.wikimedia.org/)
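For reference, fetching a single dump file needs nothing beyond the standard library; the enwiki-latest-pages-articles.xml.bz2 name below is the usual pattern on dumps.wikimedia.org, but check the site's listings for the file you actually want.

```python
# Hedged sketch: download one dump file with a crude progress readout.
import urllib.request

URL = ("https://dumps.wikimedia.org/enwiki/latest/"
       "enwiki-latest-pages-articles.xml.bz2")

def progress(blocks, block_size, total_size):
    # urlretrieve calls this after each block; total_size may be -1.
    print(f"\r{blocks * block_size / 2**20:.1f} MiB downloaded", end="")

urllib.request.urlretrieve(URL, "enwiki-latest-pages-articles.xml.bz2", progress)
```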
Research for a master's degree, operation projizz-I/O
Convert Chinese Wikipedia XML dumps to human-readable documents in Markdown and plain text.
Contains code to build a search engine by creating an index and performing search over Wikipedia data.
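Several entries here (this one, and the Lucene indexer above) share the same index-then-search shape. A toy boolean-AND inverted index makes that shape concrete; everything in it is illustrative and not taken from either project.

```python
# Toy inverted index: term -> set of document titles containing the term.
from collections import defaultdict

index = defaultdict(set)

def add_document(title, text):
    for term in text.lower().split():
        index[term].add(title)

def search(query):
    """Return titles containing every query term (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = index[terms[0]].copy()
    for term in terms[1:]:
        results &= index[term]
    return results

add_document("Cat", "cats are small domesticated felines")
add_document("Dog", "dogs are domesticated canines")
print(search("domesticated"))     # {'Cat', 'Dog'}
```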
Java tool to convert Wikimedia dumps into Java Article POJOs for test or fake data.
Convert Wikipedia XML dump files to JSON or text files