Wikipedia Data Store for Fess

Overview

Wikipedia Data Store is a Fess data store plugin that crawls Wikipedia pages from a Wikipedia XML dump file.
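For readers unfamiliar with the dump format, the following Java sketch shows one way to stream pages out of a MediaWiki XML export using the JDK's StAX API (`javax.xml.stream`). It is not the plugin's implementation. The `.bz2` decompression step is omitted (the input is assumed to be already-decompressed XML), the class and method names are hypothetical, and treating `limit` as "stop after this many pages" is an assumption about the `limit` parameter shown under Crawling Setting.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class DumpPageReader {

    /** One page extracted from a MediaWiki XML export (hypothetical type). */
    record Page(String title, String timestamp, String text) {}

    /**
     * Streams <page> elements and stops after {@code limit} pages.
     * Assumption: decompression of the .bz2 dump happens before this step.
     */
    static List<Page> readPages(XMLStreamReader reader, int limit) throws XMLStreamException {
        List<Page> pages = new ArrayList<>();
        String title = null;
        String timestamp = null;
        String text = null;
        while (reader.hasNext() && pages.size() < limit) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                switch (reader.getLocalName()) {
                    // A new page starts: clear values left over from the previous one.
                    case "page" -> { title = null; timestamp = null; text = null; }
                    case "title" -> title = reader.getElementText();
                    case "timestamp" -> timestamp = reader.getElementText();
                    case "text" -> text = reader.getElementText();
                    default -> { } // ignore ns, id, contributor, etc.
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && reader.getLocalName().equals("page")) {
                pages.add(new Page(title, timestamp, text));
            }
        }
        return pages;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = """
            <mediawiki>
              <page><title>Alpha</title><ns>0</ns>
                <revision><timestamp>2024-01-01T00:00:00Z</timestamp><text>First</text></revision>
              </page>
              <page><title>Beta</title><ns>0</ns>
                <revision><timestamp>2024-01-02T00:00:00Z</timestamp><text>Second</text></revision>
              </page>
            </mediawiki>
            """;
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<Page> pages = readPages(reader, 1);
        System.out.println(pages.size() + " " + pages.get(0).title()); // prints "1 Alpha"
    }
}
```

Streaming with StAX rather than loading a DOM matters here because full Wikipedia dumps are far too large to hold in memory at once.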

Download

See the Maven Repository.

Installation

See the Plugin section of the Fess Administration Guide.

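Crawling Settings

The following Parameter and Script values configure a crawl of the Japanese Wikipedia (jawiki) dump: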

# Parameter
url=http://download.wikimedia.org/jawiki/latest/jawiki-latest-pages-articles.xml.bz2
limit=10000

# Script
lang="ja"
filetype=format
filename=title
url="https://ja.wikipedia.org/wiki/" + encodedTitle
host="ja.wikipedia.org"
site="ja.wikipedia.org"
title=title
content=content
digest=digest
anchor=
content_length=content.length()
last_modified=timestamp
timestamp=timestamp
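The Script lines map values extracted from each dump page onto Fess document fields. As a rough illustration, the Java sketch below performs the same mapping by hand. The input keys (`title`, `content`, `format`, `digest`, `timestamp`) are taken from the variable names used in the script; how the plugin actually populates them, and how it derives `encodedTitle` (here assumed to be: spaces to underscores, then UTF-8 percent-encoding), are assumptions. Fess evaluates these lines as script expressions, not through code like this; `ScriptMappingSketch` and its methods are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class ScriptMappingSketch {

    private static final String WIKI_BASE = "https://ja.wikipedia.org/wiki/";
    private static final String HOST = "ja.wikipedia.org";

    /**
     * Hypothetical stand-in for the script's encodedTitle variable:
     * spaces become underscores, then the title is percent-encoded as UTF-8.
     */
    static String encodeTitle(String title) {
        return URLEncoder.encode(title.replace(' ', '_'), StandardCharsets.UTF_8);
    }

    /** Applies the Script lines to one page's extracted values. */
    static Map<String, Object> toDocument(Map<String, String> page) {
        String title = page.get("title");
        String content = page.get("content");
        String timestamp = page.get("timestamp");

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("lang", "ja");
        doc.put("filetype", page.get("format"));
        doc.put("filename", title);
        doc.put("url", WIKI_BASE + encodeTitle(title));
        doc.put("host", HOST);
        doc.put("site", HOST);
        doc.put("title", title);
        doc.put("content", content);
        doc.put("digest", page.get("digest"));
        doc.put("anchor", ""); // "anchor=" is empty in the script; modelled as "" (assumption)
        doc.put("content_length", content.length());
        doc.put("last_modified", timestamp);
        doc.put("timestamp", timestamp);
        return doc;
    }

    public static void main(String[] args) {
        // Hypothetical sample values for one page.
        Map<String, String> page = Map.of(
                "title", "Tokyo Tower",
                "content", "Sample text",
                "format", "text/x-wiki",
                "digest", "Sample text",
                "timestamp", "2024-01-01T00:00:00Z");
        Map<String, Object> doc = toDocument(page);
        System.out.println(doc.get("url"));            // https://ja.wikipedia.org/wiki/Tokyo_Tower
        System.out.println(doc.get("content_length")); // 11
    }
}
```

Under this encoding assumption, a Japanese title such as 東京 becomes `%E6%9D%B1%E4%BA%AC` in the URL. Note that `URLEncoder` also encodes `/` as `%2F`, so subpage-style titles may need different handling than this sketch provides.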