Merge pull request #8 from TheScienceMuseum/develop
v0.3.2: add labels_aliases field for faster querying
kdutia authored Sep 15, 2020
2 parents d3abae2 + 7907805 commit 162de4b
Showing 4 changed files with 20 additions and 5 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,10 @@

All notable changes documented below.

+ ## 0.3.2
+
+ - **enhancement:** add `labels_aliases` field for faster text search of both labels and aliases using an [Elasticsearch match query](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html)
+
## 0.3.1

- **fix:** property values without types are ignored
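
As a rough illustration of what the 0.3.2 entry enables, the sketch below builds a match query body against the combined `labels_aliases` field. The helper name `build_labels_aliases_query`, the `size` default, and the index name in the usage comment are assumptions made for illustration; they are not part of elastic-wikidata.

```python
def build_labels_aliases_query(text, size=10):
    """Return a search body that matches `text` against the combined field.

    `labels_aliases` is the field added in 0.3.2; everything else here
    (function name, `size` default) is a hypothetical convenience.
    """
    return {
        "size": size,
        "query": {"match": {"labels_aliases": {"query": text}}},
    }


body = build_labels_aliases_query("difference engine")
# With an elasticsearch-py 7.x client this could be sent as
# (hypothetical index name):
# es.search(index="wikidump", body=body)
```

Without the combined field, a comparable search would likely need a query spanning both `labels` and `aliases`, for example a `multi_match`.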
3 changes: 2 additions & 1 deletion cli.py
@@ -186,7 +186,7 @@ def check_es_credentials(credentials: dict):
# path="../wikidata/all_no_articles.ndjson",
# properties="p31,p279",
# config="./config.ini",
- # index='wikidump',
+ # index='wikidump2',
# cluster=None,
# user=None,
# password=None,
@@ -195,5 +195,6 @@ def check_es_credentials(credentials: dict):
# page_size=100,
# language='en',
# timeout=6,
+ # disable_refresh=True
# )
main()
14 changes: 12 additions & 2 deletions elastic_wikidata/dump_to_es.py
@@ -93,11 +93,21 @@ def start_elasticsearch(self):
print("Connecting to Elasticsearch on localhost")
self.es = Elasticsearch()

- self.es.indices.create(index=self.index_name, ignore=400)
+ mappings = {
+     "mappings": {
+         "properties": {
+             "labels": {"type": "text", "copy_to": "labels_aliases"},
+             "aliases": {"type": "text", "copy_to": "labels_aliases"},
+             "labels_aliases": {"type": "text", "store": "true"},
+         }
+     }
+ }
+
+ self.es.indices.create(index=self.index_name, ignore=400, body=mappings)

if self.disable_refresh_on_index:
print(
- "Temporary disabling refresh for the index. Will reset refresh interval for the default (1s) after load is complete."
+ "Temporary disabling refresh for the index. Will reset refresh interval to the default (1s) after load is complete."
)
self.es.indices.put_settings({"index": {"refresh_interval": -1}})
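
The disable-then-restore pattern in this hunk could be wrapped roughly as follows. This is a sketch under stated assumptions, not the project's code: `refresh_disabled` and `restore_interval` are hypothetical names, and the client is assumed to be an elasticsearch-py 7.x style object whose `indices.put_settings` accepts `body` and `index` keyword arguments. Passing `index` scopes the change to a single index.

```python
from contextlib import contextmanager


@contextmanager
def refresh_disabled(es, index_name, restore_interval="1s"):
    """Turn off index refresh for the duration of a bulk load.

    `es` is assumed to expose `indices.put_settings(body=..., index=...)`.
    The interval is restored even if the load raises.
    """
    es.indices.put_settings(
        body={"index": {"refresh_interval": -1}}, index=index_name
    )
    try:
        yield
    finally:
        es.indices.put_settings(
            body={"index": {"refresh_interval": restore_interval}},
            index=index_name,
        )
```

A load would then run inside `with refresh_disabled(es, "wikidump"):` (index name hypothetical), so the reset to `1s` does not depend on the load finishing cleanly.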

4 changes: 2 additions & 2 deletions setup.py
@@ -5,13 +5,13 @@

setuptools.setup(
name="elastic-wikidata",
- version="0.3.1",
+ version="0.3.2",
author="Science Museum Group",
description="elastic-wikidata",
long_description=long_description,
long_description_content_type="text/markdown",
url="https://github.com/TheScienceMuseum/elastic-wikidata",
- download_url="https://github.com/TheScienceMuseum/elastic-wikidata/archive/v0.2.0.tar.gz",
+ download_url="https://github.com/TheScienceMuseum/elastic-wikidata/archive/v0.3.2.tar.gz",
classifiers=[
"Programming Language :: Python :: 3",
"License :: OSI Approved :: MIT License",