Scientific Metadata Search Engine (Fulltext) implementation for PostgreSQL 🔎 #640
Very excited to see this!
tiled/catalog/migrations/versions/1cd99c02d0c7_create_index_for_fulltext_search.py
Also: there are tradeoffs in insert speed and database size that make me question whether case-sensitive search is useful and important enough to justify them. (Referenced code: lines 42 to 50 in fca4330.)
@@ -168,6 +171,9 @@ def cm():
        cm = nullcontext
    with cm():
        assert list(client.search(FullText("z"))) == ["z", "does_contain_z"]
        # plainto_tsquery fails to find certain words, weirdly, so it is a useful
        # test that we are using tsquery
        assert list(client.search(FullText("purple"))) == ["full_text_test_case"]
I don't know how much to expect of this index, but could we do partial word search? e.g. urple
I think support is limited, but robust fuzzy text search is next up for @Kezzsim.
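For context on what "limited" means here: PostgreSQL's `to_tsquery` accepts a `:*` suffix on a term to match any lexeme that starts with it, so `purpl` can find `purple`, but there is no equivalent for a missing leading character such as `urple`. The snippet below is only a rough sketch of how a client-side helper might build such a prefix query. The helper name `to_prefix_tsquery` is hypothetical and not part of this PR.

```python
import re


def to_prefix_tsquery(user_text: str) -> str:
    """Build a to_tsquery() argument that prefix-matches every word.

    Hypothetical helper for illustration; not part of Tiled.
    """
    # Keep only alphanumeric runs so tsquery operators typed by the user
    # (&, |, !, :, parentheses) cannot leak into the query string.
    words = re.findall(r"[A-Za-z0-9]+", user_text)
    # "word:*" asks PostgreSQL for lexemes beginning with "word".
    return " & ".join(f"{word}:*" for word in words)


if __name__ == "__main__":
    print(to_prefix_tsquery("purpl"))
    print(to_prefix_tsquery("full text"))
```

The returned string would be passed as a bound parameter to `to_tsquery`. An empty result means the input held no searchable words, so a caller would probably want to skip the search in that case.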
The current CI failures are caused by the caching we set up previously. The ORM and Alembic need to be able to handle the case where an index already exists in that cache.
I will remove the case-sensitive flag from both the source code and the documentation, minimizing the API surface impact by ignoring any other kwargs sent to query other than
The PostgreSQL documentation writes:
This feature adds a rudimentary scientific metadata search engine, implemented purely through PostgreSQL's native tsvector and tsquery operations. It is documented as part of the Tiled client's FullText query.

Tasks:
- adapter.py to enable functionality
- metadata_search for storing jsonb_to_tsvector data

Resolves #457
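
As a rough illustration of the approach described above, the sketch below builds the two SQL statements such a feature would plausibly need: a GIN index over `jsonb_to_tsvector(...)` and a search that matches it with `to_tsquery`. This is not the PR's actual migration or adapter code. The table name `nodes`, the columns `metadata` and `key`, the `simple` text search configuration, and both function names are assumptions made for the example. Only the index name `metadata_search` comes from the task list.

```python
import re

# Restrict interpolated names to plain SQL identifiers.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

# jsonb_to_tsvector filter: index string values only (an assumed choice).
_STRING_VALUES_ONLY = "'[\"string\"]'"


def _ident(name: str) -> str:
    """Return name unchanged if it is a plain identifier, else raise."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"not a plain SQL identifier: {name!r}")
    return name


def _tsvector_expr(column: str, config: str) -> str:
    # Passing the configuration explicitly keeps the expression immutable,
    # which PostgreSQL requires for expression indexes.
    return (
        f"jsonb_to_tsvector('{_ident(config)}', {_ident(column)}, "
        f"{_STRING_VALUES_ONLY})"
    )


def create_index_sql(index="metadata_search", table="nodes",
                     column="metadata", config="simple") -> str:
    """SQL for a GIN index over the tsvector built from a JSONB column."""
    # IF NOT EXISTS makes the statement safe to re-run on a database
    # that already has the index.
    return (
        f"CREATE INDEX IF NOT EXISTS {_ident(index)} ON {_ident(table)} "
        f"USING gin ({_tsvector_expr(column, config)})"
    )


def search_sql(table="nodes", column="metadata", config="simple") -> str:
    """SQL that matches the same expression against a bound :query value."""
    # The WHERE expression must match the indexed expression exactly for
    # the planner to use the index.
    return (
        f"SELECT key FROM {_ident(table)} "
        f"WHERE {_tsvector_expr(column, config)} "
        f"@@ to_tsquery('{_ident(config)}', :query)"
    )


if __name__ == "__main__":
    print(create_index_sql())
    print(search_sql())
```

The `:query` placeholder follows the named-parameter style used by SQLAlchemy's `text()`, so the user's search term is bound rather than interpolated into the statement.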