-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Magic service query for text search. #1732
base: master
Are you sure you want to change the base?
Magic service query for text search. #1732
Conversation
…so means there are now 2 new Magic Service Queries. This solution works but can be hard to work with in practice. Further refinement necessary for an easy to use MagicServiceQuery that combines both searches.
Conformance check passed ✅No test result changes. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1732 +/- ##
==========================================
- Coverage 89.92% 89.49% -0.44%
==========================================
Files 393 397 +4
Lines 37601 37772 +171
Branches 4231 4258 +27
==========================================
- Hits 33814 33803 -11
- Misses 2486 2664 +178
- Partials 1301 1305 +4 ☔ View full report in Codecov by Sentry. |
Quality Gate passedIssues Measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An initial round of reviews, please read the comments first, some are of a rather structural nature about the general design.
static_assert(std::is_same_v<T, p::TransPath> || | ||
std::is_same_v<T, p::PathQuery> || | ||
std::is_same_v<T, p::Describe> || | ||
std::is_same_v<T, p::SpatialQuery> || | ||
std::is_same_v<T, p::WordSearchQuery> || | ||
std::is_same_v<T, p::EntitySearchQuery>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can refactor this large thing use
ad_utility::SameAsAny<T,p::TransPath, p::PathQuery, ...>
(Then we don't repeat the ||
and the is_same_v
auto config = entitySearchQuery.toTextIndexScanForEntityConfiguration(); | ||
std::vector<SubtreePlan> candidatesOut; | ||
auto plan = makeSubtreePlan<TextIndexScanForEntity>(qec_, config); | ||
candidatesOut.push_back(std::move(plan)); | ||
visitGroupOptionalOrMinus(std::move(candidatesOut)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as for the other visit
function.
} else { | ||
variableColumns_[textRecordVar_.getEntityScoreVariable(fixedEntity())] = | ||
makeAlwaysDefinedColumn(index); | ||
} | ||
} else { | ||
addDefinedVar(entityVariable()); | ||
addDefinedVar(textRecordVar_.getEntityScoreVariable(entityVariable())); | ||
variableColumns_[entityVariable()] = makeAlwaysDefinedColumn(index); | ||
index++; | ||
if (config_.varToBindScore_.has_value()) { | ||
variableColumns_[config_.varToBindScore_.value()] = | ||
makeAlwaysDefinedColumn(index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is really hard to read here.
An easier interface would be if the class has a member
std::optional<Variable> textScoreVariable
that is initialized in the constructors (nullopt means " don't add the text score" ,
then you get a simple if-else
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I think if we want the opportunity to have no score variable at all (if it is not used in the magic service query), then we have to also change other functions (computeResult, getResultWidth) etc.
Otherwise the optional
makes no sense.
TextIndexScanForEntityConfiguration config_; | ||
VariableToColumnMap variableColumns_; | ||
Variable textRecordVar_; | ||
VarOrFixedEntity varOrFixed_; | ||
string word_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These members are redundant.
Either you only store the config_
or you store the old three seperate members. The conversion is then done in the Constructor (one of the formats has to be converted)
|
||
// _____________________________________________________________________________ | ||
VariableToColumnMap TextIndexScanForWord::computeVariableToColumnMap() const { | ||
return variableColumns_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this class the same comments as for the TextIndexScanForEntity
also apply.
|
||
// ____________________________________________________________________________ | ||
void EntitySearchQuery::addParameter(const SparqlTriple& triple) { | ||
auto sipmleTriple = triple.getSimple(); | ||
TripleComponent predicate = sipmleTriple.p_; | ||
TripleComponent object = sipmleTriple.o_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I won't look at this class in detail, as I first have a general qustion:
Shoult thi bea separate SERVICE, or is the contains-entity
stuff not part of the Fulltext search query?
So you have a single query (syntax imagined)
SERVICE textSearch: {
[] <textVar> ?text ;
<wordMatch> [<word> "walk"; <score> ?scoreWalk];
<entityMention> [<entity> <Flixtastic>; <score> ?scoreFelix] .
}
Of course then the configToTextIndexScans
function always returns a vector of plans
(n WordScans, and m EntityScans), but we have a single class that manages the text index. Note: Not every word search has an entity search, but every entity search has a word search.
Maybe think about this, and we can also discuss this 1-1 next week some time.
std::optional<std::string> word_; | ||
std::optional<Variable> textVar_; | ||
std::optional<Variable> matchVar_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As hinted above I would heavily suggest a design, where all the words and prefixes of a single ?text
variable and all their entity mentions are specified in a single SERVICE
query.
WordSearchQuery(WordSearchQuery&& other) noexcept = default; | ||
WordSearchQuery(const WordSearchQuery& other) noexcept = default; | ||
WordSearchQuery& operator=(const WordSearchQuery& other) = default; | ||
WordSearchQuery& operator=(WordSearchQuery&& a) noexcept = default; | ||
~WordSearchQuery() noexcept override = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think none of those is explicitly needed.
// See MagicServiceQuery | ||
void addParameter(const SparqlTriple& triple) override; | ||
|
||
TextIndexScanForWordConfiguration toTextIndexScanForWordConfiguration() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the name is very redundant.
The name toConfig()
would absolutely suffice.
|
||
GraphPatternOperation Visitor::visitEntitySearchQuery( | ||
Parser::ServiceGraphPatternContext* ctx) { | ||
auto parseEntitySearchQuery = [ctx](parsedQuery::EntitySearchQuery& | ||
textSearchQuery, | ||
const parsedQuery::GraphPatternOperation& | ||
op) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you confirm, that basically all our magic service visitors (2 preexisting plus those you add) use basically exactly the same code, so we can reduce the code duplication by introducing a common template (or even a helper function that uses a reference to the base class, as the MagicServiceQuery
uses inheritance and virtual functions.
Add the possibility to do text seach with a magic service query. This leads to two new service queries, one for ql:contains-word and one for ql:contains-entity.