Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Magic service query for text search. #1732

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

Flixtastic
Copy link
Contributor

Add the possibility to do text seach with a magic service query. This leads to two new service queries, one for ql:contains-word and one for ql:contains-entity.

@sparql-conformance
Copy link

Copy link

codecov bot commented Jan 24, 2025

Codecov Report

Attention: Patch coverage is 20.46512% with 171 lines in your changes missing coverage. Please review.

Project coverage is 89.49%. Comparing base (432cd64) to head (b968bb7).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
src/parser/EntitySearchQuery.cpp 0.00% 52 Missing ⚠️
src/parser/sparqlParser/SparqlQleverVisitor.cpp 0.00% 45 Missing and 2 partials ⚠️
src/parser/WordSearchQuery.cpp 0.00% 30 Missing ⚠️
src/engine/QueryPlanner.cpp 22.22% 14 Missing ⚠️
src/engine/TextIndexScanForEntity.cpp 68.96% 7 Missing and 2 partials ⚠️
src/engine/TextIndexScanForWord.cpp 68.96% 7 Missing and 2 partials ⚠️
src/engine/CheckUsePatternTrick.cpp 0.00% 6 Missing ⚠️
src/parser/EntitySearchQuery.h 0.00% 2 Missing ⚠️
src/parser/WordSearchQuery.h 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1732      +/-   ##
==========================================
- Coverage   89.92%   89.49%   -0.44%     
==========================================
  Files         393      397       +4     
  Lines       37601    37772     +171     
  Branches     4231     4258      +27     
==========================================
- Hits        33814    33803      -11     
- Misses       2486     2664     +178     
- Partials     1301     1305       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An initial round of reviews, please read the comments first, some are of a rather structural nature about the general design.

Comment on lines +75 to +80
static_assert(std::is_same_v<T, p::TransPath> ||
std::is_same_v<T, p::PathQuery> ||
std::is_same_v<T, p::Describe> ||
std::is_same_v<T, p::SpatialQuery> ||
std::is_same_v<T, p::WordSearchQuery> ||
std::is_same_v<T, p::EntitySearchQuery>);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can refactor this large thing use
ad_utility::SameAsAny<T,p::TransPath, p::PathQuery, ...>
(Then we don't repeat the || and the is_same_v

Comment on lines +2551 to +2555
auto config = entitySearchQuery.toTextIndexScanForEntityConfiguration();
std::vector<SubtreePlan> candidatesOut;
auto plan = makeSubtreePlan<TextIndexScanForEntity>(qec_, config);
candidatesOut.push_back(std::move(plan));
visitGroupOptionalOrMinus(std::move(candidatesOut));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as for the other visit function.

Comment on lines +69 to +78
} else {
variableColumns_[textRecordVar_.getEntityScoreVariable(fixedEntity())] =
makeAlwaysDefinedColumn(index);
}
} else {
addDefinedVar(entityVariable());
addDefinedVar(textRecordVar_.getEntityScoreVariable(entityVariable()));
variableColumns_[entityVariable()] = makeAlwaysDefinedColumn(index);
index++;
if (config_.varToBindScore_.has_value()) {
variableColumns_[config_.varToBindScore_.value()] =
makeAlwaysDefinedColumn(index);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is really hard to read here.
An easier interface would be if the class has a member
std::optional<Variable> textScoreVariable that is initialized in the constructors (nullopt means " don't add the text score" ,
then you get a simple if-else here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I think if we want the opportunity to have no score variable at all (if it is not used in the magic service query), then we have to also change other functions (computeResult, getResultWidth) etc.
Otherwise the optional makes no sense.

Comment on lines +59 to +63
TextIndexScanForEntityConfiguration config_;
VariableToColumnMap variableColumns_;
Variable textRecordVar_;
VarOrFixedEntity varOrFixed_;
string word_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These members are redundant.
Either you only store the config_ or you store the old three seperate members. The conversion is then done in the Constructor (one of the formats has to be converted)


// _____________________________________________________________________________
VariableToColumnMap TextIndexScanForWord::computeVariableToColumnMap() const {
return variableColumns_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this class the same comments as for the TextIndexScanForEntity also apply.

Comment on lines +10 to +15

// ____________________________________________________________________________
void EntitySearchQuery::addParameter(const SparqlTriple& triple) {
auto sipmleTriple = triple.getSimple();
TripleComponent predicate = sipmleTriple.p_;
TripleComponent object = sipmleTriple.o_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't look at this class in detail, as I first have a general qustion:
Shoult thi bea separate SERVICE, or is the contains-entity stuff not part of the Fulltext search query?
So you have a single query (syntax imagined)

SERVICE textSearch: {
[] <textVar> ?text ;
<wordMatch> [<word> "walk"; <score> ?scoreWalk];
<entityMention> [<entity> <Flixtastic>; <score> ?scoreFelix] .
}

Of course then the configToTextIndexScans function always returns a vector of plans
(n WordScans, and m EntityScans), but we have a single class that manages the text index. Note: Not every word search has an entity search, but every entity search has a word search.
Maybe think about this, and we can also discuss this 1-1 next week some time.

Comment on lines +16 to +18
std::optional<std::string> word_;
std::optional<Variable> textVar_;
std::optional<Variable> matchVar_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As hinted above I would heavily suggest a design, where all the words and prefixes of a single ?text variable and all their entity mentions are specified in a single SERVICE query.

Comment on lines +22 to +26
WordSearchQuery(WordSearchQuery&& other) noexcept = default;
WordSearchQuery(const WordSearchQuery& other) noexcept = default;
WordSearchQuery& operator=(const WordSearchQuery& other) = default;
WordSearchQuery& operator=(WordSearchQuery&& a) noexcept = default;
~WordSearchQuery() noexcept override = default;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think none of those is explicitly needed.

// See MagicServiceQuery
void addParameter(const SparqlTriple& triple) override;

TextIndexScanForWordConfiguration toTextIndexScanForWordConfiguration() const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name is very redundant.
The name toConfig() would absolutely suffice.

Comment on lines +954 to +960

GraphPatternOperation Visitor::visitEntitySearchQuery(
Parser::ServiceGraphPatternContext* ctx) {
auto parseEntitySearchQuery = [ctx](parsedQuery::EntitySearchQuery&
textSearchQuery,
const parsedQuery::GraphPatternOperation&
op) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you confirm, that basically all our magic service visitors (2 preexisting plus those you add) use basically exactly the same code, so we can reduce the code duplication by introducing a common template (or even a helper function that uses a reference to the base class, as the MagicServiceQuery uses inheritance and virtual functions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants