-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
MDEV-28730 Remove internal parser usage from InnoDB fulltext #4036
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 11.8
Are you sure you want to change the base?
Conversation
|
|
7e56f12 to
aadf724
Compare
dr-m
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some initial comments. I did not review all of this yet.
I welcome the reduction of dependencies on the internal SQL parser, but we must do it carefully and add debug assertions to document the assumptions.
storage/innobase/fts/fts0fts.cc
Outdated
| dberr_t FTSQueryExecutor::read_all(dict_table_t *table, | ||
| read_record* func, void *arg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The C++ way would be to pass a functor object that would also be compatible with a lambda expression. Can you replace `func,arg´ with that?
| ib::error() << "(" << err << ") during optimize, while adding a" | ||
| << " word to the FTS index."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please use sql_print_error in new code?
The last operator<< invocation is redundant here; it would concatenating two compile-time constant strings at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we write something like the following here?
sql_print_error("(%s) during optimize for table %.*sQ.%sQ",
ut_strerr(err),
int(table->name.dblen()), table->name.m_name,
table->name.basename());3c73bf2 to
247b016
Compare
247b016 to
dc23f97
Compare
dc23f97 to
a9c6fc6
Compare
| dberr_t fts_index_fetch_nodes_low( | ||
| dict_table_t *aux_table, const fts_string_t *word, fts_fetch_t *fetch, | ||
| FTSQueryRunner *sqlRunner) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not a member function of FTSQueryRunner?
Why is there no noexcept?
Alternatively, why is sqlRunner the first parameter, so that we can avoid shuffling registers at least for the first statement, which is invoking a member function on sqlRunner?
| err= sqlRunner->record_executor(index, READ, match_op, PAGE_CUR_GE, | ||
| fetch->callback, fetch); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the callee determine fetch->callback by itself? We don’t need to pass that as a parameter, right?
| FTSQueryRunner sqlRunner(trx); | ||
| fetch->heap= sqlRunner.heap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We seem to be assigning fetch->heap to something that is allocated from our local stack. Where do we reset fetch->heap before returning from this function? We surely wouldn’t want to leave a dangling pointer to something that should have been destroyed, right?
Do we need to add the data member fts_fetch_t::heap at all? How was the memory managed before these changes?
- Introduce a class FTSQueryRunner to handle queries for fulltext
internal tables. Basically it creates a query, prepares the
fulltext internal table for read or write process. Build a tuple
based on the given table and assign the FTS_CONFIG, FTS_COMMON_TABLES
and FTS_AUX_TABLE fields based on the given value. This class
also handles INSERT, DELETE, REPLACE, UPDATE and
execute the function for each record (record_executor).
FTSQueryRunner::create_query_thread(): Create a query thread to execute
the statement on internal FULLTEXT tables
FTSQueryRunner::build_tuple(): Build a tuple for the operation
FTSQueryRunner::build_clust_ref(): Build a clustered index reference
for clustered index lookup for the secondary index record
FTSQueryRunner::assign_config_fields(): Assign the tuple for the
FTS CONFIG internal table
FTSQueryRunner::assign_common_table_fields(): Assign the tuple for
FTS_DELETED, FTS_DELETED_CACHE, FTS_BEGIN_DELETED,
FTS_BEGIN_DELETED_CACHE common tables
FTSQueryRunner::assign_aux_table_fields(): Assign the tuple for
FTS_PREFIX_INDEX tables.
FTSQueryRunner::handle_error(): Handling error for DB_LOCK_WAIT,
retry the operation
FTSQueryRunner::open_table(): Open the table based on the fulltext
auxiliary table name and FTS common table name
FTSQueryRunner::prepare_for_write(): Lock the table for write
process by taking Intention Exclusive lock
FTSQueryRunner::prepare_for_read(): Lock the table for read
process by taking Intention Shared lock
FTSQueryRunner::write_record(): Insert the tuple into the given table
FTSQueryRunner::lock_or_sees_rec(): Lock the record in case of
DELETE, SELECT_UPDATE operation. Fetch the correct version of record in
case of READ operation. It also does clustered index lookup in case
of search is on secondary index
fts_cmp_rec_dtuple_prefix(): Compare the record with given tuple field
for tuple field length
FTSQueryRunner::record_executor(): Read the record of the given index and
do call the callback function for each record
FTSQueryRunner::build_update_config(): Build the update vector for
FULLTEXT CONFIG table
FTSQueryRunner::update_record(): Update the record with update vector
exist in FTSQueryRunner
Removed the fts_parse_sql(), fts_eval_sql(), fts_get_select_columns_str()
and fts_get_docs_clear().
Moved fts_get_table_id() & fts_get_table_name() from fts0sql.cc to
fts0fts.cc and deleted the file fts0sql.cc
Removed ins_graph, sel_graph from fts_index_cache_t
Changed the callback function default read function parameter for
each clustered index record to
bool fts_sql_callback(dict_index_t*, const rec_t *, const rec_offs*,
void *);
Following parameters are changed to default read function parameter:
fts_read_stopword()
fts_fetch_store_doc_id()
fts_query_expansion_fetch_doc()
fts_read_count()
fts_get_rows_count()
fts_init_doc_id()
fts_init_recover_doc()
read_fts_config()
fts_optimize_read_node()
fts_optimize_index_fetch_node()
fts_index_fetch_nodes()
fts_fetch_index_words()
fts_index_fetch_words()
fts_fetch_doc_ids()
fts_table_fetch_doc_ids()
fts_read_ulint()
fts_copy_doc_ids()
fts_optimize_create_deleted_doc_id_snapshot()
fts_query_index_fetch_nodes()
fts_query_fetch_document()
fts_query_index_fetch_nodes()
row_upd_clust_rec_low(): Function does updates a clustered
index record of a row when ordering fields don't change.
Function doesn't have dependency on row_prebuilt_t. This can be
used by fulltext internal table update operation
Row_sel_get_clust_rec_for_mysql::operator(): Removed the
parameter row_prebuilt_t and caller does pass the prebuilt
related variables
Removed the parser usage and execute the query directly on
fulltext internal tables in the following function:
fts_read_stopword()
fts_fetch_store_doc_id()
fts_query_expansion_fetch_doc()
fts_read_count()
fts_get_rows_count()
fts_init_doc_id()
fts_init_recover_doc()
read_fts_config()
fts_optimize_read_node()
fts_optimize_index_fetch_node()
fts_index_fetch_nodes()
fts_fetch_index_words()
fts_index_fetch_words()
fts_fetch_doc_ids()
fts_table_fetch_doc_ids()
fts_read_ulint()
fts_copy_doc_ids()
fts_optimize_create_deleted_doc_id_snapshot()
fts_query_index_fetch_nodes()
fts_query_fetch_document()
fts_query_index_fetch_nodes()
i_s_fts_deleted_generic_fill()
i_s_fts_index_table_fill_selected()
a9c6fc6 to
dfa7fe3
Compare
| fts_write_node(dict_table_t *table, | ||
| fts_string_t *word, fts_node_t* node, | ||
| FTSQueryRunner *sqlRunner) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to my comment on fts_index_fetch_nodes_low() yesterday: Why is this not a noexcept member function of FTSQueryRunner? That would allow us to avoid some register shuffling around the first call.
While we need to declare all member functions in one place, we can define them in different compilation units. We already do this for member functions of mtr_t, log_t or buf_pool_t. This practice could be followed also for FTSQueryRunner.
|
|
||
| std::map<ulint, dict_table_t*> aux_tables; | ||
| trx_t *trx = optim->trx; | ||
| dict_table_t *aux_table= nullptr; | ||
| FTSQueryRunner sqlRunner(trx); | ||
| /* Setup the callback to use for fetching the word ilist etc. */ | ||
| fetch.read_arg = optim->words; | ||
| fetch.read_record = fts_optimize_index_fetch_node; | ||
| fetch.callback = &fts_optimize_index_fetch_node; | ||
| fetch.heap= sqlRunner.heap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The added code needs to be indented with TAB, just like the existing code. It could also make sense to reformat this entire function to use indentation by two spaces instead of TAB.
The formatting around = is inconsisetnt here. In the old InnoDB formatting style, there would always be a space before and after =.
| ib::error() << "(" << err << ") while updating last doc id for table" | ||
| << table->name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please avoid ib::logger in new code?
sql_print_error("(%s) while updating last doc id for table %.*sQ.%sQ",
ut_strerr(err),
int(table->name.dblen()), table->name.m_name,
table->name.basename());| trx->free(); | ||
| } | ||
| func_exit: | ||
| if (local_trx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When does !local_trx hold? In that case, which error would we report and where, in case err != DB_SUCCESS?
| ib::error() << "(" << err << ") during optimize, while adding a" | ||
| << " word to the FTS index."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we write something like the following here?
sql_print_error("(%s) during optimize for table %.*sQ.%sQ",
ut_strerr(err),
int(table->name.dblen()), table->name.m_name,
table->name.basename());| static | ||
| dberr_t fts_copy_doc_ids(dict_table_t *from, dict_table_t *to, | ||
| FTSQueryRunner *sqlRunner) | ||
| { | ||
| std::vector<doc_id_t> doc_ids; | ||
| dberr_t error= sqlRunner->prepare_for_read(from); | ||
| if (error == DB_SUCCESS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be a member function of FTSQueryRunner, or could that be the first parameter of the function? I’m not sure if we’d want ´noexcepthere because thestd::vector` without a custom allocator could throw an exception. In any case, I would rewrite this a bit to simplify the code that is generated for the first call:
static
dberr_t fts_copy_doc_ids(FTSQueryRunner *sqlRunner,
dict_table_t *from, dict_table_t *to)
{
if (dberr_t error= sqlRunner->prepare_for_read(from))
return error;
std::vector<doc_id_t> doc_ids;Do we actually need to buffer everything at once, or could we handle each doc_id one by one? Could we use multiple batches over a fixed-size stack-allocated array here to avoid the heap memory allocation? For example, see how btr_search_build_page_hash_index() can process multiple batches of fr[] that are allocated from the stack.
| dberr_t error= sqlRunner->prepare_for_read(from); | ||
| if (error == DB_SUCCESS) | ||
| { | ||
| ut_ad(UT_LIST_GET_LEN(from->indexes) == 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this assertion be at the start of this function? Can we assert the same about to->indexes? Can we assert anything about the table names?
dr-m
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to refactor the fts_sql_callback interface to use a function object.
| dict_table_t *table= sqlRunner.open_table(fts_table, &err); | ||
| if (table) | ||
| { | ||
| ut_ad(UT_LIST_GET_LEN(table->indexes) == 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could FTSQueryRunner::open_table() include this assertion, as well as assert something on the table name?
| /* In case of empty table, error can be DB_END_OF_INDEX | ||
| and key doesn't exist, error can be DB_RECORD_NOT_FOUND */ | ||
| if (err == DB_END_OF_INDEX || err == DB_RECORD_NOT_FOUND) | ||
| err= DB_SUCCESS; | ||
| } | ||
| table->release(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The err adjustment could be done after table->release(). That could lead to simpler code, depending on how smart the optimizer in the compiler is.
| *value->f_str = '\0'; | ||
| ut_a(value->f_len > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the assertion be moved to the start of the function, like they were before this refactoring? Here it is inside a conditional execution path.
Some comment about value->f_len would be useful here. Apparently, it initially specifies the allocated size of the buffer. In read_fts_config() it would be transformed to the actual length. Shouldn’t we also set value->f_len=0 in case no record was fetched (err is one of the two values that we map to DB_SUCCESS)?
| err= sqlRunner.record_executor(clust_index, READ, MATCH_UNIQUE, | ||
| PAGE_CUR_GE, &read_fts_config, value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we pass a functor object that would in this case wrap both read_fts_config and value? Currently we are using a type-unsafe method of passing void* to read_fts_config().
| FTSQueryRunner sqlRunner(trx); | ||
| dict_table_t *fts_config= sqlRunner.open_table(&fts_table, &err); | ||
| if (fts_config) | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this whole section of code be moved to a new function in fts0config.cc? In that way, read_fts_config could be defined as static in that file.
|
This has apparently been superceded by #4443. |
Description
How can this PR be tested?
./mtr --suite=innodb_fts
If the changes are not amenable to automated testing, please explain why not and carefully describe how to test manually.
Basing the PR against the correct MariaDB version
mainbranch.PR quality check