MDEV-28730 Remove internal parser usage from InnoDB fts #4443
base: main
Introduce QueryExecutor to perform direct InnoDB record scans with a callback interface and consistent-read handling. It also handles basic DML operations on the clustered index of the table.
Newly added files: row0query.h and row0query.cc
The QueryExecutor class has the following APIs:
- read(): iterate the clustered index with RecordCallback
- read_by_index(): scan a secondary index and fetch the clustered row
- lookup_clustered_record(): resolve the PK from a secondary record
- process_record_with_mvcc(): build the record version via the read view and skip deleted records
- insert_record(): insert a tuple into the table's clustered index
- select_for_update(): lock the record that matches search_tuple
- update_record(): update the currently selected and X-locked clustered record
- delete_record(): delete the clustered record identified by the tuple
- delete_all(): delete all clustered records in the table
- replace_record(): try an update via select_for_update() + update_record(); if the record is not found, run insert_record()
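To make the callback-driven scan concrete, here is a minimal, self-contained sketch of the pattern. It is not the code from row0query.cc: Record, ScanAction and scan() are hypothetical stand-ins, the index is a plain std::vector, and MVCC is reduced to skipping delete-marked records.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a clustered index record.
struct Record
{
  uint64_t doc_id;
  std::string payload;
  bool delete_marked;
};

// What the scan should do after a record has been processed.
enum class ScanAction { CONTINUE, STOP };

// Hypothetical callback pair: compare() filters, process() consumes.
struct RecordCallback
{
  std::function<bool(const Record&)> compare;
  std::function<ScanAction(const Record&)> process;
};

// Visit records in index order, skip delete-marked ones, and return
// the number of records that were handed to process().
inline size_t scan(const std::vector<Record> &index,
                   const RecordCallback &cb)
{
  assert(cb.compare && cb.process);
  size_t processed= 0;
  for (const Record &rec : index)
  {
    if (rec.delete_marked)
      continue;
    if (!cb.compare(rec))
      continue;
    processed++;
    if (cb.process(rec) == ScanAction::STOP)
      break;
  }
  return processed;
}
```

The caller decides both which records qualify and when to stop, so the scan loop itself stays free of table-specific logic.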
Add FTSQueryExecutor class as a thin abstraction over QueryExecutor.
This class takes care of open, lock, read, insert, delete
for all auxiliary tables INDEX_[1..6], common FTS tables
(DELETED, DELETED_CACHE, BEING_DELETED, CONFIG..)
The FTSQueryExecutor class has the following functions:
Auxiliary table functions : insert_aux_record(), delete_aux_record(),
read_aux(), read_aux_all()
FTS common table functions : insert_common_record(), delete_common_record(),
delete_all_common_records(), read_all_common()
FTS CONFIG table functions : insert_config_record(), update_config_record(),
delete_config_record(), read_config(),
read_all_config(), read_config_with_lock()
Introduce CommonTableReader callback to collect doc_id_t from fulltext common tables (DELETED, BEING_DELETED, DELETED_CACHE, BEING_DELETED_CACHE). These tables share the same schema structure. Simplified all functions that use the DELETED, BEING_DELETED, DELETED_CACHE, BEING_DELETED_CACHE tables. These functions use executor.insert_common_record(), delete_common_record(), delete_all_common_records() instead of SQL or the query graph. fts_table_fetch_doc_ids(): Changed the signature of the function to pass the table name instead of fts_table_t.
Introduce ConfigReader callback to extract key, value from the fulltext config common table (CONFIG). This table has a <key, value> schema. Simplified all functions that use the CONFIG table. These functions use executor.insert_config_record(), update_config_record() instead of SQL or the query graph.
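The update-or-insert flow that replace_record() is described as following (try an update, fall back to an insert when the record is not found) can be sketched over a <key, value> table as follows. ConfigTable, Err and the in-memory std::map are assumptions made for illustration; the real executor works on InnoDB records and locks.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class Err { SUCCESS, RECORD_NOT_FOUND, DUPLICATE_KEY };

// Hypothetical in-memory model of a <key, value> table.
class ConfigTable
{
  std::map<std::string, std::string> rows_;
public:
  Err insert_record(const std::string &key, const std::string &value)
  {
    assert(!key.empty());
    return rows_.emplace(key, value).second
      ? Err::SUCCESS : Err::DUPLICATE_KEY;
  }

  Err update_record(const std::string &key, const std::string &value)
  {
    auto it= rows_.find(key);
    if (it == rows_.end())
      return Err::RECORD_NOT_FOUND;
    it->second= value;
    return Err::SUCCESS;
  }

  // Try the update first; insert only if the key does not exist.
  Err replace_record(const std::string &key, const std::string &value)
  {
    Err err= update_record(key, value);
    if (err == Err::RECORD_NOT_FOUND)
      err= insert_record(key, value);
    return err;
  }

  // Returns nullptr when the key is absent.
  const std::string *read(const std::string &key) const
  {
    auto it= rows_.find(key);
    return it == rows_.end() ? nullptr : &it->second;
  }
};
```

Any error other than "not found" from the update step is returned unchanged, so a failed update is never masked by an insert attempt.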
Introduce AuxCompareMode and AuxRecordReader to scan FTS auxiliary
indexes with compare+process callbacks.
Replace legacy SQL-graph APIs with typed executor-based ones:
-Add fts_index_fetch_nodes(trx, index, word, user_arg,
FTSRecordProcessor,compare_mode).
-Redefine fts_write_node() to use FTSQueryExecutor and fts_aux_data_t.
Implement write path via delete_aux_record (or) insert_aux_record.
Keep lock-wait retry handling and memory limit checks.
Change fts_select_index{,_by_range,_by_hash} return type
from ulint to uint8_t and simplify return flow.
Include fts0exec.h in fts0priv.h and update declarations accordingly.
Refactor fetch and optimize code to use QueryExecutor and standardize the processor API. Replaced legacy SQL-graph paths with QueryExecutor-based reads/writes: fts_query code now uses QueryExecutor::read() and read_by_index() with RecordCallback (updating fts_query_match_document(), fts_query_is_in_proximity_range(), and fts_expand_query() to call fts_query_fetch_document() instead of fts_doc_fetch_by_doc_id(), which was removed along with the FTS_FETCH_DOC_BY_ID_* macros). Rewrote fts_optimize_write_word() to delete (or) insert via FTSQueryExecutor::delete_aux_record()/insert_aux_record() using fts_aux_data_t.
- Removed fts0sql.cc file.
- Removed commented-out fts functions
- Removed fts_table_t from fts_query_t and fts_optimize_t
In addition to the CI failures needing correcting, does this mean Great to see the parser going away.
386e026 to
04ec1e9
- Fix compilation issue
- delete_all() moves the cursor to a user record once the left leaf page is opened
04ec1e9 to
ff6a64d
dr-m
left a comment
Here are some quick initial comments.
    @@ -0,0 +1,634 @@
    /*****************************************************************************
    Copyright (c) 2025, MariaDB Corporation.
There is no such legal entity anymore. I think it's MariaDB plc.
    dberr_t QueryExecutor::read_by_index(dict_table_t *table,
                                         dict_index_t *sec_index,
                                         dtuple_t *search_tuple,
                                         page_cur_mode_t mode,
                                         RecordCallback& callback) noexcept
    {
      ut_ad(table);
      ut_ad(sec_index);
      ut_ad(sec_index->table == table);
      ut_ad(!dict_index_is_clust(sec_index));

      dict_index_t *clust_index= dict_table_get_first_index(table);
      if (!clust_index) return DB_ERROR;
The function should be declared with __attribute__((nonnull)) and the related assertions removed. Why is search_tuple not a pointer to const? There is a space before each unary * but after the unary &. That is inconsistent.
Do we really need the if condition here? There should be no code path that would add a table to dict_sys such that it does not have any indexes.
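A minimal sketch of what the nonnull suggestion could look like, assuming GCC or Clang. Index, Tuple and tuple_at_or_after_first() are hypothetical stand-ins rather than InnoDB types; only the attribute itself is a real compiler feature.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for dict_index_t and dtuple_t.
struct Index { size_t n_fields; int first_key; };
struct Tuple { int key; };

// nonnull lets GCC and Clang diagnose a literal null argument
// (-Wnonnull) at compile time, so run-time null assertions can go.
// The tuple is only read, hence a pointer to const.
__attribute__((nonnull))
inline bool tuple_at_or_after_first(const Index *index,
                                    const Tuple *search_tuple) noexcept
{
  // Invariants that are not about null pointers remain assertions.
  assert(index->n_fields > 0);
  return search_tuple->key >= index->first_key;
}
```

Note that the attribute also allows the compiler to assume the arguments are non-null when optimizing, so callers really must uphold the contract.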
    if (!lookup_clustered_record(table, sec_index, clust_index,
                                 sec_rec, callback, match_count))
      break;
This does not distinguish corruption and "not found". Shouldn’t the return type be dberr_t?
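One way to read this suggestion is sketched below, with a three-valued result in place of bool. The model is an assumption: a secondary index entry whose primary key is missing from the clustered index is treated as corruption, while a row that exists but is not visible to the reader counts as "not found". LookupErr, ClusteredRow and both functions are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

enum class LookupErr { SUCCESS, RECORD_NOT_FOUND, CORRUPTION };

// Hypothetical clustered row: only its visibility matters here.
struct ClusteredRow { bool visible; };
using ClusteredIndex= std::map<uint64_t, ClusteredRow>;

inline LookupErr lookup_clustered(const ClusteredIndex &clust, uint64_t pk)
{
  auto it= clust.find(pk);
  if (it == clust.end())
    return LookupErr::CORRUPTION;       // dangling secondary entry
  return it->second.visible
    ? LookupErr::SUCCESS : LookupErr::RECORD_NOT_FOUND;
}

// Skip invisible rows, but abort the whole scan on corruption.
inline LookupErr scan_secondary(const std::vector<uint64_t> &sec_pks,
                                const ClusteredIndex &clust,
                                size_t &match_count)
{
  match_count= 0;
  for (uint64_t pk : sec_pks)
  {
    switch (lookup_clustered(clust, pk)) {
    case LookupErr::SUCCESS:
      match_count++;
      break;
    case LookupErr::RECORD_NOT_FOUND:
      break;
    case LookupErr::CORRUPTION:
      return LookupErr::CORRUPTION;
    }
  }
  return LookupErr::SUCCESS;
}
```

With a bool return, the two non-success cases would both end the loop and the caller could not report corruption.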
    dtuple_t *clust_tuple= row_build_row_ref(ROW_COPY_DATA, sec_index,
                                             sec_rec, m_heap);
Why not ROW_COPY_POINTERS? At least add a comment about that.
    /* Extract primary key from secondary index record */
    dtuple_t *clust_tuple= row_build_row_ref(ROW_COPY_DATA, sec_index,
                                             sec_rec, m_heap);
    if (!clust_tuple) return false;
This looks like dead code. If ref = dtuple_create(heap, ref_len); returned nullptr to row_build_row_ref(), the subsequent statement dict_index_copy_types(ref, clust_index, ref_len); should crash. So, a nullptr return value should be impossible and this should be dead code.
    if (err == DB_SUCCESS) btr_pcur_move_to_next(&pcur, &mtr);
    if (err != DB_SUCCESS)
The return value of btr_pcur_move_to_next() is being ignored.
It is a bit harder to recognize the conditional branches when there is no line break between the if condition and the body.
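Both points can be illustrated with a small loop that checks the result of the cursor move and puts every branch body on its own line. Cursor here is a hypothetical stand-in for a persistent cursor over a non-empty vector, and treating a negative row as an error is an assumption made only so that the sketch has a failure path.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Err { SUCCESS, FAIL };

// Hypothetical cursor over a non-empty sequence of rows.
class Cursor
{
  const std::vector<int> &rows_;
  size_t pos_= 0;
public:
  explicit Cursor(const std::vector<int> &rows) : rows_(rows)
  {
    assert(!rows.empty());
  }

  int current() const { return rows_[pos_]; }

  /** @return whether the cursor moved; false at the last record */
  bool move_to_next()
  {
    if (pos_ + 1 >= rows_.size())
      return false;
    pos_++;
    return true;
  }
};

// Visit rows until an error occurs or the cursor cannot advance.
inline Err visit_all(Cursor &cur, std::vector<int> &visited)
{
  for (;;)
  {
    const int row= cur.current();
    if (row < 0)
      return Err::FAIL;      // assumption: negative means invalid
    visited.push_back(row);

    if (!cur.move_to_next())
      return Err::SUCCESS;   // end of index reached
  }
}
```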
    dberr_t QueryExecutor::delete_all(dict_table_t *table) noexcept
    {
      dict_index_t *index= dict_table_get_first_index(table);
      btr_pcur_t pcur;
      mtr_t mtr(m_trx);
Why is mtr not a data member of QueryExecutor?
    /* >= comparison (range scan from word) */
    GREATER_EQUAL,
    /* > comparison (exclude exact match) */
    GREATER,
    /* LIKE pattern matching (prefix match) */
    LIKE,
    /* = comparison (exact match) */
    EQUAL
These should be Doxygen comments starting with /**.
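For illustration, the enumerators with Doxygen-style comments might look like the sketch below, together with a hypothetical helper that shows what each mode means for a single word. The enum name follows the AuxCompareMode mentioned in the commit message; aux_word_matches() and its byte-wise std::string comparison are assumptions, since the real code would compare according to the index collation.

```cpp
#include <cassert>
#include <string>

/** Comparison mode for matching words in an auxiliary index */
enum class AuxCompareMode
{
  /** >= comparison (range scan from word) */
  GREATER_EQUAL,
  /** > comparison (exclude exact match) */
  GREATER,
  /** LIKE pattern matching (prefix match) */
  LIKE,
  /** = comparison (exact match) */
  EQUAL
};

/** Check whether an index word satisfies the search word.
@param rec_word  word stored in the index record
@param search    word (or prefix) being searched for
@param mode      comparison mode
@return whether rec_word matches */
inline bool aux_word_matches(const std::string &rec_word,
                             const std::string &search,
                             AuxCompareMode mode)
{
  switch (mode) {
  case AuxCompareMode::GREATER_EQUAL:
    return rec_word >= search;
  case AuxCompareMode::GREATER:
    return rec_word > search;
  case AuxCompareMode::LIKE:
    // Prefix match: compare only the first search.size() bytes.
    return rec_word.compare(0, search.size(), search) == 0;
  case AuxCompareMode::EQUAL:
    return rec_word == search;
  }
  assert(false);  // unreachable for valid enum values
  return false;
}
```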
    #ifndef INNOBASE_FTS0QUERY_H
    #define INNOBASE_FTS0QUERY_H
Let’s just use #pragma once like we do in many other files.
    FTSQueryExecutor executor(trx, nullptr, table);
    ConfigReader reader;
    dberr_t err= executor.read_config(name, reader);
    if (err == DB_SUCCESS)
    {
      if (reader.config_pairs.size() != 1) err= DB_ERROR;
      else
      {
        const std::string& config_value= reader.config_pairs[0].second;
        ulint max_len= ut_min(value->f_len - 1, config_value.length());
        memcpy(value->f_str, config_value.c_str(), max_len);
        value->f_len= max_len;
        value->f_str[value->f_len]= '\0';
      }
    }
    else value->f_str[0]= '\0';
Could we use a different API where the executor mini-transaction would stay open and a page latch would protect the config page where the data resides? No std::string, just st_::span<const char> pointing straight to the index page? The config values cannot ever span multiple BLOB pages, right?
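A rough sketch of the idea, under stated assumptions: the "page" is a byte vector with a made-up layout of [key length][key][value length][value], CharSpan is a hypothetical stand-in for a span of const char, and the returned view is only valid while the page is kept unchanged (in the real code, while the latch is held). The final copy keeps the truncate-and-terminate behaviour of the quoted hunk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical non-owning view; no data is copied.
struct CharSpan { const char *data; size_t size; };

// Append one pair using the made-up layout described above.
inline void append_pair(std::vector<unsigned char> &page,
                        const std::string &key, const std::string &value)
{
  assert(key.size() < 256 && value.size() < 256);
  page.push_back(static_cast<unsigned char>(key.size()));
  page.insert(page.end(), key.begin(), key.end());
  page.push_back(static_cast<unsigned char>(value.size()));
  page.insert(page.end(), value.begin(), value.end());
}

// Return a view of the value that points straight into the page,
// or {nullptr, 0} if the key is not present.
inline CharSpan find_value(const std::vector<unsigned char> &page,
                           const std::string &key)
{
  size_t pos= 0;
  while (pos < page.size())
  {
    const size_t key_len= page[pos++];
    const unsigned char *key_ptr= page.data() + pos;
    pos+= key_len;
    const size_t value_len= page[pos++];
    const unsigned char *value_ptr= page.data() + pos;
    pos+= value_len;
    if (key_len == key.size() && !memcmp(key_ptr, key.data(), key_len))
      return {reinterpret_cast<const char*>(value_ptr), value_len};
  }
  return {nullptr, 0};
}

// Copy at most buf_size - 1 bytes and always NUL-terminate.
inline size_t copy_value(CharSpan value, char *buf, size_t buf_size)
{
  assert(buf && buf_size > 0);
  const size_t len= value.size < buf_size - 1 ? value.size : buf_size - 1;
  if (len)
    memcpy(buf, value.data, len);
  buf[len]= '\0';
  return len;
}
```

The only copy happens in copy_value(), directly from the page into the caller's buffer, with no intermediate std::string.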
- Use btr_pcur_open_on_user_rec() instead of btr_pcur_open() in QueryExecutor::read() and QueryExecutor::read_by_index()
fts_optimize_table(): Assigns thd to the transaction even when it is called via a user thread or the fulltext optimize thread. Acquires MDL_SHARED_NO_WRITE on the table to avoid any DDL/DML during fulltext optimization. Tweaked dict_acquire_mdl_shared to acquire MDL_SHARED_NO_WRITE based on the input parameter.
Description
Remove internal parser/SQL-graph usage and migrate FTS paths to QueryExecutor
Introduced QueryExecutor (row0query.{h,cc}) and FTSQueryExecutor abstractions for
clustered, secondary scans and DML.
Refactored fetch/optimize code to use QueryExecutor::read(), read_by_index()
with RecordCallback, replacing SQL graph flows
Added CommonTableReader and ConfigReader callbacks for common/CONFIG tables
Implemented fts_index_fetch_nodes(trx, index, word, user_arg, FTSRecordProcessor, compare_mode)
and rewrote fts_optimize_write_word() to delete/insert via executor with fts_aux_data_t
Removed fts_doc_fetch_by_doc_id() and FTS_FETCH_DOC_BY_ID_* macros, updating callers to
fts_query_fetch_document()
Tightened fts_select_index{,_by_range,_by_hash} return type to uint8_t
Removed fts0sql.cc and eliminated fts_table_t from fts_query_t/fts_optimize_t.
Release Notes
Removed the SQL parser usage from the fulltext subsystem.
How can this PR be tested?
For QA purposes, run RQG testing involving the fulltext subsystem.
Basing the PR against the correct MariaDB version
main branch.
PR quality check