-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
MDEV-28302 Configurable DEFAULTs for CHANGE MASTER
#4430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
These iterable tuples build towards refactoring Master & Relay Log info with DRY loops – their persistence for MDEV-37530, and possibly SHOW REPLICA STATUS and even CHANGE MASTER in the future.
Link `MasterInfoFile` & `RelayLogInfoFile` structs by making corresponding `Master_info` & `Relay_log_info` properties references to the fields. Properties are not moved for now due to the large number of changes. Also adjust usages affected by the addition of fields with `DEFAULT`s in preparation for MDEV-28302 “CHANGE MASTER TO …=DEFAULT”, except for persistence code that will be separately adjusted for MDEV-37362.
This commit finishes expanding MDEV-28302 (WIP)’s `DEFAULT`-capable, loop-based implementation of Master & Relay Log info file persistence. This conclusion realizes MDEV-37530’s goal. This involves moving – and improving, while here – the string and ID array parsers to `InfoFile::Persistent::load_from()` implementations. They and the other overrides supercede the `init_*var_from_file()` helpers and are will not overflow `int`s as reported by MDEV-38020. Also while here, fix MDEV-38010 “trailing garbage does not error”, as the MDEV-37530 refactors will disconnect them from the patch for prior versions. On the topic of transitioning to iterable fields interface, this commit also removes these functions: * `Domain_id_filter::init_ids()`: unused starting from this commit * `Domain_id_filter::store_ids(THD *)` & `prot_store_ids()`: unused since the Information Schema reimplemented SHOW REPLICA STATUS. This commit also has a few minor adjustments and fixes for the previous two commits.
I don’t mind highlighting them; but for records’ sake, TODO: squash
DEFAULTs for CHANGE MASTER
ParadoxV5
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Large projects come with large discussions, like power and responsibility.
sql/rpl_relay_log_info_file.hh
Outdated
| /// Relay_Master_Log_File (of the event *group*) | ||
| StringField<> read_master_log_file; | ||
| /// Exec_Master_Log_Pos (of the event *group*) | ||
| IntField<my_off_t> read_master_log_pos; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If (hypothetically) we add the IO thread’s Relay Log position (i.e., Gtid_IO_Pos but for file-position mode) to CM/SSS, what would we name it?
|
|
||
|
|
||
| /// Persistence interface for an unspecified item | ||
| struct Persistent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about what CHANGE MASTER would look like as a table that supports SQL, such as SELECT and INSERT.
I decided that these new structs are alright, since a system table does not permit DDL (e.g., ALTER TABLE).
I am mainly concerned about an “unnecessary performance tax” for using a full-blown storage engine such as Aria, MyISAM or Memory, particularly their ACID compliance mechanisms.
Speaking of system tables, perhaps some of them can utilize a “system” storage engine, too, where storage is C++-first and DDL is not supported.
SHOW REPLICA STATUS could use a virtual method in this class, too; I am not sure.
After all, we still need to write out info files, at least on demand, if we support downgrades.
What’s the point of their forward compatibility, after all?
sql/rpl_master_info_file.hh
Outdated
| /// }@ | ||
|
|
||
|
|
||
| struct MasterInfoFile: InfoFile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name conflict, I know.
But, Master_info & Relay_log_info also appear to contain runtime data for IO and SQL threads, respectively.
This inadequacy in organization makes maintenance difficult to know what belongs to whom, not to mention one of the 0b10 hardest problems in software development.
|
|
||
| if (init_intvar_from_file(&master_log_pos, &mi->file, 4) || | ||
| init_strvar_from_file(mi->host, sizeof(mi->host), &mi->file, 0) || | ||
| init_strvar_from_file(mi->user, sizeof(mi->user), &mi->file, "test") || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test Who?
| if (c == my_b_EOF) | ||
| return true; | ||
| buf[i]= c; | ||
| if (c == /* End of Count */ ' ' || c == /* End of Line */ '\n') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: extend my_b_gets() to accept multiple delimiters
| ../sql/filesort_utils.cc ../sql/sql_digest.cc | ||
| ../sql/filesort.cc ../sql/grant.cc | ||
| ../sql/gstream.cc ../sql/slave.cc | ||
| ../sql/gstream.cc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when moving these functions to mysys, don't forget to remove slave.cc from libmysqld/CMakeLists.txt
⸺init_intvar_from_file()’s code documentation
| gtid_supported ? | ||
| enum_master_use_gtid::SLAVE_POS : enum_master_use_gtid::NO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the code, it looks like if the primary is before MariaDB 10 (back when GTIDs were not available), the replica sets its master_use_gtid to NO and remembers this as its new default for RESET REPLICA.
IMO, it would be nice if the replica would automatically default back to Slave_Pos if the connection changes to a MariaDB 10+ primary.
- Here, this “remember
NOas its newDEFAULT” does not override the option-configured default, since the primary’s version is not known until the replica (re)connects to it.
Should it? - Or, maybe we should silently fall back to
NOwithout affecting this config.- Currently, if the primary is MariaDB 10+,
RESET REPLICAalso warns about changingUsing_Gtid(sic) ifmaster_use_gtidisn’t alreadySlave_Pos.
- Currently, if the primary is MariaDB 10+,
- In fact, how old a MariaDB/MySQL version do we maintain the backward compatibility for anyway?
| m_row.ssl_allowed= mi->master_ssl ? | ||
| #ifdef HAVE_OPENSSL | ||
| PS_SSL_ALLOWED_YES | ||
| #else | ||
| PS_SSL_ALLOWED_IGNORED | ||
| #endif | ||
| : PS_SSL_ALLOWED_NO; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PerfSchema has its own copies of enum_rpl_yes_no, enum_ssl_allowed and enum_using_gtid, but with each value one more than the corresponding Master_info enum.
It could be that the enums in the PerfSchema storage engine start from 1 instead of 0.
What should we do about them?
Not to mention the rest of the Replica Status fields duplicated between the InfoSchema and the PerfSchema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am gonna take a short break from this project.
Known issues with master_heartbeat_period in the draft so far
- It was reset not with RESET REPLICA but with CHANGE MASTER.
And now I removed this unintuitive CHANGE MASTER reset, but forgot to add it to RESET REPLICA. - MDEV-35879 Slave_heartbeat_period is imprecise:
I am getting4294967.040. (We might actually make this a blocker.)
bnestere
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ParadoxV5 !
I gave the patch a read-through. Not super in-depth b/c it is still a draft (thanks for the early publish!), but enough to get the idea, and I think it is good so far. I left some notes.
A general concern I have is that it is effectively un-loop-unrolling and adding in additional function calls, and I'd worry it would make MDEV-37684 worse. So I think we should run some benchmarks with this to make sure it isn't incurring any additional overhead.
sql/rpl_master_info_file.hh
Outdated
| bool load_from(IO_CACHE *file) override | ||
| { | ||
| buf[1]= false; // not default | ||
| return StringField<>::load_from(file); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This overwrites buf, yet you use buf to track "defaultness". That can make things hard to debug, I'd suggest to just have a variable to track default.
sql/rpl_master_info_file.hh
Outdated
| static constexpr const char END_MARKER[]= "END_MARKER"; | ||
| /// An iterable for the `key=value` section of `@@master_info_file` | ||
| // C++ default allocator to match that `mysql_execute_command()` uses `new` | ||
| inline static const std::unordered_map<std::string_view, mem_fn> FIELDS_MAP= { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why unordered? As the FIELDS_MAP is iterated over when saving the KV pairs, wouldn't that make it non-deterministic? The fields should be written in the same order as pre-MDEV-37530 so users can downgrade without breaking anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pre-MDEV-37530 code checks keys, too, so the order is not important.
I can see that the predecessor who added this key-value section in the first place had order independence in mind.
(That code is a chain of if (got_eq && !seen_X && !strcmp(buf, "X")), however, which will look ridiculous under my (backward-compatible!) plan of stuffing the DEFAULT choices in that section, not to mention performance comparisons with hash lookups.)
| MariaDB 10.0 does not have the `END_MARKER` before any left-overs at | ||
| the end of the file, so ignore any non-first occurrences of a key. | ||
| */ | ||
| auto seen= std::unordered_set<const char *>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this used for? I see it is inserted into, but never used after that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else if (seen.insert(key).second) // if no previous insertionThe second member of the return of std::unordered_set::insert(value_type&&) is true if and only if the set doesn’t already have the element.
I’ll expand the comment to describe this API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, my blunder 🤦. Though with that said, we should prefer MariaDB data types (when available) to promote a standard way of doing things. Here, you should be able to just use HASH (and it should be used in the same way as sql_hset.h's Hash_set, though I'm not sure if you can just use Hash_set because you're just using const char *.).
And another question - do we need to track seen? Would it otherwise just overwrite the last value seen (where there shouldn't be duplicates anyway)? That's how config options in my.cnf work, and I'd think we can keep that same behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, you should be able to just use
HASH(and it should be used in the same way assql_hset.h'sHash_set, though I'm not sure if you can just useHash_setbecause you're just usingconst char *.).
Indeed, though I prefer the C++ standard library to roll fewer things of my own.
Here, I can use Hash_set, but I would either have to specify a charset (for comparing contents) or a comparator of (constant) pointers by their face values.
I am actually looking forward to a future where we use PSI_memory_key-configured C++ allocators with the best C++ standard library implementation.
Would it otherwise just overwrite the last value seen (where there shouldn't be duplicates anyway)? That's how config options in
my.cnfwork, and I'd think we can keep that same behavior.
I asked this question too.
The “code document” of seen is slightly edited from the comment at the corresponding section in rpl_mi.cc, which was:
10.0 does not have the END_MARKER before any left-overs at the end
of the file. So ignore any but the first occurrence of a key.
That is, a file from 10.0 could have two entries of a key by chance and intend the first to be the real one.
(Of course, you could also argue that 10.0 is long EOL, nobody should have them around, and we can get rid of code that holds those compatibilities together.)
sql/rpl_master_info_file.hh
Outdated
| /// }@ | ||
|
|
||
|
|
||
| inline static const std::initializer_list<mem_fn> FIELDS_LIST= { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be combined with FIELDS_MAP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that once.
But nearly all of the MDEV-28302 fields are in the line-by-line section (FIELDS_LIST), and I want to write those fields both in this old-style section to preserve the defaulted values for downgrades, and in the key-value section to indicate the DEFAULT state.
The differences in the two sections encouraged me to simply “unwind this loop”.
Developing forward compatibility for a feature in older versions (where only the Enterprise Server receive features) sure is a task.
* Rename files and classes to match the MariaDB code style * Fix compile issues on different setups * Fix RESET REPLICA * Kludge-fix `4294967.040` due to MDEV-35879. This nit is becoming a blocker if I continue on this route…
sql/rpl_master_info_file.h
Outdated
| bool load_from_file() override | ||
| { | ||
| /// Repurpose the trailing `\0` spot to prepare for the `=` or `\n` | ||
| static constexpr size_t MAX_KEY_SIZE= sizeof("ssl_verify_server_cert"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Syntax error on Windows ARM64.
Google results say that it is because MAX_KEY_SIZE is an existing macro (defined outside of our repository).
MDEV-35879 Slave_heartbeat_period is imprecise
* Increase maximum to `std::numeric_limits<uint32_t>::max() ÷ 1000`,
i.e., `4294967.295` *exactly*
* Excludes its additional tests as they are exclusive to that issue fix.
* But not the `0.00049` case, which was *changed*, not *added*.
* Some changes are refactors and so are exclusive to the `main` branch.
* Delete tests in `rpl.rpl_heartbeat`
that duplicates `rpl.rpl_heartbeat_basic`
… is not `const char *`. No reason to use `string_view` for this, then, if that’s the case.
to better share code with server options without the runtime warnings deciding on how the code is shared. The previous decisions were short-sighted and, with that cherry-pick, became my footgun. Also, style change: use `reinterpret_cast<>` over C-cast
* `MAX_KEY_SIZE` → `LONGEST_KEY_SIZE`
* `MAX_KEY_SIZE` seems to be is an existing *externally
defined* macro in the GitHub Workflow `windows-arm64.yml`.
* `error` → `unexpected_error`
* These operations (round, shift and export) are assumed
(and asserted in debug builds) to never error.
* Fields now also directly accept `DEFAULT` via the implementation type (unless the caller wishes to stick with the effective type and `set_default()`) * Expose `trilean` from `Optional_bool_field` * Include guards * C++20 when, so I can has modules * `explicit operator bool()` doesn’t need `static_cast` when as an `if` condition.
This commit is the `DEFAULT` keyword and MTR tests part and also includes tweaks to the server options’ implementation.
* `std::optional` is not *trivially* `union`-safe, so use a `struct` stand-in, at least until I figure out a better solution. * Move `mi->connects_tried= 0;` back so I can go back to using `Master_info_file` over `Master_info` * So it’s not just me who cannot restart the server, huh. Also, add a test section that tests feeding invalid values to `--master-heartbeat-period`
I’m playing Whac-A-Mole, am I not…
ParadoxV5
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
buildbot/amd64-msan-clang-20 has been consistently segfaulting during server startup in the recent snapshots.
I am unable to tell from the logs and the first crashing snapshot’s logs; I would like your assistance.
Besides this problem and commit squashing, the project is ready for review, @bnestere & @andrelkin.
| sprintf(static_cast<char*>(buff), "%.3lf", | ||
| mi->master_heartbeat_period/1000.0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, maybe the cancelled master_heartbeat_period → string method still has one (and only one) use.
Though the odd truth is that status variables are stuck with all values being strings, even though many (if not the majority) are numeric.
| mi->master_use_gtid.set_default(); | ||
| mi->master_heartbeat_period.set_default(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was no “USE_GTID_DEFAULT” previously, so master_use_gtid was reset only conditionally.
master_heartbeat_period, OTOH, was reset in CHANGE MASTER rather than RESET REPLICA.
I wonder, though, why doesn’t RESET REPLICA ALL reset all CHANGE MASTER fields of the default unnamed connection, whereas it is a complete deletion for named (as in multi-source) connections?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A question for the future (no need to answer immediately):
Should other fields (the ones not listed) “recognize” the DEFAULT keyword as well?
Fields with trivial DEFAULTs
- Primary & Relay File/Pos:
MASTER_LOG_FILE:DEFAULTis empty, and empty means “first”.MASTER_LOG_POS/RELAY_LOG_POS:DEFAULTis 4- silently set to
DEFAULTif set to less
- silently set to
- ignored with a warning when using GTID
MASTER_PORT:DEFAULTis the MariaDB port- 0 is (technically) invalid.
IGNORE_SERVER_IDS/DO_DOMAIN_IDS/IGNORE_DOMAIN_IDS:DEFAULTis empty.MASTER_DELAY:DEFAULTis 0.int32_ton paper, unsigned in practice: errors if set larger thanMASTER_DELAY_MAX(0x7F'FF'FF'FF)
No DEFAULT – “recognizes” as an error
RELAY_LOG_FILE-
The
CHANGE MASTERstatement usually deletes all relay log files. However, if theRELAY_LOG_FILEand/orRELAY_LOG_POSoptions are specified, then existing relay log files are kept.
⸺ https://mariadb.com/docs/server/reference/sql-statements/administrative-sql-statements/replication-statements/change-master-to#relay-log-options - Empty is invalid.
- GTID preferences aside, what would “first” be useful for?
- also ignored with a warning when using GTID
-
MASTER_HOST&MASTER_USER: “DEFAULT” is empty, which can be set, but the IO thread will err during startup.
| (slave_net_timeout/2.0)); | ||
| if (lex_mi->heartbeat_period) | ||
| lex_mi->heartbeat_period(mi); | ||
| mi->received_heartbeats= 0; // counter lives until master is CHANGEd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the intent to have CHANGE MASTER always reset this?
Was it because master_heartbeat_period is always changed regardless of whether it was configured in the CHANGE MASTER?
Description
MDEV-28302 Configurable defaults for CHANGE MASTER
Many CHANGE MASTER fields typically have the same configurations between multi-source connections – namely:
When MDEV-25674 added
master_retry_countto CHANGE MASTER, it kept the server option--master-retry-countto be its default value.This commit back-adds corresponding server options for the defaults of the rest of those fields.
With them, the command line or config files can set up common configurations across replication sources (and even replicas).
CHANGE MASTER can override unset (defaulted) configurations per connection.
This commit also adds
DEFAULTkeyword support for all of those fields, so overridden configurations can reset to the default without RESET REPLICA.Side Effects
Supporting passing the
DEFAULTkeyword also enables settingmaster_connect_retryandmaster_retry_countto 0, which was previously disregarded.While here, the commit also increases
master_retry_countto 64-bit on LLP64 (e.g., Windows) to match LP64 (e.g., Linux).1. MDEV-37530 Refactor Master & Relay Log info to iterable tuples
The persistence of CHANGE MASTER is driven by loose or repetitive code.
This is not DRY and makes feature expansions (such as MDEV-28302
CHANGE MASTER …=DEFAULT) error-prone if not difficult.This commit refactors those blocks to loops over simple wrapper structs with a consistent interface.
This commit also includes preemptive support for preserving MDEV-28302’s
=DEFAULT.As such, unset fields (namely
master_connect_retry) now remember theirDEFAULTstates rather than whatever the default is at CHANGE MASTER time.Side Effects
For consistency’s sake,
master_heartbeat_periodis now reset at RESET REPLICA instead of CHANGE MASTER.(Yes, CHANGE MASTER previously resets it even when there is no
master_heartbeat_period=DEFAULT.)As this refactor will disconnect it from fixes for some open bugs in prior versions, this commit also:
Slave_heartbeat_periodis imprecisemaster_heartbeat_periodhas been increased to 4294967.295, i.e., (2³²-1)÷1000.master_heartbeat_periodnow rounds instead of truncates (rounds down).master_retry_count64-bit on LLP64 (e.g., Windows) to match LP64 (e.g., Linux)How can this PR be tested?
main.change_master_defaultthat runs its (unique) features.main.master_retry_count_basic(FIXME: this one is in a MDEV-28302 commit instead),rpl.rpl_heartbeatandrpl.rpl_heartbeat_basicto match the adjusted behaviors.PR quality check
orand a refactoring, and the PR is based against themainbranch.This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.