fix: Avoid creating MoveTables in case of non-empty target tables
#16826
Review Checklist
Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.
General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
Signed-off-by: Noble Mittal <noblemittal@outlook.com>
6e16950
to
a9ea332
Compare
go/vt/vtctl/workflow/utils.go
Outdated
var selectQueries []string
for _, t := range tables {
	selectQueries = append(selectQueries, fmt.Sprintf("(select '%s' from %s limit 1)", t, t))
}
query := strings.Join(selectQueries, "union all")
Why do this for every shard? Can't we build the query once and then use it on each shard?
Keep in mind that the size of this query is unbounded, so we could hit max_allowed_packet. I want to say that there's also a limit on the number of UNIONs you can do in a single statement but I didn't find any docs on that in a quick search. For JOINs e.g.:
The maximum number of tables that can be referenced in a single join is 61. This includes a join handled by merging derived tables and views in the FROM clause into the outer query block (see [Section 10.2.2.4, “Optimizing Derived Tables, View References, and Common Table Expressions with Merging or Materialization”](https://dev.mysql.com/doc/refman/8.4/en/derived-table-optimization.html)).
So that may be what we bump up against here.
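A minimal sketch of the "build the query once, then use it on each shard" suggestion. The helper name buildNonEmptyCheckQuery, the table names, and the shard names are hypothetical and are not taken from the PR; the per-shard execution is only printed here, and table names are interpolated unescaped exactly as in the snippet under review.

```go
package main

import (
	"fmt"
	"strings"
)

// buildNonEmptyCheckQuery returns a single statement that yields one row
// (the table name) for every listed table containing at least one row.
// Hypothetical helper for illustration; not the function in the PR.
func buildNonEmptyCheckQuery(tables []string) string {
	selects := make([]string, 0, len(tables))
	for _, t := range tables {
		selects = append(selects, fmt.Sprintf("(select '%s' from %s limit 1)", t, t))
	}
	return strings.Join(selects, " union all ")
}

func main() {
	tables := []string{"customer", "corder"} // assumed table names
	shards := []string{"-80", "80-"}         // assumed shard names

	// The query depends only on the table list, so it is built once,
	// outside the per-shard loop.
	query := buildNonEmptyCheckQuery(tables)
	for _, shard := range shards {
		// A real implementation would execute the query on the shard here.
		fmt.Printf("shard %s: %s\n", shard, query)
	}
}
```

This removes the repeated string building but does nothing about the unbounded statement size raised above.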
Thanks Matt for finding this. Fixed that. But I couldn't find the limit on UNIONs in any docs or web search either. What do you think the limit should be? Should we break the query down into multiple queries?
And defer that work to vitessio#16826 Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Noble Mittal <noblemittal@outlook.com>
c3b1995
to
3b7dc15
Compare
Signed-off-by: Noble Mittal <noblemittal@outlook.com>
3b7dc15
to
8393dea
Compare
- re := regexp.MustCompile(qry)
- if re.MatchString(qry) {
+ re := regexp.MustCompile(qry[1:])
+ if re.MatchString(query) {
This was probably a mistake.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #16826 +/- ##
==========================================
- Coverage 69.51% 69.45% -0.06%
==========================================
Files 1569 1571 +2
Lines 202517 203121 +604
==========================================
+ Hits 140780 141083 +303
- Misses 61737 62038 +301
I am working on the failing |
alreadyExistingTables := make([]string, len(hasTargetTable))
i := 0
for t := range hasTargetTable {
	alreadyExistingTables[i] = t
	i++
}
You can instead use maps.Keys(hasTargetTable) with validateEmptyTables.
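A small sketch of what the maps.Keys suggestion could look like. It is unclear from the comment which maps package the reviewer meant; this version assumes the standard library of Go 1.23 or later, where maps.Keys returns an iterator that has to be collected into a slice. The value type of hasTargetTable (bool here) and the helper name existingTables are assumptions for illustration.

```go
package main

import (
	"fmt"
	"maps"
	"slices"
)

// existingTables returns the keys of hasTargetTable as a sorted slice.
// Sorting is optional, but it keeps error messages and tests deterministic,
// since Go map iteration order is random.
func existingTables(hasTargetTable map[string]bool) []string {
	return slices.Sorted(maps.Keys(hasTargetTable))
}

func main() {
	// Assumed contents; the real map is populated elsewhere in the PR.
	hasTargetTable := map[string]bool{"customer": true, "corder": true}
	fmt.Println(existingTables(hasTargetTable))
}
```

With golang.org/x/exp/maps instead, maps.Keys already returns a slice, so no collection step would be needed.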
for _, t := range tables {
	selectQueries = append(selectQueries, fmt.Sprintf("(select '%s' from %s limit 1)", t, t))
}
query := strings.Join(selectQueries, "union all")
I think we need a way to handle cases where there are many tables as the number of tables is unbounded. Two factors we'll have to consider:
- This one statement could be so long that it is beyond max_allowed_packet
- This one statement could affect more than the maximum allowed tables
In either case, the user is stuck with no way out, unless we add a new flag that skips this check (allowing tables with existing data) or they split the workflow up, which itself might be daunting if it means manually specifying hundreds or thousands of tables.
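One possible way to address both factors is to split the check into several smaller statements. The sketch below is only an illustration of that idea, not the approach the PR adopted: batchNonEmptyCheckQueries is a hypothetical helper, and the maxTables and maxBytes limits are caller-supplied assumptions, since the thread did not find a documented limit on UNIONs.

```go
package main

import (
	"fmt"
	"strings"
)

const unionSep = " union all "

// batchNonEmptyCheckQueries splits the per-table selects into several
// statements so that no statement references more than maxTables tables
// or grows beyond maxBytes bytes. A single select that is already larger
// than maxBytes is still emitted on its own.
func batchNonEmptyCheckQueries(tables []string, maxTables, maxBytes int) []string {
	var queries, current []string
	size := 0
	flush := func() {
		if len(current) > 0 {
			queries = append(queries, strings.Join(current, unionSep))
			current, size = nil, 0
		}
	}
	for _, t := range tables {
		sel := fmt.Sprintf("(select '%s' from %s limit 1)", t, t)
		added := len(sel)
		if len(current) > 0 {
			added += len(unionSep)
		}
		// Start a new statement when either limit would be exceeded.
		if len(current) >= maxTables || (len(current) > 0 && size+added > maxBytes) {
			flush()
			added = len(sel)
		}
		current = append(current, sel)
		size += added
	}
	flush()
	return queries
}

func main() {
	tables := []string{"t1", "t2", "t3", "t4", "t5"} // assumed table names
	// Illustrative limits only: at most 2 tables and 1 KiB per statement.
	for i, q := range batchNonEmptyCheckQueries(tables, 2, 1024) {
		fmt.Printf("batch %d: %s\n", i, q)
	}
}
```

Batching trades one unbounded statement for more round trips per shard, so it would still multiply with the number of shards.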
There is one issue already in the existing code: We get the list of tables just for one shard in

This extends to the new check we are adding. If a table is present in the first code then we will also check for data in the table in other shards. If the other shards don't have that table then the check for data will fail with a "table not found"

In addition, the

We may want to consider a different approach that fixes all the issues mentioned above:

For a workflow with 1000s of tables and 1000s of shards this will imply a large number of queries. We may want to add a --no-validations flag to bypass these checks in such cases. But since this is run during the

@noble, I suggest you wait before coding further on this until we reach an agreement on the final approach.

This approach avoids the issue of size of query/number of joined tables in a query that we have with the union approach.

So that will also have to be batched and a query with several tables being joined even with a union may have performance impacts which might reduce the effect of the order of magnitude increase in queries.
The changes were moved to: #16874. Closing this one.
Description
This PR adds a validation check for empty tables in the target keyspace while creating MoveTables workflow.
Related Issue(s)
Screenshot
Checklist
Deployment Notes