refactor(meta): simplify stream job table/source id assignment #19171

Merged
merged 3 commits into from
Nov 4, 2024


xxchan
Member

@xxchan xxchan commented Oct 29, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

We fill in table_id/source_id in several places, which seems unnecessary. This PR removes the redundant code to make the logic easier to understand and maintain.

After this PR, only 2 steps remain:

  1. In create_job_catalog, we assign the id to the StreamingJob.
  2. In StreamFragmentGraph::new (fill_job), we traverse the nodes and set the id on the corresponding plan nodes. (Previously, we traversed the nodes multiple times.)
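
The two steps can be sketched roughly as follows. This is a minimal, self-contained model rather than the actual RisingWave code: `SourceCatalog`, `IdGen`, `assign_ids`, the flattened `NodeBody` variants, and the slice of nodes standing in for the fragment graph are all hypothetical simplifications. Only the overall shape (assign ids to the job once, then fill plan nodes in a single pass) follows the description above.

```rust
// Hypothetical stand-ins for the real catalog and plan-node types.
#[derive(Debug, Default, PartialEq)]
pub struct SourceCatalog {
    pub id: u32,
}

#[derive(Debug)]
pub enum StreamingJob {
    /// A table, optionally with a connector source, plus its table id.
    Table(Option<SourceCatalog>, u32),
    Source(SourceCatalog),
    Other(u32),
}

#[derive(Debug, PartialEq)]
pub enum NodeBody {
    Source { source_id: Option<u32> },
    CdcScan { table_id: u32 },
    Other,
}

/// Hypothetical monotonically increasing id allocator.
pub struct IdGen(pub u32);

impl IdGen {
    pub fn alloc(&mut self) -> u32 {
        let id = self.0;
        self.0 += 1;
        id
    }
}

/// Step 1: assign ids to the job itself and return the job id.
pub fn assign_ids(job: &mut StreamingJob, ids: &mut IdGen) -> u32 {
    match job {
        StreamingJob::Table(source, table_id) => {
            *table_id = ids.alloc();
            // A table with a connector gets a separate id for its source.
            if let Some(source) = source {
                source.id = ids.alloc();
            }
            *table_id
        }
        StreamingJob::Source(source) => {
            source.id = ids.alloc();
            source.id
        }
        StreamingJob::Other(id) => {
            *id = ids.alloc();
            *id
        }
    }
}

/// Step 2: a single traversal that copies the assigned ids into plan nodes.
pub fn fill_job(job: &StreamingJob, job_id: u32, nodes: &mut [NodeBody]) {
    for node in nodes.iter_mut() {
        match node {
            NodeBody::Source { source_id } => match job {
                StreamingJob::Table(Some(source), _) => {
                    debug_assert_ne!(source.id, job_id);
                    *source_id = Some(source.id);
                }
                StreamingJob::Source(source) => {
                    debug_assert_eq!(source.id, job_id);
                    *source_id = Some(source.id);
                }
                // Other jobs refer to an existing source; nothing to fill.
                _ => {}
            },
            NodeBody::CdcScan { table_id } => *table_id = job_id,
            NodeBody::Other => {}
        }
    }
}
```

In this sketch, a caller would run `assign_ids` once when the job catalog is created and then `fill_job` once over the nodes, so no later stage needs to patch ids again.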

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

Member Author

xxchan commented Oct 29, 2024

@xxchan xxchan changed the title from "- table with connector: filled when creating job catalog https://github.com/risingwavelabs/risingwave/blob/193e93fd8d9f9dbae717fe6a5b411e7f33382f27/src/meta/src/controller/streaming_job.rs#L247-L251 - stream node: filled in fill_job" to "refactor(meta): simplify create stream job" Oct 29, 2024
@xxchan xxchan marked this pull request as ready for review October 29, 2024 07:57
@xxchan xxchan force-pushed the xxchan/desirable-trout branch 2 times, most recently from fa620ec to 9f4c672 Compare October 29, 2024 14:23
Member Author

rename table -> job

@xxchan xxchan changed the base branch from xxchan/desirable-trout to xxchan/format-encode October 29, 2024 16:01
Base automatically changed from xxchan/format-encode to main October 30, 2024 01:59
@xxchan xxchan mentioned this pull request Oct 30, 2024
9 tasks
@xxchan xxchan force-pushed the xxchan/refactor-ddl branch 2 times, most recently from d43598b to 0012e27 Compare November 4, 2024 05:50
Comment on lines 159 to 184
NodeBody::Source(source_node) => {
    match job {
        // Note: For table without connector, it has a dumb Source node.
        // Note: For table with connector, its source node has a source id different from the table id (job id), assigned in create_job_catalog.
        StreamingJob::Table(source, _table, _table_job_type) => {
            if let Some(source_inner) = source_node.source_inner.as_mut() {
                if let Some(source) = source {
                    debug_assert_ne!(source.id, job_id);
                    source_inner.source_id = source.id;
                }
            }
        }
        StreamingJob::Source(source) => {
            has_job = true;
            if let Some(source_inner) = source_node.source_inner.as_mut() {
                debug_assert_eq!(source.id, job_id);
                source_inner.source_id = source.id;
            }
        }
        // For other job types, no need to fill the source id, since it refers to an existing source.
        _ => {}
    }
}
NodeBody::StreamCdcScan(node) => {
    if let Some(table_desc) = node.cdc_table_desc.as_mut() {
        table_desc.table_id = job_id;
Member Author

This replaces the code in create_streaming_job

Comment on lines -921 to -936
match &mut streaming_job {
    StreamingJob::Table(src, table, job_type) => {
        // If we're creating a table with connector, we should additionally fill its ID first.
        fill_table_stream_graph_info(src, table, *job_type, &mut fragment_graph);
    }
    StreamingJob::Source(src) => {
        // set the inner source id of source node.
        for fragment in fragment_graph.fragments.values_mut() {
            visit_fragment(fragment, |node_body| {
                if let NodeBody::Source(source_node) = node_body {
                    source_node.source_inner.as_mut().unwrap().source_id = src.id;
                }
            });
        }
    }
    _ => {}
Member Author

@xxchan xxchan Nov 4, 2024

Do this in fill_job instead. (The two already overlapped to some extent.)

Comment on lines -87 to -99
fn extract_replace_table_info(change: ReplaceTablePlan) -> ReplaceTableInfo {
    let job_type = change.get_job_type().unwrap_or_default();
    let mut source = change.source;
    let mut fragment_graph = change.fragment_graph.unwrap();
    let mut table = change.table.unwrap();
    if let Some(OptionalAssociatedSourceId::AssociatedSourceId(source_id)) =
        table.optional_associated_source_id
    {
        source.as_mut().unwrap().id = source_id;
        fill_table_stream_graph_info(&mut source, &mut table, job_type, &mut fragment_graph);
    }
    let table_col_index_mapping = change
        .table_col_index_mapping
Member Author

Note: extract_replace_table_info is used in: create/drop sink & table schema change.

Member Author

Some work is done in fill_job, and some is done in frontend generate_stream_graph_for_replace_table

Comment on lines +1453 to +1454
source.as_mut().unwrap().optional_associated_table_id =
    Some(OptionalAssociatedTableId::AssociatedTableId(table.id))
Member Author

@xxchan xxchan Nov 4, 2024

It's strange not to set optional_associated_table_id here.


Note: for replace table, the Source catalog here is also created again. It looks error-prone, but I do not intend to change it.
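
To illustrate why leaving optional_associated_table_id unset looks odd, here is a small sketch of keeping the association consistent in both directions. The `Table` and `Source` structs and the `link_table_and_source` helper are hypothetical simplifications. The real catalogs are protobuf-generated and carry many more fields, so this only mirrors the two `Optional...Id` enums quoted in the diffs above.

```rust
// Hypothetical, simplified versions of the two association enums.
#[derive(Debug, PartialEq)]
pub enum OptionalAssociatedTableId {
    AssociatedTableId(u32),
}

#[derive(Debug, PartialEq)]
pub enum OptionalAssociatedSourceId {
    AssociatedSourceId(u32),
}

// Hypothetical catalog structs reduced to the fields relevant here.
#[derive(Debug, Default)]
pub struct Table {
    pub id: u32,
    pub optional_associated_source_id: Option<OptionalAssociatedSourceId>,
}

#[derive(Debug, Default)]
pub struct Source {
    pub id: u32,
    pub optional_associated_table_id: Option<OptionalAssociatedTableId>,
}

/// Record the association on both catalogs so that neither side is left unset.
pub fn link_table_and_source(table: &mut Table, source: &mut Source) {
    table.optional_associated_source_id =
        Some(OptionalAssociatedSourceId::AssociatedSourceId(source.id));
    source.optional_associated_table_id =
        Some(OptionalAssociatedTableId::AssociatedTableId(table.id));
}
```

Setting both fields in one place is a design choice of this sketch, not something the PR is claimed to do.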

Signed-off-by: xxchan <xxchan22f@gmail.com>
@xxchan xxchan changed the title from "refactor(meta): simplify create stream job" to "refactor(meta): simplify stream job id assignment" Nov 4, 2024
src/meta/src/stream/stream_graph/fragment.rs (outdated, resolved)
@xxchan xxchan changed the title from "refactor(meta): simplify stream job id assignment" to "refactor(meta): simplify stream job table/source id assignment" Nov 4, 2024
Member

@stdrc stdrc left a comment

Appreciate the work done in this PR👍

Co-authored-by: Bugen Zhao <i@bugenzhao.com>
@xxchan xxchan enabled auto-merge November 4, 2024 09:49
@xxchan xxchan added this pull request to the merge queue Nov 4, 2024
Merged via the queue into main with commit a9f8945 Nov 4, 2024
33 of 34 checks passed
@xxchan xxchan deleted the xxchan/refactor-ddl branch November 4, 2024 10:58
4 participants