-
Notifications
You must be signed in to change notification settings - Fork 524
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean up CodeGenerator
by moving compilation-global data and logic to Context
#1190
base: master
Are you sure you want to change the base?
Conversation
205a126
to
7a01e89
Compare
I would like this to be merged before I add an enum type registry, which is necessary to properly implement imports (and later with editions, the |
7a01e89
to
75d81e1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see plenty of internal changes that I like: better naming, cleanup of boxed, more helper functions.
- Is it possible to separate those from the API changes? Then we can that merge that before doing a version number bump
- Is it possible to separate each improvement into a separate commit? That will make reviewing easier.
prost-build/src/code_generator.rs
Outdated
buf: &mut String, | ||
) { | ||
impl<'a, 'b> CodeGenerator<'a, 'b> { | ||
pub fn generate(context: &mut Context<'b>, file: FileDescriptorProto, buf: &mut String) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an API breaking change, right? I don't understand how the Context
instance should be created by prost
users as there is no documentation changes included.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, CodeGenerator
is private to the crate, as is the newly added Context
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see: Config::generate
is public, but CodeGenerator
is private. Can you confirm that no breaking change happened using cargo semver-checks
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do. I also think it would help to make this method pub(crate)
to alleviate any confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes pub(crate)
is a great idea
pub struct CodeGenerator<'a> { | ||
config: &'a mut Config, | ||
pub struct CodeGenerator<'a, 'b> { | ||
context: &'a mut Context<'b>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why some fields are in CodeGenerator
, Context
or Config
. For example, depth
also feels like it could be part of Context
.
Can you explain when a field should be in each of those structs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Config
is the public settings configured by the API user.
Context
is a private helper that captures data global to the compilation requests: the message graph, the map of extern paths, and the reference to the Config
. I also plan to add a map of enum openness flags for a proper implementation of open enums in #1079.
It is populated before code generation and the instance is shared between successive calls to CodeGenerator::generate
for the requested members. This replaces several arguments that did the same thing with the disparate data structures.
The owned members of CodeGenerator
manage state specific to code generation for a proto file. So depth
belongs there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So Context
fields are used for multiple files and CodeGenerator
fields are discarded after a single file. That makes sense.
Can you add this explanation as a doc comment to the types?
pub struct CodeGenerator<'a> { | ||
config: &'a mut Config, | ||
pub struct CodeGenerator<'a, 'b> { | ||
context: &'a mut Context<'b>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Config
is the public settings configured by the API user.
Context
is a private helper that captures data global to the compilation requests: the message graph, the map of extern paths, and the reference to the Config
. I also plan to add a map of enum openness flags for a proper implementation of open enums in #1079.
It is populated before code generation and the instance is shared between successive calls to CodeGenerator::generate
for the requested members. This replaces several arguments that did the same thing with the disparate data structures.
The owned members of CodeGenerator
manage state specific to code generation for a proto file. So depth
belongs there.
CodeGenerator
by moving compilation-global data and logic to Context
Move global context data and logic around it from CodeGenerator to a new Context struct. Also move the logic of can_message_derive_copy and can_field_derive_copy from MessageGraph where it caused some data duplication.
75d81e1
to
1218d3a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not every line is improved, but I think this is an improvement overall. I have a few questions that I like to be answered before merging.
pub struct CodeGenerator<'a> { | ||
config: &'a mut Config, | ||
pub struct CodeGenerator<'a, 'b> { | ||
context: &'a mut Context<'b>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So Context
fields are used for multiple files and CodeGenerator
fields are discarded after a single file. That makes sense.
Can you add this explanation as a doc comment to the types?
prost-build/src/code_generator.rs
Outdated
buf: &mut String, | ||
) { | ||
impl<'a, 'b> CodeGenerator<'a, 'b> { | ||
pub fn generate(context: &mut Context<'b>, file: FileDescriptorProto, buf: &mut String) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see: Config::generate
is public, but CodeGenerator
is private. Can you confirm that no breaking change happened using cargo semver-checks
?
self.buf.push_str(&format!( | ||
"impl {}::Name for {} {{\n", | ||
self.config.prost_path.as_deref().unwrap_or("::prost"), | ||
"impl {prost_path}::Name for {} {{\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this. It improves readability.
prost-build/src/code_generator.rs
Outdated
@@ -297,15 +285,16 @@ impl CodeGenerator<'_> { | |||
self.pop_mod(); | |||
} | |||
|
|||
if self.config.enable_type_names { | |||
if self.context.config().enable_type_names { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like a decrease in readability. It adds an extra layer .context
and an extra indirection config()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add a .config()
helper method to CodeGenerator
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's leave it like this for now. I wanted to point to a concrete example where I think readability suffered. But I think this PR is a net positive.
.can_field_derive_copy(fq_message_name, &field.descriptor) | ||
self.context.can_oneof_field_derive_copy( | ||
fq_message_name, | ||
oneof.descriptor.name(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why this needs the oneof name, while it previously didn't need that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the discussion on the Context::can_field_derive_copy_impl
method.
self.config | ||
} | ||
|
||
pub fn service_generator(&mut self) -> Option<&mut (dyn ServiceGenerator + 'static)> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should be called service_generator_mut
to indicate it returns a mutable reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is never a need to obtain a non-mutable reference, so I felt that a _mut
suffix would be an unnecessary barb. These methods are private to the crate so they can be renamed at any time if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking this is the only place where a mut
reference is taken (as is the stated intention of the PR). So marking that spot makes it easier to find in a month time.
if self | ||
.message_graph | ||
.is_nested(field.type_name(), fq_message_name) | ||
{ | ||
return false; | ||
} | ||
let config_path = match oneof { | ||
None => Cow::Borrowed(fq_message_name), | ||
Some(oneof_name) => Cow::Owned(format!("{fq_message_name}.{oneof_name}")), | ||
}; | ||
if self | ||
.config | ||
.boxed | ||
.get_first_field(&config_path, field.name()) | ||
.is_some() | ||
{ | ||
false | ||
} else { | ||
self.can_message_derive_copy(field.type_name()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic has changed compared to MessageGraph::can_field_derive_copy
. I don't expect that in a PR that mentions just to move code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the logic currently in master
has a bug: it would fail to match a boxed
rule that named a oneof field, because it did not take the oneof part of the path into consideration. I will add a mention to the PR description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add a regression test to prove this implements correct behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For comparison, check the logic in Context::should_box_oneof_field
which I did not modify from how it works in the baseline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, the matching behavior for fields in a oneof is undocumented, and logically, all named fields in a oneof are still part of the enclosing message's field namespace and don't need to be qualified with the oneof's name. So the behavior matching for boxed
field paths in the baseline MessageGraph::can_field_derive_copy
may in fact be the one expected by the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree there could be a problem here and a regression test is useful. However, this PR should not change any behavior. It should only reorganize the code. Please separate the changed behavior into a separate PR so that we can discuss that in isolation.
My primary concern is that this PR should not introduce a breaking change accidentally. Because it is a lot of code changes, it is difficult to review. If additional changes are slipped in, that makes it even more difficult to review.
} else { | ||
self.config | ||
.disable_comments | ||
.get_first(fq_message_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PathMap::get_first
is subtly different from PathMap::get
and then calling .next()
. Is that intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Would it behave differently in any cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not dug into that. I just see that the code does something different 😉
I suggest using the old code and opening another (smaller) PR for adding tests to prove noting changed.
It was not obvious immediately from the declaration that CodeGenerator is private to the crate.
Add doc comments clarifying the lifetime of `Context` and `CodeGenerator` in the code generation process.
Refactoring improvements to prost-build:
CodeGenerator
to a newContext
struct.Context
fromMessageGraph
, where it caused some data duplication.Config
reference needing to be mutable for its user-provided service generator, by exposing both fromContext
in a more disciplined way.This also fixes a bug in the logic that determines whether
Copy
can be derived on a message, where aboxed
config rule matching a field that is part of a oneof would be disregarded.