Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up CodeGenerator by moving compilation-global data and logic to Context #1190

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

mzabaluev
Copy link
Contributor

@mzabaluev mzabaluev commented Nov 19, 2024

Refactoring improvements to prost-build:

  • Move compilation-global context data and logic around it from CodeGenerator to a new Context struct.
  • Lift similar logic into Context from MessageGraph, where it caused some data duplication.
  • Resolve some awkwardness about the Config reference needing to be mutable for its user-provided service generator, by exposing both from Context in a more disciplined way.

This also fixes a bug in the logic that determines whether Copy can be derived on a message, where a boxed config rule matching a field that is part of a oneof would be disregarded.

@mzabaluev
Copy link
Contributor Author

mzabaluev commented Nov 19, 2024

I would like this to be merged before I add an enum type registry, which is necessary to properly implement imports (and later with editions, the enum_type feature) for #1079.

Copy link
Collaborator

@caspermeijn caspermeijn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see plenty of internal changes that I like: better naming, cleanup of boxed, more helper functions.

  • Is it possible to separate those from the API changes? Then we can that merge that before doing a version number bump
  • Is it possible to separate each improvement into a separate commit? That will make reviewing easier.

buf: &mut String,
) {
impl<'a, 'b> CodeGenerator<'a, 'b> {
pub fn generate(context: &mut Context<'b>, file: FileDescriptorProto, buf: &mut String) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an API breaking change, right? I don't understand how the Context instance should be created by prost users as there is no documentation changes included.

Copy link
Contributor Author

@mzabaluev mzabaluev Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, CodeGenerator is private to the crate, as is the newly added Context.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see: Config::generate is public, but CodeGenerator is private. Can you confirm that no breaking change happened using cargo semver-checks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. I also think it would help to make this method pub(crate) to alleviate any confusion.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes pub(crate) is a great idea

pub struct CodeGenerator<'a> {
config: &'a mut Config,
pub struct CodeGenerator<'a, 'b> {
context: &'a mut Context<'b>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why some fields are in CodeGenerator, Context or Config. For example, depth also feels like it could be part of Context.

Can you explain when a field should be in each of those structs?

Copy link
Contributor Author

@mzabaluev mzabaluev Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Config is the public settings configured by the API user.

Context is a private helper that captures data global to the compilation requests: the message graph, the map of extern paths, and the reference to the Config. I also plan to add a map of enum openness flags for a proper implementation of open enums in #1079.
It is populated before code generation and the instance is shared between successive calls to CodeGenerator::generate for the requested members. This replaces several arguments that did the same thing with the disparate data structures.

The owned members of CodeGenerator manage state specific to code generation for a proto file. So depth belongs there.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So Context fields are used for multiple files and CodeGenerator fields are discarded after a single file. That makes sense.

Can you add this explanation as a doc comment to the types?

prost-build/src/context.rs Show resolved Hide resolved
pub struct CodeGenerator<'a> {
config: &'a mut Config,
pub struct CodeGenerator<'a, 'b> {
context: &'a mut Context<'b>,
Copy link
Contributor Author

@mzabaluev mzabaluev Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Config is the public settings configured by the API user.

Context is a private helper that captures data global to the compilation requests: the message graph, the map of extern paths, and the reference to the Config. I also plan to add a map of enum openness flags for a proper implementation of open enums in #1079.
It is populated before code generation and the instance is shared between successive calls to CodeGenerator::generate for the requested members. This replaces several arguments that did the same thing with the disparate data structures.

The owned members of CodeGenerator manage state specific to code generation for a proto file. So depth belongs there.

prost-build/src/context.rs Show resolved Hide resolved
prost-build/src/context.rs Show resolved Hide resolved
prost-build/src/context.rs Show resolved Hide resolved
@mzabaluev mzabaluev changed the title Clean up CodeGenerator by moving global data and code to Context Clean up CodeGenerator by moving compilation-global data and logic to Context Nov 20, 2024
Move global context data and logic around it from CodeGenerator
to a new Context struct. Also move the logic of can_message_derive_copy
and can_field_derive_copy from MessageGraph where it caused some
data duplication.
Copy link
Collaborator

@caspermeijn caspermeijn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not every line is improved, but I think this is an improvement overall. I have a few questions that I like to be answered before merging.

pub struct CodeGenerator<'a> {
config: &'a mut Config,
pub struct CodeGenerator<'a, 'b> {
context: &'a mut Context<'b>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So Context fields are used for multiple files and CodeGenerator fields are discarded after a single file. That makes sense.

Can you add this explanation as a doc comment to the types?

buf: &mut String,
) {
impl<'a, 'b> CodeGenerator<'a, 'b> {
pub fn generate(context: &mut Context<'b>, file: FileDescriptorProto, buf: &mut String) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see: Config::generate is public, but CodeGenerator is private. Can you confirm that no breaking change happened using cargo semver-checks?

self.buf.push_str(&format!(
"impl {}::Name for {} {{\n",
self.config.prost_path.as_deref().unwrap_or("::prost"),
"impl {prost_path}::Name for {} {{\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this. It improves readability.

@@ -297,15 +285,16 @@ impl CodeGenerator<'_> {
self.pop_mod();
}

if self.config.enable_type_names {
if self.context.config().enable_type_names {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a decrease in readability. It adds an extra layer .context and an extra indirection config().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add a .config() helper method to CodeGenerator.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave it like this for now. I wanted to point to a concrete example where I think readability suffered. But I think this PR is a net positive.

.can_field_derive_copy(fq_message_name, &field.descriptor)
self.context.can_oneof_field_derive_copy(
fq_message_name,
oneof.descriptor.name(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why this needs the oneof name, while it previously didn't need that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the discussion on the Context::can_field_derive_copy_impl method.

prost-build/src/context.rs Show resolved Hide resolved
self.config
}

pub fn service_generator(&mut self) -> Option<&mut (dyn ServiceGenerator + 'static)> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should be called service_generator_mut to indicate it returns a mutable reference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is never a need to obtain a non-mutable reference, so I felt that a _mut suffix would be an unnecessary barb. These methods are private to the crate so they can be renamed at any time if needed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking this is the only place where a mut reference is taken (as is the stated intention of the PR). So marking that spot makes it easier to find in a month time.

Comment on lines +216 to +235
if self
.message_graph
.is_nested(field.type_name(), fq_message_name)
{
return false;
}
let config_path = match oneof {
None => Cow::Borrowed(fq_message_name),
Some(oneof_name) => Cow::Owned(format!("{fq_message_name}.{oneof_name}")),
};
if self
.config
.boxed
.get_first_field(&config_path, field.name())
.is_some()
{
false
} else {
self.can_message_derive_copy(field.type_name())
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic has changed compared to MessageGraph::can_field_derive_copy. I don't expect that in a PR that mentions just to move code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the logic currently in master has a bug: it would fail to match a boxed rule that named a oneof field, because it did not take the oneof part of the path into consideration. I will add a mention to the PR description.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a regression test to prove this implements correct behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For comparison, check the logic in Context::should_box_oneof_field which I did not modify from how it works in the baseline.

Copy link
Contributor Author

@mzabaluev mzabaluev Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, the matching behavior for fields in a oneof is undocumented, and logically, all named fields in a oneof are still part of the enclosing message's field namespace and don't need to be qualified with the oneof's name. So the behavior matching for boxed field paths in the baseline MessageGraph::can_field_derive_copy may in fact be the one expected by the user.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree there could be a problem here and a regression test is useful. However, this PR should not change any behavior. It should only reorganize the code. Please separate the changed behavior into a separate PR so that we can discuss that in isolation.

My primary concern is that this PR should not introduce a breaking change accidentally. Because it is a lot of code changes, it is difficult to review. If additional changes are slipped in, that makes it even more difficult to review.

} else {
self.config
.disable_comments
.get_first(fq_message_name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PathMap::get_first is subtly different from PathMap::get and then calling .next(). Is that intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Would it behave differently in any cases?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not dug into that. I just see that the code does something different 😉

I suggest using the old code and opening another (smaller) PR for adding tests to prove noting changed.

It was not obvious immediately from the declaration that
CodeGenerator is private to the crate.
Add doc comments clarifying the lifetime of `Context` and
`CodeGenerator` in the code generation process.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants