Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extend parse callbacks to expose discovered composite types and aliases #2658

Merged
merged 2 commits into from
Dec 2, 2024

Conversation

mxyns
Copy link
Contributor

@mxyns mxyns commented Oct 8, 2023

this is my attempt at #2657

example use case:
this allowed me to build a library that discovers the bindings (for composite type) generated by bindgen (https://github.com/mxyns/bindgen-bridge)

i tried to expose as little hidden internal structs as possible (like passing ItemId as a usize) but I did not feel like duplicating the definition of CompType was a good idea so I just made it public

@mxyns mxyns force-pushed the extended-callbacks branch 3 times, most recently from 9aba819 to e1aa4bf Compare October 8, 2023 13:22
@mxyns mxyns force-pushed the extended-callbacks branch from 75188da to e1aa4bf Compare January 19, 2024 12:21
@pvdrz
Copy link
Contributor

pvdrz commented Aug 30, 2024

Hey! Thanks for the PR and sorry for the delay on feedback. I think this feature is reasonable but there are some things I'd like to discuss first:

  • I'm not sure if exposing the ID of each item is a good idea as any change on those IDs could cause unexpected breaking changes for folks that use this feature and rely heavily on those IDs in one way or another. The most extreme version of it would be someone literally checking if an item has a specific index and doing something accordingly.
  • Maybe those two methods could be combined in a single one that uses a non-exhaustive enum to dispatch between structs, enums, unions and type aliases. In that way, we can extend this in the future to other items that folks might want to map as well, such as functions or constants.
  • We try as much as possible to not expose types from other crates into our public API, so that Ident you're exposing would ideally be an String or &str.
  • This feature needs some tests.

Thanks again 👍

@mxyns
Copy link
Contributor Author

mxyns commented Sep 6, 2024

Hey!

  • I'm not sure if exposing the ID of each item is a good idea as any change on those IDs could cause unexpected breaking changes for folks that use this feature and rely heavily on those IDs in one way or another. The most extreme version of it would be someone literally checking if an item has a specific index and doing something accordingly.

The goal of the ID in these callbacks is to find links aliases to discovered types. Without that, the aliases are way less useful as we don't know which type they are an alias of... My use case heavily relies on the IDs, but does not assume stable IDs. I basically build a map that uses the IDs as a key and that records the types discovered and their associated aliases.

I could add in the documentation of the callbacks that IDs may be unstable and that the user should not expect them to be consistent across different bindgen runs. This would avoid people checking for hardcoded values of the IDs while still allowing us to find aliases and their aliased type. What do you think?

  • Maybe those two methods could be combined in a single one that uses a non-exhaustive enum to dispatch between structs, enums, unions and type aliases. In that way, we can extend this in the future to other items that folks might want to map as well, such as functions or constants.

Do you mean something that would look like this:

pub enum DiscoveredItem {
        Struct {
                original_name: Option<&str>,
                final_ident: &str
                // + whatever people want to expose here
        },
        Union {
                original_name: Option<&str>,
                final_ident: &str,
                // + whatever people want to expose here
        },
        Alias {
                alias_name: &str,
                alias_for: usize // if we decide to keep IDs
        }
        // functions, modules, etc.
}

// The actual callback
fn new_item_found(
        &self,
        id: usize, // if we decide to keep it
        item: DiscoveredItem
    ) 

I could do that although I am a bit scared that the DiscoveredItem enum becomes really large as people add more and more information into it.

  • We try as much as possible to not expose types from other crates into our public API, so that Ident you're exposing would ideally be an String or &str.

Sure, I will change this :)

  • This feature needs some tests.

Agreed, will look into it when we agree on the final code :)

@pvdrz
Copy link
Contributor

pvdrz commented Sep 7, 2024

The goal of the ID in these callbacks is to find links aliases to discovered types. Without that, the aliases are way less useful as we don't know which type they are an alias of... My use case heavily relies on the IDs, but does not assume stable IDs. I basically build a map that uses the IDs as a key and that records the types discovered and their associated aliases.

But you still have the original C/C++ name of the type, right? Although name-based logic is usually easy to break, specially when some types don't even have a name 🤔

I could add in the documentation of the callbacks that IDs may be unstable and that the user should not expect them to be consistent across different bindgen runs. This would avoid people checking for hardcoded values of the IDs while still allowing us to find aliases and their aliased type. What do you think?

I'm not a big fan of policies as they are prone to the spacebar heating problem where someone doesn't read the policy and comes back with an angry issue complaning that we broke their workflow because we changed the way IDs are generated. Maybe we can meet halfway and add an opaque type that wraps the raw integer ID, something like:

// FIXME: name is a work-in-progress
#[derive(PartialEq, Eq, Hash)]
pub struct DiscoveredItemId(usize);

In that way, we only expose the bare minimum that we should guarantee: "two IDs are the same if they point back to the same C/C++ type"

I could do that although I am a bit scared that the DiscoveredItem enum becomes really large as people add more and more information into it.

Yeah that's an issue but if at some point one of the variants become way larger than the rest, we could add the extra fields for that variant into a Box<ExtraVariantInfo> field to mitigate it.

What I want to avoid is creating several methods with the same functionality and having to keep their naming and documentation consistent.

@mxyns
Copy link
Contributor Author

mxyns commented Sep 8, 2024

But you still have the original C/C++ name of the type, right? Although name-based logic is usually easy to break, specially when some types don't even have a name 🤔

Right, not a fan of name based as well. Also I never tried with C++ but I imagine the namespace is included in the struct name ?

I'm not a big fan of policies as they are prone to the spacebar heating problem where someone doesn't read the policy and comes back with an angry issue complaning that we broke their workflow because we changed the way IDs are generated. Maybe we can meet halfway and add an opaque type that wraps the raw integer ID, something like:

// FIXME: name is a work-in-progress
#[derive(PartialEq, Eq, Hash)]
pub struct DiscoveredItemId(usize);

Works for me 🙂

Yeah that's an issue but if at some point one of the variants become way larger than the rest, we could add the extra fields for that variant into a Box<ExtraVariantInfo> field to mitigate it.

Okay let's do that.

I'll work on it tomorrow and push an update.

@mxyns mxyns force-pushed the extended-callbacks branch 2 times, most recently from e073105 to 68bcced Compare September 9, 2024 12:51
@mxyns
Copy link
Contributor Author

mxyns commented Sep 9, 2024

Hi @pvdrz
I think I'm done with the core code of this PR. Let me know what you think :)

Now, for the testing, I'm not really sure how to proceed. The testing system for callbacks seems to be based on comments at the top of the tests cases that are parsed to apply the callbacks. Here the callback does not impact the generated code but rather exports information about it.

Using this system would require to hard code the expected result in the comments and quite a lot of parsing to make it work with this specific callback. As the enum get extended with more fields for each variant, it will quickly become a nightmare for people working on / maintaining it.

Should I just make a standalone test in bindgen-tests/tests/parse_callbacks/test_item_discovery_callback.rs that works on a single sample header file and that ensures the discovery works?

@mxyns mxyns force-pushed the extended-callbacks branch 3 times, most recently from 251cac7 to bf34530 Compare September 9, 2024 15:21
@mxyns
Copy link
Contributor Author

mxyns commented Sep 9, 2024

I added an example of what I think tests for this feature could look like. I am not sure if the names like _bindgen_ty_1 are guaranteed to be stable so the name checking allows using regexes like _bindgen_ty_*.

I'm not sure if the test is good enough since it is a small feature. We could probably add this check to all the other header comparison tests if we generate the expected result for all of them but I'd need a bit of help on how to integrate that in your testing system since I'm not accustomed to it.

@pvdrz pvdrz force-pushed the extended-callbacks branch from cfab753 to ebfcb44 Compare December 2, 2024 03:09
@pvdrz pvdrz enabled auto-merge December 2, 2024 03:11
@pvdrz pvdrz added this pull request to the merge queue Dec 2, 2024
Merged via the queue into rust-lang:main with commit b23d978 Dec 2, 2024
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants