Support inferring schemas from Python dataclasses#37728
Abacn wants to merge 3 commits into apache:master
Summary of Changes (Gemini Code Assist)
This pull request enhances Apache Beam's type hinting capabilities by adding support for Python dataclasses. It enables Beam to recognize, convert, and infer schemas from dataclasses, integrating them into Beam's data processing model in the same way NamedTuples are handled.
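To make the NamedTuple analogy concrete, here is a minimal standard-library sketch of reading field names and types from either kind of class. The helper `infer_fields` and the example classes are hypothetical and are not part of Apache Beam; the PR's actual inference code is not shown in this conversation.

```python
import dataclasses
import typing


class PointTuple(typing.NamedTuple):
    x: int
    label: str


@dataclasses.dataclass
class PointData:
    x: int
    label: str


def infer_fields(user_type):
    """Return [(name, type), ...] for a dataclass or NamedTuple class.

    Hypothetical helper for illustration only; not a Beam API.
    """
    if dataclasses.is_dataclass(user_type):
        hints = typing.get_type_hints(user_type)
        return [(f.name, hints[f.name]) for f in dataclasses.fields(user_type)]
    is_named_tuple = (
        isinstance(user_type, type)
        and issubclass(user_type, tuple)
        and hasattr(user_type, "_fields"))
    if is_named_tuple:
        hints = typing.get_type_hints(user_type)
        return [(name, hints[name]) for name in user_type._fields]
    raise TypeError(f"Cannot infer fields from {user_type!r}")
```

Under these assumptions, `infer_fields(PointData)` and `infer_fields(PointTuple)` describe the same two fields, which is the sense in which a dataclass can be treated like a NamedTuple for schema purposes.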
|
4099aa8 to f58f026
|
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
|
assign set of reviewers |
|
Assigning reviewers: R: @claudevdm for label python. Note: If you would like to opt out of this review, comment Available commands:
The PR bot will only process comments in the main thread (not review comments). |
claudevdm left a comment
This only applies when the type hint is beam.Row, right? I just want to be sure we aren't changing default coders in an update-incompatible way.
|
/gemini review |
Code Review
This pull request adds support for inferring schemas from Python dataclasses, which is a great enhancement. The changes are logical and include good test coverage for the new functionality. I've found one area for improvement regarding code duplication, which could enhance maintainability.
Yes, that is the entry point where the change takes effect. Previously, the same config resulted in a TypeError (shown in the PR description), so there is no upgrade compatibility concern. There was one change in native_type_compatibility.py that added dataclass to type_map, which might have changed the type hint of existing dataclasses: 0aa835d#diff-3f6570dbd8f6850deb466f36b7b7983e9b5c1adaec3810dfe672e108188a35ebR426 Further testing shows that this change was actually dead code. A dataclass hits an earlier branch and returns before type_map is iterated. Reverted in the latest commit. |
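The dead-code observation can be illustrated with a toy converter. This is only a sketch of the general pattern (an early return that shadows a later lookup table); the function `convert`, the table layout, and the return strings are hypothetical and do not reproduce the code in native_type_compatibility.py.

```python
import dataclasses


@dataclasses.dataclass
class Example:
    value: int


def convert(typ, type_map):
    """Toy converter: one early branch, then a scan over type_map."""
    # Early branch (hypothetical): returns before type_map is consulted.
    if dataclasses.is_dataclass(typ):
        return "handled by early branch"
    for entry in type_map:
        if entry["match"](typ):
            return entry["result"]
    return "no match"


type_map = [
    # Never selected for a dataclass, because the early branch already returned.
    {"match": dataclasses.is_dataclass, "result": "handled by type_map"},
    {"match": lambda t: t is int, "result": "int entry"},
]
```

In this sketch the dataclass entry in `type_map` is unreachable, so removing it does not change the result for any input.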
        expected.row_type.schema.fields,
        typing_to_runner_api(MyCuteClass).row_type.schema.fields)

  def test_trivial_example_dataclass(self):
What do you think about adding a test that does beam.Create([MyCuteDataclass()]).with_output_types(beam.Row) -> Reshuffle (the results should be named tuples)?
And also a test that, for beam.Create([MyCuteDataclass()]).with_output_types(beam.Row) -> Reshuffle, the resulting type is still the original dataclass?
Thanks, I added unit tests for both named tuples and dataclasses (both are expected to result in the same generated user type after passing through GBK).
I noticed that the current schemas_test unit tests do not include pipeline tests, so I created row_type_test to hold these tests.
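As a rough picture of that expectation, the standard-library sketch below flattens a value to its field values and rebuilds it as a generated named tuple. It is not a Beam pipeline and does not use Beam coders; `round_trip`, `GeneratedRow`, and the example classes are hypothetical names used only for illustration.

```python
import collections
import dataclasses
import typing


class UserTuple(typing.NamedTuple):
    user_id: int
    name: str


@dataclasses.dataclass
class UserData:
    user_id: int
    name: str


def field_names(value):
    """Field names of a dataclass instance or a named tuple instance."""
    if dataclasses.is_dataclass(value):
        return tuple(f.name for f in dataclasses.fields(value))
    return tuple(value._fields)


def round_trip(value):
    """Flatten to plain values, then rebuild as a generated named tuple."""
    names = field_names(value)
    values = tuple(getattr(value, n) for n in names)  # stand-in for encoding
    generated = collections.namedtuple("GeneratedRow", names)  # stand-in type
    return generated(*values)
```

With these stand-ins, `round_trip(UserTuple(1, "a"))` and `round_trip(UserData(1, "a"))` compare equal and expose the same field names, and neither result is an instance of the original class.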
… passing through GBK
Fix #22085
Tested with a JdbcIO write using a dataclass as the input type, and it worked. Previously this would raise a TypeError.