
Update facet id generation to match mixer#485

Open
n-h-diaz wants to merge 4 commits into datacommonsorg:master from n-h-diaz:facet


n-h-diaz commented Mar 4, 2026

Some internal teams that use DC depend on facet IDs staying stable across the migration to Spanner, and have asked that we keep them consistent with our existing backend.

So this PR replicates Mixer's current facet ID generation logic in the new ingestion pipeline:
https://github.com/datacommonsorg/mixer/blob/0618c1f3ef80703c98fc97f6c6c6e5cd3d7c13d3/internal/util/util.go#L497-L515
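According to the summary of this PR, the Mixer logic hashes with FNV-1a 32-bit. The sketch below shows only the hash itself in Java; the class and method names are hypothetical and are not the PR's `DataUtils` API. It assumes the input string is encoded as UTF-8 and that the ID is the unsigned 32-bit result rendered in decimal (as Go prints a `uint32`); both assumptions should be checked against the linked `util.go`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not the PR's DataUtils.
final class Fnv1a32Sketch {
  // FNV-1a 32-bit parameters: offset basis 2166136261 and prime 16777619.
  private static final int OFFSET_BASIS = 0x811C9DC5;
  private static final int PRIME = 0x01000193;

  private Fnv1a32Sketch() {}

  /** Returns the 32-bit FNV-1a hash of the UTF-8 bytes of {@code s}. */
  public static int fnv1a32(String s) {
    int hash = OFFSET_BASIS;
    for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
      hash ^= (b & 0xFF); // XOR before multiply (the "1a" order), using the unsigned byte value
      hash *= PRIME;      // Java int multiplication wraps modulo 2^32, like Go's uint32
    }
    return hash;
  }

  /** Renders the hash as an unsigned decimal string (assumed ID format). */
  public static String fnv1a32Decimal(String s) {
    return Integer.toUnsignedString(fnv1a32(s));
  }

  public static void main(String[] args) {
    System.out.println(fnv1a32Decimal(""));  // 2166136261 (the offset basis)
    System.out.println(fnv1a32Decimal("a")); // 3826002220 (0xe40c292c)
  }
}
```

Java has no unsigned `int`, but the bit pattern after wrapping multiplication is identical to Go's `uint32`. Only the final formatting step needs `Integer.toUnsignedString` so that large hashes are not printed as negative numbers.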

gemini-code-assist commented

Summary of Changes


This pull request addresses the need for stable facet IDs across systems by aligning facet ID generation in the new ingestion pipeline with the existing logic in the Mixer backend. As a result, internal teams that rely on specific facet IDs should see the same IDs after migrating to Spanner.

Highlights

  • Consistent Facet ID Generation: Implemented a new utility class DataUtils to generate facet IDs using the FNV-1a 32-bit hash algorithm, replicating the logic from the existing Mixer backend to ensure consistency.
  • Observation Builder Update: Modified the Observation builder to use the new DataUtils.generateFacetId method for creating facet IDs, replacing the previous Objects.hash-based approach.
  • New Test Coverage: Added a dedicated test class DataUtilsTest with parameterized tests to thoroughly validate the generateFacetId method against expected outputs.
  • Dependency Update: Included the JUnit dependency in pom.xml to support the new test cases.
Changelog
  • pipeline/data/pom.xml
    • Added JUnit dependency for testing purposes.
  • pipeline/data/src/main/java/org/datacommons/ingestion/data/DataUtils.java
    • Added new utility class DataUtils.
    • Implemented generateFacetId method to create consistent facet IDs using FNV-1a hash.
    • Included fnv1a32 private helper method for FNV-1a hash computation.
  • pipeline/data/src/main/java/org/datacommons/ingestion/data/Observation.java
    • Updated Observation builder to use DataUtils.generateFacetId for facet ID creation.
    • Removed the previous Objects.hash-based facet ID generation logic.
  • pipeline/data/src/test/java/org/datacommons/ingestion/data/DataUtilsTest.java
    • Added new parameterized test class DataUtilsTest.
    • Included test cases to verify the correctness of DataUtils.generateFacetId.
  • pipeline/util/src/test/java/org/datacommons/ingestion/util/CacheReaderTest.java
    • Updated expected facet ID values in testParseTimeSeriesRow to reflect the new generation logic.
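A minimal sketch of what a `generateFacetId` built on FNV-1a might look like. The signature, the five field names, and their order are hypothetical and are not copied from `DataUtils.java` or Mixer's `util.go`. Following the second review comment on this PR, the fields are concatenated with no separators; `null` is treated as an empty string, assuming that mirrors Go's empty-string zero value.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; field names and order are assumptions.
final class FacetIdSketch {
  private FacetIdSketch() {}

  // FNV-1a 32-bit: XOR each UTF-8 byte into the hash, then multiply by the FNV prime.
  static int fnv1a32(String s) {
    int hash = 0x811C9DC5;
    for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
      hash ^= (b & 0xFF);
      hash *= 0x01000193;
    }
    return hash;
  }

  /** Builds an ID from hypothetical facet fields joined with no delimiter. */
  public static String generateFacetId(
      String importName,
      String measurementMethod,
      String observationPeriod,
      String scalingFactor,
      String unit) {
    // No separators between fields, per the review comment on the Go logic.
    String key =
        nz(importName)
            + nz(measurementMethod)
            + nz(observationPeriod)
            + nz(scalingFactor)
            + nz(unit);
    // Unsigned decimal rendering is an assumed output format.
    return Integer.toUnsignedString(fnv1a32(key));
  }

  // Treats a missing field as the empty string.
  private static String nz(String s) {
    return s == null ? "" : s;
  }

  public static void main(String[] args) {
    // Different field tuples can yield the same key when no separator is used.
    System.out.println(
        generateFacetId("ab", "c", "", "", "").equals(generateFacetId("a", "bc", "", "", ""))); // true
  }
}
```

Because the fields are joined without a delimiter, tuples such as ("ab", "c") and ("a", "bc") produce the same key and therefore the same ID. If Mixer really concatenates this way, matching it byte-for-byte means accepting that behavior. Adding separators would avoid it, but the resulting IDs would no longer match Mixer.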

gemini-code-assist bot left a comment


Code Review

This pull request successfully updates the facet ID generation logic to align with the existing Mixer Go implementation. The introduction of the DataUtils class with the FNV-1a hash algorithm is a clear and direct approach to achieve consistency. The changes in Observation.java correctly integrate this new utility, and the addition of DataUtilsTest.java provides good test coverage for the new functionality. The updates to CacheReaderTest.java ensure that existing tests reflect the new ID generation. The comment regarding method parameter count is a valid general best practice for Java and has been retained. Overall, the changes are well-implemented and address the stated objective.


n-h-diaz commented Mar 4, 2026

/gemini review


gemini-code-assist bot left a comment


Code Review

This pull request aims to update the facet ID generation to align with an existing Go implementation in Mixer. While the intent is clear, I've identified two critical issues. Firstly, the new Java implementation for facet ID generation in DataUtils.java does not correctly replicate the Go logic, as it introduces separators where there should be none, which will lead to different hash results and defeats the purpose of this change. Secondly, in Observation.java, the hashCode() method has been removed while the equals() method is retained. This violates the fundamental Java Object contract and will cause problems when Observation objects are used in hash-based collections. I have provided detailed comments and suggestions for both issues.
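The `hashCode()` point above is the standard Java `Object` contract: objects that are equal according to `equals()` must return the same `hashCode()`. If `Observation` overrides `equals()` but falls back to `Object.hashCode()`, two equal observations are likely to be treated as distinct by a `HashMap` or `HashSet`. The facet ID and the object's hash code are separate concerns, so FNV-1a can be used for the former while an `Objects.hash`-based `hashCode()` is kept for the latter. The sketch below uses a hypothetical `ObservationKeySketch` class with made-up fields, not the real `Observation`.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for Observation; the real class has different fields.
final class ObservationKeySketch {
  private final String variableMeasured;
  private final String observationAbout;
  private final String facetId;

  ObservationKeySketch(String variableMeasured, String observationAbout, String facetId) {
    this.variableMeasured = variableMeasured;
    this.observationAbout = observationAbout;
    this.facetId = facetId;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof ObservationKeySketch)) return false;
    ObservationKeySketch other = (ObservationKeySketch) o;
    return Objects.equals(variableMeasured, other.variableMeasured)
        && Objects.equals(observationAbout, other.observationAbout)
        && Objects.equals(facetId, other.facetId);
  }

  // Uses exactly the fields compared in equals(), so equal objects hash equally.
  @Override
  public int hashCode() {
    return Objects.hash(variableMeasured, observationAbout, facetId);
  }

  public static void main(String[] args) {
    Set<ObservationKeySketch> set = new HashSet<>();
    set.add(new ObservationKeySketch("Count_Person", "geoId/06", "2166136261"));
    set.add(new ObservationKeySketch("Count_Person", "geoId/06", "2166136261"));
    System.out.println(set.size()); // 1: the duplicate is recognized
  }
}
```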

n-h-diaz requested review from keyurva and vish-cs on March 5, 2026 00:22

keyurva commented Mar 5, 2026

> Some internal teams using DC depend on stable facet ids after migrating to spanner and requested that we make them consistent with our existing backend

Can you add some context on who is requesting this and why?


n-h-diaz commented Mar 5, 2026

> > Some internal teams using DC depend on stable facet ids after migrating to spanner and requested that we make them consistent with our existing backend
>
> Can you add some context on who is requesting this and why?

This request is from the NL2SQL team, since they use allowlisted facet IDs (cc @gmechali, who has more context).


keyurva commented Mar 7, 2026

> > > Some internal teams using DC depend on stable facet ids after migrating to spanner and requested that we make them consistent with our existing backend
> >
> > Can you add some context on who is requesting this and why?
>
> This request is from NL2SQL team since they use allowlisted facet ids (cc @gmechali who has more context)

Let's discuss this. We want to avoid this level of coupling since it makes it difficult to make updates in the future.
