updated goldens for data release 2026-01-28 #1750

Open
rohitkumarbhagat wants to merge 3 commits into datacommonsorg:master from rohitkumarbhagat:data-release-20260128

Conversation

@rohitkumarbhagat
Contributor

No description provided.

@gemini-code-assist
Contributor

Summary of Changes

Hello @rohitkumarbhagat, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates the latest data release, dated 2026-01-28, by updating numerous golden files across the system. The changes primarily involve refreshing data points for various statistical variables, introducing new demographic and economic indicators, and performing necessary data cleanup and structural adjustments to ensure data accuracy and consistency. This update ensures that the system's data reflects the most current information available.

Highlights

  • Data Release Updates: Bigtable table names and BigQuery dataset versions have been updated to reflect the new 2026-01-28 data release.
  • Statistical Variable Additions and Removals: Numerous new statistical variables have been added, particularly related to demographics (e.g., COVID-19 infection, place of birth, languages spoken, foreign background, age groups) and economic activity (exports, imports, trade balance with USA by NAICS codes). Several outdated statistical variables related to male population with disabilities, monoracial population in group quarters, multiracial population, and native American/Hawaiian farmers have been removed.
  • Observation Data Adjustments: Latest observation dates and values for unemployment and population across various countries and US states have been updated. Historical health insurance data for California has been added, and various health-related percentages (obesity, physical inactivity, high blood pressure, mental health, smoking) have been adjusted.
  • Metadata and Schema Enhancements: Statistical variable counts for various groups have been updated, new census data summaries for Bulgaria, NITI India Population Projection, and Poland have been integrated, and place types have been added or refined for several geographical entities. A new 'tradePartner' property has been introduced in the schema.
  • Data Consistency and Corrections: Minor data corrections include adjusting entity counts for specific facets, correcting a typo in 'Maxillofacial Studies', and standardizing capitalization for 'Dengue shock syndrome'.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature | Command | Description
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request primarily updates various data points and configurations across several files. Key changes include updating Bigtable table names and BigQuery dataset versions with newer dates, adjusting unemployment rate ranks and values in golden files, and correcting a typo in a statistical variable name. Numerous new statistical variables have been added, particularly related to COVID-19, population demographics (e.g., place of birth, languages spoken, foreign background, age groups), and economic activity (exports, imports, trade balance with USA, categorized by NAICS codes). Conversely, several older statistical variables related to male, monoracial, multiracial, and Native American/Hawaiian populations have been removed. The descendentStatVarCount values have been updated across various variable groups (Economy, Health, Demographics, Education, Environment, Housing, Uncategorized, Weather) to reflect these additions and removals. New census data for Bulgaria, NITI India Population Projection, and Poland Census have been integrated, leading to updates in placeCount, observationCount, and timeSeriesCount for various place types and facets. Additionally, several geoId entries were removed from triple golden files, and importTime was updated. The tradePartner property was added to place_type.json. Finally, many observation dates and corresponding values for various entities (countries, US states, counties) have been updated to reflect more recent data, with specific reviewer comments highlighting changes in placeCount, observationCount, latestDate, and specific date/value updates for unemployment data.

I am having trouble creating individual review comments. Click here to see my feedback.

internal/server/v0/triple/golden/get_triples/limit.json (995-1022)

high

These entries seem to be removed. Please confirm if this was intentional.

internal/server/v0/triple/golden/get_triples/limit.json (1082-1109)

high

These entries seem to be removed. Please confirm if this was intentional.

internal/server/v1/observations/golden/bulk_point/all_latest.json (373-374)

medium

The date has been updated from '2025-09' to '2025-12' and the value has been updated from 7603000 to 7503000. Please confirm if this was intentional.

internal/server/v1/observations/golden/bulk_point/preferred_latest.json (193-194)

medium

The date has been updated from '2025-09' to '2025-12' and the value has been updated from 7603000 to 7503000. Please confirm if this was intentional.

internal/server/v1/info/golden/bulk_variable_info/bulk_bt_and_sql.json (27)

medium

The placeCount has been updated from 401 to 416. Please confirm if this was intentional.

internal/server/v1/info/golden/bulk_variable_info/bulk_result.json (27)

medium

The placeCount has been updated from 401 to 416. Please confirm if this was intentional.

internal/server/v1/info/golden/bulk_variable_info/bulk_result.json (4453)

medium

The observationCount has been updated from 181550 to 172489. Please confirm if this was intentional.

internal/server/v1/info/golden/bulk_variable_info/bulk_result.json (4498-4499)

medium

The observationCount has been updated from 3137 to 2930. Please confirm if this was intentional.

internal/server/v1/info/golden/bulk_variable_info/bulk_result.json (4881)

medium

The latestDate has been updated from 2023 to 2024. Please confirm if this was intentional.
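
Field-level changes like the ones listed above can also be enumerated mechanically instead of being confirmed one by one. The following is a minimal sketch, not part of this repository: `diff_json` and `MISSING` are hypothetical names, and the sample objects only borrow the `placeCount` and `latestDate` values quoted in this review (the surrounding JSON layout and the string type of `latestDate` are assumptions).

```python
import json
from typing import Any, Iterator, Tuple


class _Missing:
    """Sentinel for a key or index that exists on only one side."""

    def __repr__(self) -> str:
        return "<missing>"


MISSING = _Missing()


def diff_json(old: Any, new: Any, path: str = "$") -> Iterator[Tuple[str, Any, Any]]:
    """Yield (path, old_value, new_value) for every value that differs."""
    if isinstance(old, dict) and isinstance(new, dict):
        # Walk the union of keys so additions and removals are both reported.
        for key in sorted(set(old) | set(new)):
            yield from diff_json(old.get(key, MISSING), new.get(key, MISSING), f"{path}.{key}")
    elif isinstance(old, list) and isinstance(new, list):
        # Compare lists position by position; extra items show up as MISSING.
        for i in range(max(len(old), len(new))):
            o = old[i] if i < len(old) else MISSING
            n = new[i] if i < len(new) else MISSING
            yield from diff_json(o, n, f"{path}[{i}]")
    elif old != new:
        yield path, old, new


# Hypothetical before/after fragments using values quoted in the review.
old = json.loads('{"placeCount": 401, "latestDate": "2023"}')
new = json.loads('{"placeCount": 416, "latestDate": "2024"}')
changes = list(diff_json(old, new))
for change_path, before, after in changes:
    print(f"{change_path}: {before!r} -> {after!r}")
```

Lists are compared by position, so an entry removed from the middle of an array is reported as a shift in every later element. That is a limitation of this sketch, not a claim about how the golden tests compare files.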

},
{
- "name": "Count of person with bachelor's degree in oral andsetMaxillofacial Studies",
+ "name": "Count of person with bachelor's degree in oral and Maxillofacial Studies",

nit: this should be lower case

Suggested change
- "name": "Count of person with bachelor's degree in oral and Maxillofacial Studies",
+ "name": "Count of person with bachelor's degree in oral and maxillofacial studies",

not sure where upstream this needs to be fixed

},
{
"name": "Percentage of persons aged 14 years or younger or aged 65 years or older",
"dcid": "Count_Person_YearsUpto14OrYears65Onwards_AsAFractionOf_Count_Person_15To64Years"

this seems inconsistent with the name -- should the denominator be Count_Person? not Count_Person_15To64Years?
If not, the name needs to be updated to be more clear of what this statvar represents
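
To make the difference concrete, here is a small worked example with made-up population figures (the numbers and function names are hypothetical, not taken from the data release). The name reads like a share of the total population, while the dcid's denominator, `Count_Person_15To64Years`, yields what is commonly called a dependency ratio.

```python
def share_of_total(young: float, working_age: float, older: float) -> float:
    """(0-14 plus 65+) divided by the whole population."""
    return (young + older) / (young + working_age + older)


def ratio_to_working_age(young: float, working_age: float, older: float) -> float:
    """(0-14 plus 65+) divided by the 15-64 population, as the dcid implies."""
    return (young + older) / working_age


# Hypothetical population: 200 aged 0-14, 600 aged 15-64, 200 aged 65+.
share = share_of_total(200, 600, 200)
ratio = ratio_to_working_age(200, 600, 200)
print(f"share of total population: {share:.1%}")
print(f"ratio to working-age population: {ratio:.1%}")
```

With these figures the two readings give 40% and roughly 67%, so the name and the dcid are not interchangeable.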

},
{
"name": "Percentage of persons who are in other religious groups in Finland",
"dcid": "Count_Person_FINOtherReligiousGroup_AsAFractionOf_Count_Person"

why does this statvar have a location in it? should the statvar just be percentage of persons in other religious group and then the observations are for finland?
is it possible to have observations for this statvar in locations other than finland?

"dcid": "Count_Person_LutheranChurchMissouriSynod_AsAFractionOf_Count_Person"
},
{
"name": "Percentage of persons who are not citizens of Finland",

Suggested change
- "name": "Percentage of persons who are not citizens of Finland",
+ "name": "Percentage of persons who are not citizens",

"dcid": "Count_Person_Male_NativeHawaiianOrOtherPacificIslanderAlone"
},
{
"name": "Male Population With Disabilities",

why are these getting deleted?

"dcid": "dc/e351bke5y8c75"
},
{
"name": "Count of pregnant women, male, condition hepatitis B",

can this name be clarified? I don't really know what this means

"dcid": "Count_Person_Female_WithModeratelyElevatedBloodPressureOrSeverelyElevatedBloodPressure_AsAFractionOf_Count_Person_Female"
},
{
"name": "Count of Pregnant Women, 10 To 19 Years, Male, Condition Hepatitis B",

similarly, the names of the statvars in this chunk need to be clarified -- is the gender of the baby male? or is it that the pregnant person identifies as male? or neither?
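
Names like these could be surfaced automatically before a release. Below is a rough heuristic sketch, assuming a simple keyword match is good enough to flag candidates for human review; `has_conflicting_gender_terms` and the term lists are hypothetical and do not say anything about what the statvars actually mean.

```python
import re

# Word-boundary matches, so "female" does not count as "male"
# and "women" does not count as "men".
FEMALE_TERMS = re.compile(r"\b(women|woman|female)\b", re.IGNORECASE)
MALE_TERMS = re.compile(r"\b(men|man|male)\b", re.IGNORECASE)


def has_conflicting_gender_terms(name: str) -> bool:
    """Return True when a name mentions both female and male terms."""
    return bool(FEMALE_TERMS.search(name)) and bool(MALE_TERMS.search(name))


names = [
    "Count of pregnant women, male, condition hepatitis B",
    "Count of Pregnant Women, 10 To 19 Years, Male, Condition Hepatitis B",
    "Male Population With Disabilities",
]
flagged = [n for n in names if has_conflicting_gender_terms(n)]
for n in flagged:
    print("needs clarification:", n)
```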

"Count_Person_7To14Years_Employed_AsFractionOf_Count_Person_7To14Years",
"Count_Person_7To14Years_Female_Employed_AsFractionOf_Count_Person_7To14Years_Female",
"Count_Person_7To14Years_Male_Employed_AsFractionOf_Count_Person_7To14Years_Male",
"Count_Person_COVID19",

@@ -30662,6 +30662,4006 @@
"dcid": "grid_1/42_-123",
"provenanceId": "dc/base/TideGaugeStations"
},
{
"name": "\"PINE GROVE YOUTH CONSERVATION CAMP\"",

why does name have double quotes and all caps?
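
A check along these lines could catch such names before they reach the goldens. This is only a sketch under the assumption that literal wrapping quotes and fully upper-case names are unintended; `name_issues` is a hypothetical helper, not an existing function in this repository.

```python
from typing import List


def name_issues(name: str) -> List[str]:
    """Return a list of formatting problems found in a display name."""
    issues = []
    # Quotes that are part of the string value itself, not JSON syntax.
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        issues.append("wrapped in literal double quotes")
    letters = [c for c in name if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        issues.append("all caps")
    return issues


# The first name is the value from the diff above, after JSON unescaping.
print(name_issues('"PINE GROVE YOUTH CONSERVATION CAMP"'))
print(name_issues("tradePartner"))
```

Legitimate acronyms would also trip the all-caps rule, so the output is better treated as a list of names to look at than as a hard failure.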

"provenanceId": "dc/base/BaseSchema"
},
{
"name": "tradePartner",

when we add new nodes like this to the graph, is it possible to add more description / information about what this constraint property really represents?

@clincoln8
Contributor

Hi! I don't know how these goldens are used / what exactly these tests are for, but I've added some comments just based on what I noticed.

@clincoln8
Contributor

When we do data releases, is it possible to add what this data release contains in the PR description (new / refreshed datasets, etc) and a summary of the expected diffs?

2 participants