updated goldens for data release 2026-01-28 #1750
rohitkumarbhagat wants to merge 3 commits into datacommonsorg:master from
Summary of Changes
Hello @rohitkumarbhagat, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request integrates the latest data release, dated 2026-01-28, by updating numerous golden files across the system. The changes primarily involve refreshing data points for various statistical variables, introducing new demographic and economic indicators, and performing necessary data cleanup and structural adjustments to ensure data accuracy and consistency. This update ensures that the system's data reflects the most current information available.
Code Review
This pull request primarily updates various data points and configurations across several files. Key changes include updating Bigtable table names and BigQuery dataset versions with newer dates, adjusting unemployment rate ranks and values in golden files, and correcting a typo in a statistical variable name. Numerous new statistical variables have been added, particularly related to COVID-19, population demographics (e.g., place of birth, languages spoken, foreign background, age groups), and economic activity (exports, imports, trade balance with USA, categorized by NAICS codes). Conversely, several older statistical variables related to male, monoracial, multiracial, and Native American/Hawaiian populations have been removed. The descendentStatVarCount values have been updated across various variable groups (Economy, Health, Demographics, Education, Environment, Housing, Uncategorized, Weather) to reflect these additions and removals. New census data for Bulgaria, NITI India Population Projection, and Poland Census have been integrated, leading to updates in placeCount, observationCount, and timeSeriesCount for various place types and facets. Additionally, several geoId entries were removed from triple golden files, and importTime was updated. The tradePartner property was added to place_type.json. Finally, many observation dates and corresponding values for various entities (countries, US states, counties) have been updated to reflect more recent data, with specific reviewer comments highlighting changes in placeCount, observationCount, latestDate, and specific date/value updates for unemployment data.
I am having trouble creating individual review comments, so my feedback is listed here.
internal/server/v0/triple/golden/get_triples/limit.json (995-1022)
These entries seem to be removed. Please confirm if this was intentional.
internal/server/v0/triple/golden/get_triples/limit.json (1082-1109)
These entries seem to be removed. Please confirm if this was intentional.
internal/server/v1/observations/golden/bulk_point/all_latest.json (373-374)
The date has been updated from '2025-09' to '2025-12' and the value has been updated from 7603000 to 7503000. Please confirm if this was intentional.
internal/server/v1/observations/golden/bulk_point/preferred_latest.json (193-194)
The date has been updated from '2025-09' to '2025-12' and the value has been updated from 7603000 to 7503000. Please confirm if this was intentional.
internal/server/v1/info/golden/bulk_variable_info/bulk_bt_and_sql.json (27)
The placeCount has been updated from 401 to 416. Please confirm if this was intentional.
internal/server/v1/info/golden/bulk_variable_info/bulk_result.json (27)
The placeCount has been updated from 401 to 416. Please confirm if this was intentional.
internal/server/v1/info/golden/bulk_variable_info/bulk_result.json (4453)
The observationCount has been updated from 181550 to 172489. Please confirm if this was intentional.
internal/server/v1/info/golden/bulk_variable_info/bulk_result.json (4498-4499)
The observationCount has been updated from 3137 to 2930. Please confirm if this was intentional.
internal/server/v1/info/golden/bulk_variable_info/bulk_result.json (4881)
The latestDate has been updated from 2023 to 2024. Please confirm if this was intentional.
| }, | ||
| { | ||
| - "name": "Count of person with bachelor's degree in oral andsetMaxillofacial Studies", | ||
| + "name": "Count of person with bachelor's degree in oral and Maxillofacial Studies", |
nit: this should be lower case
Suggested change:
| - "name": "Count of person with bachelor's degree in oral and Maxillofacial Studies", |
| + "name": "Count of person with bachelor's degree in oral and maxillofacial studies", |
not sure where upstream this needs to be fixed
| }, | ||
| { | ||
| "name": "Percentage of persons aged 14 years or younger or aged 65 years or older", | ||
| "dcid": "Count_Person_YearsUpto14OrYears65Onwards_AsAFractionOf_Count_Person_15To64Years" |
this seems inconsistent with the name -- should the denominator be Count_Person, not Count_Person_15To64Years?
If not, the name needs to be updated to make clearer what this statvar represents
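To make the denominator question concrete, here is a minimal sketch with made-up population counts. The function names and numbers are hypothetical and not taken from the dataset; the point is only that the two readings give very different values, so the name and the dcid should agree on one of them.

```python
def share_of_total(young: int, working_age: int, old: int) -> float:
    """Percentage of ALL persons who are aged 0-14 or 65+.

    Denominator: total population (what the current name suggests).
    """
    total = young + working_age + old
    return 100.0 * (young + old) / total


def dependency_ratio(young: int, working_age: int, old: int) -> float:
    """Persons aged 0-14 or 65+ per 100 persons aged 15-64.

    Denominator: working-age population (what the dcid suggests).
    """
    return 100.0 * (young + old) / working_age


# Hypothetical counts, for illustration only.
young, working_age, old = 200, 600, 200

print(share_of_total(young, working_age, old))              # 40.0
print(round(dependency_ratio(young, working_age, old), 1))  # 66.7
```

If the dcid is right, a name along the lines of a dependency ratio would probably read more accurately than "Percentage of persons".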
| }, | ||
| { | ||
| "name": "Percentage of persons who are in other religious groups in Finland", | ||
| "dcid": "Count_Person_FINOtherReligiousGroup_AsAFractionOf_Count_Person" |
why does this statvar have a location in it? should the statvar just be percentage of persons in other religious groups and then the observations are for Finland?
is it possible to have observations for this statvar in locations other than Finland?
| "dcid": "Count_Person_LutheranChurchMissouriSynod_AsAFractionOf_Count_Person" | ||
| }, | ||
| { | ||
| "name": "Percentage of persons who are not citizens of Finland", |
Suggested change:
| - "name": "Percentage of persons who are not citizens of Finland", |
| + "name": "Percentage of persons who are not citizens", |
| "dcid": "Count_Person_Male_NativeHawaiianOrOtherPacificIslanderAlone" | ||
| }, | ||
| { | ||
| "name": "Male Population With Disabilities", |
why are these getting deleted?
| "dcid": "dc/e351bke5y8c75" | ||
| }, | ||
| { | ||
| "name": "Count of pregnant women, male, condition hepatitis B", |
can this name be clarified? I don't really know what this means
| "dcid": "Count_Person_Female_WithModeratelyElevatedBloodPressureOrSeverelyElevatedBloodPressure_AsAFractionOf_Count_Person_Female" | ||
| }, | ||
| { | ||
| "name": "Count of Pregnant Women, 10 To 19 Years, Male, Condition Hepatitis B", |
similarly, the names of the statvars in this chunk need to be clarified -- is the gender of the baby male? or is it that the pregnant person identifies as male? or neither?
| "Count_Person_7To14Years_Employed_AsFractionOf_Count_Person_7To14Years", | ||
| "Count_Person_7To14Years_Female_Employed_AsFractionOf_Count_Person_7To14Years_Female", | ||
| "Count_Person_7To14Years_Male_Employed_AsFractionOf_Count_Person_7To14Years_Male", | ||
| "Count_Person_COVID19", |
how is this different than https://datacommons.org/browser/Count_MedicalConditionIncident_ConditionCOVID19
| @@ -30662,6 +30662,4006 @@ | |||
| "dcid": "grid_1/42_-123", | |||
| "provenanceId": "dc/base/TideGaugeStations" | |||
| }, | |||
| { | |||
| "name": "\"PINE GROVE YOUTH CONSERVATION CAMP\"", | |||
why does name have double quotes and all caps?
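For what it's worth, names like this could be flagged automatically. Below is a rough sketch of such a check; `name_issues` is a hypothetical helper, not something in this repo, and the all-caps rule is only a heuristic (it would also flag legitimate acronyms).

```python
def name_issues(name: str) -> list[str]:
    """Return human-readable problems found in a node's display name."""
    issues = []
    # Quotes that are part of the stored value, not JSON string delimiters.
    if name.startswith('"') or name.endswith('"'):
        issues.append("literal double quotes at the start or end")
    # Heuristic: every alphabetic character is upper case.
    letters = [c for c in name if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        issues.append("all caps")
    return issues


print(name_issues('"PINE GROVE YOUTH CONSERVATION CAMP"'))
print(name_issues("tradePartner"))
```

The first call reports both problems and the second reports none.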
| "provenanceId": "dc/base/BaseSchema" | ||
| }, | ||
| { | ||
| "name": "tradePartner", |
when we add new nodes like this to the graph, is it possible to add more description / information about what this constraint property really represents?
Hi! I don't know how these goldens are used / what exactly these tests are for, but I've added some comments just based on what I noticed.

When we do data releases, is it possible to add what this data release contains in the PR description (new / refreshed datasets, etc.) and a summary of the expected diffs?
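As a starting point for that kind of summary, here is a minimal sketch that compares the old and new versions of one golden and lists added, removed, and changed leaf values. It assumes the goldens are plain JSON; the `info` wrapper and the helper names are hypothetical, and the numbers are the ones called out in the bot review above.

```python
import json


def flatten(node, prefix=""):
    """Map every leaf value in parsed JSON to a slash-separated path.

    Empty objects and arrays contribute no leaves.
    """
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        return {prefix: node}
    flat = {}
    for key, value in items:
        flat.update(flatten(value, f"{prefix}/{key}"))
    return flat


def summarize(old_text: str, new_text: str) -> dict:
    """Summarize leaf-level differences between two JSON documents."""
    old = flatten(json.loads(old_text))
    new = flatten(json.loads(new_text))
    shared = sorted(old.keys() & new.keys())
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": {p: (old[p], new[p]) for p in shared if old[p] != new[p]},
    }


# Hypothetical golden contents, for illustration only.
old_text = '{"info": {"placeCount": 401, "latestDate": "2023"}}'
new_text = '{"info": {"placeCount": 416, "latestDate": "2024"}}'

for path, (before, after) in summarize(old_text, new_text)["changed"].items():
    print(f"{path}: {before} -> {after}")
```

Running something like this over each changed golden and pasting the output into the PR description would give reviewers the expected diffs up front.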
No description provided.