Data release: release-20260205 #1763
Summary of Changes
Hello @ajaits, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request delivers a comprehensive update to the system's data foundation, ensuring that users have access to the most current and expanded statistical information. It integrates new census and projection data from several countries, introduces a wealth of new demographic and economic indicators, and refines existing data points and structural metadata for improved accuracy and utility.
Code Review
This pull request appears to be a routine data release, updating various data and configuration files with new timestamps and values for release-20260205. Most changes are updates to golden test files, reflecting the new data state. While the updates are largely as expected for a data release, I've identified a significant data quality issue in one of the golden files. Several newly added statistical variables have contradictory names, such as 'Count of pregnant women, male...'. This should be addressed to maintain data integrity and avoid user confusion.
beets
left a comment
mostly non-blocking, but please take a look at the comments, especially the last one about the missing place names
  "dcid": "dc/e351bke5y8c75"
},
{
  "name": "Count of pregnant women, male, condition hepatitis B",
this stat var doesn't make sense to me. why is gender=male?
looking at the CL that added this, not all of these ended up getting fixed.
https://critique.corp.google.com/cl/850327190
There was a problem hiding this comment.
The source does have this variable, though its data is all 0. The source generates every combination of gender with each health condition.
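A release-time check could flag such combinations before they reach the golden files. The sketch below is a minimal illustration, assuming stat var names are comma-separated phrases like the one quoted above. The `CONTRADICTORY_PAIRS` list and the `find_contradictory_names` helper are hypothetical and not part of any existing tooling.

```python
import re

# Hypothetical pairs of terms that should not co-occur in one stat var name.
CONTRADICTORY_PAIRS = [
    ("pregnant women", "male"),
]


def _has_term(name, term):
    """Return True if `term` appears in `name` as a whole word or phrase."""
    pattern = rf"\b{re.escape(term)}\b"
    return re.search(pattern, name, flags=re.IGNORECASE) is not None


def find_contradictory_names(names):
    """Return the names that contain both terms of any contradictory pair."""
    flagged = []
    for name in names:
        for first, second in CONTRADICTORY_PAIRS:
            if _has_term(name, first) and _has_term(name, second):
                flagged.append(name)
                break  # one match is enough to flag this name
    return flagged
```

The word-boundary match keeps "female" from being treated as "male", so only the genuinely contradictory names are reported.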
"topPlaces": [
  {
    "dcid": "wikidataId/Q1585725",
    "name": "Sofia Capital"
I'm not sure of the source, but the name seems off. Wikidata shows this as Sofia City
https://www.wikidata.org/wiki/Q1585725
There was a problem hiding this comment.
the data release didn't update the name; it has been like this for a while, likely from an earlier Wikipedia dump. it is probably showing up now because a new Bulgaria stats import added data to this place.
opened b/482272045 to track this.
  "dcid": "geoId/02060"
},
{
  "name": "Chugach Census Area",
Where did the names of these places go?
There was a problem hiding this comment.
@n-h-diaz These places have been added as ProvisionalNodes. Did we lose a place import? The provisional nodes didn't add names, but the node has other properties like containedIn and landArea: https://screenshot.googleplex.com/8G4zphp3wRJzoBs
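One way to spot this symptom in bulk is to look for nodes that carry other properties but no name. This is only a sketch: it assumes each node is available as a plain dict keyed by property name, and `nodes_missing_name` is a hypothetical helper rather than an existing tool.

```python
def nodes_missing_name(nodes):
    """Return dcids of nodes that have other properties but no usable name."""
    missing = []
    for node in nodes:
        name = node.get("name")
        # Anything besides the identifier and the name counts as "other".
        has_other_props = any(key not in ("dcid", "name") for key in node)
        has_name = isinstance(name, str) and name.strip() != ""
        if has_other_props and not has_name:
            missing.append(node.get("dcid"))
    return missing
```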
There was a problem hiding this comment.
hmm, yes, it looks like this is due to a merging issue: the provisional nodes are defined in the schema import group, and we trim schema -> schema triples (so things like typeOf: County get dropped from place) and schema -> leaf triples (so things like name get dropped from place). this is a bug in prophet.
short term we could either:
- remove the provisional definitions and rebuild place (we could also just delete the whole file if it's easier for now)
- move the provisional place nodes into place import group instead of schema and then rebuild place
long term:
- fix the merging issue in prophet to ensure only schema group triples get dropped (if this is possible)
- clean up the provisional nodes, so that we don't keep around provisional definitions if there's a proper definition elsewhere
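To make the trimming behaviour and the proposed long-term fix concrete, here is a toy model. It is not prophet's actual code: the `Triple` type, both function names, and the idea of tagging each triple with its import group are assumptions made purely for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    obj_is_leaf: bool  # True for literal values such as names
    group: str         # import group that contributed the triple


def trim_by_node(triples, schema_nodes):
    """Model of the behaviour described above: drop schema -> schema and
    schema -> leaf triples, regardless of which group contributed them."""
    kept = []
    for t in triples:
        subject_is_schema = t.subject in schema_nodes
        object_is_schema = (not t.obj_is_leaf) and t.obj in schema_nodes
        if subject_is_schema and (object_is_schema or t.obj_is_leaf):
            continue  # a place's typeOf and name are lost here
        kept.append(t)
    return kept


def trim_by_group(triples, schema_group="schema"):
    """Model of the proposed fix: drop only triples that the schema
    import group itself contributed."""
    return [t for t in triples if t.group != schema_group]
```

In this model, a place node that is also defined provisionally in the schema group keeps only its references to non-schema nodes (such as containedIn) under `trim_by_node`, which matches the symptom reported above, while `trim_by_group` keeps its name and type.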
beets
left a comment
thanks for the replies ajai. will approve to unblock the release
Mixer data release: release-20260205