docs(example): Add Faceted map using Species Habitat dataset #3809
This commit introduces a new example to the Altair documentation showcasing choropleth maps faceted by category. The example visualizes the distribution of suitable habitat for different species across US counties, using the proposed new Species Habitat dataset from vega-datasets (vega/vega-datasets#684).

Key features of this example:
- Demonstrates the use of `alt.Chart.mark_geoshape()` for geographical visualizations.
- Shows how to create faceted maps for comparing categorical data across geographic regions.
- Utilizes the `transform_lookup` and `transform_calculate` transforms for data manipulation within Altair.
- Uses a CSV data file temporarily hosted in the vega-datasets repository branch (pending dataset merge).

This example addresses issue #1711, which requested a faceted map example for the Altair documentation.

Co-authored-by: Mattijn van Hoek <mattijn@gmail.com>
Thanks for this PR! Added two comments. Is this a replacement of https://altair-viz.github.io/gallery/us_incomebrackets_by_state_facet.html? Btw, why is the facet working in that example? I thought that didn't work yet 🧐
        fields=['habitat_yearround_pct']
    )
).transform_filter(
    "indexof(['2', '15'], '' + floor(datum.id / 1000)) == -1"
What is happening here? And why is this necessary? Is it filtering out Alaska maybe, to focus on CONUS only? Normally I would do this as part of the data wrangling beforehand, but since you refer to the counties by URL it is understandable. Maybe add a comment explaining what is happening?
Good observation. This filter is intended to keep only the counties where the state portion of their FIPS code is neither 2 (Alaska) nor 15 (Hawaii).
The dataset covers only the contiguous states (aka CONUS / coterminous U.S.), so we need a way to remove Alaska and Hawaii from the vector map. (Incidentally, Alaska seems to have gone its own way on the species data. I'm not sure about Hawaii, but there were no overlaps there when I looked across all species.)
I can see the benefit of filtering this out prior to the visualization in a production setting. In an example visualization, though, this approach may help others better understand a use of `transform_filter`. I'm happy to go either way here.
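To make the filter expression concrete, here is a minimal plain-Python sketch of the same check. It assumes county ids are numeric FIPS codes whose value divided by 1000 gives the state code; the function name and the sample ids are illustrative and not part of the PR.

```python
import math

# State FIPS codes to drop, written as strings to mirror the Vega expression.
EXCLUDED_STATE_FIPS = ["2", "15"]  # Alaska, Hawaii


def keep_county(county_id: int) -> bool:
    """Mirror of: indexof(['2', '15'], '' + floor(datum.id / 1000)) == -1"""
    state_fips = str(math.floor(county_id / 1000))
    # List membership is an exact match, so state "20" is not caught by "2".
    return state_fips not in EXCLUDED_STATE_FIPS


# Illustrative ids: Alabama, Alaska, California, Hawaii, Kansas.
sample_ids = [1001, 2013, 6037, 15001, 20001]
kept = [cid for cid in sample_ids if keep_county(cid)]
print(kept)
```

The string comparison is kept only to stay close to the original expression; comparing integers would work equally well in Python.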
Just as a semantic point of interest (and in the fun-with-geospatial department), I came across this USGS page that notes the official definitions of some of the terms used to describe portions of the U.S., going back to Alaska's statehood in 1959. (I grew up hearing continental used interchangeably with contiguous in this context, and I never heard the term coterminous United States until this PR...)
https://www.usgs.gov/faqs/what-constitutes-united-states-what-are-official-definitions
On May 14, 1959, the U.S. Board on Geographic Names issued the following definitions,
which defined the Continental United States as "the 49 States on the North American Continent
and the District of Columbia..." The BGN reaffirmed these definitions on May 13, 1999.
United States: The 50 States and the District of Columbia.
Continental United States: The 49 States (including Alaska, excluding Hawaii) located on the
continent of North America, and the District of Columbia.
Conterminous United States: The 48 States and the District of Columbia; that is, the United States
prior to January 3, 1959 (Alaska Statehood), wholly filling an unbroken block of territory and
excluding Alaska and Hawaii. Although the official reference applies the term "conterminous,"
many use the word "contiguous," which is almost synonymous and better known.
Thanks for providing context! It is all new to me. I grew up hearing none of continental, contiguous, or conterminous US. It was just US. What a beautiful childhood I had (still, we were struggling to understand the differences between the country the Netherlands vs Holland and the kingdom of the Netherlands 😂)
).configure_view(
    stroke=None
).configure_mark(
    invalid='filter'
What is the difference if this is set to `show`? I remember I had to use this to keep the counties that didn't have a match in the lookup, but I'm not entirely sure anymore. I think `filter` is the default?
The docs state:
"break-paths-show-path-domains" (default) — This is equivalent to "break-paths-show-domains" for path-based marks (line/area/trail) and "filter" for non-path marks.
Since the script already ensures invalid values (i.e. counties with no habitat) are assigned a value of zero...
habitat_pct="datum.habitat_yearround_pct === null ? 0 : datum.habitat_yearround_pct"
...I think `configure_mark` should simply be removed. Does that sound right?
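As a rough illustration of why the zero-fill makes the invalid-value setting moot, the following plain-Python sketch mimics the two steps: a lookup that returns null for unmatched counties, followed by the calculate expression that replaces null with 0. The habitat values and helper names are hypothetical.

```python
from typing import Optional

# Hypothetical habitat rows keyed by county id; county 6037 has no row.
habitat_by_county = {1001: 0.42, 20001: 0.07}


def lookup_habitat(county_id: int) -> Optional[float]:
    """Like a lookup with no default: unmatched keys yield None (null)."""
    return habitat_by_county.get(county_id)


def habitat_pct(value: Optional[float]) -> float:
    """Like: datum.habitat_yearround_pct === null ? 0 : datum.habitat_yearround_pct"""
    return 0 if value is None else value


county_ids = [1001, 6037, 20001]
looked_up = [lookup_habitat(cid) for cid in county_ids]
filled = [habitat_pct(v) for v in looked_up]
print(looked_up)
print(filled)
```

After the second step no null values remain, so neither `filter` nor `show` would change which counties are drawn.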
Now I see you map `habitat_yearround_pct` to a new field `habitat_pct`, just to get a `0` in the tooltip instead of `null`.
Hmm, it is quite a hard one to review.
What you have done is completely right for making this specific example easier to understand.
But for the sake of simplicity and for the generic purpose of the technique, I'm leaning towards suggesting to remove the:
.transform_filter(
"indexof(['2', '15'], '' + floor(datum.id / 1000)) == -1"
).transform_calculate(
habitat_pct="datum.habitat_yearround_pct === null ? 0 : datum.habitat_yearround_pct"
)
and use `habitat_yearround_pct` directly in the encodings and include:
.configure_mark(
invalid='show'
)
instead of `invalid='filter'`, to keep including the counties that have no match in the lookup.
It could also be interesting to learn how to change the color of `null`-valued elements, by including something like this:
.configure_scale(
invalid=alt.ScaleInvalidDataConfig(color=alt.value('lightgray')) # default is `color='zero-or-min'`
)
But I also think that is too much for this example. I would suggest keeping the example as simple as possible, meaning it might not be as aesthetically pleasing.
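For reference, the two settings discussed above would presumably end up in the chart's config block roughly as follows. This is a sketch written as a plain Python dict, not output generated by Altair; the key layout is an assumption based on the Vega-Lite invalid-data configuration and should be checked against the Vega-Lite version in use.

```python
import json

# Assumed config fragment: "show" keeps marks with null values, and the
# scale-level setting gives those marks a fixed color instead of the
# default 'zero-or-min' output.
config = {
    "mark": {"invalid": "show"},
    "scale": {"invalid": {"color": {"value": "lightgray"}}},
}

print(json.dumps(config, indent=2))
```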
Or even better, in the corresponding PR in the vega-datasets repo, set all unassigned counties within CONUS to `0` and leave the Alaska and Hawaii counties unassigned. Alaska and Hawaii will then automatically be filtered out here as invalid, while every CONUS county is set and valid. You could then change the projection from `AlbersUsa` (full US projection) to `Albers` (CONUS projection).
Yes, I think this is the cleanest way. I'll do just that.
Appreciate the suggestions @mattijn, and good question about the existing example. This would be intended to supplement rather than replace that example.
Will try to address the question on faceting separately.
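The data-side fix agreed on in this thread could look something like the sketch below: unassigned CONUS counties get an explicit 0, while Alaska and Hawaii stay unassigned so that they are later dropped as invalid. The function name, the input shapes, and the sample values are hypothetical and do not reflect the actual vega-datasets script.

```python
NON_CONUS_STATE_FIPS = {2, 15}  # Alaska, Hawaii


def fill_unassigned(county_ids, habitat):
    """Return a habitat value per county: 0.0 for unassigned CONUS
    counties, None for every Alaska and Hawaii county."""
    filled = {}
    for cid in county_ids:
        if cid // 1000 in NON_CONUS_STATE_FIPS:
            # Left unassigned on purpose so the chart treats it as invalid.
            filled[cid] = None
        else:
            filled[cid] = habitat.get(cid, 0.0)
    return filled


# Illustrative ids: Alabama, Alaska, California, Hawaii.
county_ids = [1001, 2013, 6037, 15001]
habitat = {1001: 0.42}
result = fill_unassigned(county_ids, habitat)
print(result)
```

With the data prepared this way, the example would no longer need the `transform_filter` on the FIPS code or the null-to-zero `transform_calculate`.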
Important
This example depends on the new Species Habitat dataset being added in vega/vega-datasets#684. That PR must be merged first before this example will work correctly.