docs(example): Add Faceted map using Species Habitat dataset #3809

Draft
wants to merge 2 commits into
base: main

dsmedia
Contributor

@dsmedia dsmedia commented Feb 22, 2025

This commit introduces a new example to the Altair documentation showcasing choropleth maps faceted by category.

Important

This example depends on the new Species Habitat dataset being added in vega/vega-datasets#684. That PR must be merged first before this example will work correctly.

The example visualizes the distribution of suitable habitat for different species across US counties, using the proposed new Species Habitat dataset from vega-datasets (vega/vega-datasets#684).

Key features of this example:

  • Shows how to create faceted maps for comparing categorical data across geographic regions.
  • Demonstrates the use of alt.Chart.mark_geoshape() for geographical visualizations.
  • Utilizes the transform_lookup and transform_calculate transforms for data manipulation within Altair.
  • Uses a CSV data file temporarily hosted in the vega-datasets repository branch (pending dataset merge).

This example addresses issue #1711, which requested a faceted map example for the Altair documentation.

Co-authored-by: Mattijn van Hoek <mattijn@gmail.com>
@dangotbanned dangotbanned changed the title docs: Add faceted map example using Species Habitat dataset docs(example): Add Faceted map using Species Habitat dataset Feb 22, 2025
Contributor

@mattijn mattijn left a comment

Thanks for this PR! Added two comments. Is this a replacement of https://altair-viz.github.io/gallery/us_incomebrackets_by_state_facet.html? Btw, why is the facet working in that example? I thought that didn’t work yet 🧐

fields=['habitat_yearround_pct']
)
).transform_filter(
"indexof(['2', '15'], '' + floor(datum.id / 1000)) == -1"
Contributor

What is happening here? And why is this necessary? Is it filtering out Alaska maybe, to focus on CONUS only? Normally I would do this as part of the data wrangling beforehand, but since you refer to the counties by URL it is understandable. Maybe add a comment explaining what is happening?

Contributor Author

Good observation. This filter is intended to keep only the counties where the state portion of their FIPS code is neither 2 (Alaska) nor 15 (Hawaii).
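
For readers less familiar with Vega expressions, the filter logic can be sketched in plain Python. This is only an illustration of the same arithmetic, not code from the PR: the function names and sample ids are hypothetical, and it assumes county ids are integer FIPS codes whose thousands portion encodes the state.

```python
# State FIPS prefixes excluded by the filter: 2 (Alaska) and 15 (Hawaii).
EXCLUDED_STATE_FIPS = {2, 15}


def state_fips(county_id: int) -> int:
    """Return the state portion of a county FIPS id, mirroring floor(id / 1000)."""
    return county_id // 1000


def keep_county(county_id: int) -> bool:
    """Keep a county only if its state prefix is not in the excluded set."""
    return state_fips(county_id) not in EXCLUDED_STATE_FIPS


# Hypothetical sample ids, chosen only to exercise each branch.
sample_ids = [2013, 15001, 6037, 20001]
kept = [cid for cid in sample_ids if keep_county(cid)]
```

Note that an id such as 20001 is kept: the comparison is against the whole state prefix (20), not a leading digit, which matches the exact string match done by `indexof` in the original expression.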

The dataset covers only the contiguous states (aka CONUS / coterminous U.S.), so we need a way to remove Alaska and Hawaii from the vector map. (Incidentally, Alaska seems to have gone its own way on the species data. I'm not sure about Hawaii but there were no overlaps there when I looked across all species.)

I can see the benefit of filtering this out prior to the visualization in a production setting. Perhaps in this use case as an example visualization, this approach may help others better understand a use of transform_filter. I'm happy to go either way here.

Just as a semantic point of interest (and in the fun-with-geospatial department), I came across this USGS page that notes the official definitions of some of the terms used to describe portions of the U.S., going back to Alaska's statehood in 1959. (I grew up hearing continental used interchangeably with contiguous in this context, and I never heard the term coterminous United States until this PR...)

https://www.usgs.gov/faqs/what-constitutes-united-states-what-are-official-definitions

On May 14, 1959, the U.S. Board on Geographic Names issued the following definitions, 
which defined the Continental United States as "the 49 States on the North American Continent 
and the District of Columbia..." The BGN reaffirmed these definitions on May 13, 1999. 

United States: The 50 States and the District of Columbia.

Continental United States: The 49 States (including Alaska, excluding Hawaii) located on the 
continent of North America, and the District of Columbia.

Conterminous United States: The 48 States and the District of Columbia; that is, the United States 
prior to January 3, 1959 (Alaska Statehood), wholly filling an unbroken block of territory and 
excluding Alaska and Hawaii. Although the official reference applies the term "conterminous," 
many use the word "contiguous," which is almost synonymous and better known.

Contributor

Thanks for providing context! It is all new to me. I grew up hearing neither continental, contiguous, nor conterminous US. It was just US. What a beautiful childhood I had (still, we were struggling to understand the differences between the country the Netherlands vs. Holland and the Kingdom of the Netherlands 😂)

).configure_view(
stroke=None
).configure_mark(
invalid='filter'
Contributor

What is the difference if this is set to show? I remember I had to use this to keep the counties that had no match in the lookup, but I'm not entirely sure anymore. I think filter is the default?

Contributor Author

The docs state:

"break-paths-show-path-domains" (default) — This is equivalent to "break-paths-show-domains" for path-based marks (line/area/trail) and "filter" for non-path marks.

Since the script already ensures invalid values (i.e. counties with no habitat) are assigned a value of zero...

habitat_pct="datum.habitat_yearround_pct === null ? 0 : datum.habitat_yearround_pct"

...I think configure_mark should simply be removed. Does that sound right?
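
As a rough illustration of that calculate expression, here is the same null-to-zero logic in plain Python. The lookup table and its values are made up for the example, and `None` stands in for the null that, as I understand it, a lookup without a match produces.

```python
from typing import Optional

# Hypothetical lookup table: county id -> habitat_yearround_pct.
habitat_by_county = {6037: 0.25, 20001: 0.5}


def lookup_habitat(county_id: int) -> Optional[float]:
    """Return the looked-up value, or None when the county has no record."""
    return habitat_by_county.get(county_id)


def habitat_pct(habitat_yearround_pct: Optional[float]) -> float:
    """Mirror: datum.habitat_yearround_pct === null ? 0 : datum.habitat_yearround_pct"""
    return 0.0 if habitat_yearround_pct is None else habitat_yearround_pct


matched = habitat_pct(lookup_habitat(6037))    # county present in the table
unmatched = habitat_pct(lookup_habitat(1001))  # county absent from the table
```

Once every record passes through a step like this, no null values remain for the `invalid` setting to act on, which is the reasoning behind dropping `configure_mark`.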

Contributor

Now I see you map habitat_yearround_pct to a new field habitat_pct, just to get a 0 in the tooltip instead of null.

Hmm, it is quite a hard one to review.

What you have done is completely right to make this specific example better understandable.
But for the sake of simplicity and for the generic purpose of the technique, I'm leaning towards suggesting to remove the:

.transform_filter(
    "indexof(['2', '15'], '' + floor(datum.id / 1000)) == -1"
).transform_calculate(
    habitat_pct="datum.habitat_yearround_pct === null ? 0 : datum.habitat_yearround_pct"
)

and use habitat_yearround_pct directly in the encodings and include:

.configure_mark(
    invalid='show'
)

instead of invalid='filter', to keep including the non-matching counties in the lookup.

It could also be interesting to learn how to change the color of null-valued elements, by including something like this:

.configure_scale(
    invalid=alt.ScaleInvalidDataConfig(color=alt.value('lightgray'))  # default is `color='zero-or-min'`
)

But I also think that is too much for this example. I would suggest keeping the example as simple as possible, even if that means it is not as aesthetically pleasing.

Contributor

@mattijn mattijn Feb 24, 2025

Or even better, in the corresponding PR in the vega-datasets repo, set all unassigned counties within CONUS to 0 and leave the Alaska and Hawaii counties unassigned. Then Alaska and Hawaii will automatically be filtered out here as invalid, and CONUS is all set and valid; you could then change the projection from AlbersUsa (full US projection) to Albers (CONUS projection).

Contributor Author

Or even better, in the corresponding PR in the vega-datasets repo, set all unassigned counties within CONUS to 0 and leave the Alaska and Hawaii counties unassigned. Then Alaska and Hawaii will automatically be filtered out here as invalid, and CONUS is all set and valid; you could then change the projection from AlbersUsa (full US projection) to Albers (CONUS projection).

Yes, I think this is the cleanest way. I'll do just that.
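
The data-side change agreed on here could look roughly like the following. This is a sketch in plain Python, not the actual vega-datasets script: the function name, the input shapes, and the sample values are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional

# State FIPS prefixes outside CONUS: 2 (Alaska) and 15 (Hawaii).
NON_CONUS_STATE_FIPS = {2, 15}


def fill_conus_zeros(
    all_county_ids: Iterable[int],
    habitat_pct: Dict[int, float],
) -> Dict[int, Optional[float]]:
    """Assign 0 to CONUS counties with no record; leave Alaska/Hawaii as None."""
    filled: Dict[int, Optional[float]] = {}
    for county_id in all_county_ids:
        if county_id in habitat_pct:
            filled[county_id] = habitat_pct[county_id]
        elif county_id // 1000 in NON_CONUS_STATE_FIPS:
            filled[county_id] = None  # stays unassigned (null)
        else:
            filled[county_id] = 0.0  # CONUS county without suitable habitat
    return filled


# Hypothetical inputs: four county ids, only one with a habitat record.
result = fill_conus_zeros([2013, 15001, 6037, 20001], {6037: 0.25})
```

With the data prepared this way, the chart itself should no longer need the `transform_filter` and `transform_calculate` steps discussed earlier in this thread.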

@dsmedia
Contributor Author

dsmedia commented Feb 23, 2025

Thanks for this PR! Added two comments. Is this a replacement of https://altair-viz.github.io/gallery/us_incomebrackets_by_state_facet.html?

Appreciate the suggestions @mattijn, and good question about the existing example. This is intended to supplement rather than replace that example.

  1. The income brackets example demonstrates faceting by numerical ranges, which is still a valid and useful visualization technique.

  2. The new species habitat example demonstrates faceting by distinct categories (species), which represents a common use case highlighted by @palewire in the original issue:

#1711 (comment)

In news graphics, the most common case for a faceted map is when you want
to create a set of "mini multiples" that compare quantitative values on a
shared scaled across a set of competing nominative values.

#1711 (comment)

I think facets by time series segment or by a quantitative bracket are interesting, but I'd wager that both are much less common than charts that facet by a nominative category.

Will try to address the question on faceting separately.
