Update type_histogram.R #288

eleuven · 2025-01-21T14:04:44Z

do not force common break points across facets. this can be done by passing break points to the breaks option. however, passing a vector to breaks is currently broken, as is.null() is not vectorised.

see here for some background

Fixes #269
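
As a minimal sketch of the failure mode described above: it assumes the default is chosen with an ifelse() call wrapped around is.null(), which is how the review thread below describes it. The variable names and the "Sturges" fallback are illustrative only and are not copied from the package source.

```r
# Hypothetical user-supplied break points (not taken from the PR).
breaks = c(0, 2, 4, 6, 8, 10)

# is.null() returns a single TRUE/FALSE, so ifelse() returns a length-one
# result and silently drops all but the first break point.
hbreaks_ifelse = ifelse(is.null(breaks), "Sturges", breaks)

# A plain if/else returns the selected object unchanged.
hbreaks_if = if (is.null(breaks)) "Sturges" else breaks

length(hbreaks_ifelse)  # 1
length(hbreaks_if)      # 6
```

Because ifelse() shapes its result after the test, a scalar test always yields a scalar, which would explain why a vector of breaks cannot get through.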

grantmcdermott

Thanks for this.

I can see that a couple of tests are failing. I'm not sure if it's because of things that I've pointed out in my quick review of the code or something else. Hopefully I'll have time to investigate properly this evening.

grantmcdermott · 2025-01-21T19:30:40Z

R/type_histogram.R

-        datapoints_breaks = hist(datapoints$x, breaks = hbreaks, plot = FALSE)
        datapoints = split(datapoints, list(datapoints$by, datapoints$facet))
        datapoints = Filter(function(k) nrow(k) > 0, datapoints)

        datapoints = lapply(datapoints, function(k) {
-            h = hist(k$x, breaks = datapoints_breaks$breaks, plot = FALSE)
+            h = hist(k$x, breaks = hbreaks, plot = FALSE)


I don't think we can delete the datapoints_breaks object, since we should still use it by default in the case of grouped histograms. The thing we want to avoid is different bin widths for each group (unless the user explicitly requests it), which is why we calculate the breaks on the full dataset first.

eleuven · 2025-01-21

sure, but atm there is no possibility of different binning for each group/facet, since datapoints_breaks is always passed to breaks, and there is no possibility of passing a vector of breaks because of ifelse(). it would be nice if breaks accepted the same possibilities as hist (function, vector, number).

i think that different binning by group/facet often makes sense when freq=FALSE (missing atm).

thanks for looking into this!

eleuven · 2025-01-23

perhaps introducing an option to allow for free breaks (as in this commit) is an idea...
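
The two behaviours discussed in this review thread can be sketched with base R alone. This is an illustration under stated assumptions, not the package's code: `group_hist()` and its `free_breaks` flag are hypothetical names, and the data are simulated.

```r
set.seed(1)

# Hypothetical data: two groups on very different scales.
datapoints = data.frame(
  x  = c(rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 50, sd = 10)),
  by = rep(c("a", "b"), each = 100)
)

# Hypothetical helper; `free_breaks` is an illustrative flag only.
group_hist = function(datapoints, breaks = "Sturges", free_breaks = FALSE) {
  if (!free_breaks) {
    # Joint binning: compute the break points once on the full data.
    breaks = hist(datapoints$x, breaks = breaks, plot = FALSE)$breaks
  }
  # Otherwise each group resolves `breaks` on its own subset.
  lapply(split(datapoints, datapoints$by), function(k) {
    hist(k$x, breaks = breaks, plot = FALSE)
  })
}

joint = group_hist(datapoints)
free  = group_hist(datapoints, free_breaks = TRUE)

identical(joint$a$breaks, joint$b$breaks)  # TRUE: shared bins
identical(free$a$breaks, free$b$breaks)    # FALSE: bins follow each group
```

With joint breaks the bins are comparable across groups; with free breaks each group gets bins suited to its own range.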

grantmcdermott · 2025-01-28T05:12:53Z

This is great, thanks @eleuven. I finally had a chance to play around with your suggestion tonight and agree that we need a solution like the one you've proposed here.

As it happens, we've just been resolving a similar issue for joint bandwidth selection in grouped density plots (e.g., see here). The solution we landed on there was to introduce a joint.bw argument that takes the arguments "mean" (default), "full", or "none".

My only hesitation right now is thinking about argument consistency across type_density() and type_histogram(). Do we want to introduce a joint.breaks argument for the latter (that mirrors joint.bw in the former), or stick with something simple like your freebreaks suggestion?

Let me think on it a bit more and get back to you.
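
For readers unfamiliar with the three joint.bw options, here is a rough base R sketch of what they could mean. `pick_bw()` is a hypothetical helper, and treating "mean" as a simple unweighted average of the per-group bandwidths is an assumption; this thread does not say how the package computes it.

```r
set.seed(1)

# Hypothetical data: two groups of unequal size and spread.
x  = c(rnorm(50), rnorm(150, mean = 3, sd = 2))
by = rep(c("a", "b"), times = c(50, 150))

# Hypothetical helper, not the package's implementation.
pick_bw = function(x, by, joint.bw = c("mean", "full", "none")) {
  joint.bw = match.arg(joint.bw)
  bws = tapply(x, by, bw.nrd0)  # one rule-of-thumb bandwidth per group
  switch(joint.bw,
    mean = mean(bws),    # assumption: simple unweighted mean
    full = bw.nrd0(x),   # a single bandwidth from the pooled data
    none = bws           # each group keeps its own bandwidth
  )
}

bw_mean = pick_bw(x, by, "mean")
bw_full = pick_bw(x, by, "full")
bw_none = pick_bw(x, by, "none")

# A numeric bandwidth is passed straight through to density().
d_a = density(x[by == "a"], bw = bw_mean)
```

The analogue for histograms would be choosing between break points computed on the pooled data and break points computed per group.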

eleuven · 2025-01-28T10:36:19Z

> As it happens, we've just been resolving a similar issue for joint bandwidth selection in grouped density plots (e.g., see here). The solution we landed on there was to introduce a joint.bw argument that takes the arguments "mean" (default), "full", or "none".
>
> My only hesitation right now is thinking about argument consistency across type_density() and type_histogram(). Do we want to introduce a joint.breaks argument for the latter (that mirrors joint.bw in the former), or stick with something simple like your freebreaks suggestion?

it looks like my freebreaks suggestion coincides with "full", or "none" in the joint.bw so it makes sense to use at least some similar naming

i do not know what the historical background is for "mean", but it strikes me as a bit odd, and "none" seems like a more sensible default. in the case of a histogram i am also not sure that "mean" makes sense at all

however, if i am not mistaken then what is now implemented by "mean" could also be achieved by the user through using "full" + passing a custom bw, so in that sense i do not think the joint.bw implementation is conceptually very clean as it mixes up the bandwidth estimator and the population for which it should be computed

just thinking out loud though...

grantmcdermott · 2025-01-31T23:29:43Z

> it looks like my freebreaks suggestion coincides with "full", or "none" in the joint.bw so it makes sense to use at least some similar naming

Yup (specifically "none").

> however, if i am not mistaken then what is now implemented by "mean" could also be achieved by the user through using "full" + passing a custom bw, so in that sense i do not think the joint.bw implementation is conceptually very clean as it mixes up the bandwidth estimator and the population for which it should be computed

The joint.bw arg is ignored if the user passes through a custom (numeric) bw, so I think we do maintain conceptual clarity there. Nonetheless, you're certainly right that it's hard to find something consistent across all cases. We've played around with many different datasets (both real and simulated) for the density type, and it turns out that it's very hard to find one rule that dominates globally. (See the discussion here for some guidance and why we settled on "mean" as the most pragmatic default.)

I appreciate the feedback and discussion, though. Right now, I'm leaning towards a joint.breaks argument... Let me play around with an implementation to see what it feels like in practice.

grantmcdermott · 2025-02-02T05:51:04Z

Okay, after experimenting, I've decided to stick with a very slightly tweaked version of your free.breaks logical arg. (The added period between "free" and "breaks" allows the lazy user to do things like type_hist(free = TRUE).)

Note, however, that this wasn't the main reason free facet scales weren't working. Rather, it was the fact that we were keeping zero-count bins. So I've added a new drop.zeros arg to handle that case (which it now does by default) and updated the documentation to explain what these two args do.

Thanks again for pushing this forward @eleuven!
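
To illustrate why zero-count bins can interfere with free facet scales, here is a small base R sketch. It is not the package's implementation: the variable names are hypothetical and the filtering step only stands in for what drop.zeros is described as doing.

```r
set.seed(1)

# Hypothetical groups that occupy different parts of the x axis.
x_a = rnorm(100, mean = 0)
x_b = rnorm(100, mean = 20)

# Shared break points computed on the pooled data.
brks = hist(c(x_a, x_b), plot = FALSE)$breaks
h_a  = hist(x_a, breaks = brks, plot = FALSE)

# Keep only the bins with at least one observation
# (an illustrative stand-in for drop.zeros = TRUE).
keep   = h_a$counts > 0
xmin   = h_a$breaks[-length(h_a$breaks)][keep]  # left bin edges
xmax   = h_a$breaks[-1][keep]                   # right bin edges
counts = h_a$counts[keep]

range(brks)              # spans both groups
c(min(xmin), max(xmax))  # spans group a only
```

If the empty bins are kept, every group carries rectangles over the full pooled range, so a free x axis has nothing to shrink to.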

Update type_histogram.R

8d4ffcd

grantmcdermott reviewed Jan 21, 2025

introduce option to allow for free breaks across groups and facets

b409f68

grantmcdermott added 4 commits February 1, 2025 21:17

tweak free.breaks + add drop.zeros arg

118f42a

- document

Merge branch 'main' into patch-1

c2051dc

Update type_histogram.Rd

74f548e

Update type_histogram.R

532fd06

Fix borked merge

grantmcdermott merged commit da6e957 into grantmcdermott:main Feb 2, 2025
3 checks passed

