Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Usability improvements for pd() and pd_to_p() #665

Closed
bwiernik opened this issue Aug 22, 2024 · 9 comments · Fixed by #668
Closed

Usability improvements for pd() and pd_to_p() #665

bwiernik opened this issue Aug 22, 2024 · 9 comments · Fixed by #668
Assignees

Comments

@bwiernik
Copy link
Contributor

bwiernik commented Aug 22, 2024

I've been using pd() a bunch lately, and there are a few things that make it a little annoying to integrate into my workflow. These 2 things would make it much smoother for me:

[ ] A as_vector argument that returns a simple vector of pd values, rather than a data frame.

This would fit into a workflow where I have a data frame with posterior draws (e.g., as a list column or as posterior::rvar) and I want to compute pd as a new column.

(An alternative would be to make the method for rvar always return a vector value, but I don't like that inconsistency, and it would mean it's not applicable to similar data structures that don't use posterior, like list columns.)

results |> transform(pd = pd(.epred))

[ ] A pd_to_p.p_direction() method that takes the data frame output of p_direction() and directly converts it a data frame with p values instead. That would avoid having to do this rather involved set of steps:

results |> pd() |> transform(p = pd_to_p(pd)) |> subset(select = - pd)

[ ] Maybe it's a little too heretical, but maybe add an as_frequentist_p or as_p argument to pd() to return the results scaled as p values in one step

@strengejacke
Copy link
Member

For the first point, I think @mattansb added an as.numeric() method for all functions.

@mattansb
Copy link
Member

I think that was @DominiqueMakowski ?

@bwiernik How about something like allowing the data.frame method to select an rvar column? (Which I think was what I meant with #604)

grid <- data.frame(
  A = letters[1:2],
  B = c(2, 3),
  val = posterior::rvar(array(rnorm(1200), dim = c(600, 2)))
)

# Pull rvar column:
bayestestR::p_direction(grid$val)
#> Probability of Direction
#> 
#> Parameter |     pd
#> ------------------
#> x[1]      | 53.50%
#> x[2]      | 52.67%

# Or pass the data frame and tell the function what column has rvars:
bayestestR::p_direction(grid, rvar_col = "val")
#>   A B          val        pd
#> 1 a 2 0.062 ± 0.99 0.5350000
#> 2 b 3 0.037 ± 1.00 0.5266667

# Original behavior is maintained when not specifying rvar_col:
bayestestR::p_direction(grid)
#> Probability of Direction
#> 
#> Parameter |   pd
#> ----------------
#> B         | 100%

This is implemented in #666 😈

@bwiernik
Copy link
Contributor Author

I typically use pd() inside of a call to mutate() with several other transformations (eg, taking an rvar columns and computing its median, CI, and pd in one step)

@strengejacke
Copy link
Member

This sounds like something that can be done with model_parameters().

@bwiernik
Copy link
Contributor Author

No, I'm working with vectors of predictions or custom contrasts

@DominiqueMakowski
Copy link
Member

I agree that there's room for improvement, I also find out functions sometimes fiddly within tidyverse pipelines

@strengejacke
Copy link
Member

[ ] A pd_to_p.p_direction() method that takes the data frame output of p_direction() and directly converts it a data frame with p values instead. That would avoid having to do this rather involved set of steps:

Should that return a data frame again?

@strengejacke strengejacke self-assigned this Aug 31, 2024
strengejacke added a commit that referenced this issue Aug 31, 2024
* Usability improvements for pd() and pd_to_p()
Fixes #665

* docs

* add as_p

* Work on #664

* add tests

* add tests

* tests for as..vector

* news

* include #666

* news

* news

* lintr
@bwiernik
Copy link
Contributor Author

bwiernik commented Sep 3, 2024

[ ] A pd_to_p.p_direction() method that takes the data frame output of p_direction() and directly converts it a data frame with p values instead. That would avoid having to do this rather involved set of steps:

Should that return a data frame again?

Yes, I think that's what you implemented?

@strengejacke
Copy link
Member

yes. and we have as.numeric() or as.vector() methods to returns a vector instead of df.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants