BUG: Ensure ordering is passed through model_matrix #213
This is needed to allow formulas to be reordered, and for the order to be preserved when creating a model matrix. If you pass a non-structured Formula, it works. It fails when the Formula has structure. Not 100% sure that this is the best way to achieve this, but it gets the job done.
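To make the symptom concrete, here is a minimal pure-Python sketch of the difference between keeping the written term order and re-sorting by interaction degree. The helper `order_terms` and the string representation of terms are hypothetical illustrations, not formulaic's implementation; only the mode names "none" and "degree" come from this thread.

```python
def order_terms(terms, ordering="degree"):
    """Order term strings such as "x:y" (hypothetical helper, not formulaic's API)."""
    if ordering == "none":
        # Keep the terms exactly as the user wrote them.
        return list(terms)
    if ordering == "degree":
        # Stable sort by interaction degree (number of ":"-joined factors).
        return sorted(terms, key=lambda term: len(term.split(":")))
    raise ValueError(f"unknown ordering: {ordering!r}")


terms = ["x:y", "y"]
as_written = order_terms(terms, ordering="none")
by_degree = order_terms(terms, ordering="degree")
print(as_written)
print(by_degree)
```

If the requested "none" ordering is dropped somewhere along the way and the default is applied instead, the columns come out in the `by_degree` order rather than the `as_written` order, which matches the behaviour this PR describes.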
Thanks for catching this, and for the patch! I'll take a closer look tomorrow!
I haven't forgotten about this! There are lots of things at play at the moment (e.g. #216) as I try to dash off more of the open issues, and those changes may impact how we think about this too. I think that this patch may be masking a deeper problem around how the Formula constructor is reshuffling formulae, so having this unmerged is a forcing function for me to really think about it. I'll definitely make sure this is resolved before the 1.1 release, which will be released soon-ish (once you are convinced it is ready to cater to statsmodels, and once the other PRs land); but let me know if having this on a branch is problematic for you, and we can merge it in and I'll just make a note to revisit it before the 1.1.0 release.
I'm happy to wait. I'm just using a personal branch as the target for development that has this patch and also some of the other patches that haven't merged yet. One thing I noticed when writing this patch was that even if you build formulae recursively, attaching one formula to another with each carrying its own ordering, the sub-formulae are passed as lists of terms and not as a Formula inside the recursive calls in this function. This is why I ended up with this internal solution using partial, since only the outer Formula seemed to retain its ordering information.
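The idea of binding the outer ordering with `functools.partial` so that it survives recursion can be sketched as follows. This is an illustration under stated assumptions, not the code in this patch: `order_terms`, `map_structure`, and the nested-dict layout are hypothetical stand-ins for the real formulaic internals.

```python
from functools import partial


def order_terms(terms, ordering="degree"):
    """Order a flat list of term strings (hypothetical stand-in)."""
    if ordering == "none":
        return list(terms)
    # Default: stable sort by number of ":"-joined factors.
    return sorted(terms, key=lambda term: term.count(":") + 1)


def map_structure(structure, leaf_fn):
    """Recursively apply leaf_fn to each plain list of terms in a nested dict."""
    if isinstance(structure, dict):
        return {key: map_structure(value, leaf_fn) for key, value in structure.items()}
    # Leaves arrive as bare lists, so they carry no ordering of their own.
    return leaf_fn(structure)


nested = {"lhs": ["x:y", "y"], "rhs": ["d:x", "w"]}

# Without binding, every leaf falls back to the default "degree" ordering.
defaulted = map_structure(nested, order_terms)

# Binding the outer ordering once makes every recursive call honour it.
preserved = map_structure(nested, partial(order_terms, ordering="none"))

print(defaulted)
print(preserved)
```

Because the leaves reach the recursive calls as bare lists, the only place the ordering can come from is the callable itself, which is why binding it up front with `partial` works in this sketch.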
model_matrix silently loses Formula ordering information and changes it to degree
Ensure that nested formulas have their order preserved
When I pull #213 and #217 onto main, I can get statsmodels to pass all tests. There is one small issue that I think could be classified as "won't (can't) fix". patsy can sometimes suppress the
import pandas as pd
import numpy as np
from patsy import dmatrices
from formulaic import model_matrix
data = pd.DataFrame(
    {
        "y": np.random.standard_normal(30),
        "a": np.random.choice(["a", "b", "c"], size=30),
        "b": np.random.choice(["x", "y", "z"], size=30),
    }
)
formula = "y ~ C(a) + C(b) - 1"
print(dmatrices(formula, data, return_type="dataframe")[1].head(5))
print(model_matrix(formula, data).rhs.head(5))
This outputs
I have poked around in
from formulaic import Formula
lhs = Formula("x:y + y - 1", _ordering="none")
rhs = Formula("1 + w + d:x +x + e ", _ordering="sort")
formula = Formula(lhs=lhs, rhs=rhs, _ordering="degree")
larger_formula = Formula(left=formula, right=formula)
print(type(larger_formula))
# Walk both levels of the nested structure and report each node's type.
for key, value in larger_formula._structure.items():
    print(key, type(value))
    for inner_key, inner_value in value._structure.items():
        print((key, inner_key), type(inner_value))
returns
I don't know why I didn't think about this before, but I've patched formulaic to mimic this behaviour in #220. Thanks for highlighting.
Technically a Formula is just a
The issue with the example you provided above is that the
My plan is to rework the base
Thanks again @bashtage!