@provinzkraut
Contributor

@provinzkraut provinzkraut commented Nov 15, 2025

Fix #874.

Align Struct's __post_init__ behaviour with that of dataclasses, when created by copy.copy / copy.deepcopy / copy.replace.

  • Call __post_init__ on copy.replace / __replace__
  • Do not call __post_init__ on copy.copy / __copy__
  • Do not call __post_init__ on copy.deepcopy / __deepcopy__

This is a breaking change in behaviour, as structs intentionally did not call __post_init__ after a __replace__ operation. However, as this diverges from dataclass behaviour, it's probably the right thing to do.
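To illustrate the dataclass behaviour this PR aligns with, here is a minimal sketch using only the standard library. The `Positive` class and its validation rule are hypothetical and exist only for this example; it shows that `dataclasses.replace` re-runs `__post_init__` (so validation applies to the new values), while `copy.copy` and `copy.deepcopy` do not.

```python
import copy
from dataclasses import dataclass, replace


@dataclass
class Positive:  # hypothetical example class, not part of msgspec
    x: int

    def __post_init__(self):
        # Validation hook: runs on construction and on replace().
        if self.x <= 0:
            raise ValueError(f"x must be positive, got {self.x}")


p = Positive(1)

# replace() builds the new instance through __init__, so validation runs again.
try:
    replace(p, x=-1)
    replace_rejected = False
except ValueError:
    replace_rejected = True

# Mutate after construction; plain attribute assignment is not validated.
p.x = -1

# copy.copy() / copy.deepcopy() bypass __init__, so __post_init__ is not re-run
# and the (now invalid) state is copied as-is.
shallow = copy.copy(p)
deep = copy.deepcopy(p)
```

Under the change proposed here, a Struct would be expected to behave the same way: replace validates the new instance, copies do not.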

To achieve the desired copy.deepcopy behaviour, I had to implement a new __deepcopy__ method (previously structs did not define a custom __deepcopy__).
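As a rough pure-Python analogy of what such a __deepcopy__ does (the actual implementation in this PR is written in C and may differ), the sketch below allocates the new instance without going through `__init__`, so `__post_init__` is never triggered, and deep-copies each field through the shared `memo` dict. The `Point` class and its `post_init_calls` counter are hypothetical.

```python
import copy


class Point:  # hypothetical stand-in for a Struct type
    __slots__ = ("x", "tags")
    post_init_calls = 0  # class-level counter, only for demonstration

    def __init__(self, x, tags):
        self.x = x
        self.tags = tags
        self.__post_init__()

    def __post_init__(self):
        type(self).post_init_calls += 1

    def __deepcopy__(self, memo):
        cls = type(self)
        # Allocate without calling __init__, so __post_init__ is skipped.
        new = cls.__new__(cls)
        # Register early so self-referencing structures are handled.
        memo[id(self)] = new
        for name in cls.__slots__:
            setattr(new, name, copy.deepcopy(getattr(self, name), memo))
        return new


a = Point(1, ["a"])
b = copy.deepcopy(a)
```

After this runs, `Point.post_init_calls` is still 1: only the original construction invoked the hook, while `b.tags` is an independent copy of `a.tags`.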

An open question, though, is how to behave in the case of a __copy__ operation. Currently, __post_init__ isn't called there either. Intuitively, it would make sense for __copy__ and __replace__ to behave the same with regard to __post_init__, as they're similar operations (both create a new instance from an existing instance of the same type).

@kramar11

There is no need to call __post_init__ on copy as the "validation" of the attributes already happened when creating the object.
dataclasses don't call __post_init__ on copy.copy() either.

While testing the behaviour of msgspec.Structs vs dataclasses, I just noticed something odd, namely that msgspec.Struct.__post_init__ gets called on copy.deepcopy() but not on copy.copy()! dataclasses neither call __post_init__ on copy.copy() nor copy.deepcopy().

Testscript.py:

import copy
from dataclasses import dataclass, replace

import msgspec

@dataclass
class D:
    x: int
    def __post_init__(self):
        print("  - dataclass: post init called")


class M(msgspec.Struct):
    x: int
    def __post_init__(self):
        print("  - msgspec: post init called")


print("Construct objects")
d = D(1)
m = M(1)

print("Test replace")
d2 = replace(d, x=2)
m2 = msgspec.structs.replace(m, x=2)

print("Test copy.copy()")
d3 = copy.copy(d)
m3 = copy.copy(m)

print("Test copy.deepcopy()")
d4 = copy.deepcopy(d)
m4 = copy.deepcopy(m)

Output:

Construct objects
  - dataclass: post init called
  - msgspec: post init called
Test replace
  - dataclass: post init called
Test copy.copy()
Test copy.deepcopy()
  - msgspec: post init called

@ofek
Collaborator

ofek commented Nov 23, 2025

Is there a compelling reason to call that upon a deep copy? If not, then I'd prefer to also fix that in this PR.

@provinzkraut
Contributor Author

> Is there a compelling reason to call that upon a deep copy? If not, then I'd prefer to also fix that in this PR.

None that I can think of. Mirroring dataclass behaviour seems sensible; however, removing this __post_init__ call on copy.deepcopy would be a breaking change imo, so I'm not sure how we want to go about that.

@ofek
Collaborator

ofek commented Nov 23, 2025

I think breaking changes are fine since we are still sub-1.0 and we're also going to introduce others like #790. Both will come in the next minor release.

@provinzkraut
Contributor Author

> I think breaking changes are fine since we are still sub-1.0 and we're also going to introduce others like #790. Both will come in the next minor release.

I'll update this PR accordingly then

@provinzkraut provinzkraut changed the title from "call __post_init__ after replace()" to "Align __post_init__ behaviour after copy.copy / copy.deepcopy / copy.replace with dataclasses" on Nov 25, 2025
}


static PyObject* get_deepcopy_func() {
Contributor Author

Trying a new style of handling these kinds of imports here. We initialize the field on the module state as NULL, and import on an as-needed basis. I think this is a fairly reasonable compromise between performance and complexity. After the first import, it's just one additional x == NULL check, which should be negligible in terms of performance.

The only real downside I can see is that you now have to remember to use the "getter function" and cannot rely on the module state directly.

@ofek lmk what you think about this.

Development

Successfully merging this pull request may close these issues.

__post_init__ not called from replace
