Centralize encoding versions and test them #289

Merged 2 commits on Jan 10, 2020

Conversation

@flying-sheep (Member) commented Jan 10, 2020

Since we have versions we should also check them. Fixes #287
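A minimal sketch of what checking a version on read could look like. Only `EncodingVersions`, the `csc_matrix` member, and its value `"0.1.0"` come from this PR's diff; the `dataframe` value, the `check_encoding_version` helper, and the use of a plain dict in place of a group's `attrs` are assumptions made for illustration.

```python
from enum import Enum


class EncodingVersions(Enum):
    csc_matrix = "0.1.0"  # value taken from the diff in this PR
    dataframe = "0.2.0"   # hypothetical value, for illustration only


def check_encoding_version(attrs, key):
    """Raise if the stored version differs from the one this code writes.

    `attrs` is any mapping that behaves like a group's attribute store.
    """
    encoding_type = attrs["encoding-type"]
    expected = EncodingVersions[encoding_type].value
    found = attrs.get("encoding-version")  # None if missing or misspelled
    if found != expected:
        raise ValueError(
            f"{key!r}: expected {encoding_type} version {expected}, found {found!r}"
        )
```

With a check like this, a misspelled attribute name such as `enocding-version` surfaces as an error on read instead of going unnoticed.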

@@ -191,7 +192,7 @@ def write_csr(f, key, value, dataset_kwargs=MappingProxyType({})):
 def write_csc(f, key, value, dataset_kwargs=MappingProxyType({})):
     group = f.create_group(key)
     group.attrs["encoding-type"] = "csc_matrix"
-    group.attrs["enocding-version"] = "0.1.0"
+    group.attrs["encoding-version"] = EncodingVersions.csc_matrix.value
@flying-sheep (Member Author) commented Jan 10, 2020


Hey @falexwolf @ivirshup: This is why I’m so anal about reusing code.

You made that typo only in one of the variants, only in zarr. Copy and paste isn’t better, because as soon as one of the copies gets edited, the others desync.
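To make the point concrete, here is a rough sketch of sharing one helper between backends so the attribute names are spelled in exactly one place. The helper `set_encoding_attrs` and the two writer functions are hypothetical names, and plain dicts stand in for the groups' `attrs`; only `EncodingVersions.csc_matrix` and `"0.1.0"` come from the diff.

```python
from enum import Enum


class EncodingVersions(Enum):
    csc_matrix = "0.1.0"  # value taken from the diff in this PR


def set_encoding_attrs(attrs, encoding_type):
    # The attribute names are spelled once, here, for every writer.
    attrs["encoding-type"] = encoding_type
    attrs["encoding-version"] = EncodingVersions[encoding_type].value


def write_csc_h5ad(attrs):  # hypothetical HDF5-side writer
    set_encoding_attrs(attrs, "csc_matrix")


def write_csc_zarr(attrs):  # hypothetical zarr-side writer
    set_encoding_attrs(attrs, "csc_matrix")
```

Both writers now produce identical metadata, and a typo in the key would have to be made (and would be caught) in a single place.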

@ivirshup (Member) commented Jan 17, 2020


@flying-sheep, my main idea was for them not to be in sync. This way we don't need to update every element's version when only one has changed.

For example, I would like to update just the csc and csr matrices to make sure they only store sorted indices (which would make offline access faster). In that case, nothing about the dataframe encoding would require updating.

This also applies to zarr and HDF5: features can differ between them, and I don't think we need to update both versions if we find a bug that only relates to one.
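A per-element bump of this kind might look roughly like the sketch below. The `"0.2.0"` value and the `csr_matrix` and `dataframe` members are assumptions for illustration, not the project's actual versions.

```python
from enum import Enum


class EncodingVersions(Enum):
    csr_matrix = "0.2.0"  # hypothetical bump, e.g. for sorted indices
    csc_matrix = "0.2.0"  # same value, so Enum makes this name an alias of csr_matrix
    dataframe = "0.1.0"   # untouched by the sparse-matrix change


# Lookup by name works for canonical members and aliases alike.
sparse_version = EncodingVersions["csc_matrix"].value
dataframe_version = EncodingVersions["dataframe"].value
```

One thing to keep in mind with this layout: Python's `Enum` treats names that share a value as aliases of the first one, so iterating over `EncodingVersions` yields only `csr_matrix` and `dataframe`, even though `EncodingVersions["csc_matrix"]` still resolves.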

@flying-sheep (Member Author) commented Jan 17, 2020


What do you mean? They aren’t in sync. Each value gets read from a different enum member. Did you read the code?

You’re right about zarr and HDF5 though, those are currently the same. In case we ever want to bump one and not the other, we can just split the enum into two.
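Splitting the enum per backend could be sketched as follows. The class names, the `VERSIONS_BY_BACKEND` mapping, the `encoding_version` helper, and the `"0.1.1"` value are all hypothetical; nothing like this exists in the PR as merged.

```python
from enum import Enum


class H5adEncodingVersions(Enum):  # hypothetical name
    csc_matrix = "0.1.0"


class ZarrEncodingVersions(Enum):  # hypothetical name
    csc_matrix = "0.1.1"  # hypothetical zarr-only bump


# Map a backend label to the enum that holds its versions.
VERSIONS_BY_BACKEND = {
    "h5ad": H5adEncodingVersions,
    "zarr": ZarrEncodingVersions,
}


def encoding_version(backend, encoding_type):
    """Return the version string a given backend writes for an encoding type."""
    return VERSIONS_BY_BACKEND[backend][encoding_type].value
```

The writers would then ask for `encoding_version("zarr", "csc_matrix")` or `encoding_version("h5ad", "csc_matrix")`, and a fix that only affects one backend would only touch that backend's enum.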

@flying-sheep flying-sheep merged commit cb38243 into master Jan 10, 2020
@flying-sheep flying-sheep deleted the encoding-version branch January 10, 2020 11:07
Successfully merging this pull request may close these issues.

Breaking change in 0.7 : a bug?
2 participants