
pandas-dedupe==1.5.0 not compatible with dedupe>=3.0 (released on 27th June 2024) #64

Open
gildastone opened this issue Sep 2, 2024 · 0 comments


@gildastone

pandas-dedupe installs the latest version of dedupe, which is currently 3.0.3. However, when defining the `field_properties` in `df_final = pandas_dedupe.dedupe_dataframe(df=df, field_properties=[...])`, dedupe raises the following error:

```
  File "/.../lib/python3.11/site-packages/dedupe/api.py", line 1141, in __init__
    self.data_model = datamodel.DataModel(variable_definition)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../lib/python3.11/site-packages/dedupe/datamodel.py", line 32, in __init__
    raise ValueError(
ValueError: It looks like you are trying to use a variable definition composed of dictionaries. dedupe 3.0 uses variable objects directly. So instead of [{"field": "name", "type": "String"}] we now do [dedupe.variables.String("name")].
```
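The error message spells out the new format. For code that still builds old-style dictionaries, a small adapter could translate them. The sketch below is a hypothetical helper (`to_variable_objects` is not part of dedupe or pandas-dedupe). It assumes dedupe 3.x variable classes take the field name as their first argument and accept a `has_missing` keyword, and it deliberately handles only the `field`, `type` and `has missing` keys.

```python
from typing import Any, Optional


def to_variable_objects(definitions: list[dict[str, Any]],
                        variables_module: Optional[Any] = None) -> list[Any]:
    """Hypothetical helper: turn dedupe 2.x dict definitions into 3.0 objects.

    Only 'field', 'type' and 'has missing' are handled; any other key
    (e.g. 'categories' for Categorical) is rejected instead of guessed.
    """
    if variables_module is None:
        # Imported lazily (assumes dedupe>=3.0 is installed) so the helper
        # can also be exercised with a stand-in module.
        import dedupe.variables as variables_module

    converted = []
    for definition in definitions:
        spec = dict(definition)  # copy so the caller's dict is not mutated
        field = spec.pop("field")
        type_name = spec.pop("type")
        has_missing = spec.pop("has missing", False)
        if spec:
            raise ValueError(f"unsupported keys for {field!r}: {sorted(spec)}")
        # e.g. "String" -> dedupe.variables.String
        variable_cls = getattr(variables_module, type_name)
        converted.append(variable_cls(field, has_missing=has_missing))
    return converted
```

With dedupe 3.x installed, `to_variable_objects([{"field": "name", "type": "String"}])` should give the equivalent of `[dedupe.variables.String("name")]`. Rejecting unknown keys is a design choice: types such as `Categorical` need extra arguments, and silently guessing how to pass them would hide mistakes.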

As a quick-and-dirty fix to use dedupe>=3.0.3 (just to unblock myself), I updated the utility function `pandas_dedupe.utility_functions.select_fields(fields, field_properties)` to use:

```python
from dedupe.variables import String

if isinstance(i, String):  # i is already a dedupe 3.x variable object
    fields.append(i)
```

where `i` is a `dedupe.variables.String` object, instead of the original:

```python
if type(i)==str:
    fields.append({'field': i, 'type': 'String'})
```
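The workaround above means callers must now pass `dedupe.variables.String` objects instead of column names. An alternative, sketched below under the assumption that only plain column names and `String` objects need supporting, keeps column names working by converting them inside the function. This is a hypothetical variant, not the actual pandas-dedupe code; the `string_type` parameter exists only so the logic can be exercised without dedupe installed, and any other entry kinds the real function may handle are out of scope here.

```python
from typing import Optional


def select_fields(fields: list, field_properties: list,
                  string_type: Optional[type] = None) -> list:
    """Hypothetical dedupe>=3.0 variant of pandas-dedupe's select_fields.

    Plain column names keep working (they become String variables), and
    ready-made String variable objects are passed through unchanged.
    """
    if string_type is None:
        from dedupe.variables import String as string_type  # dedupe>=3.0 only

    for i in field_properties:
        if isinstance(i, str):
            # old code appended {'field': i, 'type': 'String'} here
            fields.append(string_type(i))
        elif isinstance(i, string_type):
            # the workaround from this issue: already a variable object
            fields.append(i)
        else:
            raise TypeError(f"unsupported field property: {i!r}")
    return fields
```

Converting strings internally would let existing `field_properties=[...]` calls keep working after an upgrade, while still accepting the variable objects used in the workaround.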

The last commit to this project was 4 years ago. Are there any plans to make the package compatible with dedupe>=3.0 and drop support for older versions? Is any help needed?
