Issue
Currently the Pyclient implicitly converts CSV data from the get method to a pandas DataFrame. In this conversion, pandas infers the type of each column from its contents. This can lead to unwanted results, such as integer values being cast to strings, or string values being cast to floats. The handling of NA values is often unpredictable too.
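The inference problem can be reproduced with pandas alone. The snippet below is a minimal sketch that does not use the Pyclient; the CSV content and column names are made up for illustration.

```python
import io

import pandas as pd

# Made-up CSV payload: 'code' is meant to be text, 'latitude' should keep its digits.
csv_text = "id,code,latitude\n1,007,52.10\n2,,4.50\n"

# Default behaviour: pandas guesses a dtype per column.
inferred = pd.read_csv(io.StringIO(csv_text))
# 'code' is parsed as a float column, so "007" loses its leading zeros
# and the empty cell becomes NaN.
# 'latitude' is parsed as float, so "52.10" loses its trailing zero.

# Reading everything as text keeps the values exactly as they appear in the CSV.
as_text = pd.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)
```

Reading every column as text avoids the guessing, but it also discards type information that the schema already provides, which is what the proposed solution addresses.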
Solution
The Pyclient already contains functionality for working with column metadata. This metadata can be used to ensure that each column is converted to its expected type.
A new parsing function that applies this metadata should be implemented within the get method of the Pyclient.
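One possible shape for such a parsing function is sketched below. It is not the Pyclient's implementation: the function name parse_csv_with_metadata, the column_types mapping, and the type names ("STRING", "INT", "DECIMAL", "BOOL") are assumptions chosen for illustration, and the real metadata objects may look different.

```python
import io

import pandas as pd

# Hypothetical mapping from schema column types to pandas nullable dtypes.
PANDAS_DTYPES = {
    "STRING": "string",
    "INT": "Int64",
    "DECIMAL": "Float64",
    "BOOL": "boolean",
}


def parse_csv_with_metadata(csv_text, column_types):
    """Parse CSV text using explicit dtypes instead of pandas inference.

    column_types is assumed to be a dict of column name -> schema type name.
    Unknown types fall back to text so that no value is silently altered.
    """
    dtypes = {
        name: PANDAS_DTYPES.get(col_type, "string")
        for name, col_type in column_types.items()
    }
    return pd.read_csv(
        io.StringIO(csv_text),
        dtype=dtypes,
        keep_default_na=False,  # do not treat strings such as "NA" as missing
        na_values=[""],         # only empty cells count as missing
    )


csv_text = "id,code,latitude\n1,007,52.10\n2,,4.50\n"
column_types = {"id": "INT", "code": "STRING", "latitude": "STRING"}
df = parse_csv_with_metadata(csv_text, column_types)
```

Using the nullable dtypes keeps integer columns as integers even when they contain empty cells, and restricting na_values makes the NA handling explicit rather than dependent on pandas defaults.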
Alternatives
No response
Additional context
When processing data from the National Node staging areas into BBMRI-ERIC tables, this unexpected behaviour was encountered (for example, latitude values were seen as floats and zeros were removed from the values). To circumvent this, the Pyclient's get method has been copied into the BBMRI-ERIC publish package and adjusted in such a way that no pandas DataFrame is included in the process. A second function has been added that resets the data types: reset_data_types
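The general idea of that workaround (read the CSV without pandas, then apply types in a second step) could look roughly like the sketch below. It is not the code from the BBMRI-ERIC publish package and does not reproduce reset_data_types; the helper names read_rows and cast_rows, the column_types mapping, and the type names are hypothetical.

```python
import csv
import io

# Hypothetical casters per schema type; anything not listed stays a string.
CASTERS = {
    "INT": int,
    "DECIMAL": float,
    "BOOL": lambda value: value.lower() == "true",
}


def read_rows(csv_text):
    """Read CSV text into a list of dicts; every value is kept as a string."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def cast_rows(rows, column_types):
    """Cast string values to Python types based on a column -> type mapping."""
    result = []
    for row in rows:
        typed = {}
        for name, value in row.items():
            if value == "":
                typed[name] = None  # empty cells become None
            else:
                caster = CASTERS.get(column_types.get(name), str)
                typed[name] = caster(value)
        result.append(typed)
    return result


csv_text = "id,code,latitude\n1,007,52.10\n2,,4.50\n"
column_types = {"id": "INT", "code": "STRING", "latitude": "STRING"}
rows = cast_rows(read_rows(csv_text), column_types)
```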
YpeZ changed the title from "feat(Pyclient): use column metadata when processing data from CSV API" to "fix(Pyclient): ensure correct column types using schema metadata when processing data from CSV API" on Dec 12, 2024