Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSVRecord failing when field/header is missing #5

Open
enriquedelpino opened this issue Oct 4, 2024 · 11 comments
Open

CSVRecord failing when field/header is missing #5

enriquedelpino opened this issue Oct 4, 2024 · 11 comments

Comments

@enriquedelpino
Copy link
Contributor

Hi all,

Us in Graph.Build have been users of your project for quite a long time already. We are working on a use case, where a mapping transformation for a CSV file expects to find a determinate field (defined by a column header), but in practicality, we cannot assume all the CSV files being transformed are complete in definition. They might have a column missing, as this column might be optional.

In these scenarios, we would like to carry on with the transformation even if it's partial, but under the current code, we see the following lines in the get method on CSVRecord, cause the whole transformation to break.

if (!this.data.containsKey(toDatabaseCase)) { throw new IllegalArgumentException(String.format("Mapping for %s not found, expected one of %s", toDatabaseCase, data.keySet())); }

We believe this code should be more resilient, log out the missing field, but still return an empty list, to avoid stopping the transformation running.

` @OverRide
public List get(String reference) {
String toDatabaseCase;
if (this.data.containsKey(reference.toUpperCase())) {
toDatabaseCase = reference.toUpperCase();
} else if (this.data.containsKey(reference.toLowerCase())) {
toDatabaseCase = reference.toLowerCase();
} else {
toDatabaseCase = reference;
}
if (!this.data.containsKey(toDatabaseCase)) {
logger.warn(String.format("Mapping for %s not found, expected one of %s", toDatabaseCase, data.keySet()));
return List.of();
}
String obj = this.data.get(toDatabaseCase);

    if (obj == null) return List.of();
    return List.of(obj);
}`

I've been comparing the current definitions of the different Record implementations (JSON, XML, Excel, etc.) to the older class definitions, for instance in your old RML Mapper v 5.0, and it seems this same error was wide spread among the other record types, but it has been addressed in the new dataio project, which leads me to believe, my proposition actually aligns to your general position in the rest of record types when it comes to a missing field on the input file.

What are your thoughts on this?

Kind Regards,
Enrique

@ghsnd
Copy link
Contributor

ghsnd commented Oct 7, 2024

Hi Enrique,

The fact that different record implementations behave differently when encountering an 'unknown' reference is an issue that we'll address by the next release. Thanks for pointing this out!

The "fuzzy" reference matching is something we'll have a closer look at, because also here consistent behaviour over every record type is desired.

Best regards,
Gerald

@enriquedelpino
Copy link
Contributor Author

Thank you very much for your answer, I appreciate it very much.

I have created a PR that addresses this issue I mentioned above in case you want to consider my contribution in order to address this if that actually helps you in this process.

#8

Please, let me know your thoughts on this.

@ghsnd
Copy link
Contributor

ghsnd commented Nov 7, 2024

Hi Enrique,
To address these issues, the API of Record has changed. Now the get method on Record returns a RecordValue which contains methods to see if the value is OK (~non-null), empty (~null) or no value due to an error. I'm also integrating this in RMLMapper. I hope this helps you out.

@enriquedelpino
Copy link
Contributor Author

Hi Gerald,

This is excellent news. I've checked the new Record API, and it is great you thought of keeping trace of errored Records so that they can be reported independently. This is actually something I was going to need at some point down the line.

Nonetheless, I've been trying to locate the matching changes RMLMapper, but I have been unable to trace new changes related to this on the development branch. Is there a chance you've got these changes in a feature branch so that I can have a look at them and start preparing my code according to this?

Kind Regards,
Enrique

@ghsnd
Copy link
Contributor

ghsnd commented Nov 25, 2024

Hi Enrique,

We're integrating these changes in RMLMapper, in parallel with some other changes/fixes. The moment these work together well, we'll merge them into the development branch. I'll let you know when that's done so you can move on. Maybe somewhere this week :).

Best regards,
Gerald

@enriquedelpino
Copy link
Contributor Author

This is great news Gerald! I highly appreciate this!

@ghsnd
Copy link
Contributor

ghsnd commented Dec 10, 2024

Hi Enrique, by fixing this in DataIO I kind of opened a can of worms trying to integrate it in RMLMapper, especially some test cases need attention. We're working on this, but are not ready yet. I'll keep you posted.

@enriquedelpino
Copy link
Contributor Author

Hi Gerald, first of all Happy New Year! I was wondering if you've been able to make some progress on this front. I don't want to appear to be demanding by any means, so I was wondering if there would be anything I can do to help with this regard and the project in general. We'd be more than happy to collaborate on this.

Kind Regards,
Enrique

@ghsnd
Copy link
Contributor

ghsnd commented Jan 13, 2025

Hi Enrique, the best for 2025!
In the mean time the changes are released in DataIO 2.0.0. Integration in RMLMapper is ready and awaiting internal review. I'll keep you posted!
Kind regards,
Gerald

@ghsnd
Copy link
Contributor

ghsnd commented Jan 13, 2025

Hi, it's avaliable in the development branch of RMLMapper. So it will be in the next release.
Best regards,
Gerald

@enriquedelpino
Copy link
Contributor Author

Hi Gerald!

This is fantastic news, if the changes are already on the development branches, we've got more than enough for the time being to carry on working on this. Waiting for the release to happen in due time should not be an issue at all.

Kind Regards,
Enrique

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants