Handle invalid multi-byte sequences in iconv encoding conversions #264
base: dev
Hi @evanmiller! Just wondering if you've had a chance to look at this? Would you prefer that I finalise the full PR before reviewing? As noted I've only implemented the handler in the code for processing sav string values but I can chuck it in to the others as well and non-draft the PR if that's easier. Thanks!
Hi, I like the overall approach. Maybe rename to
Thanks! Good call, shall do 🙂 I'll make that change then add the handler in to the other reader functions and update the PR.
I've renamed the handler to
Ok, looking good. Since this introduces a new API I will save it for the next minor version bump (ReadStat 1.2). Can you add some documentation (at least to the header file) about how C clients can use the new machinery? I don't think it's really obvious. I'm also not sure that USER_ABORT is the right return code here since most of the handlers are system-provided rather than user-provided. Maybe a failed conversion should return BAD_STRING or perhaps a new error code.
No worries, I'll add a comment in the header. Given the weirdness it probably warrants a write-up with an example in the "Reading files" section of the README for some context. Do you think it needs a full example or is that unnecessary bloat? Fair call re: the error message. I think
Hi @evanmiller, just wondering if you had any feedback on the last comment? This is the way I'm leaning:
Sorry for nudging... just curious about this bug and its fix, since it's been quiet for some time now. I got hit again by the "invalid byte sequence" bug for another of my .sav files, where currently the only solution is to switch back to the R package haven at version <= 2.3.1.
@evanmiller Are you looking for a new maintainer by any chance? Our group uses this package, and while we're by no means experts, we're probably good enough to review PRs and take care of releases.
Hi @evanmiller,
This PR is another go at #252 (cc tidyverse/haven#615).
As suggested I've reworked the bad byte handler so it's called from surrounding code when a conversion fails instead of processing bytes inside `readstat_convert()`. I've implemented the handler in the string conversion code in `sav_process_row()` as an example, and there are a handful of bad byte handlers in readstat_convert.c.

The handler signature includes the `src` and `dst` variables from `readstat_convert()` as well as the observation index and variable object for error logging.

Thanks!