Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

text wrapping fixes #69

Merged
merged 5 commits into from
Jul 16, 2024
Merged

text wrapping fixes #69

merged 5 commits into from
Jul 16, 2024

Conversation

kyleam
Copy link
Collaborator

@kyleam kyleam commented Jul 11, 2024

This series fixes three wrap_text() bugs. I started looking into this because rendering the scorecards for six packages on MPN (gsDesign, mlr, pals, r2rtf, tidyposterior, and vcr) failed with a TeX error related to long lines ("Unable to read an entire line..."). The failure was introduced by 773c9ee (bug fix and refactor: tracability matrix, 2024-03-19, gh-57).

That failure is resolved by the fourth commit ("wrapi_text: fix indexing logic"). The second and fifth commits have the other two fixes.


kyleam added 5 commits July 11, 2024 09:56
When handling a string (say "abcde") that exceeded the specified width
(say 2), wrapi_text() uses two values to put the final string
together:

 * start: a string that serves as frozen/good start.  It was
   calculated during the last iteration that exceeded the maximum
   width ("ab\nc", for the example above when splitting on each
   character and at the iteration for "e").

 * add: a string representing the characters after `start` through the
   character of the current iteration ("d\ne").

When wrapi_text() joins these two values, it includes a newline,
leading to a spurious newline between "c" and "d".

Drop the newline, leaving just the original character that was used to
split the string ("" in the example above).
This is already used in two spots, and the next commit will introduce
a third.

(While touching these lines, switch to more conventional spacing.)
wrapi_text() splits a string into pieces and then builds it back up,
inserting a newline before the next piece if it would lead to the
string exceeding the specified width.  The `shift` variable is set
whenever extending the string with the next piece would exceed the
maximum width.  It marks the last position that did _not_ exceed the
bound.

The logic for incrementing `shift`, however, is flawed.  It's
calculated by adding the current `shift` value to the current
iteration's index, resulting in skipped values and NAs (from
out-of-bounds indexing).

One way to correct the value would be to set `shift` to the index for
the current iteration.  Instead set it to the index for the _next_
iteration, redefining `shift` to be the first piece to consider when
extending the string.  Defining it this way lets us use `shift` rather
than `1 + shift` when indexing.
wrapi_text() splits a string into pieces and then builds it back up,
inserting a newline before the next piece if it would lead to the
string exceeding the specified width.  The result is stored in the
`start` variable, which gets extended each time the width is exceeded.

To extend `start`, wrapi_text() appends the value of

  {pieces that did not exceed width}{sep}{newline}{piece that exceeded width}

wrapi_text() incorrectly assumes that there is at least one piece that
did _not_ exceed the width since the last time `start` was extended,
leading to a malformed result with the last piece repeated in an
incorrect position.

Update the logic to account for the fact that successive pieces can
exceed the specified width.
@kyleam
Copy link
Collaborator Author

kyleam commented Jul 11, 2024

Regression test failures when executed with wrapi_text from main (3e8399d):

── Failed tests ────────────────────────────────────────────────────────────────
Failure ('test-util.R:39:3'): wrap_text(): past offenders
wrap_text("abcde", width = 2, wrap_sym = NULL, strict = TRUE) not identical to "ab\ncd\ne".
1/1 mismatches
x[1]: "ab\nc\nd\ne"
y[1]: "ab\ncd\ne"

Failure ('test-util.R:44:3'): wrap_text(): past offenders
wrap_text("abcdefghij", width = 4, wrap_sym = NULL, strict = TRUE) not identical to "abcd\nefgh\nij".
1/1 mismatches
x[1]: "abcd\ne\nfgh\ni\nNANANANANAji\nj"
y[1]: "abcd\nefgh\nij"

Failure ('test-util.R:49:3'): wrap_text(): past offenders
wrap_text(...) not identical to "test-foo\n.R\ntest-foo\n_bar.R".
1/1 mismatches
x[1]: "test-foo\n.\nR\ntest-foo\n_\nNANANANANAR.rab_\nb\nNANANANANANANANANANANAN
x[1]: ANANANANANANANANANANANANANANAR.rab\na\nNANANANANANANANANANANANANANANANANAN
x[1]: ANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANAR.ra\nr\nNANA
x[1]: NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA
x[1]: NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANAR.r\n.\...
y[1]: "test-foo\n.R\ntest-foo\n_bar.R"

Failure ('test-util.R:57:3'): wrap_text(): past offenders
wrap_text(...) not identical to "test\n-foo\n.R\ntest\n-foo\n_bar\n.R".
1/1 mismatches
x[1]: "test\n-\nfoo\n.\ntset\nR.\nR\nR.rab_oof-tset\nR\n\n\nNANANANANANANANANANA
x[1]: NAR.rab_oof-tset\n\nt\nNANANANANANANANANANANANANANANANANANANANANANANAR.rab
x[1]: _oof-tset\ne\nNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA
x[1]: NANANANANANAR.rab_oof-tse\ns\nNANANANANANANANANANANANANANANANANANANANANANA
x[1]: NANANANANANANANANANANANANANANANANANANANANANANANANANANANAR.rab_oof-ts\nt...
y[1]: "test\n-foo\n.R\ntest\n-foo\n_bar\n.R"

Failure ('test-util.R:65:3'): wrap_text(): past offenders
wrap_text("foo/bar/baz", width = 4) not identical to "foo/\nbar/\nbaz".
1/1 mismatches
x[1]: "foo/\nbar/\nbaz/bar/\nbaz"
y[1]: "foo/\nbar/\nbaz"

[ FAIL 5 | WARN 0 | SKIP 0 | PASS 9 ]


expect_identical(
wrap_text("foo/bar/baz", width = 4),
"foo/\nbar/\nbaz"
Copy link
Collaborator Author

@kyleam kyleam Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that all these cases involve multiple splits on the same separator. Although I haven't looked into it closely, I think that is why none of these issues were noticed when looking at the bbr example in gh-57. For example, man/check_nonmem_table_output.Rd is split to man/NEWLINEcheck_nonmem_table_NEWLINEoutput.Rd. However, despite that having two newlines inserted, that's one split on "/" and then another on "_". For any given separator, paste_pieces() doesn't get far enough into its logic to hit the bugs.

@kyleam kyleam requested a review from barrettk July 11, 2024 15:10
Base automatically changed from ci-r-version-update to main July 12, 2024 14:58
Copy link
Collaborator

@barrettk barrettk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exact cause of the issue was a bit confusing, but the fixes and regression tests look good

@kyleam
Copy link
Collaborator Author

kyleam commented Jul 15, 2024

The exact cause of the issue was a bit confusing [...]

There are still some aspects I'm not sure about, but fwiw here's how I understand the link between b4ecbd1 and the "Unable to read an entire line..." error.

The out-of-bounds indexing can produce very long lines due to all the NAs. Trying to render mlr, for example, errors with

! ! Unable to read an entire line---bufsize=200000.
! Please increase buf_size in texmf.cnf.

! Please increase buf_size in texmf.cnf.

! xdvipdfmx:fatal: Reading DVI file failed!

And here are the long lines in that tex file:

$ while read line; do printf '%s' "$line" | wc -c; done <mlr_2.19.2.scorecard.tex | sort -gr | head
  4867912
  3756604
  14350
  12131
  10215
  6341
  5778
  5737
  5303
  4986

If I try to increase the value in /usr/local/texlive/2023/texmf.cnf, as suggested, it will instead fail like this:

  ! TeX capacity exceeded, sorry [main memory size=5000000].
  l.2499 ...NANANANANANANANANANNANANANANANANANANANAN
                                                    ANANANANANANANANANANANANAN...

That, I think, provides a clear indication that NA garbage is behind the issue.

It'd be possible to dig deeper and figure out, given the bad indexing logic, what features of these particular packages are leading to so many NAs, but I haven't done so.

but the fixes and regression tests look good

Thanks for reviewing. I was getting pretty turned around while stepping through the wrap_texti/paste_pieces logic, so I'm glad to hear those changes make sense to you.

@kyleam kyleam merged commit 4edcf72 into main Jul 16, 2024
5 checks passed
@kyleam kyleam deleted the wrap-fixes branch July 16, 2024 12:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants