text wrapping fixes #69

kyleam · 2024-07-11T14:48:22Z

This series fixes three wrap_text() bugs. I started looking into this because rendering the scorecards for six packages on MPN (gsDesign, mlr, pals, r2rtf, tidyposterior, and vcr) failed with a TeX error related to long lines ("Unable to read an entire line..."). The failure was introduced by 773c9ee (bug fix and refactor: tracability matrix, 2024-03-19, gh-57).

That failure is resolved by the fourth commit ("wrapi_text: fix indexing logic"). The second and fifth commits have the other two fixes.

wait for DESCRIPTION: add pkgdown site to URL field #67 and ci: update R versions to cover latest Metworx #68 to land

When handling a string (say "abcde") that exceeded the specified width (say 2), wrapi_text() uses two values to put the final string together: * start: a string that serves as frozen/good start. It was calculated during the last iteration that exceeded the maximum width ("ab\nc", for the example above when splitting on each character and at the iteration for "e"). * add: a string representing the characters after `start` through the character of the current iteration ("d\ne"). When wrapi_text() joins these two values, it includes a newline, leading to a spurious newline between "c" and "d". Drop the newline, leaving just the original character that was used to split the string ("" in the example above).

This is already used in two spots, and the next commit will introduce a third. (While touching these lines, switch to more conventional spacing.)

wrapi_text() splits a string into pieces and then builds it back up, inserting a newline before the next piece if it would lead to the string exceeding the specified width. The `shift` variable is set whenever extending the string with the next piece would exceed the maximum width. It marks the last position that did _not_ exceed the bound. The logic for incrementing `shift`, however, is flawed. It's calculated by adding the current `shift` value to the current iteration's index, resulting in skipped values and NAs (from out-of-bounds indexing). One way to correct the value would be to set `shift` to the index for the current iteration. Instead set it to the index for the _next_ iteration, redefining `shift` to be the first piece to consider when extending the string. Defining it this way lets us use `shift` rather than `1 + shift` when indexing.

wrapi_text() splits a string into pieces and then builds it back up, inserting a newline before the next piece if it would lead to the string exceeding the specified width. The result is stored in the `start` variable, which gets extended each time the width is exceeded. To extend `start`, wrapi_text() appends the value of {pieces that did not exceed width}{sep}{newline}{piece that exceeded width} wrapi_text() incorrectly assumes that there is at least one piece that did _not_ exceed the width since the last time `start` was extended, leading to a malformed result with the last piece repeated in an incorrect position. Update the logic to account for the fact that successive pieces can exceed the specified width.

kyleam · 2024-07-11T14:51:32Z

Regression test failures when executed with wrapi_text from main (3e8399d):

── Failed tests ────────────────────────────────────────────────────────────────
Failure ('test-util.R:39:3'): wrap_text(): past offenders
wrap_text("abcde", width = 2, wrap_sym = NULL, strict = TRUE) not identical to "ab\ncd\ne".
1/1 mismatches
x[1]: "ab\nc\nd\ne"
y[1]: "ab\ncd\ne"

Failure ('test-util.R:44:3'): wrap_text(): past offenders
wrap_text("abcdefghij", width = 4, wrap_sym = NULL, strict = TRUE) not identical to "abcd\nefgh\nij".
1/1 mismatches
x[1]: "abcd\ne\nfgh\ni\nNANANANANAji\nj"
y[1]: "abcd\nefgh\nij"

Failure ('test-util.R:49:3'): wrap_text(): past offenders
wrap_text(...) not identical to "test-foo\n.R\ntest-foo\n_bar.R".
1/1 mismatches
x[1]: "test-foo\n.\nR\ntest-foo\n_\nNANANANANAR.rab_\nb\nNANANANANANANANANANANAN
x[1]: ANANANANANANANANANANANANANANAR.rab\na\nNANANANANANANANANANANANANANANANANAN
x[1]: ANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANAR.ra\nr\nNANA
x[1]: NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA
x[1]: NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANAR.r\n.\...
y[1]: "test-foo\n.R\ntest-foo\n_bar.R"

Failure ('test-util.R:57:3'): wrap_text(): past offenders
wrap_text(...) not identical to "test\n-foo\n.R\ntest\n-foo\n_bar\n.R".
1/1 mismatches
x[1]: "test\n-\nfoo\n.\ntset\nR.\nR\nR.rab_oof-tset\nR\n\n\nNANANANANANANANANANA
x[1]: NAR.rab_oof-tset\n\nt\nNANANANANANANANANANANANANANANANANANANANANANANAR.rab
x[1]: _oof-tset\ne\nNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA
x[1]: NANANANANANAR.rab_oof-tse\ns\nNANANANANANANANANANANANANANANANANANANANANANA
x[1]: NANANANANANANANANANANANANANANANANANANANANANANANANANANANAR.rab_oof-ts\nt...
y[1]: "test\n-foo\n.R\ntest\n-foo\n_bar\n.R"

Failure ('test-util.R:65:3'): wrap_text(): past offenders
wrap_text("foo/bar/baz", width = 4) not identical to "foo/\nbar/\nbaz".
1/1 mismatches
x[1]: "foo/\nbar/\nbaz/bar/\nbaz"
y[1]: "foo/\nbar/\nbaz"

[ FAIL 5 | WARN 0 | SKIP 0 | PASS 9 ]

kyleam · 2024-07-11T15:10:16Z

tests/testthat/test-util.R

+
+  expect_identical(
+    wrap_text("foo/bar/baz", width = 4),
+    "foo/\nbar/\nbaz"


Note that all these cases involve multiple splits on the same separator. Although I haven't looked into it closely, I think that is why none of these issues were noticed when looking at the bbr example in gh-57. For example, man/check_nonmem_table_output.Rd is split to man/NEWLINEcheck_nonmem_table_NEWLINEoutput.Rd. However, despite that having two newlines inserted, that's one split on "/" and then another on "_". For any given separator, paste_pieces() doesn't get far enough into its logic to hit the bugs.

barrettk

The exact cause of the issue was a bit confusing, but the fixes and regression tests look good

kyleam · 2024-07-15T19:06:12Z

The exact cause of the issue was a bit confusing [...]

There are still some aspects I'm not sure about, but fwiw here's how I understand the link between b4ecbd1 and the "Unable to read an entire line..." error.

The out-of-bounds indexing can produce very long lines due to all the NAs. Trying to render mlr, for example, errors with

! ! Unable to read an entire line---bufsize=200000.
! Please increase buf_size in texmf.cnf.

! Please increase buf_size in texmf.cnf.

! xdvipdfmx:fatal: Reading DVI file failed!

And here are the long lines in that tex file:

$ while read line; do printf '%s' "$line" | wc -c; done <mlr_2.19.2.scorecard.tex | sort -gr | head
  4867912
  3756604
  14350
  12131
  10215
  6341
  5778
  5737
  5303
  4986

If I try to increase the value in /usr/local/texlive/2023/texmf.cnf, as suggested, it will instead fail like this:

  ! TeX capacity exceeded, sorry [main memory size=5000000].
  l.2499 ...NANANANANANANANANANNANANANANANANANANANAN
                                                    ANANANANANANANANANANANANAN...

That, I think, provides a clear indication that NA garbage is behind the issue.

It'd be possible to dig deeper and figure out, given the bad indexing logic, what features of these particular packages are leading to so many NAs, but I haven't done so.

but the fixes and regression tests look good

Thanks for reviewing. I was getting pretty turned around while stepping through the wrap_texti/paste_pieces logic, so I'm glad to hear those changes make sense to you.

kyleam added 5 commits July 11, 2024 09:56

wrapi_text: simplify checkmate assertion

858f6e1

wrapi_text: assign length of pieces to variable

24319e5

This is already used in two spots, and the next commit will introduce a third. (While touching these lines, switch to more conventional spacing.)

kyleam commented Jul 11, 2024

View reviewed changes

kyleam requested a review from barrettk July 11, 2024 15:10

Base automatically changed from ci-r-version-update to main July 12, 2024 14:58

barrettk approved these changes Jul 15, 2024

View reviewed changes

kyleam merged commit 4edcf72 into main Jul 16, 2024
5 checks passed

kyleam deleted the wrap-fixes branch July 16, 2024 12:30

kyleam mentioned this pull request Aug 9, 2024

Add support for externally scored packages #70

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

text wrapping fixes #69

text wrapping fixes #69

kyleam commented Jul 11, 2024 •

edited

Loading

kyleam commented Jul 11, 2024

kyleam Jul 11, 2024 •

edited

Loading

barrettk left a comment

kyleam commented Jul 15, 2024 •

edited

Loading

text wrapping fixes #69

text wrapping fixes #69

Conversation

kyleam commented Jul 11, 2024 • edited Loading

kyleam commented Jul 11, 2024

kyleam Jul 11, 2024 • edited Loading

Choose a reason for hiding this comment

barrettk left a comment

Choose a reason for hiding this comment

kyleam commented Jul 15, 2024 • edited Loading

kyleam commented Jul 11, 2024 •

edited

Loading

kyleam Jul 11, 2024 •

edited

Loading

kyleam commented Jul 15, 2024 •

edited

Loading