Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

202 fix trimming@main #203

Merged
merged 27 commits into from
Oct 11, 2023
Merged

202 fix trimming@main #203

merged 27 commits into from
Oct 11, 2023

Conversation

ayogasekaram
Copy link
Contributor

closes #202

related to rtables issue #630

@ayogasekaram ayogasekaram linked an issue Sep 26, 2023 that may be closed by this pull request
@ayogasekaram ayogasekaram added the sme Tracks changes for the sme board label Sep 26, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Sep 26, 2023

badge

Code Coverage Summary

Filename             Stmts    Miss  Cover    Missing
-----------------  -------  ------  -------  ----------------------------------------------------------------------------------------------------------
R/format_value.R       197      14  92.89%   94, 110-117, 197, 216, 287, 409, 420, 428
R/generics.R            98       4  95.92%   455, 467, 500, 529
R/labels.R              55       7  87.27%   53, 59, 68, 109, 137, 146, 150
R/matrix_form.R        416      27  93.51%   355-356, 451, 463-464, 483, 514, 603-604, 619-624, 651-652, 684-685, 717-718, 750, 836, 884, 936, 938, 941
R/mpf_exporters.R      124      12  90.32%   2, 84-86, 180, 220, 225, 408, 410, 415-416, 442
R/page_size.R           45       1  97.78%   169
R/pagination.R         479      22  95.41%   400, 472, 592-593, 598, 652-653, 671-679, 965, 972, 996-998, 1131-1132, 1144-1145, 1157-1158
R/tostring.R           493      25  94.93%   83, 135, 198, 232, 240-241, 276, 329-330, 419-421, 428-431, 600-601, 783, 878, 931, 972, 1017, 1072, 1079
R/utils.R                3       0  100.00%
TOTAL                 1910     112  94.14%

Diff against main

Filename             Stmts    Miss  Cover
-----------------  -------  ------  --------
R/mpf_exporters.R       +1       0  +0.08%
R/tostring.R           +61      +8  -1.14%
R/utils.R               +2       0  +100.00%
TOTAL                  +64      +8  -0.23%

Results for commit: 61b1e36

Minimum allowed coverage is 80%

♻️ This comment has been updated with latest results

@github-actions
Copy link
Contributor

github-actions bot commented Sep 26, 2023

Unit Tests Summary

    1 files      4 suites   7s ⏱️
  33 tests   33 ✔️ 0 💤 0
227 runs  227 ✔️ 0 💤 0

Results for commit 61b1e36.

♻️ This comment has been updated with latest results.

Copy link
Contributor

@Melkiades Melkiades left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am looking into it and as I thought we need basically to rewrite wrap_txt and wrap_string. They do a poor job at keeping the original string intact and it is not well documented (in-code helper functions e.g.). I was thinking that this could have been avoided for longer as it is not extremely necessary to allow users to add spaces and newlines everywhere, but we need the flag to be able to do so eventually, also considering the case of decimal alignment. If anything, these functions do not have any test nor example and we need to compartmentalize the issue and the solution bottom-up

@Melkiades Melkiades marked this pull request as draft September 27, 2023 09:37
@Melkiades Melkiades marked this pull request as ready for review October 2, 2023 18:42
@Melkiades Melkiades requested review from Melkiades and removed request for shajoezhu, Melkiades and insights-engineering-bot October 2, 2023 18:43
@Melkiades Melkiades dismissed their stale review October 2, 2023 18:43

outdated

R/tostring.R Outdated Show resolved Hide resolved
@edelarua
Copy link
Contributor

edelarua commented Oct 2, 2023

Looks great! It seems to cause failures for listings because of the unused arguments in toString, and there's a minor fix that will need to be applied to summarize_coxreg in tern. Are there any other downstream changes you have identified that we need to address?

@ayogasekaram
Copy link
Contributor Author

I have a couple of tests failing in rtables with the change (exporters and pagination).

Co-authored-by: Emily de la Rua <59304861+edelarua@users.noreply.github.com>
Signed-off-by: Davide Garolini <dgarolini@gmail.com>
@Melkiades
Copy link
Contributor

Indeed something changed because the split of words is not optimal as it was before.

Before:

> toString(tt_for_wrap, max_width = lpp_tmp, tf_wrap = T, widths = clw) %>% cat
Enough long title
to be probably
wider than
expected

—————————————————————————————————————————————————————————————————————————————————
                      Incredibly long                                            
                     column name to be    This_should_be_som                     
                          wrapped            ewhere_split        C: Combination  
—————————————————————————————————————————————————————————————————————————————————
ASIAN                                                                            
  AGE                                                                            
    Mean                   32.50                36.68                36.99       
  EOSDY                                                                          
    Mean            A very long content      A very long          A very long    
                    to_be_wrapped_and_s   content to_be_wrap   content to_be_wrap
                          plitted          ped_and_splitted     ped_and_splitted 
BLACK OR AFRICAN                                                                 
AMERICAN                                                                         
  AGE                                                                            
    Mean                   34.27                34.93                33.71       
  EOSDY                                                                          
    Mean            A very long content      A very long          A very long    
                    to_be_wrapped_and_s   content to_be_wrap   content to_be_wrap
                          plitted          ped_and_splitted     ped_and_splitted 
—————————————————————————————————————————————————————————————————————————————————

Also this seems
quite wider than
expected
initially.

Now:

> toString(tt_for_wrap, max_width = lpp_tmp, tf_wrap = T, widths = clw) %>% cat
Enough long title
to be probably
wider than
expected

—————————————————————————————————————————————————————————————————————————————————
                      Incredibly long                                            
                     column name to be    This_should_be_som                     
                          wrapped            ewhere_split        C: Combination  
—————————————————————————————————————————————————————————————————————————————————
ASIAN                                                                            
  AGE                                                                            
    Mean                   32.50                36.68                36.99       
  EOSDY                                                                          
    Mean            A very long content      A very long          A very long    
                    to_be_wrapped_and_s        content              content      
                          plitted         to_be_wrapped_and_   to_be_wrapped_and_
                                               splitted             splitted     
BLACK OR AFRICAN                                                                 
AMERICAN                                                                         
  AGE                                                                            
    Mean                   34.27                34.93                33.71       
  EOSDY                                                                          
    Mean            A very long content      A very long          A very long    
                    to_be_wrapped_and_s        content              content      
                          plitted         to_be_wrapped_and_   to_be_wrapped_and_
                                               splitted             splitted     
—————————————————————————————————————————————————————————————————————————————————

Also this seems
quite wider
than expected
initially.

I will try to make it a bit smarter by taking into account the word before, but there is not a simple solution to this I think

@Melkiades
Copy link
Contributor

Ok I work on it a bit and I added a "smart" option that tries to estimate the optimal division. It is a recurrent for loop and I thought it would have been much slower, but surprise surprise:

str <- "A very long content to_be_wrapped_and_splitted and then something"
str_mb <- paste0(rep(str, 30), collapse = "")

microbenchmark::microbenchmark(
  "smart2" = wrap_string(str_mb, 2, smart = TRUE, collapse = "\n"),
  "smart5" = wrap_string(str_mb, 5, smart = TRUE, collapse = "\n"),
  "nsmart2" = wrap_string(str_mb, 2, smart = FALSE, collapse = "\n"),
  "nsmart5" = wrap_string(str_mb, 5, smart = FALSE, collapse = "\n"),
  times = 100
) %>% autoplot() + theme_classic()
Screenshot 2023-10-03 111035

@Melkiades
Copy link
Contributor

Melkiades commented Oct 3, 2023

btw for the reviewers @edelarua @ayogasekaram: 250+ line additions are simply comments, missing documentation, and missing tests. There were NO direct tests of the wrapping, hence imo the source of multiple problems and bugs downstream. Nonetheless, the wrapping algorithm now is entirely new and it has nothing shared with the old method. I kept wrap_txt for no other reason than the deprecation cycle. The indentation is as before removed and reinserted but now it is all self-contained in specific helper functions that have indentation checks that are a bit more formal than before. The indentation fixes were introduced by me when the wrapping went live and completely broke the indentation because base::strwrap destroys empty spaces. Now empty spaces are supported and they are destroyed only when trailing on a string that needs to be split because too long. Otherwise, they are kept as if they are a word (e.g. if you have " " (3 times \s), then it is considered as one word made of one white space). Failures related to \n need to be resolved in a separate PR as this is specifically related to wrap_string and toString refactoring while the \n solution needs to happen in matrix_form. In other words, here what it is missing from this refactoring that needs to touch other functions:

  • Special characters (\n and \t, for example) need to be resolved before going into toString (i.e. in matrix_form)
  • top left material needs to be better handled by matrix_form: indentation and wrapping are broken there and do not cooperate well with column names (which should be always top, I think - we can discuss this)
  • tf_wrap and max_width are overlapping parameters that should be merged into one (with if (!is.null(max_width)))
  • widths and colwidths are two parameters that do the same but there is no reason to keep them separate
  • more examples and vignette
  • Found downstream 3 exceptions to be handled more uniformly -> 1. ContentRows with no labels, 2. rtables::rtable() with no content does have mf_rinfo as null, 3. cases where input widths are 0 0 0

R/tostring.R Outdated Show resolved Hide resolved
@edelarua
Copy link
Contributor

This is good to go! :)

@Melkiades Melkiades merged commit 960daa7 into main Oct 11, 2023
@Melkiades Melkiades deleted the 202_fix_trimming@main branch October 11, 2023 15:25
edelarua added a commit to insightsengineering/tern that referenced this pull request Oct 11, 2023
See insightsengineering/formatters#203

---------

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sme Tracks changes for the sme board
Projects
None yet
3 participants