Some quotes seem to be excessively long #35

Open
friendly opened this issue Oct 8, 2023 · 1 comment
Comments

@friendly
Owner

friendly commented Oct 8, 2023

In data-raw/quote-stats.R I calculated the length of each quote in characters, words, and sentences. Some quotes seem very long to me and should perhaps be reviewed manually.

library(statquotes)
library(stringr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
 ...
qt <- get_quotes()
text <- qt$text

Count characters, words, and sentences:

stats <- data.frame(
  qid = qt$qid,
  chars = str_count(text, boundary("character")),
  words = str_count(text, boundary("word")),
  sent = str_count(text, boundary("sentence")),
  txt = substr(qt$text, 1, 40)
)

Which are the longest?

stats |>
  dplyr::slice_max(words, n=12) |>
  dplyr::arrange(qid)
#>    qid chars words sent                                      txt
#> 1  297   860   151   10 The goals in statistics are to use data 
#> 2  336   996   171    6 It is difficult to understand why statis
#> 3  349  1034   158    7 Scholars feel the need to present tables
#> 4  371  1155   194    9 It's not easy to select more than a few 
#> 5  394   818   149    6 It was always important for the biometri
#> 6  426  1157   178   11 In contrast to the logical development a
#> 7  442  1071   161    7 An important distinction needs to be mad
#> 8  472  1015   146    7 In marked contrast to what is advocated 
#> 9  521   920   172    6 What is the probability of obtaining a d
#> 10 524   988   170    8 We admit with Sir Winston Churchill that
#> 11 531  1145   191    8 An important part of the explanation [of
#> 12 602   926   156    6 An observation is judged significant, if

Created on 2023-10-08 with reprex v2.0.2
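
A fixed word-count cutoff is another way to pick out candidates for manual review, instead of taking the top 12. The following is only a minimal sketch in base R: the `quotes` data frame is a toy stand-in for the result of `get_quotes()`, the `count_words()` helper and the `max_words` cutoff of 150 are hypothetical, and splitting on whitespace only approximates what `str_count(text, boundary("word"))` reports.

```r
# Toy stand-in for get_quotes(); the real data frame has more columns.
quotes <- data.frame(
  qid  = c(1, 2, 3),
  text = c(
    "Short quote.",
    paste(rep("word", 160), collapse = " "),
    "Another fairly short quote, with a comma."
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: approximate word count by splitting on whitespace.
# This is cruder than stringr's boundary("word") and may differ slightly.
count_words <- function(x) lengths(strsplit(trimws(x), "\\s+"))

# Hypothetical cutoff; choose whatever length seems reasonable.
max_words <- 150

quotes$words <- count_words(quotes$text)

# Keep only the quotes that exceed the cutoff.
too_long <- quotes[quotes$words > max_words, c("qid", "words")]
print(too_long)
```

With a cutoff, the list of flagged quotes does not change size arbitrarily as quotes are added or shortened, which may make it easier to re-run the check later.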

@friendly
Owner Author

friendly commented Oct 8, 2023

In commit 310a474 I shortened four quotes (qid: 297, 371, 426, 472), which I judged to be the worst offenders.

Also, many quotes were missing tags; I added a whole bunch.
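
For future checks, quotes without tags could be listed with something like the sketch below. It uses a toy data frame rather than the package data, and it assumes that tags are stored in a character column named `tags`, with a missing tag appearing as `NA`, an empty string, or whitespace only; none of that is confirmed in this thread.

```r
# Toy data; the `tags` column name and its encoding of "missing"
# are assumptions, not taken from the package itself.
quotes <- data.frame(
  qid  = c(10, 11, 12, 13),
  tags = c("data,graphics", NA, "", "  "),
  stringsAsFactors = FALSE
)

# A quote counts as untagged if tags is NA, empty, or only whitespace.
is_untagged <- is.na(quotes$tags) | !nzchar(trimws(quotes$tags))

untagged_qids <- quotes$qid[is_untagged]
print(untagged_qids)
```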
