[Bug]: Read R help pages not working as described #242

Open
3 tasks done
stevegbrooks opened this issue Feb 26, 2025 · 8 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@stevegbrooks
Contributor

Confirm setup

  • I have installed the development version of {gptstudio} (pak::pak("MichelNivard/gptstudio")) and tested if the problem remains.
  • I have installed the {reprex} and {sessioninfo} packages to be able to run this issue's code snippet pak::pak(c("reprex", "sessioninfo")).

What happened?

I tried the "Read R help pages" feature in gptstudio::gptstudio_chat(). When I entered the prompt "tell me about dplyr::join_by()", it did not append the message "R documentation: dplyr::join_by" to the chat history, as described in #210.

Then, when I tried out a package that the LLM certainly was not trained on, e.g., "Tell me about dv.loader::load_data", it failed to retrieve the help pages. So something about this feature isn't working in the dev version.
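
For readers reproducing this, the sketch below shows one way a prompt could be scanned for `pkg::topic` references like the two above. It is an illustration only: `extract_doc_refs()` and its regular expression are hypothetical and are not taken from gptstudio's source.

```r
# Hypothetical helper (not gptstudio's implementation): find `pkg::topic`
# references in a free-text prompt using base R regular expressions.
extract_doc_refs <- function(prompt) {
  # Package names may contain letters, digits and dots (e.g. dv.loader);
  # topic names may additionally contain underscores.
  pattern <- "[A-Za-z][A-Za-z0-9.]*::[A-Za-z.][A-Za-z0-9._]*"
  matches <- regmatches(prompt, gregexpr(pattern, prompt, perl = TRUE))[[1]]
  unique(matches)
}

# Split a reference such as "dplyr::join_by" into package and topic.
split_doc_ref <- function(ref) {
  parts <- strsplit(ref, "::", fixed = TRUE)[[1]]
  list(pkg_ref = parts[1], topic = parts[2])
}

refs <- extract_doc_refs("Tell me about dv.loader::load_data and dplyr::join_by()")
print(refs)
print(split_doc_ref(refs[1]))
```

Trailing parentheses are deliberately left out of the match, so `dplyr::join_by()` and `dplyr::join_by` resolve to the same reference.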

Image

Relevant log output

NA

Session info

gptstudio::gptstudio_sitrep()
#> 
#> ── Configuration for gptstudio ─────────────────────────────────────────────────
#> Using user configuration file at
#> '/fsx/home/brooksst/.config/R/gptstudio/config.yml'
#> 
#> 
#> ── Current Settings ──
#> 
#> 
#> 
#> - Model: Using ENV variables
#> 
#> - Task: coding
#> 
#> - Language: en
#> 
#> - Service: azure_openai
#> 
#> - Custom prompt:
#> 
#> - Stream: TRUE
#> 
#> - Code style: tidyverse
#> 
#> - Skill: beginner
#> 
#> 
#> 
#> ── Checking API connections ──
#> 
#> 
#> 
#> ── Checking OpenAI API connection 
#> 
#> ✖ API key is not set or invalid for OpenAI service.
#> 
#> 
#> 
#> ── Checking HuggingFace API connection 
#> 
#> ✖ API key is not set or invalid for HuggingFace service.
#> 
#> 
#> 
#> ── Checking Anthropic API connection 
#> 
#> ✖ API key is not set or invalid for Anthropic service.
#> 
#> 
#> 
#> ── Checking Google AI Studio API connection 
#> 
#> ✖ API key is not set or invalid for Google AI Studio service.
#> 
#> 
#> 
#> ── Checking Azure OpenAI API connection 
#> 
#> ✔ Successfully connected to the Azure OpenAI API service.
#> 
#> 
#> 
#> ── Checking Perplexity API connection 
#> 
#> ✖ API key is not set or invalid for Perplexity service.
#> 
#> 
#> 
#> ── Checking Cohere API connection 
#> 
#> ✖ API key is not set or invalid for Cohere service.
#> 
#> 
#> 
#> ── Check Ollama for Local API connection 
#> 
#> ✖ Couldn't connect to Ollama in <http://localhost:11434>. Is it running there?
#> 
#> 
#> 
#> ── Getting help ──
#> 
#> 
#> 
#> See the gptstudio homepage (<https://michelnivard.github.io/gptstudio/>) for
#> getting started guides and package documentation. File an issue or contribute
#> to the package at the GitHub repo
#> (<https://github.com/MichelNivard/gptstudio>).
#> ── End of gptstudio configuration ──────────────────────────────────────────────


<sup>Created on 2025-02-26 with [reprex v2.1.1](https://reprex.tidyverse.org)</sup>

<details style="margin-bottom:10px;">

<summary>

Session info
</summary>

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.1 (2024-06-14)
#>  os       Ubuntu 22.04.5 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Dublin
#>  date     2025-02-26
#>  pandoc   3.3 @ /usr/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  cli           3.6.3      2024-06-21 [1] RSPM
#>  curl          6.0.1      2024-11-14 [1] RSPM (R 4.4.1)
#>  digest        0.6.36     2024-06-23 [1] RSPM
#>  evaluate      0.24.0     2024-06-10 [1] RSPM
#>  fansi         1.0.6      2023-12-08 [3] RSPM (R 4.4.1)
#>  fastmap       1.2.0      2024-05-15 [1] RSPM
#>  fs            1.6.5      2024-10-30 [1] RSPM (R 4.4.1)
#>  glue          1.7.0      2024-01-09 [1] RSPM
#>  gptstudio     0.4.0.9010 2025-02-26 [1] Github (MichelNivard/gptstudio@83490c1)
#>  htmltools     0.5.8.1    2024-04-04 [1] RSPM
#>  htmlwidgets   1.6.4      2023-12-06 [1] RSPM (R 4.4.1)
#>  httpuv        1.6.15     2024-03-26 [1] RSPM (R 4.4.1)
#>  httr2         1.0.7      2024-11-26 [1] RSPM (R 4.4.1)
#>  jsonlite      1.8.8      2023-12-04 [1] RSPM
#>  knitr         1.49       2024-11-08 [1] RSPM (R 4.4.1)
#>  later         1.4.1      2024-11-27 [1] RSPM (R 4.4.1)
#>  lifecycle     1.0.4      2023-11-07 [1] RSPM
#>  magrittr      2.0.3      2022-03-30 [1] RSPM (R 4.4.1)
#>  mime          0.12       2021-09-28 [1] RSPM (R 4.4.1)
#>  pillar        1.9.0      2023-03-22 [3] RSPM (R 4.4.1)
#>  promises      1.3.2      2024-11-28 [1] RSPM (R 4.4.1)
#>  R6            2.5.1      2021-08-19 [1] RSPM (R 4.4.1)
#>  rappdirs      0.3.3      2021-01-31 [1] RSPM (R 4.4.1)
#>  Rcpp          1.0.13-1   2024-11-02 [1] RSPM (R 4.4.1)
#>  reprex        2.1.1      2024-07-06 [1] RSPM
#>  rlang         1.1.4      2024-06-04 [1] RSPM
#>  rmarkdown     2.29       2024-11-04 [1] RSPM (R 4.4.1)
#>  rstudioapi    0.16.0     2024-03-24 [1] RSPM (R 4.4.1)
#>  sessioninfo   1.2.2      2021-12-06 [1] RSPM
#>  shiny         1.9.1      2024-08-01 [1] RSPM (R 4.4.1)
#>  utf8          1.2.4      2023-10-22 [3] RSPM (R 4.4.1)
#>  vctrs         0.6.5      2023-12-01 [1] RSPM
#>  withr         3.0.2      2024-10-28 [1] RSPM (R 4.4.1)
#>  xfun          0.49       2024-10-31 [1] RSPM (R 4.4.1)
#>  xtable        1.8-4      2019-04-21 [1] RSPM (R 4.4.1)
#>  yaml          2.3.10     2024-07-26 [1] RSPM (R 4.4.1)
#> 
#>  [1] /fsx/home/brooksst/R/x86_64-pc-linux-gnu-library/4.4
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/local/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────


</details>

Code of Conduct

  • I agree to follow this project's Code of Conduct
@stevegbrooks stevegbrooks added the bug an unexpected problem or unintended behavior label Feb 26, 2025
@stevegbrooks
Contributor Author

For the record, I have the dv.loader package installed:

> sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Dublin
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dv.loader_2.1.0

loaded via a namespace (and not attached):
 [1] miniUI_0.1.1.1       jsonlite_1.8.8       compiler_4.4.1       promises_1.3.2       reprex_2.1.1         Rcpp_1.0.13-1        parallel_4.4.1      
 [8] callr_3.7.6          later_1.4.1          jquerylib_0.1.4      globals_0.16.3       yaml_2.3.10          fastmap_1.2.0        mime_0.12           
[15] R6_2.5.1             curl_6.0.1           httr2_1.0.7          knitr_1.49           htmlwidgets_1.6.4    tibble_3.2.1         future_1.34.0       
[22] shiny_1.9.1          pillar_1.9.0         bslib_0.8.0          gptstudio_0.4.0.9010 rlang_1.1.4          utf8_1.2.4           cachem_1.1.0        
[29] httpuv_1.6.15        xfun_0.49            fs_1.6.5             sass_0.4.9           cli_3.6.3            withr_3.0.2          magrittr_2.0.3      
[36] ps_1.8.1             processx_3.8.4       digest_0.6.36        rstudioapi_0.16.0    xtable_1.8-4         rappdirs_0.3.3       lifecycle_1.0.4     
[43] vctrs_0.6.5          evaluate_0.24.0      glue_1.7.0           listenv_0.9.1        codetools_0.2-20     fansi_1.0.6          parallelly_1.38.0   
[50] rmarkdown_2.29       pkgconfig_2.0.3      tools_4.4.1          htmltools_0.5.8.1  

@calderonsamuel
Collaborator

You are totally right. I guess we forgot to mention that this is implemented only for OpenAI at the moment. This happened because it was the first one to support a "name" and not just a "role" in each message.
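
To make the "name" versus "role" distinction concrete, here is a rough sketch of how a help page might be attached to a chat history. It is not gptstudio's code: `add_docs_message()`, the `"docs"` name, the `"user"` role and the `supports_name` flag are all assumptions made for illustration. The only external fact relied on is that OpenAI chat messages accept an optional `name` field alongside `role` and `content`.

```r
# Hypothetical helper: append a documentation message to a chat history.
# `supports_name` is an assumed flag describing whether the target service
# accepts a `name` field on messages.
add_docs_message <- function(history, doc_ref, doc_text, supports_name = TRUE) {
  label <- paste0("R documentation: ", doc_ref)
  if (supports_name) {
    # The `name` field marks the message as coming from the docs reader,
    # so the UI can show the label without mixing it into the content.
    msg <- list(role = "user", name = "docs", content = doc_text)
  } else {
    # Fallback for services without `name`: fold the label into the content.
    msg <- list(role = "user", content = paste0(label, "\n\n", doc_text))
  }
  c(history, list(msg))
}

history <- list(list(role = "user", content = "tell me about dplyr::join_by()"))
history <- add_docs_message(history, "dplyr::join_by", "<help page text>",
                            supports_name = FALSE)
str(history)
```

A fallback along these lines is one possible way to extend the feature to other services; whether it fits the planned {ellmer} migration is a separate question.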

I'll rename the issue to give it a clearer action path. However, I believe the migration to {ellmer} logic would have a higher priority. WDYT @JamesHWade?

@stevegbrooks
Contributor Author

stevegbrooks commented Feb 27, 2025

Makes sense, thanks!

I switched to our OpenAI-compatible API and was able to get it to work, but I had some issues with hallucination. This is a good use case, because there's no way these models were trained on this package, as it was only recently open-sourced:

Image

However, the dv.loader::load_data help page clearly shows the params as being:

Image

Not sure how we can get the LLM to more strictly abide by the R documentation for its reply. I tried changing the custom prompt, but that didn't help. Any suggestions?

@stevegbrooks
Contributor Author

I think I found the issue:

gptstudio:::read_docs("dplyr::mutate")
[[1]]
[[1]]$pkg_ref
[1] "dplyr"

[[1]]$topic
[1] "mutate"

[[1]]$inner_text
[[1]]$inner_text$title
[1] "Create, modify, and delete columns"

[[1]]$inner_text$description
NULL

[[1]]$inner_text$usage
NULL

[[1]]$inner_text$arguments
NULL

[[1]]$inner_text$format
NULL

[[1]]$inner_text$value
NULL

[[1]]$inner_text$examples
NULL



> sessioninfo::package_info("gptstudio", dependencies = F)
 package   * version    date (UTC) lib source
 gptstudio * 0.4.0.9010 2025-02-26 [1] Github (MichelNivard/gptstudio@83490c1)

 [1] /fsx/home/brooksst/R/x86_64-pc-linux-gnu-library/4.4
 [2] /usr/local/lib/R/site-library
 [3] /usr/local/lib/R/library

It looks like the read_docs function is broken. I looked into it a bit further, and the Rd2HTML function isn't parsing the Rd file completely. I can open a pull request to fix this.
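
As a cross-check that does not go through Rd2HTML, the sections can be read straight from the parsed Rd object. The sketch below is a diagnostic aid, not the proposed fix: `rd_sections()` is a hypothetical helper, and it assumes the topic is stored in a file named `<topic>.Rd`, which does not hold for every alias. `tools::Rd_db()` is the exported base R function that returns a package's parsed Rd files.

```r
# Hypothetical diagnostic helper: pull plain-text sections from a parsed Rd
# object, bypassing the HTML conversion step.
rd_sections <- function(pkg, topic) {
  db <- tools::Rd_db(pkg)
  # Assumption: the help topic lives in "<topic>.Rd".
  rd <- db[[paste0(topic, ".Rd")]]
  if (is.null(rd)) {
    stop("No Rd file named '", topic, ".Rd' found in package '", pkg, "'")
  }
  # Each top-level element carries its section name in the "Rd_tag" attribute.
  tags <- vapply(unclass(rd), function(x) {
    tag <- attr(x, "Rd_tag")
    if (is.null(tag)) "" else tag
  }, character(1))
  wanted <- c("title", "description", "usage", "arguments", "value", "examples")
  out <- lapply(wanted, function(w) {
    idx <- which(tags == paste0("\\", w))
    if (length(idx) == 0) return(NULL)
    # Flatten the nested Rd structure into one rough text string.
    trimws(paste(unlist(rd[[idx[1]]]), collapse = ""))
  })
  stats::setNames(out, wanted)
}

# stats::lm is used here only because it ships with every R installation.
sections <- rd_sections("stats", "lm")
cat(sections$title, "\n")
cat(substr(sections$usage, 1, 60), "\n")
```

If the sections come back non-NULL here while `gptstudio:::read_docs()` returns NULL for the same topic, that points at the HTML conversion or the extraction from the HTML rather than at the installed help files.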

@calderonsamuel
Collaborator

Thanks @stevegbrooks. I believe the malfunction in the parsing process is a separate issue. Could you open a new ticket with your reprex? In this case, knowing the R version would be useful. Any PR with a fix should link to the new issue.

@stevegbrooks
Contributor Author

Ok sure - will do.

@calderonsamuel
Collaborator

We still need to extend this feature to other services!

@stevegbrooks
Contributor Author

Oh, my bad, sorry. I'll still open a new issue for this new bug. Thanks.
