Collaborator

@asinghvi17 asinghvi17 commented Apr 2, 2024

This is a combined PR for a bunch of different PRs that are currently up. Below is a summary of changes:

  • Add metadata to the dataframe returned by dataset, indicating that it was generated by RDatasets.jl and recording its package and dataset name as a Tuple. This is essentially a call to DataFrames.metadata!(df, "RDatasets.jl", (package_name, dataset_name)).
  • Add a description function to RDatasets, make it readable in the REPL
    • Make this function discoverable, document it.
  • Bump RData.jl compat to 1.
  • Add instructions for data addition and improve data addition script
  • Bump version to v0.8

PRs #135 from @frankier and #124 from @jbrea are incorporated here.
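
A minimal sketch of the metadata round trip described in the first bullet, assuming DataFrames.jl 1.4 or later (where table-level metadata is available). The data frame and the values `"datasets"` and `"iris"` are placeholders for illustration, not the output of `dataset`:

```julia
using DataFrames

# Placeholder data frame standing in for what `dataset` would return.
df = DataFrame(x = 1:3)

# Tag the table with its origin, stored as a (package, dataset) tuple.
metadata!(df, "RDatasets.jl", ("datasets", "iris"))

# Later, any code holding `df` can check for the tag and read it back.
if "RDatasets.jl" in metadatakeys(df)
    package_name, dataset_name = metadata(df, "RDatasets.jl")
    println("origin: ", package_name, " / ", dataset_name)
end
```

Storing the origin on the table itself means a lookup function only needs the data frame, not the names originally passed to `dataset`.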


codecov-commenter commented Apr 2, 2024

Codecov Report

❌ Patch coverage is 17.64706% with 42 lines in your changes missing coverage. Please review.
✅ Project coverage is 36.61%. Comparing base (b1a5959) to head (30ad0b0).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/dataset.jl | 17.64% | 42 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (b1a5959) and HEAD (30ad0b0). HEAD has 11 fewer uploads than BASE.

| Flag | BASE (b1a5959) | HEAD (30ad0b0) |
| --- | --- | --- |
| | 15 | 4 |
Additional details and impacted files
@@             Coverage Diff             @@
##           master     #145       +/-   ##
===========================================
- Coverage   83.33%   36.61%   -46.72%     
===========================================
  Files           3        4        +1     
  Lines          24       71       +47     
===========================================
+ Hits           20       26        +6     
- Misses          4       45       +41     


@asinghvi17 asinghvi17 requested a review from bkamins April 2, 2024 13:46
@asinghvi17 asinghvi17 marked this pull request as ready for review April 2, 2024 13:52
Contributor

bkamins commented Apr 4, 2024

The changes make sense to me. I left one comment. I am not a maintainer of this package (and I do not know its internals). Maybe @nalimilan knows who has appropriate knowledge of the internals to approve it. Thank you for working on it.

@kdpsingh

Appreciate everyone's work on this package.

Member

@nalimilan nalimilan left a comment

Sorry for the delay! I have a few comments.

src/dataset.jl Outdated
Comment on lines 42 to 44
!!! note Unexported
This function is left deliberately unexported, since the name is pretty common.
Member

This isn't a standard pattern AFAIK. Better to mark the function as public via `@compat public description` at the same place as the exports. This is available since Compat 3.47.0 and 4.10.0. Could also add `packages` to that list BTW.

Suggested change
- !!! note Unexported
- This function is left deliberately unexported, since the name is pretty common.
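
A rough sketch of the pattern suggested in this comment, assuming Compat.jl 4.10 or later is installed. The module `DemoDatasets` and its placeholder function bodies are hypothetical and are not the RDatasets source:

```julia
module DemoDatasets  # hypothetical module for illustration only

using Compat: @compat

export dataset                        # exported: callable without qualification
@compat public description, packages  # public API, but callers must qualify the name

dataset(name) = "data for $name"            # placeholder body
description(name) = "description of $name"  # placeholder body
packages() = ["datasets"]                   # placeholder body

end # module

using .DemoDatasets

dataset("iris")                    # exported name, usable directly
DemoDatasets.description("iris")   # public but unexported, so it is qualified
```

On Julia versions that predate the `public` keyword, `@compat public` is a no-op, so the declaration can sit next to the exports without raising the minimum Julia version.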

src/dataset.jl Outdated
RDatasets.description(package_name::AbstractString, dataset_name::AbstractString)
RDatasets.description(df::DataFrame) # only call this on dataframes from RDatasets!
Returns an `RDatasetDescription` object containing the description of the dataset.
Member

Suggested change
- Returns an `RDatasetDescription` object containing the description of the dataset.
+ Return an `RDatasetDescription` object containing the description of the dataset.

src/dataset.jl Outdated

"""
RDatasets.description(package_name::AbstractString, dataset_name::AbstractString)
RDatasets.description(df::DataFrame) # only call this on dataframes from RDatasets!
Member

Put this information in the docstring body instead. Also say what happens if that's not the case.

Suggested change
- RDatasets.description(df::DataFrame) # only call this on dataframes from RDatasets!
+ RDatasets.description(df::DataFrame)
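
One way the "what happens if that's not the case" behaviour could look, sketched under assumptions: a later commit message in this thread mentions throwing an error when the metadata is missing and accepting a `default` keyword as a fallback. The helper `origin_of` and the sentinel type `NoDefault` are hypothetical names, not the PR's code, and DataFrames.jl 1.4 or later is assumed:

```julia
using DataFrames

const ORIGIN_KEY = "RDatasets.jl"

# Sentinel type so that `nothing` remains a usable default value.
struct NoDefault end

# Hypothetical helper: return the stored (package, dataset) tuple,
# fall back to `default` if one was given, otherwise throw.
function origin_of(df::AbstractDataFrame; default = NoDefault())
    if ORIGIN_KEY in metadatakeys(df)
        return metadata(df, ORIGIN_KEY)
    elseif default isa NoDefault
        throw(ArgumentError("data frame has no \"$ORIGIN_KEY\" metadata; was it returned by `dataset`?"))
    else
        return default
    end
end

tagged = DataFrame(x = 1:3)
metadata!(tagged, ORIGIN_KEY, ("datasets", "iris"))  # placeholder origin
plain = DataFrame(x = 1:3)                           # no origin metadata

origin_of(tagged)                     # the stored tuple
origin_of(plain; default = nothing)   # falls back instead of throwing
```

Throwing by default makes a misuse visible at the call site, while the keyword lets callers opt into a quiet fallback.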

src/dataset.jl Outdated
error("Unable to locate dataset file $rdaname or $csvname")
end
# Finally, inject metadata into the dataframe to indicate origin:
DataFrames.metadata!(dataset, "RDatasets.jl", (string(package_name), string(dataset_name)))
Member

Not needed AFAICT:

Suggested change
- DataFrames.metadata!(dataset, "RDatasets.jl", (string(package_name), string(dataset_name)))
+ metadata!(dataset, "RDatasets.jl", (string(package_name), string(dataset_name)))

src/dataset.jl Outdated
The main purpose of its existence is to provide a way to display the content
differently in HTML and markdown contexts.
Invoked through [`RDatasets.description`](@ref).
Member

Suggested change
- Invoked through [`RDatasets.description`](@ref).
+ Obtained through [`RDatasets.description`](@ref).

src/dataset.jl Outdated
Comment on lines 59 to 60
if "RDatasets.jl" in DataFrames.metadatakeys(df)
package_name, dataset_name = DataFrames.metadata(df, "RDatasets.jl")
Member

Suggested change
- if "RDatasets.jl" in DataFrames.metadatakeys(df)
- package_name, dataset_name = DataFrames.metadata(df, "RDatasets.jl")
+ if "RDatasets.jl" in metadatakeys(df)
+ package_name, dataset_name = metadata(df, "RDatasets.jl")

Project.toml Outdated
name = "RDatasets"
uuid = "ce6b1742-4840-55fa-b093-852dadbb1d8b"
- version = "0.7.7"
+ version = "0.8.0"
Member

Probably a good occasion to tag 1.0.0. Clearly the package is stable enough.

Suggested change
- version = "0.8.0"
+ version = "1.0.0"

Comment on lines +21 to +22
RDatasets.description(iris) # only use this on DataFrames returned from `dataset`!
```
Member

Suggested change
RDatasets.description(iris) # only use this on DataFrames returned from `dataset`!
```
RDatasets.description(iris)
```
Only use the latter on data frames returned from `dataset`.

Collaborator

ablaom commented Dec 8, 2025

@asinghvi17 Sorry to ping (and thanks for the work here). Any progress on this?

@andreasnoack
Member

I've tagged a minimal maintenance release versioned 0.8.0, but it would be good to get this one finalized for a 1.0 release.

@asinghvi17
Collaborator Author

Thanks for the ping here! I had lost track of this a bit - will address the comments.

asinghvi17 and others added 8 commits December 13, 2025 09:30
Co-authored-by: jbrea <jbrea@users.noreply.github.com>
* Streamline adding a new dataset

 * Add instructions to README for adding a new dataset
 * Add scripts to update the dataset metadata
 * Add update_doc method to only add a single dataset
 * Add HTML documentation generation to update_doc
 * Change update_doc to correctly round trip quotes in the metadata CSV

* Sort datasets CSV

* Allow datasets with a .RData extension as well as .rda

---------

Co-authored-by: Frankie Robertson <frankie@robertson.name>
This allows them to be displayed in a much better way in the REPL.
- Fix docstring conventions: "Returns" -> "Return", "Invoked" -> "Obtained"
- Capitalize "Markdown" consistently in documentation
- Move DataFrame constraint info from signature comment into docstring body
- Remove unnecessary DataFrames. prefixes (use metadata!, metadatakeys, metadata directly)
- Replace unexported note with @public declaration for description and packages
- Add SciMLPublic.jl dependency for @public macro
- Throw error instead of warning when DataFrame lacks RDatasets metadata
- Add `default` keyword argument to description(df) for graceful fallback
- Bump version to 1.0.0

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Milan Bouchet-Valat <nalimilan@club.fr>
Co-Authored-By: Claude <noreply@anthropic.com>
Collaborator

ablaom commented Dec 18, 2025

@asinghvi17 Are you ready for @nalimilan to take another look?

Collaborator Author

asinghvi17 commented Dec 19, 2025

Not quite yet, still a couple things to fix up. Give me a couple days?

Edit: sorry, that came out a bit more aggressive than I intended :D - meant to check back in a couple days where I should have something
