Replace the inflammation dataset with a dataset in long format that has column names #612

Open
dmcglinn opened this issue Jan 25, 2024 · 2 comments

Comments

@dmcglinn

How could the content be improved?

Hey, I love the lesson and use it every time I teach R. However, from a pedagogical perspective the inflammation dataset is strangely structured (wide format without column names). As a result, the code used to read it in (e.g., read.csv(..., header = FALSE)) and then to work with it graphically (plot(1:length(...), )) doesn't set students up to generalize their knowledge to the more standard data structures that they will also want to read in and graph. Just wanted to pass that along (as I correct my students' coding errors :P): if there is ever a rehashing of this content, please switch to a more commonly encountered data structure. Thanks for all the hard work.
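To make the wide-versus-long contrast concrete, here is a minimal base R sketch. The toy values and the column names `patient`, `day`, and `inflammation` are assumptions chosen for illustration; they are not taken from the lesson's files.

```r
# Hypothetical stand-in for a wide, header-less file:
# each row is a patient, each column is a day.
# as.data.frame() on an unnamed matrix yields the default names V1, V2, V3,
# the same names read.csv(..., header = FALSE) would assign.
wide <- as.data.frame(matrix(c(0, 1, 3,
                               0, 2, 4),
                             nrow = 2, byrow = TRUE))

# Reshape to long format: one row per (patient, day) observation.
# unlist() stacks the columns in order, so patient cycles fastest.
long <- data.frame(
  patient      = rep(seq_len(nrow(wide)), times = ncol(wide)),
  day          = rep(seq_len(ncol(wide)), each = nrow(wide)),
  inflammation = unlist(wide, use.names = FALSE)
)

# With named columns, summaries and plots refer to variables by name
# rather than by position.
mean_by_day <- aggregate(inflammation ~ day, data = long, FUN = mean)
plot(inflammation ~ day, data = mean_by_day, type = "b")
```

In the long version the plotting call names the variables directly, which is the pattern that carries over to most other tabular datasets.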

Which part of the content does your suggestion apply to?

Reading and plotting the data.

@isaac-jennings isaac-jennings added the discussion Ongoing, in order to reach an agreement label Apr 2, 2024
@isaac-jennings isaac-jennings self-assigned this Apr 2, 2024
@isaac-jennings
Contributor

Hi @dmcglinn, thank you for the contribution.

I certainly see your point of view, and I am currently in two minds. On the one hand, I agree with what you have described in your issue. On the other hand, I feel the dataset may have been structured this way intentionally, since the datasets are presented as derived from treatment or device measurements, possibly to replicate the output of scientific devices or hardware. Having said that, tidy data is almost certainly the best structure from a teaching, delivery, and learning perspective.

Labeling this as discussion for @Bisaloo and @HaoZeke; any thoughts?

@Bisaloo
Member

Bisaloo commented Apr 6, 2024

I completely agree we're likely to have to update the dataset at some point. The lesson & the whole ecosystem have evolved a lot since the dataset was first picked and it makes sense that it's no longer the best fit.

However, it's an important change with many implications. One of the less obvious ones is that we may have to coordinate with the python-novice-inflammation lesson to decide whether we want to stay in sync and pull the plug in a coordinated manner.

I would suggest that we start a meta-issue collecting all the feedback and requests about the dataset, to gather requirements for a potential new dataset, and then get in touch with the curriculum advisory committee and the rest of the community. It may be that the recommendation is to fork the lesson, for example.
