Gosset Vignette 1: An Initial Overview for Building into R-Instat #9339

lilyclements · 2025-01-08T16:01:57Z

Maintainer

This is a discussion related to the Gosset Vignette 1 on incorporating the analyses into R-Instat. In this vignette, the authors demonstrate a "workflow to assess crop variety performance using decentralised on-farm testing data generated with the tricot approach".

I've attached the full R code at the bottom of this post to run through. In the meantime, I've split it into the four parts that I can see: import, prepare, describe, and model.

Import
In the vignette, they use the package's own data, nicabean. This contains two data sets (trial and covar), which they extract manually, whereas R-Instat automatically splits it into the two data sets on import. Nice.

data("nicabean", package = "gosset")

Prepare

Thinking about what we would specify in an equivalent "Define" dialog, there seem to be four distinct roles: Traits, Items, Input, and ID. This may become clearer as we go.

To prepare for analysis, they transform the ordinal rankings into "Plackett-Luce rankings" (a sparse matrix) using the rank_numeric() function.
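To make the transformation concrete, here is a minimal base-R sketch of what a ranking encodes. It is not gosset::rank_numeric() itself: the toy data frame `toy` (columns id, item, rank, mirroring the ID/Items/Input roles above) is hypothetical, and we assume rank 1 = best, as implied by the vignette's `ascending = TRUE`. Each farmer's ranks become an ordering of their varieties, and these can be stored as a matrix with one row per farmer, one column per variety, and 0 where the farmer did not test that variety, which is our understanding of the PlackettLuce rankings format.

```r
# Hypothetical tricot data for ONE trait: each farmer (id) ranks the three
# varieties (item) they tested; rank 1 = best.
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  item = c("A", "B", "C", "A", "C", "D"),
  rank = c(2, 1, 3, 3, 1, 2),
  stringsAsFactors = FALSE
)

# Each farmer's varieties ordered from best to worst
orderings <- lapply(split(toy, toy$id), function(d) d$item[order(d$rank)])

# Farmer-by-variety matrix of ranks; 0 = variety not tested by that farmer
items <- sort(unique(toy$item))
rank_mat <- matrix(0, nrow = length(orderings), ncol = length(items),
                   dimnames = list(names(orderings), items))
rank_mat[cbind(as.character(toy$id), toy$item)] <- toy$rank

print(orderings)
print(rank_mat)
```

Most cells of such a matrix are 0 once there are many varieties, which is why the vignette describes the result as a sparse matrix.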

We need to think about how to make this more R-Instat friendly. So far, I've only replaced their for loop with purrr::map, but the output is a list of objects of class "rankings". I think we want to retain this "rankings" class so that subsequent analyses work. Do we just want to store this as an object to access later?
This is used for the two descriptive analyses described next, and it is also used in the modelling part, so it makes sense to me to define it once when preparing the data.

rankings_list <- traits %>%
  purrr::map(~ {
    dat %>%
      dplyr::filter(trait == .x) %>%
      gosset::rank_numeric(data = ., 
                           items = "item", 
                           input = "rank", 
                           id = "id", 
                           ascending = TRUE)
  })
names(rankings_list) <- traits

Describe

In this vignette, there are two descriptive analyses, a correlation and a graph. Both use the rankings_list created above.

Correlation between each trait and the overall score
Graph of how each variety scored for each trait
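As a rough illustration of the correlation idea, the base-R sketch below computes Kendall's tau between two traits for each farmer (using hypothetical ranks of the same three varieties), averages it, and bootstraps the average by resampling farmers. This is a simplified stand-in, not how gosset::kendallTau() or kendallTau_bootstrap() compute their results (kendallTau() also reports an effective N, for instance); the data and the names `yield_ranks`, `overall_ranks` and `boot_tau` are hypothetical.

```r
# Hypothetical ranks (1 = best) from 4 farmers, each ranking the same 3
# varieties for two traits. Rows = farmers, columns = varieties.
yield_ranks   <- rbind(c(1, 2, 3), c(2, 1, 3), c(3, 2, 1), c(1, 3, 2))
overall_ranks <- rbind(c(1, 3, 2), c(2, 1, 3), c(3, 1, 2), c(1, 3, 2))

# Kendall's tau between the two traits, one value per farmer
tau_per_farmer <- sapply(seq_len(nrow(yield_ranks)), function(i)
  cor(yield_ranks[i, ], overall_ranks[i, ], method = "kendall"))
mean_tau <- mean(tau_per_farmer)

# Bootstrap: resample farmers with replacement and recompute the mean tau
boot_tau <- function(tau, nboot = 50, seed = 1206) {
  set.seed(seed)
  replicate(nboot, mean(sample(tau, replace = TRUE)))
}
taus <- boot_tau(tau_per_farmer)
summary(taus)
```

The spread of `taus` is the kind of distribution shown in the vignette's boxplot, one box per trait.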

Model
In the model part, there are steps that arguably belong elsewhere, or perhaps belong here if they're not used beyond this modelling part.

They look at the effect of rainfall on yield for this data. To do this, they use total rainfall (Rtotal) from CHIRPS data, accessed in R through the chirps package.
(They note that: "Additional covariates, such as temperature, can also be incorporated into a Plackett-Luce tree using packages like ag5Tools or nasapower" - perhaps something to investigate for this part when modelling).
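A minimal base-R sketch of the "total rainfall from planting" idea, assuming a daily rainfall matrix (rows = plots, columns = dates) like the one get_chirps(..., as.matrix = TRUE) returns once its columns are renamed to dates. It sums rainfall on wet days (>= 1 mm) over the 45 days starting at each plot's planting date. The helper `window_rtotal()` and the data are hypothetical, and climatrends::rainfall() may treat the window boundaries or the wet-day threshold differently.

```r
# Hypothetical daily rainfall (mm) for 2 plots over 60 days
dates <- seq(as.Date("2015-09-10"), by = "day", length.out = 60)
rain <- matrix(rep(c(0, 0.5, 2, 5), length.out = 120), nrow = 2, byrow = TRUE,
               dimnames = list(c("plot1", "plot2"), as.character(dates)))
planting <- as.Date(c("2015-09-12", "2015-09-20"))

# Total rain on wet days (>= wet mm) in the `span` days starting at day_one
window_rtotal <- function(rain_row, dates, day_one, span = 45, wet = 1) {
  in_window <- dates >= day_one & dates < day_one + span
  x <- rain_row[in_window]
  sum(x[x >= wet])
}

rtotal <- sapply(seq_len(nrow(rain)), function(i)
  window_rtotal(rain[i, ], dates, planting[i]))
names(rtotal) <- rownames(rain)
rtotal
```

Because each plot has its own planting date, the window differs per row, which is the flexibility the CHIRPS import would need to support.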

They create a data frame containing the Yield rankings and the CHIRPS rainfall data. Is this done just for this model, or would it be used elsewhere? Is it something the user would want to examine? (Should this be done through our Prepare dialogs in the menu? If so, do we currently have chirps_data in our Prepare menus? We would need a way to run the rainfall() and group() functions.)
After creating this data frame, they fit the model using pltree().

To view the model, they use print, summary, node_labels, node_rules, top_items, and plot. There will probably be other options to offer too.

After modelling, they have further methods to look at the model (reliability and compare), which I assume would extend our modelling into a "Use Model" type dialog.
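For the reliability step, our understanding is that reliability is the Plackett-Luce probability that a variety beats the check in a pairwise comparison, worth_i / (worth_i + worth_check). The sketch below assumes that definition and uses hypothetical worth estimates (not output from the vignette); it also shows the log-worth scale used in worth maps. The helper `reliability_vs()` is hypothetical, so gosset::reliability() should be checked for its exact definition before relying on this.

```r
# Hypothetical worth estimates from a Plackett-Luce model (they sum to 1)
worth <- c(VarA = 0.40, VarB = 0.10, VarC = 0.30, Check = 0.20)

# Log-worth: the scale shown in a worth map
log_worth <- log(worth)

# Probability that each variety outperforms the check in a pairwise comparison
reliability_vs <- function(worth, ref) {
  rel <- worth / (worth + worth[[ref]])
  rel[names(rel) != ref]
}
rel <- reliability_vs(worth, ref = "Check")

# Keep only varieties at least as likely as not to beat the check
rel[rel >= 0.5]
```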

## Vignette 1
# https://agrdatasci.github.io/gosset/articles/vignette-1-trait-prioritization-and-crop-performance.html
###################################################################

library(gosset)
library(PlackettLuce)
library(climatrends)
library(chirps)
library(ggplot2)
library(tidyverse)

# read in nicabean
# in R-Instat this reads in and gives the two data frames as we would like.
data("nicabean", package = "gosset")

trial = nicabean$trial
covar = nicabean$covar

# Definition: Define traits as one?
traits = unique(trial$trait)

# define items, input, and id?

# PREPARE =========================================================================
# Transform the ordinal rankings into Plackett-Luce rankings
# (a sparse matrix) using the rank_numeric() function.
# We iterate over the traits and add the rankings to a list called rankings_list.
# Since the varieties are ranked in ascending order (1 = high),
# we use the argument ascending = TRUE.


rankings_list <- traits %>%
  purrr::map(~ {
    trial %>%
      dplyr::filter(trait == .x) %>%
      gosset::rank_numeric(data = ., 
                           items = "item", 
                           input = "rank", 
                           id = "id", 
                           ascending = TRUE)
  })
names(rankings_list) <- traits


# DESCRIBE =========================================================================

# Correlation between ‘overall appreciation’ and the other traits ---------------------------

# 1. The Kendall correlation between overall appreciation and the other traits in the trial. 
baseline <- "OverallAppreciation"
baseline_trait <- rankings_list[[baseline]]

kendall <- rankings_list %>%
  purrr::keep(names(.) != baseline) %>%
  purrr::map_dfr(~ kendallTau(x = .x, y = baseline_trait), .id = "trait")

# The Kendall correlation indicates that farmers prioritized the traits yield, taste,
# and marketability when assessing overall appreciation.

# 2. Distances and the distribution of the Kendall correlation coefficients.
# For that we use the function kendallTau_bootstrap(), which resamples the data using a
# bootstrapping approach to draw a uniform distribution in the data.
kendall <- rankings_list %>%
  keep(names(.) != baseline) %>%
  map_dfr(
    ~ kendallTau_bootstrap(.x,
                           rankings_list[[baseline]],
                           nboot = 50,
                           seed = 1206),
    .id = "trait"
  ) %>%
  pivot_longer(cols = everything(), names_to = "trait", values_to = "kendallTau") 

ggplot(kendall, aes(y = trait, x = kendallTau)) +
  geom_boxplot() +
  labs(y = "", x = "Correlation with the 'Overall appreciation'") +
  theme_minimal()

# Performance of varieties across traits  ---------------------------
# The worth_map() function provides a visual tool to assess and compare
# variety performance across different traits.
# The values represented in a worth map are log-worth estimates.
# From a breeder or product developer perspective, the function worth_map() is a
# valuable tool for identifying variety performance across multiple traits and
# selecting crossing materials.

mod <- rankings_list %>%
  map(PlackettLuce)

worth_map(mod,
          labels = traits,
          labels.order = rev(traits)) +
  labs(x = "Variety",
       y = "Trait")

# MODEL =========================================================================
# The effect of rainfall on yield

# As mentioned, there are two parts here: "Creating the Data" and "Modelling".
# It's unclear to me whether we want the data-frame-creation part here or in a different part of this menu. Perhaps that depends on whether it is used only here or elsewhere too, and on how much sense it makes for the user to examine it.

# 1. Getting the Data frame
# CHIRPS data is requested via the chirps package
# note this takes a while to run:

dates = c(min(covar[, "planting_date"]),
          max(covar[, "planting_date"]) + 70)

chirps = get_chirps(covar[, c("longitude","latitude")], 
                    dates = as.character(dates),
                    as.matrix = TRUE,
                    server = "ClimateSERV")

# rename the date variable
newnames = dimnames(chirps)[[2]]
newnames = gsub("chirps-v2.0.", "", newnames)
newnames = gsub("[.]", "-", newnames)
dimnames(chirps)[[2]] = newnames

# We compute the rainfall indices for the period from the planting date to the
# first 45 days of plant growth using the rainfall() function from the
# climatrends package [13].
rain = rainfall(chirps, day.one = covar$planting_date, span = 45)

# To link the rankings to covariates, they must be coerced into a
# ‘grouped_rankings’ object.
# This is done using the group() function from the PlackettLuce package.
# For this example, we retain only the rankings corresponding to yield.
G <- group(rankings_list[["Yield"]], index = 1:length(rankings_list[["Yield"]]))

# combine rankings and rainfall data to one df
pldG <- cbind(G, rain)

# 2. Now we have our df, pldG. We model this:

tree_model <- pltree(G ~ Rtotal, data = pldG, alpha = 0.1)
summary(tree_model)
node_labels(tree_model)
node_rules(tree_model)
top_items(tree_model, top = 3)
plot(tree_model, ref = "Amadeus 77")


# Reliability of superior varieties
# We can compute the reliability estimates of the evaluated common bean varieties in
# each of the resulting nodes of the Plackett-Luce tree.
# This helps identify varieties with a higher probability of outperforming a variety
# check (Amadeus 77). For simplicity, we present only the varieties with a
# reliability score >= 0.5
reliability(tree_model, ref = "Amadeus 77")

# Going beyond yield
# A more comprehensive approach to assessing the performance of varieties involves
# using “overall appreciation,” as this trait is expected to capture the performance
# of a variety not only for yield but also for all other traits prioritised by farmers.
# To support this hypothesis, we use the compare() function, which applies the
# method proposed by Bland and Altman (1986) [16] to assess the agreement between two
# different measures.
# Here, we compare overall appreciation and yield. If both measures completely agree,
# all the varieties should be centered at 0 on the Y-axis.

Overall = PlackettLuce(rankings_list[[baseline]])
Yield = PlackettLuce(rankings_list[["Yield"]])

compare(Overall, Yield) +
  labs(x = "Average log(worth)",
       y = "Difference (Overall appreciation - Yield)")
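The Bland and Altman approach behind compare() can be sketched in base R: for each variety, take the average of its two log-worth estimates (x-axis) and their difference (y-axis), together with the mean difference and the conventional 95% limits of agreement (mean ± 1.96 SD). The log-worth values and the helper `bland_altman()` below are hypothetical, and we have not checked which of these summaries compare() itself draws.

```r
# Hypothetical log-worth estimates for four varieties from two models
overall_lw <- c(VarA = 0.2, VarB = -0.1, VarC = 0.5, VarD = -0.6)
yield_lw   <- c(VarA = 0.1, VarB = 0.0, VarC = 0.5, VarD = -0.6)

bland_altman <- function(a, b) {
  d <- a - b
  list(
    average    = (a + b) / 2,  # x-axis: average log(worth)
    difference = d,            # y-axis: difference (a - b)
    mean_diff  = mean(d),
    limits     = mean(d) + c(-1.96, 1.96) * sd(d)
  )
}

ba <- bland_altman(overall_lw, yield_lw)
plot(ba$average, ba$difference,
     xlab = "Average log(worth)",
     ylab = "Difference (Overall appreciation - Yield)")
abline(h = c(ba$mean_diff, ba$limits), lty = c(1, 2, 2))
```

Varieties whose points sit near 0 on the y-axis are those where both traits agree.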

lilyclements · 2025-02-05T14:29:02Z

Maintainer Author

@rdstern I've been looking at running this vignette in R-Instat, so I have provided a script below which can be run in R-Instat. This first vignette uses only rankings objects (not grouped rankings) in the describe part, but it is a start.

I've added some musings on dialog ideas. Could you try running this in R-Instat, and let me know how you see this fitting into dialogs? (You can ignore my musings if you wish and we can see if we have the same idea! Or you can read them. They're just in comments throughout the script).

# Dialog: Import From Library
utils::data(package="gosset", X=nicabean)

data_book$import_data(data_tables=lapply(X=nicabean, FUN=data.frame))

# Right click menu: Convert Column(s) To Factor
data_book$convert_column_to_type(data_name="trial", col_names="trait", to_type="factor")

# Dialog: Unstack (Pivot Wider)
trial <- data_book$get_data_frame(data_name="trial")
trial_unstacked <- tidyr::pivot_wider(data=trial, names_from=trait, values_from=rank)
data_book$import_data(data_tables=list(trial_unstacked=trial_unstacked))

rm(list=c("trial_unstacked", "trial"))

# Column selection subdialog: Created new column selection
data_book$add_column_selection(data_name="trial_unstacked", name="all_trait_vars", column_selection=list(C0=list(operation="base::match", parameters=list(x=c("Vigor","Architecture","ResistanceToPests","ResistanceToDiseases","ToleranceToDrought","Yield","Marketability","Taste","OverallAppreciation")))), and_or="|")

# Create Rankings Object:
# This is then used in other dialogs throughout. I see this as a bit like the "Create Survival Object" for survival data
# So we have a dialog where you create the rankings (or grouped_rankings) object.
trial_unstacked <- data_book$get_data_frame("trial_unstacked")
traits <- data_book$get_column_selection(data_name = "trial_unstacked", name = "all_trait_vars") #get_object (all_trial_vars)
traits <- traits$conditions$C0$parameters$x
trial_unstacked <- trial_unstacked %>% tidyr::pivot_longer(cols = all_of(traits), names_to = "trait", values_to = "rank")

rankings_list <- traits %>%
  purrr::map(~ {
    trial_unstacked %>%
      dplyr::filter(trait == .x) %>%
      gosset::rank_numeric(data = ., 
                           items = "item", 
                           input = "rank", 
                           id = "id", 
                           ascending = TRUE)
  })
names(rankings_list) <- traits


# For other parts of the prepare:
# The "trial" data is of course clean, as it is one of their data sets. I have asked Kaue about the data used in the training (at the very least, it would be nice to know what sort of data the participants come with, so that we can clean it!)


# DESCRIBE =========================================================================

# Correlation ----------------------------------------------------------------------

# We pick one of the levels in our traits to be the baseline. 

# 1. The Kendall correlation between overall appreciation (baseline level) and the other traits in the trial. 
baseline <- "OverallAppreciation"    # set OverallAppreciation variable as the baseline variable
baseline_trait <- rankings_list[["OverallAppreciation"]]
kendall_rankings <- rankings_list %>%
  purrr::keep(names(.) != baseline) %>%
  purrr::map_dfr(~ gosset::kendallTau(x = .x, y = baseline_trait), .id = "trait")
kendall_rankings 
# "The Kendall correlation indicates that farmers prioritized the traits yield, taste, and marketability when assessing overall appreciation."

# 2. Distances and the distribution of the Kendall correlation coefficients.
# For that we use the function kendallTau_bootstrap(), which resamples the data using a
# bootstrapping approach to draw a uniform distribution in the data.
kendall <- rankings_list %>%
  keep(names(.) != baseline) %>%
  purrr::map_dfr(~ gosset::kendallTau_bootstrap(x = .x, y = rankings_list[[baseline]], nboot = 50, seed = 1206), .id = "trait"
  ) %>%
  pivot_longer(cols = everything(), names_to = "trait", values_to = "kendallTau") 
# not sure how useful visualisation in a table is, but here you go:
kendall

# we can visualise in a plot, which is much more useful presumably!
ggplot(kendall, aes(y = trait, x = kendallTau)) +
  geom_boxplot() +
  labs(y = "", x = "Correlation with the 'Overall appreciation'") +
  theme_minimal()

# Adding into R-Instat:
# Amend the Correlations dialog to have a third tab? This gives Correlation of a rankings object.
# Selector takes ranking objects only
# Then another selector to give the variable to compare against ("Overall") - in this case, the OverallAppreciation variable.
# We run `kendallTau` and return a table
# You can get the kendallTau_bootstrap graphic if you click a checkbox for this display
# In the package, there is also kendallTau_permute which isn't covered in the vignette. I get an error with this stating that it is not supported on Windows. I will check with Kaue at a later date on this, but I assume this is not used.

# In terms of fitting this with the current Correlations dialog, we can reuse some of the options under "Display Options", except that we can't have the method (Rearrange checkbox), and "Display on Diagonal" doesn't make sense since there's no diagonal here.

corrr::fashion(kendall_rankings, decimals = 2, leading_zeros = FALSE, na_print = "")

# I suggest we have additional options, like to remove the p-values by default
kendall_rankings %>% dplyr::select(-c(`Pr(>|z|)`, `Zvalue`))
# Maybe remove N_effective too? "Effective N, which is the equivalent N needed if all items were compared to all items"

# For handling missing values: we can still have the Missing Options here and just read that setting into `gosset::kendallTau`.
# However, I'm not sure whether a rankings object can contain missing values! I should talk to Kaue and explore that a bit more.

# MODELLING? =============================================================
# I don't know whether we want this in modelling or in describe: we use a model to help summarise, but only look at its descriptives, not at the model itself. On reflection, this is like how we use our ANOVA for descriptives, even though a model is created "behind the scenes". So I think this is a descriptive!

# "Performance of varieties across traits"
# This is a function to identify (visualise) variety performance across multiple traits
# The values represented in a worth map are log-worth estimates.

# We fit a model to each of our traits, giving a list of models
# So we can see which Varieties are affecting that Trait 
mod <- rankings_list %>% purrr::map(PlackettLuce::PlackettLuce)

# We can then visualise how much each variety affects that trait's ranking (1-2-3). If a variety tends to be ranked worse, it is a darker brown colour; if it tends to be ranked better (closer to 1), it is a bluer colour.
gosset::worth_map(mod,
                  labels = traits,
                  labels.order = rev(traits)) +
  labs(x = "Variety",
       y = "Trait")

# e.g., if we calculate percentages and look at the `trial_by_trait_item_rank` data frame, just for `OverallAppreciation`, we can see that SX is often ranked 1 out of its three rankings (and hence is dark blue in the plot), and INT F is ranked 1 the fewest times (and hence is dark brown).

#data_book$calculate_summary(data_name="trial", store_results=TRUE, return_output=FALSE, factors=c("trait","item","rank"), drop=FALSE, j=1, summaries=c("summary_count_all"), silent=TRUE, percentage_type="factors", perc_total_factors=c("item","trait"))


# MODELLING ==================================================================
# Kaue mentioned to me they look at four models in the training:

# a) yield ~ 1
# b) yield ~ demographics
# c) yield ~ climatology
# d) yield ~ demographics + climatology

# He illustrates with (c) in the vignette to get CHIRPS data.
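If a dialog needs to build Kaue's four model specifications, one option is to construct the formulas from the selected covariates with base R's reformulate(). The covariate names below (age, gender, Rtotal, SDII) are hypothetical placeholders, and the response G stands for the grouped rankings object used with pltree() in the vignette.

```r
# Hypothetical covariate names; G = grouped rankings (as in the vignette)
demographics <- c("age", "gender")
climatology  <- c("Rtotal", "SDII")

model_formulas <- list(
  a = reformulate("1", response = "G"),                        # yield ~ 1
  b = reformulate(demographics, response = "G"),               # ~ demographics
  c = reformulate(climatology, response = "G"),                # ~ climatology
  d = reformulate(c(demographics, climatology), response = "G") # ~ both
)
model_formulas
# Each could then be passed to, e.g., pltree(model_formulas$c, data = pldG)
```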

I'll outline his code to get the CHIRPS data, but I suggest not running it since it takes a long time. Instead, I've attached the CHIRPS data to download; just import that instead of running the R code.

They do not do any cleaning of the CHIRPS data, but I suggest we will need to if we want to calculate those rainfall indices our way.
Kaue mentioned that the indices they use are "Total Rainfall", "SDII", and/or "Rx5day" ("Max. 5 Days Precipitation"), so we want to be able to calculate these indices. As @rdstern suggested, these can be new summaries we offer in R-Instat.

Indices we want:

Rx5day: maximum 5-day precipitation (mm)
Rtotal: total precipitation (mm) in wet days, rain >= 1 (mm)
SDII: simple daily intensity index, total precipitation divided by the number of wet days (mm/day)
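A minimal base-R sketch of these three indices for one plot's vector of daily rainfall, following the definitions above (wet day = rain >= 1 mm; Rx5day taken as the maximum total over any 5 consecutive days). The function name `rain_indices()` is hypothetical, and details such as how missing values are handled (this sketch does not handle NA) would need to match whatever R-Instat summaries we build.

```r
# Rx5day, Rtotal and SDII for a vector of daily rainfall (mm)
rain_indices <- function(rain, wet = 1) {
  n <- length(rain)
  # Totals over every window of 5 consecutive days (NA if fewer than 5 days)
  five_day <- if (n >= 5) {
    sapply(seq_len(n - 4), function(i) sum(rain[i:(i + 4)]))
  } else {
    NA_real_
  }
  wet_days <- rain[rain >= wet]
  rtotal <- sum(wet_days)
  list(
    Rx5day = max(five_day),
    Rtotal = rtotal,
    SDII   = if (length(wet_days) > 0) rtotal / length(wet_days) else NA_real_
  )
}

idx <- rain_indices(c(0, 2, 10, 0.5, 3, 4, 0, 1))
str(idx)
```

Note that Rx5day uses all rainfall, while Rtotal and SDII only count wet days.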

This is discussed in detail here

###############################################################################
## The Vignettes approach to download it (commented out)
## 1. Getting the Data frame (File) - Note this takes a while to run!
#covar <- data_book$get_data_frame("covar")#
#
#dates = c(min(covar[, "planting_date"]),
#          max(covar[, "planting_date"]) + 70)
#
#chirps = get_chirps(covar[, c("longitude","latitude")], 
#                    dates = as.character(dates),
#                    as.matrix = TRUE,
#                    server = "ClimateSERV")
## rename the date variable
#newnames = dimnames(chirps)[[2]]
#newnames = gsub("chirps-v2.0.", "", newnames)
#newnames = gsub("[.]", "-", newnames)
#dimnames(chirps)[[2]] = newnames
#
# We currently do import from CHIRPS, but I don't think we have the flexibility to do what they're doing here, i.e., import for different longitudes and latitudes, and for different planting dates, read from a data set. We should think about how to do that!
# Then they compute the rainfall indices for the period from the planting date to the first 45 days of plant growth using the rainfall() function from the climatrends package with this commented out line:
#
#rain <- climatrends::rainfall(chirps, day.one = covar$planting_date, span = 45)
###############################################################################
###############################################################################

###############################################################################
## Our Code: Data Cleaning (Do we do cleaning automatically when CHIRPS is imported into R-Instat? Or run through dlgs?)
# Dialog: Import CHIRPS data set. Amend the path to your path
Sheet1 <- rio::import(file="C:/Users/lclem/OneDrive/Documents/GitHub/outfillingR/chirps.xlsx", guess_max=Inf, which=1)
data_book$import_data(data_tables=list(Sheet1=Sheet1))
rm(Sheet1)
#
# Dialog: Calculations. Here we're adding in the ID column from the covariate sheet in.
Sheet1 <- data_book$get_data_frame(data_name="Sheet1", use_current_filter=FALSE)
attach(what=Sheet1)
id <- data_book$get_data_frame("covar")$id
data_book$add_columns_to_data(data_name="Sheet1", col_name="id", col_data=id, before=TRUE)
detach(name=Sheet1, unload=TRUE)
data_book$append_to_variables_metadata(data_name="Sheet1", col_names="id", property="labels", new_val="")
rm(list=c("id", "Sheet1"))

# Dialog: Stack (Pivot Longer) - Stacking the CHIRPS data
Sheet1 <- data_book$get_data_frame(data_name="Sheet1")
# Stack the daily rainfall columns (one per day, named like X2015.09.10, from X2015.09.10 to X2017.04.12) into long format
chirps_stacked <- tidyr::pivot_longer(data=Sheet1, cols=tidyselect::matches("^X[0-9]{4}\\.[0-9]{2}\\.[0-9]{2}$"), names_to="date",  values_to="rain", names_prefix="X")
data_book$import_data(data_tables=list(chirps_stacked=chirps_stacked))

rm(list=c("chirps_stacked", "Sheet1"))

# Dialog: Find/Replace
date <- data_book$get_columns_from_data(data_name="chirps_stacked", col_names="date", use_current_filter=FALSE)
date <- stringr::str_replace_all(string=date, stringr::coll(".", FALSE), replacement="/")
data_book$add_columns_to_data(data_name="chirps_stacked", col_name="date", col_data=date, before=FALSE, adjacent_column="date")
rm(date)

# Dialog: Make Date
date <- data_book$get_columns_from_data(data_name="chirps_stacked", col_names="date", use_current_filter=FALSE)
date <- as.Date(as.character(date), format="%Y/%m/%d")
data_book$add_columns_to_data(data_name="chirps_stacked", col_name="date", col_data=date, before=FALSE, adjacent_column="date")
rm(date)

# End of cleaning! So now, we want to be able to find these indices: Rtotal, SDII, and Max 5 Days Precipitation. This is discussed in an issue here - https://github.com/IDEMSInternational/R-Instat/issues/9426
# That might result in me updating the R code here on how to do this in R-Instat, or otherwise, we need to add this into R-Instat.
# For now, I will just use total rainfall for testing the rest of this vignette out.

######################################################################################

# I still need to go through the rest of this in R-Instat. But, at a glance, I assume grouped_rankings would be an option in the "Create Rankings Object" dialog, and that this modelling is for model (c), which we would have in a modelling dialog. I cannot see examples of modelling (a), (b), or (d). I'm sure we can work them out! But it would be useful to have examples from Kaue so we can dig into that properly.

# To link the rankings to covariates, they must be coerced into a ‘grouped_rankings’ object.
# This is done using the group() function from the PlackettLuce package.
# For this example, we retain only the rankings corresponding to yield.
G <- group(rankings_list[["Yield"]], index = 1:length(rankings_list[["Yield"]]))

# combine rankings and rainfall data to one df
pldG <- cbind(G, rain)

# 2. Now we have our df, pldG. We model this:
tree_model <- pltree(G ~ Rtotal, data = pldG, alpha = 0.1)
summary(tree_model)
node_labels(tree_model)
node_rules(tree_model)
top_items(tree_model, top = 3)
plot(tree_model, ref = "Amadeus 77")

# Reliability of superior varieties
# We can compute the reliability estimates of the evaluated common bean varieties in
# each of the resulting nodes of the Plackett-Luce tree.
# This helps identify varieties with a higher probability of outperforming a variety
# check (Amadeus 77). For simplicity, we present only the varieties with a
# reliability score >= 0.5
reliability(tree_model, ref = "Amadeus 77")

# Going beyond yield
# A more comprehensive approach to assessing the performance of varieties involves
# using “overall appreciation,” as this trait is expected to capture the performance
# of a variety not only for yield but also for all other traits prioritised by farmers.
# To support this hypothesis, we use the compare() function, which applies the
# method proposed by Bland and Altman (1986) [16] to assess the agreement between two
# different measures.
# Here, we compare overall appreciation and yield. If both measures completely agree,
# all the varieties should be centered at 0 on the Y-axis.
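The Find/Replace and Make Date steps in the script could probably be collapsed into a single conversion, since as.Date() can read the dotted format directly with format "%Y.%m.%d". A minimal base-R sketch on hypothetical values of the form found in the CHIRPS column names:

```r
# Hypothetical column names as imported from the CHIRPS spreadsheet
cols <- c("X2015.09.10", "X2016.02.29", "X2017.04.12")

# Drop the "X" prefix (as names_prefix = "X" does) and parse the dotted dates
dates <- as.Date(sub("^X", "", cols), format = "%Y.%m.%d")
dates
```

If the prefix has already been removed by pivot_longer(), sub() simply leaves the strings unchanged.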


lilyclements Feb 6, 2025
Maintainer Author

This is following the structure given in discussion #9348
