calculateSRDValues, calculateSRDDistribution, utilsRankingMatrix,
and calculateCrossValidation now stop with an informative error message
when the input data frame contains NA values, non-numeric columns, fewer
than 2 columns, or fewer than 2 rows. Previously, NA values could cause
an infinite loop in the C++ layer.
calculateCrossValidation now stops with an informative error message
when number_of_folds is 0 or 1. Previously, these values caused an
immediate R session crash.
calculateSRDValues, calculateSRDDistribution, utilsRankingMatrix,
and calculateCrossValidation now stop with an informative error message
when the input data frame contains one or more constant columns.
Constant columns carry no information for SRD analysis and previously
caused undefined behaviour in the C++ layer.
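The data frame checks described in the entries above can be sketched in base R as follows. This is only an illustration: check_srd_input() is a hypothetical helper, not part of rSRD, and the package's actual checks and error messages may differ.

```r
# Hypothetical helper (not an rSRD function): rejects the kinds of input
# that the entries above say now raise an error.
check_srd_input <- function(df) {
  if (!is.data.frame(df)) stop("input must be a data frame")
  if (ncol(df) < 2) stop("input must have at least 2 columns")
  if (nrow(df) < 2) stop("input must have at least 2 rows")

  # Every column must be numeric.
  is_num <- vapply(df, is.numeric, logical(1))
  if (!all(is_num)) {
    stop("non-numeric column(s): ", paste(names(df)[!is_num], collapse = ", "))
  }

  # No missing values anywhere.
  if (any(vapply(df, anyNA, logical(1)))) stop("input contains NA values")

  # A constant column has exactly one distinct value.
  is_const <- vapply(df, function(col) length(unique(col)) == 1L, logical(1))
  if (any(is_const)) {
    stop("constant column(s): ", paste(names(df)[is_const], collapse = ", "))
  }

  invisible(TRUE)
}

# Example data invented for illustration.
good <- data.frame(a = c(1, 2, 3), b = c(3, 1, 2))
check_srd_input(good)

bad <- data.frame(a = c(1, 2, 3), b = c(5, 5, 5))
result <- try(check_srd_input(bad), silent = TRUE)
inherits(result, "try-error")
```

Failing early in R, before any data reaches compiled code, is what turns an infinite loop or undefined behaviour into a readable error message.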
plotCrossValidation no longer row-binds the precomputed summary
statistics (min, xx1, Q1, median, Q3, xx19, max) with the fold-wise SRD
values into a single sample from which ggplot recomputed quartiles and
whiskers. Previously, this mixture of real and
derived values distorted the boxplot, especially when number_of_folds
was small. The box geometry (min, Q1, median, Q3, max) is now taken
directly from boxplot_values, and the mean is computed directly from
the fold-wise SRD values; neither is derived from a contaminated sample
(see also the related ordering simplification under Improvements).
calculateSRDDistribution() and calculateCrossValidation() gain a new
seed parameter for reproducible results. When seed = NULL (the
default), the original stochastic behaviour is preserved, ensuring full
backward compatibility.
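The seed = NULL convention can be illustrated with a small base R sketch. permute_ranks() is a hypothetical stand-in, not an rSRD function, and the package may handle the seed differently internally (for example in the C++ layer); only the calling convention is shown.

```r
# Hypothetical function illustrating an optional seed argument.
permute_ranks <- function(x, seed = NULL) {
  # The random number generator is only reset when a seed is supplied,
  # so the default call behaves exactly as before.
  if (!is.null(seed)) set.seed(seed)
  sample(x)
}

a <- permute_ranks(1:10, seed = 42)
b <- permute_ranks(1:10, seed = 42)
identical(a, b)  # TRUE: same seed, same permutation

permute_ranks(1:10)  # no seed: result varies between calls
```

With the package itself, the analogous step would be to pass a seed value to calculateSRDDistribution() or calculateCrossValidation().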
Three example datasets are now bundled with the package and accessible
via system.file("extdata", "<filename>", package = "rSRD").
The following three files use a semicolon separator and include a header row
(read.csv(..., header = TRUE, sep = ";")).
mep_profiles.csv: voting profiles of Members of the European Parliament.
bundesliga20_21.csv: team performance data from the 2020/21 Bundesliga season.
movies1994.csv: ratings and rankings of films released in 1994.
Added a vignette, "Getting Started with rSRD", covering the full
analysis workflow: data preprocessing, computing SRD values, the
permutation test for significance, cross-validation, and pairwise
visualisation via heatmaps. Access it with
vignette("rSRD-introduction", package = "rSRD").
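As a minimal sketch, one of the bundled datasets mentioned above could be loaded as follows. It assumes an installed version of rSRD that ships the files; the guard is there because system.file() returns an empty string when the package or file cannot be found.

```r
# Locate the bundled file; "" is returned if the package or file is missing.
path <- system.file("extdata", "movies1994.csv", package = "rSRD")

if (nzchar(path)) {
  # Semicolon-separated with a header row, as stated in the entry above.
  movies <- read.csv(path, header = TRUE, sep = ";")
  str(movies)
} else {
  message("rSRD with bundled datasets does not appear to be installed")
}
```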
plotCrossValidation no longer re-derives a column ordering from
boxplot_values. The columns of SRD_values_of_different_folds already
arrive ordered by median, Q1, Q3, min, and max, since this ordering is
performed in the C++ layer (Cross_Validation::Wilcoxon/Alpaydin/
Dietterich) and preserved unchanged by calculateCrossValidation(),
which only attaches solution names to the already-ordered columns. The
redundant re-ordering step has been removed, simplifying the function and
removing its dependency on boxplot_values for this purpose (plotCrossValidation
still uses boxplot_values directly for box geometry; see Bug fixes).
ggplot2::aes_string() calls in plotPermTest replaced with
ggplot2::aes() using the .data pronoun.
plotPermTest now places solution labels at the interpolated y-value of
the distribution curve at each solution's SRD value, replacing a random
placement that changed on every call. The x- and y-axes now carry
informative labels ("Normalised SRD value" and "Relative frequency" or
"Cumulative relative frequency" depending on the densityToDistr argument),
replacing auto-generated labels that exposed internal implementation
details.
import() directives for dplyr, ggplot2, tibble, janitor,
rlang, and stringr replaced with specific importFrom() calls.
stringr removed as an unused dependency.
utilsColorPalette is now generated via grDevices::colorRampPalette()
instead of a 250-element hard-coded vector.
?rSRD.
%>% pipe operators replaced with the native R pipe |>.
Unit tests using testthat (edition 3), covering
calculateSRDValues(), calculateCrossValidation(), utilsMaxSRD(),
utilsTieProbability(), utilsCalculateRank(), utilsCreateReference(),
utilsPreprocessDF() and many other functions.
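The interpolated label placement described for plotPermTest above can be sketched with base R's approx(). The data and variable names here are invented for illustration, and the package may compute the positions differently; the point is that a label's y-position is read off the curve at the solution's x-value, so it is the same on every call.

```r
# Illustrative curve sampled on a grid (not real rSRD output).
curve_x <- c(0, 10, 20, 30, 40)
curve_y <- c(0, 2, 6, 2, 0)

# Hypothetical SRD values of two solutions to be labelled.
solution_srd <- c(A = 5, B = 25)

# Linear interpolation of the curve at each solution's x-value.
label_y <- approx(curve_x, curve_y, xout = solution_srd)$y
label_y  # 1 and 4
```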