vignettes/cmstatr_Validation.Rmd
cmstatr_Validation.Rmd
This vignette is intended to contain the same validation that is
included in the test suite within the cmstatr
package, but
in a format that is easier for a human to read. The intent is that this
vignette will include only those validations that are included in the
test suite, but that the test suite may include more tests than are
shown in this vignette.
The following packages will be used in this validation. The version of each package used is listed at the end of this vignette.
Throughout this vignette, the testthat
package will be
used. Expressions such as expect_equal
are used to ensure
that the two values are equal (within some tolerance). If this
expectation is not true, the vignette will fail to build. The tolerance
is a relative tolerance: a tolerance of 0.01 means that the two values
must be within
of each other. As an example, the following expression checks that the
value 10
is equal to 10.1
within a tolerance
of 0.01
. Such an expectation should be satisfied.
expect_equal(10, 10.1, tolerance = 0.01)
The basis_...
functions automatically perform certain
diagnostic tests. When those diagnostic tests are not relevant to the
validation, the diagnostic tests are overridden by passing the argument
override = "all"
.
The following table provides a cross-reference between the various
functions of the cmstatr
package and the tests shown within
this vignette. The sections in this vignette are organized by data set.
Not all checks are performed on all data sets.
carbon.fabric
Data Set
This data set is example data that is provided with
cmstatr
. The first few rows of this data are shown
below.
head(carbon.fabric)
#> id test condition batch strength
#> 1 WT-RTD-1-1 WT RTD 1 137.4438
#> 2 WT-RTD-1-2 WT RTD 1 139.5395
#> 3 WT-RTD-1-3 WT RTD 1 150.8900
#> 4 WT-RTD-1-4 WT RTD 1 141.4474
#> 5 WT-RTD-1-5 WT RTD 1 141.8203
#> 6 WT-RTD-1-6 WT RTD 1 151.8821
This data was entered into ASAP 2008 [1] and the reported Anderson–Darling k–Sample test statistics were recorded, as were the conclusions.
The value of the test statistic reported by cmstatr
and
that reported by ASAP 2008 differ by a factor of
,
as do the critical values used. As such, the conclusion of the tests are
identical. This is described in more detail in the Anderson–Darling k–Sample Vignette.
When the RTD warp-tension data from this data set is entered into
ASAP 2008, it reports a test statistic of 0.456 and fails to reject the
null hypothesis that the batches are drawn from the same distribution.
Adjusting for the different definition of the test statistic, the
results given by cmstatr
are very similar.
res <- carbon.fabric %>%
filter(test == "WT") %>%
filter(condition == "RTD") %>%
ad_ksample(strength, batch)
expect_equal(res$ad / (res$k - 1), 0.456, tolerance = 0.002)
expect_false(res$reject_same_dist)
res
#>
#> Call:
#> ad_ksample(data = ., x = strength, groups = batch)
#>
#> N = 18 k = 3
#> ADK = 0.912 p-value = 0.95989
#> Conclusion: Samples come from the same distribution ( alpha = 0.025 )
When the ETW warp-tension data from this data set are entered into
ASAP 2008, the reported test statistic is 1.604 and it fails to reject
the null hypothesis that the batches are drawn from the same
distribution. Adjusting for the different definition of the test
statistic, cmstatr
gives nearly identical results.
res <- carbon.fabric %>%
filter(test == "WT") %>%
filter(condition == "ETW") %>%
ad_ksample(strength, batch)
expect_equal(res$ad / (res$k - 1), 1.604, tolerance = 0.002)
expect_false(res$reject_same_dist)
res
#>
#> Call:
#> ad_ksample(data = ., x = strength, groups = batch)
#>
#> N = 18 k = 3
#> ADK = 3.21 p-value = 0.10208
#> Conclusion: Samples come from the same distribution ( alpha = 0.025 )
CMH-17-1G [2] provides an example data set and results from ASAP [1] and STAT17 [3]. This example data set is duplicated below:
dat_8_3_11_1_1 <- tribble(
~batch, ~strength, ~condition,
1, 118.3774604, "CTD", 1, 84.9581364, "RTD", 1, 83.7436035, "ETD",
1, 123.6035612, "CTD", 1, 92.4891822, "RTD", 1, 84.3831677, "ETD",
1, 115.2238092, "CTD", 1, 96.8212659, "RTD", 1, 94.8030433, "ETD",
1, 112.6379744, "CTD", 1, 109.030325, "RTD", 1, 94.3931537, "ETD",
1, 116.5564277, "CTD", 1, 97.8212659, "RTD", 1, 101.702222, "ETD",
1, 123.1649896, "CTD", 1, 100.921519, "RTD", 1, 86.5372121, "ETD",
2, 128.5589027, "CTD", 1, 103.699444, "RTD", 1, 92.3772684, "ETD",
2, 113.1462103, "CTD", 2, 93.7908212, "RTD", 2, 89.2084024, "ETD",
2, 121.4248107, "CTD", 2, 107.526709, "RTD", 2, 100.686001, "ETD",
2, 134.3241906, "CTD", 2, 94.5769704, "RTD", 2, 81.0444192, "ETD",
2, 129.6405117, "CTD", 2, 93.8831373, "RTD", 2, 91.3398070, "ETD",
2, 117.9818658, "CTD", 2, 98.2296605, "RTD", 2, 93.1441939, "ETD",
3, 115.4505226, "CTD", 2, 111.346590, "RTD", 2, 85.8204168, "ETD",
3, 120.0369467, "CTD", 2, 100.817538, "RTD", 3, 94.8966273, "ETD",
3, 117.1631088, "CTD", 3, 100.382203, "RTD", 3, 95.8068520, "ETD",
3, 112.9302797, "CTD", 3, 91.5037811, "RTD", 3, 86.7842252, "ETD",
3, 117.9114501, "CTD", 3, 100.083233, "RTD", 3, 94.4011973, "ETD",
3, 120.1900159, "CTD", 3, 95.6393615, "RTD", 3, 96.7231171, "ETD",
3, 110.7295966, "CTD", 3, 109.304779, "RTD", 3, 89.9010384, "ETD",
3, 100.078562, "RTD", 3, 99.1205847, "RTD", 3, 89.3672306, "ETD",
1, 106.357525, "ETW", 1, 99.0239966, "ETW2",
1, 105.898733, "ETW", 1, 103.341238, "ETW2",
1, 88.4640082, "ETW", 1, 100.302130, "ETW2",
1, 103.901744, "ETW", 1, 98.4634133, "ETW2",
1, 80.2058219, "ETW", 1, 92.2647280, "ETW2",
1, 109.199597, "ETW", 1, 103.487693, "ETW2",
1, 61.0139431, "ETW", 1, 113.734763, "ETW2",
2, 99.3207107, "ETW", 2, 108.172659, "ETW2",
2, 115.861770, "ETW", 2, 108.426732, "ETW2",
2, 82.6133082, "ETW", 2, 116.260375, "ETW2",
2, 85.3690411, "ETW", 2, 121.049610, "ETW2",
2, 115.801622, "ETW", 2, 111.223082, "ETW2",
2, 44.3217741, "ETW", 2, 104.574843, "ETW2",
2, 117.328077, "ETW", 2, 103.222552, "ETW2",
2, 88.6782903, "ETW", 3, 99.3918538, "ETW2",
3, 107.676986, "ETW", 3, 87.3421658, "ETW2",
3, 108.960241, "ETW", 3, 102.730741, "ETW2",
3, 116.122640, "ETW", 3, 96.3694916, "ETW2",
3, 80.2334815, "ETW", 3, 99.5946088, "ETW2",
3, 106.145570, "ETW", 3, 97.0712407, "ETW2",
3, 104.667866, "ETW",
3, 104.234953, "ETW"
)
dat_8_3_11_1_1
#> # A tibble: 102 × 3
#> batch strength condition
#> <dbl> <dbl> <chr>
#> 1 1 118. CTD
#> 2 1 85.0 RTD
#> 3 1 83.7 ETD
#> 4 1 124. CTD
#> 5 1 92.5 RTD
#> 6 1 84.4 ETD
#> 7 1 115. CTD
#> 8 1 96.8 RTD
#> 9 1 94.8 ETD
#> 10 1 113. CTD
#> # ℹ 92 more rows
CMH-17-1G Table 8.3.11.1.1(a) provides results of the MNR test from
ASAP for this data set. Batches 2 and 3 of the ETW data is considered
here and the results of cmstatr
are compared with those
published in CMH-17-1G.
For Batch 2 of the ETW data, the results match those published in the handbook within a small tolerance. The published test statistic is 2.008.
res <- dat_8_3_11_1_1 %>%
filter(condition == "ETW" & batch == 2) %>%
maximum_normed_residual(strength, alpha = 0.05)
expect_equal(res$mnr, 2.008, tolerance = 0.001)
expect_equal(res$crit, 2.127, tolerance = 0.001)
expect_equal(res$n_outliers, 0)
res
#>
#> Call:
#> maximum_normed_residual(data = ., x = strength, alpha = 0.05)
#>
#> MNR = 2.008274 ( critical value = 2.126645 )
#>
#> No outliers detected ( alpha = 0.05 )
Similarly, for Batch 3 of the ETW data, the results of
cmstatr
match the results published in the handbook within
a small tolerance. The published test statistic is 2.119
res <- dat_8_3_11_1_1 %>%
filter(condition == "ETW" & batch == 3) %>%
maximum_normed_residual(strength, alpha = 0.05)
expect_equal(res$mnr, 2.119, tolerance = 0.001)
expect_equal(res$crit, 2.020, tolerance = 0.001)
expect_equal(res$n_outliers, 1)
res
#>
#> Call:
#> maximum_normed_residual(data = ., x = strength, alpha = 0.05)
#>
#> MNR = 2.119175 ( critical value = 2.019969 )
#>
#> Outliers ( alpha = 0.05 ):
#> Index Value
#> 4 80.23348
For the ETW condition, the ADK test statistic given in [2] is
and the test concludes that the samples come from the same distribution.
Noting that cmstatr
uses the definition of the test
statistic given in [4], so the test
statistic given by cmstatr
differs from that given by ASAP
by a factor of
,
as described in the Anderson–Darling k–Sample
Vignette.
res <- dat_8_3_11_1_1 %>%
filter(condition == "ETW") %>%
ad_ksample(strength, batch)
expect_equal(res$ad / (res$k - 1), 0.793, tolerance = 0.003)
expect_false(res$reject_same_dist)
res
#>
#> Call:
#> ad_ksample(data = ., x = strength, groups = batch)
#>
#> N = 22 k = 3
#> ADK = 1.59 p-value = 0.59491
#> Conclusion: Samples come from the same distribution ( alpha = 0.025 )
Similarly, for the ETW2 condition, the test statistic given in [2] is
and the test concludes that the samples come from different
distributions. This matches cmstatr
res <- dat_8_3_11_1_1 %>%
filter(condition == "ETW2") %>%
ad_ksample(strength, batch)
expect_equal(res$ad / (res$k - 1), 3.024, tolerance = 0.001)
expect_true(res$reject_same_dist)
res
#>
#> Call:
#> ad_ksample(data = ., x = strength, groups = batch)
#>
#> N = 20 k = 3
#> ADK = 6.05 p-value = 0.0042703
#> Conclusion: Samples do not come from the same distribution (alpha = 0.025 )
CMH-17-1G Section 8.3.11.2.1 contains results from STAT17 for the
“observed significance level” from the Anderson–Darling test for various
distributions. In this section, the ETW condition from the present data
set is used. The published results are given in the following table. The
results from cmstatr
are below and are very similar to
those from STAT17.
Distribution | OSL |
---|---|
Normal | 0.006051 |
Lognormal | 0.000307 |
Weibull | 0.219 |
res <- dat_8_3_11_1_1 %>%
filter(condition == "ETW") %>%
anderson_darling_normal(strength)
expect_equal(res$osl, 0.006051, tolerance = 0.001)
res
#>
#> Call:
#> anderson_darling_normal(data = ., x = strength)
#>
#> Distribution: Normal ( n = 22 )
#> Test statistic: A = 1.052184
#> OSL (p-value): 0.006051441 (assuming unknown parameters)
#> Conclusion: Sample is not drawn from a Normal distribution ( alpha = 0.05 )
res <- dat_8_3_11_1_1 %>%
filter(condition == "ETW") %>%
anderson_darling_lognormal(strength)
expect_equal(res$osl, 0.000307, tolerance = 0.001)
res
#>
#> Call:
#> anderson_darling_lognormal(data = ., x = strength)
#>
#> Distribution: Lognormal ( n = 22 )
#> Test statistic: A = 1.568825
#> OSL (p-value): 0.0003073592 (assuming unknown parameters)
#> Conclusion: Sample is not drawn from a Lognormal distribution ( alpha = 0.05 )
res <- dat_8_3_11_1_1 %>%
filter(condition == "ETW") %>%
anderson_darling_weibull(strength)
expect_equal(res$osl, 0.0219, tolerance = 0.002)
res
#>
#> Call:
#> anderson_darling_weibull(data = ., x = strength)
#>
#> Distribution: Weibull ( n = 22 )
#> Test statistic: A = 0.8630665
#> OSL (p-value): 0.02186889 (assuming unknown parameters)
#> Conclusion: Sample is not drawn from a Weibull distribution ( alpha = 0.05 )
CMH-17-1G Section 8.3.11.1.1 provides results from ASAP for Levene’s
test for equality of variance between conditions after the ETW and ETW2
conditions are removed. The handbook shows an F statistic of 0.58,
however if this data is entered into ASAP directly, ASAP gives an F
statistic of 0.058, which matches the result of
cmstatr
.
res <- dat_8_3_11_1_1 %>%
filter(condition != "ETW" & condition != "ETW2") %>%
levene_test(strength, condition)
expect_equal(res$f, 0.058, tolerance = 0.01)
res
#>
#> Call:
#> levene_test(data = ., x = strength, groups = condition)
#>
#> n = 60 k = 3
#> F = 0.05811631 p-value = 0.943596
#> Conclusion: Samples have equal variances ( alpha = 0.05 )
CMH-17-1G Section 8.3.11.2.2 provides output from STAT17. The ETW2
condition from the present data set was analyzed by STAT17 and that
software reported an F statistic of 0.123 from Levene’s test when
comparing the variance of the batches within this condition. The result
from cmstatr
is similar.
res <- dat_8_3_11_1_1 %>%
filter(condition == "ETW2") %>%
levene_test(strength, batch)
expect_equal(res$f, 0.123, tolerance = 0.005)
res
#>
#> Call:
#> levene_test(data = ., x = strength, groups = batch)
#>
#> n = 20 k = 3
#> F = 0.1233937 p-value = 0.8847
#> Conclusion: Samples have equal variances ( alpha = 0.05 )
Similarly, the published value of the F statistic for the CTD
condition is
.
cmstatr
produces very similar results.
res <- dat_8_3_11_1_1 %>%
filter(condition == "CTD") %>%
levene_test(strength, batch)
expect_equal(res$f, 3.850, tolerance = 0.005)
res
#>
#> Call:
#> levene_test(data = ., x = strength, groups = batch)
#>
#> n = 19 k = 3
#> F = 3.852032 p-value = 0.04309008
#> Conclusion: Samples have unequal variance ( alpha = 0.05 )
CMH-17-1G Section 8.3.11.2.1 provides STAT17 outputs for the ETW condition of the present data set. The nonparametric Basis values are listed. In this case, the Hanson–Koopmans method is used. The published A-Basis value is 13.0 and the B-Basis is 37.9.
res <- dat_8_3_11_1_1 %>%
filter(condition == "ETW") %>%
basis_hk_ext(strength, method = "woodward-frawley", p = 0.99, conf = 0.95,
override = "all")
expect_equal(res$basis, 13.0, tolerance = 0.001)
res
#>
#> Call:
#> basis_hk_ext(data = ., x = strength, p = 0.99, conf = 0.95, method = "woodward-frawley",
#> override = "all")
#>
#> Distribution: Nonparametric (Extended Hanson-Koopmans, Woodward-Frawley method) ( n = 22 )
#> The following diagnostic tests were overridden:
#> `outliers_within_batch`,
#> `between_batch_variability`,
#> `outliers`,
#> `correct_method_used`,
#> `sample_size`
#> A-Basis: ( p = 0.99 , conf = 0.95 )
#> 12.99614
res <- dat_8_3_11_1_1 %>%
filter(condition == "ETW") %>%
basis_hk_ext(strength, method = "optimum-order", p = 0.90, conf = 0.95,
override = "all")
expect_equal(res$basis, 37.9, tolerance = 0.001)
res
#>
#> Call:
#> basis_hk_ext(data = ., x = strength, p = 0.9, conf = 0.95, method = "optimum-order",
#> override = "all")
#>
#> Distribution: Nonparametric (Extended Hanson-Koopmans, optimum two-order-statistic method) ( n = 22 )
#> The following diagnostic tests were overridden:
#> `outliers_within_batch`,
#> `between_batch_variability`,
#> `outliers`,
#> `correct_method_used`,
#> `sample_size`
#> B-Basis: ( p = 0.9 , conf = 0.95 )
#> 37.88511
CMH-17-1G Section 8.3.11.2.2 provides output from STAT17 for the ETW2
condition from the present data set. STAT17 reports A- and B-Basis
values based on the ANOVA method of 34.6 and 63.2, respectively. The
results from cmstatr
are similar.
res <- dat_8_3_11_1_1 %>%
filter(condition == "ETW2") %>%
basis_anova(strength, batch, override = "number_of_groups",
p = 0.99, conf = 0.95)
expect_equal(res$basis, 34.6, tolerance = 0.001)
res
#>
#> Call:
#> basis_anova(data = ., x = strength, groups = batch, p = 0.99,
#> conf = 0.95, override = "number_of_groups")
#>
#> Distribution: ANOVA ( n = 20, r = 3 )
#> The following diagnostic tests were overridden:
#> `number_of_groups`
#> A-Basis: ( p = 0.99 , conf = 0.95 )
#> 34.57763
res <- dat_8_3_11_1_1 %>%
filter(condition == "ETW2") %>%
basis_anova(strength, batch, override = "number_of_groups")
expect_equal(res$basis, 63.2, tolerance = 0.001)
res
#>
#> Call:
#> basis_anova(data = ., x = strength, groups = batch, override = "number_of_groups")
#>
#> Distribution: ANOVA ( n = 20, r = 3 )
#> The following diagnostic tests were overridden:
#> `number_of_groups`
#> B-Basis: ( p = 0.9 , conf = 0.95 )
#> 63.20276
[2] provides an example data set and results from ASAP [1]. This example data set is duplicated below:
dat_8_3_11_1_2 <- tribble(
~batch, ~strength, ~condition,
1, 79.04517, "CTD", 1, 103.2006, "RTD", 1, 63.22764, "ETW", 1, 54.09806, "ETW2",
1, 102.6014, "CTD", 1, 105.1034, "RTD", 1, 70.84454, "ETW", 1, 58.87615, "ETW2",
1, 97.79372, "CTD", 1, 105.1893, "RTD", 1, 66.43223, "ETW", 1, 61.60167, "ETW2",
1, 92.86423, "CTD", 1, 100.4189, "RTD", 1, 75.37771, "ETW", 1, 60.23973, "ETW2",
1, 117.218, "CTD", 2, 85.32319, "RTD", 1, 72.43773, "ETW", 1, 61.4808, "ETW2",
1, 108.7168, "CTD", 2, 92.69923, "RTD", 1, 68.43073, "ETW", 1, 64.55832, "ETW2",
1, 112.2773, "CTD", 2, 98.45242, "RTD", 1, 69.72524, "ETW", 2, 57.76131, "ETW2",
1, 114.0129, "CTD", 2, 104.1014, "RTD", 2, 66.20343, "ETW", 2, 49.91463, "ETW2",
2, 106.8452, "CTD", 2, 91.51841, "RTD", 2, 60.51251, "ETW", 2, 61.49271, "ETW2",
2, 112.3911, "CTD", 2, 101.3746, "RTD", 2, 65.69334, "ETW", 2, 57.7281, "ETW2",
2, 115.5658, "CTD", 2, 101.5828, "RTD", 2, 62.73595, "ETW", 2, 62.11653, "ETW2",
2, 87.40657, "CTD", 2, 99.57384, "RTD", 2, 59.00798, "ETW", 2, 62.69353, "ETW2",
2, 102.2785, "CTD", 2, 88.84826, "RTD", 2, 62.37761, "ETW", 3, 61.38523, "ETW2",
2, 110.6073, "CTD", 3, 92.18703, "RTD", 3, 64.3947, "ETW", 3, 60.39053, "ETW2",
3, 105.2762, "CTD", 3, 101.8234, "RTD", 3, 72.8491, "ETW", 3, 59.17616, "ETW2",
3, 110.8924, "CTD", 3, 97.68909, "RTD", 3, 66.56226, "ETW", 3, 60.17616, "ETW2",
3, 108.7638, "CTD", 3, 101.5172, "RTD", 3, 66.56779, "ETW", 3, 46.47396, "ETW2",
3, 110.9833, "CTD", 3, 100.0481, "RTD", 3, 66.00123, "ETW", 3, 51.16616, "ETW2",
3, 101.3417, "CTD", 3, 102.0544, "RTD", 3, 59.62108, "ETW",
3, 100.0251, "CTD", 3, 60.61167, "ETW",
3, 57.65487, "ETW",
3, 66.51241, "ETW",
3, 64.89347, "ETW",
3, 57.73054, "ETW",
3, 68.94086, "ETW",
3, 61.63177, "ETW"
)
CMH-17-1G Table 8.3.11.2(k) provides outputs from ASAP for the data
set above. ASAP uses the pooled SD method. ASAP produces the following
results, which are quite similar to those produced by
cmstatr
.
Condition | CTD | RTD | ETW | ETW2 |
---|---|---|---|---|
B-Basis | 93.64 | 87.30 | 54.33 | 47.12 |
A-Basis | 89.19 | 79.86 | 46.84 | 39.69 |
res <- basis_pooled_sd(dat_8_3_11_1_2, strength, condition,
override = "all")
expect_equal(res$basis$value[res$basis$group == "CTD"],
93.64, tolerance = 0.001)
expect_equal(res$basis$value[res$basis$group == "RTD"],
87.30, tolerance = 0.001)
expect_equal(res$basis$value[res$basis$group == "ETW"],
54.33, tolerance = 0.001)
expect_equal(res$basis$value[res$basis$group == "ETW2"],
47.12, tolerance = 0.001)
res
#>
#> Call:
#> basis_pooled_sd(data = dat_8_3_11_1_2, x = strength, groups = condition,
#> override = "all")
#>
#> Distribution: Normal - Pooled Standard Deviation ( n = 83, r = 4 )
#> The following diagnostic tests were overridden:
#> `outliers_within_batch`,
#> `between_group_variability`,
#> `outliers_within_group`,
#> `pooled_data_normal`,
#> `pooled_variance_equal`
#> B-Basis: ( p = 0.9 , conf = 0.95 )
#> CTD 93.63504
#> ETW 54.32706
#> ETW2 47.07669
#> RTD 87.29555
res <- basis_pooled_sd(dat_8_3_11_1_2, strength, condition,
p = 0.99, conf = 0.95,
override = "all")
expect_equal(res$basis$value[res$basis$group == "CTD"],
86.19, tolerance = 0.001)
expect_equal(res$basis$value[res$basis$group == "RTD"],
79.86, tolerance = 0.001)
expect_equal(res$basis$value[res$basis$group == "ETW"],
46.84, tolerance = 0.001)
expect_equal(res$basis$value[res$basis$group == "ETW2"],
39.69, tolerance = 0.001)
res
#>
#> Call:
#> basis_pooled_sd(data = dat_8_3_11_1_2, x = strength, groups = condition,
#> p = 0.99, conf = 0.95, override = "all")
#>
#> Distribution: Normal - Pooled Standard Deviation ( n = 83, r = 4 )
#> The following diagnostic tests were overridden:
#> `outliers_within_batch`,
#> `between_group_variability`,
#> `outliers_within_group`,
#> `pooled_data_normal`,
#> `pooled_variance_equal`
#> A-Basis: ( p = 0.99 , conf = 0.95 )
#> CTD 86.19301
#> ETW 46.84112
#> ETW2 39.65214
#> RTD 79.86205
After removal of the ETW2 condition, CMH17-STATS reports the pooled
A- and B-Basis (mod CV) shown in the following table.
cmstatr
computes very similar values.
Condition | CTD | RTD | ETW |
---|---|---|---|
B-Basis | 92.25 | 85.91 | 52.97 |
A-Basis | 83.81 | 77.48 | 44.47 |
res <- dat_8_3_11_1_2 %>%
filter(condition != "ETW2") %>%
basis_pooled_sd(strength, condition, modcv = TRUE, override = "all")
expect_equal(res$basis$value[res$basis$group == "CTD"],
92.25, tolerance = 0.001)
expect_equal(res$basis$value[res$basis$group == "RTD"],
85.91, tolerance = 0.001)
expect_equal(res$basis$value[res$basis$group == "ETW"],
52.97, tolerance = 0.001)
res
#>
#> Call:
#> basis_pooled_sd(data = ., x = strength, groups = condition, modcv = TRUE,
#> override = "all")
#>
#> Distribution: Normal - Pooled Standard Deviation ( n = 65, r = 3 )
#> Modified CV Approach Used
#> The following diagnostic tests were overridden:
#> `outliers_within_batch`,
#> `between_group_variability`,
#> `outliers_within_group`,
#> `pooled_data_normal`,
#> `pooled_variance_equal`
#> B-Basis: ( p = 0.9 , conf = 0.95 )
#> CTD 92.24927
#> ETW 52.9649
#> RTD 85.90454
res <- dat_8_3_11_1_2 %>%
filter(condition != "ETW2") %>%
basis_pooled_sd(strength, condition,
p = 0.99, conf = 0.95, modcv = TRUE, override = "all")
expect_equal(res$basis$value[res$basis$group == "CTD"],
83.81, tolerance = 0.001)
expect_equal(res$basis$value[res$basis$group == "RTD"],
77.48, tolerance = 0.001)
expect_equal(res$basis$value[res$basis$group == "ETW"],
44.47, tolerance = 0.001)
res
#>
#> Call:
#> basis_pooled_sd(data = ., x = strength, groups = condition, p = 0.99,
#> conf = 0.95, modcv = TRUE, override = "all")
#>
#> Distribution: Normal - Pooled Standard Deviation ( n = 65, r = 3 )
#> Modified CV Approach Used
#> The following diagnostic tests were overridden:
#> `outliers_within_batch`,
#> `between_group_variability`,
#> `outliers_within_group`,
#> `pooled_data_normal`,
#> `pooled_variance_equal`
#> A-Basis: ( p = 0.99 , conf = 0.95 )
#> CTD 83.80981
#> ETW 44.47038
#> RTD 77.47589
This data set was input into CMH17-STATS and the Pooled CV method was
selected. The results from CMH17-STATS were as follows.
cmstatr
produces very similar results.
Condition | CTD | RTD | ETW | ETW2 |
---|---|---|---|---|
B-Basis | 90.89 | 85.37 | 56.79 | 50.55 |
A-Basis | 81.62 | 76.67 | 50.98 | 45.40 |
res <- basis_pooled_cv(dat_8_3_11_1_2, strength, condition, override = "all")
expect_equal(res$basis$value[res$basis$group == "CTD"],
90.89, tolerance = 0.001)
expect_equal(res$basis$value[res$basis$group == "RTD"],
85.37, tolerance = 0.001)
expect_equal(res$basis$value[res$basis$group == "ETW"],
56.79, tolerance = 0.001)
expect_equal(res$basis$value[res$basis$group == "ETW2"],
50.55, tolerance = 0.001)
res
#>
#> Call:
#> basis_pooled_cv(data = dat_8_3_11_1_2, x = strength, groups = condition,
#> override = "all")
#>
#> Distribution: Normal - Pooled CV ( n = 83, r = 4 )
#> The following diagnostic tests were overridden:
#> `outliers_within_batch`,
#> `between_group_variability`,
#> `outliers_within_group`,
#> `pooled_data_normal`,
#> `normalized_variance_equal`
#> B-Basis: ( p = 0.9 , conf = 0.95 )
#> CTD 90.88018
#> ETW 56.78337
#> ETW2 50.54406
#> RTD 85.36756
res <- basis_pooled_cv(dat_8_3_11_1_2, strength, condition,
p = 0.99, conf = 0.95, override = "all")
expect_equal(res$basis$value[res$basis$group == "CTD"],
81.62, tolerance = 0.001)
expect_equal(res$basis$value[res$basis$group == "RTD"],
76.67, tolerance = 0.001)
expect_equal(res$basis$value[res$basis$group == "ETW"],
50.98, tolerance = 0.001)
expect_equal(res$basis$value[res$basis$group == "ETW2"],
45.40, tolerance = 0.001)
res
#>
#> Call:
#> basis_pooled_cv(data = dat_8_3_11_1_2, x = strength, groups = condition,
#> p = 0.99, conf = 0.95, override = "all")
#>
#> Distribution: Normal - Pooled CV ( n = 83, r = 4 )
#> The following diagnostic tests were overridden:
#> `outliers_within_batch`,
#> `between_group_variability`,
#> `outliers_within_group`,
#> `pooled_data_normal`,
#> `normalized_variance_equal`
#> A-Basis: ( p = 0.99 , conf = 0.95 )
#> CTD 81.60931
#> ETW 50.97801
#> ETW2 45.39158
#> RTD 76.66214
This data set was input into CMH17-STATS and the Pooled CV method was
selected with the modified CV transform. Additionally, the ETW2
condition was removed. The results from CMH17-STATS were as follows.
cmstatr
produces very similar results.
Condition | CTD | RTD | ETW |
---|---|---|---|
B-Basis | 90.31 | 84.83 | 56.43 |
A-Basis | 80.57 | 75.69 | 50.33 |
res <- dat_8_3_11_1_2 %>%
filter(condition != "ETW2") %>%
basis_pooled_cv(strength, condition, modcv = TRUE, override = "all")
expect_equal(res$basis$value[res$basis$group == "CTD"],
90.31, tolerance = 0.001)
expect_equal(res$basis$value[res$basis$group == "RTD"],
84.83, tolerance = 0.001)
expect_equal(res$basis$value[res$basis$group == "ETW"],
56.43, tolerance = 0.001)
res
#>
#> Call:
#> basis_pooled_cv(data = ., x = strength, groups = condition, modcv = TRUE,
#> override = "all")
#>
#> Distribution: Normal - Pooled CV ( n = 65, r = 3 )
#> Modified CV Approach Used
#> The following diagnostic tests were overridden:
#> `outliers_within_batch`,
#> `between_group_variability`,
#> `outliers_within_group`,
#> `pooled_data_normal`,
#> `normalized_variance_equal`
#> B-Basis: ( p = 0.9 , conf = 0.95 )
#> CTD 90.30646
#> ETW 56.42786
#> RTD 84.82748
res <- dat_8_3_11_1_2 %>%
filter(condition != "ETW2") %>%
basis_pooled_cv(strength, condition, modcv = TRUE,
p = 0.99, conf = 0.95, override = "all")
expect_equal(res$basis$value[res$basis$group == "CTD"],
80.57, tolerance = 0.001)
expect_equal(res$basis$value[res$basis$group == "RTD"],
75.69, tolerance = 0.001)
expect_equal(res$basis$value[res$basis$group == "ETW"],
50.33, tolerance = 0.001)
res
#>
#> Call:
#> basis_pooled_cv(data = ., x = strength, groups = condition, p = 0.99,
#> conf = 0.95, modcv = TRUE, override = "all")
#>
#> Distribution: Normal - Pooled CV ( n = 65, r = 3 )
#> Modified CV Approach Used
#> The following diagnostic tests were overridden:
#> `outliers_within_batch`,
#> `between_group_variability`,
#> `outliers_within_group`,
#> `pooled_data_normal`,
#> `normalized_variance_equal`
#> A-Basis: ( p = 0.99 , conf = 0.95 )
#> CTD 80.56529
#> ETW 50.32422
#> RTD 75.6817
This section contains various small data sets. In most cases, these
data sets were generated randomly for the purpose of comparing
cmstatr
to other software.
The following data set was randomly generated. When this is entered
into STAT17 [3], that software gives the
value
,
which matches the result of cmstatr
within a small
margin.
dat <- data.frame(
strength = c(
137.4438, 139.5395, 150.89, 141.4474, 141.8203, 151.8821, 143.9245,
132.9732, 136.6419, 138.1723, 148.7668, 143.283, 143.5429,
141.7023, 137.4732, 152.338, 144.1589, 128.5218
)
)
res <- anderson_darling_normal(dat, strength)
expect_equal(res$osl, 0.465, tolerance = 0.001)
res
#>
#> Call:
#> anderson_darling_normal(data = dat, x = strength)
#>
#> Distribution: Normal ( n = 18 )
#> Test statistic: A = 0.2849726
#> OSL (p-value): 0.4648132 (assuming unknown parameters)
#> Conclusion: Sample is drawn from a Normal distribution ( alpha = 0.05 )
The following data set was randomly generated. When this is entered
into STAT17 [3], that software gives the
value
,
which matches the result of cmstatr
within a small
margin.
dat <- data.frame(
strength = c(
137.4438, 139.5395, 150.89, 141.4474, 141.8203, 151.8821, 143.9245,
132.9732, 136.6419, 138.1723, 148.7668, 143.283, 143.5429,
141.7023, 137.4732, 152.338, 144.1589, 128.5218
)
)
res <- anderson_darling_lognormal(dat, strength)
expect_equal(res$osl, 0.480, tolerance = 0.001)
res
#>
#> Call:
#> anderson_darling_lognormal(data = dat, x = strength)
#>
#> Distribution: Lognormal ( n = 18 )
#> Test statistic: A = 0.2774652
#> OSL (p-value): 0.4798148 (assuming unknown parameters)
#> Conclusion: Sample is drawn from a Lognormal distribution ( alpha = 0.05 )
The following data set was randomly generated. When this is entered
into STAT17 [3], that software gives the
value
,
which matches the result of cmstatr
within a small
margin.
dat <- data.frame(
strength = c(
137.4438, 139.5395, 150.89, 141.4474, 141.8203, 151.8821, 143.9245,
132.9732, 136.6419, 138.1723, 148.7668, 143.283, 143.5429,
141.7023, 137.4732, 152.338, 144.1589, 128.5218
)
)
res <- anderson_darling_weibull(dat, strength)
expect_equal(res$osl, 0.179, tolerance = 0.002)
res
#>
#> Call:
#> anderson_darling_weibull(data = dat, x = strength)
#>
#> Distribution: Weibull ( n = 18 )
#> Test statistic: A = 0.5113909
#> OSL (p-value): 0.1787882 (assuming unknown parameters)
#> Conclusion: Sample is drawn from a Weibull distribution ( alpha = 0.05 )
The following data was input into STAT17 and the A- and B-Basis
values were computed assuming normally distributed data. The results
were 120.336 and 129.287, respectively. cmstatr
reports
very similar values.
dat <- c(
137.4438, 139.5395, 150.8900, 141.4474, 141.8203, 151.8821, 143.9245,
132.9732, 136.6419, 138.1723, 148.7668, 143.2830, 143.5429, 141.7023,
137.4732, 152.3380, 144.1589, 128.5218
)
res <- basis_normal(x = dat, p = 0.99, conf = 0.95, override = "all")
expect_equal(res$basis, 120.336, tolerance = 0.0005)
res
#>
#> Call:
#> basis_normal(x = dat, p = 0.99, conf = 0.95, override = "all")
#>
#> Distribution: Normal ( n = 18 )
#> The following diagnostic tests were overridden:
#> `outliers_within_batch`,
#> `between_batch_variability`,
#> `outliers`,
#> `anderson_darling_normal`
#> A-Basis: ( p = 0.99 , conf = 0.95 )
#> 120.3549
res <- basis_normal(x = dat, p = 0.9, conf = 0.95, override = "all")
expect_equal(res$basis, 129.287, tolerance = 0.0005)
res
#>
#> Call:
#> basis_normal(x = dat, p = 0.9, conf = 0.95, override = "all")
#>
#> Distribution: Normal ( n = 18 )
#> The following diagnostic tests were overridden:
#> `outliers_within_batch`,
#> `between_batch_variability`,
#> `outliers`,
#> `anderson_darling_normal`
#> B-Basis: ( p = 0.9 , conf = 0.95 )
#> 129.2898
The following data was input into STAT17 and the A- and B-Basis
values were computed assuming distributed according to a lognormal
distribution. The results were 121.710 and 129.664, respectively.
cmstatr
reports very similar values.
dat <- c(
137.4438, 139.5395, 150.8900, 141.4474, 141.8203, 151.8821, 143.9245,
132.9732, 136.6419, 138.1723, 148.7668, 143.2830, 143.5429, 141.7023,
137.4732, 152.3380, 144.1589, 128.5218
)
res <- basis_lognormal(x = dat, p = 0.99, conf = 0.95, override = "all")
expect_equal(res$basis, 121.710, tolerance = 0.0005)
res
#>
#> Call:
#> basis_lognormal(x = dat, p = 0.99, conf = 0.95, override = "all")
#>
#> Distribution: Lognormal ( n = 18 )
#> The following diagnostic tests were overridden:
#> `outliers_within_batch`,
#> `between_batch_variability`,
#> `outliers`,
#> `anderson_darling_lognormal`
#> A-Basis: ( p = 0.99 , conf = 0.95 )
#> 121.7265
res <- basis_lognormal(x = dat, p = 0.9, conf = 0.95, override = "all")
expect_equal(res$basis, 129.664, tolerance = 0.0005)
res
#>
#> Call:
#> basis_lognormal(x = dat, p = 0.9, conf = 0.95, override = "all")
#>
#> Distribution: Lognormal ( n = 18 )
#> The following diagnostic tests were overridden:
#> `outliers_within_batch`,
#> `between_batch_variability`,
#> `outliers`,
#> `anderson_darling_lognormal`
#> B-Basis: ( p = 0.9 , conf = 0.95 )
#> 129.667
The following data was input into STAT17 and the A- and B-Basis
values were computed assuming data following the Weibull distribution.
The results were 109.150 and 125.441, respectively. cmstatr
reports very similar values.
dat <- c(
137.4438, 139.5395, 150.8900, 141.4474, 141.8203, 151.8821, 143.9245,
132.9732, 136.6419, 138.1723, 148.7668, 143.2830, 143.5429, 141.7023,
137.4732, 152.3380, 144.1589, 128.5218
)
res <- basis_weibull(x = dat, p = 0.99, conf = 0.95, override = "all")
expect_equal(res$basis, 109.150, tolerance = 0.005)
res
#>
#> Call:
#> basis_weibull(x = dat, p = 0.99, conf = 0.95, override = "all")
#>
#> Distribution: Weibull ( n = 18 )
#> The following diagnostic tests were overridden:
#> `outliers_within_batch`,
#> `between_batch_variability`,
#> `outliers`,
#> `anderson_darling_weibull`
#> A-Basis: ( p = 0.99 , conf = 0.95 )
#> 109.651
res <- basis_weibull(x = dat, p = 0.9, conf = 0.95, override = "all")
expect_equal(res$basis, 125.441, tolerance = 0.005)
res
#>
#> Call:
#> basis_weibull(x = dat, p = 0.9, conf = 0.95, override = "all")
#>
#> Distribution: Weibull ( n = 18 )
#> The following diagnostic tests were overridden:
#> `outliers_within_batch`,
#> `between_batch_variability`,
#> `outliers`,
#> `anderson_darling_weibull`
#> B-Basis: ( p = 0.9 , conf = 0.95 )
#> 125.7338
The following data was input into STAT17 and the A- and B-Basis
values were computed using the nonparametric (small sample) method. The
results were 99.651 and 124.156, respectively. cmstatr
reports very similar values.
dat <- c(
137.4438, 139.5395, 150.8900, 141.4474, 141.8203, 151.8821, 143.9245,
132.9732, 136.6419, 138.1723, 148.7668, 143.2830, 143.5429, 141.7023,
137.4732, 152.3380, 144.1589, 128.5218
)
res <- basis_hk_ext(x = dat, p = 0.99, conf = 0.95,
method = "woodward-frawley", override = "all")
expect_equal(res$basis, 99.651, tolerance = 0.005)
res
#>
#> Call:
#> basis_hk_ext(x = dat, p = 0.99, conf = 0.95, method = "woodward-frawley",
#> override = "all")
#>
#> Distribution: Nonparametric (Extended Hanson-Koopmans, Woodward-Frawley method) ( n = 18 )
#> The following diagnostic tests were overridden:
#> `outliers_within_batch`,
#> `between_batch_variability`,
#> `outliers`,
#> `correct_method_used`,
#> `sample_size`
#> A-Basis: ( p = 0.99 , conf = 0.95 )
#> 99.65098
res <- basis_hk_ext(x = dat, p = 0.9, conf = 0.95,
method = "optimum-order", override = "all")
expect_equal(res$basis, 124.156, tolerance = 0.005)
res
#>
#> Call:
#> basis_hk_ext(x = dat, p = 0.9, conf = 0.95, method = "optimum-order",
#> override = "all")
#>
#> Distribution: Nonparametric (Extended Hanson-Koopmans, optimum two-order-statistic method) ( n = 18 )
#> The following diagnostic tests were overridden:
#> `outliers_within_batch`,
#> `between_batch_variability`,
#> `outliers`,
#> `correct_method_used`,
#> `sample_size`
#> B-Basis: ( p = 0.9 , conf = 0.95 )
#> 124.1615
The following random numbers were generated.
dat <- c(
139.6734, 143.0032, 130.4757, 144.8327, 138.7818, 136.7693, 148.636,
131.0095, 131.4933, 142.8856, 158.0198, 145.2271, 137.5991, 139.8298,
140.8557, 137.6148, 131.3614, 152.7795, 145.8792, 152.9207, 160.0989,
145.1920, 128.6383, 141.5992, 122.5297, 159.8209, 151.6720, 159.0156
)
All of the numbers above were input into STAT17 and the reported
B-Basis value using the Optimum Order nonparametric method was
122.36798. This result matches the results of cmstatr
within a small margin.
res <- basis_hk_ext(x = dat, p = 0.9, conf = 0.95,
method = "optimum-order", override = "all")
expect_equal(res$basis, 122.36798, tolerance = 0.001)
res
#>
#> Call:
#> basis_hk_ext(x = dat, p = 0.9, conf = 0.95, method = "optimum-order",
#> override = "all")
#>
#> Distribution: Nonparametric (Extended Hanson-Koopmans, optimum two-order-statistic method) ( n = 28 )
#> The following diagnostic tests were overridden:
#> `outliers_within_batch`,
#> `between_batch_variability`,
#> `outliers`,
#> `correct_method_used`,
#> `sample_size`
#> B-Basis: ( p = 0.9 , conf = 0.95 )
#> 122.3638
The last two observations from the above data set were discarded,
leaving 26 observations. This smaller data set was input into STAT17 and
that software calculated a B-Basis value of 121.57073 using the Optimum
Order nonparametric method. cmstatr
reports a very similar
number.
res <- basis_hk_ext(x = head(dat, 26), p = 0.9, conf = 0.95,
method = "optimum-order", override = "all")
expect_equal(res$basis, 121.57073, tolerance = 0.001)
res
#>
#> Call:
#> basis_hk_ext(x = head(dat, 26), p = 0.9, conf = 0.95, method = "optimum-order",
#> override = "all")
#>
#> Distribution: Nonparametric (Extended Hanson-Koopmans, optimum two-order-statistic method) ( n = 26 )
#> The following diagnostic tests were overridden:
#> `outliers_within_batch`,
#> `between_batch_variability`,
#> `outliers`,
#> `correct_method_used`,
#> `sample_size`
#> B-Basis: ( p = 0.9 , conf = 0.95 )
#> 121.5655
The same data set was further reduced such that only the first 22
observations were included. This smaller data set was input into STAT17
and that software calculated a B-Basis value of 128.82397 using the
Optimum Order nonparametric method. cmstatr
reports a very
similar number.
res <- basis_hk_ext(x = head(dat, 22), p = 0.9, conf = 0.95,
method = "optimum-order", override = "all")
expect_equal(res$basis, 128.82397, tolerance = 0.001)
res
#>
#> Call:
#> basis_hk_ext(x = head(dat, 22), p = 0.9, conf = 0.95, method = "optimum-order",
#> override = "all")
#>
#> Distribution: Nonparametric (Extended Hanson-Koopmans, optimum two-order-statistic method) ( n = 22 )
#> The following diagnostic tests were overridden:
#> `outliers_within_batch`,
#> `between_batch_variability`,
#> `outliers`,
#> `correct_method_used`,
#> `sample_size`
#> B-Basis: ( p = 0.9 , conf = 0.95 )
#> 128.8224
The following data was input into STAT17 and the B-Basis value was
computed using the nonparametric (large sample) method. The results was
122.738297. cmstatr
reports very similar values.
dat <- c(
137.3603, 135.6665, 136.6914, 154.7919, 159.2037, 137.3277, 128.821,
138.6304, 138.9004, 147.4598, 148.6622, 144.4948, 131.0851, 149.0203,
131.8232, 146.4471, 123.8124, 126.3105, 140.7609, 134.4875, 128.7508,
117.1854, 129.3088, 141.6789, 138.4073, 136.0295, 128.4164, 141.7733,
134.455, 122.7383, 136.9171, 136.9232, 138.8402, 152.8294, 135.0633,
121.052, 131.035, 138.3248, 131.1379, 147.3771, 130.0681, 132.7467,
137.1444, 141.662, 146.9363, 160.7448, 138.5511, 129.1628, 140.2939,
144.8167, 156.5918, 132.0099, 129.3551, 136.6066, 134.5095, 128.2081,
144.0896, 141.8029, 130.0149, 140.8813, 137.7864
)
res <- basis_nonpara_large_sample(x = dat, p = 0.9, conf = 0.95,
override = "all")
expect_equal(res$basis, 122.738297, tolerance = 0.005)
res
#>
#> Call:
#> basis_nonpara_large_sample(x = dat, p = 0.9, conf = 0.95, override = "all")
#>
#> Distribution: Nonparametric (large sample) ( n = 61 )
#> The following diagnostic tests were overridden:
#> `outliers_within_batch`,
#> `between_batch_variability`,
#> `outliers`,
#> `sample_size`
#> B-Basis: ( p = 0.9 , conf = 0.95 )
#> 122.7383
Results from cmstatr
’s equiv_mean_extremum
function were compared with results from HYTEQ. The summary statistics
for the qualification data were set as mean = 141.310
and
sd=6.415
. For a value of alpha=0.05
and
n = 9
, HYTEQ reported thresholds of 123.725 and 137.197 for
minimum individual and mean, respectively. cmstatr
produces
very similar results.
res <- equiv_mean_extremum(alpha = 0.05, mean_qual = 141.310, sd_qual = 6.415,
n_sample = 9)
expect_equal(res$threshold_min_indiv, 123.725, tolerance = 0.001)
expect_equal(res$threshold_mean, 137.197, tolerance = 0.001)
res
#>
#> Call:
#> equiv_mean_extremum(mean_qual = 141.31, sd_qual = 6.415, n_sample = 9,
#> alpha = 0.05)
#>
#> For alpha = 0.05 and n = 9
#> ( k1 = 2.741054 and k2 = 0.6410852 )
#> Min Individual Sample Mean
#> Thresholds: 123.7261 137.1974
Using the same parameters, but using the modified CV method, HYTEQ
produces thresholds of 117.024 and 135.630 for minimum individual and
mean, respectively. cmstatr
produces very similar
results.
res <- equiv_mean_extremum(alpha = 0.05, mean_qual = 141.310, sd_qual = 6.415,
n_sample = 9, modcv = TRUE)
expect_equal(res$threshold_min_indiv, 117.024, tolerance = 0.001)
expect_equal(res$threshold_mean, 135.630, tolerance = 0.001)
res
#>
#> Call:
#> equiv_mean_extremum(mean_qual = 141.31, sd_qual = 6.415, n_sample = 9,
#> alpha = 0.05, modcv = TRUE)
#>
#> Modified CV used: CV* = 0.06269832 ( CV = 0.04539665 )
#>
#> For alpha = 0.05 and n = 9
#> ( k1 = 2.741054 and k2 = 0.6410852 )
#> Min Individual Sample Mean
#> Thresholds: 117.0245 135.63
Results from cmstatr
’s equiv_change_mean
function were compared with results from HYTEQ. The following parameters
were used. A value of alpha = 0.05
was selected.
Parameter | Qualification | Sample |
---|---|---|
Mean | 9.24 | 9.02 |
SD | 0.162 | 0.15785 |
n | 28 | 9 |
HYTEQ gives an acceptance range of 9.115 to 9.365.
cmstatr
produces similar results.
res <- equiv_change_mean(alpha = 0.05, n_sample = 9, mean_sample = 9.02,
sd_sample = 0.15785, n_qual = 28, mean_qual = 9.24,
sd_qual = 0.162)
expect_equal(res$threshold, c(9.115, 9.365), tolerance = 0.001)
res
#>
#> Call:
#> equiv_change_mean(n_qual = 28, mean_qual = 9.24, sd_qual = 0.162,
#> n_sample = 9, mean_sample = 9.02, sd_sample = 0.15785, alpha = 0.05)
#>
#> For alpha = 0.05
#> Qualification Sample
#> Number 28 9
#> Mean 9.24 9.02
#> SD 0.162 0.15785
#> Pooled SD 0.1610609
#> t0 -3.564775
#> Result FAIL
#> Passing Range 9.114712 to 9.365288
After selecting the modified CV method, HYTEQ gives an acceptance
range of 8.857 to 9.623. cmstatr
produces similar
results.
res <- equiv_change_mean(alpha = 0.05, n_sample = 9, mean_sample = 9.02,
sd_sample = 0.15785, n_qual = 28, mean_qual = 9.24,
sd_qual = 0.162, modcv = TRUE)
expect_equal(res$threshold, c(8.857, 9.623), tolerance = 0.001)
res
#>
#> Call:
#> equiv_change_mean(n_qual = 28, mean_qual = 9.24, sd_qual = 0.162,
#> n_sample = 9, mean_sample = 9.02, sd_sample = 0.15785, alpha = 0.05,
#> modcv = TRUE)
#>
#> For alpha = 0.05
#> Modified CV used
#> Qualification Sample
#> Number 28 9
#> Mean 9.24 9.02
#> SD 0.162 0.15785
#> Pooled SD 0.4927484
#> t0 -1.16519
#> Result PASS
#> Passing Range 8.856695 to 9.623305
In this section, results from cmstatr
are compared with
values published in literature.
[4] provides example data that compares measurements obtained in four labs. Their paper gives values of the ADK test statistic as well as p-values.
The data in [4] is as follows:
dat_ss1987 <- data.frame(
smoothness = c(
38.7, 41.5, 43.8, 44.5, 45.5, 46.0, 47.7, 58.0,
39.2, 39.3, 39.7, 41.4, 41.8, 42.9, 43.3, 45.8,
34.0, 35.0, 39.0, 40.0, 43.0, 43.0, 44.0, 45.0,
34.0, 34.8, 34.8, 35.4, 37.2, 37.8, 41.2, 42.8
),
lab = c(rep("A", 8), rep("B", 8), rep("C", 8), rep("D", 8))
)
dat_ss1987
#> smoothness lab
#> 1 38.7 A
#> 2 41.5 A
#> 3 43.8 A
#> 4 44.5 A
#> 5 45.5 A
#> 6 46.0 A
#> 7 47.7 A
#> 8 58.0 A
#> 9 39.2 B
#> 10 39.3 B
#> 11 39.7 B
#> 12 41.4 B
#> 13 41.8 B
#> 14 42.9 B
#> 15 43.3 B
#> 16 45.8 B
#> 17 34.0 C
#> 18 35.0 C
#> 19 39.0 C
#> 20 40.0 C
#> 21 43.0 C
#> 22 43.0 C
#> 23 44.0 C
#> 24 45.0 C
#> 25 34.0 D
#> 26 34.8 D
#> 27 34.8 D
#> 28 35.4 D
#> 29 37.2 D
#> 30 37.8 D
#> 31 41.2 D
#> 32 42.8 D
[4] lists the corresponding test
statistics
and
with the p-value
.
These match the result of cmstatr
within a small
margin.
res <- ad_ksample(dat_ss1987, smoothness, lab)
expect_equal(res$ad, 8.3926, tolerance = 0.001)
expect_equal(res$sigma, 1.2038, tolerance = 0.001)
expect_equal(res$p, 0.00226, tolerance = 0.01)
res
#>
#> Call:
#> ad_ksample(data = dat_ss1987, x = smoothness, groups = lab)
#>
#> N = 32 k = 4
#> ADK = 8.39 p-value = 0.002255
#> Conclusion: Samples do not come from the same distribution (alpha = 0.025 )
Various factors, such as tolerance limit factors, are published in
various publications. This section compares those published factors with
those computed by cmstatr
.
B-Basis tolerance limit factors assuming a normal distribution are
published in CMH-17-1G. Those factors are reproduced below and are
compared with the results of cmstatr
. The published factors
and those computed by cmstatr
are quite similar.
tribble(
~n, ~kB_published,
2, 20.581, 36, 1.725, 70, 1.582, 104, 1.522,
3, 6.157, 37, 1.718, 71, 1.579, 105, 1.521,
4, 4.163, 38, 1.711, 72, 1.577, 106, 1.519,
5, 3.408, 39, 1.704, 73, 1.575, 107, 1.518,
6, 3.007, 40, 1.698, 74, 1.572, 108, 1.517,
7, 2.756, 41, 1.692, 75, 1.570, 109, 1.516,
8, 2.583, 42, 1.686, 76, 1.568, 110, 1.515,
9, 2.454, 43, 1.680, 77, 1.566, 111, 1.513,
10, 2.355, 44, 1.675, 78, 1.564, 112, 1.512,
11, 2.276, 45, 1.669, 79, 1.562, 113, 1.511,
12, 2.211, 46, 1.664, 80, 1.560, 114, 1.510,
13, 2.156, 47, 1.660, 81, 1.558, 115, 1.509,
14, 2.109, 48, 1.655, 82, 1.556, 116, 1.508,
15, 2.069, 49, 1.650, 83, 1.554, 117, 1.507,
16, 2.034, 50, 1.646, 84, 1.552, 118, 1.506,
17, 2.002, 51, 1.642, 85, 1.551, 119, 1.505,
18, 1.974, 52, 1.638, 86, 1.549, 120, 1.504,
19, 1.949, 53, 1.634, 87, 1.547, 121, 1.503,
20, 1.927, 54, 1.630, 88, 1.545, 122, 1.502,
21, 1.906, 55, 1.626, 89, 1.544, 123, 1.501,
22, 1.887, 56, 1.623, 90, 1.542, 124, 1.500,
23, 1.870, 57, 1.619, 91, 1.540, 125, 1.499,
24, 1.854, 58, 1.616, 92, 1.539, 126, 1.498,
25, 1.839, 59, 1.613, 93, 1.537, 127, 1.497,
26, 1.825, 60, 1.609, 94, 1.536, 128, 1.496,
27, 1.812, 61, 1.606, 95, 1.534, 129, 1.495,
28, 1.800, 62, 1.603, 96, 1.533, 130, 1.494,
29, 1.789, 63, 1.600, 97, 1.531, 131, 1.493,
30, 1.778, 64, 1.597, 98, 1.530, 132, 1.492,
31, 1.768, 65, 1.595, 99, 1.529, 133, 1.492,
32, 1.758, 66, 1.592, 100, 1.527, 134, 1.491,
33, 1.749, 67, 1.589, 101, 1.526, 135, 1.490,
34, 1.741, 68, 1.587, 102, 1.525, 136, 1.489,
35, 1.733, 69, 1.584, 103, 1.523, 137, 1.488
) %>%
arrange(n) %>%
mutate(kB_cmstatr = k_factor_normal(n, p = 0.9, conf = 0.95)) %>%
rowwise() %>%
mutate(diff = expect_equal(kB_published, kB_cmstatr, tolerance = 0.001)) %>%
select(-c(diff))
#> # A tibble: 136 × 3
#> # Rowwise:
#> n kB_published kB_cmstatr
#> <dbl> <dbl> <dbl>
#> 1 2 20.6 20.6
#> 2 3 6.16 6.16
#> 3 4 4.16 4.16
#> 4 5 3.41 3.41
#> 5 6 3.01 3.01
#> 6 7 2.76 2.76
#> 7 8 2.58 2.58
#> 8 9 2.45 2.45
#> 9 10 2.36 2.35
#> 10 11 2.28 2.28
#> # ℹ 126 more rows
A-Basis tolerance limit factors assuming a normal distribution are
published in CMH-17-1G. Those factors are reproduced below and are
compared with the results of cmstatr
. The published factors
and those computed by cmstatr
are quite similar.
tribble(
~n, ~kA_published,
2, 37.094, 36, 2.983, 70, 2.765, 104, 2.676,
3, 10.553, 37, 2.972, 71, 2.762, 105, 2.674,
4, 7.042, 38, 2.961, 72, 2.758, 106, 2.672,
5, 5.741, 39, 2.951, 73, 2.755, 107, 2.671,
6, 5.062, 40, 2.941, 74, 2.751, 108, 2.669,
7, 4.642, 41, 2.932, 75, 2.748, 109, 2.667,
8, 4.354, 42, 2.923, 76, 2.745, 110, 2.665,
9, 4.143, 43, 2.914, 77, 2.742, 111, 2.663,
10, 3.981, 44, 2.906, 78, 2.739, 112, 2.662,
11, 3.852, 45, 2.898, 79, 2.736, 113, 2.660,
12, 3.747, 46, 2.890, 80, 2.733, 114, 2.658,
13, 3.659, 47, 2.883, 81, 2.730, 115, 2.657,
14, 3.585, 48, 2.876, 82, 2.727, 116, 2.655,
15, 3.520, 49, 2.869, 83, 2.724, 117, 2.654,
16, 3.464, 50, 2.862, 84, 2.721, 118, 2.652,
17, 3.414, 51, 2.856, 85, 2.719, 119, 2.651,
18, 3.370, 52, 2.850, 86, 2.716, 120, 2.649,
19, 3.331, 53, 2.844, 87, 2.714, 121, 2.648,
20, 3.295, 54, 2.838, 88, 2.711, 122, 2.646,
21, 3.263, 55, 2.833, 89, 2.709, 123, 2.645,
22, 3.233, 56, 2.827, 90, 2.706, 124, 2.643,
23, 3.206, 57, 2.822, 91, 2.704, 125, 2.642,
24, 3.181, 58, 2.817, 92, 2.701, 126, 2.640,
25, 3.158, 59, 2.812, 93, 2.699, 127, 2.639,
26, 3.136, 60, 2.807, 94, 2.697, 128, 2.638,
27, 3.116, 61, 2.802, 95, 2.695, 129, 2.636,
28, 3.098, 62, 2.798, 96, 2.692, 130, 2.635,
29, 3.080, 63, 2.793, 97, 2.690, 131, 2.634,
30, 3.064, 64, 2.789, 98, 2.688, 132, 2.632,
31, 3.048, 65, 2.785, 99, 2.686, 133, 2.631,
32, 3.034, 66, 2.781, 100, 2.684, 134, 2.630,
33, 3.020, 67, 2.777, 101, 2.682, 135, 2.628,
34, 3.007, 68, 2.773, 102, 2.680, 136, 2.627,
35, 2.995, 69, 2.769, 103, 2.678, 137, 2.626
) %>%
arrange(n) %>%
mutate(kA_cmstatr = k_factor_normal(n, p = 0.99, conf = 0.95)) %>%
rowwise() %>%
mutate(diff = expect_equal(kA_published, kA_cmstatr, tolerance = 0.001)) %>%
select(-c(diff))
#> # A tibble: 136 × 3
#> # Rowwise:
#> n kA_published kA_cmstatr
#> <dbl> <dbl> <dbl>
#> 1 2 37.1 37.1
#> 2 3 10.6 10.6
#> 3 4 7.04 7.04
#> 4 5 5.74 5.74
#> 5 6 5.06 5.06
#> 6 7 4.64 4.64
#> 7 8 4.35 4.35
#> 8 9 4.14 4.14
#> 9 10 3.98 3.98
#> 10 11 3.85 3.85
#> # ℹ 126 more rows
Vangel [5] provides extensive tables of
for the case where
and
is the median observation. This section checks the results of
cmstatr
’s function against those tables. Only the odd
values of
are checked so that the median is a single observation. The unit tests
for the cmstatr
package include checks of a variety of
values of
and confidence, but only the factors for B-Basis are checked here.
tribble(
~n, ~z,
3, 28.820048,
5, 6.1981307,
7, 3.4780112,
9, 2.5168762,
11, 2.0312134,
13, 1.7377374,
15, 1.5403989,
17, 1.3979806,
19, 1.2899172,
21, 1.2048089,
23, 1.1358259,
25, 1.0786237,
27, 1.0303046,
) %>%
rowwise() %>%
mutate(
z_calc = hk_ext_z(n, 1, ceiling(n / 2), p = 0.90, conf = 0.95)
) %>%
mutate(diff = expect_equal(z, z_calc, tolerance = 0.0001)) %>%
select(-c(diff))
#> # A tibble: 13 × 3
#> # Rowwise:
#> n z z_calc
#> <dbl> <dbl> <dbl>
#> 1 3 28.8 28.8
#> 2 5 6.20 6.20
#> 3 7 3.48 3.48
#> 4 9 2.52 2.52
#> 5 11 2.03 2.03
#> 6 13 1.74 1.74
#> 7 15 1.54 1.54
#> 8 17 1.40 1.40
#> 9 19 1.29 1.29
#> 10 21 1.20 1.20
#> 11 23 1.14 1.14
#> 12 25 1.08 1.08
#> 13 27 1.03 1.03
CMH-17-1G provides Table 8.5.15, which contains factors for
calculating A-Basis values using the Extended Hanson–Koopmans
nonparametric method. That table is reproduced in part here and the
factors are compared with those computed by cmstatr
. More
extensive checks are performed in the unit test of the
cmstatr
package. The factors computed by
cmstatr
are very similar to those published in
CMH-17-1G.
tribble(
~n, ~k,
2, 80.0038,
4, 9.49579,
6, 5.57681,
8, 4.25011,
10, 3.57267,
12, 3.1554,
14, 2.86924,
16, 2.65889,
18, 2.4966,
20, 2.36683,
25, 2.131,
30, 1.96975,
35, 1.85088,
40, 1.75868,
45, 1.68449,
50, 1.62313,
60, 1.5267,
70, 1.45352,
80, 1.39549,
90, 1.34796,
100, 1.30806,
120, 1.24425,
140, 1.19491,
160, 1.15519,
180, 1.12226,
200, 1.09434,
225, 1.06471,
250, 1.03952,
275, 1.01773
) %>%
rowwise() %>%
mutate(z_calc = hk_ext_z(n, 1, n, 0.99, 0.95)) %>%
mutate(diff = expect_lt(abs(k - z_calc), 0.0001)) %>%
select(-c(diff))
#> # A tibble: 29 × 3
#> # Rowwise:
#> n k z_calc
#> <dbl> <dbl> <dbl>
#> 1 2 80.0 80.0
#> 2 4 9.50 9.50
#> 3 6 5.58 5.58
#> 4 8 4.25 4.25
#> 5 10 3.57 3.57
#> 6 12 3.16 3.16
#> 7 14 2.87 2.87
#> 8 16 2.66 2.66
#> 9 18 2.50 2.50
#> 10 20 2.37 2.37
#> # ℹ 19 more rows
CMH-17-1G Table 8.5.14 provides ranks orders and factors for
computing nonparametric B-Basis values. This table is reproduced below
and compared with the results of cmstatr
. The results are
similar. In some cases, the rank order
(
in CMH-17-1G or
in cmstatr
) and the the factor
()
are different. These differences are discussed in detail in the vignette
Extended Hanson-Koopmans.
tribble(
~n, ~r, ~k,
2, 2, 35.177,
3, 3, 7.859,
4, 4, 4.505,
5, 4, 4.101,
6, 5, 3.064,
7, 5, 2.858,
8, 6, 2.382,
9, 6, 2.253,
10, 6, 2.137,
11, 7, 1.897,
12, 7, 1.814,
13, 7, 1.738,
14, 8, 1.599,
15, 8, 1.540,
16, 8, 1.485,
17, 8, 1.434,
18, 9, 1.354,
19, 9, 1.311,
20, 10, 1.253,
21, 10, 1.218,
22, 10, 1.184,
23, 11, 1.143,
24, 11, 1.114,
25, 11, 1.087,
26, 11, 1.060,
27, 11, 1.035,
28, 12, 1.010
) %>%
rowwise() %>%
mutate(r_calc = hk_ext_z_j_opt(n, 0.90, 0.95)$j) %>%
mutate(k_calc = hk_ext_z_j_opt(n, 0.90, 0.95)$z)
#> # A tibble: 27 × 5
#> # Rowwise:
#> n r k r_calc k_calc
#> <dbl> <dbl> <dbl> <int> <dbl>
#> 1 2 2 35.2 2 35.2
#> 2 3 3 7.86 3 7.86
#> 3 4 4 4.50 4 4.51
#> 4 5 4 4.10 4 4.10
#> 5 6 5 3.06 5 3.06
#> 6 7 5 2.86 5 2.86
#> 7 8 6 2.38 6 2.38
#> 8 9 6 2.25 6 2.25
#> 9 10 6 2.14 6 2.14
#> 10 11 7 1.90 7 1.90
#> # ℹ 17 more rows
CMH-17-1G Table 8.5.12 provides factors for computing B-Basis values
using the nonparametric binomial rank method. Part of that table is
reproduced below and compared with the results of cmstatr
.
The results of cmstatr
are similar to the published values.
A more complete comparison is performed in the units tests of the
cmstatr
package.
tribble(
~n, ~rb,
29, 1,
46, 2,
61, 3,
76, 4,
89, 5,
103, 6,
116, 7,
129, 8,
142, 9,
154, 10,
167, 11,
179, 12,
191, 13,
203, 14
) %>%
rowwise() %>%
mutate(r_calc = nonpara_binomial_rank(n, 0.9, 0.95)) %>%
mutate(test = expect_equal(rb, r_calc)) %>%
select(-c(test))
#> # A tibble: 14 × 3
#> # Rowwise:
#> n rb r_calc
#> <dbl> <dbl> <dbl>
#> 1 29 1 1
#> 2 46 2 2
#> 3 61 3 3
#> 4 76 4 4
#> 5 89 5 5
#> 6 103 6 6
#> 7 116 7 7
#> 8 129 8 8
#> 9 142 9 9
#> 10 154 10 10
#> 11 167 11 11
#> 12 179 12 12
#> 13 191 13 13
#> 14 203 14 14
CMH-17-1G Table 8.5.13 provides factors for computing B-Basis values
using the nonparametric binomial rank method. Part of that table is
reproduced below and compared with the results of cmstatr
.
The results of cmstatr
are similar to the published values.
A more complete comparison is performed in the units tests of the
cmstatr
package.
tribble(
~n, ~ra,
299, 1,
473, 2,
628, 3,
773, 4,
913, 5
) %>%
rowwise() %>%
mutate(r_calc = nonpara_binomial_rank(n, 0.99, 0.95)) %>%
mutate(test = expect_equal(ra, r_calc)) %>%
select(-c(test))
#> # A tibble: 5 × 3
#> # Rowwise:
#> n ra r_calc
#> <dbl> <dbl> <dbl>
#> 1 299 1 1
#> 2 473 2 2
#> 3 628 3 3
#> 4 773 4 4
#> 5 913 5 5
Vangel’s 2002 paper provides factors for calculating limits for
sample mean and sample extremum for various values of
and sample size
().
A subset of those factors are reproduced below and compared with results
from cmstatr
. The results are very similar for values of
and
that are common for composite materials.
read.csv(system.file("extdata", "k1.vangel.csv",
package = "cmstatr")) %>%
gather(n, k1, X2:X10) %>%
mutate(n = as.numeric(substring(n, 2))) %>%
inner_join(
read.csv(system.file("extdata", "k2.vangel.csv",
package = "cmstatr")) %>%
gather(n, k2, X2:X10) %>%
mutate(n = as.numeric(substring(n, 2))),
by = c("n" = "n", "alpha" = "alpha")
) %>%
filter(n >= 5 & (alpha == 0.01 | alpha == 0.05)) %>%
group_by(n, alpha) %>%
nest() %>%
mutate(equiv = map2(alpha, n, ~k_equiv(.x, .y))) %>%
mutate(k1_calc = map(equiv, function(e) e[1]),
k2_calc = map(equiv, function(e) e[2])) %>%
select(-c(equiv)) %>%
unnest(cols = c(data, k1_calc, k2_calc)) %>%
mutate(check = expect_equal(k1, k1_calc, tolerance = 0.0001)) %>%
select(-c(check)) %>%
mutate(check = expect_equal(k2, k2_calc, tolerance = 0.0001)) %>%
select(-c(check))
#> # A tibble: 12 × 6
#> # Groups: n, alpha [12]
#> alpha n k1 k2 k1_calc k2_calc
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.05 5 2.53 0.852 2.53 0.853
#> 2 0.01 5 3.07 1.14 3.07 1.14
#> 3 0.05 6 2.60 0.781 2.60 0.781
#> 4 0.01 6 3.13 1.04 3.13 1.04
#> 5 0.05 7 2.65 0.725 2.65 0.725
#> 6 0.01 7 3.18 0.968 3.18 0.968
#> 7 0.05 8 2.7 0.679 2.70 0.679
#> 8 0.01 8 3.22 0.906 3.22 0.906
#> 9 0.05 9 2.74 0.641 2.74 0.641
#> 10 0.01 9 3.25 0.854 3.25 0.855
#> 11 0.05 10 2.78 0.609 2.78 0.609
#> 12 0.01 10 3.28 0.811 3.28 0.811
This copy of this vignette was build on the following system.
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] testthat_3.2.1.1 tidyr_1.3.1 purrr_1.0.2 dplyr_1.1.4
#> [5] cmstatr_0.10.0
#>
#> loaded via a namespace (and not attached):
#> [1] gtable_0.3.6 jsonlite_1.8.9 compiler_4.4.2 brio_1.1.5
#> [5] tidyselect_1.2.1 jquerylib_0.1.4 systemfonts_1.1.0 scales_1.3.0
#> [9] textshaping_0.4.0 yaml_2.3.10 fastmap_1.2.0 ggplot2_3.5.1
#> [13] R6_2.5.1 kSamples_1.2-10 generics_0.1.3 knitr_1.49
#> [17] MASS_7.3-61 tibble_3.2.1 desc_1.4.3 rprojroot_2.0.4
#> [21] munsell_0.5.1 bslib_0.8.0 pillar_1.9.0 SuppDists_1.1-9.8
#> [25] rlang_1.1.4 utf8_1.2.4 cachem_1.1.0 xfun_0.49
#> [29] fs_1.6.5 sass_0.4.9 pkgload_1.4.0 cli_3.6.3
#> [33] withr_3.0.2 pkgdown_2.1.1 magrittr_2.0.3 digest_0.6.37
#> [37] grid_4.4.2 lifecycle_1.0.4 waldo_0.6.1 vctrs_0.6.5
#> [41] evaluate_1.0.1 glue_1.8.0 ragg_1.3.3 fansi_1.0.6
#> [45] colorspace_2.1-1 rmarkdown_2.29 tools_4.4.2 pkgconfig_2.0.3
#> [49] htmltools_0.5.8.1