Calculate the basis value for a given data set. Separate functions are provided for the different distributions and methods. The basis value is the lower one-sided tolerance bound on a given proportion of the population. For more information on tolerance bounds, see Meeker et al. (2017). For B-Basis, set the content of the tolerance bound to \(p=0.90\) and the confidence level to \(conf=0.95\); for A-Basis, set \(p=0.99\) and \(conf=0.95\). Other tolerance bound contents and confidence levels can be computed, but they are infrequently needed in practice.
These functions also perform some automated diagnostic tests of the data prior to calculating the basis values. These diagnostic tests can be overridden if needed.
basis_normal(
  data = NULL,
  x,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  override = c()
)
basis_lognormal(
  data = NULL,
  x,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  override = c()
)
basis_weibull(
  data = NULL,
  x,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  override = c()
)
basis_pooled_cv(
  data = NULL,
  x,
  groups,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  modcv = FALSE,
  override = c()
)
basis_pooled_sd(
  data = NULL,
  x,
  groups,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  modcv = FALSE,
  override = c()
)
basis_hk_ext(
  data = NULL,
  x,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  method = c("optimum-order", "woodward-frawley"),
  override = c()
)
basis_nonpara_large_sample(
  data = NULL,
  x,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  override = c()
)
basis_anova(data = NULL, x, groups, p = 0.9, conf = 0.95, override = c())
data
a data.frame
x
the variable in the data.frame for which to find the basis value
batch
the variable in the data.frame that contains the batches
p
the content of the tolerance bound. Should be 0.90 for B-Basis and 0.99 for A-Basis.
conf
the confidence level. Should be 0.95 for both A- and B-Basis.
override
a list of names of diagnostic tests to override, if desired. Specifying "all" will override all diagnostic tests applicable to the current method.
groups
the variable in the data.frame representing the groups
modcv
a logical value indicating whether the modified CV approach should be used. Only applicable to pooling methods.
method
the method for Hanson–Koopmans nonparametric basis values. Should be "optimum-order" for B-Basis and "woodward-frawley" for A-Basis.
an object of class basis
This object has the following fields:
call
the expression used to call this function
distribution
the distribution used (normal, etc.)
p
the value of \(p\) supplied
conf
the value of \(conf\) supplied
modcv
a logical value indicating whether the modified CV approach was used. Only applicable to pooling methods.
data
a copy of the data used in the calculation
groups
a copy of the groups variable. Only used for pooling and ANOVA methods.
batch
a copy of the batch data used for diagnostic tests
modcv_transformed_data
the data after the modified CV transformation
override
a vector of the names of diagnostic tests that were overridden. NULL if none were overridden.
diagnostic_results
a named character vector containing the results of all the diagnostic tests. See the Details section for additional information.
diagnostic_obj
a named list containing the objects produced by the diagnostic tests
diagnostic_failures
a vector containing any diagnostic tests that produced failures
n
the number of observations
r
the number of groups, if a pooling method was used. Otherwise it is NULL.
basis
the basis value computed. This is a number except when pooling methods are used, in which case it is a data.frame.
data is an optional argument. If data is given, it should be a data.frame (or similar object). When data is specified, the value of x is expected to be a variable within data. If data is not specified, x must be a vector.
When modcv=TRUE is set, which is only applicable to the pooling methods, the data is first modified according to the modified coefficient of variation (CV) rules. This modified data is then used both when calculating the basis values and when performing the diagnostic tests. The modified CV approach is a way of adding extra variance to datasets with unexpectedly low variance.
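As a rough illustration of what "adding extra variance" means, the sketch below applies the modified CV rule as it is commonly stated for CMH-17-1G (an assumption here, not taken from this page): a CV below 4% is raised to 6%, a CV between 4% and 8% becomes CV/2 + 4%, and a larger CV is left alone. The function names mod_cv_sketch and rescale_to_cv_sketch are hypothetical, the data values are made up, and the single-group rescaling is only one simple way to apply the rule; the package's own transformation may differ in detail.

```r
# Modified CV rule (assumed form, as commonly stated for CMH-17-1G)
mod_cv_sketch <- function(cv) {
  ifelse(cv < 0.04, 0.06,
         ifelse(cv < 0.08, cv / 2 + 0.04, cv))
}

# Rescale one group's deviations from its mean so that the sample CV
# equals the modified CV. The mean is unchanged by this rescaling.
rescale_to_cv_sketch <- function(x) {
  cv <- sd(x) / mean(x)
  mean(x) + (x - mean(x)) * mod_cv_sketch(cv) / cv
}

# Hypothetical strength values with very low scatter
x <- c(100, 101, 99, 100.5, 99.5)
x_mod <- rescale_to_cv_sketch(x)
sd(x) / mean(x)          # original CV
sd(x_mod) / mean(x_mod)  # CV after the transformation
```

Only data with a CV below 8% is affected; data that already shows typical scatter passes through unchanged.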
basis_normal calculates the basis value by subtracting \(k\) times the standard deviation from the mean. \(k\) is given by the function k_factor_normal(). The equations in Krishnamoorthy and Mathew (2008) are used.
basis_normal also performs a diagnostic test for outliers (using maximum_normed_residual()) and a diagnostic test for normality (using anderson_darling_normal()). If the argument batch is given, this function also performs a diagnostic test for outliers within each batch (using maximum_normed_residual()) and a diagnostic test for between batch variability (using ad_ksample()). The argument batch is only used for these diagnostic tests.
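The mean-minus-\(k\)-standard-deviations calculation can be sketched in base R. The sketch assumes the standard one-sided normal tolerance factor based on the noncentral t distribution; it is not the package's own code, and k_factor_sketch and basis_normal_sketch are hypothetical names. The strength values are made up for illustration.

```r
# One-sided normal tolerance factor from the noncentral t distribution
# (assumed formula; the package's k_factor_normal() may be implemented
# differently)
k_factor_sketch <- function(n, p = 0.90, conf = 0.95) {
  qt(conf, df = n - 1, ncp = qnorm(p) * sqrt(n)) / sqrt(n)
}

# Basis value: mean minus k times the sample standard deviation
basis_normal_sketch <- function(x, p = 0.90, conf = 0.95) {
  mean(x) - k_factor_sketch(length(x), p, conf) * sd(x)
}

# Hypothetical strength data
strength <- c(137.4, 139.0, 141.5, 136.8, 140.2,
              138.7, 142.1, 139.9, 137.6, 140.8)
basis_normal_sketch(strength)            # B-Basis
basis_normal_sketch(strength, p = 0.99)  # A-Basis
```

Because the A-Basis factor is larger than the B-Basis factor for the same sample size, the A-Basis value is always the lower of the two.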
basis_lognormal calculates the basis value in the same way that basis_normal does, except that the natural logarithm of the data is taken.
basis_lognormal also performs a diagnostic test for outliers (using maximum_normed_residual()) and a diagnostic test for log-normality (using anderson_darling_lognormal()). If the argument batch is given, this function also performs a diagnostic test for outliers within each batch (using maximum_normed_residual()) and a diagnostic test for between batch variability (using ad_ksample()). The argument batch is only used for these diagnostic tests.
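A minimal sketch of the log-transform approach, assuming the same noncentral-t tolerance factor as for the normal case and assuming the result is transformed back to the original scale with exp(). The function names and the data are hypothetical, and this is not the package's implementation.

```r
# One-sided normal tolerance factor (assumed formula)
k_factor_sketch <- function(n, p = 0.90, conf = 0.95) {
  qt(conf, df = n - 1, ncp = qnorm(p) * sqrt(n)) / sqrt(n)
}

# Apply the normal method to log(x), then back-transform with exp()
basis_lognormal_sketch <- function(x, p = 0.90, conf = 0.95) {
  lx <- log(x)
  exp(mean(lx) - k_factor_sketch(length(lx), p, conf) * sd(lx))
}

# Hypothetical strength data (must be strictly positive)
strength <- c(137.4, 139.0, 141.5, 136.8, 140.2,
              138.7, 142.1, 139.9, 137.6, 140.8)
basis_lognormal_sketch(strength)
```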
basis_weibull calculates the basis value for data distributed according to a Weibull distribution. The confidence level for the content requested is calculated using the conditional method, as described in Lawless (1982) Section 4.1.2b. This has good agreement with tables published in CMH-17-1G. Results differ between this function and STAT-17 by approximately 0.5%.
basis_weibull also performs a diagnostic test for outliers (using maximum_normed_residual()) and a diagnostic test for fit to the Weibull distribution (using anderson_darling_weibull()). If the argument batch is given, this function also performs a diagnostic test for outliers within each batch (using maximum_normed_residual()) and a diagnostic test for between batch variability (using ad_ksample()). The argument batch is only used for these diagnostic tests.
basis_hk_ext calculates the basis value using the Extended Hanson–Koopmans method, as described in CMH-17-1G and Vangel (1994). For nonparametric distributions, this function should be used for samples up to \(n=28\) for B-Basis and up to \(n=299\) for A-Basis. This method uses a pair of order statistics to determine the basis value. CMH-17-1G suggests that for A-Basis, the first and last order statistics are used: this is called the "woodward-frawley" method in this package, after the paper in which this approach is described (as referenced by Vangel (1994)). For B-Basis, another approach is used whereby the first and j-th order statistics are used to calculate the basis value. In this approach, the j-th order statistic is selected to minimize the difference between the tolerance limit (assuming that the order statistics are equal to the expected values from a standard normal distribution) and the population quantile for a standard normal distribution. This approach is described in Vangel (1994). This second method (for use when calculating B-Basis values) is called "optimum-order" in this package.
The results of basis_hk_ext have been verified against example results from the program STAT-17. Agreement is typically well within 0.2%.
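To make the "pair of order statistics" idea concrete, the sketch below combines the first and j-th order statistics using the geometric form usually given for the Hanson–Koopmans method in CMH-17-1G, \(x_{(j)} (x_{(1)} / x_{(j)})^z\). That form is an assumption here, hk_basis_sketch is a hypothetical name, and the values of j and z in the example are placeholders rather than real tabulated factors; in the package, the appropriate j and z are determined internally.

```r
# Combine the first and j-th order statistics with a factor z.
# Assumes strictly positive data (for example, strength values).
hk_basis_sketch <- function(x, j, z) {
  xs <- sort(x)
  xs[j] * (xs[1] / xs[j])^z
}

# Hypothetical data; j and z are illustrative placeholders only
strength <- c(139.6, 137.4, 141.5, 136.8, 140.2, 138.7)
hk_basis_sketch(strength, j = 6, z = 1.5)
```

With z = 1 the result is exactly the smallest observation; a factor above 1 extrapolates below it, which is how this method can produce basis values from samples too small for the large-sample rank method.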
Note that the implementation of hk_ext_z_j_opt changed after cmstatr version 0.8.0. This function is used internally by basis_hk_ext when method = "optimum-order". As a result, basis values computed using this method may change slightly after version 0.8.0. However, both implementations seem to be equally valid. See the included vignette for a discussion of the differences between the implementation before and after version 0.8.0, as well as the factors given in CMH-17-1G. To access this vignette, run: vignette("hk_ext", package = "cmstatr")
basis_hk_ext also performs a diagnostic test for outliers (using maximum_normed_residual()) and performs a pair of tests that the sample size and method selected follow the guidance described above. If the argument batch is given, this function also performs a diagnostic test for outliers within each batch (using maximum_normed_residual()) and a diagnostic test for between batch variability (using ad_ksample()). The argument batch is only used for these diagnostic tests.
basis_nonpara_large_sample calculates the basis value using the large sample method described in CMH-17-1G. This method uses a sum of binomials to determine the rank of the order statistic corresponding with the desired tolerance limit (basis value). Results of this function have been verified against results of the STAT-17 program.
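The rank selection can be sketched with the binomial distribution function in base R. The r-th smallest observation is a valid lower bound when the probability that fewer than r observations fall below the \((1-p)\) quantile is at most \(1 - conf\); the sketch returns the largest such r. The name nonpara_rank_sketch is hypothetical, and the package's own routine may be organized differently.

```r
# Largest rank r such that the r-th order statistic is a lower
# tolerance bound with content p and confidence conf.
# Returns NA when the sample is too small for any rank to qualify.
nonpara_rank_sketch <- function(n, p = 0.90, conf = 0.95) {
  r <- seq_len(n)
  # P(fewer than r observations fall below the (1 - p) quantile)
  ok <- pbinom(r - 1, size = n, prob = 1 - p) <= 1 - conf
  if (!any(ok)) return(NA_integer_)
  max(r[ok])
}

nonpara_rank_sketch(28)            # too small for B-Basis
nonpara_rank_sketch(29)            # smallest n with a valid B-Basis rank
nonpara_rank_sketch(299, p = 0.99) # smallest n with a valid A-Basis rank
```

The basis value is then the observation at that rank in the sorted sample, for example sort(x)[nonpara_rank_sketch(length(x))].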
basis_nonpara_large_sample also performs a diagnostic test for outliers (using maximum_normed_residual()) and performs a test that the sample size is sufficiently large. If the argument batch is given, this function also performs a diagnostic test for outliers within each batch (using maximum_normed_residual()) and a diagnostic test for between batch variability (using ad_ksample()). The argument batch is only used for these diagnostic tests.
basis_anova calculates basis values using the ANOVA method. x specifies the data (normally strength) and groups indicates the group corresponding to each observation. This method is described in CMH-17-1G, but when the ratio of between-batch mean square to the within-batch mean square is less than or equal to one, the tolerance factor is calculated based on pooling the data from all groups. This approach is recommended by Vangel (1992) and by Krishnamoorthy and Mathew (2008), and is also implemented by the software CMH17-STATS and STAT-17.
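The mean square ratio that triggers the pooled fallback can be computed from a one-way ANOVA table in base R. This is only a sketch of that check, not the tolerance factor calculation itself; ms_ratio_sketch is a hypothetical name and the data are made up.

```r
# Ratio of between-group to within-group mean square from a
# one-way ANOVA (this equals the usual F statistic)
ms_ratio_sketch <- function(x, groups) {
  tab <- anova(lm(x ~ factor(groups)))
  tab[["Mean Sq"]][1] / tab[["Mean Sq"]][2]
}

# Hypothetical strength data from five batches
strength <- c(138.1, 139.4, 137.7,
              140.2, 141.0, 139.8,
              136.9, 138.3, 137.5,
              139.0, 140.6, 138.8,
              141.3, 139.9, 140.7)
batch <- rep(c("A", "B", "C", "D", "E"), each = 3)

ratio <- ms_ratio_sketch(strength, batch)
ratio <= 1  # TRUE would indicate the pooled tolerance factor applies
```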
This function automatically performs a diagnostic test for outliers within each group (using maximum_normed_residual()) and a test for between group variability (using ad_ksample()) as well as checking that the data contains at least 5 groups. This function has been verified against the results of the STAT-17 program.
basis_pooled_sd calculates basis values by pooling the data from several groups together. x specifies the data (normally strength) and groups indicates the group corresponding to each observation. This method is described in CMH-17-1G and matches the pooling method implemented in ASAP 2008.
basis_pooled_cv calculates basis values by pooling the data from several groups together. x specifies the data (normally strength) and groups indicates the group corresponding to each observation. This method is described in CMH-17-1G.
basis_pooled_sd and basis_pooled_cv both automatically perform a number of diagnostic tests. Using maximum_normed_residual(), they check that there are no outliers within each group and batch (provided that batch is specified). They check the between batch variability using ad_ksample(). They check that there are no outliers within each group (pooling all batches) using maximum_normed_residual(). They check for the normality of the pooled data using anderson_darling_normal(). basis_pooled_sd checks for equality of variance of all data using levene_test(), and basis_pooled_cv checks for equality of variance using levene_test() after transforming the data with normalize_group_mean().
The object returned by these functions includes the named vector diagnostic_results. This contains all of the diagnostic tests performed. The name of each element of the vector corresponds with the name of the diagnostic test. The contents of each element will be "P" if the diagnostic test passed, "F" if the diagnostic test failed, "O" if the diagnostic test was overridden and NA if the diagnostic test was skipped (typically because an optional argument was not supplied).
The objects produced by the diagnostic tests are included in the named list diagnostic_obj. The name of each element in the list corresponds with the name of the test. This can be useful when evaluating diagnostic test failures.
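As a small illustration of the result codes, the sketch below builds a stand-in for a diagnostic_results vector by hand and picks out the failed and skipped tests. The values are hypothetical and do not come from a real analysis; on an actual result object the same indexing would be applied to res$diagnostic_results.

```r
# Hypothetical stand-in for res$diagnostic_results:
# "P" = passed, "F" = failed, "O" = overridden, NA = skipped
diagnostic_results <- c(
  outliers_within_batch = NA,
  between_batch_variability = NA,
  outliers = "F",
  anderson_darling_normal = "O"
)

# Names of tests that failed (NA entries are excluded explicitly)
failed <- names(diagnostic_results)[
  !is.na(diagnostic_results) & diagnostic_results == "F"
]

# Names of tests that were skipped
skipped <- names(diagnostic_results)[is.na(diagnostic_results)]

failed
skipped
```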
The following list summarizes the diagnostic tests automatically performed by each function.
basis_normal: outliers_within_batch, between_batch_variability, outliers, anderson_darling_normal
basis_lognormal: outliers_within_batch, between_batch_variability, outliers, anderson_darling_lognormal
basis_weibull: outliers_within_batch, between_batch_variability, outliers, anderson_darling_weibull
basis_pooled_cv: outliers_within_batch, between_group_variability, outliers_within_group, pooled_data_normal, normalized_variance_equal
basis_pooled_sd: outliers_within_batch, between_group_variability, outliers_within_group, pooled_data_normal, pooled_variance_equal
basis_hk_ext: outliers_within_batch, between_batch_variability, outliers, sample_size
basis_nonpara_large_sample: outliers_within_batch, between_batch_variability, outliers, sample_size
basis_anova: outliers_within_group, equality_of_variance, number_of_groups
J. F. Lawless, Statistical Models and Methods for Lifetime Data. New York: John Wiley & Sons, 1982.
“Composite Materials Handbook, Volume 1. Polymer Matrix Composites Guideline for Characterization of Structural Materials,” SAE International, CMH-17-1G, Mar. 2012.
M. Vangel, “One-Sided Nonparametric Tolerance Limits,” Communications in Statistics - Simulation and Computation, vol. 23, no. 4. pp. 1137–1154, 1994.
K. Krishnamoorthy and T. Mathew, Statistical Tolerance Regions: Theory, Applications, and Computation. Hoboken: John Wiley & Sons, 2008.
W. Meeker, G. Hahn, and L. Escobar, Statistical Intervals: A Guide for Practitioners and Researchers, Second Edition. Hoboken: John Wiley & Sons, 2017.
M. Vangel, “New Methods for One-Sided Tolerance Limits for a One-Way Balanced Random-Effects ANOVA Model,” Technometrics, vol. 34, no. 2. Taylor & Francis, pp. 176–185, 1992.
library(cmstatr)
library(dplyr)
# A single-point basis value can be calculated as follows.
# In this example, three failed diagnostic tests are
# overridden.
res <- carbon.fabric %>%
  filter(test == "FC") %>%
  filter(condition == "RTD") %>%
  basis_normal(strength, batch,
               override = c("outliers",
                            "outliers_within_batch",
                            "anderson_darling_normal"))
print(res)
#>
#> Call:
#> basis_normal(data = ., x = strength, batch = batch, override = c("outliers",
#> "outliers_within_batch", "anderson_darling_normal"))
#>
#> Distribution: Normal ( n = 18 )
#> The following diagnostic tests were overridden:
#> `outliers`,
#> `outliers_within_batch`,
#> `anderson_darling_normal`
#> B-Basis: ( p = 0.9 , conf = 0.95 )
#> 76.94656
#>
print(res$diagnostic_obj$between_batch_variability)
#>
#> Call:
#> ad_ksample(x = x, groups = batch, alpha = 0.025)
#>
#> N = 18 k = 3
#> ADK = 1.73 p-value = 0.52151
#> Conclusion: Samples come from the same distribution ( alpha = 0.025 )
#>
# A set of pooled basis values can also be calculated
# using the pooled standard deviation method, as follows.
# In this example, one failed diagnostic test is overridden.
carbon.fabric %>%
  filter(test == "WT") %>%
  basis_pooled_sd(strength, condition, batch,
                  override = c("outliers_within_batch"))
#>
#> Call:
#> basis_pooled_sd(data = ., x = strength, groups = condition, batch = batch,
#> override = c("outliers_within_batch"))
#>
#> Distribution: Normal - Pooled Standard Deviation ( n = 54, r = 3 )
#> The following diagnostic tests were overridden:
#> `outliers_within_batch`
#> B-Basis: ( p = 0.9 , conf = 0.95 )
#> CTD 127.6914
#> ETW 125.0698
#> RTD 132.1457
#>