Anderson–Darling test for goodness of fit — anderson

Calculates the Anderson–Darling test statistic for a sample given a particular distribution, and determines whether to reject the hypothesis that a sample is drawn from that distribution.

anderson_darling_normal(data = NULL, x, alpha = 0.05)

anderson_darling_lognormal(data = NULL, x, alpha = 0.05)

anderson_darling_weibull(data = NULL, x, alpha = 0.05)

Arguments

data: a data.frame-like object (optional)
x: a numeric vector or a variable in the data.frame
alpha: the required significance level of the test. Defaults to 0.05.

Value

An object of class anderson_darling with the following fields:

call: the expression used to call this function
dist: the distribution used
data: a copy of the data analyzed
n: the number of observations in the sample
A: the Anderson–Darling test statistic
osl: the observed significance level (p-value), assuming the parameters of the distribution are estimated from the data
alpha: the required significance level for the test, as given by the user
reject_distribution: a logical value indicating whether the hypothesis that the data is drawn from the specified distribution should be rejected

Details

The Anderson–Darling test statistic is calculated for the distribution given by the user.

The observed significance level (OSL), or p-value, is calculated assuming that the parameters of the distribution are unknown; these parameters are estimated from the data.

The function anderson_darling_normal computes the Anderson–Darling test statistic given a normal distribution with mean and standard deviation equal to the sample mean and standard deviation.
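As a rough illustration of the normal case, the sketch below computes the unmodified statistic in base R using the usual computational form of the Anderson–Darling statistic. The helper name ad_statistic_normal and the sample values are hypothetical and not part of the package, and the package's internal implementation may differ in detail.

```r
# Hypothetical helper (not part of the package): unmodified Anderson-Darling
# statistic for a normal distribution whose mean and standard deviation
# are estimated from the sample.
ad_statistic_normal <- function(x) {
  x <- sort(x)
  n <- length(x)
  # Fitted normal CDF evaluated at the ordered observations
  u <- pnorm(x, mean = mean(x), sd = sd(x))
  i <- seq_len(n)
  # A = -n - (1/n) * sum((2i - 1) * [log(u_i) + log(1 - u_(n + 1 - i))])
  -n - sum((2 * i - 1) * (log(u) + log(1 - rev(u)))) / n
}

# Made-up data, for illustration only
x_example <- c(10.2, 9.8, 11.1, 10.5, 9.9, 10.7, 10.0, 10.4)
a_example <- ad_statistic_normal(x_example)
```

Because the mean and standard deviation are re-estimated from the sample, the statistic is unchanged when the data are shifted or rescaled by a positive factor.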

The function anderson_darling_lognormal is the same as anderson_darling_normal except that the data is log-transformed first.

The function anderson_darling_weibull computes the Anderson–Darling test statistic given a Weibull distribution with shape and scale parameters estimated from the data using a maximum likelihood estimate.

The test statistic, A, is modified to account for the fact that the population parameters are not known but are instead estimated from the sample. The modification depends only on the sample size and differs by distribution (normal/lognormal or Weibull). Several such modifications have been proposed. These functions use the modification published in Stephens (1974), Lawless (1982) and CMH-17-1G. Other implementations of the Anderson–Darling test, such as the one in the nortest package, use different modifications, such as the one published in D'Agostino and Stephens (1986). As a result, the p-value reported by these functions may differ from the p-value reported by implementations that use a different modification. The result reports only the unmodified test statistic; the modified statistic is used internally to compute the OSL (p-value).

This function uses the formulae for observed significance level (OSL) published in CMH-17-1G. These formulae depend on the particular distribution used.
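For the normal case, the calculation might be sketched as follows. The sample-size correction and the coefficients are quoted from memory of CMH-17-1G rather than from this page, so they should be checked against the handbook. The helper name osl_normal is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): OSL for the normal case.
# The correction factor and coefficients are assumed to follow CMH-17-1G;
# verify them against the handbook before relying on them.
osl_normal <- function(A, n) {
  # Modified statistic: sample-size correction applied to the unmodified A
  a_star <- (1 + 4 / n - 25 / n^2) * A
  # Observed significance level computed from the modified statistic
  1 / (1 + exp(-0.48 + 0.78 * log(a_star) + 4.58 * a_star))
}

# Values taken from the example output on this page
osl_example <- osl_normal(A = 0.9224776, n = 18)
```

With A = 0.9224776 and n = 18, as in the example output on this page, the sketch gives an OSL of about 0.0121, in line with the reported value.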

The results of this function have been validated against published values in Lawless (1982).

References

J. F. Lawless, Statistical models and methods for lifetime data. New York: Wiley, 1982.

"Composite Materials Handbook, Volume 1. Polymer Matrix Composites Guideline for Characterization of Structural Materials," SAE International, CMH-17-1G, Mar. 2012.

M. A. Stephens, “EDF Statistics for Goodness of Fit and Some Comparisons,” Journal of the American Statistical Association, vol. 69, no. 347, pp. 730–737, 1974.

R. D’Agostino and M. Stephens, Goodness-of-Fit Techniques. New York: Marcel Dekker, 1986.

Examples

library(dplyr)

carbon.fabric %>%
  filter(test == "FC") %>%
  filter(condition == "RTD") %>%
  anderson_darling_normal(strength)
#> 
#> Call:
#> anderson_darling_normal(data = ., x = strength)
#> 
#> Distribution:  Normal ( n = 18 ) 
#> Test statistic:  A = 0.9224776 
#> OSL (p-value):  0.01212193  (assuming unknown parameters)
#> Conclusion: Sample is not drawn from a Normal distribution ( alpha = 0.05 )