vignettes/adktest.Rmd
This vignette explores the Anderson–Darling k-Sample test. CMH-17-1G [1] provides a formulation for this test that appears different from the formulation given by Scholz and Stephens in their 1987 paper [2].
The two references use different nomenclature, which is summarized as follows:
Term | CMH-17-1G | Scholz and Stephens |
---|---|---|
A sample | $i$ | $i$ |
The number of samples | $k$ | $k$ |
An observation within a sample | $j$ | $j$ |
The number of observations within the sample $i$ | $n_i$ | $n_i$ |
The total number of observations within all samples | $n$ | $N$ |
Distinct values in combined data, ordered | $z_{(1)}$…$z_{(L)}$ | $Z_1^*$…$Z_L^*$ |
The number of distinct values in the combined data | $L$ | $L$ |
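Given the possibility of ties in the data, the discrete version of the test must be used. Scholz and Stephens (1987) give the test statistic as:

$$A_{akN}^2 = \frac{N-1}{N}\sum_{i=1}^{k}\frac{1}{n_i}\sum_{j=1}^{L}\frac{l_j}{N}\,\frac{\left(N M_{aij} - n_i B_{aj}\right)^2}{B_{aj}\left(N - B_{aj}\right) - N l_j/4}$$

CMH-17-1G gives the test statistic as:

$$ADK = \frac{n-1}{n^2\left(k-1\right)}\sum_{i=1}^{k}\frac{1}{n_i}\sum_{j=1}^{L} h_j\,\frac{\left(n F_{ij} - n_i H_j\right)^2}{H_j\left(n - H_j\right) - n h_j/4}$$

Here $l_j$ (or $h_j$) is the number of pooled observations equal to the $j$th distinct value, $B_{aj}$ (or $H_j$) is the number of pooled observations less than that value plus half the number equal to it, and $M_{aij}$ (or $F_{ij}$) is the same count taken within sample $i$.

By inspection, the CMH-17-1G version of this test statistic contains an extra factor of $\frac{1}{(k-1)}$.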
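To make the relationship between the two statistics concrete, the following is a minimal pure-Python sketch of the discrete (tie-adjusted) statistic as formulated by Scholz and Stephens, together with the CMH-17-1G value obtained by dividing by $(k-1)$. The function names `ad_k_sample_statistic` and `adk_cmh17` are hypothetical and are not part of `cmstatr`, `kSamples` or SciPy; the sketch is meant for illustration only, not as a replacement for those packages.

```python
from collections import Counter


def ad_k_sample_statistic(samples):
    """Discrete Anderson-Darling k-sample statistic (Scholz and Stephens form).

    `samples` is a list of k lists of numeric observations.
    """
    k = len(samples)
    if k < 2:
        raise ValueError("at least two samples are required")
    total_n = sum(len(s) for s in samples)   # N: all observations
    pooled = Counter(x for s in samples for x in s)
    distinct = sorted(pooled)                # ordered distinct pooled values
    if len(distinct) < 2:
        raise ValueError("at least two distinct values are required")

    outer = 0.0
    for s in samples:
        n_i = len(s)
        counts = Counter(s)
        below_in_sample = 0   # observations in this sample below the current value
        below_in_pool = 0     # pooled observations below the current value
        inner = 0.0
        for z in distinct:
            l_j = pooled[z]              # pooled observations equal to z
            f_ij = counts.get(z, 0)      # observations in this sample equal to z
            m = below_in_sample + f_ij / 2
            b = below_in_pool + l_j / 2
            inner += (l_j / total_n) * (total_n * m - n_i * b) ** 2 / (
                b * (total_n - b) - total_n * l_j / 4
            )
            below_in_sample += f_ij
            below_in_pool += l_j
        outer += inner / n_i
    return (total_n - 1) / total_n * outer


def adk_cmh17(samples):
    """CMH-17-1G form: the same sum with the extra factor 1 / (k - 1)."""
    return ad_k_sample_statistic(samples) / (len(samples) - 1)


# Illustrative data only (contains ties across samples).
example = [[1.0, 2.5, 3.1], [2.5, 4.0], [0.5, 3.1, 5.2, 6.0]]
a2 = ad_k_sample_statistic(example)
adk = adk_cmh17(example)
```

For two samples of one observation each, such as `[[1], [2]]`, the sketch returns exactly 1, which equals $k - 1$, the mean of the statistic under the null hypothesis.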
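Writing $A_{akN}^2$ for their test statistic and $\sigma_N$ for its standard deviation under the null hypothesis, Scholz and Stephens indicate that one rejects $H_0$ at a significance level of $\alpha$ when:

$$\frac{A_{akN}^2 - \left(k-1\right)}{\sigma_N} \ge t_{k-1}\left(\alpha\right)$$

This can be rearranged to give a critical value:

$$A_{crit}^2 = \left(k-1\right) + \sigma_N\, t_{k-1}\left(\alpha\right)$$

CMH-17-1G gives the critical value for $ADK$ at $\alpha = 0.025$ as:

$$ADC = 1 + \sigma_n\left(1.96 + \frac{1.149}{\sqrt{k-1}} - \frac{0.391}{k-1}\right)$$

The definitions of $\sigma_n$ in the two sources differ by a factor of $\left(k-1\right)$.

The value in parentheses in the CMH-17-1G critical value corresponds to the interpolation formula for $t_m\left(\alpha\right)$ given in Scholz and Stephens's paper. Note that this is not Student's t-distribution, but rather a distribution referred to as the $T_m$ distribution.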
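The two critical values can be compared numerically. The sketch below assumes the interpolation coefficients for $\alpha = 0.025$ (1.96, 1.149 and −0.391, as they appear in the CMH-17-1G critical value) and assumes that the CMH-17-1G standard deviation is the Scholz and Stephens one divided by $(k-1)$. The function names and the value used for the standard deviation are hypothetical and chosen only for illustration.

```python
import math


def t_m_alpha_0025(m):
    """Interpolated upper 2.5% point of the T_m distribution (not Student's t)."""
    return 1.960 + 1.149 / math.sqrt(m) - 0.391 / m


def critical_value_scholz_stephens(k, sigma_big_n):
    """Critical value on the scale of the Scholz and Stephens statistic."""
    return (k - 1) + sigma_big_n * t_m_alpha_0025(k - 1)


def critical_value_cmh17(k, sigma_small_n):
    """Critical value ADC on the scale of the CMH-17-1G statistic ADK."""
    return 1 + sigma_small_n * t_m_alpha_0025(k - 1)


# Illustrative inputs only: k samples and an assumed standard deviation.
k = 4
sigma_big_n = 1.5
sigma_small_n = sigma_big_n / (k - 1)   # assumed relation between the two sources

a2_crit = critical_value_scholz_stephens(k, sigma_big_n)
adc = critical_value_cmh17(k, sigma_small_n)

# The two critical values differ by the same factor (k - 1) as the statistics.
assert math.isclose(adc * (k - 1), a2_crit)
```

Because the statistic and its critical value are scaled by the same factor, comparing one against the other gives the same result in either convention.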
The `cmstatr` package uses the `kSamples` package to perform the k-sample Anderson–Darling tests. This package uses the original formulation from Scholz and Stephens, so the test statistic will differ from that given by software based on the CMH-17-1G formulation by a factor of $\left(k-1\right)$.
For comparison, SciPy's implementation (`scipy.stats.anderson_ksamp`) also uses the original Scholz and Stephens formulation. The statistic that it returns, however, is the normalized statistic, $\left[A_{akN}^2 - \left(k-1\right)\right] / \sigma_N$, rather than `kSamples`'s $A_{akN}^2$ value. To be consistent, SciPy also returns the critical values $t_{k-1}\left(\alpha\right)$ directly. (Currently, SciPy also floors/caps the returned p-value at 0.1% / 25%.) The values of $k$ and $\sigma_N$ are available in `cmstatr`'s `ad_ksample` return value, if an exact comparison to SciPy is necessary.
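The conversions described above amount to simple arithmetic. The following sketch converts a statistic on the Scholz and Stephens scale (as reported by R) to the CMH-17-1G scale and to the normalized scale that SciPy reports. The helper names and the numbers are hypothetical and for illustration only; in practice the statistic, the number of samples and the standard deviation would be taken from the R result.

```python
def to_cmh17(a2, k):
    """Scholz and Stephens statistic -> CMH-17-1G statistic (divide by k - 1)."""
    return a2 / (k - 1)


def to_scipy_normalized(a2, k, sigma_big_n):
    """Scholz and Stephens statistic -> normalized statistic (center and scale)."""
    return (a2 - (k - 1)) / sigma_big_n


# Illustrative values only, not the output of any real data set.
a2 = 4.0            # statistic on the Scholz and Stephens scale
k = 3               # number of samples
sigma_big_n = 1.25  # standard deviation of the statistic under the null

adk = to_cmh17(a2, k)                                # 4.0 / 2 = 2.0
normalized = to_scipy_normalized(a2, k, sigma_big_n)  # (4.0 - 2) / 1.25 = 1.6
```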
Whichever formulation is used (R, CMH-17-1G or SciPy), the conclusions drawn about the null hypothesis will be the same.
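The reason the conclusions agree can be shown in a few lines. This is a short derivation under the relations described above, namely that the CMH-17-1G statistic is $ADK = A_{akN}^2 / (k-1)$ and that its standard deviation is $\sigma_n = \sigma_N / (k-1)$, with $k \ge 2$ and $\sigma_N > 0$:

$$
\begin{aligned}
\frac{A_{akN}^2 - \left(k-1\right)}{\sigma_N} \ge t_{k-1}\left(\alpha\right)
&\iff A_{akN}^2 \ge \left(k-1\right) + \sigma_N\, t_{k-1}\left(\alpha\right) \\
&\iff \frac{A_{akN}^2}{k-1} \ge 1 + \frac{\sigma_N}{k-1}\, t_{k-1}\left(\alpha\right) \\
&\iff ADK \ge 1 + \sigma_n\, t_{k-1}\left(\alpha\right)
\end{aligned}
$$

The first line is the comparison made on the normalized scale that SciPy reports, the second is the comparison on the Scholz and Stephens scale, and the last is the CMH-17-1G comparison of $ADK$ against its critical value. Each step only adds a constant or divides by a positive quantity, so all three reject the null hypothesis for exactly the same data.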