This vignette explores the Anderson–Darling k-Sample test. CMH-17-1G [1] provides a formulation for this test that appears different than the formulation given by Scholz and Stephens in their 1987 paper [2].

The two references use different nomenclature, summarized as follows:

| Term | CMH-17-1G | Scholz and Stephens |
|------|-----------|---------------------|
| A sample | $i$ | $i$ |
| The number of samples | $k$ | $k$ |
| An observation within a sample | $j$ | $j$ |
| The number of observations within the sample $i$ | $n_i$ | $n_i$ |
| The total number of observations within all samples | $n$ | $N$ |
| Distinct values in combined data, ordered | $z_{(1)}, \ldots, z_{(L)}$ | $Z_1^*, \ldots, Z_L^*$ |
| The number of distinct values in the combined data | $L$ | $L$ |

Given the possibility of ties in the data, the discrete version of the test must be used. Scholz and Stephens (1987) give the test statistic as:

$$ A_{a k N}^2 = \frac{N - 1}{N}\sum_{i=1}^k \frac{1}{n_i}\sum_{j=1}^{L}\frac{l_j}{N}\frac{\left(N M_{a i j} - n_i B_{a j}\right)^2}{B_{a j}\left(N - B_{a j}\right) - N l_j / 4} $$

CMH-17-1G gives the test statistic as:

$$ ADK = \frac{n - 1}{n^2\left(k - 1\right)}\sum_{i=1}^k\frac{1}{n_i}\sum_{j=1}^L h_j \frac{\left(n F_{i j} - n_i H_j\right)^2}{H_j \left(n - H_j\right) - n h_j / 4} $$

By inspection, the CMH-17-1G version of this test statistic contains an extra factor of $\frac{1}{\left(k - 1\right)}$.
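The factor of $\frac{1}{\left(k - 1\right)}$ can also be checked numerically. The following is a minimal Python sketch, not taken from either reference or from any package. It assumes the definitions from Scholz and Stephens that this vignette does not restate: $l_j$ is the number of pooled observations equal to $Z_j^*$, $f_{ij}$ is the number of observations in sample $i$ equal to $Z_j^*$, $M_{aij} = f_{i1} + \cdots + f_{i,j-1} + f_{ij}/2$ and $B_{aj} = l_1 + \cdots + l_{j-1} + l_j/2$. It also assumes that $h_j$, $F_{ij}$ and $H_j$ in CMH-17-1G play the same roles. The function names are hypothetical.

```python
from collections import Counter


def double_sum(samples):
    """Double sum shared by both formulations (factor 1/N pulled out).

    Returns (S, N) where
    S = sum_i (1/n_i) sum_j l_j (N*M_aij - n_i*B_aj)^2
        / (B_aj*(N - B_aj) - N*l_j/4).
    """
    pooled = [x for sample in samples for x in sample]
    N = len(pooled)
    pooled_counts = Counter(pooled)   # l_j (h_j in CMH-17-1G)
    distinct = sorted(pooled_counts)  # Z_1* < ... < Z_L*
    if len(distinct) < 2:
        raise ValueError("need at least two distinct values in the pooled data")
    total = 0.0
    for sample in samples:
        n_i = len(sample)
        sample_counts = Counter(sample)  # f_ij; missing values count as 0
        M_prev = 0  # f_i1 + ... + f_i,j-1
        B_prev = 0  # l_1 + ... + l_j-1
        inner = 0.0
        for z in distinct:
            l_j = pooled_counts[z]
            f_ij = sample_counts[z]
            M = M_prev + f_ij / 2  # assumed definition of M_aij
            B = B_prev + l_j / 2   # assumed definition of B_aj
            inner += l_j * (N * M - n_i * B) ** 2 / (B * (N - B) - N * l_j / 4)
            M_prev += f_ij
            B_prev += l_j
        total += inner / n_i
    return total, N


def a2_scholz_stephens(samples):
    """A_akN^2 as written by Scholz and Stephens."""
    S, N = double_sum(samples)
    return (N - 1) / N**2 * S


def adk_cmh17(samples):
    """ADK as written in CMH-17-1G (with n = N)."""
    S, n = double_sum(samples)
    k = len(samples)
    return (n - 1) / (n**2 * (k - 1)) * S
```

For any input with $k$ samples, `a2_scholz_stephens(samples)` should equal `(k - 1) * adk_cmh17(samples)` up to floating-point rounding.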

Scholz and Stephens indicate that one rejects $H_0$ at a significance level of $\alpha$ when:

$$ \frac{A_{a k N}^2 - \left(k - 1\right)}{\sigma_N} \ge t_{k - 1}\left(\alpha\right) $$

This can be rearranged to give a critical value:

$$ A_{c r i t}^2 = \left(k - 1\right) + \sigma_N t_{k - 1}\left(\alpha\right) $$

CMH-17-1G gives the critical value for $ADK$ for $\alpha=0.025$ as:

$$ ADC = 1 + \sigma_n \left(1.96 + \frac{1.149}{\sqrt{k - 1}} - \frac{0.391}{k - 1}\right) $$

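The definitions of $\sigma_n$ (CMH-17-1G) and $\sigma_N$ (Scholz and Stephens) differ by a factor of $\left(k - 1\right)$: for the two rejection criteria to agree, $\sigma_n = \sigma_N / \left(k - 1\right)$, which matches the factor of $\frac{1}{\left(k - 1\right)}$ in the test statistic.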

The value in parentheses in the CMH-17-1G critical value corresponds to the interpolation formula for $t_m\left(\alpha\right)$ given in Scholz and Stephens's paper, evaluated at $m = k - 1$ and $\alpha = 0.025$. It should be noted that this is not Student's t-distribution, but rather a distribution referred to as the $T_m$ distribution.
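The two critical values can be compared with a short Python sketch. It uses only the formulas quoted above. The function names are hypothetical, and $\sigma_N$ is taken as an input rather than computed, since its formula is not reproduced in this vignette. The sketch assumes that the CMH-17-1G value of $\sigma_n$ is the Scholz and Stephens value divided by $\left(k - 1\right)$, which is the direction required for the two rejection rules to agree.

```python
import math


def t_m_alpha_0025(m):
    """Interpolated critical point t_m(alpha) for alpha = 0.025.

    The coefficients are those appearing in the CMH-17-1G expression for ADC.
    """
    return 1.96 + 1.149 / math.sqrt(m) - 0.391 / m


def critical_values(k, sigma_N):
    """Return (A_crit^2, ADC) for k samples at alpha = 0.025.

    sigma_N is the Scholz and Stephens standard deviation (supplied by caller).
    """
    t = t_m_alpha_0025(k - 1)
    a2_crit = (k - 1) + sigma_N * t  # Scholz and Stephens critical value
    sigma_n = sigma_N / (k - 1)      # assumed CMH-17-1G scaling
    adc = 1 + sigma_n * t            # CMH-17-1G critical value
    return a2_crit, adc


# Illustrative inputs only: k = 3 and a made-up sigma_N.
a2_crit, adc = critical_values(k=3, sigma_N=1.2)
print(a2_crit, adc)
```

Under that assumption, `a2_crit` equals `(k - 1) * adc`, so comparing $A_{akN}^2$ against $A_{crit}^2$ and comparing $ADK$ against $ADC$ lead to the same decision.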

The cmstatr package uses the kSamples package to perform the k-sample Anderson–Darling tests. kSamples uses the original formulation from Scholz and Stephens, so the test statistic will differ from that given by software based on the CMH-17-1G formulation by a factor of $\left(k-1\right)$.

For comparison, SciPy’s implementation also uses the original Scholz and Stephens formulation. The statistic that it returns, however, is the normalized statistic, $\left[A_{a k N}^2 - \left(k - 1\right)\right] / \sigma_N$, rather than kSamples’s $A_{a k N}^2$ value. To be consistent, SciPy also returns the critical values $t_{k-1}(\alpha)$ directly. (Currently, SciPy also floors/caps the returned p-value at 0.1% / 25%.) The values of $k$ and $\sigma_N$ are available in cmstatr’s ad_ksample return value, if an exact comparison to SciPy is necessary.

The conclusion drawn about the null hypothesis, however, will be the same whether the test is performed in R (cmstatr and kSamples), with software based on CMH-17-1G, or with SciPy.
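If a numerical comparison with SciPy is needed, the conversion between the two scales is a one-line calculation. The sketch below is illustrative only: the function names are hypothetical, and the inputs `k` and `sigma_N` are assumed to be read from cmstatr's ad_ksample result by the user. The value on the SciPy side corresponds to the `statistic` returned by `scipy.stats.anderson_ksamp`.

```python
def to_normalized(a2, k, sigma_N):
    """Convert A_akN^2 (kSamples scale) to the normalized statistic."""
    return (a2 - (k - 1)) / sigma_N


def from_normalized(t_stat, k, sigma_N):
    """Convert a normalized statistic back to the A_akN^2 scale."""
    return (k - 1) + sigma_N * t_stat
```

The same `from_normalized` mapping turns a critical value $t_{k-1}(\alpha)$ into $A_{crit}^2$.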

References

[1] “Composite Materials Handbook, Volume 1. Polymer Matrix Composites Guideline for Characterization of Structural Materials,” SAE International, CMH-17-1G, Mar. 2012.

[2] F. W. Scholz and M. A. Stephens, “K-Sample Anderson–Darling Tests,” Journal of the American Statistical Association, vol. 82, no. 399, pp. 918–924, Sep. 1987.