| Title: | Robust Change-Point Tests |
|---|---|
| Description: | Provides robust methods to detect change-points in uni- or multivariate time series. They can cope with corrupted data and heavy tails. Focus is on the detection of abrupt changes in location, but changes in the scale or dependence structure can be detected as well. This package provides tests for change detection in uni- and multivariate time series based on Huberized versions of CUSUM tests proposed in Duerre and Fried (2019) <DOI:10.48550/arXiv.1905.06201>, and tests for change detection in univariate time series based on 2-sample U-statistics or 2-sample U-quantiles as proposed by Dehling et al. (2015) <DOI:10.1007/978-1-4939-3076-0_12> and Dehling, Fried and Wendler (2020) <DOI:10.1093/biomet/asaa004>. Furthermore, the packages provides tests on changes in the scale or the correlation as proposed in Gerstenberger, Vogel and Wendler (2020) <DOI:10.1080/01621459.2019.1629938>, Dehling et al. (2017) <DOI:10.1017/S026646661600044X>, and Wied et al. (2014) <DOI:10.1016/j.csda.2013.03.005>. |
| Authors: | Sheila Goerz [aut, cre], Alexander Duerre [aut], Roland Fried [ctb, ths] |
| Maintainer: | Sheila Goerz <[email protected]> |
| License: | GPL-3 |
| Version: | 0.3.10 |
| Built: | 2026-06-09 05:50:48 UTC |
| Source: | https://github.com/sgoerz/robcp |
The package includes robust change point tests designed to identify structural changes in time series.
As external influences might affect the characteristics of a time series,
such as location, variability, and correlation structure, changes caused by them
and the corresponding time point need to be detected reliably.
Standard methods can struggle when dealing with heavy-tailed data or outliers;
therefore, this package comprises robust methods that effectively manage extreme
values by either ignoring them or assigning them less weight.
Examples of these robust techniques include the Median, Huber M-estimator, mean deviation, Gini's mean difference, and .
The package contains the following tests and test statistics:
Tests on changes in the location
huber_cusum (test), CUSUM (CUSUM test statistic), psi (transformation function).hl_test (test), HodgesLehmann (test statistic).wmw_test (test), wilcox_stat (test statistic).Tests on changes in the variability
:scale_cusum (test), scale_stat (test statistic).Tests on changes in the correlation
cor_cusum (test), cor_stat (test statistic).Maintainer: Sheila Görz [email protected]
Author: Alexander Dürre [email protected]
Thesis Advisor: Roland Fried [email protected]
Performs a CUSUM-based test on changes in Spearman's rho or Kendall's tau.
cor_cusum(x, version = c("tau", "rho"), method = "kernel", control = list(), fpc = TRUE, tol = 1e-08, plot = FALSE)cor_cusum(x, version = c("tau", "rho"), method = "kernel", control = list(), fpc = TRUE, tol = 1e-08, plot = FALSE)
x |
time series (matrix or ts object with numeric/integer values). |
version |
version of the test. Either |
method |
method for estimating the long run variance. |
control |
a list of control parameters. |
fpc |
finite population correction (boolean). |
tol |
tolerance of the distribution function (numeric), which is used do compute p-values. |
plot |
should the test statistic be plotted (cf. |
The function perform a CUSUM-type test on changes in the correlation of a time series . Formally, the hypothesis pair can be written as
where is a fluctuation measure (either Spearman's rho or Kendall's tau) for the -th observations and is the length of the time series. is called a 'change point'.
The test statistic is computed using cor_stat and asymptotically follows a Kolmogorov distribution. To derive the p-value, the funtion pKSdist is used.
A list of the class "htest" containing the following components:
statistic |
return value of the function |
p.value |
p-value (numeric). |
alternative |
alternative hypothesis (character string). |
method |
name of the performed test (character string). |
cp.location |
index of the estimated change point location (integer). |
data.name |
name of the data (character string). |
lrv |
list containing the compontents |
Sheila Görz
Wied, D., Dehling, H., Van Kampen, M., and Vogel, D. (2014). A fluctuation test for constant Spearman’s rho with nuisance-free limit distribution. Computational Statistics & Data Analysis, 76, 723-736.
Dürre, A. (2022+). "Finite sample correction for cusum tests", unpublished manuscript
### first: generate a time series with a burn-in period of m and a change point k = n/2 require(mvtnorm) n <- 500 m <- 100 N <- n + m k <- m + floor(n * 0.5) n1 <- N - k ## Spearman's rho: rho <- c(0.4, -0.9) # serial dependence: theta1 <- 0.3 theta2 <- 0.2 theta <- cbind(c(theta1, 0), c(0, theta2)) q <- rho * sqrt( (theta1^2 + 1) * (theta2^2 + 1) / (theta1 * theta2 + 1)) # shape matrices of the innovations: S0 <- cbind(c(1, q[1]), c(q[1], 1)) S1 <- cbind(c(1, q[2]), c(q[2], 1)) e0 <- rmvt(k, S0, 5) e1 <- rmvt(n1, S1, 5) e <- rbind(e0, e1) # generate the data: x <- matrix(numeric(N * 2), ncol = 2) x[1, ] <- e[1, ] invisible(sapply(2:N, function(i) x[i, ] <<- e[i, ] + theta %*% e[i-1, ])) x <- x[-(1:m), ] cor_cusum(x, "rho") ## Kendall's tau S0 <- cbind(c(1, rho[1]), c(rho[1], 1)) S1 <- cbind(c(1, rho[2]), c(rho[2], 1)) e0 <- rmvt(k, S0, 5) e1 <- rmvt(n1, S1, 5) e <- rbind(e0, e1) x <- matrix(numeric(N * 2), ncol = 2) x[1, ] <- e[1, ] # AR(1): invisible(sapply(2:N, function(i) x[i, ] <<- 0.8 * x[i-1, ] + e[i, ])) x <- x[-(1:m), ] cor_cusum(x, version = "tau")### first: generate a time series with a burn-in period of m and a change point k = n/2 require(mvtnorm) n <- 500 m <- 100 N <- n + m k <- m + floor(n * 0.5) n1 <- N - k ## Spearman's rho: rho <- c(0.4, -0.9) # serial dependence: theta1 <- 0.3 theta2 <- 0.2 theta <- cbind(c(theta1, 0), c(0, theta2)) q <- rho * sqrt( (theta1^2 + 1) * (theta2^2 + 1) / (theta1 * theta2 + 1)) # shape matrices of the innovations: S0 <- cbind(c(1, q[1]), c(q[1], 1)) S1 <- cbind(c(1, q[2]), c(q[2], 1)) e0 <- rmvt(k, S0, 5) e1 <- rmvt(n1, S1, 5) e <- rbind(e0, e1) # generate the data: x <- matrix(numeric(N * 2), ncol = 2) x[1, ] <- e[1, ] invisible(sapply(2:N, function(i) x[i, ] <<- e[i, ] + theta %*% e[i-1, ])) x <- x[-(1:m), ] cor_cusum(x, "rho") ## Kendall's tau S0 <- cbind(c(1, rho[1]), c(rho[1], 1)) S1 <- cbind(c(1, rho[2]), c(rho[2], 1)) e0 <- rmvt(k, S0, 5) e1 <- rmvt(n1, S1, 5) e <- rbind(e0, e1) x <- matrix(numeric(N * 2), ncol = 2) x[1, ] <- e[1, ] # AR(1): invisible(sapply(2:N, function(i) x[i, ] <<- 0.8 * x[i-1, ] + e[i, ])) x <- x[-(1:m), ] cor_cusum(x, version = "tau")
Computes the test statistic for a CUSUM-based tests on changes in Spearman's rho or Kendall's tau.
cor_stat(x, version = c("tau", "rho"), method = "kernel", control = list())cor_stat(x, version = c("tau", "rho"), method = "kernel", control = list())
x |
time series (numeric or ts vector). |
version |
version of the test. Either |
method |
methods of long run variance estimation. Options are |
control |
a list of control parameters. |
Let be the length of the time series, i.e. the number of rows in x. In general, the (scaled) CUSUM test statistic is defined as
where is an estimator for the property on which to test, and is an estimator for the square root of the corresponding long run variance (cf. lrv).
If version = "tau", the function tests if the correlation between and of the bivariate time series stays constant for all by considering Kendall's tau. Therefore, is the the sample version of Kendall's tau:
The default bandwidth for the kernel-based long run variance estimation is and the default kernel function is the quatratic kernel.
If version = "rho", the function tests if the correlation of a time series of an arbitrary dimension (>= 2) stays constant by considering a multivariate version of Spearman's rho. Therefore, is the sample version of Spearman's rho:
where (rank of in and . Here it is essential to use instead of . The default bandwidth for the kernel-based long run variance estimation is and the default kernel function is the Bartlett kernel.
Test statistic (numeric value) with the following attributes:
cp-location |
indicating at which index a change point is most likely. |
teststat |
test process (before taking the maximum). |
lrv-estimation |
long run variance estimation method. |
sigma |
estimated long run variance. |
param |
parameter used for the lrv estimation. |
kFun |
kernel function used for the lrv estimation. |
Is an S3 object of the class "cpStat".
Sheila Görz
Wied, D., Dehling, H., Van Kampen, M., and Vogel, D. (2014). A fluctuation test for constant Spearman’s rho with nuisance-free limit distribution. Computational Statistics & Data Analysis, 76, 723-736.
Computes the test statistic for the CUSUM change point test.
CUSUM(x, method = "kernel", control = list(), inverse = "Cholesky", ...)CUSUM(x, method = "kernel", control = list(), inverse = "Cholesky", ...)
x |
vector or matrix with each column representing a time series (numeric). |
method |
method of long run variance estimation. Options are |
control |
a list of control parameters for the estimation of the long run variance (cf. |
inverse |
character string specifying the method of inversion. Options are "Cholesky" for inverting over |
... |
further arguments passed to the inverse-computing functions. |
Let n be the length of the time series .
In case of a univariate time series the test statistic can be written as
where is the square root of lrv.
Default method is "kernel" and the default kernel function is "TH". If no bandwidth value is supplied, first the time series is corrected for the estimated change point and Spearman's autocorrelation to lag 1 () is computed. Then the default bandwidth follows as
In case of a multivariate time series the test statistic follows as
where denotes the i-th row of x and is the inverse of lrv.
Test statistic (numeric value) with the following attributes:
cp-location |
indicating at which index a change point is most likely. |
teststat |
test process (before taking the maximum). |
lrv-estimation |
long run variance estimation method. |
sigma |
estimated long run variance. |
param |
parameter used for the lrv estimation. |
kFun |
kernel function used for the lrv estimation. |
Is an S3 object of the class "cpStat".
Sheila Görz
# time series with a location change at t = 20 ts <- c(rnorm(20, 0), rnorm(20, 2)) # Huberized CUSUM change point test statistic CUSUM(psi(ts))# time series with a location change at t = 20 ts <- c(rnorm(20, 0), rnorm(20, 2)) # Huberized CUSUM change point test statistic CUSUM(psi(ts))
Performs the two-sample Hodges-Lehmann change point test.
hl_test(x, b_u = "nrd0", method = "kernel", control = list(), tol = 1e-8, plot = FALSE)hl_test(x, b_u = "nrd0", method = "kernel", control = list(), tol = 1e-8, plot = FALSE)
x |
time series (numeric or ts vector). |
b_u |
bandwidth for |
method |
method for estimating the long run variance. |
control |
a list of control parameters (cf. |
tol |
tolerance of the distribution function (numeric), which is used to compute p-values. |
plot |
should the test statistic be plotted (cf. |
The function performs the two-sample Hodges-Lehmann change point test. It tests the hypothesis pair
where and is the length of the time series. is called a 'change point'.
The test statistic is computed using HodgesLehmann and asymptotically follows a Kolmogorov distribution. To derive the p-value, the function pKSdist is used.
A list of the class "htest" containing the following components:
statistic |
value of the test statistic (numeric). |
p.value |
p-value (numeric). |
alternative |
alternative hypothesis (character string). |
method |
name of the performed test (character string). |
cp.location |
index of the estimated change point location (integer). |
data.name |
name of the data (character string). |
Sheila Görz
Dehling, H., Fried, R., and Wendler, M. "A robust method for shift detection in time series." Biometrika 107.3 (2020): 647-660.
HodgesLehmann, medianDiff, lrv, pKSdist
#time series with a structural break at t = 20 Z <- c(rnorm(20, 0), rnorm(20, 2)) hl_test(Z, control = list(overlapping = TRUE, b_n = 5, b_u = 0.05))#time series with a structural break at t = 20 Z <- c(rnorm(20, 0), rnorm(20, 2)) hl_test(Z, control = list(overlapping = TRUE, b_n = 5, b_u = 0.05))
Computes the test statistic for the Hodges-Lehmann change point test.
HodgesLehmann(x, b_u = "nrd0", method = "kernel", control = list()) u_hat(x, b_u = "nrd0")HodgesLehmann(x, b_u = "nrd0", method = "kernel", control = list()) u_hat(x, b_u = "nrd0")
x |
time series (numeric or |
b_u |
bandwidth for |
method |
method of long run variance estimation. Options are |
control |
a list of control parameters for the estimation of the long run variance (cf. |
Let be the length of the time series. The Hodges-Lehmann test statistic is then computed as
where is the estimated long run variance, computed by the square root of lrv. By default the long run variance is estimated kernel-based with the following bandwidth: first the time series is corrected for the estimated change point and Spearman's autocorrelation to lag 1 () is computed. Then the default block length follows as
is estimated by u_hat on data , where was subtracted from . Then density with the arguments na.rm = TRUE, from = 0, to = 0, n = 1 and bw = b_u is applied to .
HodgesLehmann returns a test statistic (numeric value) with the following attributes:
cp-location |
indicating at which index a change point is most likely. |
teststat |
test process (before taking the maximum). |
lrv-estimation |
long run variance estimation method. |
sigma |
estimated long run variance. |
param |
parameter used for the lrv estimation. |
kFun |
kernel function used for the lrv estimation. |
Is an S3 object of the class "cpStat".
u_hat returns a numeric value.
Sheila Görz
Dehling, H., Fried, R., and Wendler, M. "A robust method for shift detection in time series." Biometrika 107.3 (2020): 647-660.
# time series with a location change at t = 20 x <- c(rnorm(20, 0), rnorm(20, 2)) # Hodges-Lehmann change point test statistic HodgesLehmann(x, b_u = 0.01)# time series with a location change at t = 20 x <- c(rnorm(20, 0), rnorm(20, 2)) # Hodges-Lehmann change point test statistic HodgesLehmann(x, b_u = 0.01)
Performs a CUSUM test on data transformed by psi. Depending on the chosen psi-function different types of changes can be detected.
huber_cusum(x, fun = "HLm", k, constant = 1.4826, method = "kernel", control = list(), fpc = TRUE, tol = 1e-8, plot = FALSE, ...)huber_cusum(x, fun = "HLm", k, constant = 1.4826, method = "kernel", control = list(), fpc = TRUE, tol = 1e-8, plot = FALSE, ...)
x |
numeric vector containing a single time series or a numeric matrix containing multiple time series (column-wise). |
fun |
character string specifying the transformation function |
k |
numeric bound used in |
constant |
scale factor of the MAD. Default is 1.4826. |
method |
method for estimating the long run variance. |
control |
a list of control parameters for the estimation of the long run variance (cf. |
fpc |
finite population correction (boolean). |
tol |
tolerance of the distribution function (numeric), which is used to compute p-values. |
plot |
should the test statistic be plotted (cf. |
... |
further arguments to be passed to |
The function performs a Huberized CUSUM test. It tests the null hypothesis does not change for against the alternative of a change, where is the parameter vector of interest. is called a 'change point'. First the data is transformed by a suitable psi-function. To detect changes in location one can apply fun = "HLm", "HLg", "SLm" or "SLg" and the hypothesis pair is
where and is the length of the time series. For changes in scale fun = "HCm" is available and for changes in the dependence respectively covariance structure fun = "HCm", "HCg", "SCm" and "SCg" are possible. The hypothesis pair is the same as in the location case, only with being replaced by , . Exact definitions of the psi-functions can be found on the help page of psi.
Denote by the transformed time series. If is one-dimensional, then the test statistic
is calculated, where is an estimator for the long run variance, see the help function of lrv for details. is asymptotically Kolmogorov-Smirnov distributed. If fpc is TRUE we use a finite population correction to improve finite sample performance (Dürre, 2021+).
If is multivariate, then the test statistic
is computed, where is the long run covariance, see also lrv for details. is asymptotically distributed like the maximum of a squared Bessel bridge. We use the identity derived by Kiefer (1959) to derive p-values. Like in the one dimensional case if fpc is TRUE we use a finite sample correction .
The change point location is estimated as the time point for which the CUSUM process takes its maximum.
A list of the class "htest" containing the following components:
statistic |
value of the test statistic (numeric). |
p.value |
p-value (numeric). |
alternative |
alternative hypothesis (character string). |
method |
name of the performed test (character string). |
cp.location |
index of the estimated change point location (integer). |
data.name |
name of the data (character string). |
Sheila Görz
Hušková, M., & Marušiaková, M. (2012). M-procedures for detection of changes for dependent observations. Communications in Statistics-Simulation and Computation, 41(7), 1032-1050.
Dürre, A. and Fried, R. (2019). "Robust change point tests by bounded transformations", https://arxiv.org/abs/1905.06201
Dürre, A. (2021+). "Finite sample correction for cusum tests", unpublished manuscript
Kiefer, J. (1959). "K-sample analogues of the Kolmogorov-Smirnov and Cramer-V. Mises tests", The Annals of Mathematical Statistics, 420–447.
lrv,
psi, psi_cumsum, CUSUM,
pKSdist
set.seed(1895) #time series with a structural break at t = 20 Z <- c(rnorm(20, 0), rnorm(20, 2)) huber_cusum(Z) # two time series with a structural break at t = 20 timeSeries <- matrix(c(rnorm(20, 0), rnorm(20, 2), rnorm(20, 1), rnorm(20, 3), ncol = 2)) huber_cusum(timeSeries)set.seed(1895) #time series with a structural break at t = 20 Z <- c(rnorm(20, 0), rnorm(20, 2)) huber_cusum(Z) # two time series with a structural break at t = 20 timeSeries <- matrix(c(rnorm(20, 0), rnorm(20, 2), rnorm(20, 1), rnorm(20, 3), ncol = 2)) huber_cusum(timeSeries)
Selects the k-th largest element of X + Y, a sum of sets. X + Y denotes the set .
kthPair(X, Y, k, k2 = NA)kthPair(X, Y, k, k2 = NA)
X |
Numeric vector. |
Y |
Numeric vector. |
k |
Index of element to be selected. Must be an integer and between 1 and the product of the lengths of x and y. |
k2 |
Optional second index. |
A generalized version of the algorithm of Johnson and Mizoguchi (1978), where and are allowed to be of different lengths. The optional argument k2 allows the computation of the mean of two consecutive value without running the algorithm twice.
K-th largest value (numeric).
Sheila Görz
Johnson, D. B., & Mizoguchi, T. (1978). Selecting the K-th Element in X+Y and X_1+X_2+ ... +X_m. SIAM Journal on Computing, 7(2), 147-153.
set.seed(1895) x <- rnorm(100) y <- runif(100) kthPair(x, y, 5000) kthPair(x, y, 5000, 5001)set.seed(1895) x <- rnorm(100) y <- runif(100) kthPair(x, y, 5000) kthPair(x, y, 5000, 5001)
Estimates the long run variance respectively covariance matrix of the supplied time series.
lrv(x, method = c("kernel", "subsampling", "bootstrap", "none"), control = list())lrv(x, method = c("kernel", "subsampling", "bootstrap", "none"), control = list())
x |
vector or matrix with each column representing a time series (numeric). |
method |
method of estimation. Options are |
control |
a list of control parameters. See 'Details'. |
The long run variance equals the limit of times the variance of the arithmetic mean of a short range dependent time series, where is the length of the time series. It is used to standardize tests concering the mean on dependent data.
If method = "none", no long run variance estimation is performed and the value 1 is returned (i.e. it does not alterate the test statistic).
The control argument is a list that can supply any of the following components:
kFunKernel function (character string). More in 'Notes'.
b_nBandwidth (numeric > 0 and smaller than sample size).
gamma0Only use estimated variance if estimated long run variance is < 0? Boolean.
lBlock length (numeric > 0 and smaller than sample size).
overlappingOverlapping subsampling estimation? Boolean.
distrTranform observations by their empirical distribution function? Boolean. Default is FALSE.
BBootstrap repetitions (integer).
seedRNG seed (numeric).
versionWhat property does the CUSUM test test for? Character string, details below.
locEstimated location corresponding to version. Numeric value, details below.
scaleEstimated scale corresponding to version. Numeric value, details below.
Kernel-based estimation
The kernel-based long run variance estimation is available for various testing scenarios (set by control$version) and both for one- and multi-dimensional data. It uses the bandwidth control$b_n and kernel function control$kFun. For tests on certain properties also a corresponding location control$loc () and scale control$scale () estimation needs to be supplied. Supported testing scenarios are:
"mean"
1-dim. data:
If control$distr = TRUE, then the long run variance is estimated on the empirical distribution of . The resulting value is then multiplied with .
Default values: b_n = , kFun = "bartlett".
multivariate time series:
The -element of is estimated by
.
Default values: b_n = , kFun = "bartlett".
"empVar" for tests on changes in the empirical variance.
Default values: mean(x), var(x).
"MD" for tests on a change in the median deviation.
Default values: median(x), .
"GMD" for tests on changes in Gini's mean difference.
with .
Default value:
"Qalpha" for tests on changes in Qalpha.
where and
the kernel density estimation of the densitiy corresponding to the distribution function , IQR(x) and is the quatratic kernel function.
Default values: , Qalpha(x, m_n)[n-1].
"tau" for tests in changes in Kendall's tau.
Only available for bivariate data: assume that the given data x has the format .
where and , and are the empirical distribution functions of , and .
Default value: .
"rho" for tests on changes in Spearman's rho.
Only availabe for -variate data with : assume that the given data x has the format .
where , and , (rank of in .
When control$gamma0 = TRUE (default) then negative estimates of the long run variance are replaced by the autocovariance at lag 0 (= ordinary variance of the data). The function will then throw a warning.
Subsampling estimation
For method = "subsampling" there are an overlapping and a non-overlapping version (parameter control$overlapping). Also it can be specified if the observations x were transformed by their empirical distribution function (parameter control$distr). Via control$l the block length can be controlled.
If control$overlapping = TRUE and control$distr = TRUE:
Otherwise, if control$distr = FALSE, the estimator is
If control$overlapping = FALSE and control$distr = TRUE:
Otherwise, if control$distr = FALSE, the estimator is
Default values: overlapping = TRUE, the block length is chosen adaptively:
where is the Spearman autocorrelation at lag 1.
Bootstrap estimation
If method = "bootstrap" a dependent wild bootstrap with the parameters control$B, control$l and control$kFun is performed:
A single is generated by where are independent from the data x and are generated from a multivariate normal distribution with , and . Via control$seed a seed can optionally be specified (cf. set.seed). Only "bartlett", "parzen" and "QS" are supported as kernel functions. Uses the function sqrtm from package pracma.
Default values: B = 1000, kFun = "bartlett", l is the same as for subsampling.
long run variance (numeric) resp. (numeric matrix)
Kernel functions
bartlett:
FT:
parzen:
QS:
TH:
truncated:
SFT:
Epanechnikov:
quatratic:
Sheila Görz
Andrews, D.W. "Heteroskedasticity and autocorrelation consistent covariance matrix estimation." Econometrica: Journal of the Econometric Society (1991): 817-858.
Dehling, H., et al. "Change-point detection under dependence based on two-sample U-statistics." Asymptotic laws and methods in stochastics. Springer, New York, NY, (2015). 195-220.
Dehling, H., Fried, R., and Wendler, M. "A robust method for shift detection in time series." Biometrika 107.3 (2020): 647-660.
Parzen, E. "On consistent estimates of the spectrum of a stationary time series." The Annals of Mathematical Statistics (1957): 329-348.
Shao, X. "The dependent wild bootstrap." Journal of the American Statistical Association 105.489 (2010): 218-235.
CUSUM, HodgesLehmann, wilcox_stat
Z <- c(rnorm(20), rnorm(20, 2)) ## kernel density estimation lrv(Z) ## overlapping subsampling lrv(Z, method = "subsampling", control = list(overlapping = FALSE, distr = TRUE, l_n = 5)) ## dependent wild bootstrap estimation lrv(Z, method = "bootstrap", control = list(l_n = 5, kFun = "parzen"))Z <- c(rnorm(20), rnorm(20, 2)) ## kernel density estimation lrv(Z) ## overlapping subsampling lrv(Z, method = "subsampling", control = list(overlapping = FALSE, distr = TRUE, l_n = 5)) ## dependent wild bootstrap estimation lrv(Z, method = "bootstrap", control = list(l_n = 5, kFun = "parzen"))
Computes the median of the set X - Y. X - Y denotes the set }.
medianDiff(x, y)medianDiff(x, y)
x, y
|
Numeric vectors |
Special case of the function kthPair.
The median element of X - Y
Sheila Görz
Johnson, D. B., & Mizoguchi, T. (1978). Selecting the K-th Element in X+Y and X_1+X_2+ ... +X_m. SIAM Journal on Computing, 7(2), 147-153.
x <- rnorm(100) y <- runif(100) medianDiff(x, y)x <- rnorm(100) y <- runif(100) medianDiff(x, y)
Computes the revised modified Cholesky factorization described in Schnabel and Eskow (1999).
modifChol(x, tau = .Machine$double.eps^(1 / 3), tau_bar = .Machine$double.eps^(2 / 3), mu = 0.1)modifChol(x, tau = .Machine$double.eps^(1 / 3), tau_bar = .Machine$double.eps^(2 / 3), mu = 0.1)
x |
a symmetric matrix. |
tau |
(machine epsilon)^(1/3). |
tau_bar |
(machine epsilon^(2/3)). |
mu |
numeric, |
modif.chol computes the revised modified Cholesky Factorization of a symmetric, not necessarily positive definite matrix x + E such that for .
Upper triangular matrix of the form .
Sheila Görz
Schnabel, R. B., & Eskow, E. (1999). "A revised modified Cholesky factorization algorithm" SIAM Journal on optimization, 9(4), 1135-1148.
y <- matrix(runif(9), ncol = 3) x <- psi(y) modifChol(lrv(x))y <- matrix(runif(9), ncol = 3) x <- psi(y) modifChol(lrv(x))
Computes the asymptotic cumulative distribution of the statistic of CUSUM.
pKSdist(tn, tol = 1e-8) pBessel(tn, p)pKSdist(tn, tol = 1e-8) pBessel(tn, p)
tn |
vector of test statistics (numeric). For |
p |
dimension of time series (integer). If |
tol |
tolerance (numeric). |
For a single time series, the distribution is the same distribution as in the two sample Kolmogorov Smirnov Test, namely the distribution of the maximal value of the absolute values of a Brownian bridge. It is computated as follows (Durbin, 1973 and van Mulbregt, 2018):
For :
up to , where
else:
until
In case of multiple time series, the distribution equals that of the maximum of an p dimensional squared Bessel bridge. It can be computed by (Kiefer, 1959)
where is the Bessel function of first kind and p-th order, is the gamma function and denotes the n-th zero of .
vector of .
Sheila Görz, Alexander Dürre
Durbin, James. (1973) "Distribution theory for tests based on the sample distribution function." Society for Industrial and Applied Mathematics.
van Mulbregt, P. (2018) "Computing the Cumulative Distribution Function and Quantiles of the limit of the Two-sided Kolmogorov-Smirnov Statistic." arXiv preprint arXiv:1803.00426.
/src/library/stats/src/ks.c rev60573
Kiefer, J. (1959). "K-sample analogues of the Kolmogorov-Smirnov and Cramer-V. Mises tests", The Annals of Mathematical Statistics, 420–447.
psi, CUSUM, psi_cumsum, huber_cusum
# single time series timeSeries <- c(rnorm(20, 0), rnorm(20, 2)) tn <- CUSUM(timeSeries) pKSdist(tn) # two time series timeSeries <- matrix(c(rnorm(20, 0), rnorm(20, 2), rnorm(20, 1), rnorm(20, 3)), ncol = 2) tn <- CUSUM(timeSeries) pBessel(tn, 2)# single time series timeSeries <- c(rnorm(20, 0), rnorm(20, 2)) tn <- CUSUM(timeSeries) pKSdist(tn) # two time series timeSeries <- matrix(c(rnorm(20, 0), rnorm(20, 2), rnorm(20, 1), rnorm(20, 3)), ncol = 2) tn <- CUSUM(timeSeries) pBessel(tn, 2)
Plots the trajectory of the test statistic process together with a red line indicating the critical value (alpha = 0.05) and a blue line indicating the most probable change point location
## S3 method for class 'cpStat' plot(x, ylim, xaxt, crit.val, ...)## S3 method for class 'cpStat' plot(x, ylim, xaxt, crit.val, ...)
x |
object of the class 'cpStat'. |
ylim |
the y limits of the plot. |
xaxt |
a character which specifies the x axis type (see |
crit.val |
critical value of the test. Default: 1.358. |
... |
other graphical parameters (see |
Default for ylim is c(min(c(data, 1.358)), max(c(data, 1.358))).
Default for xaxt is the simliar to the option "s", only that there
is a red labelled tick at the most probable change point location. Ticks too
close to this will be suppressed.
No return value; called for side effects.
Sheila Görz
plot, par, CUSUM, HodgesLehmann,
wilcox_stat
Prints the value of the test statistic and adds the most likely change point location.
## S3 method for class 'cpStat' print(x, ...)## S3 method for class 'cpStat' print(x, ...)
x |
object of the class 'cpStat'. |
... |
further arguments passed to or from other methods. |
Object x is return; function is called for its side effect.
Sheila Görz
print, wilcox_stat, CUSUM, HodgesLehmann,
Standardizes (multivariate) time series by their median, MAD and transforms the standardized time series by a function.
psi(y, fun = c("HLm", "HLg", "SLm", "SLg", "HCm", "HCg", "SCm", "SCg"), k, constant = 1.4826)psi(y, fun = c("HLm", "HLg", "SLm", "SLg", "HCm", "HCg", "SCm", "SCg"), k, constant = 1.4826)
y |
vector or matrix with each column representing a time series (numeric). |
fun |
character string specifying the transformation function |
k |
numeric bound used for Huber type psi-functions which determines robustness and efficiency of the test. Default for |
constant |
scale factor of the MAD. |
Let be the standardized values of a univariate time series.
Available functions are:
marginal Huber for location: fun = "HLm". .
global Huber for location: fun = "HLg". .
marginal sign for location: fun = "SLm". .
global sign for location: fun = "SLg". .
marginal Huber for covariance: fun = "HCm". .
global Huber for covariance: fun = "HCg". .
marginal sign covariance: fun = "SCm". .
gloabl sign covariance: fun = "SCg". .
Note that for all covariances only the upper diagonal is used and turned into a vector. In case of the marginal sign covariance, the main diagonal is also left out. For the global sign covariance matrix the last element of the main diagonal is left out.
Transformed numeric vector or matrix with the same number of rows as y.
Sheila Görz
psi(rnorm(100))psi(rnorm(100))
Computes the cumulative sum of a transformed numeric vector or matrix. Transformation function is psi.
psi_cumsum(y, fun = "HLm", k, constant)psi_cumsum(y, fun = "HLm", k, constant)
y |
numeric vector containing a single time series or a numeric matrix containing multiple time series (column-wise). |
fun |
character string specifying the transformation function |
k |
numeric bound used in |
constant |
scale factor of the MAD. Default is 1.4826. |
Prior to computing the sums, y is being transformed by the function fun.
Numeric vector or matrix containing the cumulative sums of the transformed values. In case of a matrix, cumulative sums are being computed for each time series (column) independently.
Sheila Görz
psi.
psi_cumsum(rnorm(100))psi_cumsum(rnorm(100))
Estimates Q-alpha using the first elements of a time series.
Qalpha(x, alpha = 0.8)Qalpha(x, alpha = 0.8)
x |
numeric vector. |
alpha |
quantile. Numeric value in (0, 1] |
where is the empirical distribtion function of .
Numeric vector of -s estimated using x[1], ..., x[k], , being the length of x.
x <- rnorm(10) Qalpha(x, 0.5)x <- rnorm(10) Qalpha(x, 0.5)
Performs the CUSUM-based test on changes in the scale.
scale_cusum(x, version = c("empVar", "MD", "GMD", "Qalpha"), method = "kernel", control = list(), alpha = 0.8, fpc = TRUE, tol, plot = FALSE, level = 0.05)scale_cusum(x, version = c("empVar", "MD", "GMD", "Qalpha"), method = "kernel", control = list(), alpha = 0.8, fpc = TRUE, tol, plot = FALSE, level = 0.05)
x |
time series (numeric or ts vector). |
version |
variance estimation method (see |
method |
method for estimating the long run variance. |
control |
a list of control parameters. |
alpha |
quantile of the distribution function of all absolute pairwise differences used in |
fpc |
finite population correction (boolean). |
tol |
tolerance for the computation of the p-value (numeric). Default for kernel-based long run variance estimation: 1e-8. Default for bootstrap: 1e-3. |
plot |
should the test statistic be plotted (cf. |
level |
significance level of the test (numeric between 0 and 1). |
The function performs a CUSUM-type test on changes in the scale. Formally, the hypothesis pair is
where and is the length of the time series. is called a 'change point'. The hypotheses do not include since then the variance of one single observation would need to be estimated.
The test statistic is computed using scale_stat and in the case of method = "kernel" asymptotically follows a Kolmogorov distribution. To derive the p-value, the function pKSdist is used.
If method = "bootstrap", a dependent block bootstrap with parameters tol and control$l is performed in order to derive the p-value of the test. First, select a boostrap sample , , the following way: Uniformly draw a random iid sample from and concatenate the blocks for . Then apply the test statistic to the bootstrap sample. Repeat the procedure times. The p-value is can be obtained as the proportion of the bootstrapped test statistics which is larger than the test statistic on the full sample.
A list of the class "htest" containing the following components:
statistic |
return value of the function |
p.value |
p-value (numeric). |
alternative |
alternative hypothesis (character string). |
method |
name of the performed test (character string). |
cp.location |
index of the estimated change point location (integer). |
data.name |
name of the data (character string). |
Plus if method = "kernel":
lrv |
list containing the compontents |
else if method = "bootstrap":
bootstrap |
list containing the compontents |
Sheila Görz
Gerstenberger, C., Vogel, D., and Wendler, M. (2020). Tests for scale changes based on pairwise differences. Journal of the American Statistical Association, 115(531), 1336-1348.
Dürre, A. (2022+). "Finite sample correction for cusum tests", unpublished manuscript
scale_stat, lrv, pKSdist, Qalpha
x <- arima.sim(list(ar = 0.5), 100) # under H_0: scale_cusum(x) scale_cusum(x, "MD") # under the alternative: x[51:100] <- x[51:100] * 3 scale_cusum(x) scale_cusum(x, "MD")x <- arima.sim(list(ar = 0.5), 100) # under H_0: scale_cusum(x) scale_cusum(x, "MD") # under the alternative: x[51:100] <- x[51:100] * 3 scale_cusum(x) scale_cusum(x, "MD")
Computes the test statistic for CUSUM-based tests on scale changes.
scale_stat(x, version = c("empVar", "MD", "GMD", "Qalpha"), method = "kernel", control = list(), alpha = 0.8)scale_stat(x, version = c("empVar", "MD", "GMD", "Qalpha"), method = "kernel", control = list(), alpha = 0.8)
x |
time series (numeric or ts vector). |
version |
variance estimation method. |
method |
either |
control |
a list of control parameters. |
alpha |
quantile of the distribution function of all absolute pairwise differences used in |
Let be the length of the time series. The CUSUM test statistic for testing on a change in the scale is then defined as
where is a scale estimator computed using only the first elements of the time series .
If method = "kernel", the test statistic is divided by the estimated long run variance so that it asymptotically follows a Kolmogorov distribution. is computed by the function lrv using kernel-based estimation.
For the scale estimator , there are five different options which can be set via the version parameter:
Empirical variance (empVar)
Mean deviation (MD)
where is the median of .
Gini's mean difference (GMD)
(Qalpha)
where is the empirical distribtion function of (cf. Qalpha).
For the kernel-based long run variance estimation, the default bandwidth is determined as follows:
If is the estimated autocorrelation to lag , a maximal lag is selected to be the smallest integer so that
. This is determined for both the original data and the squared data and the maximum is taken. Then the bandwidth is the minimum of and .
Test statistic (numeric value) with the following attributes:
cp-location |
indicating at which index a change point is most likely. |
teststat |
test process (before taking the maximum). |
lrv-estimation |
long run variance estimation method. |
If method = "kernel" the following attributes are also included:
sigma |
estimated long run variance. |
param |
parameter used for the lrv estimation. |
kFun |
kernel function used for the lrv estimation. |
Is an S3 object of the class "cpStat".
Sheila Görz
Gerstenberger, C., Vogel, D., and Wendler, M. (2020). Tests for scale changes based on pairwise differences. Journal of the American Statistical Association, 115(531), 1336-1348.
x <- arima.sim(list(ar = 0.5), 100) # under H_0: scale_stat(x, "GMD") scale_stat(x, "Qalpha", method = "bootstrap") # under the alternative: x[51:100] <- x[51:100] * 3 scale_stat(x) scale_stat(x, "Qalpha", method = "bootstrap")x <- arima.sim(list(ar = 0.5), 100) # under H_0: scale_stat(x, "GMD") scale_stat(x, "Qalpha", method = "bootstrap") # under the alternative: x[51:100] <- x[51:100] * 3 scale_stat(x) scale_stat(x, "Qalpha", method = "bootstrap")
Computes the weighted median of a numeric vector.
weightedMedian(x, w)weightedMedian(x, w)
x |
Numeric vector. |
w |
Integer vector of weights. |
Here, the median of an even length of is defined as if is the -th largest element in , i.e. the larger value is taken.
Weighted median of x with respect to w.
x <- c(1, 4, 9) w <- c(5, 1, 1) weightedMedian(x, w)x <- c(1, 4, 9) w <- c(5, 1, 1) weightedMedian(x, w)
Computes the test statistic for the Wilcoxon-Mann-Whitney change point test
wilcox_stat(x, h = 1L, method = "kernel", control = list())wilcox_stat(x, h = 1L, method = "kernel", control = list())
x |
Time series (numeric or ts vector). |
h |
Kernel function of the U statistic (1L or 2L, or a function with two parameters). |
method |
Method for estimating the long run variance. Options are |
control |
A list of control parameters for the estimation of the long run variance (cf. |
Let n be the length of x, i.e. the number of observations.
h = 1L:
h = 2L:
is estimated by the square root of lrv. The denominator corresponds to that in the ordinary CUSUM change point test.
By default, kernel-based estimation is used.
If h = 1L, the default for distr is TRUE. If no block length is supplied, first the time series is corrected for the estimated change point and Spearman's autocorrelation to lag 1 () is computed. Then the default bandwidth follows as
Otherwise, the default for distr is FALSE and the default bandwidth is
Test statistic (numeric value) with the following attributes:
cp-location |
indicating at which index a change point is most likely. |
teststat |
test process (before taking the maximum). |
lrv-estimation |
long run variance estimation method. |
sigma |
estimated long run variance. |
param |
parameter used for the lrv estimation. |
kFun |
kernel function used for the lrv estimation. |
Is an S3 object of the class "cpStat".
Sheila Görz
Dehling, H., et al. "Change-point detection under dependence based on two-sample U-statistics." Asymptotic laws and methods in stochastics. Springer, New York, NY, 2015. 195-220.
# time series with a location change at t = 20 x <- c(rnorm(20, 0), rnorm(20, 2)) # Wilcoxon-Mann-Whitney change point test statistic wilcox_stat(x, h = 1L, control = list(b_n = length(x)^(1/3)))# time series with a location change at t = 20 x <- c(rnorm(20, 0), rnorm(20, 2)) # Wilcoxon-Mann-Whitney change point test statistic wilcox_stat(x, h = 1L, control = list(b_n = length(x)^(1/3)))
Performs the Wilcoxon-Mann-Whitney change point test.
wmw_test(x, h = 1L, method = "kernel", control = list(), tol = 1e-8, plot = FALSE)wmw_test(x, h = 1L, method = "kernel", control = list(), tol = 1e-8, plot = FALSE)
x |
time series (numeric or ts vector). |
h |
version of the test (integer, 1L or 2L) |
method |
method for estimating the long run variance. |
control |
a list of control parameters (cf. |
tol |
tolerance of the distribution function (numeric), which is used to compute p-values. |
plot |
should the test statistic be plotted (cf. |
The function performs a Wilcoxon-Mann-Whitney change point test. It tests the hypothesis pair
where and is the length of the time series. is called a 'change point'.
The test statistic is computed using wilcox_stat and asymptotically follows a Kolmogorov distribution. To derive the p-value, the function pKSdist is used.
A list of the class "htest" containing the following components:
statistic |
value of the test statistic (numeric). |
p.value |
p-value (numeric). |
alternative |
alternative hypothesis (character string). |
method |
name of the performed test (character string). |
cp.location |
index of the estimated change point location (integer). |
data.name |
name of the data (character string). |
Sheila Görz
Dehling, H., et al. "Change-point detection under dependence based on two-sample U-statistics." Asymptotic laws and methods in stochastics. Springer, New York, NY, 2015. 195-220.
#time series with a structural break at t = 20 Z <- c(rnorm(20, 0), rnorm(20, 2)) wmw_test(Z, h = 1L, control = list(overlapping = TRUE, b_n = 5))#time series with a structural break at t = 20 Z <- c(rnorm(20, 0), rnorm(20, 2)) wmw_test(Z, h = 1L, control = list(overlapping = TRUE, b_n = 5))
Contains the zeros of the Bessel function of first kind.
data("zeros")data("zeros")
A data frame where the i-th column contains the first 50 zeros of the Bessel function of the first kind and ((i - 1) / 2)th order, i = 1, ..., 5001.
The zeros are computed by the mathematical software octave.
Eaton, J., Bateman, D., Hauberg, S., Wehbring, R. (2015). "GNU Octave version 4.0.0 manual: a high-level interactive language for numerical computations".