Cluster Data in Blocks of Similar x Values and Summarize y Values per Block
clusterX.Rd
Usage
clusterX(
  x,
  y = NULL,
  groupsize = 5,
  resolution = 0.1,
  lambda = 1,
  iterlim = 100,
  log = FALSE
)

statXY(x, y = NULL, ..., quantiles = c(0.05, 0.95))
Arguments
- x
x values or data.frame of x and y values
- y
y values or NULL, if x is a data.frame of x and y values
- groupsize
smallest expected group size
- resolution
gaps between groups of data points greater than resolution lead to separation of groups.
- lambda
penalization of intra-group variance; set to 1 to get more groups and set to 0 to get fewer but larger groups.
- iterlim
maximum number of iterations the algorithm takes.
- log
cluster on log(x) or on x. Does not change the value of resolution.
- ...
arguments passed on to clusterX().
- quantiles
the requested quantiles, usually 0.05 and 0.95. Quantiles are returned in columns named PX.VALUE, where X = round(100*quantiles).
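As a small illustration of the naming rule for the quantile columns, the following sketch builds the column names from a quantiles vector. The variable names are illustrative only, and this is not taken from the package's internal code.

```r
# Illustrative only: derive the PX.VALUE column names from the requested quantiles
quantiles <- c(0.05, 0.95)
quantile_cols <- paste0("P", round(100 * quantiles), ".VALUE")
print(quantile_cols)
```

With the default quantiles, the names are P5.VALUE and P95.VALUE.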
Value
clusterX() returns a data.frame with x, y and group values. Group is returned as a factor with numerically sorted levels.
statXY() returns summary information as a data frame. The output of clusterX() is returned in the attribute "clusterOut".
Details
Data points are sorted by increasing x value and assigned to groups of size groupsize.
Next, groups separated by less than resolution are merged. In the subsequent iterative algorithm,
the L1 distance of each data point to each of the groups is computed and weighted by the group's geometric
standard deviation. Data points are then reassigned to the closest group. The procedure is repeated until
group membership no longer changes or iterlim iterations have been performed.
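The first two steps (initial blocks of size groupsize, then merging of blocks that lie closer together than resolution) can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the package's implementation: the helper name initial_blocks() is hypothetical, and the separation of two neighbouring blocks is assumed to be the gap between the largest x of one block and the smallest x of the next. The iterative reassignment step and the roles of lambda and log are not covered.

```r
# Hypothetical helper, not part of the package: initial blocking and merging only
initial_blocks <- function(x, groupsize = 5, resolution = 0.1) {
  # Step 1: sort by increasing x and cut into consecutive blocks of size groupsize
  x <- sort(x)
  block <- ceiling(seq_along(x) / groupsize)

  # Assumption: the gap between blocks is min(next block) - max(current block)
  upper <- tapply(x, block, max)
  lower <- tapply(x, block, min)
  gaps <- lower[-1] - upper[-length(upper)]

  # Step 2: start a new merged block only where the gap is at least resolution
  merged_id <- cumsum(c(TRUE, gaps >= resolution))

  data.frame(x = x, block = unname(merged_id[block]))
}

# Two tight clusters around 1 and 5; the two initial blocks near 1 get merged
demo_blocks <- initial_blocks(c(5.01, 1, 1.02, 5, 1.01, 1.03),
                              groupsize = 2, resolution = 0.1)
print(demo_blocks)
```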
statXY() computes a data.frame with the following columns:
- GROUP = group, group identifier
- TIME = mean(x), mean group x value, usually time
- MEAN.VALUE = mean(y), mean group y value
- MEDIAN.VALUE = median(y), median of group y values
- SD.VALUE = sd(y), standard deviation of group y values
- SE.VALUE = sd(y)/sqrt(length(y)), standard error of group y values
- GEOMMEAN.VALUE = exp(mean(log(y))), geometric mean of y values
- GEOMSD.VALUE = exp(sd(log(y))), geometric standard deviation of y values
- PX.VALUE = quantile(y, probs = X/100), X% quantile of y values
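The column formulas listed above can be reproduced with base R for data that already carries a group label. The sketch below is illustrative: summarize_blocks() is a hypothetical helper, not a function of the package, and it does not perform any clustering itself. It assumes strictly positive y values, since the geometric statistics take log(y).

```r
# Hypothetical helper, not part of the package: per-group summary columns
summarize_blocks <- function(x, y, group, quantiles = c(0.05, 0.95)) {
  idx <- split(seq_along(y), group)  # row indices per group
  rows <- lapply(idx, function(i) {
    # Quantile columns are named PX.VALUE with X = round(100 * quantiles)
    q <- stats::quantile(y[i], probs = quantiles, names = FALSE)
    names(q) <- paste0("P", round(100 * quantiles), ".VALUE")
    c(
      TIME = mean(x[i]),
      MEAN.VALUE = mean(y[i]),
      MEDIAN.VALUE = stats::median(y[i]),
      SD.VALUE = stats::sd(y[i]),
      SE.VALUE = stats::sd(y[i]) / sqrt(length(i)),
      GEOMMEAN.VALUE = exp(mean(log(y[i]))),
      GEOMSD.VALUE = exp(stats::sd(log(y[i]))),
      q
    )
  })
  data.frame(GROUP = names(idx), do.call(rbind, rows), row.names = NULL)
}

# Toy data with two pre-assigned groups
demo_stat <- summarize_blocks(
  x = c(1, 1, 2, 2),
  y = c(1, 2, 3, 4),
  group = c(1, 1, 2, 2)
)
print(demo_stat)
```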
See also
Other Auxiliary:
IQRloadCSVdata(), IQRsaveCSVdata(), and(), aux_explode(), aux_explodePC(), aux_fileparts(), aux_fileread(), aux_filewrite(), aux_getRelPath(), aux_mkdir(), aux_na_locf(), aux_postFillChar(), aux_preFillChar(), aux_quantilenumber(), aux_rmdir(), aux_simplifypath(), aux_splitVectorEqualPieces(), aux_strFindAll(), aux_strrep(), aux_strtrim(), aux_unlevel(), aux_version(), calcAICBIC(), compare_IQRmodel_IQRsysModel_simulation(), fit_EmaxModel(), format_GUM(), ge(), gen_aux_version(), geocv(), geomean(), geosd(), ginv(), gt(), interp0(), interp1(), interpcs(), inv_logit(), le(), logit(), lt(), mod(), mvrnorm(), norm_M3(), or(), piecewise(), progressBar(), remove_duplicates(), run_silent_IQR(), stopIQR(), tempdirIQR(), tempfileIQR(), warningIQR()
Examples
if (FALSE) { # \dontrun{
  library(ggplot2)

  # Center times for the simulation and a function that produces a smooth curve
  timesD <- c(2, 10, 15, 30, 60, 120)
  myfn <- function(x) 100*(1-exp(-.03*x))*exp(-.1*x)

  # Randomly sampled times around the center times, with variable sample size
  times <- unlist(lapply(timesD, function(x) stats::rnorm(runif(1, 2, 10), x, 0.2*x)))

  # Simulate noisy data
  x <- data.frame(
    TIME = times,
    VALUE = stats::rnorm(length(times), myfn(times), 0.1*myfn(times) + 1)
  )
  x <- subset(x, TIME > 0)

  # Run the cluster algorithm and get the statistics
  stat <- statXY(x, groupsize = 5, resolution = .01)
  out <- attr(stat, "clusterOut")

  # Plot the result: points colored by block, median and 5%/95% quantile lines
  P <- ggplot(out, aes(x = TIME, y = VALUE, color = block, pch = block)) + geom_point() +
    annotate("line", x = stat$TIME, y = stat$MEDIAN.VALUE) +
    annotate("line", x = stat$TIME, y = c(stat$P5.VALUE), lty = 2) +
    annotate("line", x = stat$TIME, y = c(stat$P95.VALUE), lty = 2)
  print(P)
  print(P + scale_x_log10())
} # }