Detect a discrepancy between the empirical normal distribution of a sample of NLME parameters (transformed to normal scale) and a specified "target" normal distribution.

In brief, this function tests whether the values in testSample and targetSample (both transformed to normal scale) differ significantly in their Bhattacharyya distances to the normal distribution defined by targetMuNormal and targetSigmaNormal. The Bhattacharyya distance quantifies the overlap between two (multivariate) normal distributions: the smaller the distance, the greater the overlap. It is assumed that targetSample is consistent with targetMuNormal and targetSigmaNormal. A discrepancy is reported for each parameter X (column X in testSample and targetSample) and for each pair of parameters X,Y for which the Bonferroni-corrected T-test p-value for a difference in mean Bhattacharyya distances is significant at the level tAlpha. Specifically, for each parameter X and each pair of parameters X,Y among the variable parameter columns in testSample and targetSample, the procedure is as follows:

  1. Transform testSample and targetSample to normal scale based on the parameter transformation in spec.

  2. Divide the two transformed samples into Ntests chunks of Npop rows.

  3. Calculate the Bhattacharyya distance between the normal distribution defined by the empirical mean vector and covariance matrix of each (test and target) chunk and the normal distribution defined by targetMuNormal and targetSigmaNormal. The result of this calculation is two vectors of length Ntests containing the Bhattacharyya distances for the test and target samples: btestSample and btargetSample.

  4. Perform a statistical T-test for significant difference between the means of btestSample and btargetSample.

  5. Perform a Bonferroni p-value correction (multiplying the p-value by the number of single parameters or by the number of parameter pairs).

  6. If X is a single parameter with a Bonferroni-corrected p-value that is significant at the tAlpha critical level, generate a combined histogram plot comparing the X-values in testSample against those in targetSample. Otherwise, if X,Y is a pair of parameters with a Bonferroni-corrected p-value that is significant at the tAlpha critical level, generate a combined 2d density plot comparing the (X,Y)-values in testSample against those in targetSample.
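
Steps 2 to 5 can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the helper names (bhattacharyyaNormal, chunkDistances, correctedP) and the toy parameters CL and Vc are hypothetical, the toy samples are taken to be on the normal scale already (step 1 is skipped), t.test is used with its defaults (two-sided Welch test), and the corrected p-value is capped at 1.

```r
# Bhattacharyya distance between N(mu1, Sigma1) and N(mu2, Sigma2):
#   D = 1/8 * t(mu1 - mu2) %*% solve(Sigma) %*% (mu1 - mu2)
#       + 1/2 * log(det(Sigma) / sqrt(det(Sigma1) * det(Sigma2))),
# where Sigma = (Sigma1 + Sigma2) / 2.
bhattacharyyaNormal <- function(mu1, Sigma1, mu2, Sigma2) {
  Sigma1 <- as.matrix(Sigma1)
  Sigma2 <- as.matrix(Sigma2)
  Sigma  <- (Sigma1 + Sigma2) / 2
  d      <- as.numeric(mu1) - as.numeric(mu2)
  logDet <- function(M) as.numeric(determinant(M, logarithm = TRUE)$modulus)
  drop(crossprod(d, solve(Sigma, d))) / 8 +
    0.5 * (logDet(Sigma) - 0.5 * (logDet(Sigma1) + logDet(Sigma2)))
}

# Steps 2 and 3: split a sample into Ntests chunks of Npop rows and compute
# one distance per chunk to the target distribution (restricted to `cols`).
chunkDistances <- function(sampleNormal, cols, muTarget, SigmaTarget, Npop, Ntests) {
  vapply(seq_len(Ntests), function(i) {
    rows  <- ((i - 1) * Npop + 1):(i * Npop)
    chunk <- as.matrix(sampleNormal[rows, cols, drop = FALSE])
    bhattacharyyaNormal(colMeans(chunk), cov(chunk),
                        muTarget[cols], SigmaTarget[cols, cols, drop = FALSE])
  }, numeric(1))
}

# Toy data (hypothetical): the test sample has a shifted mean for CL only.
set.seed(1)
Npop <- 200; Ntests <- 5; tAlpha <- 0.01
N <- Npop * Ntests
muTarget    <- c(CL = 0, Vc = 0)
SigmaTarget <- diag(2)
dimnames(SigmaTarget) <- list(names(muTarget), names(muTarget))
targetNormal <- data.frame(CL = rnorm(N), Vc = rnorm(N))
testNormal   <- data.frame(CL = rnorm(N, mean = 2), Vc = rnorm(N))

# Steps 4 and 5: T-test on the two distance vectors, then Bonferroni correction.
correctedP <- function(cols, nComparisons) {
  bTest   <- chunkDistances(testNormal,   cols, muTarget, SigmaTarget, Npop, Ntests)
  bTarget <- chunkDistances(targetNormal, cols, muTarget, SigmaTarget, Npop, Ntests)
  min(1, t.test(bTest, bTarget)$p.value * nComparisons)
}

singles <- names(muTarget)
pSingle <- vapply(singles, correctedP, numeric(1), nComparisons = length(singles))

paramPairs <- combn(singles, 2, simplify = FALSE)
pPair <- vapply(paramPairs, correctedP, numeric(1), nComparisons = length(paramPairs))
names(pPair) <- vapply(paramPairs, paste, character(1), collapse = ",")

# Parameters and pairs flagged at the tAlpha level.
detectedSingle <- names(pSingle)[pSingle < tAlpha]
detectedPair   <- names(pPair)[pPair < tAlpha]
```

The same distance function serves single parameters and pairs because a single parameter is simply the one-dimensional case, with a 1 x 1 covariance matrix.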

detectSampleDiscrepancy(
  obj,
  spec = specifyParamSampling(obj),
  testSample = sampleParamFromUncertainty(spec, Npop = Npop * Ntests),
  targetMuNormal = unlist(transformParamToNormal(spec, getParamEstimates(spec))[1,
    getNamesAllParameters(spec)]),
  targetSigmaNormal = getCovMatrixUncertainty(spec),
  targetSample = untransformParamFromNormal(spec, as.data.frame(MASS::mvrnorm(Npop *
    Ntests, mu = targetMuNormal, Sigma = targetSigmaNormal))),
  Npop = 200,
  Ntests = 5,
  tAlpha = 0.01,
  FLAGverbose = FALSE
)

Arguments

obj

a filename (character string) denoting the path to a GPF file, or a GPF object, or an IQRnlmeParamSpec object. This argument is ignored if spec is specified.

spec

an IQRnlmeParamSpec object. If specified, this argument overrides the argument obj. Default: specifyParamSampling(obj).

testSample

a data.frame of Ntests*Npop rows with columns named as (some of) the parameters in spec (see getNamesAllParameters) and rows corresponding to different sampled tuples of such parameters. The parameter values in this data.frame are on the original scale (not transformed to normal scale). Default: sampleParamFromUncertainty(spec, Npop = Npop*Ntests).

targetMuNormal

a named numeric vector specifying the mean of the target normal distribution (on the normal scale). Default: unlist(transformParamToNormal(spec, getParamEstimates(spec))[1L, getNamesAllParameters(spec)])

targetSigmaNormal

a square matrix specifying the variance covariance matrix of the target normal distribution (on the normal scale). Default: getCovMatrixUncertainty(spec).

targetSample

a data.frame of the same format as testSample. Default: untransformParamFromNormal(spec, as.data.frame(MASS::mvrnorm(Npop * Ntests, mu = targetMuNormal, Sigma = targetSigmaNormal)))

Npop, Ntests

integers: Npop is the number of rows in one chunk of a parameter sample, and Ntests is the number of chunks (each chunk contributes one Bhattacharyya distance to the T-test). The number of rows in testSample and targetSample is equal to Npop*Ntests. If testSample and targetSample are specified, Npop is set to nrow(testSample) %/% Ntests.
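
As a small illustration of this arithmetic, the sketch below derives Npop from a user-supplied sample and splits it into chunks. The data.frame and its column CL are hypothetical, and the handling of leftover rows (dropped here) is an assumption; this page does not state what the function does with them.

```r
Ntests <- 5

# Hypothetical user-supplied sample with 1003 rows.
testSample <- data.frame(CL = seq_len(1003))

# Integer division, as documented: 1003 %/% 5 gives 200 rows per chunk.
Npop <- nrow(testSample) %/% Ntests

# Assumption: rows beyond Npop * Ntests are not used.
used    <- testSample[seq_len(Npop * Ntests), , drop = FALSE]
chunkId <- rep(seq_len(Ntests), each = Npop)
chunks  <- split(used, chunkId)
```

Supplying samples whose row count is an exact multiple of Ntests avoids the question of leftover rows altogether.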

tAlpha

a double between 0 and 1 specifying the critical p-value level for the T-tests. Note: p-values are Bonferroni-corrected.

FLAGverbose

a logical indicating if information should be printed to the console.

Value

A named list with at least the following elements:

  • detectedSingleParameters: a character vector of the names of detected parameters with apparent sampling discrepancy;

  • listHistogramsDetectedSingleParams: a list of ggplot objects;

  • detectedPairParameters: a character vector of the names of detected parameter-pairs with apparent sampling discrepancy;

  • listDensPlotsDetectedPairParams: a list of ggplot objects.

Author

Venelin Mitov

Examples

if (FALSE) {
discr <- detectSampleDiscrepancy(
    system.file("extdata", "nlme_param_sampling", "PKparameters.xlsx",
                package = "IQRtools"),
    Npop = 500, Ntests = 10)
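# The plot elements documented under Value are lists of ggplot objects,
# so the theme is applied to each plot in turn.
lapply(discr$listHistogramsDetectedSingleParams,
       function(p) p + ggplot2::theme(legend.position = "bottom"))
lapply(discr$listDensPlotsDetectedPairParams,
       function(p) p + ggplot2::theme(legend.position = "bottom"))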
}