8 Analysis dataset preparation

IQR Tools implements functionality to facilitate the creation of analysis or modeling datasets based on a general dataset format. The general dataset format aims to contain at least the minimum information required while remaining flexible enough to store all information that is relevant for the analysis.

The functions provided by IQRtools enable traceable programming of the analysis dataset. Metadata (e.g., covariate units or covariate category labels) are retained and available for documentation and for producing meaningful graphs and tables. When the data are manipulated for a particular task, e.g., removing outliers or imputing covariates, log files are written.

In the first section of this chapter (Example workflow), a basic example for establishing an analysis dataset is given in the following steps:

- Specification of the original dataset
- Import to the IQRdataGENERAL format
- Graphical data exploration
- Cleaning the data to establish analysis dataset
- Export to .csv and .xpt

The second section (Workflow customization) discusses further options with which the following steps of the basic workflow can be customized:

- Settings for dataset import
- Statistical exploration
- Graphical exploration

8.1 Example workflow

8.1.1 Original dataset in general row-based format

In our example, the source data contains data from a single ascending dose study in healthy male subjects that should be prepared for a population PK analysis. The dataset is provided as a comma separated file:

# Define dataset location
dataFile <- "material/01-01-DataProgAnal/dataSource01.csv"
sourceData <- read.csv(dataFile)

Table 8.1: Source example dataset 1
USUBJID	TRTNAME	TIME	NT	TIMEUNIT	NAME	VALUE	VALUETXT	UNIT	LLOQ	ROUTE	CENTER	VISIT	STUDY
IQ00701-0100-0001	Placebo	-167.88	-168.00	Hours	Age	34	.	Years	.	.	100	1	FIH
IQ00701-0100-0001	Placebo	-167.88	-168.00	Hours	Gender	1	male	.	.	.	100	1	FIH
IQ00701-0100-0001	Placebo	-1.00	-1.00	Hours	Bodyweight	89	.	kg	.	.	100	2	FIH
IQ00701-0100-0001	Placebo	-1.00	-1.00	Hours	Height	185	.	cm	.	.	100	2	FIH
IQ00701-0100-0001	Placebo	23.00	23.00	Hours	Height	200	.	cm	.	.	100	2	FIH
IQ00701-0100-0001	Placebo	39.00	39.00	Hours	Height	250	.	cm	.	.	100	2	FIH
IQ00701-0100-0001	Placebo	-0.08	-0.08	Hours	Plasma concentration IQ0815	0	.	ug/mL	0.001	.	100	2	FIH
IQ00701-0100-0001	Placebo	0.00	0.00	Hours	Dose IQ0815	0	.	mg	.	oral	100	2	FIH
IQ00701-0100-0001	Placebo	0.08	0.08	Hours	Plasma concentration IQ0815	0	.	ug/mL	0.001	.	100	2	FIH
IQ00701-0100-0001	Placebo	0.25	0.25	Hours	Plasma concentration IQ0815	0	.	ug/mL	0.001	.	100	2	FIH
IQ00701-0100-0002	1mg oral single dose	-167.35	-168.00	Hours	Age	29	.	Years	.	.	100	1	FIH
IQ00701-0100-0002	1mg oral single dose	-167.35	-168.00	Hours	Gender	1	male	.	.	.	100	1	FIH

The source dataset contains all information (doses, observations, covariates) in a row-based format and uses the reserved column names required for the data import in IQR Tools (see the table below for a description of the columns of the example dataset). Of the columns in the data, only seven (written in bold) are required, but there are more reserved column names such as “STUDY”, “PART”, or “COMPOUND” to cover the typical information used in pharmacometric or systems pharmacology analyses. The general dataset format is described in detail in the chapter General Dataset Format.

An important column is the NAME column, which identifies the type of event recorded, e.g., whether the row reports a dosing event, a measured plasma concentration value, or the age of the subject. Typical entries for these examples could be “Aspirin dose”, “Aspirin concentration”, or “Age at screening visit”. The VALUE column exclusively contains numerical values. For categorical covariates, the VALUETXT column is used instead of or in addition to VALUE to note the category value, e.g., “yes”/“no” or “red”/“blue”/“green”.

Column	Description
USUBJID	Unique subject identifier
TRTNAME	Name of actual treatment given to subject
TIME	(NUMERIC) Actual time of event relative to first dose administration
NT	(NUMERIC) Nominal event time
TIMEUNIT	Unit of all numeric time definitions in the dataset (HOURS, MINUTES, DAYS, SECONDS, WEEKS, MONTHS, YEARS)
NAME	Unique short name of event
VALUE	(NUMERIC) Value of event defined by NAME
VALUETXT	Text version of value, used to code categorical information (if VALUETXT is defined, VALUE can be undefined)
UNIT	Unit of the value reported in the VALUE column
LLOQ	(NUMERIC) Lower limit of quantification for event defined by NAME (value only interpreted for observation events)
ROUTE	Route of administration (e.g., IV,SUBCUT,ORAL,TOPICAL) (value only interpreted for dosing events)
CENTER	Center number
VISIT	Visit number

8.1.2 Import as IQRdataGENERAL format

To create an analysis dataset, we need to reformat the dataset and add columns needed for the model-based analysis:

- Numerical columns need to be created
  - to discriminate dosing from observation records,
  - to identify different observation and dosing types, and
  - to flag whether records are valid, missing, or outside the measurable range.
- Covariates need to be provided in (numerical) columns.

Based on the NAME column, we can specify which records are dosing information, observations, and covariate information. Dosing and observation records only need to be listed in a character vector. Covariate records require a named vector whose names are used as the column names of the covariate columns that will be created.

# Define the names (NAME column) of the records you want to consider as dose records
doseNAMES <- "Dose IQ0815"
# Define the names (NAME column) of the records you want to consider as observation records
obsNAMES <- "Plasma concentration IQ0815"
# Define the CONTINUOUS covariate columns (time INDEPENDENT)
cov0 <- c(
  WT0  = "Bodyweight",          # COLNAME = "NAME of the event"
  AGE0 = "Age",
  HT0  = "Height"
)
# Define the CATEGORICAL covariate columns (time INDEPENDENT) that you want to generate
# from event records in the general dataset
cat0 <- c(
  SEX  = "Gender"
)
# Convert general dataset as an IQRdataGENERAL object
data1 <- IQRdataGENERAL(input=dataFile,
                        doseNAMES=doseNAMES, obsNAMES=obsNAMES,
                        cov0=cov0,cat0=cat0)

The resulting data object data1 contains additional columns that are needed for model-based analysis. Some important columns that have been created are

- the column EVID indicating dosing events (EVID = 1) and observation records (EVID = 0),
- the column YTYPE discriminating different observation types,
- the column ADM discriminating different dosing types,
- the column AMT to annotate the dosing amount,
- the column MDV flagging missing data values (MDV = 1),
- the column CENS flagging data values outside the measurable range (e.g., CENS = 1 for BLQ values), and
- the column IXGDF with a unique record number for each row of the dataset.

The covariate columns WT0, HT0, AGE0, and SEX have been added as well containing numerical values.
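The mapping from row-based covariate records to covariate columns can be illustrated with a minimal base R sketch. It does not use IQR Tools and is not how IQRdataGENERAL() is implemented; the helper baselineCov() is hypothetical, and taking the earliest record per subject as the baseline value is an assumption made for illustration only (see ?IQRdataGENERAL for the actual rules).

```r
# Toy data in row-based format (values taken from the source table above)
src <- data.frame(
  USUBJID = c("IQ00701-0100-0001", "IQ00701-0100-0001", "IQ00701-0100-0002"),
  TIME    = c(-167.88, -1.00, -167.35),
  NAME    = c("Age", "Bodyweight", "Age"),
  VALUE   = c(34, 89, 29),
  stringsAsFactors = FALSE
)

# Named vector: names are the new column names, values are NAME entries
cov0 <- c(WT0 = "Bodyweight", AGE0 = "Age")

# Hypothetical helper: earliest recorded value per subject for one covariate
baselineCov <- function(data, name) {
  rec <- data[data$NAME == name, ]
  rec <- rec[order(rec$TIME), ]
  rec <- rec[!duplicated(rec$USUBJID), ]
  setNames(rec$VALUE, rec$USUBJID)
}

# Build one row per subject with one column per covariate
subjects <- unique(src$USUBJID)
wide <- data.frame(USUBJID = subjects, stringsAsFactors = FALSE)
for (col in names(cov0)) {
  wide[[col]] <- unname(baselineCov(src, cov0[[col]])[subjects])
}
print(wide)
```

Subjects without a record for a covariate receive NA in this sketch, which corresponds to the missing covariate values that have to be handled during cleaning.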

Table 8.3: Example dataset 1 imported as IQRdataGENERAL (selected columns and first 12 rows)
	IXGDF	USUBJID	TRTNAME	TIME	TIMEUNIT	VALUE	AMT	EVID	YTYPE	ADM	MDV	WT0	HT0	AGE0	SEX
5	7	IQ00701-0100-0001	Placebo	-0.08	HOURS	0	0	0	1	0	1	89	185	34	1
6	8	IQ00701-0100-0001	Placebo	0.00	HOURS	0	0	1	0	1	1	89	185	34	1
7	9	IQ00701-0100-0001	Placebo	0.08	HOURS	0	0	0	1	0	1	89	185	34	1
8	10	IQ00701-0100-0001	Placebo	0.25	HOURS	0	0	0	1	0	1	89	185	34	1
15	15	IQ00701-0100-0002	1mg oral single dose	0.00	HOURS	1	1	1	0	1	1	65	186	29	1
20	20	IQ00701-0100-0003	1mg oral single dose	-0.08	HOURS	0	0	0	1	0	1	80	180	28	1
21	21	IQ00701-0100-0003	1mg oral single dose	0.00	HOURS	1	1	1	0	1	1	80	180	28	1
22	22	IQ00701-0100-0003	1mg oral single dose	0.08	HOURS	0	0	0	1	0	1	80	180	28	1
23	23	IQ00701-0100-0003	1mg oral single dose	0.25	HOURS	0	0	0	1	0	0	80	180	28	1
24	24	IQ00701-0100-0003	1mg oral single dose	0.50	HOURS	0	0	0	1	0	0	80	180	28	1
25	25	IQ00701-0100-0003	1mg oral single dose	1.00	HOURS	0	0	0	1	0	0	80	180	28	1
26	26	IQ00701-0100-0003	1mg oral single dose	2.00	HOURS	0	0	0	1	0	0	80	180	28	1

The variable data1 has the class IQRdataGENERAL, which stores metadata in its attributes. Among others, the attributes store information on units and on category and covariate names; this information can be retrieved using the convenience functions covInfo() and catInfo(). Importantly, if present in the source data, the character columns STUDY and TRTNAME are used to create the covariate columns STUDYN and TRT containing numerical values.

# Display information on continuous covariates
covInfo(data1)

COLNAME	NAME	UNIT	TIME.VARYING
WT0	Bodyweight	kg	FALSE
AGE0	Age	Years	FALSE
HT0	Height	cm	FALSE

# Display information on categorical covariates
catInfo(data1)

COLNAME	NAME	UNIT	VALUETXT	VALUES	TIME.VARYING
SEX	Gender	NA	male	1	FALSE
STUDYN	Study		FIH,FIH extension	1,2	FALSE
TRT	TRTNAME		Placebo,1mg oral single dose,2mg oral single dose,5mg oral single dose,10mg oral single dose,20mg oral single dose,50mg oral single dose,100mg oral single dose,200mg oral single dose	9,3,6,8,2,5,7,1,4	FALSE
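The numeric codes shown in the VALUES column above are consistent with numbering the category labels in alphabetical order. The following base R sketch reproduces such a coding for a few shortened, made-up labels. That IQR Tools assigns codes this way is only inferred from the displayed output and should be treated as an assumption.

```r
# Shortened treatment labels (illustrative only)
trtname <- c("Placebo", "1mg", "2mg", "10mg", "1mg", "Placebo")

# Sort unique labels in C-locale order so the result does not depend on the locale
lev <- sort(unique(trtname), method = "radix")

# Numeric code = position of the label in the sorted list
trt <- match(trtname, lev)

# Lookup table of labels and codes, similar in spirit to catInfo()
data.frame(VALUETXT = lev, VALUES = seq_along(lev))
```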

The summary() function displays useful information on the data. In addition, data integrity checks are performed.

summary(data1)

##    INFO                                          | NAME                        | VALUE                                              
##    ---------------------------------------------------------------------------------------------------------------------------------
##    Dose events                                   | Dose IQ0815                 | Ntotal: 48,  Nindiv (min/median/max): 1/1/1)       
##    Observation events (all)                      | Plasma concentration IQ0815 | Ntotal: 543,  Nindiv (min/median/max): 3/12/12)    
##    Observation events (MDV=0)                    | Plasma concentration IQ0815 | Ntotal: 410,  Nindiv (min/median/max): 5/10/11)    
##    Doses AMT=0 present                           | ALL dose events             | TRUE (N=1)                                         
##    Placebo subjects present (AMT=0 or no doses)  | ALL dose events             | TRUE (N=1)                                         
##    IGNORED (MDV=1) observation records present   | ALL observation events      | TRUE (N=133)                                       
##    Subjects without observations (MDV=0) present | ALL observation events      | TRUE (N=3)                                         
##    Total BLLOQ information                       | Plasma concentration IQ0815 | N=133 / 24.5%                                      
##    Max % BLLOQ values in a subject               | Plasma concentration IQ0815 | 8.33%                                              
##    BLLOQ handling method                         | All observation events      | M1                                                 
##    NLME columns containing NA                    | All events                  | WT0, AGE0, HT0, SEX                                
##    Issues present in the data                    | Minor                       | YES (see text below the table for more information)
##    Issues present in the data                    | Warnings                    | NONE                                               
##    Issues present in the data                    | Errors                      | NONE                                               
## 
## 
## IQRoutputTable object

## 
## MINOR issues in the dataset that might be addressed
## ===================================================
## SUBJECT LEVEL (IQ00701-0100-0005): Subject has records of NAME "Plasma concentration IQ0815" at same TIME points

The summary and checks help to detect issues in the dataset that should be fixed before an analysis starts. In this example, there is only one placebo subject, but there are 3 individuals without any observation records. Thus, we may want to remove these individuals as well as a duplicated plasma record for subject “IQ00701-0100-0005”. Also, the covariate columns contain NA values, and we probably need to impute values for some individuals.
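To make the "records at same TIME points" check concrete, here is a minimal base R sketch that flags records sharing subject, NAME, and TIME. It runs on made-up data with shortened subject identifiers and is independent of the checks that summary() performs internally.

```r
# Toy observation records; subject S1 has two records at TIME = 0.5
obs <- data.frame(
  USUBJID = c("S1", "S1", "S1", "S2", "S2"),
  NAME    = "Plasma concentration IQ0815",
  TIME    = c(0.25, 0.5, 0.5, 0.25, 0.5),
  VALUE   = c(0.1, 0.4, 0.5, 0.2, 0.3),
  stringsAsFactors = FALSE
)

# A record is flagged if another record has the same subject, NAME, and TIME
key   <- obs[, c("USUBJID", "NAME", "TIME")]
isDup <- duplicated(key) | duplicated(key, fromLast = TRUE)

dupRecords <- obs[isDup, ]
print(dupRecords)
```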

8.1.3 Source data exploration

A recommended step before any data analysis or modeling activity is to explore the data visually. IQR Tools provides a range of standard plotting functions. Here, we make use of two of them: plotSpaghetti_IQRdataGENERAL() produces line plots per treatment group, which are very useful for getting an overview of the observation time courses, and plotIndiv_IQRdataGENERAL() produces much more detailed plots for every individual in a pdf file.

# Line plots stratified by treatment group
out <- plotSpaghetti_IQRdataGENERAL(data1)
out$unstratified$`Plasma concentration IQ0815`

# Generation of a pdf file containing detailed plots for each individual
plotIndiv_IQRdataGENERAL(data1, filename = "material/01-01-DataProgAnal/IndivPlots01.pdf")

From the overview plot, we can see that one subject who received 5mg of IQ0815 has an implausible PK profile. Also, we can spot some outlying records. The detailed individual plots in the pdf file help to identify exactly the subject with implausible observations and the outlying records: the plots display the unique subject identifiers, all observation and dosing records, and the treatment group name. The observation records are labelled with the unique record identifier IXGDF.

8.1.4 Cleaning to create an analysis dataset

Based on the data exploration, we want to clean the dataset before finally producing the modeling dataset.

Specification of records to be removed

Named lists are created containing either unique subject identifiers (USUBJID) for removing all data from specific subjects or unique record identifiers (IXGDF) for removing particular records. The list names are used to annotate the reason for removal.

removeSubjects <- list("No PK data" = c("IQ00701-0100-0002","IQ00701-0100-0008"),
                       "Nonsense profile" = c("IQ00701-0100-0018"))
removeRecords  <- list("Implausible value" = c(59, 98, 301))
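The idea of reason-annotated removal lists can be sketched in base R as follows. The toy data and the log layout (removalLog) are hypothetical. The sketch only illustrates how list names can travel along as removal reasons, not how clean_IQRdataGENERAL() processes or logs them.

```r
# Toy dataset with record and subject identifiers
dat <- data.frame(
  IXGDF   = 1:6,
  USUBJID = c("A", "A", "B", "B", "C", "C"),
  VALUE   = c(1.2, 950, 0.8, 1.1, 0.9, 1.0),
  stringsAsFactors = FALSE
)
removeSubjects <- list("No PK data" = c("B"))
removeRecords  <- list("Implausible value" = c(2))

# Flatten the named lists: column 'values' holds the identifier, 'ind' the reason
subjLog <- stack(removeSubjects)
recLog  <- stack(removeRecords)

# Flag rows to drop, either by subject or by record identifier
dropSubj <- dat$USUBJID %in% subjLog$values
dropRec  <- dat$IXGDF %in% recLog$values

# Keep a log of what was removed and why
removalLog <- rbind(
  data.frame(IXGDF  = dat$IXGDF[dropSubj],
             REASON = as.character(subjLog$ind[match(dat$USUBJID[dropSubj], subjLog$values)])),
  data.frame(IXGDF  = dat$IXGDF[dropRec],
             REASON = as.character(recLog$ind[match(dat$IXGDF[dropRec], recLog$values)]))
)

cleaned <- dat[!(dropSubj | dropRec), ]
print(removalLog)
```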

Specification of covariate imputation

To impute covariates, we use named vectors for continuous and categorical covariates, respectively. The values given will be imputed for all missing covariate values. For continuous covariates, it is also possible to provide the name of a suitable function such as “mean” or “median” to calculate the imputation value from the available values.

imputeContinuous = c("HT0" = "median", "WT0" = 70)
imputeCategory   = c("SEX"=1)
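What such an imputation specification amounts to can be sketched in base R. Because the vector mixes a function name and a number, R stores both entries as character strings. Interpreting numeric strings as fixed values and all other entries as function names is an assumption of this sketch, not a description of the internals of IQR Tools.

```r
# Toy covariate table with one missing height and one missing weight
covs <- data.frame(
  USUBJID = c("A", "B", "C", "D"),
  HT0     = c(185, NA, 180, 186),
  WT0     = c(89, 65, NA, 80),
  stringsAsFactors = FALSE
)
imputeContinuous <- c(HT0 = "median", WT0 = 70)  # stored as character vector

for (col in names(imputeContinuous)) {
  rule  <- imputeContinuous[[col]]
  # Numeric strings are used as fixed values, anything else as a function name
  fixed <- suppressWarnings(as.numeric(rule))
  value <- if (!is.na(fixed)) fixed else match.fun(rule)(covs[[col]], na.rm = TRUE)
  covs[[col]][is.na(covs[[col]])] <- value
}
print(covs)
```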

Perform documented dataset cleaning

To generate a cleaned analysis dataset, the function clean_IQRdataGENERAL() is used. It not only accepts the information on data removal and covariate imputation but also offers further functionality and options. In this example, the method to handle BLLOQ data is chosen (methodBLLOQ = "M3"), ignored records are kept in the dataset (FLAGrmIGNOREDrecords = FALSE), and placebo subjects are removed (FLAGrmPlacebo = TRUE). Importantly, information on the cleaning process is written to the folder specified by the pathname input argument.

data1CleanM3 <- clean_IQRdataGENERAL(data1,
                                     pathname = "material/01-01-DataProgAnal/DataCleaning01",
                                     methodBLLOQ = "M3",
                                     subjects = removeSubjects,
                                     records = removeRecords,
                                     FLAGrmIGNOREDrecords = FALSE,
                                     FLAGrmPlacebo = TRUE,
                                     continuousCovs = imputeContinuous,
                                     categoricalCovs = imputeCategory)

8.1.5 Export

To make the dataset available for parameter estimation with NONMEM or MONOLIX, it is exported with the function exportNLME_IQRdataGENERAL(). In this step, some adjustments are made to the data, e.g., removing spaces in character strings, such that the dataset is accepted by these tools.

# Export the NLME data set with BLQ method M3
exportNLME_IQRdataGENERAL(data1CleanM3,
                          filename = "material/01-01-DataProgAnal/dataNLME01/data.csv",
                          FLAGxpt = TRUE,
                          FLAGdefine = TRUE)

This function offers several export options. A csv file is always generated. In addition, the option FLAGxpt = TRUE writes an xpt file, and the option FLAGdefine = TRUE produces a define file with the dataset specifications for the analysis dataset.

8.2 Workflow customization

8.2.1 Dataset handling

To show more examples of importing a dataset as an IQRdataGENERAL object, we load another dataset from file.

dataSource2 <- IQRloadCSVdata("material/01-01-DataProgAnal/dataSource02.csv")

Multiple dose or observation types

This dataset contains dosing records of two different drugs and two observation types. Note that the IQRdataGENERAL() function accepts either the path to the source data (see the example workflow above) or the loaded data frame as input.

# Dose records
doseNAMES <- c("Dose Z","Dose X")
# Observation records
obsNAMES <- c("Plasma concentration Z","Efficacy marker")
# Import as IQRdataGENERAL
data2 <- IQRdataGENERAL(dataSource2, doseNAMES = doseNAMES, obsNAMES = obsNAMES)

Table 8.5: Example dataset 2 (selected columns and first 12 rows)
	USUBJID	TRTNAME	NAME	TIME	VALUE	AMT	EVID	YTYPE	ADM
1	ZY1000101066	SD IV 15 mg/kg	Efficacy marker	-7.1118056	21.152	0	0	2	0
3	ZY1000101066	SD IV 15 mg/kg	Dose X	0.0000000	918.000	918	1	0	2
4	ZY1000101066	SD IV 15 mg/kg	Plasma concentration Z	0.0826389	998.001	0	0	1	0
6	ZY1000101066	SD IV 15 mg/kg	Plasma concentration Z	16.2076389	159.399	0	0	1	0
7	ZY1000101066	SD IV 15 mg/kg	Plasma concentration Z	29.2006944	119.400	0	0	1	0
8	ZY1000101066	SD IV 15 mg/kg	Plasma concentration Z	42.1701389	42.201	0	0	1	0
9	ZY1000101066	SD IV 15 mg/kg	Efficacy marker	112.2256944	22.223	0	0	2	0
10	ZY1000101067	SD IV 15 mg/kg	Efficacy marker	-13.8138889	23.043	0	0	2	0
12	ZY1000101067	SD IV 15 mg/kg	Dose X	0.0000000	1311.000	1311	1	0	2
13	ZY1000101067	SD IV 15 mg/kg	Plasma concentration Z	0.0972222	1362.000	0	0	1	0
15	ZY1000101067	SD IV 15 mg/kg	Plasma concentration Z	15.0736111	176.199	0	0	1	0
16	ZY1000101067	SD IV 15 mg/kg	Plasma concentration Z	29.2506944	100.599	0	0	1	0

Time-varying covariates

Time-varying covariates are defined analogously to time-independent covariates. Here, the PD observations are used both as a baseline covariate and as a time-dependent covariate. When used as a time-dependent covariate, all records are used in the covariate column according to the time of the records they are mapped onto. When used as a time-independent covariate, the baseline value per individual (defined by columns BASE, SCREEN, or pre-first dose records; for details see ?IQRdataGENERAL) is mapped to the observations.

# Define the CONTINUOUS covariate columns (time INDEPENDENT)
cov0 <- list(
  PDbase  = "Efficacy marker"
)
# Define the CONTINUOUS covariate columns (time DEPENDENT)
covT <- list(
  PDcont  = "Efficacy marker"
)
# Define the CATEGORICAL covariate columns (time INDEPENDENT)
cat0 <- list(
  SEX  = "Gender"
)
# Define the CATEGORICAL covariate columns (time DEPENDENT)
catT <- list(
  HSTAT   = "Health status"
)
# Import to IQRdataGENERAL object
data2 <- IQRdataGENERAL(dataSource2, doseNAMES = doseNAMES, obsNAMES = "Plasma concentration Z",
                        cov0 = cov0, cat0 = cat0,
                        covT = covT, catT = catT)

Table 8.7: Example dataset 2 with various covariate columns (selected columns and first 12 rows)
	USUBJID	TRTNAME	NAME	TIME	VALUE	AMT	EVID	YTYPE	ADM	PDbase	PDcont	SEX	HSTAT
3	ZY1000101066	SD IV 15 mg/kg	Dose X	0.0000000	918.000	918	1	0	2	21.152	21.152	NA	3
4	ZY1000101066	SD IV 15 mg/kg	Plasma concentration Z	0.0826389	998.001	0	0	1	0	21.152	21.152	NA	3
6	ZY1000101066	SD IV 15 mg/kg	Plasma concentration Z	16.2076389	159.399	0	0	1	0	21.152	21.152	NA	3
7	ZY1000101066	SD IV 15 mg/kg	Plasma concentration Z	29.2006944	119.400	0	0	1	0	21.152	21.152	NA	3
8	ZY1000101066	SD IV 15 mg/kg	Plasma concentration Z	42.1701389	42.201	0	0	1	0	21.152	21.152	NA	3
12	ZY1000101067	SD IV 15 mg/kg	Dose X	0.0000000	1311.000	1311	1	0	2	23.043	23.043	NA	2
13	ZY1000101067	SD IV 15 mg/kg	Plasma concentration Z	0.0972222	1362.000	0	0	1	0	23.043	23.043	NA	2
15	ZY1000101067	SD IV 15 mg/kg	Plasma concentration Z	15.0736111	176.199	0	0	1	0	23.043	23.043	NA	3
16	ZY1000101067	SD IV 15 mg/kg	Plasma concentration Z	29.2506944	100.599	0	0	1	0	23.043	23.043	NA	3
17	ZY1000101067	SD IV 15 mg/kg	Plasma concentration Z	42.1562500	57.201	0	0	1	0	23.043	23.043	NA	3
21	ZY1000101068	SD IV 15 mg/kg	Dose X	0.0000000	0.000	0	1	0	2	23.364	23.364	NA	4
26	ZY1000101069	SD IV 15 mg/kg	Plasma concentration Z	-0.0013889	0.933	0	0	1	0	16.993	16.993	NA	1
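The difference between the two covariate types can be illustrated with a small base R sketch for a single subject. Carrying the last marker value forward in time and taking the last pre-dose value as baseline are assumptions chosen for illustration. The rules actually applied by IQRdataGENERAL() are documented in ?IQRdataGENERAL.

```r
# Marker measurements of one subject (sorted by time) and the record times to annotate
marker  <- data.frame(TIME = c(-7.1, 112.2), VALUE = c(21.152, 22.223))
recTime <- c(0, 0.08, 16.2, 120)

# Time-independent: one baseline value (here: last value at or before TIME 0)
PDbase <- tail(marker$VALUE[marker$TIME <= 0], 1)

# Time-dependent: most recent marker value at or before each record time
idx    <- findInterval(recTime, marker$TIME)  # 0 if no earlier marker exists
PDcont <- ifelse(idx == 0, NA, marker$VALUE[pmax(idx, 1)])

data.frame(TIME = recTime, PDbase = PDbase, PDcont = PDcont)
```

In this toy case, PDbase and PDcont agree until a later marker measurement becomes available, which matches the pattern in the table above, where both columns coincide for the early records.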

Existing covariate columns

Datasets may also contain (numerical) covariate columns instead of row-records for covariates. In this case the user needs to provide the metadata (verbose name, units, mapping of category names and values, …) such that it is added to the covariate information in the attributes.

# Additional CONTINUOUS covariates
covInfoAdd <- list(
  COLNAME      = c("AST0",         "BMI0"),
  NAME         = c("Aspartate transaminase","Body mass index"),
  UNIT         = c("UI/mL",      "kg/m2"),
  TIME.VARYING = c(FALSE,              FALSE)
)
# Additional CATEGORICAL covariates
catInfoAdd <- list(
  COLNAME      = c("FOOD"),
  NAME         = c("Food taken"),
  UNIT         = c("y/n"),
  VALUETXT     = c("No,Yes"),
  VALUES       = c("0,1"),
  TIME.VARYING = c(FALSE)
)
# Import to IQRdataGENERAL object
data2 <- IQRdataGENERAL(dataSource2, doseNAMES = doseNAMES, obsNAMES = obsNAMES,
                        covInfoAdd = covInfoAdd, catInfoAdd = catInfoAdd)

# Extended continuous covariate information
covInfo(data2)

COLNAME	NAME	UNIT	TIME.VARYING
AST0	Aspartate transaminase	UI/mL	FALSE
BMI0	Body mass index	kg/m2	FALSE

# Extended categorical covariate information
catInfo(data2)

COLNAME	NAME	UNIT	VALUETXT	VALUES	TIME.VARYING
FOOD	Food taken	y/n	No,Yes	0,1	FALSE
STUDYN	Study		Y10,Y1,Y3,Y8	2,1,3,4	FALSE
TRT	TRTNAME		SD IV 15 mg/kg,SD IV Placebo,SD IV 1.5 mg/kg,MD IV Placebo,MD IV 5mg/kg,SD or MD IV Placebo,MD IV 15mg/kg	5,6,4,3,2,7,1	FALSE

Existing NLME columns

The input argument FLAGforceOverwriteNLMEcols defines whether “NLME” columns that might already be in the dataset are overwritten (TRUE, the default) or not (FALSE). These NLME columns are the following numeric, NLME-tool-specific columns: ID, TIMEPOS, TAD, DV, MDV, EVID, CENS, AMT, ADM, TINF, RATE, YTYPE, and DOSE. Overwriting is useful in the sense that these columns will be well-defined and aligned with the data specification of IQRtools. Not overwriting them can be useful if the user wants to control certain columns manually, but in this case the user should know what they are doing.

If this flag is set to TRUE, then all existing NLME columns will be overwritten. Columns that are not present will be generated based on the default specification in both cases.

# Overwriting the NLME columns
data2 <- IQRdataGENERAL(dataSource2, doseNAMES, obsNAMES,FLAGforceOverwriteNLMEcols=TRUE)

Table 8.9: Example dataset 2 with overwritten existing NLME columns (selected columns and first 12 rows)
	USUBJID	TIME	EVID	YTYPE	VALUE	ADM	AMT
1	ZY1000101066	-7.11	0	2	21.15	0	0
3	ZY1000101066	0.00	1	0	918.00	2	918
4	ZY1000101066	0.08	0	1	998.00	0	0
6	ZY1000101066	16.21	0	1	159.40	0	0
7	ZY1000101066	29.20	0	1	119.40	0	0
8	ZY1000101066	42.17	0	1	42.20	0	0

# Keeping existing NLME columns
data2 <- IQRdataGENERAL(dataSource2, doseNAMES, obsNAMES,FLAGforceOverwriteNLMEcols=FALSE)

Table 8.11: Example dataset 2 with kept existing NLME columns (selected columns and first 12 rows)
	USUBJID	TIME	EVID	YTYPE	VALUE	ADM	AMT
1	ZY1000101066	-7.11	0	2	21.15	0	0
3	ZY1000101066	0.00	1	0	918.00	2	918
4	ZY1000101066	0.08	0	1	998.00	0	0
6	ZY1000101066	16.21	0	1	159.40	0	0
7	ZY1000101066	29.20	0	1	119.40	0	0
8	ZY1000101066	42.17	0	1	42.20	0	0

8.2.2 Import/export options

Export IQRdataGENERAL objects

An IQRdataGENERAL object can be exported with three different functions: export_IQRdataGENERAL(), exportNLME_IQRdataGENERAL(), or exportSYS_IQRdataGENERAL(). The first exports the dataset without further modifications; the others apply estimation-tool-specific modifications (e.g., removal of whitespace in strings) so that the result can be used as a modeling dataset later on. All export functions generate a .csv file and a .atr file storing the metadata and can additionally generate define files and .xpt files.

With all export functions a zipped file instead of the single files (.csv, .atr, …) can be generated by setting FLAGzip = TRUE.

export_IQRdataGENERAL(data=data2, filename="material/01-01-DataProgAnal/dataGEN", FLAGzip = TRUE)

The functions exportNLME_IQRdataGENERAL() and exportSYS_IQRdataGENERAL() make it possible to subset the data to specific dosing and observation records (inputs doseNAMES and obsNAMES) and to define columns as regressor variables while exporting the data. Regressor columns are ordered in the exported dataset as given in the input regressorNames, which is crucial for matching regressor variables between data and model for some estimation tools.

exportNLME_IQRdataGENERAL(data=data2, filename="material/01-01-DataProgAnal/dataNLME", regressorNames = c("AST0", "FOOD"))
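Why the order of regressorNames matters can be seen in a short base R sketch that reorders columns of a toy data frame. Placing the regressor columns at the end of the dataset is an assumption of this sketch. The source only states that their relative order follows regressorNames.

```r
# Toy dataset in which the regressor columns appear in arbitrary positions
dat <- data.frame(ID = 1:2, FOOD = c(0, 1), DV = c(1.5, 2.5), AST0 = c(20, 35))
regressorNames <- c("AST0", "FOOD")

# All requested regressors must exist as columns
stopifnot(all(regressorNames %in% names(dat)))

# Keep the other columns in place and append the regressors in the requested order
ordered <- dat[, c(setdiff(names(dat), regressorNames), regressorNames)]
names(ordered)
```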

Load IQRdataGENERAL object

The load_IQRdataGENERAL() function is used to reload a dataset that was generated by the export_IQRdataGENERAL(), exportNLME_IQRdataGENERAL(), or exportSYS_IQRdataGENERAL() function. For the loading to work, the corresponding .atr file needs to be present in the same folder as the .csv file. xpt files are not loaded. Zip files can also be reloaded.

data1reload <- load_IQRdataGENERAL("material/01-01-DataProgAnal/dataNLME01/data.csv")
data2reload <- load_IQRdataGENERAL("material/01-01-DataProgAnal/dataGeneral02.dat.zip")

8.2.3 Cleaning options

The cleaning function clean_IQRdataGENERAL() is actually a wrapper for various functions that perform different steps during cleaning. These functions can also be called individually. Please refer to the help for each function for detailed information.

Function	Description	Control in clean_IQRdataGENERAL	Logfile written
blloq_IQRdataGENERAL	Set the BLLOQ handling method	set by `methodBLLOQ`	No
setIGNORErecords_IQRdataGENERAL	Set user defined records to IGNORE	optional by setting `records`	Yes
rmMissingTIMEobsRecords_IQRdataGENERAL	Remove observation records with missing TIME	always applied	Yes
setMissingDVobsRecordsIGNORE_IQRdataGENERAL	Set observation records with missing DV to IGNORE	always applied	Yes
rmSubjects_IQRdataGENERAL	Remove user defined subjects	optional by setting `subjects`	Yes
rmNonTask_IQRdataGENERAL	Remove non-dose and non-observation records	always applied	Yes
rmPLACEBO_IQRdataGENERAL	Remove placebo subjects	optional by setting `FLAGrmPlacebo`	Yes
rmNOobsSUB_IQRdataGENERAL	Remove subjects without observations	always applied	Yes
rmAMT0_IQRdataGENERAL	Remove dose records with AMT=0	always applied	Yes
rmIGNOREd_IQRdataGENERAL	Remove ignored records (MDV=1)	optional by setting `FLAGrmIGNOREDrecords`	Yes
covImpute_IQRdataGENERAL	Impute missing covariates	optional by setting `continuousCovs` and `categoricalCovs`	Yes
rmDosePostLastObs_IQRdataGENERAL	Remove doses post last observation	optional by setting `FLAGrmDosePostLastObs`	Yes
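The wrapper pattern, a sequence of small cleaning steps that each report what they removed, can be sketched in base R. The two step functions below are simplified stand-ins written for this example. They are not the IQR Tools functions listed in the table, and the log format is hypothetical.

```r
# Toy dataset: subject C has no observation record, subject B has a dose with AMT = 0
dat <- data.frame(
  USUBJID = c("A", "A", "B", "B", "C"),
  EVID    = c(1, 0, 1, 0, 1),
  AMT     = c(10, 0, 0, 0, 5),
  MDV     = c(1, 0, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Each step takes a data frame and returns the reduced data frame
steps <- list(
  "Remove subjects without observations" = function(d) {
    hasObs <- unique(d$USUBJID[d$EVID == 0 & d$MDV == 0])
    d[d$USUBJID %in% hasObs, ]
  },
  "Remove dose records with AMT=0" = function(d) {
    d[!(d$EVID == 1 & d$AMT == 0), ]
  }
)

# Apply the steps in sequence and record how many rows each one removed
cleaned  <- dat
logLines <- character(0)
for (nm in names(steps)) {
  before   <- nrow(cleaned)
  cleaned  <- steps[[nm]](cleaned)
  logLines <- c(logLines, sprintf("%s: %d record(s) removed", nm, before - nrow(cleaned)))
}
writeLines(logLines)
```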

8.2.4 Data exploration

In the Example workflow discussed above, some of the IQRtools functions for exploring a dataset were used to visualize the data, detect issues, and get an overview of the contained PK observations. In the following, all available data exploration functions are introduced using the cleaned dataset from the workflow as an example.

Summary tables

Summary tables can be generated for observations (summaryObservations_IQRdataGENERAL()) and for categorical or continuous covariates (summaryCat_IQRdataGENERAL() and summaryCov_IQRdataGENERAL()). The tables can be stratified by a suitable dataset column (stratifyColumn with “STUDY” as default). Besides the actual content, a table title and a table footer are defined, which can be modified by the user (tableTitle and footerAddText). The tables can be further customized by setting the number of digits to which values are rounded (SIGNIF) and whether individuals should be termed “subjects” or “patients” (FLAGpatients). If a filename is provided, the table is written to that file; otherwise an IQRoutputTable object is returned.

With this flexibility, the summary tables are intended to be readily usable in the data exploration section of modeling reports. The exported tables are prepared to be automatically imported to a report using IQReport (see Reporting in Microsoft Word).

Observations

For the observation summary table, the numbers of subjects and observations are listed and broken down by different criteria (e.g., number of observations below the limit of quantification). The input obsNames is available if only a subset of the contained observations should be summarized. In this example, there is only one observation type, i.e., the “Plasma concentration IQ0815”. The first two rows give the numbers per study, while the last row contains the total counts in the entire dataset.

summaryObservations_IQRdataGENERAL(data1CleanM3)

##    Summary of available observations                                                                                                                                                                                 
##    ==================================================================================================================================================================================================================
## 
##    Data          | N subjects* | N samples | N BLOQ samples** | N BLOQ samples post first dose* | N missing observations | N missing time information | N total ignored observations | N samples included in analysis
##    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
##    FIH           | 32 / 32     | 384       | 117 (30.5%)      | 85 (22.1%)                      | 0 (0%)                 | 0 (0%)                     | 3 (0.781%)                   | 381 (99.2%)                   
##    FIH extension | 12 / 12     | 144       | 12 (8.33%)       | 0 (0%)                          | 0 (0%)                 | 0 (0%)                     | 0 (0%)                       | 144 (100%)                    
##    TOTAL         | 44 / 44     | 528       | 129 (24.4%)      | 85 (16.1%)                      | 0 (0%)                 | 0 (0%)                     | 3 (0.568%)                   | 525 (99.4%)                   
##    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## N: Number of
## * All subjects / subjects with at least one non missing (MDV==0) sample.
## ** These records are not excluded from the analysis but censored (M3 method).
##                                               
## 
## IQRoutputTable object

This summary table contains information about samples before and after first dosing regarding BLQ and zero/non-zero values, which is mainly of interest for summarizing PK samples. This information can be omitted by setting FLAGpk = FALSE:

summaryObservations_IQRdataGENERAL(data1CleanM3, FLAGpk = FALSE)

##    Summary of available observations                                                                                                                                               
##    ================================================================================================================================================================================
## 
##    Data          | N subjects* | N samples | N BLOQ samples** | N missing observations | N missing time information | N total ignored observations | N samples included in analysis
##    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
##    FIH           | 32 / 32     | 384       | 117 (30.5%)      | 0 (0%)                 | 0 (0%)                     | 3 (0.781%)                   | 381 (99.2%)                   
##    FIH extension | 12 / 12     | 144       | 12 (8.33%)       | 0 (0%)                 | 0 (0%)                     | 0 (0%)                       | 144 (100%)                    
##    TOTAL         | 44 / 44     | 528       | 129 (24.4%)      | 0 (0%)                 | 0 (0%)                     | 3 (0.568%)                   | 525 (99.4%)                   
##    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## N: Number of
## * All subjects / subjects with at least one non missing (MDV==0) sample.
## ** These records are not excluded from the analysis but censored (M3 method).
##             
## 
## IQRoutputTable object
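
The percentages in the table above can be checked by hand. The following is a minimal base R sketch using the counts from the FIH row; it assumes (inferred from the printed values, not from the package documentation) that percentages are reported with three significant digits. The helper `pct()` is hypothetical and not part of IQRtools.

```r
# Counts taken from the FIH row of the summary table above
n_samples <- 384
n_bloq    <- 117
n_ignored <- 3

# Assumption: percentages are shown with three significant digits
pct <- function(n, total) signif(100 * n / total, 3)

pct_bloq     <- pct(n_bloq, n_samples)                 # share of BLOQ samples
pct_ignored  <- pct(n_ignored, n_samples)              # share of ignored observations
pct_included <- pct(n_samples - n_ignored, n_samples)  # share included in the analysis

print(c(BLOQ = pct_bloq, ignored = pct_ignored, included = pct_included))
```

The three values correspond to the entries 117 (30.5%), 3 (0.781%) and 381 (99.2%) in the FIH row.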

Covariates

Continuous covariates are summarized by the mean, the standard deviation, and the range from minimum to maximum value. For categorical covariates, the number of individuals in the respective category and the percentage of the total are given. Summaries are stratified by the columns given in stratifyColumns (e.g., stratifyColumns = "TRTNAME"; default: "STUDY"), and a total column can be requested (FLAGtotal = TRUE).

summaryCov_IQRdataGENERAL(data1CleanM3, stratifyColumns = "TRTNAME")

##    Summary of demographic and baseline characteristics for continuous information                                                                                                                                                                                
##    ==============================================================================================================================================================================================================================================================
## 
##    Characteristic  | 1mg oral single dose [N=4] | 2mg oral single dose [N=5] | 5mg oral single dose [N=5] | 10mg oral single dose [N=6] | 20mg oral single dose [N=6] | 50mg oral single dose [N=6] | 100mg oral single dose [N=6] | 200mg oral single dose [N=6]
##    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
##    Bodyweight (kg) | 76 (8.49) [68-86]          | 80.2 (8.04) [68-88]        | 73.2 (10.1) [65-90]        | 81 (5.1) [75-88]            | 77 (6.2) [68-87]            | 81.3 (7.79) [74-94]         | 85.5 (5.58) [78-92]          | 77.2 (7.91) [67-91]         
##    Age (Years)     | 29.2 (4.19) [25-35]        | 29.4 (1.67) [28-32]        | 29.6 (5.32) [24-36]        | 26.3 (5.32) [18-34]         | 28.8 (3.54) [23-32]         | 30.3 (4.8) [23-35]          | 28.8 (5.78) [21-34]          | 30.3 (5.24) [22-38]         
##    Height (cm)     | 178 (7.77) [166-183]       | 177 (2.74) [174-181]       | 175 (2.65) [171-178]       | 179 (3.5) [175-183]         | 180 (6.31) [168-186]        | 179 (3.22) [174-182]        | 181 (4.1) [178-188]          | 176 (3.78) [172-183]        
##    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## N: Number of subjects
## Entries represent: Mean (Standard deviation) [Minimum-Maximum]                                                                                                                                                                            
## 
## IQRoutputTable object
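
To make the entry format "Mean (Standard deviation) [Minimum-Maximum]" concrete, here is a minimal base R sketch on a hypothetical toy dataset. It is not the IQRtools implementation; the data frame `demo`, the helper `summarize_cont()`, and the rounding to three significant digits are assumptions for illustration.

```r
# Hypothetical toy data: one row per subject
demo <- data.frame(
  TRTNAME = c("1mg", "1mg", "1mg", "2mg", "2mg", "2mg"),
  WT0     = c(68, 74, 86, 70, 82, 88)
)

# Format one stratum as "Mean (SD) [Min-Max]"
# (assumption: three significant digits for mean and SD)
summarize_cont <- function(x) {
  sprintf("%s (%s) [%s-%s]",
          signif(mean(x), 3), signif(sd(x), 3), min(x), max(x))
}

# One summary string per treatment group
res <- tapply(demo$WT0, demo$TRTNAME, summarize_cont)
print(res)
```

Stratifying by a different column only requires changing the grouping vector passed to `tapply()`.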

summaryCat_IQRdataGENERAL(data1CleanM3, FLAGtotal = TRUE, catNames = c("STUDYN", "TRT"))

##    Summary of demographic and baseline characteristics for categorical information           
##    ==========================================================================================
## 
##    Characteristic | Category               | FIH [N=32] | FIH extension [N=12] | TOTAL [N=44]
##    ------------------------------------------------------------------------------------------
##    Study          | FIH                    | 32 (100%)  | 0 (0%)               | 32 (72.7%)  
##                   | FIH extension          | 0 (0%)     | 12 (100%)            | 12 (27.3%)  
##    TRTNAME        | Placebo                | 0 (0%)     | 0 (0%)               | 0 (0%)      
##                   | 1mg oral single dose   | 4 (12.5%)  | 0 (0%)               | 4 (9.09%)   
##                   | 2mg oral single dose   | 5 (15.6%)  | 0 (0%)               | 5 (11.4%)   
##                   | 5mg oral single dose   | 5 (15.6%)  | 0 (0%)               | 5 (11.4%)   
##                   | 10mg oral single dose  | 6 (18.8%)  | 0 (0%)               | 6 (13.6%)   
##                   | 20mg oral single dose  | 6 (18.8%)  | 0 (0%)               | 6 (13.6%)   
##                   | 50mg oral single dose  | 6 (18.8%)  | 0 (0%)               | 6 (13.6%)   
##                   | 100mg oral single dose | 0 (0%)     | 6 (50%)              | 6 (13.6%)   
##                   | 200mg oral single dose | 0 (0%)     | 6 (50%)              | 6 (13.6%)   
##    ------------------------------------------------------------------------------------------
## N: Number of subjects
## Number of subjects in each category and percentage within this category
## 
## IQRoutputTable object
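
In the table above, the percentages appear to be taken relative to the number of subjects in each stratum (e.g., 4 of 32 FIH subjects = 12.5%). A minimal base R sketch of such a tabulation follows; the toy data frame `demo` is hypothetical, and this is not the IQRtools implementation.

```r
# Hypothetical toy data: one row per subject
demo <- data.frame(
  STUDY = c("FIH", "FIH", "FIH", "FIH", "EXT", "EXT"),
  TRT   = c("1mg", "1mg", "2mg", "2mg", "100mg", "100mg")
)

# Number of subjects per category (rows) and stratum (columns)
counts <- table(demo$TRT, demo$STUDY)

# Percentage relative to the stratum size (column-wise)
pct <- 100 * prop.table(counts, margin = 2)

# Total column: counts and percentages across all strata
total_n   <- rowSums(counts)
total_pct <- 100 * total_n / nrow(demo)

print(counts)
print(pct)
print(cbind(N = total_n, Percent = signif(total_pct, 3)))
```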

Standard graphs

Details on individuals

One example of an individual detail plot was already shown in the data preparation workflow above. The function creates a PDF with one page per individual and/or a list of graphs (one list element per individual). It displays the observations along with the dose administrations and gives information on subject ID and treatment group. Each data point is labeled with its IXGDF number. This allows the data to be inspected in detail for correctness, and problematic data points to be spotted and identified easily.

Please refer to the help of this function to look up possible customization (e.g., log-scale, selection of observations to include).

plotIndiv_IQRdataGENERAL(data1CleanM3, filename = "material/01-01-DataProgAnal/IndivPlotsClean01.pdf")

Dosing

The dosing schedule per individual can be inspected using the plotDoseSchedule_IQRdataGENERAL() function. The individual panels are distributed over multiple pages/graphs according to the number of individuals to be plotted on one page (NperPage, defaults to 25).

plotDoseSchedule_IQRdataGENERAL(data1CleanM3, filename = "material/01-01-DataProgAnal/Dosing01.pdf")
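
How individuals are distributed over pages for a given NperPage can be illustrated with a small base R sketch. This only mirrors the idea described above; the subject identifiers are hypothetical and the actual paging logic inside IQRtools may differ.

```r
# Hypothetical subject identifiers (32 individuals)
ids <- sprintf("ID%02d", 1:32)

# Number of individuals per page (25 is the documented default)
NperPage <- 25

# Assign each individual to a page and split accordingly
page  <- ceiling(seq_along(ids) / NperPage)
pages <- split(ids, page)

# Number of panels on each page
print(lengths(pages))
```

With 32 individuals and NperPage = 25, the first page holds 25 panels and the second page the remaining 7.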

Observations

The sampling schedule per individual can be inspected using the plotSampleSchedule_IQRdataGENERAL() function. The individual panels are distributed over multiple pages/graphs according to the number of individuals to be plotted on one page (here set by NperPage = 6).

plotSampleSchedule_IQRdataGENERAL(data1CleanM3, filename = "material/01-01-DataProgAnal/Sampling01.pdf", NperPage = 6)

The actual observations can be visualized with plotRange_IQRdataGENERAL() and plotSpaghetti_IQRdataGENERAL(). In both cases, the data are stratified into panels per treatment group. The first function shows the median and the 90% range, from the 5th to the 95th percentile, of the data per nominal time point. The second function plots the data points at the actual times, connected by lines per subject. Besides changing the scale (scale = "log" or "lin", not shown here), the plots can be split by a stratification column, in which case both the non-stratified and the stratified graphs are returned.

# Median and 90% interval per nominal time point
out <- plotRange_IQRdataGENERAL(data1CleanM3)
out$unstratified$`Plasma concentration IQ0815`

## Warning: No shared levels found between `names(values)` of the manual scale and the
## data's colour values.

# Lines per individual
out <- plotSpaghetti_IQRdataGENERAL(data1CleanM3, stratify = "AGE0")

# Unstratified
out$unstratified$`Plasma concentration IQ0815`

# Stratified
out$stratified$`Plasma concentration IQ0815.AGE0CAT::1`

out$stratified$`Plasma concentration IQ0815.AGE0CAT::2`
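
The statistics shown in the range plot (median and 5th to 95th percentile per nominal time point) can be sketched in base R as follows. The simulated data frame `obs` is hypothetical, and R's default quantile definition is an assumption; IQRtools may compute the percentiles differently.

```r
# Hypothetical simulated observations: 10 values at each nominal time (NT)
set.seed(1)
obs <- data.frame(
  NT    = rep(c(1, 2, 4, 8), each = 10),
  VALUE = rlnorm(40, meanlog = 2, sdlog = 0.5)
)

# 5th percentile, median, and 95th percentile per nominal time point
# (assumption: R's default quantile algorithm, type 7)
range_by_nt <- aggregate(
  VALUE ~ NT, data = obs,
  FUN = function(x) quantile(x, probs = c(0.05, 0.5, 0.95), na.rm = TRUE)
)

# aggregate() returns a matrix column; flatten it into regular columns
range_by_nt <- do.call(data.frame, range_by_nt)
names(range_by_nt) <- c("NT", "P05", "MEDIAN", "P95")

print(range_by_nt)
```

Grouping by the nominal rather than the actual time is what makes the percentiles comparable across subjects, since actual sampling times vary slightly between individuals.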

Covariates

The function plotCovDistribution_IQRdataGENERAL() visualizes the distribution of continuous and categorical covariates and is thus the graphical counterpart to the summaryCov_IQRdataGENERAL() and summaryCat_IQRdataGENERAL() functions. The correlations among continuous covariates (plotCorCov_IQRdataGENERAL()), among categorical covariates (plotCorCat_IQRdataGENERAL()), and between continuous and categorical covariates (plotCorCovCat_IQRdataGENERAL()) can also be explored graphically. In our example, we use the input arguments covNames/catNames to omit the SEX covariate from the plots, as it contains only one category (see summary table above).

# Distribution of continuous and categorical covariates
# (STUDYN and TRT are categorical, so they are passed via catNames)
out <- plotCovDistribution_IQRdataGENERAL(data1CleanM3, catNames = c("STUDYN", "TRT"))
out$categorical

# Correlation of continuous covariates
plotCorCov_IQRdataGENERAL(data1CleanM3, covNames = c("AGE0", "WT0", "HT0"))

# Correlation of categorical covariates
plotCorCat_IQRdataGENERAL(data1CleanM3, catNames = c("STUDYN", "TRT"))

# Correlation of continuous and categorical covariates
plotCorCovCat_IQRdataGENERAL(data1CleanM3, catNames = c("STUDYN", "TRT"))