# Imports a CSV dataset as an IQRdataGENERAL object

The general format of an IQRdataGENERAL dataset is documented in the first table below. Some columns are required; others are optional and do not need to be present in the CSV file. The table below defines default values for optional columns. These optional columns will be added to the IQRdataGENERAL object with the default settings. For more information, please see the Details section and visit https://iqrtools.intiquan.com/doc/book/analysis-dataset-preparation.html for the most recent version. The version that matches the installed version of IQR Tools is available by calling doc_IQRtools().

IQRdataGENERAL(
input,
doseNAMES = NULL,
obsNAMES = NULL,
cov0 = NULL,
covT = NULL,
cat0 = NULL,
catT = NULL,
methodBLLOQ = "M1",
FLAGforceOverwriteNLMEcols = TRUE,
FLAGnoNAlocf = FALSE
)

## Arguments

input

Path to a data file, or a data.frame

doseNAMES

Character string or vector of character strings, defining the names of the events that are to be considered as dose events. These names need to match the entries in the NAME column. If doseNAMES is not provided, doses are identified as the records in which ROUTE is not NA.

obsNAMES

Character string or vector of character strings, defining the names of the events that are to be considered as observation events. These names need to match the entries in the NAME column. Adverse event NAMEs cannot be selected. If obsNAMES is not provided, all records that are not doses will be considered as observations.
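The default identification rules for doseNAMES and obsNAMES can be pictured with plain data.frame operations. The following is a minimal base R sketch, not the package's implementation: the helper `flag_events` and the toy data are hypothetical, and only the NAME and ROUTE columns are taken from the descriptions above.

```r
# Toy long-format data (hypothetical values, for illustration only)
d <- data.frame(
  USUBJID = c("S1", "S1", "S1", "S1"),
  TIME    = c(0, 1, 2, 2),
  NAME    = c("Dose DrugX", "DrugX concentration", "DrugX concentration", "Weight"),
  ROUTE   = c("ORAL", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the documented defaults:
# - no doseNAMES: a record is a dose if ROUTE is not NA
# - no obsNAMES:  every record that is not a dose is an observation
flag_events <- function(data, doseNAMES = NULL, obsNAMES = NULL) {
  is_dose <- if (is.null(doseNAMES)) !is.na(data$ROUTE) else data$NAME %in% doseNAMES
  is_obs  <- if (is.null(obsNAMES)) !is_dose else data$NAME %in% obsNAMES
  data.frame(NAME = data$NAME, is_dose = is_dose, is_obs = is_obs)
}

flags_default <- flag_events(d)
flags_named   <- flag_events(d,
                             doseNAMES = "Dose DrugX",
                             obsNAMES  = "DrugX concentration")
```

With the defaults, the "Weight" record counts as an observation; naming the observations explicitly excludes it.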

cov0

(Handling covariates that are stored as events in the long format) List defining the TIME INDEPENDENT CONTINUOUS covariates. Each list entry must be named after the covariate column to create, and its value must be a character string with the NAME of the event to use as this covariate. The covariate value is determined as follows: If baseline assessments are defined in the BASE column, their mean is used. If no baseline is defined, the mean of the SCREEN observations is used. If there are no SCREEN observations, the mean of all pre-first dose values (TIME<=0) is used. If there are no pre-first dose values, the covariate is set to NA. Adverse event NAMEs cannot be selected.
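The fallback order for cov0 can be sketched in base R for a single subject. This is an illustration under assumptions, not the package's code: the helper `baseline_value` is hypothetical, the measured values are assumed to sit in a VALUE column, and BASE and SCREEN are assumed to mark the relevant records with 1.

```r
# Hypothetical helper: baseline value of one covariate for ONE subject.
# Assumes columns NAME, VALUE, TIME, BASE, SCREEN, with BASE/SCREEN == 1
# marking baseline/screening records (an assumption, not taken from the text).
baseline_value <- function(subj, eventNAME) {
  x <- subj[subj$NAME == eventNAME & !is.na(subj$VALUE), ]
  is_base   <- !is.na(x$BASE)   & x$BASE == 1
  is_screen <- !is.na(x$SCREEN) & x$SCREEN == 1
  is_pre    <- !is.na(x$TIME)   & x$TIME <= 0
  if (any(is_base))   return(mean(x$VALUE[is_base]))    # 1. baseline records
  if (any(is_screen)) return(mean(x$VALUE[is_screen]))  # 2. screening records
  if (any(is_pre))    return(mean(x$VALUE[is_pre]))     # 3. pre-first dose records
  NA_real_                                              # 4. nothing usable
}

# Toy records of one subject (hypothetical values)
subj <- data.frame(
  NAME   = c("Weight", "Weight", "Weight", "Weight"),
  TIME   = c(-7, -1, 0, 24),
  VALUE  = c(70, 72, 74, 80),
  BASE   = c(0, 0, 0, 0),
  SCREEN = c(1, 1, 0, 0)
)

# cov0 entry: list name = column to create, value = NAME of the event
cov0 <- list(WT0 = "Weight")
wt0  <- baseline_value(subj, cov0$WT0)  # no BASE records, so SCREEN mean is used
```

The same `cov0` list is what would be passed to the cov0 argument of IQRdataGENERAL.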

covT

(Handling covariates that are stored as events in the long format) List defining the TIME DEPENDENT CONTINUOUS covariates. Each list entry must be named after the covariate column to create, and its value must be a character string with the NAME of the event to use as this covariate. For records before the first observation, the value of the first observation is carried backward. After that, carry-forward is used. Adverse event NAMEs cannot be selected.
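The carry-backward and carry-forward behaviour for covT can be sketched on a plain vector. This is a minimal illustration that assumes the values belong to one subject and are already ordered by TIME; the helper `fill_time_dependent` is hypothetical and not part of IQR Tools.

```r
# Hypothetical helper: fill NAs by carrying the last observed value forward,
# and the first observed value backward to cover the leading records.
fill_time_dependent <- function(values) {
  observed <- which(!is.na(values))
  if (length(observed) == 0) return(values)  # nothing to carry
  # For each record: position (within 'observed') of the most recent
  # observation, or 0 if no observation has occurred yet
  last_idx <- findInterval(seq_along(values), observed)
  # Records before the first observation use the first observation
  last_idx[last_idx == 0] <- 1
  values[observed[last_idx]]
}

wt     <- c(NA, NA, 70, NA, 72, NA)  # covariate value per record, ordered by TIME
filled <- fill_time_dependent(wt)
```

In this toy vector, the first two records take the first observed value and each later gap takes the most recent observed value.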

cat0

(Handling covariates that are stored as events in the long format) Same as cov0 but for TIME INDEPENDENT CATEGORICAL covariates

catT

(Handling covariates that are stored as events in the long format) Same as covT but for TIME DEPENDENT CATEGORICAL covariates

(Handling covariates that are stored in additional columns in the dataset) List (no data.frame) with information about additional continuous covariates. The list needs to contain 4 named elements: COLNAME, NAME, UNIT, TIME.VARYING (in this order). Each value of these elements is a vector. COLNAME defines the column names of the additional covariates, NAME the "real" name, UNIT their UNIT, and TIME.VARYING is TRUE or FALSE depending ... Example:

covInfoAdd <- list(COLNAME=c("WT0","AGE0"),NAME=c("Bodyweight at baseline","Age"), UNIT=c("kg","years"),TIME.VARYING=c(FALSE,FALSE))

(Handling covariates that are stored in additional columns in the dataset) List (no data.frame) with information about additional categorical covariates. The list needs to contain 6 named elements: COLNAME, NAME, UNIT, VALUETXT, VALUES, TIME.VARYING (in this order). Each value of these elements is a vector. COLNAME defines the column names of the additional covariates, NAME the "real" name, and UNIT their UNIT. VALUETXT is a string defining a vector in R notation, containing the text version of the different levels. VALUES is a string defining a numeric vector in R notation, containing the numeric version of the different levels (used in the corresponding covariate column), and TIME.VARYING is TRUE or FALSE depending ... Example:

catInfoAdd <- list( COLNAME=c("SEX","FOOD"),NAME=c("Gender","Food taken"), UNIT=c("-","-"),VALUETXT=c("Male,Female", "NO,YES"), VALUES=c("1,2","0,1"),TIME.VARYING=c(FALSE,FALSE))
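Because VALUES and VALUETXT must pair up level by level, it can help to check them before import. The following base R sketch assumes the comma-separated form used in the example above; the helper `level_lookup` is hypothetical and not part of IQR Tools.

```r
catInfoAdd <- list(
  COLNAME      = c("SEX", "FOOD"),
  NAME         = c("Gender", "Food taken"),
  UNIT         = c("-", "-"),
  VALUETXT     = c("Male,Female", "NO,YES"),
  VALUES       = c("1,2", "0,1"),
  TIME.VARYING = c(FALSE, FALSE)
)

# Hypothetical helper: build a "numeric level -> text" lookup per covariate
level_lookup <- function(info) {
  lookup <- lapply(seq_along(info$COLNAME), function(i) {
    txt <- trimws(strsplit(info$VALUETXT[i], ",", fixed = TRUE)[[1]])
    num <- as.numeric(strsplit(info$VALUES[i], ",", fixed = TRUE)[[1]])
    stopifnot(length(txt) == length(num))  # every level needs a text label
    setNames(txt, num)                     # names are the numeric levels
  })
  setNames(lookup, info$COLNAME)
}

lookup  <- level_lookup(catInfoAdd)
sex_two <- lookup$SEX[["2"]]  # text label of level 2 in column SEX
```

A mismatch in the number of levels stops with an error instead of silently mislabeling categories.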

methodBLLOQ

Allows specification of the method for handling values below the lower limit of quantification. By default the "M1" method is used. Alternative settings are "M3", "M4", "M5", "M6", and "M7".

FLAGforceOverwriteNLMEcols

TRUE: if NLME columns (see second table above) are present in the dataset, they will be overwritten with default values. FALSE: they will not be overwritten and are kept as they are. In that case the user has to ensure that their contents make sense!

For testing purposes only! Please do not change this setting unless you know EXACTLY what you are doing!

FLAGnoNAlocf

If FALSE (default) the time dependent covariates will be imputed by a last observation carried forward (LOCF) approach for NA values. If TRUE, then no LOCF imputation will be done and values in the covariate column will be NA.

## Value

An IQRdataGENERAL object

## Details

During import, the following is done:

• Sanity checks are made on the dataset and if needed, the user is notified by errors, warnings, and normal messages

• Modeling tools related columns are added (see second table below) - if not yet present in the data

• Covariate columns might be added (see input arguments cov0, covT, cat0, catT)

• Observations below the LLOQ are handled based on the setting of the input argument methodBLLOQ.

• Records that are neither dose nor observation records are removed (depending on the setting of the input argument FLAGtaskEventsOnly)

The resulting output argument is of class IQRdataGENERAL.

• It is good practice to avoid commas in any text entries, as they might interfere with subsequent handling of files in which entries are separated by commas.

• The following elements will be interpreted as NA: "."," ","","NA","NaN"

• The loaded dataset will be sorted by (STUDY), USUBJID, TIME, (TYPENAME), NAME. Note that this might lead to non-sequential entries in the IXGDF column. The sorting by STUDY and/or TYPENAME is ignored if these columns are not present in the data.

• Columns in the imported dataset that do not match column names in the following two tables will be retained in the output argument

• In order to identify doses and observations, the user needs to provide the function with additional arguments that identify the NAMEs of doses and observations.

• Adverse events (by NAME) cannot be used in the generation of the DV and covariate column information. This might be added in a future version.

• STUDYN and TRT will be added automatically as covariates if the information is available in the dataset.

• Changes in the first data format update in 5 years (Version >= 1.3.0):

• During import of a general dataset, the non-required columns are no longer generated with default content. This leads to considerably smaller and more pleasant datasets.

• Columns that were essentially never used have been deprecated by removing them from the first table below. The deprecated columns are listed in the third table below. They are still supported, though, to allow backward compatibility.
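The NA interpretation and the sort order described in the list above can be mimicked in base R, for example to preview how a CSV file will look after import. This is a sketch under assumptions, not the package's code: the toy CSV content and its VALUE column are made up for illustration.

```r
# Toy CSV content (hypothetical values); "." and "" mark missing entries
csv_text <- "STUDY,USUBJID,TIME,NAME,VALUE
S01,B,1,Conc,.
S01,A,2,Conc,3.1
S01,A,0,Dose,100
S01,A,2,Weight,"

# Treat the documented strings as NA while reading
d <- read.csv(
  text = csv_text,
  na.strings = c(".", " ", "", "NA", "NaN"),
  stringsAsFactors = FALSE
)

# Sort by (STUDY), USUBJID, TIME, (TYPENAME), NAME;
# optional columns that are absent (here TYPENAME) are skipped
sort_cols <- intersect(c("STUDY", "USUBJID", "TIME", "TYPENAME", "NAME"), names(d))
d_sorted  <- d[do.call(order, unname(as.list(d[sort_cols]))), ]
```

After sorting, the original row order is no longer sequential, which corresponds to the note about non-sequential entries in the IXGDF column.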

