Imports a CSV dataset as an IQRdataGENERAL object

The general format of an IQRdataGENERAL dataset is documented in the first table below. Some columns are required; others are optional and need not be present in the CSV file. The table defines default values for the optional columns, which are added to the IQRdataGENERAL object with these default settings. For more information, please see the Details section and visit https://iqrtools.intiquan.com/doc/book/analysis-dataset-preparation.html for the most recent version. The version valid for the installed version of IQR Tools is available from the function call doc_IQRtools().

IQRdataGENERAL(
  input,
  doseNAMES = NULL,
  obsNAMES = NULL,
  cov0 = NULL,
  covT = NULL,
  cat0 = NULL,
  catT = NULL,
  covInfoAdd = NULL,
  catInfoAdd = NULL,
  methodBLLOQ = "M1",
  FLAGforceOverwriteNLMEcols = TRUE,
  FLAGtaskEventsOnly = TRUE,
  FLAGnoNAlocf = FALSE
)

Arguments

input

Path to a data file, or a data.frame

doseNAMES

Character string or vector with character strings, defining the names of the events that are to be considered as dose events. These names need to match the entries in the NAME column. If doseNAMES is not provided, doses are identified by ROUTE being not NA.

obsNAMES

Character string or vector with character strings, defining the names of the events that are to be considered as observation events. These names need to match the entries in the NAME column. Adverse event NAMEs cannot be selected. If obsNAMES is not provided, all records that are not doses will be considered as observations.
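
To illustrate the input, doseNAMES, and obsNAMES arguments, the sketch below builds a tiny data.frame containing the columns marked as required in the first table below and passes it to IQRdataGENERAL(). The subject ID, the event names "Dose" and "Conc", and all values and units are invented for illustration. The import call is guarded so that the snippet also runs where IQR Tools is not installed; whether such a minimal dataset passes all sanity checks has not been verified here.

```r
# Hypothetical toy dataset with the required columns of the general format.
# All IDs, event names, values, and units are invented for illustration.
dataGen <- data.frame(
  USUBJID  = rep("S001", 4),
  TIME     = c(0, 1, 2, 4),
  TIMEUNIT = "HOURS",
  NAME     = c("Dose", "Conc", "Conc", "Conc"),
  VALUE    = c(100, 8.2, 6.1, 3.0),
  UNIT     = c("mg", "ug/mL", "ug/mL", "ug/mL"),
  ROUTE    = c("ORAL", NA, NA, NA),   # NA for records that are not doses
  stringsAsFactors = FALSE
)

# Only attempt the import when IQR Tools is available.
if (requireNamespace("IQRtools", quietly = TRUE)) {
  dataIQR <- IQRtools::IQRdataGENERAL(
    input     = dataGen,
    doseNAMES = "Dose",   # must match entries in the NAME column
    obsNAMES  = "Conc"
  )
}
```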

cov0

(Handling covariates that are stored as events in the long format) List, defining the TIME INDEPENDENT CONTINUOUS covariates. Each list entry is named after the covariate column to create, and its value is a character string with the NAME of the event to use as this covariate. The rule for the definition of these covariates is: If baseline assessments are defined in the BASE column, take their mean. If no baseline is defined, use the mean of the SCREEN observations. If there are no SCREEN observations, use the mean of all pre-first-dose values (TIME<=0). If there are no pre-first-dose values, set to NA. Adverse event NAMEs cannot be selected.
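
The fallback rule described for cov0 can be sketched in base R for a single subject and a single covariate event. The helper name baselineValue, the example weights, and the treatment of any BASE or SCREEN value greater than 0 as "flagged" are assumptions made for this illustration; the code is not taken from IQR Tools.

```r
# Sketch of the cov0 rule for ONE subject and ONE covariate event NAME.
# `rec` is a data.frame with columns TIME, VALUE, BASE, SCREEN.
# `baselineValue` is a hypothetical helper, not an IQR Tools function.
baselineValue <- function(rec) {
  isBase   <- !is.na(rec$BASE)   & rec$BASE   > 0
  isScreen <- !is.na(rec$SCREEN) & rec$SCREEN > 0
  isPre    <- !is.na(rec$TIME)   & rec$TIME  <= 0
  if (any(isBase))   return(mean(rec$VALUE[isBase],   na.rm = TRUE))
  if (any(isScreen)) return(mean(rec$VALUE[isScreen], na.rm = TRUE))
  if (any(isPre))    return(mean(rec$VALUE[isPre],    na.rm = TRUE))
  NA_real_
}

wt <- data.frame(
  TIME   = c(-7, -1, 0, 24),
  VALUE  = c(70, 72, 74, 75),
  BASE   = c(0, 1, 1, 0),
  SCREEN = c(1, 0, 0, 0)
)
baselineValue(wt)                                   # mean of BASE records: 73
baselineValue(transform(wt, BASE = 0))              # mean of SCREEN records: 70
baselineValue(transform(wt, BASE = 0, SCREEN = 0))  # mean of TIME <= 0 records: 72
```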

covT

(Handling covariates that are stored as events in the long format) List, defining the TIME DEPENDENT CONTINUOUS covariates. Each list entry is named after the covariate column to create, and its value is a character string with the NAME of the event to use as this covariate. Records before the first observation of the covariate receive the value of that first observation (carry-backward). From then on, the last observed value is carried forward. Adverse event NAMEs cannot be selected.
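
The carry-backward and carry-forward behavior described for covT can be sketched in base R as follows. The helper fillCovT is hypothetical and operates on the records of one subject, already sorted by time, where the covariate is NA on every record at which it was not observed.

```r
# Sketch: fill a time-dependent covariate over the sorted records of one subject.
# `x` holds the covariate value where it was observed and NA elsewhere.
# `fillCovT` is a hypothetical helper, not an IQR Tools function.
fillCovT <- function(x) {
  obs <- which(!is.na(x))
  if (length(obs) == 0) return(x)          # nothing observed: all values stay NA
  idx <- findInterval(seq_along(x), obs)   # last observation at or before each record
  idx[idx == 0] <- 1                       # before the first observation: carry backward
  x[obs[idx]]
}

fillCovT(c(NA, NA, 70, NA, 72, NA))   # 70 70 70 70 72 72
```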

cat0

(Handling covariates that are stored as events in the long format) Same as cov0 but for TIME INDEPENDENT CATEGORICAL covariates

catT

(Handling covariates that are stored as events in the long format) Same as covT but for TIME DEPENDENT CATEGORICAL covariates

covInfoAdd

(Handling covariates that are stored in additional columns in the dataset) List (no data.frame) with information about additional continuous covariates. The list needs to contain 4 named elements: COLNAME, NAME, UNIT, TIME.VARYING (in this order). Each value of these elements is a vector. COLNAME defines the column names of the additional covariates, NAME the "real" name, UNIT their UNIT, and TIME.VARYING is TRUE or FALSE depending ... Example:

covInfoAdd <- list(
  COLNAME      = c("WT0", "AGE0"),
  NAME         = c("Bodyweight at baseline", "Age"),
  UNIT         = c("kg", "years"),
  TIME.VARYING = c(FALSE, FALSE)
)

catInfoAdd

(Handling covariates that are stored in additional columns in the dataset) List (no data.frame) with information about additional categorical covariates. The list needs to contain 6 named elements: COLNAME, NAME, UNIT, VALUETXT, VALUES, TIME.VARYING (in this order). Each value of these elements is a vector. COLNAME defines the column names of the additional covariates, NAME the "real" name, and UNIT their UNIT. VALUETXT is a string defining a vector in R notation, containing the text version of the different levels. VALUES is a string defining a numeric vector in R notation, containing the numeric version of the different levels (used in the corresponding covariate column), and TIME.VARYING is TRUE or FALSE depending ... Example:

catInfoAdd <- list(
  COLNAME      = c("SEX", "FOOD"),
  NAME         = c("Gender", "Food taken"),
  UNIT         = c("-", "-"),
  VALUETXT     = c("Male,Female", "NO,YES"),
  VALUES       = c("1,2", "0,1"),
  TIME.VARYING = c(FALSE, FALSE)
)
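
To make the relationship between VALUETXT and VALUES concrete, the following base R sketch pairs the comma-separated entries of both elements position by position, using the strings from the example above. How IQR Tools parses these strings internally is not documented here, so the splitting logic and the name levelMap are assumptions for illustration only.

```r
# VALUETXT and VALUES strings from the catInfoAdd example, named by COLNAME.
valueTxt <- c(SEX = "Male,Female", FOOD = "NO,YES")
values   <- c(SEX = "1,2",         FOOD = "0,1")

# Split both strings at the commas and pair them position by position:
# the numeric code becomes the name, the text level becomes the value.
levelMap <- Map(
  function(txt, val) {
    setNames(trimws(strsplit(txt, ",")[[1]]), trimws(strsplit(val, ",")[[1]]))
  },
  valueTxt, values
)

levelMap$SEX[["2"]]    # "Female"
levelMap$FOOD[["0"]]   # "NO"
```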

methodBLLOQ

Allows specification of the method for handling values below the lower limit of quantification. By default the "M1" method is used. Alternative settings are "M3", "M4", "M5", "M6", and "M7".
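
The effect of methodBLLOQ on the DV and CENS columns, as described in the second table below, can be sketched in base R. The helper applyBLLOQ is hypothetical and covers only what that table states for DV and CENS of observation records; the MDV rules (for example for M1 and M6) are deliberately left out, and any difference between M3 and M4 is not modeled.

```r
# Sketch of DV and CENS for observation records under each methodBLLOQ setting.
# `applyBLLOQ` is a hypothetical helper, not an IQR Tools function.
applyBLLOQ <- function(value, lloq, method = "M1") {
  below <- !is.na(value) & !is.na(lloq) & value < lloq
  dv <- value
  if (method %in% c("M3", "M4")) dv[below] <- lloq[below]       # DV set to LLOQ
  if (method %in% c("M5", "M6")) dv[below] <- lloq[below] / 2   # DV set to LLOQ/2
  if (method == "M7")            dv[below] <- 0                 # DV set to 0
  # CENS is 1 only for values below LLOQ under M3 or M4, otherwise 0.
  cens <- as.numeric(below & method %in% c("M3", "M4"))
  data.frame(DV = dv, CENS = cens)
}

applyBLLOQ(value = c(5, 0.4, 0.2), lloq = c(0.5, 0.5, 0.5), method = "M5")
```

With "M1" the sketch leaves DV unchanged, since the table only describes an MDV change for that method.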

FLAGforceOverwriteNLMEcols

TRUE: if NLME columns (see second table below) are present in the dataset, they will be overwritten with default values. FALSE: they will not be overwritten and are kept as they are - the user has to ensure that their contents make sense!

FLAGtaskEventsOnly

For function testing purposes only! Please do not change this setting - unless you know EXACTLY what you are doing!

FLAGnoNAlocf

If FALSE (default) the time dependent covariates will be imputed by a last observation carried forward (LOCF) approach for NA values. If TRUE, then no LOCF imputation will be done and values in the covariate column will be NA.

Value

An IQRdataGENERAL object

Details

During import, the following is done:

  • Sanity checks are made on the dataset and if needed, the user is notified by errors, warnings, and normal messages

  • Modeling tools related columns are added (see second table below) - if not yet present in the data

  • Covariate columns might be added (see input arguments cov0, covT, cat0, catT)

  • Observations below the LLOQ are handled based on the setting of the input argument methodBLLOQ.

  • Records that are neither dose nor observation records are removed (depending on the setting of the input argument FLAGtaskEventsOnly)

The resulting output argument is of class IQRdataGENERAL.

Some additional information:

  • It is good practice to avoid commas in any text entries, as they might interfere with subsequent handling of files in which entries are separated by commas.

  • The following elements will be interpreted as NA: "."," ","","NA","NaN"

  • The loaded dataset will be sorted by (STUDY), USUBJID, TIME, (TYPENAME), NAME. Note that this might lead to non-sequential entries in the IXGDF column. The sorting by STUDY and/or TYPENAME is ignored if these columns are not present in the data.

  • Columns in the imported dataset that do not match column names in the following two tables will be retained in the output argument

  • In order to identify doses and observations, the user needs to provide the function with additional arguments that identify the NAMEs of doses and observations.

  • Adverse events (by NAME) cannot be used in the generation of the DV and covariate column information. This might be added in a future version.

  • STUDYN and TRT will be added automatically as covariates if the information is available in the dataset.

  • Changes in the first data format update in 5 years (Version >=1.3.0):

    • During import of a general dataset, the non-required columns are no longer generated with default content. This leads to considerably smaller and more manageable datasets.

    • Columns that were essentially never used have been deprecated by removing them from the first table below. The deprecated columns are listed in the third table below. They still are supported though to allow backward compatibility.
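
Two of the points above, the strings interpreted as NA and the sort order, can be approximated in base R as shown in this sketch. The CSV content is invented, and read.csv() with order() is used as a stand-in; it is not claimed to be how IQR Tools implements the import.

```r
# Invented CSV content using several of the strings that are interpreted as NA.
csvText <- "USUBJID,TIME,NAME,VALUE
S002,1,Conc,.
S001,2,Conc,NaN
S001,0,Dose,100
S001,2,Alb,
S002,0,Dose,NA"

# Treat the listed strings as NA while reading.
d <- read.csv(
  text = csvText,
  na.strings = c(".", " ", "", "NA", "NaN"),
  stringsAsFactors = FALSE
)

# Sort by (STUDY), USUBJID, TIME, (TYPENAME), NAME, skipping absent columns.
sortCols <- intersect(c("STUDY", "USUBJID", "TIME", "TYPENAME", "NAME"), names(d))
d <- d[do.call(order, unname(as.list(d[sortCols]))), ]
```

After sorting, the original row order is no longer sequential, which mirrors the note above about non-sequential entries in IXGDF.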

COLUMN    REQUIRED  DEFAULT                                          DESCRIPTION
========  ========  ===============================================  ===============================================================
IXGDF     -         1:nrows                                          (NUMERIC) Index of record in dataset. Starting from 1, then 2,3,... until last record/row number
IGNORE    -         NA                                               Reason/comment related to exclusion of the observation or dose from the analysis. If no entry then event is not ignored
USUBJID   YES       -                                                Unique subject identifier
INDNAME   -         Not included in dataset                          Indication name
IND       -         1-N in alphabetic order. Not present if missing  (NUMERIC) Numeric indication flag
COMPOUND  -         Not included in dataset                          Name of the investigational compound
STUDY     -         Not included in dataset                          Short study name/number
STUDYN    -         1-N in alphabetic order. Not present if missing  (NUMERIC) Numeric study flag
TRTNAME   -         Not included in dataset                          Name of actual treatment given to subject
TRT       -         1-N in alphabetic order. Not present if missing  (NUMERIC) Numeric treatment flag
VISIT     -         Not included in dataset                          Visit number
VISNAME   -         Not included in dataset                          Visit name
BASE      -         0                                                Flag indicating assessments at baseline (0 for non-baseline, 1 for first, 2 for second, ...)
SCREEN    -         0                                                Flag indicating assessments at screening (0 for non-screening, 1 for first, 2 for second, ...)
TIME      YES       -                                                (NUMERIC) Actual time of event relative to first dose administration.
DURATION  -         0                                                (NUMERIC) Duration of event (-1 if ongoing longer than observation time).
TIMEUNIT  YES       -                                                Unit of all numeric time definitions in the dataset ("HOURS", "MINUTES", "DAYS", "SECONDS", "WEEKS", "MONTHS", "YEARS")
NT        -         Not included in dataset                          (NUMERIC) Nominal event time
PROFNR    -         Not included in dataset                          (CHARACTER) Name or number of the profile
PROFTIME  -         Not included in dataset                          (NUMERIC) Profile time - relative to previously given dose
TYPENAME  -         Not included in dataset                          Unique name of type of event.
NAME      YES       -                                                Unique short name of event
VALUE     YES       -                                                (NUMERIC) Value of event defined by NAME. Not used for Adverse Event records (should be set to 0)
VALUETXT  -         NA                                               Text version of value. A VALUETXT can be entered instead of a VALUE. VALUE can also have an entry, but then the same VALUETXT has to be used for the same VALUE within a specific event NAME. If VALUETXT is defined, VALUE can be undefined (NA). VALUETXT only makes sense for categorical information
UNIT      YES       -                                                Unit of the value reported in the VALUE column
OCC       -         Not included in dataset                          (NUMERIC) Integer values defining separate occasions for IOV (1-N)
ULOQ      -         Not included in dataset                          (NUMERIC) Upper limit of quantification for event defined by NAME (value only interpreted for observation events)
LLOQ      -         NA                                               (NUMERIC) Lower limit of quantification for event defined by NAME (value only interpreted for observation events)
ROUTE     YES       -                                                Route of administration ("IV","SUBCUT","ORAL","INHALED","INTRAMUSCULAR","INTRAARTICULAR","RECTAL","TOPICAL","GENERAL_IV","GENERAL_ABS1","GENERAL_ABS0"). If not a dosing record set to: NA
II        -         0                                                (NUMERIC) Interval of dosing (value only interpreted for dosing events)
ADDL      -         0                                                (NUMERIC) Number of ADDITIONAL doses given with the specified interval (value only interpreted for dosing events)
AE        -         Not included in dataset                          (NUMERIC) Defines if the record codes an adverse event (0: no, 1: yes)
AEGRADE   -         NA (if present)                                  (NUMERIC) Grade of adverse event
AESER     -         NA (if present)                                  (NUMERIC) Flag (0 or 1) Seriousness of adverse event
AEDRGREL  -         NA (if present)                                  (NUMERIC) Flag (0 or 1) Drug related adverse event or not
COMMENT   -         Not included in dataset                          Additional information for the observation/event

During import the following columns will be added to the dataset. If these columns are already present in the input dataset they can be kept or overwritten (set by the input argument FLAGforceOverwriteNLMEcols).

COLUMN    DESCRIPTION
========  ===============================================================
ID        (NUMERIC) Unique subject ID for modeling software
TIMEPOS   (NUMERIC) TIME shifted to have TIMEPOS=0 at first event in a subject
TAD       (NUMERIC) Time since last dose (pre-first-dose values same as TIME). This column does not distinguish between different dose names. It contains the time since last dose, independently of the DOSENAME. If no dose is defined in a subject it is NA.
DV        (NUMERIC) Observation value (0 for dosing events). Set to LLOQ if BLLOQ handling method is M3 or M4. Set to LLOQ/2 if M5 or M6. Set to 0 if M7. If VALUE is undefined but VALUETXT is defined, DV will be determined as 1:N for alphabetic ordering of VALUETXT.
MDV       (NUMERIC) Missing data value column (0 if observation value is defined and IGNORE is NA, 1 for dose records and for NA observation values, 1 for all records where IGNORE is not NA). MDV=1 for values below LLOQ if M1 method. If M6 method, then MDV=0 for the first DV<LLOQ in a sequence and MDV=1 for the following ones in the sequence. If the value of an observation or time information is missing, MDV will be set to 1 as well and an entry in IGNORE will be made.
EVID      (NUMERIC) Event ID. 0 for observations, 1 for dosing records
CENS      (NUMERIC) Censoring column. This column is set depending on the method for BLLOQ handling. If M3 or M4 method is chosen then CENS=1 if DV<LLOQ. If M1, M5, M6, or M7 then CENS=0. 0 for all dosing events.
AMT       (NUMERIC) Dose given at dosing instant (0 for observation records)
ADM       (NUMERIC) Administration column. 0 for observation events. Number of input for dosing events. Usually defined by the user. Default values if not user defined: If more than one dose is considered, the order of the defined dose NAMEs defines the ADM number (1 for first, 2 for second, ...). If a single dose is considered, then ADM is selected according to the information in ROUTE: 1 for SUBCUT, ORAL, INTRAMUSCULAR, INTRAARTICULAR, RECTAL, INHALED, GENERAL_ABS1; 2 for IV, GENERAL_IV; 3 for TOPICAL, GENERAL_ABS0.
TINF      (NUMERIC) Infusion time (TIMEUNIT). (0 for observation records, DURATION for dose records)
RATE      (NUMERIC) Calculated from AMT and TINF
YTYPE     (NUMERIC) Observation number. 0 for dosing records. 1,2,3,4, ... for observation records. If observations are provided in obsNAMES then this order will be used. Non-doses that are not defined in obsNAMES will obtain YTYPE=0
DOSE      (NUMERIC) Carry forward of the last defined AMT of a dose event. Values before first dose get the DOSE set to 0. If no dose is present in a subject DOSE is set to 0. This column does not distinguish between different dose names. It contains the AMT since last dose, independently of the selected doses in doseNAMES
TADDx     (NUMERIC) Dosing input specific TAD column. Only present if more than one dosing input is defined in "doseNAMES". "x" defines the index of the dose NAME in doseNAMES. If a dose NAME does not appear in a subject the value is set to NA.
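
The TAD and DOSE definitions in the table above can be illustrated with a small base R sketch for one subject. The helper calcTadDose and its arguments are hypothetical, and the records are assumed to be sorted already, so a record counts as "after" a dose by its position. The package lists IQRcalcTAD() under See also for the actual calculation; this sketch only mirrors the textual rules.

```r
# Sketch for ONE subject with records already sorted by TIME.
# evid: 1 for dosing records, 0 for observations; amt: dose amount (0 for observations).
# `calcTadDose` is a hypothetical helper, not an IQR Tools function.
calcTadDose <- function(time, evid, amt) {
  doseIdx <- which(evid == 1)
  if (length(doseIdx) == 0) {
    # No dose in subject: TAD is NA and DOSE is 0.
    return(data.frame(TAD = rep(NA_real_, length(time)), DOSE = 0))
  }
  last <- findInterval(seq_along(time), doseIdx)  # 0 for records before the first dose
  hasDose <- last > 0
  tad  <- time                                    # pre-first-dose: TAD equals TIME
  dose <- numeric(length(time))                   # pre-first-dose: DOSE equals 0
  tad[hasDose]  <- time[hasDose] - time[doseIdx[last[hasDose]]]
  dose[hasDose] <- amt[doseIdx[last[hasDose]]]
  data.frame(TAD = tad, DOSE = dose)
}

calcTadDose(
  time = c(-1,   0, 2, 24, 26),
  evid = c( 0,   1, 0,  1,  0),
  amt  = c( 0, 100, 0, 50,  0)
)
# TAD:  -1 0 2 0 2
# DOSE: 0 100 100 50 50
```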

The following columns are deprecated. Their usefulness was going towards 0 and no code and functionality relied on them. The general dataset can still include them to ensure backward compatibility.

COLUMN    REQUIRED  DEFAULT                  DESCRIPTION
========  ========  =======================  ===============================================================
SUBJECT   -         Not included in dataset  Subject number
CENTER    -         Not included in dataset  Center number
STUDYDES  -         Not included in dataset  Study title, short description
PART      -         Not included in dataset  Part of study as defined per protocol (1=part 1, A=part A, ...)
EXTENS    -         Not included in dataset  Extension of the core study (0=core, 1=extension 1, 2=extension 2, ...)
TRTNAMER  -         Not included in dataset  Name of treatment to which subject was randomized
TRTR      -         Not included in dataset  (NUMERIC) Numeric randomized treatment flag
DATEDAY   -         Not included in dataset  Start date of event (dd-mmm-yyyy)
DATETIME  -         Not included in dataset  Start time of event (HH:MM:SS)

See also

Other IQRdataGeneral: +.IQRdataGENERAL(), IQRcalcTAD(), IQRexpandADDLII(), IQRloadCSVdata(), IQRsaveCSVdata(), addIndivRegressors_IQRdataGENERAL(), addLabel_IQRdataGENERAL(), attributes0(), blloqInfo_IQRdataGENERAL(), blloq_IQRdataGENERAL(), check_IQRdataGENERAL(), clean_IQRdataGENERAL(), combine_IQRdataGENERAL(), convertCat2Text(), covImpute_IQRdataGENERAL(), date2dateday_IQRdataProgramming(), date2datetime_IQRdataProgramming(), date2time_IQRdataProgramming(), exportDEFINE_IQRaedataER(), exportDEFINE_IQRdataGENERAL(), exportDEFINEpdf_IQRdataGENERAL(), exportSYS_IQRdataGENERAL(), export_IQRdataGENERAL(), getLabels_dataframe(), getNAcolNLME_IQRdataGENERAL(), handleSameTimeObs_IQRdataGENERAL(), is_IQRdataGENERAL(), loadATRinfo_csvData(), loadAttributeFile(), load_IQRdataGENERAL(), mapCategoricalCovariate_IQRnlmeProject(), mapCategoricalCovariate_csvData(), mapContinuousCovariate_IQRnlmeProject(), mapContinuousCovariate_csvData(), mutateCov_IQRdataGENERAL(), obfuscate_IQRdataGENERAL(), plot.IQRdataGENERAL(), plotCorCat_IQRdataGENERAL(), plotCorCovCat_IQRdataGENERAL(), plotCorCov_IQRdataGENERAL(), plotCovDistribution_IQRdataGENERAL(), plotDoseSchedule_IQRdataGENERAL(), plotIndiv_IQRdataGENERAL(), plotRange_IQRdataGENERAL(), plotSampleSchedule_IQRdataGENERAL(), plotSpaghetti_IQRdataGENERAL(), print.IQRdataGENERAL(), removeCommata_dataframe(), rmAMT0_IQRdataGENERAL(), rmDosePostLastObs_IQRdataGENERAL(), rmIGNOREd_IQRdataGENERAL(), rmMissingTIMEobsRecords_IQRdataGENERAL(), rmNOobsSUB_IQRdataGENERAL(), rmNonTask_IQRdataGENERAL(), rmPLACEBO_IQRdataGENERAL(), rmSubjects_IQRdataGENERAL(), setIGNORErecords_IQRdataGENERAL(), setMissingDVobsRecordsIGNORE_IQRdataGENERAL(), subset.IQRdataGENERAL(), summary.IQRdataGENERAL(), summaryCat_IQRdataGENERAL(), summaryCov_IQRdataGENERAL(), summaryObservations_IQRdataGENERAL(), transformObs_IQRdataGENERAL(), unlabel_dataframe()