19 General Dataset Format

In model-based drug development, many different analyses, or pharmacometric activities, often need to be performed within the same project. These include, but are not limited to, graphical exploration and dose-concentration and concentration–response modeling, where many different biomarkers or endpoints might need to be considered.

Requesting a separate modeling dataset for each modeling activity considerably strains the resources of the modeler who prepares these datasets or of the supporting programming group, especially when a certain level of validation is required. IQRtools therefore uses one general dataset format for all pharmacometric analyses.

The requirements for this dataset format were a well-defined structure that is project independent (the same across projects, compounds, and indications), independent of the modeling activity to be performed, and independent of the modeling tool to be used.

Additionally, the format had to have a minimum level of redundancy and include information that is typically not found within traditional pharmacometric analysis datasets:

  • names for variables
  • units
  • additional annotation
  • etc.

This renders the dataset immediately understandable without additional documentation, reduces mistakes, and facilitates project hand-over processes.

The IQR Tools have been enabled to work with this format. However, they can also work with any traditional NONMEM or MONOLIX data format, either directly or by converting the traditional data format into the general dataset format.

Note: as of version 1.3.0 of IQR Tools, the data format has been improved. Datasets created for versions before 1.3.0 can still be used.

19.1 General columns

The following table contains the “general” columns. “General” means that these columns are not specific to a modeling tool (exception: ADDL and II).

Modeling tool related columns will be added by IQRtools upon import of the dataset (IQRdataGENERAL()) but can also be present in the dataset file before import. These additional columns are documented in the next section.

  • REQUIRED columns for IQR Tools are marked in red.
  • Columns marked in blue are created with default values if not existing in the imported dataset.
  • Columns marked in italic are created (if not already existing) if the columns referenced in “Default Content” exist.
  • Since datasets for pharmacometric analyses are typically comma-separated files, no field in the general dataset may contain commas.
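The no-comma rule can be checked before a dataset is written to file. The following is a minimal sketch in plain Python, not part of IQRtools; the function name `find_comma_fields` and the example records are hypothetical and only serve as illustration.

```python
def find_comma_fields(rows):
    """Return (row_number, column, value) for every text field containing a comma.

    `rows` is a list of dicts, one dict per dataset record (hypothetical layout).
    """
    problems = []
    for row_number, row in enumerate(rows, start=1):
        for column, value in row.items():
            if isinstance(value, str) and "," in value:
                problems.append((row_number, column, value))
    return problems


# Illustrative records using column names from the general dataset format.
records = [
    {"USUBJID": "S1-001", "NAME": "Dose Compound X", "COMMENT": "NA"},
    {"USUBJID": "S1-002", "NAME": "Plasma concentration Compound X",
     "COMMENT": "Sample late, hemolyzed"},
]

problems = find_comma_fields(records)
for row_number, column, value in problems:
    print(f"Row {row_number}, column {column}: {value!r} contains a comma")
```

Offending fields would then have to be reworded or have the comma replaced by another separator before export.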
Column Name | Column Content | Default Content | Description | Type

General
IXGDF | Index of record in dataset, starting from 1, then 2, 3, etc. until the last record/row number | 1, 2, 3, etc. | This index is kept during post-processing of the general dataset format. When rows/records are removed, the remaining ones keep their index in IXGDF. In this way, contents of derived datasets can always be compared to contents of the general dataset format. | Numeric
IGNORE | Reason/comment related to exclusion of the observation or dose from the analysis | NA | | String

Identification of subject
USUBJID | Unique subject identifier | | | String
INDNAME | Indication name | | Populated from the protocol or the provided specifications. | String
IND | Numeric indication flag | 1-N for INDNAME in alphabetic order | Numeric indication flag (unique for each entry in INDNAME) | Numeric

Study information
COMPOUND | Name of the investigational compound | | Populated from the source data, protocol, or provided specifications. | String
STUDY | Short study name/number | | Populated from the source data. | String
STUDYN | Numeric study flag | 1-N for STUDY in alphabetic order | Numeric study flag (unique for each entry in STUDY) | Numeric

Treatment group information
TRTNAME | Name of actual treatment given to subject | | Populated from the source data, protocol, or the provided specifications, and can be recoded for harmonization across trials, if needed. | String
TRT | Numeric treatment flag | 1-N for TRTNAME in alphabetic order | Numeric treatment flag (unique for each entry in TRTNAME) | Numeric

Visit information
VISIT | Visit number | | Populated from the source data. When visit information is not available in the source data, it could be imputed from other assessments happening on the same date or from the PK log (in the case of PK with a sample number). The field can only be left empty in special cases when nominal visits do not make sense, for example adverse events or comedication. | Numeric
VISNAME | Visit name | | Populated from the source data. | String
BASE | Flag indicating assessments at baseline | 0 | Derived based on VISIT and VISNAME: 0 for non-baseline visit, 1 for first baseline visit, 2 for second baseline visit, etc. | Numeric
SCREEN | Flag indicating assessments at screening | 0 | Derived based on VISIT and VISNAME: 0 for non-screening visit, 1 for first screening visit, 2 for second screening visit, etc. | Numeric

Event time information
TIMEUNIT | Unit of all numeric time definitions in the dataset | | Populated from the provided specification. Can be: “HOURS”, “MINUTES”, “DAYS”, “SECONDS”, “WEEKS”, “MONTHS”, “YEARS” | String
DURATION | Duration of event | 0 | Duration of event in the time unit defined in TIMEUNIT. Derived from start date/time and end date/time of an event. Populated from source data and/or may need to be imputed when information is partial or missing. If the end time is the same as the start time (e.g., bolus administration), 0 is a perfectly acceptable value for DURATION. If the event continues post End of Study, set to -1. | Numeric
NT | Nominal event time | | Planned time of event. Based on protocol, in the time unit defined in the TIMEUNIT column. For repeated visits (e.g., 1.001, 2.001), use the defined protocol time for the originally scheduled visit. For the “End of study” visit, use the defined nominal time for end of study, even if the subject dropped out and completed the End of study visit earlier. The field cannot be left empty for per-protocol planned time-dependent assessments. The field can be left empty, e.g., for adverse events. | Numeric
STDTC | Start date and time of event | | Format: YYYY-MM-DDTHH:MM. Annotation only. | String
ENDTC | End date and time of event | | Format: YYYY-MM-DDTHH:MM. Annotation only. | String
TIME | Actual time of event relative to first dose administration | | In the time unit defined in the TIMEUNIT column. Derived as the difference between DATEDAY/DATETIME of the event and the DATEDAY/DATETIME of the first administration of the selected primary study drug. The field can be left empty if TIME is unknown. | Numeric
PROFNR | Profile number | | This column defines the number of the profile, which typically could be the day for a PK profile. This information would be used for the conduct of an NCA analysis. If provided, PROFTIME also needs to be defined. | Numeric
PROFTIME | Profile time | | This column defines the actual or nominal time points for a PK profile, where times are defined relative to the dose given before this profile (so starting at 0 post dose). This information, together with the PROFNR column, is used to identify a particular profile for NCA. If provided, PROFNR also needs to be defined. | Numeric

Event value information
TYPENAME | Unique type of event | | Populated from the specification. For example: “Efficacy Readouts”, “Lab Values”, “Adverse Events”, etc. | String
NAME | Unique short name of event | | Populated from the specification. For example: “Plasma concentration Compound X”, “Dose Compound X”, etc. | String
VALUE | Value of event defined by NAME | | Applicable to numeric readouts. Populated from the source data in the units defined in the UNIT column. Examples for the values: the given dose, the observed PK concentration, or the value of other readouts. Special cases: (1) For adverse events it is not used and should be set to 0. (2) For concomitant medications: the dose in pre-specified unit and frequency (e.g., mg/day). (3) For BLOQ records: any value can be entered that is lower than the actual LLOQ. It is not acceptable to set this value to “NaN” or “NA”, since then no discrimination can be made between “missing” and “<LLOQ”. For PK records on untransformed data “0” is suggested. On log-transformed data 0 should not be used, but log(LLOQ/2) would be acceptable. | Numeric
VALUETXT | Text version of value | NA | Applicable to categorical readouts. Populated from the source data. Examples: “Male”, “Female”, “Asian”, etc. Can only be left empty if VALUE is not missing. | String
UNIT | Unit of the value reported in the VALUE column | | Populated/converted from the source data into a reference unit. For the same event the same unit has to be used across the dataset. For Labs, Vitals, etc., the SI unit is the default. For dimensionless readouts use “NONE”. | String
ULOQ | Upper limit of quantification for event defined by NAME | | Populated from source data/protocol for PK/LAB assessments only if applicable. Units as in UNIT column. The field can be left empty if ULOQ is unknown. | Numeric
LLOQ | Lower limit of quantification for event defined by NAME | NA | Populated from source data/protocol for PK/LAB assessments only if applicable. Units as in UNIT column. The field can be left empty if LLOQ is unknown. | Numeric

Dose event additional information
ROUTE | Route of administration | | Populated from the source data and/or protocol. The field is to be left empty for observation events and cannot be left empty for dosing events. For dosing events the value should be one of the following: (“IV”, “SUBCUT”, “ORAL”, “INHALED”, “INTRAMUSCULAR”, “INTRAARTICULAR”, “RECTAL”, “TOPICAL”, “GENERAL_IV”, “GENERAL_ABS1”, “GENERAL_ABS0”) | String
II | Interval of additional dosing | 0 | Interval of dosing, if a single row should define multiple dosings. Allows for coding repeated dosing more efficiently. Populated from source data, protocol, or provided specification, in the time unit defined in the TIMEUNIT column. If unused, set to 0. Observation records have a 0 entry. | Numeric
ADDL | Number of ADDITIONAL doses given with the specified interval | 0 | Number of ADDITIONAL doses given with the specified interval, if a single row should define multiple dosings. Allows for coding repeated dosing more efficiently and is linked to II. The number of doses coded by a single dose record will be (1+ADDL) with interval II. Populated from source data, protocol, or provided specification. If unused, set to 0. Observation records have a 0 entry. | Numeric

Adverse event additional information
AE | Adverse event flag | | Defines if the record codes an adverse event (0: no, 1: yes). This assumes that only AEs are coded (grades 1-5)! | Numeric
AEGRADE | Adverse event grade | NA (if AE column not in data then no default used) | Grade of adverse event (1-5) | Numeric
AESER | Seriousness of adverse event | NA (if AE column not in data then no default used) | Flag (0: not serious, 1: serious) | Numeric
AEDRGREL | Drug related adverse event or not | NA (if AE column not in data then no default used) | Flag (0: not drug related, 1: drug related) | Numeric

Additional information
COMMENT | Additional information for the observation/event | | Any text-based comment | String
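Two of the rules in the table above lend themselves to a short illustration: the 1-N coding in alphabetic order used for IND, STUDYN, and TRT, and the (1+ADDL) doses with interval II that a single dose record represents. The sketch below is plain Python and not IQRtools code. The function names and example values are hypothetical, and Python's default string ordering stands in for "alphabetic order", since the text does not state the exact collation IQRtools applies.

```python
def numeric_flags(names):
    """Map each unique name to 1..N in alphabetic order (as for IND, STUDYN, TRT).

    Assumption: Python's default string ordering is used as 'alphabetic order'.
    """
    return {name: i for i, name in enumerate(sorted(set(names)), start=1)}


def expand_dose(time, amt, addl=0, ii=0.0):
    """Expand one dose record into its (1 + ADDL) explicit doses spaced by II.

    Returns a list of (TIME, AMT) tuples; TIME and II share the TIMEUNIT.
    """
    return [(time + k * ii, amt) for k in range(addl + 1)]


# Hypothetical treatment names, one per record.
trt = numeric_flags(["Placebo", "100 mg QD", "Placebo", "50 mg QD"])
print(trt)

# One dose record at TIME=0 with ADDL=2 and II=24 codes three doses.
doses = expand_dose(time=0.0, amt=100.0, addl=2, ii=24.0)
print(doses)
```

With ADDL=0 and II=0 the expansion returns only the dose record itself, which matches the "if unused, set to 0" convention.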

19.2 Additional columns

The following table documents the additional columns. These columns are modeling tool dependent and are usually generated when importing a dataset with IQRdataGENERAL().

If present in the imported dataset they can be overwritten with default settings or the user can decide to keep the information.

In addition, the general dataset format can contain any additional columns. These are, for example, covariate columns that IQRdataGENERAL() generates by converting row-based covariate information in the source dataset into numerical columns of the analysis data.

Column Name | Column Content | Type
ID | Unique subject ID for modeling software | Numeric
TIMEPOS | TIME shifted to have TIMEPOS=0 at the first event in a subject | Numeric
TAD | Time since last dose. This column does not distinguish between different dose names: it contains the time since the last dose, independently of the DOSENAME. Before the first dose, TAD is the same as TIME. If no dose is defined in a subject, it is NA. | Numeric
DV | Observation value (0 for dosing events). Set to LLOQ if the BLOQ handling method is M3 or M4. Set to LLOQ/2 if M5 or M6. Set to 0 if M7. If VALUE is undefined but VALUETXT is defined, DV is determined as 1:N for alphabetic ordering of VALUETXT. | Numeric
MDV | Missing data value column (0 if the observation value is defined and IGNORE is NA, 1 for dose records and for NA observation values, 1 for all records where IGNORE is not NA). MDV=1 for values below LLOQ if the M1 method is used. If the M6 method is used, MDV=0 for the first DV<LLOQ in a sequence and MDV=1 for the following ones in the sequence. | Numeric
EVID | Event ID. 0 for observations, 1 for dosing records | Numeric
CENS | Censoring column, set depending on the method for BLOQ handling. If the M3 or M4 method is chosen, CENS=1 if DV<LLOQ. If M1, M5, M6, or M7, CENS=0. 0 for all dosing events. | Numeric
AMT | Dose given at dosing instant (0 for observation records) | Numeric
ADM | Administration column. 0 for observation events. Number of input for dosing events. Usually defined by the user. Default values if not user defined: if more than one dose is considered, the order of the defined dose NAMEs defines the ADM number (1 for first, 2 for second, …). If a single dose is considered, ADM is selected according to the information in ROUTE: 1 for SUBCUT, ORAL, INTRAMUSCULAR, INTRAARTICULAR, RECTAL, INHALED, GENERAL_ABS1; 2 for IV, GENERAL_IV; 3 for TOPICAL, GENERAL_ABS0. | Numeric
TINF | Infusion time (TIMEUNIT). 0 for observation records, DURATION for dose records | Numeric
RATE | Calculated from AMT and TINF | Numeric
YTYPE | Observation number. 0 for dosing records. 1, 2, 3, 4, … for observation records. If observations are provided in obsNAMES, this order is used. Non-doses that are not defined in obsNAMES obtain YTYPE=0. | Numeric
DOSE | Carry forward of the last defined AMT of a dose event. Values before the first dose get DOSE set to 0. If no dose is present in a subject, DOSE is set to 0. This column does not distinguish between different dose names: it contains the AMT of the last dose, independently of the doses selected in doseNAMES. | Numeric
TADDx | Dosing input specific TAD column. Only present if more than one dosing input is defined in “doseNAMES”. “x” defines the index of the dose NAME in doseNAMES. If a dose NAME does not appear in a subject, the value is set to NA. | Numeric
DOSEDx | Carry forward of the last defined AMT of a dose event (with specific dose NAME). Values before the first dose get DOSEDx set to 0. If no dose is present in a subject, DOSEDx is set to 0. “x” indicates the position of the dose NAME in doseNAMES. | Numeric
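The definitions of TIMEPOS, TAD, and DOSE can be made concrete with a small worked example. This is an illustrative sketch in plain Python, not the IQRdataGENERAL() implementation. The function name `derive_time_columns` and the record layout (a list of dicts for one subject, sorted by TIME) are assumptions, as is the choice that a dose record itself gets TAD=0 and its own AMT as DOSE.

```python
def derive_time_columns(records):
    """Add TIMEPOS, TAD and DOSE to the records of ONE subject.

    `records` must be sorted by TIME; each record has TIME, EVID (1 = dose,
    0 = observation) and AMT. None is used in place of NA.
    Assumption: a dose record has TAD = 0 and DOSE equal to its own AMT.
    """
    if not records:
        return []
    first_time = records[0]["TIME"]
    has_dose = any(r["EVID"] == 1 for r in records)
    last_dose_time = None   # TIME of the most recent dose seen so far
    last_amt = 0.0          # DOSE is 0 before the first dose
    result = []
    for r in records:
        if r["EVID"] == 1:
            last_dose_time = r["TIME"]
            last_amt = r["AMT"]
        if not has_dose:
            tad = None                       # no dose in subject: NA
        elif last_dose_time is None:
            tad = r["TIME"]                  # before first dose: same as TIME
        else:
            tad = r["TIME"] - last_dose_time
        result.append({**r, "TIMEPOS": r["TIME"] - first_time,
                       "TAD": tad, "DOSE": last_amt})
    return result


subject = [
    {"TIME": -1.0, "EVID": 0, "AMT": 0.0},    # pre-dose observation
    {"TIME": 0.0,  "EVID": 1, "AMT": 100.0},  # first dose
    {"TIME": 2.0,  "EVID": 0, "AMT": 0.0},
    {"TIME": 24.0, "EVID": 1, "AMT": 50.0},   # second dose
    {"TIME": 26.0, "EVID": 0, "AMT": 0.0},
]
derived = derive_time_columns(subject)
for row in derived:
    print(row["TIME"], row["TIMEPOS"], row["TAD"], row["DOSE"])
```

The dose-specific columns TADDx and DOSEDx would follow the same logic, restricted to dose records with the corresponding dose NAME.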

19.3 Deprecated columns

Deprecated columns can still be used to allow backward compatibility. None of these columns is automatically added if not present in the original data.

Column Name | Column Content | Description | Type

Identification of subject
CENTER | Center number | | Numeric
SUBJECT | Subject number | If defined, cannot be left empty | String

Study information
STUDYDES | Study title, short description | Populated from the protocol title. | String
PART | Part of study as defined per protocol | Populated from the source data and/or protocol and only if more than one part is present. For example: 1=first part, 2=second part, etc. If no parts are defined, set to 0. | String
EXTENS | Extension of the core study | Populated from the source data and/or protocol and only if an extension is present. For example: 0=core study, 1=first extension, 2=second extension, etc. | String

Treatment group information
TRTNAMER | Name of treatment to which subject was randomized | Populated from the source data, protocol, or the provided specifications, and can be recoded for harmonization across trials, if needed. | String
TRTR | Numeric randomized treatment flag | Numeric randomized treatment flag (unique for each entry in TRTNAMER) | Numeric

Event time information
DATEDAY | Start date of event | Populated from source data and/or may need to be imputed when it is partial or missing. Formatted as DD-MMM-YYYY (example: 01-JUL-2015). If the information is unknown and cannot be imputed, set to “UNKNOWN”. | String
DATETIME | Start time of event | Populated from source data and/or may need to be imputed when it is partial or missing, in particular in the case of profiles. | String
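When working with older datasets, the DD-MMM-YYYY format of DATEDAY may need to be read programmatically, for example to compare it with the date part of STDTC. The following is a minimal sketch in plain Python; the function name `parse_dateday` is hypothetical, and an explicit month table is used so that the result does not depend on the system locale.

```python
from datetime import date

# English three-letter month abbreviations as used in DD-MMM-YYYY.
MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
          "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12}


def parse_dateday(text):
    """Parse a DATEDAY entry (DD-MMM-YYYY); return None for "UNKNOWN"."""
    if text == "UNKNOWN":
        return None
    day, month, year = text.split("-")
    return date(int(year), MONTHS[month.upper()], int(day))


parsed = parse_dateday("01-JUL-2015")
print(parsed.isoformat())        # date part in the YYYY-MM-DD style of STDTC
print(parse_dateday("UNKNOWN"))  # None marks an unknown date
```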