19 General Dataset Format
In model-based drug development, many different analyses, or pharmacometric activities, often need to be performed within the same project. These include, but are not limited to, graphical exploration, dose–concentration modeling, and concentration–response modeling, where many different biomarkers or endpoints might need to be considered.
Requesting a separate modeling dataset for each modeling activity considerably strains the resources of the modeler who prepares these datasets or of the supporting programming group, especially when a certain level of validation is required. IQRtools therefore uses one general dataset format for all pharmacometric analyses.
The format was developed to meet the following requirements: a well-defined structure that is project independent (the same across projects, compounds, and indications), independent of the modeling activity to be performed, and independent of the modeling tool to be used.
Additionally, the format had to have a minimum level of redundancy and include information that is typically not found within traditional pharmacometric analysis datasets:
- names for variables
- units
- additional annotation
- etc.
This renders the dataset immediately understandable without additional documentation, reduces mistakes, and facilitates project hand-over processes.
The IQR Tools have been enabled to work with this format; however, they can also work with any traditional NONMEM or MONOLIX data format – directly, or by conversion of traditional data formats into the general dataset format.
Note: as of version 1.3.0 of IQR Tools, the data format has been improved. Datasets prepared for versions before 1.3.0 can still be used.
19.1 General columns
The following table contains the “general” columns. “General” means that these columns are not related to a specific modeling tool (exceptions: ADDL and II).
Modeling tool related columns will be added by IQRtools upon import of the dataset (IQRdataGENERAL()) but can also be present in the dataset file before import. These additional columns are documented in the next section.
- REQUIRED columns for IQR Tools are marked in red.
- Columns marked in blue are created with default values if not existing in the imported dataset.
- Columns marked in italic are created (if not already existing) if the columns referenced in “Default Content” exist.
- Since datasets for pharmacometric analyses are typically comma-separated files, no field in the general dataset is allowed to contain commas.
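The comma rule can be checked before a dataset is imported. The following Python sketch is not part of IQR Tools; the function name `find_comma_problems` is hypothetical, and it assumes the dataset is available as CSV text with a header row. It reports quoted fields that contain a comma and records whose field count differs from the header, which is the typical symptom of an unquoted comma.

```python
import csv
import io


def find_comma_problems(csv_text):
    """Return (record_number, message) tuples for comma-related problems.

    Hypothetical helper, not an IQR Tools function. Record numbers start
    at 1 for the first data row after the header.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    problems = []
    for record_number, fields in enumerate(body, start=1):
        # An unquoted comma inside a field shifts the field count.
        if len(fields) != len(header):
            problems.append(
                (record_number, f"expected {len(header)} fields, found {len(fields)}")
            )
        # A quoted comma keeps the field count but still violates the rule.
        for name, value in zip(header, fields):
            if "," in value:
                problems.append((record_number, f"comma in {name}"))
    return problems


example = (
    "USUBJID,NAME,VALUE\n"
    'S1,"Dose, oral",100\n'
    "S2,Plasma concentration,1.5\n"
)
problems = find_comma_problems(example)
```

Running such a check on the source file, rather than on the imported data, catches the problem before any column is shifted silently.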
Column Name | Column Content | Default Content | Description | Type |
---|---|---|---|---|
General | ||||
IXGDF | Index of record in dataset. Starting from 1, then 2, 3, etc. until last record/row number | 1,2,3, etc. | This index is kept during post processing of the general dataset format. When rows/records are removed, the kept ones will keep the index in IXGDF. In this way contents of derived datasets can always be compared to contents of the general dataset format. | Numeric |
IGNORE | Reason/comment related to exclusion of the observation or dose from the analysis | NA | | String |
Identification of subject | ||||
USUBJID | Unique subject identifier | | | String |
INDNAME | Indication name | | Populated from the protocol or the provided specifications. | String |
IND | Numeric indication flag | 1-N for INDNAME in alphabetic order | Numeric indication flag (unique for each entry in INDNAME) | Numeric |
Study information | ||||
COMPOUND | Name of the investigational compound | | Populated from the source data or protocol or provided specifications. | String |
STUDY | Short study name/number | | Populated from the source data. | String |
STUDYN | Numeric study flag | 1-N for STUDY in alphabetic order | Numeric study flag (unique for each entry in STUDY) | Numeric |
Treatment group information | ||||
TRTNAME | Name of actual treatment given to subject | | Populated from the source data, protocol or the provided specifications, and can be recoded for harmonization across trials, if needed. | String |
TRT | Numeric treatment flag | 1-N for TRTNAME in alphabetic order | Numeric treatment flag (unique for each entry in TRTNAME) | Numeric |
Visit information | ||||
VISIT | Visit number | | Populated from the source data. When visit information is not available in the source data, it could be imputed from other assessments happening on the same date or from the PK log (in the case of PK with a sample number). The field can only be left empty in special cases when nominal visits do not make sense, for example adverse events or comedication. | Numeric |
VISNAME | Visit name | | Populated from the source data. | String |
BASE | Flag indicating assessments at baseline | 0 | Derived based on VISIT and VISNAME: =0 for non-baseline visit, =1 for first baseline visit, =2 for second baseline visit, etc. | Numeric |
SCREEN | Flag indicating assessments at screening | 0 | Derived based on VISIT and VISNAME: =0 for non-screening visit, =1 for first screening visit, =2 for second screening visit, etc. | Numeric |
Event time information | ||||
TIMEUNIT | Unit of all numeric time definitions in the dataset | | Populated from the provided specification. Can be: “HOURS”, “MINUTES”, “DAYS”, “SECONDS”, “WEEKS”, “MONTHS”, “YEARS” | String |
DURATION | Duration of event | 0 | Duration of event in the time unit defined in the TIMEUNIT column. Derived from start date/time and end date/time of an event. Populated from source data and/or may need to be imputed when information is partial or missing. If the end time is the same as the start time (e.g., bolus administration), 0 is a perfectly acceptable value for DURATION. If the event continues post End of Study, set to -1. | Numeric |
NT | Nominal event time | | Planned time of event. Based on protocol, in the time unit defined in the TIMEUNIT column. For repeated visits (e.g., 1.001, 2.001), use the defined protocol time for the originally scheduled visit. For the “End of study” visit, use the defined nominal time for end of study, even if the subject dropped out and completed the End of study visit earlier. The field cannot be left empty for assessments whose time is planned per protocol. The field can be left empty, e.g., for adverse events. | Numeric |
STDTC | Start date and time of event | | Format: YYYY-MM-DDTHH:MM. Annotation only. | String |
ENDTC | End date and time of event | | Format: YYYY-MM-DDTHH:MM. Annotation only. | String |
TIME | Actual time of event relative to first dose administration | | In the time unit defined in the TIMEUNIT column. Derived as the difference between DATEDAY/DATETIME of the event and the DATEDAY/DATETIME of the first administration of the selected primary study drug. The field can be left empty if TIME is unknown. | Numeric |
PROFNR | Profile number | | This column defines the number of the profile, which typically could be the day for a PK profile. It is information that would be used for the conduct of an NCA analysis. If provided, PROFTIME also needs to be defined. | Numeric |
PROFTIME | Profile time | | This column defines the actual or nominal time points for a PK profile, where times are counted from the dose given before this profile (so starting at 0 post dose). This information, together with the PROFNR column, is used to identify a particular profile for NCA. If provided, PROFNR also needs to be defined. | Numeric |
Event value information | ||||
TYPENAME | Unique type of event | | Populated from the specification. For example: “Efficacy Readouts”, “Lab Values”, “Adverse Events”, etc. | String |
NAME | Unique short name of event | | Populated from the specification. For example: “Plasma concentration Compound X”, “Dose Compound X”, etc. | String |
VALUE | Value of event defined by NAME | | Applicable to numeric readouts. Populated from the source data in the units defined in the UNIT column. Examples for the values: the given dose, the observed PK concentration, or the value of other readouts. Special cases: (1) for adverse events it is not used and should be set to 0; (2) for concomitant medications, the dose in pre-specified unit and frequency (e.g., mg/day); (3) for BLOQ records, any value lower than the actual LLOQ can be entered. It is not acceptable to set this value to “NaN” or “NA”, since then no distinction can be made between “missing” and “<LLOQ”. For PK records on untransformed data, “0” is suggested. On log-transformed data, 0 should not be used, but log(LLOQ/2) would be acceptable. | Numeric |
VALUETXT | Text version of value | NA | Applicable to categorical readouts. Populated from the source data. Examples: “Male”, “Female”, “Asian”, etc. Can only be left empty if VALUE is not missing. | String |
UNIT | Unit of the value reported in the VALUE column | | Populated/converted from the source data into a reference unit. For the same event, the same unit has to be used across the dataset. For Labs, Vitals, etc., the SI unit is the default. For dimensionless readouts use “NONE”. | String |
ULOQ | Upper limit of quantification for event defined by NAME | | Populated from source data/protocol for PK/LAB assessments only if applicable. Units as in the UNIT column. The field can be left empty if ULOQ is unknown. | Numeric |
LLOQ | Lower limit of quantification for event defined by NAME | NA | Populated from source data/protocol for PK/LAB assessments only if applicable. Units as in the UNIT column. The field can be left empty if LLOQ is unknown. | Numeric |
Dose event additional information | ||||
ROUTE | Route of administration | | Populated from the source data and/or protocol. The field is to be left empty for observation events and cannot be left empty for dosing events. For dosing events the value should be one of the following: (“IV”,“SUBCUT”,“ORAL”,“INHALED”,“INTRAMUSCULAR”,“INTRAARTICULAR”,“RECTAL”,“TOPICAL”,“GENERAL_IV”,“GENERAL_ABS1”,“GENERAL_ABS0”) | String |
II | Interval of additional dosing | 0 | Interval of dosing, if a single row should define multiple dosings. Allows for coding repeated dosing more efficiently. Populated from source data, protocol or provided specification, in the time unit defined in the TIMEUNIT column. If unused, set to 0. Observation records have a 0 entry. | Numeric |
ADDL | Number of ADDITIONAL doses given with the specified interval | 0 | Number of ADDITIONAL doses given with the specified interval, if a single row should define multiple dosings. Allows for coding repeated dosing more efficiently and is linked to II. The number of doses coded by a single dose record will be (1+ADDL) with interval II. Populated from source data, protocol or provided specification. If unused, set to 0. Observation records have a 0 entry. | Numeric |
Adverse event additional information | ||||
AE | Adverse event flag | | Defines if the record codes an adverse event (0: no, 1: yes). This assumes that only AEs are coded (grades 1-5)! | Numeric |
AEGRADE | Adverse event grade | NA (if AE column not in data then no default used) | Grade of adverse event (1-5) | Numeric |
AESER | Seriousness of adverse event | NA (if AE column not in data then no default used) | Flag (0 not serious or 1 serious) Seriousness of adverse event | Numeric |
AEDRGREL | Drug related adverse event or not | NA (if AE column not in data then no default used) | Flag (0 not drug related or 1 drug related) Drug related adverse event or not | Numeric |
Additional information | ||||
COMMENT | Additional information for the observation/event | | Any text-based comment | String |
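Two of the derivation rules in the table are easy to state in code: the numeric flags (IND, STUDYN, TRT) number the distinct names 1 to N in alphabetical order, and a dose record with ADDL and II codes 1+ADDL administrations spaced II apart. The following Python sketch illustrates both rules. It is not IQR Tools code: the function names are hypothetical, and Python's default string ordering is assumed to stand in for “alphabetic order”, which may differ from the ordering a given tool uses for mixed-case or non-ASCII names.

```python
def numeric_flags(names):
    """Map each distinct name to 1..N in sorted order (hypothetical helper).

    Mirrors the "1-N for ... in alphabetic order" default of IND, STUDYN, TRT.
    Assumption: Python's default string sort is an adequate "alphabetic order".
    """
    return {name: index for index, name in enumerate(sorted(set(names)), start=1)}


def expand_dose_times(time, ii, addl):
    """Return the administration times coded by one dose record.

    A record with ADDL additional doses and interval II represents
    (1 + ADDL) doses at time, time + II, time + 2*II, ...
    """
    if addl < 0:
        raise ValueError("ADDL must not be negative")
    if addl > 0 and ii <= 0:
        raise ValueError("II must be positive when ADDL > 0")
    return [time + k * ii for k in range(addl + 1)]


# Indication names as they might appear in INDNAME (illustrative values).
ind = numeric_flags(["RA", "Asthma", "COPD", "Asthma"])

# One dose record at TIME=0 with II=24 and ADDL=6: seven daily doses.
dose_times = expand_dose_times(0, 24, 6)
```

Because the flag depends on the full set of names, adding a new study or treatment to the dataset can renumber existing entries; the text columns remain the stable reference.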
19.2 Additional columns
This section documents additional columns. These columns are modeling tool dependent and are usually generated when importing a dataset with IQRdataGENERAL().
If present in the imported dataset they can be overwritten with default settings or the user can decide to keep the information.
In addition, the general dataset format can contain any other columns. Examples are covariate columns that IQRdataGENERAL() generates by converting row-based covariate information in the source dataset into numerical columns of the analysis data.
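The idea of turning row-based covariate information into a column can be sketched as follows. This Python example is only an illustration under stated assumptions, not the rule applied by IQRdataGENERAL(): the function name `add_covariate_column` and the column name WT0 are hypothetical, and the choice of each subject's first matching record (in input order) is an assumption made for the sketch.

```python
def add_covariate_column(records, cov_name, column):
    """Add `column` to every record, holding the subject's covariate value.

    Hypothetical helper. Assumption: the value is taken from the subject's
    first record with NAME == cov_name, in input order. Subjects without
    such a record get None.
    """
    first_value = {}
    for rec in records:
        subject = rec["USUBJID"]
        if rec["NAME"] == cov_name and subject not in first_value:
            first_value[subject] = rec["VALUE"]
    return [{**rec, column: first_value.get(rec["USUBJID"])} for rec in records]


records = [
    {"USUBJID": "S1", "NAME": "Weight", "VALUE": 70.0},
    {"USUBJID": "S1", "NAME": "Plasma concentration Compound X", "VALUE": 1.2},
    {"USUBJID": "S2", "NAME": "Plasma concentration Compound X", "VALUE": 0.8},
]
with_weight = add_covariate_column(records, "Weight", "WT0")
```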
Column Name | Column Content | Type |
---|---|---|
ID | Unique subject ID for modeling software | Numeric |
TIMEPOS | TIME shifted to have TIMEPOS=0 at first event in a subject | Numeric |
TAD | Time since last dose. Before the first dose, TAD equals TIME. This column does not distinguish between different dose names: it contains the time since the last dose, independently of the DOSENAME. If no dose is defined in a subject, it is NA. | Numeric |
DV | Observation value (0 for dosing events). BLOQ records are set to LLOQ if the BLOQ handling method is M3 or M4, to LLOQ/2 if M5 or M6, and to 0 if M7. If VALUE is undefined but VALUETXT is defined, DV is determined as 1:N for alphabetic ordering of VALUETXT. | Numeric |
MDV | Missing data value column (0 if the observation value is defined and IGNORE is NA; 1 for dose records, for NA observation values, and for all records where IGNORE is not NA). MDV=1 for values below LLOQ if the M1 method is used. With the M6 method, MDV=0 for the first DV<LLOQ in a sequence and MDV=1 for the following ones in the sequence. | Numeric |
EVID | Event ID. 0 for observations, 1 for dosing records | Numeric |
CENS | Censoring column, set depending on the method for BLOQ handling. If the M3 or M4 method is chosen, CENS=1 if DV<LLOQ. If M1, M5, M6, or M7, CENS=0. 0 for all dosing events. | Numeric |
AMT | Dose given at dosing instant (0 for observation records) | Numeric |
ADM | Administration column. 0 for observation events. Number of the input for dosing events. Usually defined by the user. Default values if not user defined: if more than one dose is considered, the order of the defined dose NAMEs defines the ADM number (1 for first, 2 for second, …). If a single dose is considered, ADM is selected according to the information in ROUTE: 1 for SUBCUT, ORAL, INTRAMUSCULAR, INTRAARTICULAR, RECTAL, INHALED, GENERAL_ABS1; 2 for IV, GENERAL_IV; 3 for TOPICAL, GENERAL_ABS. | Numeric |
TINF | Infusion time (TIMEUNIT). (0 for observation records, DURATION for dose records) | Numeric |
RATE | Calculated from AMT and TINF | Numeric |
YTYPE | Observation number. 0 for dosing records. 1,2,3,4, … for observation records. If observations provided in obsNAMES then this order will be used. Non-doses that are not defined in obsNAMES will obtain YTYPE=0 | Numeric |
DOSE | Carry forward of the last defined AMT of a dose event. Values before first dose get the DOSE set to 0. If no dose present in subject DOSE is set to 0. This column does not make a difference between different dose names. It contains the AMT since last dose, independently of the selected doses in doseNAMES | Numeric |
TADDx | Dosing input specific TAD column. Only present if more than one dosing input defined in “doseNAMES”. “x” defines the index of the dose NAME in doseNAMES. If a dose NAME does not appear in a subject the value is set to NA. | Numeric |
DOSEDx | Carry forward of the last defined AMT of a dose event (with specific dose NAME). Values before first dose get the DOSEDx set to 0. If no dose present in subject DOSEDx is set to 0. “x” indicates the position of the dose NAME in doseNAMES. | Numeric |
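The rules for TIMEPOS, TAD, DOSE and for the BLOQ-dependent columns DV, MDV and CENS can be illustrated with a short Python sketch. It is not the IQR Tools implementation, and the function names are hypothetical. It assumes one subject's records sorted by TIME, treats a dose record as resetting TAD to 0, interprets a “sequence” as consecutive BLOQ observations in the input order, and ignores IGNORE and missing values for brevity.

```python
def add_time_columns(records):
    """Add TIMEPOS, TAD and DOSE to one subject's TIME-sorted records.

    Each record is a dict with TIME, EVID (1 = dose) and AMT.
    Assumption: a dose record itself gets TAD = 0.
    """
    if not records:
        return []
    first_time = records[0]["TIME"]
    has_dose = any(rec["EVID"] == 1 for rec in records)
    last_dose_time, last_amt = None, 0
    result = []
    for rec in records:
        if rec["EVID"] == 1:
            last_dose_time, last_amt = rec["TIME"], rec["AMT"]
        if not has_dose:
            tad = None                      # NA: no dose in this subject
        elif last_dose_time is None:
            tad = rec["TIME"]               # before first dose: same as TIME
        else:
            tad = rec["TIME"] - last_dose_time
        result.append({**rec, "TIMEPOS": rec["TIME"] - first_time,
                       "TAD": tad, "DOSE": last_amt})
    return result


def apply_bloq(values, lloq, method):
    """Return (DV, MDV, CENS) per observation for a BLOQ handling method."""
    if method not in {"M1", "M3", "M4", "M5", "M6", "M7"}:
        raise ValueError(f"unsupported method: {method}")
    result, previous_bloq = [], False
    for value in values:
        bloq = value < lloq
        dv, mdv, cens = value, 0, 0
        if bloq and method == "M1":
            mdv = 1                         # BLOQ values are excluded
        elif bloq and method in ("M3", "M4"):
            dv, cens = lloq, 1              # censored at LLOQ
        elif bloq and method in ("M5", "M6"):
            dv = lloq / 2
            if method == "M6" and previous_bloq:
                mdv = 1                     # only first of a sequence is kept
        elif bloq and method == "M7":
            dv = 0
        result.append((dv, mdv, cens))
        previous_bloq = bloq
    return result


subject = [
    {"TIME": -1, "EVID": 0, "AMT": 0},
    {"TIME": 0, "EVID": 1, "AMT": 100},
    {"TIME": 2, "EVID": 0, "AMT": 0},
    {"TIME": 24, "EVID": 1, "AMT": 50},
    {"TIME": 25, "EVID": 0, "AMT": 0},
]
timed = add_time_columns(subject)
m6 = apply_bloq([5.0, 0.0, 0.0, 3.0], lloq=1.0, method="M6")
```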
19.3 Deprecated columns
Deprecated columns can still be used to allow backward compatibility. None of these columns is automatically added if not present in the original data.
Column Name | Column Content | Description | Type |
---|---|---|---|
Identification of subject | |||
CENTER | Center number | | Numeric |
SUBJECT | Subject number | If defined, cannot be left empty | String |
Study information | |||
STUDYDES | Study title, short description | Populated from the protocol title. | String |
PART | Part of study as defined per protocol | Populated from the source data and/or protocol and only if more than one part is present. For example: 1=first part, 2=second part, etc. If no parts are defined, set to 0. | String |
EXTENS | Extension of the core study | Populated from the source data and/or protocol and only if an extension is present. For example: 0=core study, 1=first extension, 2=second extension, etc. | String |
Treatment group information | |||
TRTNAMER | Name of treatment to which subject was randomized | Populated from the source data, protocol or the provided specifications, and can be recoded for harmonization across trials, if needed. | String |
TRTR | Numeric randomized treatment flag | Numeric randomized treatment flag (unique for each entry in TRTNAMER) | Numeric |
Event time information | |||
DATEDAY | Start date of event | Populated from source data and/or may need to be imputed when it’s partial or missing. Formatted as DD-MMM-YYYY (example: 01-JUL-2015). If the information is unknown and cannot be imputed, set to “UNKNOWN”. | String |
DATETIME | Start time of event | Populated from source data and/or may need to be imputed when it’s partial or missing, in particular in case of profiles. | String |
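When migrating a dataset that still uses the deprecated DATEDAY and DATETIME columns, the two values can be combined into the STDTC format (YYYY-MM-DDTHH:MM). The following Python sketch shows one way to do this. It is an illustration only: the function name `to_stdtc` is hypothetical, and since the DATETIME format is not specified above, the sketch assumes it is given as HH:MM.

```python
from datetime import datetime

# English three-letter month abbreviations as used in DD-MMM-YYYY.
MONTHS = {
    abbreviation: number
    for number, abbreviation in enumerate(
        ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
         "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"],
        start=1,
    )
}


def to_stdtc(dateday, time_hhmm):
    """Combine DATEDAY (DD-MMM-YYYY) and a time (assumed HH:MM) into STDTC.

    Hypothetical helper. Returns None when DATEDAY is "UNKNOWN".
    Invalid dates raise an error instead of being imputed.
    """
    if dateday == "UNKNOWN":
        return None
    day, month, year = dateday.split("-")
    hour, minute = time_hhmm.split(":")
    stamp = datetime(int(year), MONTHS[month.upper()], int(day),
                     int(hour), int(minute))
    return stamp.strftime("%Y-%m-%dT%H:%M")


stdtc = to_stdtc("01-JUL-2015", "08:30")
```

An explicit month table is used instead of locale-dependent parsing so that the result does not depend on the system language settings.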