19 General Dataset Format

In model-based drug development, many different analyses, or pharmacometric activities, often need to be performed within the same project. These include, but are not limited to, graphical exploration, dose–concentration modeling, and concentration–response modeling, where many different biomarkers or endpoints might need to be considered.

Requesting a separate modeling dataset for each modeling activity considerably strains the resources of the modeler who prepares these datasets, or of the supporting programming group, especially when a certain level of validation is required. IQRtools therefore uses one general dataset format for all pharmacometric analyses.

The dataset format was required to have a well-defined structure that is project independent (the same across projects, compounds, and indications), independent of the modeling activity to be performed, and independent of the modeling tool to be used.

Additionally, the format had to have a minimum level of redundancy and include information that is typically not found within traditional pharmacometric analysis datasets:

names for variables
units
additional annotation
etc.

This renders the dataset immediately understandable without additional documentation, reduces mistakes, and facilitates project hand-over processes.

The IQR Tools work with this format but can also work with any traditional NONMEM or MONOLIX data format, either directly or by converting the traditional data format into the general dataset format.

Note: as of Version 1.3.0 of IQR Tools, the data format has been improved. Datasets created for versions before 1.3.0 can still be used.

19.1 General columns

The following table contains the “general” columns. “General” means that these columns are not related to a specific modeling tool (exception: ADDL and II).

Modeling tool related columns will be added by IQRtools upon import of the dataset (IQRdataGENERAL()) but can also be present in the dataset file before import. These additional columns are documented in the next section.

REQUIRED columns for IQR Tools are marked in red.
Columns marked in blue are created with default values if not existing in the imported dataset.
Columns marked in italic are created (if not already existing) if the columns referenced in “Default Content” exist.
Since datasets for pharmacometric analyses are typically comma-separated files, no field in the general dataset is allowed to contain commas.
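
The no-comma rule can be checked before a dataset is written to file. The following is a minimal Python sketch, not part of IQRtools; the function name `fields_with_commas` and the example records are hypothetical.

```python
def fields_with_commas(records):
    """Return (row_number, column_name) pairs whose text value contains a comma."""
    problems = []
    for row_number, record in enumerate(records, start=1):
        for column, value in record.items():
            # Only text fields can violate the rule; numeric values are skipped.
            if isinstance(value, str) and "," in value:
                problems.append((row_number, column))
    return problems


# Hypothetical records using column names from the general dataset format.
records = [
    {"USUBJID": "S01-001", "NAME": "Dose Compound X", "COMMENT": "taken with food"},
    {"USUBJID": "S01-002", "NAME": "Dose Compound X", "COMMENT": "late, after lunch"},
]
print(fields_with_commas(records))  # [(2, 'COMMENT')]
```

Offending fields would need to be reworded (for example, replacing the comma with a semicolon) before export.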

Column Name	Column Content	Default Content	Description	Type
General
IXGDF	Index of record in dataset. Starting from 1, then 2, 3, etc. until last record/row number	1,2,3, etc.	This index is kept during post processing of the general dataset format. When rows/records are removed, the remaining ones retain their index in IXGDF. In this way contents of derived datasets can always be compared to contents of the general dataset format.	Numeric
IGNORE	Reason/comment related to exclusion of the observation or dose from the analysis	NA		String
Identification of subject
USUBJID	Unique subject identifier			String
INDNAME	Indication name		Populated from the protocol or the provided specifications.	String
IND	Numeric indication flag	1-N for INDNAME in alphabetic order	Numeric indication flag (unique for each entry in INDNAME)	Numeric
Study information
COMPOUND	Name of the investigational compound		Populated from the source data or protocol or provided specifications.	String
STUDY	Short study name/number		Populated from the source data.	String
STUDYN	Numeric study flag	1-N for STUDY in alphabetic order	Numeric study flag (unique for each entry in STUDY)	Numeric
Treatment group information
TRTNAME	Name of actual treatment given to subject		Populated from the source data, protocol or the provided specifications, and can be recoded for harmonization across trials, if needed.	String
TRT	Numeric treatment flag	1-N for TRTNAME in alphabetic order	Numeric treatment flag (unique for each entry in TRTNAME)	Numeric
Visit information
VISIT	Visit number		Populated from the source data. When visit information is not available in the source data, it could be imputed from: - Other assessments happening at the same date, - PK log (in the case of PK with a sample number). Field can only be left empty in special cases when nominal visits do not make sense. Examples: - adverse events - comedication	Numeric
VISNAME	Visit name		Populated from the source data.	String
BASE	Flag indicating assessments at baseline	0	Derived based on VISIT and VISNAME: =0 for non-baseline visit, =1 for first baseline visit, =2 for second baseline visit, etc.	Numeric
SCREEN	Flag indicating assessments at screening	0	Derived based on VISIT and VISNAME: =0 for non-screening visit, =1 for first screening visit, =2 for second screening visit, etc.	Numeric
Event time information
TIMEUNIT	Unit of all Numeric time definitions in the dataset		Populated from the provided specification. Can be: “HOURS”, “MINUTES”, “DAYS”, “SECONDS”, “WEEKS”, “MONTHS”, “YEARS”	String
DURATION	Duration of event	0	Duration of event in same time units as TIMEUNIT. Derived from start date/time and end date/time of an event. Populated from source data and/or may need to be imputed when information is partial or missing. If end time is same as start time (e.g. bolus administration), 0 is a perfectly acceptable value for DURATION. If event continues post End of Study, set to -1.	Numeric
NT	Nominal event time		Planned time of event. Based on protocol, in the time unit defined in TIMEUNIT column. For repeated visits (e.g., 1.001, 2.001), use defined protocol time for the originally scheduled visit. For “End of study” visit, use defined nominal time for end of study – even if subject dropped out and completed End of study visit earlier. Field cannot be left empty for per protocol planned time dependent assessments. Field can be left empty, e.g., for adverse events.	Numeric
STDTC	Start date and time of event		Format: YYYY-MM-DDTHH:MM. Annotation only.	String
ENDTC	End date and time of event		Format: YYYY-MM-DDTHH:MM. Annotation only.	String
TIME	Actual time of event relative to first dose administration		In the time unit defined in TIMEUNIT column. Derived as the difference between DATEDAY/DATETIME of the event and the DATEDAY/DATETIME of the first administration of selected primary study drug. Field can be left empty if TIME unknown.	Numeric
PROFNR	Profile number		This column defines the number of the profile, which typically could be the day for a PK profile. This information is used when conducting an NCA analysis. If provided, PROFTIME also needs to be defined.	Numeric
PROFTIME	Profile time		This column defines the actual or nominal time points for a PK profile. Times are measured from the dose given before this profile (so starting at 0 post dose). This information, together with the PROFNR column, is used to identify a particular profile for NCA. If provided, PROFNR also needs to be defined.	Numeric
Event value information
TYPENAME	Unique type of event		Populated from the specification. For example: “Efficacy Readouts”, “Lab Values”, “Adverse Events”, etc.	String
NAME	Unique short name of event		Populated from the specification. For example: “Plasma concentration Compound X”, “Dose Compound X”, etc.	String
VALUE	Value of event defined by NAME		Applicable to Numeric readouts. Populated from the source data in the units defined in the UNIT column. Examples for the values: the given dose, the observed PK concentration, or the value of other readouts. Special cases: - For adverse events it is not used and should be set to 0 - For concomitant medications: the dose in pre-specified unit and frequency (e.g., mg/day) - For BLOQ records: any value can be entered that is lower than the actual LLOQ. It is not acceptable to set this value to “NaN” or “NA” since then no discrimination can be made between “missing” and “<LLOQ”. For PK records on untransformed data “0” is suggested. On log transformed data 0 should not be used but log(LLOQ/2) would be acceptable.	Numeric
VALUETXT	Text version of value	NA	Applicable to categorical readouts. Populated from the source data. Examples: “Male”, “Female”, “Asian”, etc. Can only be left empty if VALUE is not missing.	String
UNIT	Unit of the value reported in the VALUE column		Populated/converted from the source data into a reference unit. For the same event, the same unit has to be used across the dataset. For Labs, Vitals, etc., SI unit is the default. For dimensionless readouts use “NONE”.	String
ULOQ	Upper limit of quantification for event defined by NAME		Populated from source data /protocol for PK/LAB assessment only if it is applicable. Units as in UNIT column. Field can be left empty if ULOQ unknown.	Numeric
LLOQ	Lower limit of quantification for event defined by NAME	NA	Populated from source data /protocol for PK/LAB assessment only if it is applicable. Units as in UNIT column. Field can be left empty if LLOQ unknown.	Numeric
Dose event additional information
ROUTE	Route of administration		Populated from the source data and/or protocol. Field is to be left empty for observation events. Field cannot be left empty for dosing events. For dosing events the value should be one of the following: (“IV”,“SUBCUT”,“ORAL”,“INHALED”,“INTRAMUSCULAR”,“INTRAARTICULAR”,“RECTAL”,“TOPICAL”,“GENERAL_IV”,“GENERAL_ABS1”,“GENERAL_ABS0”)	String
II	Interval of additional dosing	0	Interval of dosing, if a single row should define multiple dosings. Allows for coding repeated dosing more efficiently. Populated from source data, protocol or provided specification, in time units defined in the TIMEUNIT column. If unused, set to 0. Observation records have a 0 entry.	Numeric
ADDL	Number of ADDITIONAL doses given with the specified interval	0	Number of ADDITIONAL doses given with the specified interval, if a single row should define multiple dosings. Allows for coding repeated dosing more efficiently and is linked to II. The number of doses coded by a single dose record will be (1+ADDL) with interval II. Populated from source data, protocol or provided specification. If unused, set to 0. Observation records have a 0 entry.	Numeric
Adverse event additional information
AE	Adverse event flag		Defines if the record codes an adverse event (0: no, 1: yes). This assumes that only AEs are coded (grades 1-5)!	Numeric
AEGRADE	Adverse event grade	NA (if AE column not in data then no default used)	Grade of adverse event (1-5)	Numeric
AESER	Seriousness of adverse event	NA (if AE column not in data then no default used)	Flag (0 not serious or 1 serious) Seriousness of adverse event	Numeric
AEDRGREL	Drug related adverse event or not	NA (if AE column not in data then no default used)	Flag (0 not drug related or 1 drug related) Drug related adverse event or not	Numeric
Additional information
COMMENT	Additional information for the observation/event		Any text based comment	String
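
Two of the rules in the table above are simple enough to sketch in code: the default numeric flags (IND, STUDYN, TRT numbered 1-N in alphabetic order of the corresponding name column) and the meaning of II and ADDL (one dose record stands for 1+ADDL doses spaced II apart). The Python below is an illustrative sketch only, not the IQRtools implementation; the function names are hypothetical, and plain Python string sorting is assumed for "alphabetic order", which may differ from the ordering IQRtools applies.

```python
def numeric_flags(names):
    """Map each distinct name to 1..N following sorted (alphabetic) order.

    Assumption: Python's default string ordering stands in for the
    alphabetic order mentioned in the table.
    """
    return {name: flag for flag, name in enumerate(sorted(set(names)), start=1)}


def expand_dose(time, amt, ii=0, addl=0):
    """Expand one dose record into its (1 + ADDL) individual (time, amount) doses."""
    return [(time + k * ii, amt) for k in range(addl + 1)]


# TRT derived from TRTNAME (hypothetical treatment names).
trt = numeric_flags(["Placebo", "100 mg QD", "Placebo", "50 mg QD"])
print(trt)  # {'100 mg QD': 1, '50 mg QD': 2, 'Placebo': 3}

# One record with II=24 and ADDL=3 codes four doses in total.
print(expand_dose(time=0, amt=100, ii=24, addl=3))
# [(0, 100), (24, 100), (48, 100), (72, 100)]
```

With II=0 and ADDL=0, the default values, `expand_dose` returns just the single dose of the record itself.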

19.2 Additional columns

This section documents the additional columns. These columns are modeling tool dependent and are usually generated when importing a dataset with IQRdataGENERAL().

If present in the imported dataset they can be overwritten with default settings or the user can decide to keep the information.

In addition, the general dataset format can contain any additional columns. Examples are covariate columns, which IQRdataGENERAL() generates by converting row-based covariate information in the source dataset into numerical columns of the analysis data.

Column Name	Column Content	Type
ID	Unique subject ID for modeling software	Numeric
TIMEPOS	TIME shifted to have TIMEPOS=0 at first event in a subject	Numeric
TAD	Time since last dose (pre-first-dose values same as TIME). This column does not distinguish between different dose names: it contains the time since the last dose, independently of the DOSENAME. If no dose is defined in a subject, it is NA.	Numeric
DV	Observation value (0 for dosing events). Set to LLOQ if BLOQ handling method is M3 or M4. Set to LLOQ/2 if M5 or M6. Set to 0 if M7. If VALUE undefined but VALUETXT defined, DV will be determined as 1:N for alphabetic ordering of VALUETXT.	Numeric
MDV	Missing data value column (0 if the observation value is defined and IGNORE is NA; 1 for dose records, for NA observation values, and for all records where IGNORE is not NA). MDV=1 for values below LLOQ if M1 method. If M6 method, then MDV=0 for the first DV<LLOQ in a sequence and MDV=1 for the following ones in the sequence.	Numeric
EVID	Event ID. 0 for observations, 1 for dosing records	Numeric
CENS	Censoring column, set depending on the method for BLOQ handling. If the M3 or M4 method is chosen, then CENS=1 if DV<LLOQ. If M1, M5, M6, or M7, then CENS=0. 0 for all dosing events.	Numeric
AMT	Dose given at dosing instant (0 for observation records)	Numeric
ADM	Administration column. 0 for observation events. Number of input for dosing events. Usually defined by the user. Default values if not user defined: If more than one dose is considered, the order of the defined dose NAMEs defines the ADM number (1 for first, 2 for second, …). If a single dose is considered, then ADM is selected according to the information in ROUTE: 1 for: SUBCUT, ORAL, INTRAMUSCULAR, INTRAARTICULAR, RECTAL, INHALED, GENERAL_ABS1; 2 for: IV, GENERAL_IV; 3 for: TOPICAL, GENERAL_ABS0	Numeric
TINF	Infusion time (TIMEUNIT). (0 for observation records, DURATION for dose records)	Numeric
RATE	Calculated from AMT and TINF	Numeric
YTYPE	Observation number. 0 for dosing records. 1,2,3,4, … for observation records. If observations provided in obsNAMES then this order will be used. Non-doses that are not defined in obsNAMES will obtain YTYPE=0	Numeric
DOSE	Carry forward of the last defined AMT of a dose event. Values before first dose get the DOSE set to 0. If no dose present in subject DOSE is set to 0. This column does not make a difference between different dose names. It contains the AMT since last dose, independently of the selected doses in doseNAMES	Numeric
TADDx	Dosing input specific TAD column. Only present if more than one dosing input defined in “doseNAMES”. “x” defines the index of the dose NAME in doseNAMES. If a dose NAME does not appear in a subject the value is set to NA.	Numeric
DOSEDx	Carry forward of the last defined AMT of a dose event (with specific dose NAME). Values before first dose get the DOSEDx set to 0. If no dose present in subject DOSEDx is set to 0. “x” indicates the position of the dose NAME in doseNAMES.	Numeric
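
The BLOQ rules for DV, MDV, and CENS and the carry-forward logic of TAD and DOSE described in the table above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the code used by IQRdataGENERAL(): function names are hypothetical, records are assumed to be non-ignored and sorted by TIME within one subject, DV is assumed to keep the recorded VALUE under M1, a dose record is assumed to get TAD=0, and doses implied by ADDL are not expanded.

```python
def bloq_columns(value, lloq, method, first_in_sequence=True):
    """Return (DV, MDV, CENS) for one non-ignored observation record."""
    if lloq is None or value >= lloq:
        return value, 0, 0                 # quantifiable observation
    if method == "M1":
        return value, 1, 0                 # assumption: DV keeps VALUE, record excluded via MDV
    if method in ("M3", "M4"):
        return lloq, 0, 1                  # DV set to LLOQ and flagged as censored
    if method == "M5":
        return lloq / 2, 0, 0
    if method == "M6":
        # Only the first BLOQ value of a sequence is kept (MDV=0).
        return lloq / 2, (0 if first_in_sequence else 1), 0
    if method == "M7":
        return 0, 0, 0
    raise ValueError(f"unknown BLOQ method: {method}")


def tad_and_dose(records):
    """Add TAD and DOSE to one subject's records (dicts with TIME, EVID, AMT)."""
    has_dose = any(r["EVID"] == 1 for r in records)
    last_dose_time, last_amt = None, 0
    result = []
    for r in records:
        if r["EVID"] == 1:
            last_dose_time, last_amt = r["TIME"], r["AMT"]
        if not has_dose:
            tad = None                     # NA: no dose defined in this subject
        elif last_dose_time is None:
            tad = r["TIME"]                # before the first dose TAD equals TIME
        else:
            tad = r["TIME"] - last_dose_time
        result.append({**r, "TAD": tad, "DOSE": last_amt})
    return result


subject = [
    {"TIME": -1, "EVID": 0, "AMT": 0},
    {"TIME": 0, "EVID": 1, "AMT": 100},
    {"TIME": 2, "EVID": 0, "AMT": 0},
    {"TIME": 24, "EVID": 1, "AMT": 50},
    {"TIME": 26, "EVID": 0, "AMT": 0},
]
derived = tad_and_dose(subject)
print([(r["TAD"], r["DOSE"]) for r in derived])
# [(-1, 0), (0, 100), (2, 100), (0, 50), (2, 50)]
print(bloq_columns(0.0, 0.5, "M3"))  # (0.5, 0, 1)
```

Detecting where a BLOQ sequence starts for M6 is left to the caller through `first_in_sequence`, since the table does not define how sequences are delimited.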

19.3 Deprecated columns

Deprecated columns can still be used to allow backward compatibility. None of these columns is automatically added if not present in the original data.

Column Name	Column Content	Description	Type
Identification of subject
CENTER	Center number		Numeric
SUBJECT	Subject number	If defined, cannot be left empty	String
Study information
STUDYDES	Study title, short description	Populated from the protocol title.	String
PART	Part of study as defined per protocol	Populated from the source data and/or protocol and only if more than one part is present. For example: 1=first part 2=second part etc. If no parts defined, set to 0.	String
EXTENS	Extension of the core study	Populated from the source data and/or protocol and only if extension is present. For example: 0=core study 1=first extension 2=second extension etc.	String
Treatment group information
TRTNAMER	Name of treatment to which subject was randomized	Populated from the source data, protocol or the provided specifications, and can be recoded for harmonization across trials, if needed.	String
TRTR	Numeric randomized treatment flag	Numeric randomized treatment flag (unique for each entry in TRTNAMER)	Numeric
Event time information
DATEDAY	Start date of event	Populated from source data and/or may need to be imputed when it is partial or missing. Formatted as DD-MMM-YYYY (Example: 01-JUL-2015). If information is unknown and cannot be imputed, set to “UNKNOWN”.	String
DATETIME	Start time of event	Populated from source data and/or may need to be imputed when it’s partial or missing, in particular in case of profiles.	String
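
For datasets that still carry the deprecated DATEDAY column, the DD-MMM-YYYY format can be converted to the date part of the YYYY-MM-DDTHH:MM format used by STDTC and ENDTC, and an event time relative to the first dose can be computed from two such timestamps. The Python sketch below is illustrative only: the function names are hypothetical, English three-letter month abbreviations are assumed, and using STDTC-formatted strings as the input for the time difference is an assumption made for the example (the format of DATETIME is not specified here).

```python
from datetime import datetime

MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
          "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12}


def dateday_to_iso(dateday):
    """Convert DD-MMM-YYYY (e.g. 01-JUL-2015) to YYYY-MM-DD."""
    day, month, year = dateday.split("-")
    return f"{int(year):04d}-{MONTHS[month.upper()]:02d}-{int(day):02d}"


def time_since_first_dose(event, first_dose, seconds_per_unit=3600):
    """Difference between two YYYY-MM-DDTHH:MM timestamps (default unit: hours)."""
    fmt = "%Y-%m-%dT%H:%M"
    delta = datetime.strptime(event, fmt) - datetime.strptime(first_dose, fmt)
    return delta.total_seconds() / seconds_per_unit


print(dateday_to_iso("01-JUL-2015"))                                   # 2015-07-01
print(time_since_first_dose("2015-07-02T09:30", "2015-07-01T08:00"))  # 25.5
```

`seconds_per_unit` would be chosen to match the TIMEUNIT column, for example 86400 for “DAYS”.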