4 Reproducibility of Results

Full reproducibility of all results generated at a certain point in time is critical for any type of analysis work, but especially within the pharma industry. IQR Tools was designed so that a user can be sure an analysis can be repeated years later and still give the same results.

4.1 The CRAN Nightmare

CRAN is a fantastic place and it brings R to us!

However, it essentially forces a user to always use the newest versions of packages. Packages on CRAN change constantly, and no one ensures that new package versions work with the packages already installed on your computer. System administrators who have to maintain R installations in companies know that pain. The normal user is seldom confronted with it, except when he or she has to reproduce a specific result that was generated on an old R installation.

The use of CRAN should be avoided at all costs in a corporate environment and should be limited to personal or experimental installations.

As an anecdote: R and a set of R packages were installed on one computer. Tests were run and everything worked. The same procedure was then repeated on the next computer, 6 meters away. Tests were run and they no longer worked. Digging into the cause showed that a package on CRAN had changed during the walk to the other computer. A real nightmare for reproducibility!

4.2 MRAN Time Machine

An alternative to CRAN is MRAN, the Microsoft R Application Network. MRAN takes snapshots of CRAN at given moments and, about 2 months after a new R version becomes available, freezes the snapshot of CRAN for that particular R version. This means that for a given R version a snapshot of CRAN R packages is available that:

  • Will never change
  • Is accessible via convenient dated links

An example of such a dated link for R version 3.4.1 is https://cran.microsoft.com/snapshot/2017-09-01/, which holds the CRAN content as frozen on the 1st of September 2017. Packages from this dated repository are installed in R in the same manner as from CRAN, except that the repos argument of install.packages() must point to the snapshot URL.
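A minimal sketch of such an installation is given here. The helper names snapshot_url and install_from_snapshot are hypothetical and only serve the illustration; the URL pattern is taken from the dated link above, and the sketch assumes that the snapshot server is reachable.

```r
# Build the URL of a dated snapshot (pattern taken from the example link above)
snapshot_url <- function(date) {
  date <- format(as.Date(date), "%Y-%m-%d")
  paste0("https://cran.microsoft.com/snapshot/", date, "/")
}

# Hypothetical helper: install packages from the snapshot of a given date.
# Assumes the snapshot server is still reachable from this machine.
install_from_snapshot <- function(pkgs, date, ...) {
  install.packages(pkgs, repos = snapshot_url(date), ...)
}
```

A call such as install_from_snapshot("ggplot2", "2017-09-01") would then install the package (an arbitrary example) as it was available on CRAN on that date.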

If Microsoft R Open is used instead of the normal R, the options are already set automatically to the time machine repository for that particular R version.
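In the normal R a comparable effect can be obtained by setting the repos option by hand, for example at the start of a session. A minimal sketch, assuming the snapshot date from the example above:

```r
# Point the CRAN entry of the repos option at the dated snapshot
options(repos = c(CRAN = "https://cran.microsoft.com/snapshot/2017-09-01/"))

# Later install.packages() calls in this session use the snapshot by default
getOption("repos")
```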

4.3 IQR Tools impact

The installation utility of IQR Tools installs all dependencies (and their dependencies, and so on) from the Microsoft Time Machine repository that matches the R version on which IQR Tools is installed.

For this reason it is recommended that at least for the initial installation of IQR Tools the forceDependencies argument is set to TRUE. Ideally, IQR Tools is installed on a clean R installation to ensure that all installed R packages are installed from the same dated MRAN repository.

4.4 IQR Tools testing

During testing of IQR Tools the complete unit testing suite is run on each new R version, ensuring that IQR Tools works correctly on it. Because, by design, each R version is linked to a defined set of package versions, an analysis can still be reproduced years later by using the same R version.

4.4.1 Defining R and IQR Tools version

When conducting an analysis it is important to “somehow” document the versions of the tools that were used. Modeling reports typically contain a section defining the versions of R, NONMEM, etc. This section almost never contains information about the versions of all R packages and their dependencies.

IQR Tools provides a convenient function aux_version() which should be used at the beginning of each script. It defines the versions of IQR Tools and R that are used for the analysis. The function also checks whether the provided version information is correct and returns an error if not. In this sense, the function serves both as documentation of the versions used and as a check that these versions are indeed used.

An example of the use of the aux_version() function at the beginning of an analysis script is shown below:

The function can be used not only for IQR Tools but also for any other R package. However, if the IQR Tools installer is used and all dependencies are installed from the correct dated version on MRAN, the R version number defines the exact version of all dependencies, which is quite a reduction of complexity.
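The idea of declaring versions and checking them in one step can be sketched in base R. The function check_versions below is hypothetical: it is neither the implementation nor the signature of aux_version(), and only illustrates the principle of stopping a script when the declared versions do not match the installed ones.

```r
# Hypothetical helper, NOT the aux_version() signature or implementation.
# Stops with an error if the declared versions differ from the installed ones.
check_versions <- function(pkg, pkg_version, r_version) {
  found_pkg <- utils::packageVersion(pkg)
  found_r   <- getRversion()
  if (found_pkg != package_version(pkg_version)) {
    stop(sprintf("%s version is %s, but %s was declared",
                 pkg, as.character(found_pkg), pkg_version))
  }
  if (found_r != package_version(r_version)) {
    stop(sprintf("R version is %s, but %s was declared",
                 as.character(found_r), r_version))
  }
  # Both declared versions match the running installation
  invisible(TRUE)
}
```

Placed at the top of an analysis script with the versions written out as literal strings, such a call documents the versions and at the same time refuses to run under a different setup.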

4.5 For the paranoid

The above approach to reproducibility requires that the MRAN repository remains available in the future. To circumvent any potential issues should Microsoft decide to remove the MRAN repository from the internet, the function packageR_IQRtools() has been included in IQR Tools.

This function generates an “rrepo.zip” file containing the typical “rrepo” folder structure on CRAN and MRAN. The current R installation is analyzed and all installed packages are downloaded as source packages and included in the rrepo structure. In addition, the correct version of Rtools, the Windows R installer for the current version, and the R source code are downloaded and placed into the rrepo folder structure as well.
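The first step described above, analyzing the current R installation, can be approximated in base R. This is a minimal sketch and not the implementation of packageR_IQRtools(); the function name installed_inventory is hypothetical.

```r
# Hypothetical helper: list all installed packages with version and library path
installed_inventory <- function() {
  ip <- installed.packages()
  inv <- data.frame(
    Package = unname(ip[, "Package"]),
    Version = unname(ip[, "Version"]),
    LibPath = unname(ip[, "LibPath"]),
    stringsAsFactors = FALSE
  )
  # Sort by package name and reset the row names
  inv <- inv[order(inv$Package), ]
  rownames(inv) <- NULL
  inv
}

inv <- installed_inventory()
head(inv)
```

Such an inventory is the information needed to fetch the matching source packages, and it can also be stored next to the analysis as plain documentation.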

This rrepo.zip file can be stored along with the code of a particular analysis. If in 20 years a user needs to reproduce an analysis, the only things that need to be available are a computer, a power source for the computer, rrepo.zip and the original analysis files. A copy of the operating system used might be useful as well but is outside of the scope here.