Monday, March 13, 2017

Dynamic Documents and their importance

When a model is ran say every few months and a report has to be produced based on the model results, a copy-paste like modus operandi can easily cause synchronization errors: the wrong text is inserted not referring to the current results but to an older simulation result. I recently had a discussion with someone from a government agency who experienced exactly this problem.

A similar thing can and does happen when documenting a piece of software: examples, code fragments or screen dumps refer to older versions of the program.

R and Rstudio provide tools to create dynamic documents from R using a nice Rstudio environment. We could have said in the document something like:

A cost savings of `r round(100*(oldcost-cost)/oldcost,1)`% has been achieved.

which would generate:

       A cost savings of 25.4% has been achieved.

Here we evaluated an R expression using data from the R session associated with the R markdown document.

Literate Programming

This is the grand-daddy of R Markdown. LP (Literate Programming) was meant to document source code, or rather have a single design document that could spit out either TeX for typesetting the document or Pascal (and later C) source code (4,5). Actually in some respects this tool was very flexible: things could be documented out-of-order. The name web was used for this. Knuth emphasized the usefulness of this: the order of thinking and writing about code is different than the order of generated source code. R Markdown prefers code chunks to be in order.

R users will appreciate the assignment symbol in LP:

image 

R Markdown

The markup language to write documents in R is simple and somewhat restricted (6). Actually that may be a good thing. Too many ways to change the lay out of a document often leads to a somewhat less clean, more convoluted final product. Writing documentation while writing code is a big step forward I think. It makes it more pleasant to use and also has a beneficial impact on the code quality (explaining what you try to do helps to focus the mind and gets things better organized). I have heard people advancing the argument that documentation should be written before the code is written, but that looks to me not always a feasible or easy approach.

Recently better support for large documents (e.g. books) has been added in the form of being able to work sub-documents (sections or chapters) individually. Some books produced this way are shown at https://bookdown.org/.

Output formats

The main output formats of R Markdown are HTML, LaTeX, and MS Word. MS Word is especially important in non-academic institutions as Word and more general Office is a main work horse there. Click on the figures to enlarge.

image imageimage

Here is HTML, LaTeX and Word output of the same small example document.

Tables

Outputting tables is somewhat of a challenge. The easiest is to use the function kable. The LaTex and Word output is reasonable, but the HTML output needs some adjustment:

imageimageimage

Books

bookdown Authoring Books and Technical Documents with R Markdown book coverDynamic Documents with R and knitr, Second Edition book coverReproducible Research with R and R Studio, Second Edition book cover

References
  1. Yihui Xie, Dynamic Documents with R and knitr, 2nd ediition, Chapmand and Hall/CRC, 2015.
  2. Yihui Xie, bookdown, Authoring Books and Technical Documents with R Markdown, Chapman and Hall/CRC, 2016. Also available through: https://bookdown.org/yihui/bookdown/
  3. Christopher Gandrud, Reproducible Research with R and RStudio, Chapman and Hall/CRC, 2015
  4. Donald Knuth, Literate Programming, http://www.literateprogramming.com/knuthweb.pdf, paper on LP (LP means Literate Programming in this context and not Linear Programming).
  5. Donald Knuth, Literate Programming, CSLI publications, 1992 (the book)
  6. https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf a short document summarizing R Markdown.