Yet Another Math Programming Consultant: Dynamic Documents and their importance

Monday, March 13, 2017

When a model is run, say, every few months and a report has to be produced from the model results, a copy-and-paste modus operandi can easily cause synchronization errors: the inserted text refers not to the current results but to an older simulation run. I recently had a discussion with someone from a government agency who experienced exactly this problem.

A similar thing can and does happen when documenting a piece of software: examples, code fragments or screen dumps refer to older versions of the program.

R and RStudio provide tools to create dynamic documents from within R, in the convenient RStudio environment. For example, the document source could contain something like:

A cost savings of `r round(100*(oldcost-cost)/oldcost,1)`% has been achieved.

which would generate:

A cost savings of 25.4% has been achieved.

Here we evaluated an R expression using data from the R session associated with the R Markdown document.
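To make the mechanics concrete, here is a minimal sketch of what happens behind that inline expression. The values of oldcost and cost are hypothetical (chosen only so the result matches the 25.4% shown above); in a real report they would come from the model run in the same R session.

```r
# Hypothetical model results; in a real report these would be produced
# by the model run in the R session behind the document.
oldcost <- 1340
cost    <- 1000

# The same expression as in the inline R code of the document text.
savings_pct <- round(100 * (oldcost - cost) / oldcost, 1)

# Roughly what the rendered sentence looks like once the value is substituted.
sentence <- sprintf("A cost savings of %.1f%% has been achieved.", savings_pct)
print(sentence)
```

Because the number is recomputed every time the document is rendered, rerunning the model with new data updates the sentence without any copy-and-paste step.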

Literate Programming

This is the grand-daddy of R Markdown. LP (Literate Programming) was meant to document source code, or rather to maintain a single design document from which either TeX (for typesetting the documentation) or Pascal (and later C) source code could be generated (4,5). In some respects this tool was very flexible: code could be documented out of order. Knuth called the system WEB. He emphasized the usefulness of this flexibility: the order in which one thinks and writes about code differs from the order of the generated source code. R Markdown prefers code chunks to be in order.

R users will appreciate the assignment symbol in LP: in the typeset output, assignment is printed as a left arrow (←), much like R's <-.

R Markdown

The markup language for writing documents in R is simple and somewhat restricted (6). That may actually be a good thing: too many ways to change the layout of a document often lead to a less clean, more convoluted final product. Writing documentation while writing code is a big step forward, I think. It makes the resulting code more pleasant to use and also has a beneficial impact on code quality (explaining what you are trying to do helps focus the mind and gets things better organized). I have heard people argue that documentation should be written before the code, but that does not always look feasible or easy to me.

Recently, better support for large documents (e.g. books) has been added: sub-documents (sections or chapters) can be worked on individually. Some books produced this way are shown at https://bookdown.org/.

Output formats

The main output formats of R Markdown are HTML, LaTeX, and MS Word. MS Word is especially important in non-academic institutions, where Word, and Office more generally, is a main workhorse.

Here is HTML, LaTeX and Word output of the same small example document.
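As a rough sketch of how one source file can be turned into all three formats, the helper below wraps rmarkdown::render. The function name render_all_formats and the file name report.Rmd are hypothetical, and the LaTeX route is represented here by the pdf_document format, which goes through LaTeX; this is one possible setup, not necessarily the one used for the example above.

```r
# Formats named in the text: HTML, LaTeX (here via PDF), and MS Word.
default_formats <- c("html_document", "pdf_document", "word_document")

# Render one R Markdown source file to each requested format.
# `rmd_file` is a hypothetical path supplied by the caller.
render_all_formats <- function(rmd_file, formats = default_formats) {
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    stop("The rmarkdown package is required.")
  }
  if (!file.exists(rmd_file)) {
    stop("File not found: ", rmd_file)
  }
  # render() accepts a vector of output format names.
  rmarkdown::render(rmd_file, output_format = formats)
}

# Usage (not run here; needs pandoc, plus a LaTeX installation for PDF):
# render_all_formats("report.Rmd")
```

Keeping the list of formats in one place means the Word version for colleagues and the HTML version for the web are always generated from the same source and the same results.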

Tables

Outputting tables is somewhat of a challenge. The easiest approach is to use the function kable from the knitr package. The LaTeX and Word output is reasonable, but the HTML output needs some adjustment.
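The following is a minimal sketch of using kable, assuming the knitr package is installed. The data frame and its numbers are illustrative only, and the table.attr setting (adding CSS classes to the HTML table) is just one possible adjustment, not necessarily the one used in the original post.

```r
# Hypothetical results table; the numbers are illustrative only.
results <- data.frame(
  scenario = c("baseline", "optimized"),
  cost     = c(1340, 1000)
)

if (requireNamespace("knitr", quietly = TRUE)) {
  # LaTeX version of the table.
  latex_table <- knitr::kable(results, format = "latex")

  # HTML version; table.attr adds attributes to the <table> tag so that
  # a stylesheet can pick it up (the class names are an assumption).
  html_table <- knitr::kable(
    results,
    format     = "html",
    table.attr = "class=\"table table-striped\""
  )
  print(html_table)
}
```

Inside an R Markdown document the format argument can usually be omitted, since kable picks a format suited to the output being generated.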

Books

bookdown: Authoring Books and Technical Documents with R Markdown
Dynamic Documents with R and knitr, Second Edition
Reproducible Research with R and RStudio, Second Edition

References

1. Yihui Xie, Dynamic Documents with R and knitr, 2nd edition, Chapman and Hall/CRC, 2015.
2. Yihui Xie, bookdown: Authoring Books and Technical Documents with R Markdown, Chapman and Hall/CRC, 2016. Also available at https://bookdown.org/yihui/bookdown/
3. Christopher Gandrud, Reproducible Research with R and RStudio, Chapman and Hall/CRC, 2015.
4. Donald Knuth, Literate Programming, http://www.literateprogramming.com/knuthweb.pdf, the paper on LP (LP means Literate Programming in this context, not Linear Programming).
5. Donald Knuth, Literate Programming, CSLI Publications, 1992 (the book).
6. https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf, a short document summarizing R Markdown.
