Sunday, April 10, 2016

CSV reading in R: a value goes missing

R has the well-known read.csv function. Reading a data set from USDA/FAS with it produced a bit of a surprise:

[Image: R session output]

In the data frame df we expect the number of distinct country codes to equal the number of distinct country names. It doesn't. It turned out that one of the country codes is NA (Netherlands Antilles), and by default read.csv interprets the string NA as a missing value (also written NA). The solution is to set the optional argument na.strings = NULL. What would have been the simplest approach to quickly find the problem?
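One possible way to spot this kind of problem quickly is to count the missing values per column right after reading and then print the offending rows. The sketch below is only an illustration: the three-row CSV and the column names Country_Code and Country_Name are made up and are not the actual USDA/FAS file. It also uses na.strings = "" (treat only empty fields as missing), which is a variation on the setting mentioned above.

```r
# Hypothetical extract; the real USDA/FAS file has different columns and many more rows.
csv_text <- "Country_Code,Country_Name
NL,Netherlands
NA,Netherlands Antilles
NZ,New Zealand"

# Default settings: the string "NA" in the file is read as a missing value.
df_default <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Quick check: how many missing values does each column contain?
colSums(is.na(df_default))

# Show the rows where the code went missing; the country name reveals the cause.
df_default[is.na(df_default$Country_Code), ]

# Assumption: only empty fields should count as missing, so "NA" stays a string.
df_fixed <- read.csv(text = csv_text, stringsAsFactors = FALSE, na.strings = "")

# With the code preserved, the distinct counts of codes and names agree again.
length(unique(df_fixed$Country_Code)) == length(unique(df_fixed$Country_Name))
```

A check like colSums(is.na(df)) is cheap enough to run after every import, and an unexpected missing value in a key column such as a country code is a strong hint to look at the na.strings setting.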