Monday, October 5, 2020

Excel has a 1 million row limit: missing covid tests data

From [1]:

A million-row limit on Microsoft’s Excel spreadsheet software may have led to Public Health England misplacing nearly 16,000 Covid test results, it is understood.

The data error, which led to 15,841 positive tests being left off the official daily figures, means than 50,000 potentially infectious people may have been missed by contact tracers and not told to self-isolate. 


Many spreadsheets contain errors [2] and it is not always easy to spot them. The unstructured nature of Excel is both an advantage (flexible) but also a danger.

My usual advice is:

  1. Don't use Excel as a database
  2. If you need to store large data in Excel, use the data model (a.k.a. Power Query) and not the sheets. The data model is more like a real database (e.g. columns have a type, large tables ok).
  3. Use more structured environments (like modeling languages) for analysis where possible and use Excel only for data entry and reporting


References


  1. How Excel may have caused loss of 16,000 Covid tests in England, https://www.theguardian.com/politics/2020/oct/05/how-excel-may-have-caused-loss-of-16000-covid-tests-in-england, October 5, 2020.
  2. Excel errors: How Microsoft's spreadsheet may be hazardous to your health, https://www.zdnet.com/article/excel-errors-microsofts-spreadsheet-may-be-hazardous-to-your-health/, July 2017.

No comments:

Post a Comment