Reading a large CSV file into Excel's data model is surprisingly slow compared to R: on the same CSV file (1.8 million records), Excel/Power Query takes 100 seconds while R needs only 25 seconds.
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhVMhbo-sME0k2zhdtaF0iHQeoprA7y0uZSqQaFlK2e7llrdD3q1Cgm_Zt32sFRNEx-Ae9rwsTufCGlMCFtPyJjgdF0J3Cjnq6wSmRL0k2EbFIqjtCFF9FSSi3EOA5Qk63KUPixuPAqwPhg/s1600/csv1.PNG)

*Reading CSV into Excel/Power Query data model*
I used some VBA code for the timing. The only step is loading the data into the data model (i.e. no Power Pivot table is created).
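A minimal sketch of how such a timing could be done in VBA, using the built-in `Timer` function. The connection name `Query - csv` is an assumption; it would be whatever name the Power Query connection has in the actual workbook.

```vba
' Sketch: time a refresh of a Power Query connection that loads to the data model.
' "Query - csv" is a hypothetical connection name.
Sub TimeDataModelLoad()
    Dim t As Double
    t = Timer
    ThisWorkbook.Connections("Query - csv").Refresh
    Debug.Print "Elapsed: " & Format(Timer - t, "0.0") & " seconds"
End Sub
```

Note that `Refresh` must run synchronously for the measurement to be meaningful; background refresh can be disabled on the connection's OLE DB properties.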
R does this quite a bit faster:
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiX2R9QoDgP-D9fqUd_pUIp3Pav5GgUnpUliWxAI1ohJBlQT__ff07wg-COy-YT7Po2-H9mNInJC00-NKj9C3eaTxIa_UEnZ6LpgjkN5KNpDzZ1YZDYangIcD7vsgit0qfFJi5KDJBIA0jy/s1600/csv2.PNG)

*Reading the same CSV into R*
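The R-side timing can be reproduced with base R's `system.time`; the file name here is an assumption, and faster readers exist as well.

```r
# Sketch: time reading a large CSV in base R.
# "large.csv" is a placeholder for the actual 1.8-million-record file.
timing <- system.time(df <- read.csv("large.csv"))
print(timing)   # elapsed time in seconds
nrow(df)        # should be about 1.8 million rows

# Still faster alternatives (not used in the comparison above):
# data.table::fread("large.csv") or readr::read_csv("large.csv")
```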
I expected these numbers to be closer. This operation should be I/O bound rather than CPU bound, and streaming a file from disk is something Microsoft should know how to do very fast.