In Pivot a table: GAMS, SQLite, R and Python we described a pivoting operation on a small example. In the comments it was mentioned that for larger tables or dataframes, especially tydir is very fast. Here we try to confirm this using an artificial example:
Below we time each step (see here for more information).
GAMS Step | Code | Time (seconds) |
GAMS populate a 2D parameter with 1 million entries | set i /i1*i100000/ j /j1*j10/ ; parameter p(i,j); p(i,j) = uniform(0,1); | 0.4 |
Write a GDX file | execute_unload 'x.gdx'; | 0.3 |
Store in SQLite database | execute 'gdx2sqlite -i x.gdx -o x.db -fast'; | 5 |
R Step | Code | Time (seconds) |
Read database table | df1<-dbGetQuery(db,"select * from p") |
2 |
Pivot using SQL (no indices) | df2<- dbGetQuery(db,"select tj1.i as i, tj1.value as j1, tj2.value as j2, |
50 |
Pivot using SQL after creating indices on columns i and j | id (same query as above) |
15 |
Pivot using SQL (no indices, alternative query) | > df1 <- dbGetQuery(db," + select i, + sum(case when j='j1' then value else null end) as j1, + sum(case when j='j2' then value else null end) as j2, + sum(case when j='j3' then value else null end) as j3, + sum(case when j='j4' then value else null end) as j4, + sum(case when j='j5' then value else null end) as j5, + sum(case when j='j6' then value else null end) as j6, + sum(case when j='j7' then value else null end) as j7, + sum(case when j='j8' then value else null end) as j8, + sum(case when j='j9' then value else null end) as j9, + sum(case when j='j10' then value else null end) as j10 + from p + group by i") |
5 |
Pivot using reshape2/dcast | df2<-dcast(df1,i~j,value.var="value") |
2 |
Pivot using tidyr/spread | df2<-spread(df1,j,value) |
2 |
Python Step | Code | Time (seconds) |
pandas/pivot | df2=df1.pivot(index='i',columns='j',values='value') | 2 |
Conclusion
The methods that work on in-memory datastructures are faster than SQL but depending on the SQL formulation we can come close. The big advantage of the pivot methods in R and Python is that they require much less typing (and less thought).
No comments:
Post a Comment