Saturday, February 18, 2017

A script to generate power pivot tables

In (1) I provided a GAMS/VBS script to generate an Excel spreadsheet containing a pivot table that used a CSV file as external data source. Essentially the pivot table provides a view of the data in the CSV file. Here I show a script that allows to store the data inside the Excel data model: no external CSV file is used after loading the data into the data model.

$ontext

  
Example code that creates a pivot table with large data loaded

  
into Excel's data model.

$offtext

sets c,cty,yr,attr;
parameter psd(c,cty,yr,attr) 'production, supply and distribution'
;

$gdxin psd_alldata.gdx
$loaddc c=dim1 cty=dim2 yr=dim3 attr=dim4 psd

parameter
TotalProduction(c,yr);
TotalProduction(c,yr) =
sum(cty,psd(c,cty,yr,'Production'
));


alias
(Commodities,Countries,Year,Attributes,*);
parameter
Cube(Commodities,Countries,Year,Attributes);

Cube(c,cty,yr,attr) = psd(c,cty,yr,attr);
Cube(c,
'World',yr,'Production'
) = TotalProduction(c,yr);

$set gdxfile         cube.gdx
$set csvfile         cube.csv

execute_unload "%gdxfile%"
,Cube;
execute "gdxdump %gdxfile% output=%csvfile% format=csv symb=Cube"
;


$set xlsfile         %system.fp%pivotcube.xlsx
$set scriptname      script.vbs

execute "cscript %scriptname% //Nologo"
;


$onecho > %scriptname%
Wscript.Echo "Creating pivot table..."

Wscript.Echo "This will take a minute."
Wscript.Echo ""
Set xl = CreateObject("Excel.Application")
xl.DisplayAlerts=False
Wscript.Echo "  Creating Workbook..."
Set wb = xl.Workbooks.Add

Wscript.Echo "  Creating Query..."
wb.Queries.Add "cube", _
 
"let Source = Csv.Document(File.Contents(""%system.fp%%csvfile%""),[Delimiter="",""])," & _
 
"  #""Headers"" = Table.PromoteHeaders(Source)," & _
 
"  #""Type"" = Table.TransformColumnTypes(#""Headers"",{{""Val"", Number.Type}}) in #""Type"""

Wscript.Echo "  Creating Connection (loading data)..."
Set con = wb.Connections.Add2("Query - cube", _
       
"Connection to the 'cube' query in the workbook.", _
       
"OLEDB;Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location=cube" _
       
, """cube""", 6, True, False)

Wscript.Echo "  Creating PivotCache..."
Set pc = wb.PivotCaches.Create(2,con,6)

If wb.Sheets.count = 0 Then
  
Set sh = wb.Sheets.Add()
Else
  
Set sh = wb.Sheets(1)
End If

Wscript.Echo "  Creating PivotTable..."
Set pt = pc.CreatePivotTable(sh.Range("A1"))
pt.SmallGrid = False

Wscript.Echo "  Initial Layout..."
pt.CubeFields("[cube].[Commodities]").Orientation=3
pt.CubeFields("[cube].[Attributes]").Orientation=3
pt.CubeFields("[cube].[Year]").Orientation=2
pt.CubeFields("[cube].[Countries]").Orientation=1

xlSum = -4157
call pt.CubeFields.GetMeasure("[cube].[Val]",xlSum,"Sum of Val")
call pt.AddDataField(pt.CubeFields("[Measures].[Sum of Val]"), "Sum of Val")

pt.PivotFields("[cube].[Commodities].[Commodities]").ClearAllFilters
pt.PivotFields("[cube].[Commodities].[Commodities]").CurrentPageName = _
  
"[cube].[Commodities].&[Coffee, Green]"
pt.PivotFields("[cube].[Attributes].[Attributes]").ClearAllFilters
pt.PivotFields("[cube].[Attributes].[Attributes]").CurrentPageName = _
   
"[cube].[Attributes].&[Production]"

pt.TableStyle2 = "PivotStyleDark7"

Wscript.Echo "  Saving %xlsfile%..."
wb.SaveAs("%xlsfile%")
wb.Close
xl.Quit

$offecho

The steps are as follows:

  1. Load data into GAMS.
  2. Create a 4 dimensional cube parameter suited for display as a pivot table. The dimensions are commodity, country, year, attribute.
  3. Export the cube as GDX file.
  4. Convert the GDX file to a CSV file.
  5. Run a VBscript script that talks to Excel and performs the following steps:
    1. Load the data into the data model.
    2. Create a pivot table for this data.
    3. Create an initial layout that makes sense for a user of the data (opposed to an empty pivot table).

The result will look like:

image

We see three regions. On the left we see the actual pivot table:

image

The most right panel allows to change the layout of the pivot table:

image

Note that our 4 dimensional cube parameter has become a 5 column table: the values are a separate column.

The smaller panel in the middle shows the queries in data model:

image

Data model

The step to load the CSV data into the data model of Excel is actually written in what is called the M language:

  1. Source = Csv.Document(File.Contents("cube.csv"),[Delimiter=","])
  2. #"Headers" = Table.PromoteHeaders(Source)
  3. #"Type" = Table.TransformColumnTypes(#"Headers",{{"Val", Number.Type}}) in #"Type"

Interestingly we can follow these steps after the fact in the Excel Query Editor. After the first step Source we see the raw CSV data has been imported:

image

The second step promotes the headers:

image

and the last step changes the type of the value column to numeric:

image

File Sizes

The method in (1) requires both an .xlsx file and a .csv file to be present. For this example, we end up with:

02/18/2017  12:49 PM         6,397,317 pivotcube.xlsx
02/18/2017  12:48 PM        62,767,308 cube.csv

The method shown here produces a self-contained .xslx file. We have:

02/18/2017  12:52 PM         2,958,573 pivotcube.xlsx

I am not sure why this version is even smaller than the pivot table with external CSV data, but apparently the internal data model can be squeezed very efficiently. Some of it may have to do with the internal Vertipaq Compression Engine which allows for very efficient column storage of the data in the data model (3). 

References
  1. http://yetanothermathprogrammingconsultant.blogspot.com/2016/04/export-large-result-sets-to-pivot-table.html contains a script to generate a pivot table with a CSV file as an external data source.
  2. http://yetanothermathprogrammingconsultant.blogspot.com/2016/07/excel-pivot-table-and-large-data-sets.html shows how to load CSV data into the data model and create a pivot table by hand.
  3. Alberto Ferrari and Marco Russo, The VertiPaq Engine in DAX, https://www.microsoftpressstore.com/articles/article.aspx?p=2449192&seqNum=3 (chapter from The Definitive Guide to DAX, Business intelligence with Microsoft Excel, SQL Server Analysis Server, and Power BI, Microsoft Press, 2015).

No comments:

Post a Comment