- A directory with CSV files. One file for each symbol.
- A directory with feather files. These are like binary CSV files [2]. Again one file per symbol.
- A single Pickle file.
- GDX file
- name of output directory
- output format
- Data files.
- Text file with a description of the content of the GDX file.
- Python script containing example code on how to read the data files. The data is read into Pandas dataframes.
Example 1: sparse matrix
Sets i /i1*i1000/ j /j1*j1000/ ;
Parameter A(i,j) 'sparse matrix'; A(i,j)$(uniform(0,1)<0.05) = uniform(1,1000);
execute_unload "SparseMatrix.gdx",i,j,A; |
C:\Users\erwin\Downloads\sparsematrixCSV>type gdxcontent.txt Symbol Dim Type Records Explanatory text 1 A 2 Par 50166 sparse matrix 2 i 1 Set 1000 3 j 1 Set 1000 |
C:\Users\erwin\Downloads\sparsematrixCSV>type howtoread.py # example Python reader for data from sparsematrix.gdx import pandas as pd
i = pd.read_csv('i.csv') j = pd.read_csv('j.csv') A = pd.read_csv('A.csv') |
- The column name 'uni_0' probably indicates this is a subset of the universe of set elements in the GAMS model. The zero is probably added because there may be multiple indices. In this case, there are no multiple columns, so the 0 suffix is not needed. In GAMS terms, this would be a '*' column.
- The column 'element_text' is likely reserved for text associated with a set element. We don't have this kind of explanatory text in our model, so this column can be ignored. It probably should not even be there.
- The column names 'i_0' and 'j_0' are mangled. They could just be 'i' and 'j'. This is probably to guard against duplicate column names. GAMS allows something like A(i,i) so the concern is understandable. However, it would be better to mangle only if needed.
C:\Users\erwin\Downloads\sparsematrixPickle>type howtoread.py # example Python reader for data from sparsematrix.gdx import pandas as pd import pickle
print('Pickle file was written with Pandas version 1.4.4') print(f'Your version is {pd. __version__}. This should be the same or newer.')
with open('alldata.pickle', 'rb') as f: alldata = pickle.load(f)
print(f'Keys:{alldata.keys()}')
print(f'Summary of data:\n{alldata}') |
C:\Users\erwin\Downloads\sparsematrixPickle>py howtoread.py Pickle file was written with Pandas version 1.4.4 Your version is 1.5.2. This should be the same or newer. Keys:dict_keys(['i', 'j', 'A']) Summary of data: {'i': uni_0 element_text 0 i1 1 i2 2 i3 3 i4 4 i5 .. ... ... 995 i996 996 i997 997 i998 998 i999 999 i1000
[1000 rows x 2 columns], 'j': uni_0 element_text 0 j1 1 j2 2 j3 3 j4 4 j5 .. ... ... 995 j996 996 j997 997 j998 998 j999 999 j1000
[1000 rows x 2 columns], 'A': i_0 j_1 value 0 i1 j44 339.211722 1 i1 j58 792.568281 2 i1 j64 585.546042 3 i1 j79 270.343439 4 i1 j98 401.826455 ... ... ... ... 50161 i1000 j955 866.032624 50162 i1000 j963 633.530151 50163 i1000 j974 300.124396 50164 i1000 j985 987.395802 50165 i1000 j991 341.163135
[50166 rows x 3 columns]} |
C:\Users\erwin\Downloads\sparsematrixFeather>type howtoread.py # example Python reader for data from sparsematrix.gdx import pandas as pd
i = pd.read_feather('i.feather') j = pd.read_feather('j.feather') A = pd.read_feather('A.feather') |
>cd \gams\41\GMSPython (or whatever your GAMS
installation directory is) |
- They are in long format (database format). This is how things are typically stored in a database. In a spreadsheet or wide format, we would pivot one of the index columns to the right. Pivot and unpivot operations are some of the most important ones when using dataframes.
- They use a sparse representation: the zero values are not stored.
- The index columns form a unique key. There are no duplicates.
- The index columns are strings, and the value column is a double-precision floating point value. When using CSV (text) files, these may appear as integers.
Example 2: sparse network
set i 'nodes' /node1*node100/; alias(i,j); set a(i,j) 'arcs';
a(i,j) = uniform(0,1)<0.1;
* * only display if small * display$(card(i)<50) i,a;
* * export network as GDX file * execute_unload "network.gdx",i,a; |
* * size of network * parameter size(*) 'number of nodes and arcs'; size('nodes') = card(i); size('arcs') = card(a); option size:0; display size;
* * print some statistics on the degrees of the nodes * parameter maxdegree(*) 'max in/out-degree'; maxdegree('in') = smax(i,sum(a(j,i),1)); maxdegree('out') = smax(i,sum(a(i,j),1)); maxdegree('in/out') = smax(i,sum(a(j,i),1)+sum(a(i,j),1)); option maxdegree:0; display maxdegree; |
---- 30 PARAMETER size number of nodes and arcs nodes 100, arcs 1002 ---- 39 PARAMETER maxdegree max in/out-degree in 20, out 17, in/out 36
Let's do the same thing in Python, using the networkx package.
Conclusion
The GAMS script shown in the appendix is a tool to convert a GAMS GDX data file into a representation that can be consumed in standalone Python. This little tool has already been shown to be useful for GAMS modelers to produce data sets for further processing in a Python environment.
References
- Python Package Index, https://pypi.org/search/?q=GDX
- Feather File Format, https://arrow.apache.org/docs/python/feather.html
Appendix: gdx2python.gms
$onText
Export GDX file to Python data files Supported export formats: - multiple CSV files (one per symbol) - multiple Feather files (binary, one per symbol) - Pickle file (single file) The Feather files are the fastest. However, you need to install support for this (package pyarrow). $offText
*----------------------------------------------------------------------- * User-provided settings *-----------------------------------------------------------------------
$set gdxfile network.gdx $set outputdir networkdata
* Output format * can be: CSV,Feather,Pickle $set format CSV *$set format Feather *$set format Pickle
*----------------------------------------------------------------------- * Body *-----------------------------------------------------------------------
* quick checks $if not exist %gdxfile% $abort %gdxfile% does not exist $call mkdir %outputdir% $if errorlevel 1 $abort could not create %outputdir% * you can comment out the previous like if you want to go ahead * in the case the directory already exist.
* for reference dump the table of contents of the GDX file to the output directory $call gdxdump %gdxfile% symbols > %outputdir%\gdxcontent.txt $call type %outputdir%\gdxcontent.txt
* Here we loop over all symbols in the GDX file and export them.
$onEmbeddedCode Python:
import gamstransfer as gt import pandas as pd import pickle import sys import time
gdxfile = r'%gdxfile%' outputdir = r'%outputdir%' format = '%format%'
start = time.time() print(f'Reading {gdxfile}') gdx = gt.Container(gdxfile) print(f'Read time:{time.time()-start:.1f} seconds')
start = time.time()
howtoread = f'''# example Python reader for data from {gdxfile} import pandas as pd '''
if format=='Pickle': alldata = {} pandasversion = pd. __version__ howtoread += 'import pickle'
# run through all symbols in gdx file for sym in gdx.data.keys(): df = gdx.data[sym].records if df is None: continue #print(df.head()) if format=='CSV': fln = f'{outputdir}\\{sym}.csv' print(f' {sym} --> {fln}') df.to_csv(fln,index=False) howtoread += f"\n{sym} = pd.read_csv('{sym}.csv')" if format=='Feather': fln = f'%outputdir%\\{sym}.feather' print(f' {sym} --> {fln}') df.to_feather(fln) howtoread += f"\n{sym} = pd.read_feather('{sym}.feather')" if format=='Pickle': print(f' Reading {sym}') alldata[sym] = df if format=='Pickle': fln = f'{outputdir}\\alldata.pickle' print(f' {gdxfile} --> {fln}') with open(fln, 'wb') as handle: pickle.dump(alldata, handle, protocol=pickle.HIGHEST_PROTOCOL) howtoread += f"\n\nprint('Pickle file was written with Pandas version {pandasversion}')" howtoread += "\nprint(f'Your version is {pd. __version__}. This should be the same or newer.')" howtoread += "\n\nwith open('alldata.pickle', 'rb') as f:" howtoread += "\n alldata = pickle.load(f)" howtoread += "\n\nprint(f'Keys:{alldata.keys()}')" howtoread += "\n\nprint(f'Summary of data:\\n{alldata}')"
fln = f'{outputdir}\\howtoread.py' print(f' Writing {fln}') with open(fln,'w') as f: f.write(howtoread)
print(f'Write time:{time.time()-start:.1f} seconds') $offEmbeddedCode
|
Thank you for your work, that is really helpful. But there is an update about the library import. According to GAMS latest Python API, the command 'import gamstransfer as gt' has been updated. One of the way is 'from gams import transfer as gt'.
ReplyDelete