In [1] I showed how quantiles in GAMS are difficult, and how they can be calculated better in R. Here is some Python code to do the same:
import pandas as pd
df = pd.read_csv("p.csv")
q=df.groupby(['i','j']).quantile([0,.25,.5,.75,1])
print(q)
q.to_csv("q.csv") |
The q data frame looks like:
Val i j i1 j1 0.00 18.002966 0.25 28.242058 0.50 33.222936 0.75 62.736221 1.00 85.770764 j2 0.00 7.644259 0.25 41.375281 0.50 61.313381 0.75 82.127640 1.00 99.813645 j3 0.00 14.017667 0.25 16.559221 0.50 30.775334 0.75 38.482815 1.00 67.223932 i2 j1 0.00 11.938737 0.25 29.259331 0.50 55.029177 0.75 69.633259 1.00 83.258388 j2 0.00 16.857103 0.25 28.783298 0.50 53.358812 0.75 65.534761 1.00 87.373769 j3 0.00 5.608600 0.25 17.433855 0.50 33.311746 0.75 45.566497 1.00 64.926986 |
This is pretty clean. Data frames can easily be read in from CSV files (see the example above) or databases.
The new GAMS Python scripting tool is not very friendly in this respect. We need to do a lot of transformations leading to a low signal-to-noise ratio:
embeddedCode Python: import pandas as pd p = list(gams.get('p', keyFormat=KeyFormat.FLAT)) df = pd.DataFrame(p, columns=["i", "j", "k", "value"]) print("");print("df");print(df) q=df.groupby(['i','j']).quantile([0,.25,.5,.75,1]) print("q");print(q)
# q has one multi-level index
# convert to standard indices q2 = q.reset_index() print("q2");print(q2)
# to extract a set q: convert to list of strings qset = q2["level_2"].unique().astype('str').tolist() print("qset");print(qset) gams.set("q", qset)
# get data itself: list of tuples quantiles = list(zip(q2["i"],q2["j"],q2["level_2"].astype('str'),q2["value"])) print("quantiles");print(quantiles) gams.set("quantiles",quantiles)
endEmbeddedCode q,quantiles
display q,quantiles; |
Note that I am trying to prevent looping over data frame rows here: all operations are on complete columns. The CSV input/output is actually much shorter and cleaner. In this code, there is really one line that does really some work; the rest can be considered as just overhead.
The GAMS interface should probably support data frames directly to overcome this “impedance mismatch.” When users need to mix and match different languages the interface should make things as easy as possible. A GAMS/Python interface that is too low level and stays too close to GAMS asks the user to be reasonably fluent in in both GAMS and Python (the intersection of knowledgeable GAMS and Python users is likely to be small), and write quite some glue-code dealing with getting data in and out. Choosing a better abstraction level would probably help here.
References
- http://yetanothermathprogrammingconsultant.blogspot.nl/2017/08/quantiles.html
No comments:
Post a Comment