Wassily Leontief in front of an IO table, credit:NYU |
Inverting a dense matrix
GAMS allows running pieces of Python code as part of a GAMS model. Unfortunately, the interface is rather low-level and inefficient. Usually "low-level" is associated with high-performance, but in the Python world, this is not the case. Here is an example.
Suppose we want to use the Python interface to invert a matrix \(A\).
The main GAMS file can look like:
$ontext |
* invert.inc gams.printLog("\n###
invert.inc") |
### invert.inc --- .. Copied data from i,A (19.9s) --- .. Inverted A (0.7s) --- .. Created GAMS sparse matrix B (25.6s) --- .. Total elapsed 46.2s
The Leontief inverse
001 'Crop cultivation' 002 'Livestock and sericulture' 003 'Agricultural services' 004 'Forestry' 005 'Fisheries' 006 'Metallic ores' 007 'Non-metallic ores' 008 'Coal' 009 'Crude petroleum and natural gas' 010 'Foods' 011 'Drinks' 012 'Feeds and organic fertilizer, n.e.c.' 013 'Tabacco' 014 'Textile products' 015 'Wearing apparel and other textile products' 016 'Timber and wooden products' 017 'Furniture and fixtures' 018 'Pulp, paper, paperboard and processed paper' 019 'Paper products' 020 'Publishing and printing' 021 'Chemical fertilizer' 022 'Inorganic basic chemical products' 023 'Petrochemical basic products and intermediate chemical products' 024 'Synthetic resins' 025 'Synthetic fibers' 026 'Medicaments' 027 'Final chemical products, n.e.c.' 028 'Petroleum refinery products' 029 'Coal products' 030 'Plastic products' 031 'Rubber products' 032 'Leather, fur skins and miscellaneous leather products' 033 'Glass and glass products' 034 'Cement and cement products' 035 'Pottery, china and earthenware' 036 'Other ceramic, stone and clay products' 037 'Pig iron and crude steel' 038 'Steel products' 039 'Steel castings and forgings and other steel products' 040 'Non-ferrous metals' 041 'Non-ferrous metal products' 042 'Metal products for construction and architecture' 043 'Other metal products' 044 'General industrial machinery' 045 'Special industrial machinery' 046 'Other general machines' 047 'Machinery for office and service industry' 048 'Household electric appliance' 049 'Electronic equipment and communication equipment' 050 'Heavy electrical equipment' 051 'Other electrical machinery' 052 'Motor vehicles' 053 'Ships and repair of ships' 054 'Other transportation equipment and repair of transportation equipment' 055 'Precision instruments' 056 'Miscellaneous manufacturing products' 057 'Construction' 058 'Repair of constructions' 059 'Civil engineering' 060 'Electric power for enterprise use' 061 'Gas and heat supply' 062 'Water supply' 063 'Waste disposal services' 064 'Commerce' 065 'Finance and insurance' 066 'Real estate agencies and rental services' 067 'House rent' 068 'Railway transport' 069 'Road transport(except transport by private cars)' 070 'Transport by private cars' 071 'Water transport' 072 'Air transport' 073 'Freight forwarding' 074 'Storage facility services' 075 'Services relating to transport' 076 'Communication' 077 'Broadcasting' 078 'Public administration' 079 'Education' 080 'Research' 081 'Medical service and health' 082 'Social security' 083 'Other public services' 084 'Advertising, survey and information services' 085 'Goods rental and leasing services' 086 'Repair of motor vehicles and machine' 087 'Other business services' 088 'Amusement and recreational services' 089 'Eating and drinking places' 090 'Hotel and other lodging places' 091 'Other personal services' 092 'Office supplies' 093 'Activities not elsewhere classified'
That is a bit smaller than our previous random matrix.
The main matrix elements of this transaction matrix can be defined as \[x_{i,j} = \text{output of industry $i$ sold to industry $j$}\] I.e. the rows are the sellers and the columns are the buyers [3].
$ontext |
Leontief Inverse for Regional IO Tables
- They tend to be big: the number of rows and columns of \(A\) is the number of regions times the number of sectors.
- The indices for the rows and columns are 2-dimensional: regions and sectors. We either need to map this to a one-dimensional set or we need the matrix-inversion code to accept multi-dimensional sets for the rows and columns.
REGIONS ======= AUS 'Australia' AUT 'Austria' BEL 'Belgium' BGR 'Bulgaria' BRA 'Brazil' CAN 'Canada' CHE 'Switserland' CHN 'China' CYP 'Cyprus' CZE 'Czech Republic' DEU 'Germany' DNK 'Denmark' ESP 'Spain' EST 'Estonia' FIN 'Finland' FRA 'France' GBR 'UK' GRC 'Greece' HRV 'Croatia' HUN 'Hungary' IDN 'Indonesia' IND 'India' IRL 'Ireland' ITA 'Italy' JPN 'Japan' KOR 'South Korea' LTU 'Lithuania' LUX 'Luxembourg' LVA 'Latvia' MEX 'Mexico' MLT 'Malta' NLD 'The Netherlands' NOR 'Norway' POL 'Poland' PRT 'Portugal' ROU 'Romania' RUS 'Russia' SVK 'Slovak Republic' SVN 'Slovenia' SWE 'Sweden' TUR 'Turkey' TWN 'Taiwan' USA 'USA' ROW 'Rest of the world' SECTORS ======= A01 'Crop and animal production, hunting and related service activities' A02 'Forestry and logging' A03 'Fishing and aquaculture' B 'Mining and quarrying' C10-C12 'Manufacture of food products, beverages and tobacco products' C13-C15 'Manufacture of textiles, wearing apparel and leather products' C16 'Manufacture of wood and of products of wood and cork, except furniture; manufacture of articles of straw and plaiting materials' C17 'Manufacture of paper and paper products' C18 'Printing and reproduction of recorded media' C19 'Manufacture of coke and refined petroleum products' C20 'Manufacture of chemicals and chemical products' C21 'Manufacture of basic pharmaceutical products and pharmaceutical preparations' C22 'Manufacture of rubber and plastic products' C23 'Manufacture of other non-metallic mineral products' C24 'Manufacture of basic metals' C25 'Manufacture of fabricated metal products, except machinery and equipment' C26 'Manufacture of computer, electronic and optical products' C27 'Manufacture of electrical equipment' C28 'Manufacture of machinery and equipment n.e.c.' C29 'Manufacture of motor vehicles, trailers and semi-trailers' C30 'Manufacture of other transport equipment' C31_C32 'Manufacture of furniture; other manufacturing' C33 'Repair and installation of machinery and equipment' D35 'Electricity, gas, steam and air conditioning supply' E36 'Water collection, treatment and supply' E37-E39 'Sewerage; waste collection, treatment and disposal activities; materials recovery; remediation activities and other waste management services' F 'Construction' G45 'Wholesale and retail trade and repair of motor vehicles and motorcycles' G46 'Wholesale trade, except of motor vehicles and motorcycles' G47 'Retail trade, except of motor vehicles and motorcycles' H49 'Land transport and transport via pipelines' H50 'Water transport' H51 'Air transport' H52 'Warehousing and support activities for transportation' H53 'Postal and courier activities' I 'Accommodation and food service activities' J58 'Publishing activities' J59_J60 'Motion picture, video and television programme production, sound recording and music publishing activities; programming and broadcasting activities' J61 'Telecommunications' J62_J63 'Computer programming, consultancy and related activities; information service activities' K64 'Financial service activities, except insurance and pension funding' K65 'Insurance, reinsurance and pension funding, except compulsory social security' K66 'Activities auxiliary to financial services and insurance activities' L68 'Real estate activities' M69_M70 'Legal and accounting activities; activities of head offices; management consultancy activities' M71 'Architectural and engineering activities; technical testing and analysis' M72 'Scientific research and development' M73 'Advertising and market research' M74_M75 'Other professional, scientific and technical activities; veterinary activities' N 'Administrative and support service activities' O84 'Public administration and defence; compulsory social security' P85 'Education' Q 'Human health and social work activities' R_S 'Other service activities' T 'Activities of households as employers; undifferentiated goods- and services-producing activities of households for own use' U 'Activities of extraterritorial organizations and bodies'
$ontext |
$ontext gams.printLog("\n###
leontief.inc") |
Conclusions
- The embedded Python API is not well suited for algorithms that require a substantial amount of data to be exchanged with GAMS. It uses a low-level approach, something which is not compatible with Python. Python is just not designed to do large-scale low-level stuff. This is a design problem in the API: the wrong abstraction level was chosen. Data should be exchanged in higher-level data structures such as numpy arrays or Pandas dataframes using high-performance kernels. This will lead to performance gains of at least a factor of 10.
- As a result, an implementation using the Python embedded API suffers from large overhead and requires patience from the user.
Notes
- I have had some good experiences with Cython and Numba to speed up Python code. But I don't think those tools are applicable here.
- GAMS 36.0 comes with a Python-based matrix inversion. It is similar to my version and can be invoked as:
$libInclude linalg -e invert i A B
Unfortunately, it seems to be about 50% slower than my version:
46.125 0.068GB 33 EmbeddedCode (my version)
73.469 0.068GB 107 EmbeddedCode (GAMS linalg version)
(The first column is the time in seconds). I have not analyzed this carefully but I think I do better because I don't have if statements in a tight loop. Note that the actual matrix inversion takes less than a second. - When you want to use this for practical input-output models, you probably want to save the inverse matrix in a GDX file or a restart file so it is readily available.
- We could use an LP solver to invert a matrix [5]. That is more of a curiosity.
- There are good arguments to never invert a matrix [6]. However, for reasons of convenience and familiarity, it is easier to use an explicit inverse here. This is just how users work and how they are educated.
References
- Input-output model, https://en.wikipedia.org/wiki/Input%E2%80%93output_model
- Input-Output Tables for Japan, https://www.soumu.go.jp/english/dgpp_ss/data/io/index.htm
- Ronald E. Miller and Peter D. Blair, Input-Output Analysis, Foundations and Extensions, Cambridge University Press, 2nd edition, 2009.
- World Input-Output Database, http://www.wiod.org/database/wiots16
- Inverting a matrix by LP, http://yetanothermathprogrammingconsultant.blogspot.com/2021/04/inverting-matrix-by-lp.html
- John D. Cook, Don't invert that matrix, https://www.johndcook.com/blog/2010/01/19/dont-invert-that-matrix/
Erwin, Thanks for putting this together. As usual, your post inspires us (GAMS) to improve our software. I have a few comments. I am only refering to the first part of the post (inverting a matrix).
ReplyDeleteAs a reference point I ran your code with GAMS 36.2.0 on my off-the-shelf machine with Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz and OS Windows 10 Pro 21H1:
--- .. Copied data from i,A (3.0s)
--- .. Inverted A (0.1s)
--- .. Created GAMS sparse matrix B (3.1s)
--- .. Total elapsed 6.2s
While the times are about 7 times faster than what you report, the ratio is unchanged and not good. The gams.get() method is only one way to get data from GAMS to embedded Python code. One can also use the Python OO API which works nicely together with gams2numpy which has methods to move data in bulk between GAMS and Python. Basically you do Gams2Numpy().gmdReadSymbolRaw(gams.db,"A") this gives you a numpy array with the data from A in one swoop (which can then turned cheaply into a pandas dataframe). There are also methods to get the index as strings and obviously also ways to write back a numpy array to GAMS. This has not been well documented but with the upcoming GAMS 37 there will be more about gams2numpy and a new API called Transfer that unifies bulk data exchange across some systems. With GAMS 37 we hope to include Transfer for Python (based on gams2numpy) and Matlab/Octave. R, Julia, and others will follow. This still leaves us with the issue of remapping the GAMS indexes of i (that can have holes, does not have to start at 1, etc. I actually added a dummy set to make the GAMS indexes "more interesting") to the Python indexes of 0..card(i)-1. In your example (and also in our linalg libinclude) a map between these two is established and applied in both directions (for reading A and writing B) using some Python loop. My colleague Adam Christensen (who works on the Python port of Transfer) uses pandas category type for this mapping, no (user) loops required! There is an assumption (we check for it with an assert) that the matrix does not contain an all zero row or column, which is okay for an invert routine. So with the bulk data transfer and the categories the times come down even further:
--- .. Copied data from i,A (0.5s)
--- .. Inverted A (0.1s)
--- .. Created GAMS sparse matrix B (1.3s)
--- .. Total elapsed 1.9s
So we can reduce the total time by a factor of 3. Still a pretty big ratio. Interesting enough with a dedicated tool like your invert.exe and GDX file communication this is still significantly faster (total time ~0.5 secs!) than all the in-memory communication of data and embedded Python code. Nevertheless, embedded Python code in GAMS gives a lot of flexibility (and access to the Python universe) and less of a dependence on GAMS to provide tools to do a specific task. The category stuff is pretty interesting and perhaps we can do part of this remapping already in gams2numpy (which uses the numpy C api and executes fast). The entire invert example redone with gams2numpy and pandas catagories is accessible here: https://cloud.gams.com/s/FYZPsXAcPfn3pfa
Hi Michael. That looks very promising. Your link seems to be just a reformatted version of my code (no gams2numpy/pandas).
DeleteSorry, I messed up by uploading the "reference" model. I have updated the file now on cloud.gams.com and the link https://cloud.gams.com/s/FYZPsXAcPfn3pfa should have now the version with gams2numpy/pandas.
Delete