#### Background

variableinvIA(r,p,r,p) 'inv(I-A)
calculated by a CNS model'; equation
LinEq(r,p,r,p) 'IA*invIA=I';LinEq(rp,rp2).. sum(rp3, IA(rp,rp3)*invIA(rp3,rp2))
=e= I(rp,rp2);model Linear /linEq/;solve linear using cns;option invIA:3:2:2;display invIA.l;* accuracy check.* same test as in toolscalar err 'max error in | (I-A)*inv(I-A) - I |';err = smax((rp,rp2),abs(sum(rp3, IA(rp,rp3)*invIA.l(rp3,rp2))
- I(rp,rp2)));display err; |

#### Attempt 1: Microsoft Visual C++ with the Eigen library.

- It contains quite a few linear algebra algorithms, including LU factorization, solving, and inverting matrices.
- It allows to write easy-to-read matrix expressions. E.g. to check if things are ok, I added a simple-minded accuracy check: \[\max | (I-A)^{-1} \cdot (I-A) - I | \] should be small. Here \(|.|\) indicates elementwise absolute values. This can be written as:

double maxvalue = (INV * IA - MatrixXd::Identity(matrixSize, matrixSize)).array().abs().maxCoeff(); - Support for parallel computing using OpenMP [4]. MSVC++ also supports this.

- By default OMP (and Eigen) uses 16
**threads**. This machine has 8 cores and 16 hardware threads. - Reading the input GDX file is quite fast: 1.3 seconds. (Note: this is read in "raw" mode for performance reasons). There is also some hashing going on.
- The nonzero count of \(I-A\) is 4.9 million. The inverse will be dense, so \(19278^2=372\) million elements. We use here dense storage schemes and dense algorithms.
- Instead of calling inverse(), I split the calculation explicitly into two parts: LU decomposition and solve. The time to perform the LU factorization looks reasonable to me, but the solve time is rather large.
- We can write the GDX file a bit faster by using
**compression**. The writing time goes down from 29.4 to 14.8 seconds.

#### Attempt 2: Intel/Clang with the MKL library.

- MKL only uses 8 threads by default: the number of physical cores.
- Both the LU decomposition, but especially the solve part is much faster.
- Here we used compression when writing the GDX file.
- I used Row Major Order memory layout (i.e. C style row-wise storage).
- MKL is a bit more low-level. Basically, we use BLAS[6] and LAPACK[7].

#### Conclusion

Inverting a 19278x19278 matrix | ||
---|---|---|

Library | LU factorization | Solve |

MSVC++/Eigen | 49 seconds | 1047 seconds |

Intel/Clang/MKL | 24 seconds | 68 seconds |

#### References

- Matrix operations via GAMS Embedded Python: multi-regional Input-Output tables, https://yetanothermathprogrammingconsultant.blogspot.com/2021/08/matrix-operations-via-gams-embedded.html
- John D. Cook,
*Don't invert that matrix*, https://www.johndcook.com/blog/2010/01/19/dont-invert-that-matrix/ - Eigen, https://eigen.tuxfamily.org/index.php
- OpenMP, https://www.openmp.org/.
- The LLVM Compiler Infrastructure, https://llvm.org/
- Basic Linear Algebra Subprograms, https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms.
- LAPACK, https://en.wikipedia.org/wiki/LAPACK.