Saturday, January 4, 2020

Small Blending Problem in PuLP




In [1] a small blending problem is proposed.

There are different raw steel materials that have to be mixed (blended) into a final material that has certain specifications. In this case the specifications are limits on the elements Carbon (C), Copper (Cu) and Manganese (Mn). We assume things blend linearly.

Blending is a traditional linear programming application, and models are found in many text books.

The problem is small so let's try PuLP here.

I'll try to write a bit about indexing using strings (instead of integers), and compare PuLP with CVXPY and GAMS. As the model is small, I'll also add some data manipulation (using data frames) and some simple reporting (also using data frames). I use string indexing in Pulp (with GAMS this is standard). Of course CVXPY is very different: it is matrix based. So there things are position dependent. The idea it to extract data from the data frame in such a way that positions are predictable.

Problem data


The data for the problem is as follows:


Demand: 5000 Kg

Specification of final material:

    Element      %Minimum %Max   
    Carbon       2         3     
    Copper       0.4       0.6   
    Manganese    1.2       1.65  


Raw material inventory:

Alloy          C%   Cu%   Mn%     Stocks kg Price € / kg
Iron alloy     2.50 0.00  1.30    4000      1.20
Iron alloy     3.00 0.00  0.80    3000      1.50
Iron alloy     0.00 0.30  0.00    6000      0.90
Copper alloy   0.00 90.00 0.00    5000      1.30
Copper alloy   0.00 96.00 4.00    2000      1.45
Aluminum alloy 0.00 0.40  1.20    3000      1.20
Aluminum alloy 0.00 0.60  0.00   2,500      1.00


Mathematical Model


The basic model is:

Blending Model
\[\begin{align}\min & \sum_i \color{darkblue}{\mathit{Cost}}_i\cdot \color{darkred}{\mathit{Use}}_i\\ & \color{darkblue}{\mathit{Min}}_j \le \frac{\displaystyle\sum_i \color{darkblue}{\mathit{Element}}_{i,j}\cdot \color{darkred}{\mathit{Use}}_i}{\displaystyle \sum_i \color{darkred}{\mathit{Use}}_i} \le \color{darkblue}{\mathit{Max}}_j \\ & \sum_i \color{darkred}{\mathit{Use}}_i = \color{darkblue}{\mathit{Demand}} \\ & 0 \le \color{darkred}{\mathit{Use}}_i \le \color{darkblue}{\mathit{Available}}_i \end{align} \]

Here we use: \[\begin{cases} i  & \text{Types of raw material in stock}\\ j & \text{Element with limits}\\ {\mathit{Cost}}_i & \text{Unit cost of raw material} \\ {\mathit{Min}}_j, {\mathit{Max}}_j & \text{Limits on element content in final product} \\   {\mathit{Element}}_{i,j} & \text{Content of elements in raw material}\\ {\mathit{Demand}} & \text{Demand for final product}\\ {\mathit{Available}}_i  & \text{Availability of raw material} \\  {\mathit{Use}}_i  & \text{Decision variable: how much raw material to use} \end{cases}\] The blending constraint is nonlinear: we divide by the total weight of the final product to calculate the percentages. We can linearize this fraction in two ways:

  1. multiply all sides by \(\sum_i \mathit{Use}_i\). This leads to \[\mathit{Min}_j \cdot\sum_i \mathit{Use}_i \le \sum_i \mathit{Element}_{i,j}\cdot\mathit{Use}_i \le \mathit{Max}_j \sum_i \mathit{Use}_i\] This first reformulation is especially useful when the total final product is not constant. Note that this formulation is sometimes difficult to recognize when rewritten as something like: \[\begin{cases} \displaystyle \sum_i (\mathit{Element}_{i,j}-\mathit{Min}_j )\mathit{Use}_i  \ge 0 \>\>\forall j \\ \displaystyle \sum_i (\mathit{Max}_j - \mathit{Element}_{i,j})\mathit{Use}_i  \ge 0 \>\>\forall j \end{cases}\] 
  2. In our case we know that \(\sum_i \mathit{Use}_i\) is constant: it is always equal to \(\mathit{Demand}\). I.e. \[ \mathit{Min}_j  \le  \frac{1}{\mathit{Demand}} \sum_i \mathit{Element}_{i,j}\cdot\mathit{Use}_i \le \mathit{Max}_j\]


We often need to split sandwich equations into two simple inequalities. That often leads to duplicate expressions: \[\begin{cases}  \displaystyle  \frac{1}{\mathit{Demand}}\sum_i \mathit{Element}_{i,j}\cdot\mathit{Use}_i \ge \mathit{Min}_j \>\>\forall j \\  \displaystyle \frac{1}{\mathit{Demand}}\sum_i \mathit{Element}_{i,j} \cdot\mathit{Use}_i \le \mathit{Max}_j  \>\>\forall j \end{cases}\] For small problems this is not an issue. For larger problems, I prefer to introduce extra variables that prevents these duplicate expressions. We end up with the following linear programming formulation:

Linear Programming Formulation
\[\begin{align}\min & \sum_i \color{darkblue}{\mathit{Cost}}_i \cdot\color{darkred}{\mathit{Use}}_i\\ & \color{darkblue}{\mathit{Demand}} \cdot \color{darkred}{\mathit{Content}}_j = \sum_i \color{darkblue}{\mathit{Element}}_{i,j} \cdot\color{darkred}{\mathit{Use}}_i \\ & \sum_i \color{darkred}{\mathit{Use}}_i = \color{darkblue}{\mathit{Demand}} \\ & \color{darkred}{\mathit{Use}}_i \in [0, \color{darkblue}{\mathit{Available}}_i]\\ & \color{darkred}{\mathit{Content}}_j \in [\color{darkblue}{\mathit{Min}}_j,\color{darkblue}{\mathit{Max}}_j] \end{align} \]


Even for small, almost trivial models, it makes sense to first develop a mathematical model. Especially, if you are not very experienced in developing linear programming models. Starting with a pen and a piece of paper is sometimes better than immediately start coding.

Implementation in Python/Pulp


An implementation using PuLP can look like:


from io import StringIO
import pandas as pd
import pulp as lp

# for inputting tabular data below
def table(s):
  return pd.read_csv(StringIO(s),sep='\s+',index_col='ID')

#------------------------------------------------------------------
# data
#------------------------------------------------------------------

demand = 5000

requirements = table("""
   ID  Element      Min   Max
   C   Carbon       2     3
   Cu  Copper       0.4   0.6
   Mn  Manganese    1.2   1.65
    """)

supplyData = table("""
  ID  Alloy             C       Cu     Mn     Stock   Price
  A   "Iron alloy"      2.50    0.00   1.30   4000    1.20
  B   "Iron alloy"      3.00    0.00   0.80   3000    1.50
  C   "Iron alloy"      0.00    0.30   0.00   6000    0.90
  D   "Copper alloy"    0.00   90.00   0.00   5000    1.30
  E   "Copper alloy"    0.00   96.00   4.00   2000    1.45
  F   "Aluminum alloy"  0.00    0.40   1.20   3000    1.20
  G   "Aluminum alloy"  0.00    0.60   0.00   2500    1.00
  """)

print("----- Data-------")
print(requirements)
print(supplyData)


#------------------------------------------------------------------
# derived data
#------------------------------------------------------------------

# our sets are stockItems ["A","B",..] and elements ["C","Cu",...] 
Items = supplyData.index
Elements = requirements.index

print("----- Indices-------")
print(Items)
print(Elements)

#------------------------------------------------------------------
# LP Model
#------------------------------------------------------------------


use = lp.LpVariable.dicts("Use",Items,0,None,cat='Continuous')
content = lp.LpVariable.dicts("Content",Elements,0,None,cat='Continuous')

model = lp.LpProblem("Steel", lp.LpMinimize)

# objective : minimize cost
model += lp.lpSum([use[i]*supplyData.loc[i,'Price'] for i in Items ])

# upper bounds wrt availability
for i in Items:
  model += use[i] <= supplyData.loc[i,'Stock']

# final content of elements and their bounds  
for j in Elements:
  model += demand*content[j] == lp.lpSum([use[i]*supplyData.loc[i,j] for i in Items])
  model += content[j] >= requirements.loc[j,'Min']
  model += content[j] <= requirements.loc[j,'Max']

# meet demand
model += lp.lpSum([use[i] for i in Items]) == demand


# for debugging
#print(model)


#------------------------------------------------------------------
# Solve and reporting
#------------------------------------------------------------------

model.solve()


print("----- Model Results-------")
print("Status:", lp.LpStatus[model.status])
print("Objective:",lp.value(model.objective))


# collect results
L = []
for i in Items: 
  L.append(['use',i,0.0,use[i].varValue,supplyData.loc[i,'Stock']])
for j in Elements:
  L.append(['content',j,requirements.loc[j,'Min'],content[j].varValue,requirements.loc[j,'Max']])
results = pd.DataFrame(L,columns=['Variable','Index','Lower','Value','Upper'])
print(results)


Notes:

  • We input the basic data as data frames. Data frames are a standard way to handle tabular data. Data frames are originally from the R statistical software system.
  • Usually read_csv is for CSV files. Here we use it to read from a string. Blanks are used as separator to make the table more readable for humans.
  • For each data frame we added an index column. This index will allow us to select a row from the data frame. Note that the index is a string. In general using strings as index is safer than using an index number. We see much earlier that things are wrong when making a mistake like using \(j\) (element) instead of \(i\) (raw material).
  • Python Pandas allows duplicate indices. We can check for this using duplicated() function.
  • Because we access the data by name, it would not matter if the rows or columns are in a different position. This is more like a database table, where we assume no particular ordering. 
  • We also use a data frame for reporting. Data frames are printed in a nicer way than Python arrays, and they can be exported to CVS files or spreadsheet with one function call.
  • The variables are also indexed by names. This is accomplished by lp.LpVariable.dicts(). This is safer than using a standard array of variables.
  • AFAIK, PuLP can only handle a single scalar bound in the LpVariable statement (e.g. all lower bounds for a variables are zero). This means: we have to specify a number of bounds as explicit singleton constraints or use a var.bounds statement. 

The results look like:


----- Data-------
      Element  Min   Max
ID                      
C      Carbon  2.0  3.00
Cu     Copper  0.4  0.60
Mn  Manganese  1.2  1.65
             Alloy    C    Cu   Mn  Stock  Price
ID                                              
A       Iron alloy  2.5   0.0  1.3   4000   1.20
B       Iron alloy  3.0   0.0  0.8   3000   1.50
C       Iron alloy  0.0   0.3  0.0   6000   0.90
D     Copper alloy  0.0  90.0  0.0   5000   1.30
E     Copper alloy  0.0  96.0  4.0   2000   1.45
F   Aluminum alloy  0.0   0.4  1.2   3000   1.20
G   Aluminum alloy  0.0   0.6  0.0   2500   1.00
----- Indices-------
Index(['A', 'B', 'C', 'D', 'E', 'F', 'G'], dtype='object', name='ID')
Index(['C', 'Cu', 'Mn'], dtype='object', name='ID')
----- Model Results-------
Status: Optimal
Objective: 5887.57427835
  Variable Index  Lower        Value    Upper
0      use     A    0.0  4000.000000  4000.00
1      use     B    0.0     0.000000  3000.00
2      use     C    0.0   397.763020  6000.00
3      use     D    0.0     0.000000  5000.00
4      use     E    0.0    27.612723  2000.00
5      use     F    0.0   574.624260  3000.00
6      use     G    0.0     0.000000  2500.00
7  content     C    2.0     2.000000     3.00
8  content    Cu    0.4     0.600000     0.60
9  content    Mn    1.2     1.200000     1.65


Safety


We will get an error if we misspell things. E.g. if we use in the second table Mgn instead of Mn, we will see:


KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2896             try:
-> 2897                 return self._engine.get_loc(key)
   2898             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Mn'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
9 frames
<ipython-input-3-d87ed8f34980> in <module>()
     59 # final content of elements and their bounds
     60 for j in Elements:
---> 61   model += demand*content[j] == lp.lpSum([use[i]*supplyData.loc[i,j] for i in Items])
     62   model += content[j] >= requirements.loc[j,'Min']
     63   model += content[j] <= requirements.loc[j,'Max']

<ipython-input-3-d87ed8f34980> in <listcomp>(.0)
     59 # final content of elements and their bounds
     60 for j in Elements:
---> 61   model += demand*content[j] == lp.lpSum([use[i]*supplyData.loc[i,j] for i in Items])
     62   model += content[j] >= requirements.loc[j,'Min']
     63   model += content[j] <= requirements.loc[j,'Max']

/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in __getitem__(self, key)
   1416                 except (KeyError, IndexError, AttributeError):
   1417                     pass
-> 1418             return self._getitem_tuple(key)
   1419         else:
   1420             # we by definition only have the 0th axis

/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
    803     def _getitem_tuple(self, tup):
    804         try:
--> 805             return self._getitem_lowerdim(tup)
    806         except IndexingError:
    807             pass

/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in _getitem_lowerdim(self, tup)
    959                     return section
    960                 # This is an elided recursive call to iloc/loc/etc'
--> 961                 return getattr(section, self.name)[new_key]
    962 
    963         raise IndexingError("not applicable")

/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in __getitem__(self, key)
   1422 
   1423             maybe_callable = com.apply_if_callable(key, self.obj)
-> 1424             return self._getitem_axis(maybe_callable, axis=axis)
   1425 
   1426     def _is_scalar_access(self, key: Tuple):

/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1848         # fall thru to straight lookup
   1849         self._validate_key(key, axis)
-> 1850         return self._get_label(key, axis=axis)
   1851 
   1852 

/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in _get_label(self, label, axis)
    154             # but will fail when the index is not present
    155             # see GH5667
--> 156             return self.obj._xs(label, axis=axis)
    157         elif isinstance(label, tuple) and isinstance(label[axis], slice):
    158             raise IndexingError("no slices here, handle elsewhere")

/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in xs(self, key, axis, level, drop_level)
   3735             loc, new_index = self.index.get_loc_level(key, drop_level=drop_level)
   3736         else:
-> 3737             loc = self.index.get_loc(key)
   3738 
   3739             if isinstance(loc, np.ndarray):

/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2897                 return self._engine.get_loc(key)
   2898             except KeyError:
-> 2899                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2900         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2901         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Mn'


Pulp is not giving a very well-formed error message here (not sure if Pulp can actually do that -- Python is issuing this before Pulp sees what is happening). But at least we are alerted (rather heavy-handedly) there is something wrong when we use Mn.  Careful inspection of the stack frame shows we have a problem in the constraint

 model += demand*content[j] == lp.lpSum([use[i]*supplyData.loc[i,j] for i in Items])

This error is actually generating an exception inside an exception handler!

Of course a much better and simpler error message would be: "supplyData.loc["A","Mn"]: Column "Mn" not found in data frame supplyData." IMHO programmers do not pay enough attention to provide meaningful error messages.

Solver log


The default solver is CBC via a DLL call. I don't think it is possible to see the solver log using this setup. I prefer to see the solver log, just to make sure there are no surprises. For this I used:

model.solve(lp.COIN_CMD(msg=1))

This will call the CBC executable (via an MPS file) and it will show the CBC log:


Welcome to the CBC MILP Solver
Version: 2.9
Build Date: Jan  6 2019

command line - cbc.exe fbbd73baa2494109b8e990ce26eb79b6-pulp.mps branch printingOptions all solution fbbd73baa2494109b8e990ce26eb79b6-pulp.sol (default strategy 1)
At line 2 NAME          MODEL
At line 3 ROWS
At line 22 COLUMNS
At line 64 RHS
At line 82 BOUNDS
At line 83 ENDATA
Problem MODEL has 17 rows, 10 columns and 34 elements
Coin0008I MODEL read with 0 errors
Presolve 4 (-13) rows, 7 (-3) columns and 18 (-16) elements
0  Obj 479.87991 Primal inf 10199.217 (4)
3  Obj 5887.5743
Optimal - objective value 5887.5743
After Postsolve, objective 5887.5743, infeasibilities - dual 1275.5592 (2), primal 0 (0)
Presolved model was optimal, full model needs cleaning up
Optimal - objective value 5887.5743
Optimal objective 5887.574275 - 3 iterations time 0.012, Presolve 0.00
Option for printingOptions changed from normal to all
Total time (CPU seconds):       0.03   (Wallclock seconds):       0.03


The solver log shows that the presolver removes 13 of the 17 rows. This high reduction rate is related to the singleton constraints. When we look at the model we generated 3+3+7=13 bound constraints. The presolver is getting rid of these and makes proper bounds of them.

Note that: msg=1 may not always work when running as a Jupyter notebook.

Debugging 


For debugging PuLP models, I recommend:

  1. print(model). Printing the model shows how PuLP interpreted the constraints.
  2. Writing an LP file: model.writeLP("steel.lp").

The output of print(model) is:


Steel:
MINIMIZE
1.2*Use_A + 1.5*Use_B + 0.9*Use_C + 1.3*Use_D + 1.45*Use_E + 1.2*Use_F + 1.0*Use_G + 0.0
SUBJECT TO
_C1: Use_A <= 4000

_C2: Use_B <= 3000

_C3: Use_C <= 6000

_C4: Use_D <= 5000

_C5: Use_E <= 2000

_C6: Use_F <= 3000

_C7: Use_G <= 2500

_C8: 5000 Content_C - 2.5 Use_A - 3 Use_B = 0

_C9: Content_C >= 2

_C10: Content_C <= 3

_C11: 5000 Content_Cu - 0.3 Use_C - 90 Use_D - 96 Use_E - 0.4 Use_F
 - 0.6 Use_G = 0

_C12: Content_Cu >= 0.4

_C13: Content_Cu <= 0.6

_C14: 5000 Content_Mn - 1.3 Use_A - 0.8 Use_B - 4 Use_E - 1.2 Use_F = 0

_C15: Content_Mn >= 1.2

_C16: Content_Mn <= 1.65

_C17: Use_A + Use_B + Use_C + Use_D + Use_E + Use_F + Use_G = 5000

VARIABLES
Content_C Continuous
Content_Cu Continuous
Content_Mn Continuous
Use_A Continuous
Use_B Continuous
Use_C Continuous
Use_D Continuous
Use_E Continuous
Use_F Continuous
Use_G Continuous


The LP file looks like:


\* Steel *\
Minimize
OBJ: 1.2 Use_A + 1.5 Use_B + 0.9 Use_C + 1.3 Use_D + 1.45 Use_E + 1.2 Use_F
 + Use_G
Subject To
_C1: Use_A <= 4000
_C10: Content_C <= 3
_C11: 5000 Content_Cu - 0.3 Use_C - 90 Use_D - 96 Use_E - 0.4 Use_F
 - 0.6 Use_G = 0
_C12: Content_Cu >= 0.4
_C13: Content_Cu <= 0.6
_C14: 5000 Content_Mn - 1.3 Use_A - 0.8 Use_B - 4 Use_E - 1.2 Use_F = 0
_C15: Content_Mn >= 1.2
_C16: Content_Mn <= 1.65
_C17: Use_A + Use_B + Use_C + Use_D + Use_E + Use_F + Use_G = 5000
_C2: Use_B <= 3000
_C3: Use_C <= 6000
_C4: Use_D <= 5000
_C5: Use_E <= 2000
_C6: Use_F <= 3000
_C7: Use_G <= 2500
_C8: 5000 Content_C - 2.5 Use_A - 3 Use_B = 0
_C9: Content_C >= 2
End

The information is basically the same, but the ordering of the rows is a bit different.

Comparison to CVXPY


We can try to model and solve the same problem using CVXPY. CVXPY is matrix oriented, so very different than PuLP. Here is my attempt:


from io import StringIO
import pandas as pd
import numpy as np
import cvxpy as cp

# for inputting tabular data below
def table(s):
  return pd.read_csv(StringIO(s),sep='\s+',index_col='ID')

#------------------------------------------------------------------
# data
#------------------------------------------------------------------

demand = 5000

requirements = table("""
   ID  Element      Min   Max
   C   Carbon       2     3
   Cu  Copper       0.4   0.6
   Mn  Manganese    1.2   1.65
    """)

supplyData = table("""
  ID  Alloy             C       Cu     Mn     Stock   Price
  A   "Iron alloy"      2.50    0.00   1.30   4000    1.20
  B   "Iron alloy"      3.00    0.00   0.80   3000    1.50
  C   "Iron alloy"      0.00    0.30   0.00   6000    0.90
  D   "Copper alloy"    0.00   90.00   0.00   5000    1.30
  E   "Copper alloy"    0.00   96.00   4.00   2000    1.45
  F   "Aluminum alloy"  0.00    0.40   1.20   3000    1.20
  G   "Aluminum alloy"  0.00    0.60   0.00   2500    1.00
  """)

print("----- Data-------")
print(requirements)
print(supplyData)

#------------------------------------------------------------------
# derived data
#------------------------------------------------------------------

# our sets are stockItems ["A","B",..] and elements ["C","Cu",...] 
Items = supplyData.index
Elements = requirements.index

# extract arrays (make sure order is identical)
Min = requirements.loc[Elements,"Min"]
Max = requirements.loc[Elements,"Max"]
Cost = supplyData.loc[Items,"Price"]
Avail = supplyData.loc[Items,"Stock"]
Element = supplyData.loc[Items,Elements]

# counts
NumItems = np.shape(Items)[0]
NumElements  = np.shape(Elements)[0]

# reshape into proper Numpy column vectors to make cvxpy happy
Min = np.reshape(Min.to_numpy(),(NumElements,1))
Max = np.reshape(Max.to_numpy(),(NumElements,1))
Cost = np.reshape(Cost.to_numpy(),(NumItems,1))
Avail = np.reshape(Avail.to_numpy(),(NumItems,1))
Element = Element.to_numpy()

#------------------------------------------------------------------
# LP Model
#------------------------------------------------------------------

use = cp.Variable((NumItems,1),"Use",nonneg=True)
content = cp.Variable((NumElements,1),"Content",nonneg=True)

model = cp.Problem(cp.Minimize(Cost.T @ use),
                   [cp.sum(use) == demand,
                    cp.multiply(demand,content) == Element.T @ use,  
                    content >= Min,
                    content <= Max,
                    use <= Avail                    
                   ])

#------------------------------------------------------------------
# Solve and reporting
#------------------------------------------------------------------

model.solve(solver=cp.ECOS,verbose=True)

print("----- Model Results-------")
print("status:",model.status)
print("objective:",model.value)
results = pd.DataFrame({'variable':'use', 
                        'index': Items, 
                        'lower':0, 
                        'level':use.value.flatten(),
                        'upper':Avail.flatten()
                        })
results = results.append(pd.DataFrame({'variable':'content', 
                        'index': Elements, 
                        'lower':Min.flatten(), 
                        'level':content.value.flatten(),
                        'upper':Max.flatten()
                        }))
print(results)



Notes:

  • I did my best to make sure that the ordering of rows and columns in the data frames is not significant. 
  • We convert the information in the data frames to standard NumPy arrays for the benefit of CVXPY. (A column in a dataframe is a pandas series).
  • If we don't do proper shaping of the arrays, we may see error messages like: ValueError: Cannot broadcast dimensions  (3,) (3, 1)
  • The model is compact, but we needed to put more effort in data extraction. In optimization, it is not at all unusual that data stuff takes more effort than the model equations.

The results look like:


----- Data-------
      Element  Min   Max
ID                      
C      Carbon  2.0  3.00
Cu     Copper  0.4  0.60
Mn  Manganese  1.2  1.65
             Alloy    C    Cu   Mn  Stock  Price
ID                                              
A       Iron alloy  2.5   0.0  1.3   4000   1.20
B       Iron alloy  3.0   0.0  0.8   3000   1.50
C       Iron alloy  0.0   0.3  0.0   6000   0.90
D     Copper alloy  0.0  90.0  0.0   5000   1.30
E     Copper alloy  0.0  96.0  4.0   2000   1.45
F   Aluminum alloy  0.0   0.4  1.2   3000   1.20
G   Aluminum alloy  0.0   0.6  0.0   2500   1.00

ECOS 2.0.7 - (C) embotech GmbH, Zurich Switzerland, 2012-15. Web: www.embotech.com/ECOS

It     pcost       dcost      gap   pres   dres    k/t    mu     step   sigma     IR    |   BT
 0  +6.293e+03  -4.362e+04  +9e+04  1e-01  8e-02  1e+00  4e+03    ---    ---    1  1  - |  -  - 
 1  +5.462e+03  -6.110e+04  +7e+04  2e-01  5e-02  2e+03  3e+03  0.5361  8e-01   0  0  0 |  0  0
 2  +5.497e+03  +1.418e+03  +7e+03  1e-02  4e-03  5e+02  3e+02  0.9313  3e-02   0  0  0 |  0  0
 3  +4.981e+03  +3.654e+03  +2e+03  4e-03  1e-03  2e+02  1e+02  0.6947  5e-02   0  0  0 |  0  0
 4  +5.687e+03  +3.974e+03  +2e+03  7e-03  9e-04  3e+02  9e+01  0.4022  7e-01   0  0  0 |  0  0
 5  +5.653e+03  +5.326e+03  +5e+02  1e-03  2e-04  2e+01  2e+01  0.9127  1e-01   0  0  0 |  0  0
 6  +5.692e+03  +5.535e+03  +2e+02  5e-04  8e-05  1e+01  1e+01  0.5874  1e-01   0  0  0 |  0  0
 7  +5.791e+03  +5.642e+03  +2e+02  6e-04  5e-05  2e+01  7e+00  0.7361  5e-01   0  0  0 |  0  0
 8  +5.843e+03  +5.798e+03  +6e+01  2e-04  2e-05  5e+00  2e+00  0.9890  4e-01   0  0  0 |  0  0
 9  +5.886e+03  +5.883e+03  +4e+00  1e-05  1e-06  3e-01  2e-01  0.9454  1e-02   0  0  0 |  0  0
10  +5.888e+03  +5.888e+03  +5e-02  2e-07  3e-08  4e-03  2e-03  0.9890  2e-03   0  0  0 |  0  0
11  +5.888e+03  +5.888e+03  +6e-04  2e-09  5e-10  5e-05  2e-05  0.9890  1e-04   1  0  0 |  0  0
12  +5.888e+03  +5.888e+03  +6e-06  2e-11  6e-12  5e-07  3e-07  0.9890  1e-04   1  0  0 |  0  0

OPTIMAL (within feastol=1.9e-11, reltol=1.1e-09, abstol=6.3e-06).
Runtime: 0.000566 seconds.

----- Model Results-------
status: optimal
objective: 5887.574272281105
  variable index  lower         level    upper
0      use     A    0.0  4.000000e+03  4000.00
1      use     B    0.0  1.283254e-06  3000.00
2      use     C    0.0  3.977630e+02  6000.00
3      use     D    0.0  5.135476e-07  5000.00
4      use     E    0.0  2.761272e+01  2000.00
5      use     F    0.0  5.746243e+02  3000.00
6      use     G    0.0  5.163966e-06  2500.00
0  content     C    2.0  2.000000e+00     3.00
1  content    Cu    0.4  5.999999e-01     0.60
2  content    Mn    1.2  1.200000e+00     1.65


The results are not rounded. That often means that the solution of an interior point algorithm looks a bit ugly. In essence this is the same solution as we found with PuLP/CBC.

Comparison to GAMS


The GAMS model is closer to PuLP:


*---------------------------------------------------
* data
*---------------------------------------------------

set
  i 'items from inventory' /A*G/
  j 'elements' /C,Cu,Mn/
;

table requirements(j,*)
         Min   Max
   C     2     3
   Cu    0.4   0.6
   Mn    1.2   1.65
;

table supplyData(i,*)
       C      Cu     Mn     Stock   Price
  A    2.5            1.3    4000    1.20
  B    3              0.8    3000    1.50
  C            0.3           6000    0.90
  D           90             5000    1.30
  E           96      4      2000    1.45
  F            0.40   1.20   3000    1.20
  G            0.60          2500    1.00
;

scalar demand /5000/;

*---------------------------------------------------
* LP Model
*---------------------------------------------------

positive variables
    use(i)       'usage of raw material'
    content(j)   'characteristics of final product'
;


* bounds
use.up(i) = supplyData(i,'Stock');
content.lo(j) = requirements(j,'Min');
content.up(j) = requirements(j,'Max');

variable totalCost 'objective variable';

equations
    obj  'objective'
    calcContent(j) 'calculate contents of final product'
    meetDemand 'total demand must be met'
;

obj.. totalCost =e= sum(i, use(i)*supplyData(i,'Price'));

calcContent(j).. demand*content(j) =e= sum(i, use(i)*supplyData(i,j));

meetDemand.. sum(i, use(i)) =e= demand;

model blending /all/;

*---------------------------------------------------
* Solve and reporting
*---------------------------------------------------
solve blending minimizing totalCost using lp;

parameter results(*,*,*);
results('use',i,'lower') = 0;
results('use',i,'level') = use.l(i);
results('use',i,'upper') = supplyData(i,'Stock');
results('content',j,'lower') = requirements(j,'Min');
results('content',j,'level') = content.l(j);
results('content',j,'upper') = requirements(j,'Max');
display results;

Notes:

  • In GAMS we start with the sets, followed by the data. With external data we can extract sets from data.
  • The rows and columns in the tables are not order dependent: we can change the order without changing the meaning of the model. 
  • We can specify bounds directly: no need for singleton constraints.
  • The statement results('use',i,'lower') = 0; is not really needed as the default value is zero. (To be more precise: all data is stored sparse, where "does not exist" is the same as zero). However, this statement forces the set element 'lower' to be before 'level' and 'upper'. Another way to enforce this, is to introduce a dummy set: set dummy /lower,level,upper/. We don't have to use the set. Just the declaration would instill an element ordering.
  • I used * in the data tables. This is conveniently disabling domain checking, but it is also dangerous. In this case, if we change in the supplyData table "Mn" to "Mng", we will not see an error. It would be safer to use full domain checking, which requires to add a few sets.
  • The safe version of the data part in GAMS would be:
    set
      i    'items from inventory' /A*G/
      d    'demand limits' /Min,Max/
      s    'supply data' /C,Cu,Mn,Stock,Price/
      j(s) 'elements' /C,Cu,Mn/
    ;
    
    table requirements(j,d)
             Min   Max
       C     2     3
       Cu    0.4   0.6
       Mn    1.2   1.65
    ;
    
    table supplyData(i,s)
           C      Cu     Mng    Stock   Price
      A    2.5            1.3    4000    1.20
      B    3              0.8    3000    1.50
      C            0.3           6000    0.90
      D           90             5000    1.30
      E           96      4      2000    1.45
      F            0.40   1.20   3000    1.20
      G            0.60          2500    1.00
    ;
    
    Here the typo "Mng" will immediately give a good error message:
      20  table supplyData(i,s)
      21         C      Cu     Mng    Stock   Price
    ****                         $170
    **** 170  Domain violation for element
    For production models it is best to use domain checking throughout.
  • IMHO, this GAMS model looks a bit more streamlined than the Python models.


The output looks like:


----     70 PARAMETER results  

                 lower       level       upper

use    .A                 4000.000    4000.000
use    .B                             3000.000
use    .C                  397.763    6000.000
use    .D                             5000.000
use    .E                   27.613    2000.000
use    .F                  574.624    3000.000
use    .G                             2500.000
content.C        2.000       2.000       3.000
content.Cu       0.400       0.600       0.600
content.Mn       1.200       1.200       1.650


References


  1. Translating a LP from Excel to Python, https://stackoverflow.com/questions/59579342/translating-a-lp-from-excel-to-python-pulp

3 comments:

  1. > Min = np.reshape(Min.to_numpy(),(NumElements,1))

    You can do this via Min = Min.values.reshape(-1,1)

    ReplyDelete
  2. Hello,

    Thanks for the great post. I have just started using PuLP and this helps a lot. One question: How do you get the solver log? In my case, even with the option msg=1, I don't get anything (I am using Spyder).

    ReplyDelete
    Replies
    1. I will answer to myself: I get the log when I run the script from the command window.

      Delete