## Thursday, May 21, 2020

### A small time arbitrage model

In [1] the following question was posed:

We have monthly prices for a product and I have predictions for the next 12 months. The idea is to exploit this and buy when prices are low and sell when they are high. We have limits on how much we can buy and sell and there are also inventory capacity constraints. What plan is the most profitable?

This is an interesting little problem. We will use R to experiment with this.

#### Data

The data set is as follows:

# data
price = c(12, 11, 12, 13, 16, 17, 18, 17, 18, 16, 17, 13)
capacity = 25
max_units_sell = 8


#### Model

The main vehicle we will use is the well-known inventory balance equation:$\mathit{inv}_t = \mathit{inv}_{t-1} + \mathit{buy}_t - \mathit{sell}_t$ I assume that the initial inventory is $\mathit{inv}_0 = 0$

The optimization model for this problem can look like:

Model 1
\begin{align}\max&\sum_t \color{darkblue}{\mathit{price}}_t \cdot(\color{darkred}{\mathit{sell}}_t-\color{darkred}{\mathit{buy}}_t)\\ &\color{darkred}{\mathit{inv}}_t = \color{darkred}{\mathit{inv}}_{t-1} + \color{darkred}{\mathit{buy}}_t - \color{darkred}{\mathit{sell}}_t\\ &\color{darkred}{\mathit{inv}}_0 = 0\\ &\color{darkred}{\mathit{inv}}_t \in \{0,1,\dots,\color{darkblue}{\mathit{capacity}}\}\\ &\color{darkred}{\mathit{buy}}_t \in \{0,1,\dots,\color{darkblue}{\mathit{maxbuy}}\}\\ &\color{darkred}{\mathit{sell}}_t \in \{0,1,\dots,\color{darkblue}{\mathit{maxsell}}\}\\ \end{align}

I used here integer decision variables: the assumption is we cannot deal with fractional values. Obviously, we can easily relax this and work with continuous variables.

#### Implimentation

To try this out, I am using CVXR [2] with the Glpk solver [3].

CVXR is using a matrix oriented notation. It is interesting to see how this can be applied to our inventory balance equation. The main trick is that we use a Lag operator implemented as a matrix-vector multiplication: $L \cdot \mathit{inv}$

The Lag operator. We know that multiplication by an identity matrix gives us the same vector: $v = I \cdot v$ If we shift the diagonal down by one, we get what we want. Let's do a little experiment:

> n <- 5
> v <- 1:n
> v
[1] 1 2 3 4 5
> # matrix-vector multiplication with identity matrix
> diag(n) %*% v
[,1]
[1,]    1
[2,]    2
[3,]    3
[4,]    4
[5,]    5
> # lag operator
> L = cbind(rbind(0,diag(n-1)),0)
> L
[,1] [,2] [,3] [,4] [,5]
[1,]    0    0    0    0    0
[2,]    1    0    0    0    0
[3,]    0    1    0    0    0
[4,]    0    0    1    0    0
[5,]    0    0    0    1    0
> # matrix-vector multiplication with Lag operator
> L %*% v
[,1]
[1,]    0
[2,]    1
[3,]    2
[4,]    3
[5,]    4
>


We see that $$L$$ is indeed the identity matrix but shifted one position down. When we multiply $$Lv$$ we see that the result vector is also shifted one down. The newly inserted element at the first position is zero, which is exactly what we want.

Note: if we want a non-zero initial inventory we need to add a vector $$\mathit{initinv}$$ with the first element being the initial inventory and the other elements all zero. Then we can write the inventory balance equation in matrix notation as $\mathit{inv} = \mathit{initinv} + L \cdot \mathit{inv} + \mathit{buy} - \mathit{sell}$ In the R code below, we assumed this was not needed.

With this, we are ready to solve our problem:

> library(CVXR)
>
> # data
> price = c(12, 11, 12, 13, 16, 17, 18, 17, 18, 16, 17, 13)
> capacity = 25
> max_units_sell = 8
>
> # number of time periods
> NT <- length(price)
>
> # Decision variables
> inv = Variable(NT,integer=T)
> sell = Variable(NT,integer=T)
>
> # Lag operator
> L = cbind(rbind(0,diag(NT-1)),0)
>
> # optimization model
+                    list(inv == L %*% inv + buy - sell,
+                         inv >= 0, inv <= capacity,
+                         sell >= 0, sell <= max_units_sell))
> result <- solve(problem,verbose=T)
GLPK Simplex Optimizer, v4.47
84 rows, 36 columns, 119 non-zeros
*     0: obj =  0.000000000e+000  infeas = 0.000e+000 (12)
*    35: obj = -1.040000000e+002  infeas = 0.000e+000 (0)
OPTIMAL SOLUTION FOUND
GLPK Integer Optimizer, v4.47
84 rows, 36 columns, 119 non-zeros
36 integer variables, none of which are binary
Integer optimization begins...
+    35: >>>>> -1.040000000e+002 >= -1.040000000e+002   0.0% (1; 0)
+    35: mip = -1.040000000e+002 >=     tree is empty   0.0% (0; 1)
INTEGER OPTIMAL SOLUTION FOUND
> cat("status:",result$status) status: optimal > cat("objective:",result$value)
objective: 104
> # print results
> data.frame(price,
+            buy=result$getValue(buy), + sell=result$getValue(sell),
+            inv=resultgetValue(inv)) price buy sell inv 1 12 4 0 4 2 11 4 0 8 3 12 4 0 12 4 13 4 0 16 5 16 4 0 20 6 17 0 8 12 7 18 0 8 4 8 17 4 0 8 9 18 0 8 0 10 16 4 0 4 11 17 0 4 0 12 13 0 0 0 >  We see indeed we are buying when things are cheap and selling when prices are high. This gives us a net profit of 104. The inventory capacity is never binding here. #### A complication In a subsequent post, a non-trivial wrinkle was added to the problem [4]. price = c(12, 11, 12, 13, 16, 17, 18, 17, 18, 16, 17, 13) capacity = 25 max_units_buy_30 = 4 # when inventory level is lower then 30% it is possible to buy 0 to 4 units max_units_buy_65 = 3 # when inventory level is between 30% and 65% it is possible to buy 0 to 3 units max_units_buy_100 = 2 # when inventory level is between 65% and 100% it is possible to buy 0 to 2 units max_units_sell_30 = 4 # when inventory level is lower then 30% it is possible to sell 0 to 4 units max_units_sell_70 = 6 # when inventory level is between 30% and 70% it is possible to sell 0 to 6 units max_units_sell_100 = 8 # when inventory level is between 70% and 100% it is possible to sell 0 to 8 units  This requires a little bit of work. #### Time The first thing to consider is how time is handled in the model. In Model 1, we just used a time index $$t$$ and did not really say much about it. Here, we need to be a bit more precise. The variables buy and sell are what we call flow variables. Buying and selling takes place during time period $$t$$. Inventory, however, is measured at a specific point in time. In our case, this is at the end of period $$t$$. This is called a stock variable. Looking at the picture we should put limits on $$\mathit{buy}_t$$ and $$\mathit{sell}_t$$ based on the inventory level at the end of the previous period: $$\mathit{inv}_{t-1}$$. #### Segmenting inventory levels From the question, we see that we have different intervals: • For buying, we have: 0%-30%, 30%-65% and 65%-100% • For selling, we have: 0%-30%, 30%-70% and 70%-100% I prefer to deal with one set of intervals, so we combine this into: • 0%-30%, 30%-65%, 65%-70% and 70%-100% The idea is to introduce a binary variable $$\delta_{k,t}$$ indicating in which segment $$k$$ we are. Obviously, we can only be in one segment, so: \begin{align} &\sum_k \delta_{k,t} = 1 && \forall t\\ & \delta_{k,t} \in \{0,1\}\end{align} So depending on which $$\delta_{k,t}$$ is turned on, we have different lower- and upperbounds on the inventory levels. This means we can write:$\sum_k \mathit{invlb}_k \cdot \delta_{k,t}\le \mathit{inv}_{t-1} \le \sum_k \mathit{invub}_k \cdot \delta_{k,t}$ Notice that we put the bounds on the inventory level at the end of the previous period. #### Limiting buying and selling Here we need to link the values for $$\delta_{k,t}$$ to bounds on buying and selling. This is very similar to what we did with respect to inventory: \begin{align}\mathit{buy}_t \le \sum_k \mathit{buyub}_k \cdot \delta_{k,t} && \forall t \\\mathit{sell}_t \le \sum_k \mathit{sellub}_k \cdot \delta_{k,t} && \forall t\end{align} #### Model We now have all the pieces to write down the complete model. Model 2 \begin{align}\max&\sum_t \color{darkblue}{\mathit{price}}_t \cdot(\color{darkred}{\mathit{sell}}_t-\color{darkred}{\mathit{buy}}_t)\\ &\color{darkred}{\mathit{inv}}_t = \color{darkred}{\mathit{inv}}_{t-1} + \color{darkred}{\mathit{buy}}_t - \color{darkred}{\mathit{sell}}_t&& \forall t\\ &\color{darkred}{\mathit{inv}}_0 = 0\\ &\color{darkred}{\mathit{inv}}_{t-1} \ge \sum_k \color{darkblue}{\mathit{invlb}}_k \cdot \color{darkred}\delta_{k,t} && \forall t \\ &\color{darkred}{\mathit{inv}}_{t-1} \le \sum_k \color{darkblue}{\mathit{invub}}_k \cdot \color{darkred}\delta_{k,t} && \forall t \\ &\color{darkred}{\mathit{buy}}_t \le \sum_k \color{darkblue}{\mathit{buyub}}_k \cdot \color{darkred}\delta_{k,t} && \forall t \\ &\color{darkred}{\mathit{sell}}_t \le \sum_k \color{darkblue}{\mathit{sellub}}_k \cdot \color{darkred}\delta_{k,t} && \forall t \\ &\sum_k \color{darkred}\delta_{k,t} = 1 && \forall t\\ &\color{darkred}{\mathit{inv}}_t \in \{0,1,\dots,\color{darkblue}{\mathit{capacity}}\}\\ &\color{darkred}{\mathit{buy}}_t \in \{0,1,\dots\}\\ &\color{darkred}{\mathit{sell}}_t \in \{0,1,\dots\}\\ &\color{darkred}\delta_{k,t} \in \{0,1\} \end{align} #### Implementation The R implementation can look like: > library(CVXR) > > # data > price = c(12, 11, 12, 13, 16, 17, 18, 17, 18, 16, 17, 13) > capacity = 25 > max_units_buy = 4 > max_units_sell = 8 > > # capacity segments > s <- c(0,0.3,0.65,0.7,1) > > # corresponding lower and upper bounds > invlb <- s[1:(length(s)-1)] * capacity > invlb [1] 0.00 7.50 16.25 17.50 > invub <- s[2:length(s)] * capacity > invub [1] 7.50 16.25 17.50 25.00 > > buyub <- c(4,3,2,2) > sellub <- c(4,6,6,8) > > # number of time periods > NT <- length(price) > NT [1] 12 > > # number of capacity segments > NS <- length(s)-1 > NS [1] 4 > > # Decision variables > inv = Variable(NT,integer=T) > buy = Variable(NT,integer=T) > sell = Variable(NT,integer=T) > delta = Variable(NS,NT,boolean=T) > > # Lag operator > L = cbind(rbind(0,diag(NT-1)),0) > > # optimization model > problem <- Problem(Maximize(sum(price*(sell-buy))), + list(inv == L %*% inv + buy - sell, + sum_entries(delta,axis=2)==1, + L %*% inv >= t(delta) %*% invlb, + L %*% inv <= t(delta) %*% invub, + buy <= t(delta) %*% buyub, + sell <= t(delta) %*% sellub, + inv >= 0, inv <= capacity, + buy >= 0, sell >= 0)) > result <- solve(problem,verbose=T) GLPK Simplex Optimizer, v4.47 120 rows, 84 columns, 369 non-zeros 0: obj = 0.000000000e+000 infeas = 1.200e+001 (24) * 23: obj = 0.000000000e+000 infeas = 0.000e+000 (24) * 85: obj = -9.875986758e+001 infeas = 0.000e+000 (2) OPTIMAL SOLUTION FOUND GLPK Integer Optimizer, v4.47 120 rows, 84 columns, 369 non-zeros 84 integer variables, 48 of which are binary Integer optimization begins... + 85: mip = not found yet >= -inf (1; 0) + 123: >>>>> -8.800000000e+001 >= -9.100000000e+001 3.4% (17; 0) + 126: >>>>> -9.000000000e+001 >= -9.100000000e+001 1.1% (9; 11) + 142: mip = -9.000000000e+001 >= tree is empty 0.0% (0; 35) INTEGER OPTIMAL SOLUTION FOUND > cat("status:",resultstatus)
status: optimal
> cat("objective:",result$value) objective: 90 > # print results > data.frame(price, + buy=result$getValue(buy),
+            sell=result$getValue(sell), + inv=result$getValue(inv),
+            maxbuy=t(result$getValue(delta)) %*% buyub, + maxsell=t(result$getValue(delta)) %*% sellub)
1     12   3    0   3      4       4
2     11   4    0   7      4       4
3     12   4    0  11      4       4
4     13   3    0  14      3       6
5     16   3    0  17      3       6
6     17   1    0  18      2       6
7     18   0    8  10      2       8
8     17   0    6   4      3       6
9     18   0    4   0      4       4
10    16   4    0   4      4       4
11    17   0    4   0      4       4
12    13   0    0   0      4       4
> # display the optimal values for delta
> print(result\$getValue(delta))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,]    1    1    1    0    0    0    0    0    1     1     1     1
[2,]    0    0    0    1    1    0    0    1    0     0     0     0
[3,]    0    0    0    0    0    1    0    0    0     0     0     0
[4,]    0    0    0    0    0    0    1    0    0     0     0     0
>


The results are interesting. Compared to the first model, the profit has decreased from 140 to 90.

One question that comes to mind is: why only buy 3 in the first period? Well, if we buy 4 and 4 units in periods 1 and 2, we can no longer buy 4 units in period 3. These kinds of decisions are very difficult to make by hand: it is just too complex.

#### Conclusion

This is a nice little example, where some formal modeling really helps. It also shows both the strengths and weaknesses of a CVX-like modeling tool: it can make matrix- and vector-based models very easy to express. But when we stray away from this a bit and want to implement modeling concepts like lags, things become somewhat more complicated.