Wednesday, October 9, 2019

The gas station problem: where to pump gas and how much


Problem


The problem (from [1]) is to determine where to pump gasoline (and how much) during a trip, where prices between gas stations fluctuate.


We consider some different objectives:

  • Minimize Cost. This results in an LP model.
  • Minimize Number of Stops. This makes the model a MIP.
  • Minimize Number of Stops followed by Minimize Cost. This is essentially a multi-objective problem.
  • As a bonus: Minimize Cost but always fill up the tank completely. This is a MIP model.

Data


I invented some data:


----     34 SET i  locations

start    ,    Station1 ,    Station2 ,    Station3 ,    Station4 ,    Station5 ,    Station6 ,    Station7 
Station8 ,    Station9 ,    Station10,    Station11,    Station12,    Station13,    Station14,    Station15
Station16,    Station17,    Station18,    Station19,    Station20,    finish   


----     34 SET g  gas stations

Station1 ,    Station2 ,    Station3 ,    Station4 ,    Station5 ,    Station6 ,    Station7 ,    Station8 
Station9 ,    Station10,    Station11,    Station12,    Station13,    Station14,    Station15,    Station16
Station17,    Station18,    Station19,    Station20


----     34 PARAMETER efficiency           =       18.000  [miles/gallon]
            PARAMETER capacity             =       50.000  tank-capacity [gallons]
            PARAMETER initgas              =       25.000  initial amount of gasoline in tank [gallons]
            PARAMETER finalgas             =       10.000  minimum final amount of gas in tank [gallons]
            PARAMETER triplen              =     2000.000  length of trip [miles]

----     34 PARAMETER price  [$/gallon]

Station1  3.002,    Station2  3.630,    Station3  3.616,    Station4  3.126,    Station5  3.167,    Station6  3.603
Station7  2.067,    Station8  3.281,    Station9  3.748,    Station10 3.783,    Station11 3.065,    Station12 2.349
Station13 3.135,    Station14 2.928,    Station15 3.527,    Station16 3.585,    Station17 3.305,    Station18 3.460
Station19 3.320,    Station20 2.375


----     34 PARAMETER distance  [miles]

         Station1    Station2    Station3    Station4    Station5    Station6    Station7    Station8    Station9

incr       88.074     174.227       3.250     157.671       1.847     140.247      79.275     166.355     117.030
cumul      88.074     262.301     265.550     423.221     425.068     565.315     644.590     810.945     927.975

    +   Station10   Station11   Station12   Station13   Station14   Station15   Station16   Station17   Station18

incr      136.787      16.517      23.499     167.570     127.418      31.520      91.885      91.363      74.604
cumul    1064.762    1081.279    1104.778    1272.348    1399.767    1431.286    1523.172    1614.535    1689.139

    +   Station19   Station20      finish

incr      148.788      18.876     143.198
cumul    1837.926    1856.802    2000.000


Prices and distances were produced using a random number generator.

Note that I added the constraint that we need a little bit left over gas in the tank when arriving at the finish. That requirement was not in the original problem [1]. We can drop this constraint by just setting the parameter \(\mathit{finalgas}=0\).

We also have some derived data: the amount of gas we use for each leg of the trip. This is just the length of the leg divided by the efficiency of the car:


----     34 PARAMETER use  gas usage from previous location [gallons]

Station1  4.893,    Station2  9.679,    Station3  0.181,    Station4  8.760,    Station5  0.103,    Station6  7.791
Station7  4.404,    Station8  9.242,    Station9  6.502,    Station10 7.599,    Station11 0.918,    Station12 1.305
Station13 9.309,    Station14 7.079,    Station15 1.751,    Station16 5.105,    Station17 5.076,    Station18 4.145
Station19 8.266,    Station20 1.049,    finish    7.955



Problem 1: minimize cost


The first problem is to minimize fuel cost. I have modeled this by observing three stages at each way point:

  1. First is the amount of gas in the tank when arriving at point \(i\). This amount should be non-negative: we cannot drive when the tank is empty. This variable is denoted by \(f_{\mathit{before},i}\ge 0\).
  2. The amount we pump is the second stage. This amount is bounded by \([0,\mathrm{capacity}]\). This variable is denoted by \(f_{\mathit{pumped},g}\).
  3. The amount in the tank after pumping. This amount cannot exceed the capacity of the tank. This is \(f_{\mathit{after},i} \in [0,\mathrm{capacity}]\). 

This problem is a little bit like modeling inventory: keep track of what is going out and what is added. The LP model can look like:

Min Cost Model
\[\begin{align} \min \> & \color{darkred}{\mathit{cost}}\\ & \color{darkred}{\mathit{cost}} = \sum_g \color{darkred}f_{\mathit{pumped},g} \cdot \color{darkblue}{\mathit{price}}_g \\ & \color{darkred}f_{\mathit{before},i} = \color{darkred}f_{\mathit{after},i-1} - \color{darkblue}{\mathit{use}}_i && \forall i \ne \mathit{start} \\ & \color{darkred}f_{\mathit{after},g} = \color{darkred}f_{\mathit{before},g} + \color{darkred}f_{\mathit{pumped},g} && \forall g \\ & \color{darkred}f_{\mathit{after},\mathit{start}} = \color{darkblue}{\mathit{initgas}} \\ & \color{darkred}f_{\mathit{before},\mathit{finish}} \ge \color{darkblue}{\mathit{finalgas}} \\ & \color{darkred}f_{k,i} \in [0,\color{darkblue}{\mathit{capacity}}] \end{align}\]

Note that the set \(g\) is a subset of set \(i\): \(g\) indicates the locations with gas stations between \(\mathit{start}\) and \(\mathit{finish}\). Also note that we cannot just substitute out the variable \(f_{\mathit{before},i}\): we need to make sure this quantity is non-negative. Similarly, we cannot substitute out the variable \(f_{\mathit{after},i}\): this must obey the tank capacity bound.

The results look like:


----     66 VARIABLE f.L  amounts of fuel

             start    Station1    Station2    Station3    Station4    Station5    Station6    Station7    Station8

before                  20.107      21.238      21.058      12.298      12.196       4.404                  40.758
pumped                  10.811                                                                  50.000
after       25.000      30.918      21.238      21.058      12.298      12.196       4.404      50.000      40.758

     +    Station9   Station10   Station11   Station12   Station13   Station14   Station15   Station16   Station17

before      34.256      26.657      25.740      24.434      40.691      33.612      31.861      26.756      21.680
pumped                                          25.566
after       34.256      26.657      25.740      50.000      40.691      33.612      31.861      26.756      21.680

     +   Station18   Station19   Station20      finish

before      17.535       9.270       8.221      10.000
pumped                               9.735
after       17.535       9.270      17.955


----     66 VARIABLE cost.L                =      218.998

We see we pump the most at station 7. Looking at the prices this makes sense: gasoline is cheapest at that gas station.

The number of stops where we pump gas is 4, and the total gas bill is $219.


Problem 2: minimize number of stops


In the previous section we solved the minimize cost problem. This gave us 4 stops to refuel with total fuel cost of $219. Now, let's try to minimize the number of times we visit a gas station. Counting in general needs binary variables, and this is no exception. The model can look like:


Min Number of Stops Model
\[\begin{align} \min \> & \color{darkred}{\mathit{numstops}}\\ & \color{darkred}{\mathit{numstops}} = \sum_g \color{darkred} \delta_g \\ & \color{darkred}{\mathit{cost}} = \sum_g \color{darkred}f_{\mathit{pumped},g}  \cdot \color{darkblue}{\mathit{price}}_g \\ & \color{darkred}f_{\mathit{before},i} = \color{darkred}f_{\mathit{after},i-1} - \color{darkblue}{\mathit{use}}_i && \forall i \ne \mathit{start} \\ & \color{darkred}f_{\mathit{after},g} = \color{darkred}f_{\mathit{before},g} + \color{darkred}f_{\mathit{pumped},g} && \forall g \\ & \color{darkred}f_{\mathit{after},\mathit{start}} = \color{darkblue}{\mathit{initgas}} \\ & \color{darkred}f_{\mathit{before},\mathit{finish}} \ge \color{darkblue}{\mathit{finalgas}} \\ & \color{darkred} f_{\mathit{pumped},g} \le \color{darkred} \delta_g  \cdot \color{darkblue}{\mathit{capacity}} && \forall g \\ & \color{darkred}f_{k,i} \in [0,\color{darkblue}{\mathit{capacity}}] \\ & \color{darkred} \delta_g \in \{0,1\} \end{align}\]

Because we have binary variables, this is now a MIP model. The constraint \(f_{\mathit{pumped},g} \le  \delta_g \cdot \mathit{capacity}\)  implements the implication: \[\delta_g=0 \Rightarrow f_{\mathit{pumped},g}=0\]When we solve this we see:



----     71 VARIABLE f.L  amounts of fuel

             start    Station1    Station2    Station3    Station4    Station5    Station6    Station7    Station8

before                  20.107      10.428      10.247       1.488       1.385                  41.954      32.712
pumped                                                                   6.406      46.358
after       25.000      20.107      10.428      10.247       1.488       7.791      46.358      41.954      32.712

     +    Station9   Station10   Station11   Station12   Station13   Station14   Station15   Station16   Station17

before      26.211      18.611      17.694      16.388       7.079                  41.595      36.490      31.415
pumped                                                                  43.346
after       26.211      18.611      17.694      16.388       7.079      43.346      41.595      36.490      31.415

     +   Station18   Station19   Station20      finish

before      27.270      19.004      17.955      10.000
after       27.270      19.004      17.955


----     71 VARIABLE delta.L  

Station5  1.000,    Station6  1.000,    Station14 1.000


----     71 VARIABLE cost.L                =      314.254  
            VARIABLE numstops.L            =        3.000 


So instead of 4 stops, now we only need 3 stops. We ignored the cost in this model. This causes the fuel cost to skyrocket to $314 (from $219 in the min cost model).

I kept the cost constraint in the problem for two reasons. First, it functions as an accounting constraint. Such a constraint is just for informational purposes (it is not meant to change or restrict the solution). A second reason is that we use the cost variable in a second solve in order to minimize cost while keeping the number of stops optimal. This is explained in the next section.

Problem 3: minimize number of stops followed by minimizing cost


After solving the min number of stops problem (previous section), we can fix the number of stops variable \(\mathit{numstops}\) to the optimal value and resolve minimizing the cost. This is essentially a lexicographic approach to solving the multi-objective problem min numstops, min cost. If we do this we get as solution:


----     76 VARIABLE f.L  amounts of fuel

             start    Station1    Station2    Station3    Station4    Station5    Station6    Station7    Station8

before                  20.107      21.238      21.058      12.298      12.196       4.404                  40.758
pumped                  10.811                                                                  50.000
after       25.000      30.918      21.238      21.058      12.298      12.196       4.404      50.000      40.758

     +    Station9   Station10   Station11   Station12   Station13   Station14   Station15   Station16   Station17

before      34.256      26.657      25.740      24.434      15.125       8.046      41.595      36.490      31.415
pumped                                                                  35.301
after       34.256      26.657      25.740      24.434      15.125      43.346      41.595      36.490      31.415

     +   Station18   Station19   Station20      finish

before      27.270      19.004      17.955      10.000
after       27.270      19.004      17.955


----     76 VARIABLE delta.L  

Station1  1.000,    Station7  1.000,    Station14 1.000


----     76 VARIABLE cost.L                =      239.188  
            VARIABLE numstops.L            =        3.000  


Now we have a solution with 3 stops and a fuel cost of $239. This is my proposal for a solution strategy for the problem stated in [1].

An alternative would be to create a single optimization problem with a weighted sum objective: \[\min \> \mathit{numstops} + w \cdot  \mathit{cost}\] with \(w\) a small constant to make sure that \(\mathit{numstops}\) is the most important variable. As the value of \(w\) requires some thought, it may be better to use the lexicographic approach.


Filling up the gas tank


Suppose that when pumping gas we always fill up the tank completely. This alternative is not too difficult to handle. We need to add the implication: \[\delta_g=1 \Rightarrow f_{\mathit{pumped},g}=\mathit{capacity}-f_{\mathit{before},g}\] This can be handled using the inequality: \[f_{\mathit{pumped},g} \ge \delta_g \cdot \mathit{capacity}-f_{\mathit{before},g}\]


If we add this constraint and solve the min cost model we see:


----     84 VARIABLE f.L  amounts of fuel

             start    Station1    Station2    Station3    Station4    Station5    Station6    Station7    Station8

before                  20.107      40.321      40.140      31.381      31.278      23.487      19.082      40.758
pumped                  29.893                                                                  30.918
after       25.000      50.000      40.321      40.140      31.381      31.278      23.487      50.000      40.758

     +    Station9   Station10   Station11   Station12   Station13   Station14   Station15   Station16   Station17

before      34.256      26.657      25.740      24.434      40.691      33.612      48.249      43.144      38.068
pumped                                          25.566                  16.388
after       34.256      26.657      25.740      50.000      40.691      50.000      48.249      43.144      38.068

     +   Station18   Station19   Station20      finish

before      33.924      25.658      24.609      16.654
after       33.924      25.658      24.609


----     84 VARIABLE delta.L  

Station1  1.000,    Station7  1.000,    Station12 1.000,    Station14 1.000


----     84 VARIABLE cost.L                =      261.709  
            VARIABLE numstops.L            =        4.000  


In this case we have a little bit more gasoline left in the tank at the finish than strictly needed. Notice how in each case we pump gas, we end up with a full tank. This fill-up strategy is surprisingly expensive.

Conclusion


Here we see the advantages of using an optimization model compared to a tailored algorithm. We can adapt the optimization model to different situations. From the basic min cost model, we can quickly react to new questions.

References


  1. Gas Station Problem - cheapest and least amount of stations, https://stackoverflow.com/questions/58289424/gas-station-problem-cheapest-and-least-amount-of-stations
  2. Shamir Khuller, Azarakhsh Malekian, Julian Mestre, To Fill or not to Fill: The Gas Station Problem, ACM Transactions on Algorithms, Volume 7, Issue 3, July 2011.

3 comments:

  1. Thanks for that - I was thinking of doing the same kinds of experiments after I saw the original post on StackOverflow.

    I would love to know your results/observations on run time and the scalability of the approach - I suspect that specific algorithms using ideas like recursion as originally suggested the the OP would scale horribly...

    ReplyDelete
    Replies
    1. May be it would be fairer to compare to a carefully crafted dynamic programming algorithm. After all, we throw here some heavy machinery at the problem.

      Delete
    2. I'm not sure I entirely buy the conclusion here: Of course, a more general piece of machinery will solve a larger class of problems (by definition?) but in this particular case, the algorithm outlined in your reference [1] is just a recurrence relation with very few bells and whistles: Their "naive approach" is really just plugging numbers into the relation, whereas their more sophisticated approach also sorts the values used by consecutive updates. And "recurrence relations + a bit of pre-computing" should also classify as an approach that's flexible enough to be applied to different situations.

      In particular, even without changing the recurrence, we can solve the first three problems here at once (the first one by returning $A(s, n, 0)$, the next one by finding the smallest value of $q$ so that $A(s, q, 0)$ is non-infinite (although this one can more easily be computed by the greedy approach in linear time), and the final one by returning $A(s, q, 0)$ for the $q$ we just found).

      As a happy side-effect, this also solves the SO poster's problem of finding all optimal solutions; you can of course do that with the model at hand as well but I guess performance will suffer (can you do something smarter than excluding the solutions you find and re-running?), in particular if the problem is degenerate enough, so that the set of optimal solutions grows large.

      As also mentioned in the answer on SO, there's also an algorithm available with better complexity guarantees [0], and while that one still just solves a recurrence relation, it puts enough care into the way that it does so, that that probably counts as tailoring beyond what generalizes to all other problems. It would be interesting to see how that one performs in practice.

      Given your numbers, the simpler approach mentioned above outperforms the MIP solver for the 500 station case by a good amount, while you could probably still debate whether that implementation is particularly "carefully crafted"; certainly there's some low-hanging apples still.

      [0]: https://arxiv.org/abs/1706.00195

      Delete