Problem
The problem (from [1]) is to determine where to pump gasoline (and how much) during a trip, where prices between gas stations fluctuate.
We consider some different objectives:
- Minimize Cost. This results in an LP model.
- Minimize Number of Stops. This makes the model a MIP.
- Minimize Number of Stops followed by Minimize Cost. This is essentially a multi-objective problem.
- As a bonus: Minimize Cost but always fill up the tank completely. This is a MIP model.
Data
I invented some data:
---- 34 SET i locations start , Station1 , Station2 , Station3 , Station4 , Station5 , Station6 , Station7 Station8 , Station9 , Station10, Station11, Station12, Station13, Station14, Station15 Station16, Station17, Station18, Station19, Station20, finish ---- 34 SET g gas stations Station1 , Station2 , Station3 , Station4 , Station5 , Station6 , Station7 , Station8 Station9 , Station10, Station11, Station12, Station13, Station14, Station15, Station16 Station17, Station18, Station19, Station20 ---- 34 PARAMETER efficiency = 18.000 [miles/gallon] PARAMETER capacity = 50.000 tank-capacity [gallons] PARAMETER initgas = 25.000 initial amount of gasoline in tank [gallons] PARAMETER finalgas = 10.000 minimum final amount of gas in tank [gallons] PARAMETER triplen = 2000.000 length of trip [miles] ---- 34 PARAMETER price [$/gallon] Station1 3.002, Station2 3.630, Station3 3.616, Station4 3.126, Station5 3.167, Station6 3.603 Station7 2.067, Station8 3.281, Station9 3.748, Station10 3.783, Station11 3.065, Station12 2.349 Station13 3.135, Station14 2.928, Station15 3.527, Station16 3.585, Station17 3.305, Station18 3.460 Station19 3.320, Station20 2.375 ---- 34 PARAMETER distance [miles] Station1 Station2 Station3 Station4 Station5 Station6 Station7 Station8 Station9 incr 88.074 174.227 3.250 157.671 1.847 140.247 79.275 166.355 117.030 cumul 88.074 262.301 265.550 423.221 425.068 565.315 644.590 810.945 927.975 + Station10 Station11 Station12 Station13 Station14 Station15 Station16 Station17 Station18 incr 136.787 16.517 23.499 167.570 127.418 31.520 91.885 91.363 74.604 cumul 1064.762 1081.279 1104.778 1272.348 1399.767 1431.286 1523.172 1614.535 1689.139 + Station19 Station20 finish incr 148.788 18.876 143.198 cumul 1837.926 1856.802 2000.000
Prices and distances were produced using a random number generator.
Note that I added the constraint that we need a little bit left over gas in the tank when arriving at the finish. That requirement was not in the original problem [1]. We can drop this constraint by just setting the parameter \(\mathit{finalgas}=0\).
We also have some derived data: the amount of gas we use for each leg of the trip. This is just the length of the leg divided by the efficiency of the car:
---- 34 PARAMETER use gas usage from previous location [gallons] Station1 4.893, Station2 9.679, Station3 0.181, Station4 8.760, Station5 0.103, Station6 7.791 Station7 4.404, Station8 9.242, Station9 6.502, Station10 7.599, Station11 0.918, Station12 1.305 Station13 9.309, Station14 7.079, Station15 1.751, Station16 5.105, Station17 5.076, Station18 4.145 Station19 8.266, Station20 1.049, finish 7.955
Problem 1: minimize cost
The first problem is to minimize fuel cost. I have modeled this by observing three stages at each way point:
- First is the amount of gas in the tank when arriving at point \(i\). This amount should be non-negative: we cannot drive when the tank is empty. This variable is denoted by \(f_{\mathit{before},i}\ge 0\).
- The amount we pump is the second stage. This amount is bounded by \([0,\mathrm{capacity}]\). This variable is denoted by \(f_{\mathit{pumped},g}\).
- The amount in the tank after pumping. This amount cannot exceed the capacity of the tank. This is \(f_{\mathit{after},i} \in [0,\mathrm{capacity}]\).
This problem is a little bit like modeling inventory: keep track of what is going out and what is added. The LP model can look like:
Min Cost Model |
---|
\[\begin{align} \min \> & \color{darkred}{\mathit{cost}}\\ & \color{darkred}{\mathit{cost}} = \sum_g \color{darkred}f_{\mathit{pumped},g} \cdot \color{darkblue}{\mathit{price}}_g \\ & \color{darkred}f_{\mathit{before},i} = \color{darkred}f_{\mathit{after},i-1} - \color{darkblue}{\mathit{use}}_i && \forall i \ne \mathit{start} \\ & \color{darkred}f_{\mathit{after},g} = \color{darkred}f_{\mathit{before},g} + \color{darkred}f_{\mathit{pumped},g} && \forall g \\ & \color{darkred}f_{\mathit{after},\mathit{start}} = \color{darkblue}{\mathit{initgas}} \\ & \color{darkred}f_{\mathit{before},\mathit{finish}} \ge \color{darkblue}{\mathit{finalgas}} \\ & \color{darkred}f_{k,i} \in [0,\color{darkblue}{\mathit{capacity}}] \end{align}\] |
Note that the set \(g\) is a subset of set \(i\): \(g\) indicates the locations with gas stations between \(\mathit{start}\) and \(\mathit{finish}\). Also note that we cannot just substitute out the variable \(f_{\mathit{before},i}\): we need to make sure this quantity is non-negative. Similarly, we cannot substitute out the variable \(f_{\mathit{after},i}\): this must obey the tank capacity bound.
The results look like:
---- 66 VARIABLE f.L amounts of fuel start Station1 Station2 Station3 Station4 Station5 Station6 Station7 Station8 before 20.107 21.238 21.058 12.298 12.196 4.404 40.758 pumped 10.811 50.000 after 25.000 30.918 21.238 21.058 12.298 12.196 4.404 50.000 40.758 + Station9 Station10 Station11 Station12 Station13 Station14 Station15 Station16 Station17 before 34.256 26.657 25.740 24.434 40.691 33.612 31.861 26.756 21.680 pumped 25.566 after 34.256 26.657 25.740 50.000 40.691 33.612 31.861 26.756 21.680 + Station18 Station19 Station20 finish before 17.535 9.270 8.221 10.000 pumped 9.735 after 17.535 9.270 17.955 ---- 66 VARIABLE cost.L = 218.998
We see we pump the most at station 7. Looking at the prices this makes sense: gasoline is cheapest at that gas station.
The number of stops where we pump gas is 4, and the total gas bill is $219.
Problem 2: minimize number of stops
In the previous section we solved the minimize cost problem. This gave us 4 stops to refuel with total fuel cost of $219. Now, let's try to minimize the number of times we visit a gas station. Counting in general needs binary variables, and this is no exception. The model can look like:
Min Number of Stops Model |
---|
\[\begin{align} \min \> & \color{darkred}{\mathit{numstops}}\\ & \color{darkred}{\mathit{numstops}} = \sum_g \color{darkred} \delta_g \\ & \color{darkred}{\mathit{cost}} = \sum_g \color{darkred}f_{\mathit{pumped},g} \cdot \color{darkblue}{\mathit{price}}_g \\ & \color{darkred}f_{\mathit{before},i} = \color{darkred}f_{\mathit{after},i-1} - \color{darkblue}{\mathit{use}}_i && \forall i \ne \mathit{start} \\ & \color{darkred}f_{\mathit{after},g} = \color{darkred}f_{\mathit{before},g} + \color{darkred}f_{\mathit{pumped},g} && \forall g \\ & \color{darkred}f_{\mathit{after},\mathit{start}} = \color{darkblue}{\mathit{initgas}} \\ & \color{darkred}f_{\mathit{before},\mathit{finish}} \ge \color{darkblue}{\mathit{finalgas}} \\ & \color{darkred} f_{\mathit{pumped},g} \le \color{darkred} \delta_g \cdot \color{darkblue}{\mathit{capacity}} && \forall g \\ & \color{darkred}f_{k,i} \in [0,\color{darkblue}{\mathit{capacity}}] \\ & \color{darkred} \delta_g \in \{0,1\} \end{align}\] |
Because we have binary variables, this is now a MIP model. The constraint \(f_{\mathit{pumped},g} \le \delta_g \cdot \mathit{capacity}\) implements the implication: \[\delta_g=0 \Rightarrow f_{\mathit{pumped},g}=0\]When we solve this we see:
---- 71 VARIABLE f.L amounts of fuel start Station1 Station2 Station3 Station4 Station5 Station6 Station7 Station8 before 20.107 10.428 10.247 1.488 1.385 41.954 32.712 pumped 6.406 46.358 after 25.000 20.107 10.428 10.247 1.488 7.791 46.358 41.954 32.712 + Station9 Station10 Station11 Station12 Station13 Station14 Station15 Station16 Station17 before 26.211 18.611 17.694 16.388 7.079 41.595 36.490 31.415 pumped 43.346 after 26.211 18.611 17.694 16.388 7.079 43.346 41.595 36.490 31.415 + Station18 Station19 Station20 finish before 27.270 19.004 17.955 10.000 after 27.270 19.004 17.955 ---- 71 VARIABLE delta.L Station5 1.000, Station6 1.000, Station14 1.000 ---- 71 VARIABLE cost.L = 314.254 VARIABLE numstops.L = 3.000
So instead of 4 stops, now we only need 3 stops. We ignored the cost in this model. This causes the fuel cost to skyrocket to $314 (from $219 in the min cost model).
I kept the cost constraint in the problem for two reasons. First, it functions as an accounting constraint. Such a constraint is just for informational purposes (it is not meant to change or restrict the solution). A second reason is that we use the cost variable in a second solve in order to minimize cost while keeping the number of stops optimal. This is explained in the next section.
Problem 3: minimize number of stops followed by minimizing cost
After solving the min number of stops problem (previous section), we can fix the number of stops variable \(\mathit{numstops}\) to the optimal value and resolve minimizing the cost. This is essentially a lexicographic approach to solving the multi-objective problem min numstops, min cost. If we do this we get as solution:
---- 76 VARIABLE f.L amounts of fuel start Station1 Station2 Station3 Station4 Station5 Station6 Station7 Station8 before 20.107 21.238 21.058 12.298 12.196 4.404 40.758 pumped 10.811 50.000 after 25.000 30.918 21.238 21.058 12.298 12.196 4.404 50.000 40.758 + Station9 Station10 Station11 Station12 Station13 Station14 Station15 Station16 Station17 before 34.256 26.657 25.740 24.434 15.125 8.046 41.595 36.490 31.415 pumped 35.301 after 34.256 26.657 25.740 24.434 15.125 43.346 41.595 36.490 31.415 + Station18 Station19 Station20 finish before 27.270 19.004 17.955 10.000 after 27.270 19.004 17.955 ---- 76 VARIABLE delta.L Station1 1.000, Station7 1.000, Station14 1.000 ---- 76 VARIABLE cost.L = 239.188 VARIABLE numstops.L = 3.000
Now we have a solution with 3 stops and a fuel cost of $239. This is my proposal for a solution strategy for the problem stated in [1].
An alternative would be to create a single optimization problem with a weighted sum objective: \[\min \> \mathit{numstops} + w \cdot \mathit{cost}\] with \(w\) a small constant to make sure that \(\mathit{numstops}\) is the most important variable. As the value of \(w\) requires some thought, it may be better to use the lexicographic approach.
Filling up the gas tank
Suppose that when pumping gas we always fill up the tank completely. This alternative is not too difficult to handle. We need to add the implication: \[\delta_g=1 \Rightarrow f_{\mathit{pumped},g}=\mathit{capacity}-f_{\mathit{before},g}\] This can be handled using the inequality: \[f_{\mathit{pumped},g} \ge \delta_g \cdot \mathit{capacity}-f_{\mathit{before},g}\]
If we add this constraint and solve the min cost model we see:
---- 84 VARIABLE f.L amounts of fuel start Station1 Station2 Station3 Station4 Station5 Station6 Station7 Station8 before 20.107 40.321 40.140 31.381 31.278 23.487 19.082 40.758 pumped 29.893 30.918 after 25.000 50.000 40.321 40.140 31.381 31.278 23.487 50.000 40.758 + Station9 Station10 Station11 Station12 Station13 Station14 Station15 Station16 Station17 before 34.256 26.657 25.740 24.434 40.691 33.612 48.249 43.144 38.068 pumped 25.566 16.388 after 34.256 26.657 25.740 50.000 40.691 50.000 48.249 43.144 38.068 + Station18 Station19 Station20 finish before 33.924 25.658 24.609 16.654 after 33.924 25.658 24.609 ---- 84 VARIABLE delta.L Station1 1.000, Station7 1.000, Station12 1.000, Station14 1.000 ---- 84 VARIABLE cost.L = 261.709 VARIABLE numstops.L = 4.000
In this case we have a little bit more gasoline left in the tank at the finish than strictly needed. Notice how in each case we pump gas, we end up with a full tank. This fill-up strategy is surprisingly expensive.
Conclusion
Here we see the advantages of using an optimization model compared to a tailored algorithm. We can adapt the optimization model to different situations. From the basic min cost model, we can quickly react to new questions.
References
- Gas Station Problem - cheapest and least amount of stations, https://stackoverflow.com/questions/58289424/gas-station-problem-cheapest-and-least-amount-of-stations
- Shamir Khuller, Azarakhsh Malekian, Julian Mestre, To Fill or not to Fill: The Gas Station Problem, ACM Transactions on Algorithms, Volume 7, Issue 3, July 2011.
Thanks for that - I was thinking of doing the same kinds of experiments after I saw the original post on StackOverflow.
ReplyDeleteI would love to know your results/observations on run time and the scalability of the approach - I suspect that specific algorithms using ideas like recursion as originally suggested the the OP would scale horribly...
May be it would be fairer to compare to a carefully crafted dynamic programming algorithm. After all, we throw here some heavy machinery at the problem.
DeleteI'm not sure I entirely buy the conclusion here: Of course, a more general piece of machinery will solve a larger class of problems (by definition?) but in this particular case, the algorithm outlined in your reference [1] is just a recurrence relation with very few bells and whistles: Their "naive approach" is really just plugging numbers into the relation, whereas their more sophisticated approach also sorts the values used by consecutive updates. And "recurrence relations + a bit of pre-computing" should also classify as an approach that's flexible enough to be applied to different situations.
DeleteIn particular, even without changing the recurrence, we can solve the first three problems here at once (the first one by returning $A(s, n, 0)$, the next one by finding the smallest value of $q$ so that $A(s, q, 0)$ is non-infinite (although this one can more easily be computed by the greedy approach in linear time), and the final one by returning $A(s, q, 0)$ for the $q$ we just found).
As a happy side-effect, this also solves the SO poster's problem of finding all optimal solutions; you can of course do that with the model at hand as well but I guess performance will suffer (can you do something smarter than excluding the solutions you find and re-running?), in particular if the problem is degenerate enough, so that the set of optimal solutions grows large.
As also mentioned in the answer on SO, there's also an algorithm available with better complexity guarantees [0], and while that one still just solves a recurrence relation, it puts enough care into the way that it does so, that that probably counts as tailoring beyond what generalizes to all other problems. It would be interesting to see how that one performs in practice.
Given your numbers, the simpler approach mentioned above outperforms the MIP solver for the 500 station case by a good amount, while you could probably still debate whether that implementation is particularly "carefully crafted"; certainly there's some low-hanging apples still.
[0]: https://arxiv.org/abs/1706.00195