Monday, July 27, 2015

Calculating the Bacon number: a shortest path network

The Bacon number measures how far an actor is removed from Kevin Bacon in terms of playing in the same movie. I received a data set with actors and movies. As this problem can be written as standard network model we can easily solve this with an LP solver.


Networks are often an important part of math programming models. We see them in supply chain models, in models dealing with electricity grids, regional trade flows, transportation etc. Not all users find this easy to model. Here some notes:

  • In this model we moved some of the complexity into the data manipulation step. As a result the equations are very simple. Complex equations are much more difficult to debug.
  • The sparse vector b only has two elements. In GAMS only the non-zero elements are stored so this is quite efficient.
  • Sometimes users use three different type of flow balance equations. I prefer a single equation block with source and sink modeled via this b vector.
  • I have seen implementations where movies are also part of the network. In this case I only have “actor-nodes”, which makes the network smaller.
  • The set arcs plays an important role in the topology of the network.
  • The network is sparse: many arcs do not exist. This is beneficial for the LP solver.
  • I sometimes see users creating a large cost to indicate a link can not be chosen. I think it is often much better to exclude such links from the set arcs.
  • In this case we have modeled a directed graph. I prefer to use directed links as the results are easier to interpret and the modeling is much more straightforward. When using undirected arcs we need to look at the sign of the flows. 
  • The Bacon number is the length of the shortest path where each link has a unit length. Not each optimal solution is unique. This is a pure network model; in practice we often see more complex models with a network component.
  • For pure networks we can easily form and solve the dual. I prefer to stay closer to the problem with a more intuitive primal formulation.
  • Of course Google can find these numbers for you: