Extra material: Traveling Salesman Problem
Introduction
This notebook can be seen as an addition to both Chapter 3 (since it concerns mixed integer models, and analyses differences in strength) and to Chapter 4 (since it addresses a network optimization problem).
In this notebook we illustrate several things:
That different correct models for the same (mixed) integer optimization problem may require substantially different effort to optimize.
That different models can be combined, and why that may become interesting.
That very large models can be specified on an as-needed basis, by adding only the constraints that are needed for a specific instance.
That exploiting knowledge about the instance (e.g. symmetry of the cost function) may dramatically reduce optimization effort.
First, we will deal with how to formulate this problem at all, to which there are multiple answers. Next, we shall solve each of the equivalent formulations with a solver and compare the solutions and solution times. Note that an important formulation requires an exponential number of constraints, rendering it impractical to implement explicitly. In the end, we will delve into how classic optimization techniques can be used to attack this problem in a scalable way.
This notebook focuses on the Traveling Salesman Problem which may be the most studied (combinatorial) optimization problem.
A wonderful source you may like to visit is maintained by Prof. Bill Cook. You may consider reading this enlightening and exciting book on the pursuit.
The main challenge when modelling the TSP is the connectivity of the solution.
The Wikipedia page lists two well-known models, the Miller–Tucker–Zemlin and the Dantzig–Fulkerson–Johnson.
These are not the only TSP models; see this overview or this one. A thorough comparison is also available.
Besides the practical computation difficulties, different formulations also have different merits as basis to model different business requirements. A good example is this very rich routing problem for which the computational impact of different model choices for the TSP parts of it has also been studied.
As said, we focus on the MTZ and the DFJ models.
A first inspection of these models may invite you to choose the former and abandon any consideration of the latter on the grounds of compactness: the MTZ model consists of a polynomial number of variables and constraints, while the DFJ model requires an exponential number of constraints.
In the case of the TSP, as this notebook shows, the strength of models is of paramount importance when we need to actually solve the problem.
In the end, compactness may not be all that matters!
Preamble
Let us start importing the modules that we will need.
Solvers and utilities
We will illustrate the differences between free and commercial solvers, and also the advantages of having the ability of modifying a model while solving with a persistent solver.
A transformation to obtain the linear relaxation of a given model.
A simple function to display an elapsed number of seconds as a digital clock.
Data format
Since the TSP is very well known and widely studied, there exists a de facto standard for the data format of its instances.
This is important due to the richness of widely studied benchmark instances. Later in this notebook we will read such instances from the internet; for now, we start by generating random instances of convenient sizes and introduce some convenience functions to deal with their representation in Python data structures.
The most important part of the data is for now the NODE_COORD_SECTION which we model as a list of triplets: (index,x,y).
We limit ourselves to the rounded integer values of the EUC_2D option for EDGE_WEIGHT_TYPE, as described in TSPLIB95, for the lengths of the route segments.
Later we will read instances in that format from the internet. Now we generate instances of convenient sizes.
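As a minimal sketch of the EUC_2D convention (the function name is ours, not from the notebook's code): TSPLIB95 rounds the Euclidean distance to the nearest integer, which for non-negative values amounts to `int(d + 0.5)`.

```python
import math

def euc_2d(p, q):
    """TSPLIB95 EUC_2D distance: Euclidean distance between two
    (x, y) points, rounded to the nearest integer (nint)."""
    xd, yd = p[0] - q[0], p[1] - q[1]
    return int(math.sqrt(xd * xd + yd * yd) + 0.5)

print(euc_2d((0, 0), (3, 4)))  # 5
```

Note that rounding makes the triangle inequality hold only approximately, which is why TSPLIB95 fixes this convention explicitly.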
Functions to handle instances and solutions
When we read (or generate) instances, we create Python data structures that we manipulate with the functions below.
First impressions
We generate a first instance of small size to start playing with.
Just to show that the node labels may be very generic, we replace the numerical (1-based) indices by letters.
The models
Most of the models consider the decision variables $x_{ij} \in \{0,1\}$ for all pairs of nodes $i \neq j$. Later we will see how to reduce this number of variables to (less than) half in the case of symmetric distances.
All models aim at minimizing the total distance traveled: $\min \sum_{i \neq j} c_{ij} x_{ij}$.
The models share a core: they model the degree of the nodes and the number of node pairs chosen in the solution. In the sequel, we refer to a chosen pair of nodes as an arc, which we may also call an edge in case the distances are symmetric. Pair (or arc) $(i,j)$ being chosen means that node $j$ follows node $i$ in the solution, i.e. $x_{ij} = 1$.
Models based on arcs (i.e. listing all pairs of potential choices) share a bipartite matching or linear assignment core: $$\sum_{j \neq i} x_{ij} = 1 \quad \forall i, \qquad \sum_{i \neq j} x_{ij} = 1 \quad \forall j.$$ As we learned in Chapter 4, imposing the integrality (binary nature, in this case) of the decision variables on the core model above is redundant, as it is naturally integral: its constraint matrix is totally unimodular (TUM).
Since this property vanishes as soon as we complement the model in any known way to reach a valid TSP model (despite some published work erroneously claiming otherwise; see also entry 17 of this wonderful legacy page), we declare the variables as binary.
The core model forces the degree of each node to be 2 (one arc entering the node and one leaving it) and also forces exactly $n$ arcs to be used (that this is implied by the model above can easily be seen by adding up either of the two groups of constraints).
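The degree conditions of the core are easy to check on a candidate solution. Below is a minimal sketch (the helper name and the successor-mapping representation are ours, not from the notebook's code) that verifies each node has out-degree and in-degree 1:

```python
def satisfies_core(succ, nodes):
    """Check the assignment core on a successor mapping (node -> next node):
    each node must have exactly one outgoing and one incoming chosen arc."""
    out_ok = set(succ.keys()) == set(nodes)         # out-degree 1 for every node
    in_ok = sorted(succ.values()) == sorted(nodes)  # in-degree 1 for every node
    return out_ok and in_ok

# a 2-cycle plus a 3-cycle: satisfies the core, yet is not a single tour
succ = {"a": "b", "b": "a", "c": "d", "d": "e", "e": "c"}
print(satisfies_core(succ, "abcde"))  # True
```

The example deliberately shows that a union of disjoint cycles passes the core check, which is exactly the connectivity gap the models below must close.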
Models vary in the way they enforce connectivity, as the core part fails to do that. In fact, $x_{ii} = 1$ for all $i$ is feasible for the core, but the TSP clearly leaves every vertex.
An immediate improvement to the core model is just to fix the self-loops to 0, i.e. $x_{ii} = 0$ for all $i$, but that is not enough, as pairwise loops become the next culprit.
An inductive reasoning on how to forbid all these sub-tours leads to the well-known Dantzig, Fulkerson and Johnson model, which formulates the TSP by adding the constraints $$\sum_{i \in S} \sum_{j \in S} x_{ij} \leq |S| - 1 \quad \forall S \subsetneq N, \ |S| \geq 2.$$ The model above prohibits the choice of enough arcs on each proper subset of nodes to form a sub-tour on those nodes.
An equivalent version of the same constraints seems to be more popular, enforcing each subset of nodes to be connected to its complement: $$\sum_{i \in S} \sum_{j \notin S} x_{ij} \geq 1 \quad \forall S \subsetneq N, \ |S| \geq 2.$$
Since this model requires an exponential number of (so-called sub-tour elimination) constraints we may prefer to start with a more compact model.
Fortunately, a compact model exists. This ingenious model, devised by C. E. Miller, A. W. Tucker and R. A. Zemlin in 1960, adds sequence variables $u_i$ and replaces the connectivity (also known as sub-tour elimination) constraints, which are exponential in number, by circa $n^2$ sequence constraints: $$u_i - u_j + n x_{ij} \leq n - 1 \quad \forall i \neq j, \quad i, j \geq 2.$$
In fact, this model has been strengthened by several authors, see this paper.
The first improvement, due to Desrochers and Laporte, lifts these constraints to $$u_i - u_j + (n-1) x_{ij} + (n-3) x_{ji} \leq n - 2 \quad \forall i \neq j, \quad i, j \geq 2.$$
The best known so far, by Gouveia and Pires, lifts them even further. These versions of the MTZ model are equivalent in how they model connectivity, but have increasingly tighter linear relaxations.
Below we will consider three MTZ models: the basic one, the one improved by Desrochers and Laporte and the best one so far by Gouveia and Pires.
Building blocks for TSP models
As the good programmers that we all are, we will strive for modularity and reuse. We will not regret it!
Note that the code below is a bit tricky: the functions do modify the model that they take as an argument. The fact that they return the model could lead one to believe that this is not the case, but it is. This construction has the convenience of allowing cascading (composing) function calls.
Examining solutions
A TSP solution is a permutation of the nodes, complemented with the repetition of the first node at the end. Each model has a way to encode the solutions, but the translation into a permutation of nodes differs per model. The most straightforward role is played by the $u$ variables from the MTZ models, since they list the order of the nodes in the optimal permutation. Models that confine themselves to arc or edge variables require some bookkeeping to translate the optimal solutions into node permutations.
Experiments
We can see that the solutions found may differ due to the symmetry of the costs, but when drawn they are all the same as the one below (the last one found above). Note the special role played by the first node: the tour starts and ends there, therefore the $u$ variables list the order of the intermediate nodes.
How does the strength of the MTZ models differ?
As we have seen in Chapter 3, a strong MILO model is characterized by a small gap between the optimal values of its linear counterpart (the linear relaxation) and of the true (mixed integer) optimal solution.
We can see above the optimal values of the linear relaxations of the three different MTZ models for (on each line) the same instance.
Note that the instance of 10 nodes that we solved before is the same as in the table above (due to seeding the generator).
The increase for the third version is quite impressive! How does that affect the times required to solve the integer versions?
Small spoiler about the remainder of this notebook: it is not wise to try to solve instances of more than 15 nodes with these models!
Note that the times above are likely to be very different if you run this notebook on your own machine, even if using the same solver and version. In particular, if your machine has more than two (virtual) processors the differences between the models will be much smaller. In fact, it may happen that the times taken to optimize do not directly reflect the strength of the model.
Exploring the models
While in our book we often follow simple steps (model, implement, feed data, solve), it is wise to consider some iterations that improve the model.
We will see how to combine MTZ and DFJ, how that insight leads to actually being able to solve DFJ without the need for MTZ, how DFJ can be specialized to the case of symmetric costs, and the benefits of doing so (if that is indeed the case for the instances at hand).
A tale of two models
The DFJ model is hard to use directly, since we would need to explicitly declare a number of constraints exponential in $n$, which soon becomes impractical and, soon after that, even impossible.
Our first attempt to grasp its use follows the idea in this educational paper of combining DFJ and MTZ.
The main idea is as follows:
Start by solving the linear assignment model.
For each sub-tour in the solution, identify its set of nodes $S$ and add the corresponding DFJ sub-tour elimination constraint.
Solve again and repeat a pre-specified number (nof) of times.
At the end of the procedure above we end up with the DFJ model confined to some sub-tour elimination constraints. These are redundant for the (mixed integer) MTZ model, but not for its linear relaxation. We add the MTZ variables and constraints and solve the resulting mixed model.
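The key step in this loop is finding the sub-tours of an assignment-feasible solution. A minimal sketch of this separation step (our own helper, assuming the solution is given as a successor mapping) decomposes the solution into cycles; each cycle's node set $S$ then yields one DFJ cut:

```python
def subtours(succ):
    """Decompose a successor mapping (each node -> next node) into cycles.
    Every cycle shorter than the whole node set is a sub-tour whose node
    set S yields one DFJ cut: sum of x[i,j] over i,j in S <= |S| - 1."""
    unvisited, cycles = set(succ), []
    while unvisited:
        cycle, node = [], next(iter(unvisited))
        while node in unvisited:
            unvisited.remove(node)
            cycle.append(node)
            node = succ[node]
        cycles.append(cycle)
    return cycles

print(subtours({1: 2, 2: 1, 3: 4, 4: 5, 5: 3}))  # two sub-tours
```

If the function returns a single cycle, the solution is a tour and no cut is needed; otherwise each returned cycle is translated into a constraint and the model is solved again.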
Wait, we can solve DFJ after all!
We just saw that we can implement a tiny bit of DFJ on an as-needed basis and complement it with MTZ in order to solve the combined model faster than with MTZ alone.
But the process of discovering the part of DFJ that matters for the instance at hand may actually lead to solving it without the need for MTZ, and hence only using (the right part of) DFJ.
What we do is known in the literature as constraint (or cut) generation, sometimes also known as lazy constraints.
The fundamental ingredient is known as the separation problem. This is the ability to discover the important (but still missing) constraints that are in fact violated by the (infeasible) solution at hand. Once these constraints are added, the previous solution becomes separated from the new feasible set and will not be found again, ensuring termination of this iterative procedure.
We could not solve this instance with the (free, community edition of the) commercial solver, but the free solver did it in less than one minute!
We could be very happy about this, but now we realize that, for all the instances that we have been solving, the costs are symmetric, and in fact we do not need to have both $x_{ij}$ and $x_{ji}$.
In essence, our underlying graph has been asymmetric (hence, with directed arcs) but can be regarded as symmetric (with undirected edges).
We do need to be careful about what that means for the model!
Symmetry
So far, our models include variables $x_{ij}$ for all $i \neq j$. However, if the cost matrix happens to be symmetric (and $c$ is symmetric for all instances studied in this notebook) we may ignore the direction, i.e. consider only one of the two arcs $(i,j)$ or $(j,i)$.
The number of edges, $\frac{n(n-1)}{2}$ for $i < j$, is (less than) half of the number of arcs.
The symmetric model can be easily derived from the DFJ model: the linear assignment structure is replaced by constraining each node to have degree 2.
Note that no variant of MTZ is known for the symmetric case!
Are you amazed about the speedup (same solver)?
You should be!
Revisiting the implementation: lazy constraint generation!
Adding constraints on a need basis required so far that we repeatedly solved the problem from scratch, each time with a slightly larger model than before. In fact, some solvers allow the constraints to be added while solving. And pyomo offers ways to use that ability!
Callbacks
So far, we extended our model by adding the needed constraints and then solved again.
Some solvers support a more sophisticated way to add constraints while solving. The mechanism to do so is called a callback: a function that we define and offer to the solver to call while solving the problem.
The solver can do this at different moments. For us, the moment of interest is when the solver finds a feasible solution: we can check if it satisfies all the not yet defined constraints, and if needed add some new constraints.
Pyomo supports callbacks for use with gurobi, cplex and xpress. These are all commercial solvers, of which the free (community edition) of gurobi is the most generous, allowing for up to 2000 variables and/or constraints.
There is an additional benefit here! While the commercial solver would not solve this instance in the previous implementation with repeated solves (since the instance soon includes a number of constraints beyond the free tier) it does now, since the additional constraints added while solving don't count!
Gray area
Defining a callback forces us to adhere to solver specific knowledge.
One may argue that the flexibility offered by pyomo is now somewhat lost.
Maybe this is the moment to consider writing the whole model in the solver's dialect (gurobipy in this case) and bypass the overhead of pyomo altogether.
Since the number of variables $\frac{n(n-1)}{2}$ dominates the number of explicit constraints $n$, the highest $n$ that fits the community edition is $n = 63$, yielding $1953$ variables and $63$ constraints.
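The arithmetic behind this bound is easy to verify; a small sketch (our own helper, assuming a limit of 2000 variables as in the community edition) finds the largest $n$ whose $\frac{n(n-1)}{2}$ edge variables fit:

```python
def largest_fitting_n(limit=2000):
    """Largest n such that the symmetric model's n*(n-1)/2 edge
    variables stay within the given variable limit."""
    n = 2
    while (n + 1) * n // 2 <= limit:
        n += 1
    return n

n = largest_fitting_n()
print(n, n * (n - 1) // 2)  # 63 1953
```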
Very fast, right? Maybe we can try some of the benchmarks now!
Famous benchmark instances from the Internet
Well, these sizes soon become too large for us! Definitely for the community edition of gurobi (we can go up to the first two). Actually, this technique becomes impractical soon after that, even with a gurobi license in place. No matter how good the model is, the problem is still very hard! The solutions listed above were found by highly researched solvers, specific to the TSP.
Beyond this?
Maybe there are some things related to the TSP that you need to address which are not covered here.
Variants
Simple variants of the problem, such as not needing to return to the start node (the shortest Hamiltonian path, or open TSP) or using a given number of routes instead of one (the multiple TSP), are already within your reach.
These variants can be solved by transformations of the instance, or by remodeling.
We will illustrate the transformation approach, and leave the remodeling to you as exercises.
Open TSP
Suppose (without loss of generality) that you start at node $1$ and should visit all other nodes, without the need to return to node $1$.
Then, it is enough to modify your cost matrix by making all costs $c_{j1} = 0$ for $j \neq 1$ and solve the corresponding TSP.
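A minimal sketch of this transformation (our own helper, assuming a dense cost matrix as a list of lists with 0-based indices and start node `s`): zeroing the cost of every arc back into the start makes the return leg of the closed tour free, so an optimal closed tour of the modified instance corresponds to an optimal open path.

```python
import copy

def open_tsp_costs(C, s=0):
    """Make returning to start node s free: the optimal tour of the
    modified (asymmetric) instance is an optimal open path from s."""
    C2 = copy.deepcopy(C)
    for j in range(len(C2)):
        if j != s:
            C2[j][s] = 0  # arc j -> s now costs nothing
    return C2

C = [[0, 2, 9], [2, 0, 6], [9, 6, 0]]
C2 = open_tsp_costs(C)  # C2 is asymmetric even though C was symmetric
```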
Note that this modified TSP is asymmetric regardless of the nature of the original instance!
A simple closure does the trick! Notice that we now index with the inner indices for code simplicity.
Multiple TSP
Suppose that you start at some node with $m$ sales persons, who should partition the other nodes among them and return to the starting node.
Then, it is enough to replicate that node into $m$ replicas, with a prohibitively large cost between all pairs of replicas (to force each sales person onto their own route).
Note that this modified TSP stays symmetric if instance given was symmetric!
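The replication trick can be sketched as follows (our own helper, not the notebook's code; `big` stands for the prohibitively large replica-to-replica cost, and the matrix is a 0-based list of lists):

```python
def replicate_home(C, home, m, big=10**6):
    """Replace node `home` by m replicas. Travel between two replicas
    costs `big`, which forces each sales person onto a separate route;
    all other costs are copied from the original matrix."""
    others = [i for i in range(len(C)) if i != home]
    size = len(others) + m  # indices 0..m-1 are replicas of `home`
    C2 = [[0] * size for _ in range(size)]
    for a in range(size):
        for b in range(size):
            i = home if a < m else others[a - m]
            j = home if b < m else others[b - m]
            if a < m and b < m and a != b:
                C2[a][b] = big  # replica-to-replica: forbidden
            else:
                C2[a][b] = C[i][j]
    return C2
```

Since replica-to-replica costs are placed symmetrically, a symmetric input matrix yields a symmetric output matrix, as claimed above.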
Obviously, you can also combine these extensions!
For this instance, the chosen node seems to be an ideal candidate for the home base of the sales persons.
We give the replica a label that was not yet used.
Allowing the sales persons to stay at home leads to only one of them leaving and hence to one single route.
Portability
Although we are accustomed to data-driven modeling, reflecting the nature of the domain in, for instance, the indices of the variables, we are now dealing with a problem whose input is, in fact, just a square matrix of costs.
In such "pure" cases it is, in fact, better to strive for portable code.
In fact, SymmetricTSPViaGurobi is already portable since it takes its data from C only.
If you wish, you can copy and paste the code below for your own use. Please attribute! And remember that some functions are defined above, for reuse.
Note that it is trivial to express the solution in the original indices.
Note that if your instance is symmetric, solving the asymmetric model is much more expensive.