This is machine translation

Translated by Microsoft
Mouseover text to see original. Click the button below to return to the English version of the page.

USA Traveling Salesman Tour 4

Posted by Cleve Moler,

Find the optimum traveling salesman tour through the capitals of the 48 contiguous states in the USA.

Contents

State Capitals

I am continuing with the workspace from my previous blog post.

John Burkardt provides a second data set, state_capitals_ll.txt, giving the longitudes and latitudes of capitals of the 48 contiguous states. (Ignore AK, DC, HI, PR and US.)

    [lon,lat] = get_capital_coordinates;

When I use these coordinates for the graph I created in my previous post, we get a more accurate picture of the USA.

    G = graph(A,cellstr(names));
    plot(G,'xdata',lon,'ydata',lat);

Travel

The travelling salesman problem, the TSP, was mathematically formulated in the 19th century. The problem is to find the closed circuit of a list of cities that travels the shortest total distance.

For 25 years MATLAB releases have included a simple demo program named travel that finds an approximate solution to the TSP through a few dozen randomly chosen points. The background of the plot, an outline the USA, is just for artistic effect.

Traveler

I've made the travel demo into a function named traveler whose input is the distance matrix D, the pairwise distances between the cities. The output is a path length miles and permutation vector p so that miles is the length of the route produced by lon(p) and lat(p).

Each time it is called, travel starts with a random permutation of the cities. Then, for several thousand iterations, it tries to reduce the length of the trip by revising the permutation in each of two simple ways. One scheme is to move one city to another position in the ordering. This example starts with the order [1:7]. It then moves city 4 to follow city 5, so the order becomes [1 2 3 5 4 6 7]. This reduces the length from 9.71 to 7.24.

   half_fig
   traveler_scheme1

The second scheme is based on the observation that any time a route crosses itself, the length can be reduced by reversing the crossed segment. We don't actually look for crossings, we just try reversing random chunks of the ordering. This example reverses the [3 4 5] in [1:7] to get [1 2 5 4 3 6 7], thereby reducing the length from 9.47 to 7.65.

   traveler_scheme2

These two heuristics are not powerful to guarantee that traveler will find the global shortest path. It's easy to get stuck in local minima. And the random path approach means that traveler is inefficient for more than a few dozen cities.

But nobody knows how to find a guaranteed shortest path for the TSP and the code for traveler is only 58 lines. I will include the code at the end of this post and in Cleve's Laboratory on the MATLAB Central File Exchange.

    close

Distance Matrix

The input to traveler is the distance matrix.

    D = distances(lon,lat);

The inputs to the function distances are the vectors lon and lat, the longitude and latitude of the cities. The output is the 48-by-48 matrix D, the great circle distance between pairs of cities. With some spherical trigonometry, lon and lat, which are essentially angles measured in degrees, are converted to miles, the lengths of "as the crow flies" geodesics on the surface of the earth. The radius of the earth is a scale factor. Here is the core of the function.

    dbtype 11:20 distances.m
11        R = 3959; % radius of the earth (mi.)
12        n = length(lon);
13        D = zeros(n,n);
14        for k = 1:n
15            for j = 1:k-1
16                D(k,j) = R*acos(sind(lat(k))*sind(lat(j)) + ...
17                    cosd(lat(k))*cosd(lat(j))*cosd(lon(k)-lon(j)));
18                D(j,k) = D(k,j);
19            end
20        end

Extrema

Which two capitals are nearest each other?

    Dmin = min(min(D(D>0)))
    [k,j] = find(D == Dmin)
    cities = names(k)
Dmin =
   34.9187
k =
    37
    19
j =
    19
    37
cities = 
  2×1 string array
    "RI"
    "MA"

Providence, Rhode Island and Boston, Massachusetts ("our fair city") are less than 35 miles apart.

What about the other extreme?

    Dmax = max(max(D))
    [k,j] = find(D == Dmax)
    cities = names(k)
Dmax =
   2.6629e+03
k =
    17
     4
j =
     4
    17
cities = 
  2×1 string array
    "ME"
    "CA"

As we might have expected, the capitals of Maine and California are the farthest apart, 2663 miles. That would require one prodigious crow.

Road Trip

Based on some experience that I will describe shortly, I am going to set the random number seed to 347 before we take our road trip with traveler.

    rng(347)
    [miles,p] = traveler(D);
    miles = round(miles)
miles =
       10818

That's an average of

    avg = miles/48
avg =
  225.3750

miles per leg.

Highlight

Let's highlight our graph of neighboring states with our traveling salesman path linking capital cities.

    Gp = plot(G,'xdata',lon,'ydata',lat,'edgecolor',turquoise);
    highlight(Gp,p,'edgecolor',darkred,'linewidth',2)
    title(miles)

There are four missing links. The path goes from UT to MT, but those states are not neighbors. We must go through ID. And the leg from FL to SC goes through GA. There are two more in the northeast. Fill in those missing links with dashed lines.

    missing_links(A,lon,lat,p,darkred)

Zoom in on the northeast.

    axis([-84 -68 33 46])

Final Order

Here the state abbreviations, ordered by this path.

    fprintf(fmt,names(p))
LA TX OK NM AZ NV CA OR WA ID MT UT CO WY SD ND 
MN WI MI OH WV PA NY VT ME NH MA RI CT NJ DE MD 
VA NC SC FL AL GA TN KY IN IL IA NE KS MO AR MS 
LA 

Last Step

It is interesting to look at the last step. After 3298 iterations, travel arrives at this situation with a path length of 10952.

The link connecting WV to VA crosses the link connecting NC to PA. It takes travel 222 more iterations, but eventually a permutation that reverses the path from VA and PA is tried. This connects WV to PA and NC to VA to produce the path shown in the previous plot. The total path length is decreased by 134 miles to 10818.

Optimum

I am confident that we have found the shortest path, but I don't have a proof.

I ran 1000 different initializations of rng. Eight runs produced the path with length 10818 that we found with rng(347). None of the runs found a shorter path. Sixteen more runs found a quite different path with length 10821, only three miles longer. Here is this next best path.

Animation

I must include an animated gif here. This initially shows every sixth step. It switches to every step when the path length is less than 12000. Few legs in random paths match edges in the neighbor graph so initially most lines are dashed. As we near the minimum, more solid lines appear.

Traveler_game

Chances that, without help, traveler will find the shortest route through the 48 capital cities are less than 1 in 100. Here is your opportunity to help guide the search. The traveler_game is a modification of travel that plots every successful step and that provides controls so that you to back up a few steps and try the random strategy again.

I will include the traveler_game in the next update, version 3.70, of Cleve's Laboratory on the MATLAB Central File Excange. That may take a few days.

There are five push buttons.

  • > Take one minimization step.
  • >> Take repeated steps, until a local minimum is reached.
  • < Reverse one step.
  • << Reverse many steps.
  • ^ Start over in a new figure window.

Here are four typical runs.

Codes

Here are the codes for traveler, distances, and path_length.

    type traveler
    type distances
    type path_length
function [len,p] = traveler(D)
%TRAVELER  Functional form of old MATLAB demo, "travel".
%   A pretty good, but certainly not the best, solution of the
%   Traveling Salesman problem.  Form a closed circuit of a
%   number of cities that travels the shortest total distance.
%
%   [len,p] = traveler(D).
%   Input: D = distances between cities with coordinates x and y.
%   Output: p a permutation, x(p) and y(p) is a path with length len.

%   Copyright 1984-2018 The MathWorks, Inc.

    n = size(D,1);
    p = randperm(n);
    len = path_length(p,D);

    for iter = 1:10000

        % Try to reverse a portion.

        pt1 = floor(n*rand)+1;
        pt2 = floor(n*rand)+1;

        lo = min(pt1,pt2);
        hi = max(pt1,pt2);

        q = 1:n;
        q(lo:hi) = q(hi:-1:lo);
        pnew = p(q);

        lennew = path_length(pnew,D);
        if lennew < len
            p = pnew;
            len = lennew;
            iterp = iter;
        end

        % Try a single point insertion

        pt1 = floor(n*rand)+1;
        pt2 = floor((n-1)*rand)+1;

        q = 1:n;
        q(pt1) = [];
        q = [q(1:pt2) pt1 q((pt2+1):(n-1))];
        pnew = p(q);

        lennew = path_length(pnew,D);
        if lennew < len
            p = pnew;
            len = lennew;
            iterp = iter;
        end
    end

    % Close the permutation.
    p(end+1) = p(1);
end

function D = distances(lon,lat)
%   D = distances(lon,lay)
%   Input: vectors lon and lat, the longitude and latitude of the cities.
%   Output: D(k,j) is the distance (in miles) between cities k and j.

%   Copyright 1984-2018 The MathWorks, Inc.
     
    % Great circle distance matrix between pairs of cities.
    % https://en.wikipedia.org/wiki/Great-circle_distance.
    
    R = 3959; % radius of the earth (mi.)
    n = length(lon);
    D = zeros(n,n);
    for k = 1:n
        for j = 1:k-1
            D(k,j) = R*acos(sind(lat(k))*sind(lat(j)) + ...
                cosd(lat(k))*cosd(lat(j))*cosd(lon(k)-lon(j)));
            D(j,k) = D(k,j);
        end
    end
end

function len = path_length(p,D)
% len = path_length(p,D)

    % Calculate current path length for traveling salesman problem.
    % This function calculates the total length of the current path
    % p in the traveling salesman problem.
    %
    % This is a vectorized distance calculation.
    %
    % Create two vectors: p and q = p([n 1:(n-1)]).
    % The first is the list of first cities in any leg, and the second
    % is the list of destination cities for that same leg.
    % (the second vector is an element-shift-right version of the first)
    %
    % Use column indexing into D to create a vector of
    % lengths of each leg which can then be summed.
    
    n = length(p);
    q = p([n 1:(n-1)]);
    len = sum(D(q + (p-1)*n));
    
end


Get the MATLAB code

Published with MATLAB® R2018a

Note

Comments are closed.

4 CommentsOldest to Newest

Hi Imre -- Yes, the Optimization Toolbox has a TSP example. But I'm not sure it is guaranteed to find the global minimum. The description says, "The smallness of the absolute gap implies that the solution is either optimal or has a total length that is close to optimal." There is also a contribution to the File Exchange, number 13680, by Joseph Kirk, that uses a genetic algorithm and that is highly regarded. I haven't used it myself. I doubt that it finds a guaranteed optimal solution. There is a question about coordinate systems. Is the earth flat? The Optimization Toolbox example uses longitude and latitude directly without computing great circle distances. That might be why it converges to a different tour. Some TSP models I noticed briefly on the Web claim to be using actual road mileage. But the biggest reason I like what I have in my blog post is that the code is easy to read, understand and use. We are going to suggest it to high school kids who are participating in M3, the SIAM/MathWorks math modeling challenge. More on that later. -- Cleve
Mary Fenelon replied on : 3 of 4
Hi Cleve—Imre is right, there are ways to find the guaranteed global shortest path and the TSP example in Optimization Toolbox implements a basic version of an exact algorithm using intlinprog. Now, it’s true that these methods may require an exponential number of steps in the worst case, but with both the theoretical and computational advancements over the last 30-40 years, many problems are solved in a reasonable time, much, much less than the worst case bound. It's often quite reasonable to use an exact approach. Bill Cook’s TSP web page is a great place to learn more. The success in solving TSPs is one facet of the larger story of the amazing progress made in solving integer programs in the same time period. To see if your tour is optimal, I modified the TSP example to use your distance function and state capitol data. Running it takes about 2 seconds on my laptop and indeed, your solution is optimal. Your approach of moving cities is a 2-opt heuristic. Others have extended that to swaps of 3 or more cities. Wikipedia cites the Lin-Kernighan-Johnson approach which uses swaps and includes ideas from genetic algorithms; the article says that it often gets high-quality tours on large problems. Whether or not you need a guaranteed optimum depends on the application. For many, a high-quality heuristic tour is good enough. How do you know if you have such a heuristic? It's ideal if you have an exact approach to compare with, as we do with the TSP. Then, as you did, try a lot of random seeds to see how often you get acceptable solutions. About the comment on the gap: if the gap is zero, the solution has been proved optimal. The comment about small values is because intlinprog stops when the gap falls below a tolerance that by default is nonzero.