

15-1 Longest simple path in a directed acyclic graph

Suppose that we are given a directed acyclic graph $G = (V, E)$ with real-valued edge weights and two distinguished vertices $s$ and $t$ . Describe a dynamic-programming approach for finding a longest weighted simple path from $s$ to $t$ . What does the subproblem graph look like? What is the efficiency of your algorithm?

We will make use of the optimal substructure property of longest paths in acyclic graphs. Let $u$ be some vertex of the graph. If $u = t$, then the longest path from $u$ to $t$ has zero weight. If $u \ne t$, let $p$ be a longest path from $u$ to $t$. Path $p$ has at least two vertices. Let $v$ be the second vertex on the path. Let $p'$ be the subpath of $p$ from $v$ to $t$ ($p'$ might be a zero-length path). That is, the path $p$ looks like

$$u \to v \overset{p'}{\leadsto} t.$$

We claim that $p'$ is a longest path from $v$ to $t$.

To prove the claim, we use a cut-and-paste argument. If $p'$ were not a longest path, then there exists a longer path $p''$ from $v$ to $t$. We could cut out $p'$ and paste in $p''$ to produce a path $u \to v \overset{p''}{\leadsto} t$ which is longer than $p$, thus contradicting the assumption that $p$ is a longest path from $u$ to $t$.

It is important to note that the graph is acyclic. Because the graph is acyclic, path $p''$ cannot include the vertex u, for otherwise there would be a cycle of the form $u \to v \leadsto u$ in the graph. Thus, we can indeed use $p''$ to construct a longer path. The acyclicity requirement ensures that by pasting in path $p''$ , the overall path is still a simple path (there is no cycle in the path). This difference between the cyclic and the acyclic case allows us to use dynamic programming to solve the acyclic case.

Let $dist[u]$ denote the weight of a longest path from $u$ to $t$. The optimal substructure property allows us to write a recurrence for $dist[u]$ as

$$ dist[u] = \begin{cases} 0 & \text{if $u = t$}, \\ \max\limits_{(u, v)\in E}{w(u, v) + dist[v]} & \text{otherwise}. \end{cases} $$

This recurrence allows us to construct the following procedure:

LONGEST-PATH-AUS(G, u, t, dist, next)
    if u == t
        dist[u] = 0
        return (dist, next)
    else if next[u] ≥ 0
        return (dist, next)
    else next[u] = 0
        for each vertex v ∈ G.Adj[u]
            (dist, next) = LONGEST-PATH-AUX(G, v, t, dist, next)
            if w(u, v) + dist[v] > dist[u]
                dist[u] = w(u, v) + dist[v]
                next[u] = v
    return (dist, next)

(See Section 22.1 for an explanation of the notation $G.Adj[u]$.)

$\text{LONGEST-PATH-AUX}$ is a memoized, recursive procedure, which returns the tuple $(dist, next)$. The array $dist$ is the memoized array that holds the solution to subproblems. That is, after the procedure returns, $dist[u]$ will hold the weight of a longest path from $u$ to $t$. The array $next$ serves two purposes:

It holds information necessary for printing out an actual path. Specifically, if $u$ is a vertex on the longest path that the procedure found, then $next[u]$ is the next vertex on the path.
The value in $next[u]$ is used to check whether the current subproblem has been solved earlier. A value of at least zero indicates that this subproblem has been solved earlier.

The first if condition checks for the base case $u = t$. The second if condition checks whether the current subproblem has already been solved. The for loop iterates over each adjacent edge($u, v)$ and updates the longest distance in $dist[u]$.

What is the running time of $\text{LONGEST-PATH-AUX}$? Each subproblem represented by a vertex $u$ is solved at most once due to the memoization. For each vertex, we examine its adjacent edges. Thus, each edge is examined at most once, and the overall running time is $O(E)$. (Section 22.1 discusses how we achieve $O(E)$ time by representing the graph with adjacency lists.)

The $\text{PRINT-PATH}$ procedure prints out the path using information stored in the next array:

PRINT-PATH(s, t, next)
    u = s
    print u
    while u != t
        print "→" next[u]
        u = next[u]

The $\text{LONGEST-PATH-MAIN}$ procedure is the main driver. It creates and initializes the $dist$ and the $next$ arrays. It then calls $\text{LONGEST-PATH-AUX}$ to find a path and $\text{PRINT-PATH}$ to print out the actual path.

LONGEST-PATH-MAIN(G, s, t)
    n = |G.V|
    let dist[1..n] and next[1..n] be new arrays
    for i = 1 to n
        dist[i] = -∞
        next[i] = -1
    (dist, next) = LONGEST-PATH-AUX(G, s, t, dist, next)
    if dist[s] == -∞
        print "No path exists"
    else print "The weight of the longest path is" dist[s]
        PRINT-PATH(s, t, next)

Initializating the dist and next arrays takes $O(V)$ time. Thus the overall running time of $\text{LONGEST-PATH-MAIN}$ is $O(V + E)$.

Alternative solution

We can also solve the problem using a bottom-up aproach. To do so, we need to ensure that we solve "smaller" subproblems before we solve "larger" ones. In our case, we can use a topological sort (see Section 22.4) to obtain a bottom-up procedure, imposing the required ordering on the vertices in $\Theta(V + E)$ time.

LONGEST-PATH2(G, s, t)
    let dist[1..n] and next[1..n] be new arrays
    topologically sort the vertices of G
    for i = 1 to |G.V|
        dist[i] = -∞
    dist[s] = 0
    for each u in topological order, starting from s
        for each edge (u, v) ∈ G.Adj[u]
            if dist[u] + w(u, v) > dist[v]
                dist[v] = dist[u] + w(u, v)
                next[u] = v
    print "The longest distance is" dist[t]
    PRINT-PATH(s, t, next)

The running time of $\text{LONGEST-PATH2}$ is $\Theta(V + E)$.