2

So let's say I have an acyclic, directed graph G with nodes a0, a1, a2, b0, b1, b2, c0, c1, c2 and I know both the outgoing distance for each node to any of it's neighbors. What algorithm can I use to find the longest path between any two nodes that is less than a given length?

EDIT: I have looked at the Wikipedia page for the Longest path problem but I don't know whether to use the algorithm outlined in the 'Acyclic graphs and critical paths' section or the one in the 'Parameterized complexity' section. I'd have problems with either case. In the former, the algorithm requires you to have the incoming distance (I have the outgoing distance) and to be honest, the description in the latter goes over my head as far as implementation.

EDIT2: My completed graph is as follows

graph = {
             "a0": set(["a1", "b0])
             "a1": set(["a2", "b1"])
             "a2": set(["b2"])
             "b0": set(["b1", "c0"])
             "b1": set(["b2", "c1"])
             "b2": set(["c2"])
             "c0": set(["c1"])
             "c1": set(["c2"])
             "c2": set([])
}

I also have the outgoing distance in a separate dictionary e.g.

edge_distance = {
                 "a0": 0
                 "a1": 4
                 "a2": 12
                 "b0": 2
                 "b1": 1
                 "b2": 4
                 "c0": 7
                 "c1": 2
                 "c2": 1
}

Technically the only node for which I have the incoming distance is c2

The path I want to find is a0 to c2

EDIT3: My graph, toposorted:

   c2 b2 c1 a2 b1 c0 a1 b0 a0
chad
  • 385
  • 2
  • 8
  • 1
    How big is the DAG? Brute-forcing this on even a largish DAG shouldn't be too bad. – Patrick Maupin Jul 29 '15 at 02:38
  • 1
    Have you ever heard of A* search? This is an A Star Search problem with a modified stopping condition. Instead of stopping on the first solution you'll want to stop on the last solution before failure. – Matt Jul 29 '15 at 02:44
  • I think that this is close enough to you problem that very little extra code would be necessary: http://stackoverflow.com/questions/17985202/networkx-efficiently-find-absolute-longest-path-in-digraph pay attention to the post by Aric – jfish003 Jul 29 '15 at 03:28
  • 1
    Wikipedia says its NP-hard. That means its hard. Looks like its necessary to enumerate all paths between all pairs of distinct nodes discarding those that exceed the given length early. What's an algo for enumerating all paths between two nodes in a DAG? There is some info on this in other posts such as http://stackoverflow.com/questions/9535819/find-all-paths-between-two-graph-nodes –  Jul 29 '15 at 03:35
  • No, it's not NP-hard if it's a DAG. – Patrick Maupin Jul 29 '15 at 03:59
  • The implementation is actually fairly trivial. If you want to start collecting answers, you should probably post some Python that defines your data set. – Patrick Maupin Jul 29 '15 at 04:10
  • Also, if you have a dataset, using Python to convert between outgoing distance and incoming distance is trivial as well (but since it's a DAG, why do you care? The total distance is the same whether you're coming or going, right?) – Patrick Maupin Jul 29 '15 at 04:13
  • @PatrickMaupin I just added the graph – chad Jul 29 '15 at 12:11
  • Ah, now I see why you care -- your outgoing distance is a single value that is used for all the edges leaving a given node, rather than a value per target node :-) In any case, if you do a topological sort, and then work backwards, you can easily annotate each node with the set of possible distances to the target node. When you have finished your starting node, you just pick its largest distance that is smaller than your maximum length. – Patrick Maupin Jul 29 '15 at 12:24
  • @PatrickMaupin I'm very new (approx. a week) at programming/computer science etc. so if this is obvious, I apologize. When you say that I'll be able to "annotate each node with the set of possible distances to the target node" do you mean that for every node, I have to generate all possible distances to c2? By what means am I to figure out the possible paths from each node? – chad Jul 29 '15 at 12:45
  • You will generate all possible distances to c2 for each node, but you will do so by adding the distance out of each node to all the other nodes where you already have the distances. For your dataset, the alphabetic sort happens to be the same as the topological sort (that won't happen in real life), but anyway, you start with c2, and annotate its distance as a set containing the value 0, and then go backwards from c2 and annotate each node by adding its outgoing distance to each of the distances in each of the nodes it is directly connected to. – Patrick Maupin Jul 29 '15 at 13:04
  • When I annotate, should these all be in the form of one big list? Also, this is called dynamic programming right? Breaking a large problem down into overlapping subproblems – chad Jul 29 '15 at 13:09
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/84578/discussion-between-chad-and-patrick-maupin). – chad Jul 29 '15 at 13:44
  • I realized my answer uses a toposort that is backwards from the one you give. Is your node mapping from destination nodes to source nodes? My answer assumes the other direction. It's trivial to modify the code or write some code to swap them around, though. – Patrick Maupin Jul 30 '15 at 14:21

1 Answers1

1

'''

This meets the problem spec as given, except that it shows all distances between two points. It is trivial to choose the largest one that is below some threshold.

It is not at all optimized, and is not general, in that the problem spec as given has a single distance for every edge leaving any particular node. Nonetheless, it should be easy to modify to make more general.

Also, the major variables are global, but this is easily remedied as well.

NOTE: I am not sure the problem spec is quite correct, because the distance out of a0 is given as 0, but the distance out of c2 (which doesn't go connect anywhere) is given as 1. In any case the scenario (and the example code) is quite contrived, because it is highly unlikely that all edges leaving a given node would have the same length in real life.

'''

if 1:

import collections

# Data given in problem description

# Keys are starting nodes, values are destination nodes.

graph = {
            "a0": set(["a1", "b0"]),
            "a1": set(["a2", "b1"]),
            "a2": set(["b2"]),
            "b0": set(["b1", "c0"]),
            "b1": set(["b2", "c1"]),
            "b2": set(["c2"]),
            "c0": set(["c1"]),
            "c1": set(["c2"]),
            "c2": set([]),
}

# Length of every outbound edge from the node

edge_distance = {
                "a0": 0,
                "a1": 4,
                "a2": 12,
                "b0": 2,
                "b1": 1,
                "b2": 4,
                "c0": 7,
                "c1": 2,
                "c2": 1,
                }

def sort_graph():
    ''' Develop a sorted list using Kahn's algorithm
    '''
    # Number of incoming vertices to each node
    incoming = collections.defaultdict(int)

    #All nodes with vertices exiting them
    startnodes = set()

    # All nodes with vertices entering them
    endnodes = set()

    for start in graph:
        for end in graph[start]:
            startnodes.add(start)
            endnodes.add(end)
            incoming[end] += 1

    ordered = []
    startnodes -= endnodes
    while startnodes:
        start = startnodes.pop()
        ordered.append(start)
        for end in graph[start]:
            incoming[end] -= 1
            if not incoming[end]:
                startnodes.add(end)
    if sum(incoming.values()):
        raise ValueError("Graph has at least one cycle")

    return ordered

ordered = sort_graph()

def calc_dists(start):
    ''' Returns a dictionary containing all possible
        distances from a given start node to each other
        node, expressed as a set of possible distances
        for each target node.  The set will be empty
        if the target node is not reachable from the
        start node.
    '''
    dist_to_node = collections.defaultdict(set)
    dist_to_node[start].add(0)
    for start in ordered:
        cumulative = dist_to_node[start]
        dist = edge_distance[start]
        for end in graph[start]:
            dist_to_node[end].update(dist + prev for prev in cumulative)
    return dist_to_node

# Show all possible distances between a0 and c2
print(calc_dists('a0')['c2'])
Patrick Maupin
  • 8,024
  • 2
  • 23
  • 42