Maximum common subgraph in a directed graph

Time: 2017-03-30 04:17:37

Tags: python python-3.x graph networkx

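I am trying to represent a set of sentences as a directed graph in which each word is a node. If a word is repeated, no new node is created; the existing node is reused. Let's call this graph MainG.

After that, I take a new sentence, build a directed graph for it (call this graph SubG), and then look for the maximum common subgraph of SubG in MainG.

I am using the NetworkX API with Python 3.5. My understanding is that this problem is NP-complete for general graphs but can be solved in linear time for directed graphs. One of the links I referred to: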

How can I find Maximum Common Subgraph of two graphs?

I tried running the following code:

import networkx as nx
import nltk  # word_tokenize requires NLTK's tokenizer data to be installed

class GraphTraversal:
    def createGraph(self, sentences):
        # One node per distinct token, one edge per pair of adjacent tokens.
        DG = nx.DiGraph()
        tokens = nltk.word_tokenize(sentences)
        for i in range(1, len(tokens)):
            DG.add_edge(tokens[i-1], tokens[i], weight=1)
        return DG


    def getMCS(self, G_source, G_new):
        """
        Creator: Bonson
        Return the MCS of the G_new graph that is present
        in the G_source graph
        """
        # topological_sort returns a generator in NetworkX 2.0 and later, so it
        # is materialized here. It raises NetworkXUnfeasible if G_new has a cycle.
        order = list(nx.topological_sort(G_new))
        print("##### topological sort #####")
        print(order)

        objSubGraph = nx.DiGraph()

        # Walk consecutive pairs of nodes in the topological order.
        for i in range(len(order)-1):
            if order[i] in G_source and order[i+1] in G_source:
                print("Contains Nodes {0} -> {1} ".format(order[i], order[i+1]))
                objSubGraph.add_edge(order[i], order[i+1])
            else:
                print("Does Not Contain Nodes {0} -> {1} ".format(order[i], order[i+1]))

        return objSubGraph


obj_graph_traversal = GraphTraversal()
SourceSentences = "A series of escapades demonstrating the adage that what is good for the goose is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story ."
SourceGraph = obj_graph_traversal.createGraph(SourceSentences)


TestSentence_1 = "not much of a story"    # This works
TestSentence_1 = "not much of a story of what is good"    # This does NOT work
TestGraph = obj_graph_traversal.createGraph(TestSentence_1)

obj_graph_traversal.getMCS(SourceGraph, TestGraph)

The second test sentence does not work: it fails when I attempt the topological sort.
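One likely cause of the failure is that the second sentence repeats the word "of", so reusing its node closes a cycle, and a topological order only exists for acyclic graphs. The check below is a minimal sketch of that idea. `build_word_graph` is a hypothetical helper, and it assumes plain whitespace splitting in place of `nltk.word_tokenize`.

```python
import networkx as nx

def build_word_graph(sentence):
    """Hypothetical helper: one node per distinct word, one edge per adjacent pair.

    Assumption: str.split() stands in for nltk.word_tokenize.
    """
    tokens = sentence.split()
    graph = nx.DiGraph()
    graph.add_edges_from(zip(tokens, tokens[1:]))
    return graph

works = build_word_graph("not much of a story")
fails = build_word_graph("not much of a story of what is good")

# A topological sort is only defined for directed acyclic graphs.
print(nx.is_directed_acyclic_graph(works))
print(nx.is_directed_acyclic_graph(fails))

# find_cycle returns the edges of one cycle in the graph.
print(nx.find_cycle(fails))
```

Any approach that relies on `topological_sort` would hit the same limitation for sentences with repeated words, so comparing edges directly may be a better fit for this graph model.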

I would be interested to learn of an approach that works.

1 answer:

Answer 0: (score: 1)

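The edit queue for Bonson's answer is full, but that answer no longer works with networkx 2.4, and there are a few possible improvements:

  • connected_component_subgraphs was removed in networkx 2.4; connected_components, which returns sets of nodes, should be used instead.

  • Since only the node count is needed to find the largest component, the code can be simplified considerably.

  • This version is no longer specific to the original question, because this page is the top hit when searching for "maximum common subgraph in directed graphs" (which I needed for something completely different).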
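The first two points can be illustrated on a toy graph. This is a small sketch rather than part of the adapted function; the graph and its node names are made up for the example.

```python
import networkx as nx

# Toy undirected graph with two components (node names are illustrative).
G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("x", "y")])

# connected_components yields one set of nodes per component.
components = list(nx.connected_components(G))

# The largest component can be chosen by node count alone,
# without building any subgraph objects first.
largest = max(components, key=len)

# If subgraph objects are still wanted, they can be built from the node sets.
subgraphs = [G.subgraph(nodes).copy() for nodes in components]
```

Building subgraphs only for the component that is actually needed avoids copying every component of the graph.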

My adapted version is:

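import networkx

def getMCS(g1, g2):
    # Undirected graph holding the edges of g2 that also exist in g1.
    matching_graph=networkx.Graph()

    for n1,n2 in g2.edges():
        if g1.has_edge(n1, n2):
            matching_graph.add_edge(n1, n2)

    # Each component is a set of nodes.
    components = networkx.connected_components(matching_graph)

    # Raises ValueError if g1 and g2 have no edge in common.
    largest_component = max(components, key=len)
    return networkx.induced_subgraph(matching_graph, largest_component)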

If the last line is replaced with return networkx.induced_subgraph(g1, largest_component), it should work just as well and return a directed graph.
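One caveat with inducing on g1 is that the result contains every g1 edge among the selected nodes, including any that g2 lacks. A possible alternative, sketched below, collects the shared edges in a DiGraph and takes its largest weakly connected component. `get_mcs_directed` and `words_to_digraph` are hypothetical names, and whitespace splitting is assumed in place of `nltk.word_tokenize`.

```python
import networkx as nx

def words_to_digraph(sentence):
    """Hypothetical helper: edges between adjacent whitespace-separated tokens."""
    tokens = sentence.split()
    graph = nx.DiGraph()
    graph.add_edges_from(zip(tokens, tokens[1:]))
    return graph

def get_mcs_directed(g1, g2):
    """Hypothetical variant: largest weakly connected set of edges shared by g1 and g2."""
    matching = nx.DiGraph()
    # Keep only the directed edges of g2 that also exist in g1.
    matching.add_edges_from(e for e in g2.edges() if g1.has_edge(*e))
    if matching.number_of_nodes() == 0:
        return matching  # no common edges: return an empty graph
    largest = max(nx.weakly_connected_components(matching), key=len)
    return matching.subgraph(largest).copy()

source = words_to_digraph(
    "A series of escapades demonstrating the adage that what is good for "
    "the goose is also good for the gander , some of which occasionally "
    "amuses but none of which amounts to much of a story ."
)
test = words_to_digraph("not much of a story of what is good")

result = get_mcs_directed(source, test)
print(sorted(result.edges()))
```

Because this variant compares edges directly and never calls `topological_sort`, it is not affected by cycles in either graph.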