Creating an adjacency matrix in Python from a csv dataset

Time: 2015-04-22 04:48:28

Tags: python csv numpy adjacency-matrix

My data is in the following format:

eventid    mnbr
20         1
26         1
12         2
14         2
15         3
14         3
10         3

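eventid is an event that a member attended. The data is laid out as a panel, so each member can attend multiple events and multiple members can attend the same event. My goal is to create an adjacency matrix that looks like this: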
 mnbr  1    2    3
 1     1    0    0
 2     0    1    1
 3     0    1    1

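There is a 1 whenever two members attended the same event. I have managed to read the columns of the csv file into 2 separate 1D numpy arrays, but from there I am not sure how to proceed. How do I create the matrix using column 2, and how do I then use column 1 to fill in the values? I know I have not posted any code and I do not expect a full solution, but I would greatly appreciate pointers on how to approach the problem efficiently. I have roughly 3 million observations, so creating too many extra variables would be a problem. Thanks in advance. I received a notification that my question may be a duplicate, but my problem is with parsing the data, not with creating the adjacency matrix.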

1 answer:

Answer 0: (score: 4)

Here is a solution. It does not give you the requested adjacency matrix directly, but it gives you what you need to build it yourself.

# Assume every line of your input is stored as a tuple (eventid, mnbr).
observations = [(20, 1), (26, 1), (12, 2), (14, 2), (15, 3), (14, 3), (10, 3)]

# Build an event link dictionary, i.e. something that links every event to all its mnbrs.
eventLinks = {}

for (eventid, mnbr) in observations:
    # If this event has never been encountered, create a new entry in eventLinks.
    if eventid not in eventLinks:
        eventLinks[eventid] = []

    eventLinks[eventid].append(mnbr)

# Collect the distinct mnbrs.
mnbrs = {mnbr for (eventid, mnbr) in observations}

# Build a member link dictionary. This one links each mnbr to the other mnbrs connected to it.
mnbrLinks = {mnbr: set() for mnbr in mnbrs}

for mnbrList in eventLinks.values():
    # For each mnbr, add all the mnbrs involved in the same event (including itself).
    for mnbr in mnbrList:
        mnbrLinks[mnbr].update(mnbrList)

print(mnbrLinks)

Running this code produces the following result:

{1: {1}, 2: {2, 3}, 3: {2, 3}}

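This is a dictionary in which every mnbr has an associated set of adjacent mnbrs. It is in fact an adjacency list, which is a compressed form of an adjacency matrix. You can expand it and build the matrix you requested by using the dictionary keys and values as row and column indices.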
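The expansion described above could look roughly like the sketch below. The helper name adjacency_list_to_matrix is made up for illustration, and it assumes every mnbr that appears in a value set is also a key of the dictionary (which is the case for the dictionary built above). Unlike a version that relies on labels running from 1 to n, it maps each label to a position, so non-contiguous mnbr values also work.

```python
# Hypothetical helper (not part of the code above): expand an adjacency
# list stored as {label: set_of_labels} into a dense 0/1 matrix.
def adjacency_list_to_matrix(adjacency_list):
    # Sort the labels so the row/column order is deterministic.
    labels = sorted(adjacency_list)
    # Map each label to its row/column position in the matrix.
    position = {label: i for i, label in enumerate(labels)}
    size = len(labels)
    matrix = [[0] * size for _ in range(size)]
    for label, neighbours in adjacency_list.items():
        for neighbour in neighbours:
            matrix[position[label]][position[neighbour]] = 1
    return labels, matrix


labels, matrix = adjacency_list_to_matrix({1: {1}, 2: {2, 3}, 3: {2, 3}})
print(labels)
for row in matrix:
    print(row)
```

Keep in mind that a dense matrix needs n*n cells for n distinct mnbrs, so with many members it may not fit in memory.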

Hope this helps. Arthur.

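Edit: I showed an approach based on an adjacency list and left the construction of the adjacency matrix to you. However, if your data is sparse, you should consider actually using the adjacency list as your data structure. See http://en.wikipedia.org/wiki/Adjacency_list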
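As a rough sketch of what working directly with the adjacency list might look like on a real file: the function below reads rows one at a time with the standard csv module and never builds a dense matrix. The function name adjacency_from_csv, the comma delimiter, and the presence of a header row are all assumptions; adjust them to your actual file.

```python
import csv
import io
from collections import defaultdict


def adjacency_from_csv(lines):
    """Build {mnbr: set of co-attending mnbrs} from (eventid, mnbr) rows."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row (assumed to be present)

    # First pass: group the members by event.
    members_by_event = defaultdict(set)
    for row in reader:
        if not row:  # ignore blank lines
            continue
        eventid, mnbr = row
        members_by_event[int(eventid)].add(int(mnbr))

    # Second pass: every member of an event is adjacent to all members
    # of that event (including itself).
    adjacency = defaultdict(set)
    for members in members_by_event.values():
        for mnbr in members:
            adjacency[mnbr] |= members
    return dict(adjacency)


# In-memory stand-in for the real file; with a file on disk you would
# pass the open file object instead.
sample = io.StringIO("eventid,mnbr\n20,1\n26,1\n12,2\n14,2\n15,3\n14,3\n10,3\n")
adjacency = adjacency_from_csv(sample)
print(adjacency)
print(3 in adjacency[2])  # are members 2 and 3 adjacent?
```

Testing whether two members are adjacent is then a set membership check, and memory use grows with the number of actual links rather than with the square of the number of members.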

Edit 2: added code that converts the adjacencyList into a small, smart AdjacencyMatrix

adjacencyList = {1: {1}, 2: {2, 3}, 3: {2, 3}}

class AdjacencyMatrix():

    def __init__(self, adjacencyList, label = ""):
        """
        Initialize the instance.
        Create an adjacency matrix from an adjacencyList.
        Graph vertices are assumed to be labeled with numbers from 1 to n.
        """

        self.matrix = []
        self.label = label

        # Create a square matrix filled with zeros
        size = len(adjacencyList)
        for i in range(size):
            self.matrix.append([0] * size)

        # Vertices are labeled from 1, list indices start at 0
        for key in adjacencyList:
            for value in adjacencyList[key]:
                self[key-1][value-1] = 1

    def __str__(self):
        # Returning self.__repr__() is another option; it just prints the list of lists.
        # See the Python docs on the difference between __str__ and __repr__.

        #label first line
        string = self.label + "\t"
        for i in range(len(self.matrix)):
            string += str(i+1) + "\t"
        string += "\n"

        #for each matrix line :
        for row in range(len(self.matrix)):
            string += str(row+1) + "\t"
            for column in range(len(self.matrix)):
                string += str(self[row][column]) + "\t"
            string += "\n"


        return string

    def __repr__(self):
        return str(self.matrix)

    def __getitem__(self, index):
        """ Allow access to matrix elements using the matrix[row][column] syntax """
        return self.matrix.__getitem__(index)

    def __setitem__(self, index, item):
        """ Allow replacing a whole matrix row using the matrix[row] = values syntax """
        return self.matrix.__setitem__(index, item)

    def areAdjacent(self, i, j):
        return self[i-1][j-1] == 1

m = AdjacencyMatrix(adjacencyList, label="mbr")
print(m)
print("m.areAdjacent(1,2) :",m.areAdjacent(1,2))
print("m.areAdjacent(2,3) :",m.areAdjacent(2,3))

This code gives the following result:

mbr 1   2   3   
1   1   0   0   
2   0   1   1   
3   0   1   1   

m.areAdjacent(1,2) : False
m.areAdjacent(2,3) : True