Add a hierarchy level based on reporting relationships

Time: 2021-03-23 09:29:08

Tags: python pandas networkx

Data df:

child parent
b     a
c     a
d     b
e     c
f     c
g     f

Expected output:

child   parent  level
b       a       1
c       a       1
d       b       2
e       c       2
f       c       2
g       f       3

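According to this parent-child report, 'a' is the top-level parent because it does not report to anyone. 'b' and 'c' report to 'a', so their level is 1. 'd', 'e', and 'f' report to level 1 nodes (b, c), so their level is 2. 'g' reports to 'f' (which is level 2), so the level of 'g' is 3. Please let me know how to achieve this.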
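The rule described above amounts to counting how many reporting steps separate each child from the top. A minimal sketch in plain Python, assuming every child has exactly one parent and the data contains no cycles (the names `parent_of` and `level_of` are illustrative and not part of the question):

```python
# Child -> parent pairs copied from the table in the question.
pairs = [("b", "a"), ("c", "a"), ("d", "b"), ("e", "c"), ("f", "c"), ("g", "f")]
parent_of = dict(pairs)

def level_of(node):
    """Count the reporting steps from `node` up to the node that has no parent."""
    steps = 0
    while node in parent_of:  # 'a' is not a key, so the walk stops there
        node = parent_of[node]
        steps += 1
    return steps

levels = {child: level_of(child) for child, _ in pairs}
print(levels)  # {'b': 1, 'c': 1, 'd': 2, 'e': 2, 'f': 2, 'g': 3}
```

Walking up the chain is simple but repeats work for deep hierarchies. A cycle in the data would make the loop run forever, which is why the no-cycle assumption matters.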

I am trying the code below, but it does not work:

df['Level'] = np.where(df['parent'] == 'a',"level 1",np.nan)
dfm1 = pd.Series(np.where(df['Level'] == 'level 1', df['parent'],None))
df.loc[df['parent'].isin(dfm1),'Level'] = "level 2"
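One likely reason the attempt fails: `dfm1` collects the parents of the level 1 rows (which is just 'a' again) rather than their children, so the third line overwrites the same rows with "level 2". The `isin` idea can still work if it is repeated level by level. A sketch, assuming the top-level parents are exactly those values that never appear in the `child` column (the name `frontier` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "child":  ["b", "c", "d", "e", "f", "g"],
    "parent": ["a", "a", "b", "c", "c", "f"],
})

df["level"] = 0  # 0 is used here to mean "not assigned yet"

# Start from the parents that report to no one.
frontier = set(df["parent"]) - set(df["child"])
level = 0
while frontier:
    level += 1
    # Rows whose parent was reached in the previous round and that have no level yet.
    mask = df["parent"].isin(list(frontier)) & df["level"].eq(0)
    df.loc[mask, "level"] = level
    # The children just labelled become the parents to look for next.
    frontier = set(df.loc[mask, "child"])

print(df)
```

Because each row is labelled at most once, the loop stops even if the data accidentally contains a cycle. Rows that cannot be reached from a top-level parent keep the placeholder 0.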

2 answers:

Answer 0 (score: 5)

Here is one way using networkx: find the ancestors of each node and use their count as the level.

import networkx as nx

G = nx.from_pandas_edgelist(df,"parent","child",create_using=nx.DiGraph())
f = lambda x: len(nx.ancestors(G,x))
df['level'] = df['child'].map(f)

print(df)

  child parent  level
0     b      a      1
1     c      a      1
2     d      b      2
3     e      c      2
4     f      c      2
5     g      f      3
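Counting ancestors matches the depth only when every node has a single parent, as in this data. If a node could report to more than one parent, the ancestor count would exceed the depth. A possible variation, offered as a sketch rather than as part of the answer, measures the shortest-path distance from each root instead. The edge list is rebuilt by hand here so the example is self-contained:

```python
import networkx as nx

# (parent, child) edges from the question's table.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "e"), ("c", "f"), ("f", "g")]
G = nx.DiGraph(edges)

# Roots are the nodes nobody points to.
roots = [node for node, deg in G.in_degree() if deg == 0]

depth = {}
for root in roots:
    # With only `source` given, shortest_path_length returns {node: distance}.
    for node, dist in nx.shortest_path_length(G, source=root).items():
        depth[node] = min(dist, depth.get(node, dist))

print(depth)
```

With the DataFrame from the question, `df['level'] = df['child'].map(depth)` would then fill the column. For this data the values agree with the ancestor-count approach.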

Answer 1 (score: 1)

Here is a solution from first principles:

import pandas as pd

# We will build the tree of relationships, using a helper node class
class Node:
    def __init__(self, value, parent=None, level=0):
        self.value = value
        self.parent = parent
        self.level = level
        self.children = []
    
    def set_child(self, child):
        child.level = self.level + 1
        self.children.append(child)

# Helper function to insert nodes
def insert(node, new_node):
    if new_node.parent == node.value:
        # if the new node is a child, insert it
        node.set_child(new_node)
    else:
        # otherwise, iterate over the children until you find its parent
        if node.children:
            for child in node.children:
                insert(child, new_node)

# gather the level information for the tree
def node_print(node, values=None):
    # avoid a mutable default argument, which would be shared across calls
    if values is None:
        values = []
    if node.parent:
        values.append((node.value, node.parent, node.level))
    for child in node.children:
        values = node_print(child, values=values)
    return values

# Now get the data and build the tree
data = """b     a
c     a
d     b
e     c
f     c
g     f"""


rows = [y.split() for y in data.split("\n")]

for index, (child, parent) in enumerate(rows):
    if index == 0:
        node = Node(value=parent)
    
    child_node = Node(value=child, parent=parent)
    insert(node, child_node)

output = pd.DataFrame(data=node_print(node, values=[]), columns=['child', 'parent', 'level']).sort_values(by='level')

print(output)

  child parent  level
0     b      a      1
2     c      a      1
1     d      b      2
3     e      c      2
4     f      c      2
5     g      f      3
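Note that `insert` only attaches a node whose parent is already in the tree, and the root is taken from the first row, so the rows must list parents before their children. A sketch of an order-independent alternative, using a breadth-first walk over a parent-to-children map (this is a different technique from the answer's tree class, and the names `children`, `roots`, and `level` are illustrative):

```python
from collections import defaultdict, deque

# Same (child, parent) pairs as the question, shuffled so children come before parents.
rows = [("g", "f"), ("e", "c"), ("d", "b"), ("f", "c"), ("c", "a"), ("b", "a")]

# Map each parent to the list of its direct children.
children = defaultdict(list)
for child, parent in rows:
    children[parent].append(child)

# Roots are parents that never appear as a child.
all_children = {child for child, _ in rows}
roots = [parent for parent in children if parent not in all_children]

# Breadth-first walk: each child sits one level below its parent.
level = {root: 0 for root in roots}
queue = deque(roots)
while queue:
    node = queue.popleft()
    for child in children.get(node, []):
        if child not in level:  # guard against revisiting a node
            level[child] = level[node] + 1
            queue.append(child)

for child, parent in sorted(rows, key=lambda r: (level[r[0]], r[0])):
    print(child, parent, level[child])
```

Each node is visited once, so the work grows linearly with the number of rows, and several independent roots are handled without extra code.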

