Compare two dataframes and get the differences

Time: 2019-10-03 23:51:11

Tags: python python-3.x pandas

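I want to compare two dataframes and selectively print the differences between them. The pictures below show what I am trying to accomplish: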

[Image: Dataframe 1]

[Image: Dataframe 2]

[Image: Desired Output - Dataframe 3]

What have I tried so far?

import pandas as pd
import numpy as np

df1 = pd.read_excel("01.xlsx")
df2 = pd.read_excel("02.xlsx")

def diff_pd(df1, df2):
    """Identify differences between two pandas DataFrames"""
    assert (df1.columns == df2.columns).all(), \
        "DataFrame column names are different"
    if any(df1.dtypes != df2.dtypes):
        # Data types are different, trying to convert
        df2 = df2.astype(df1.dtypes)
    if df1.equals(df2):
        return None
    # need to account for np.nan != np.nan returning True
    diff_mask = (df1 != df2) & ~(df1.isnull() & df2.isnull())
    # stack the boolean mask and keep only the cells flagged as different
    ne_stacked = diff_mask.stack()
    changed = ne_stacked[ne_stacked]
    changed.index.names = ['id', 'Naziv usluge']
    # positions of the differing cells, used to pull the old and new values
    difference_locations = np.where(diff_mask)
    changed_from = df1.values[difference_locations]
    changed_to = df2.values[difference_locations]
    return pd.DataFrame({'Service Previous': changed_from, 'Service Current': changed_to},
                        index=changed.index)

df3 = diff_pd(df1, df2)

df3 = df3.fillna(0)
df3 = df3.reset_index()

print(df3)

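To be fair, I found this code in another thread. It does get the job done, but I still have a few problems:

  1. My dataframes are not equal; what should I do?
  2. I do not fully understand the code I posted.

Thanks!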

1 answer:

Answer 0: (score: 0)

Would it be easier to start from ...?

Try

import pandas as pd

data1={'Name':['Tom','Bob','Mary'],'Age':[20,30,40],'Pay':[10,10,20]}
data2={'Name':['Tom','Bob','Mary'],'Age':[40,30,20]}

df1=pd.DataFrame.from_records(data1)
df2=pd.DataFrame.from_records(data2)



# Checking Columns

for col in df1.columns:
    if col not in df2.columns:
        print(f"DF2 Missing Col {col}")

# Check Col Values 
for col in df1.columns:
    if col in df2.columns:
        # Ok we have the same column
        if list(df1[col]) == list(df2[col]):
            print(f"Columns {col} are the same")
        else:
            print(f"Columns {col} have differences")

It should output

DF2 Missing Col Pay
Columns Age have differences
Columns Name are the same
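The column-level check can be pushed down to individual cells once both frames share the same columns and row labels. The following is only a minimal sketch of that idea on made-up data: the helper name cell_differences and the output column names are hypothetical, and it reuses the mask-and-stack step from the code in the question.

```python
import pandas as pd

# Made-up data: same rows and columns in both frames
df1 = pd.DataFrame({'Name': ['Tom', 'Bob', 'Mary'], 'Age': [20, 30, 40]})
df2 = pd.DataFrame({'Name': ['Tom', 'Bob', 'Mary'], 'Age': [40, 30, 20]})

def cell_differences(left, right):
    """Return one row per cell that differs between two equally shaped frames."""
    # True where values differ; a NaN on both sides is treated as equal
    diff_mask = (left != right) & ~(left.isnull() & right.isnull())
    # stack() turns the 2-D mask into a Series indexed by (row label, column name)
    stacked = diff_mask.stack()
    # Boolean indexing keeps only the entries that are True
    changed = stacked[stacked]
    changed.index.names = ['row', 'column']
    # Look up the old and new value for every flagged (row, column) pair
    old = [left.at[r, c] for r, c in changed.index]
    new = [right.at[r, c] for r, c in changed.index]
    return pd.DataFrame({'old': old, 'new': new}, index=changed.index).reset_index()

diffs = cell_differences(df1, df2)
print(diffs)
```

With this data the result has two rows, both in the Age column: row 0 changes from 20 to 40 and row 2 changes from 40 to 20.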

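The f-strings require Python 3.6 or later; on older versions, change the string formatting (for example to str.format()).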
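For the case in the question where the two dataframes do not contain the same rows, one possible approach is to align them on a key column with an outer merge before comparing. This is a sketch under assumptions, not a definitive solution: it assumes a column (here Name, with made-up data) uniquely identifies each row, and the _old/_new suffixes are arbitrary choices.

```python
import pandas as pd

# Made-up data: df2 lacks Bob and has an extra row for Ann
df1 = pd.DataFrame({'Name': ['Tom', 'Bob', 'Mary'], 'Age': [20, 30, 40]})
df2 = pd.DataFrame({'Name': ['Tom', 'Mary', 'Ann'], 'Age': [20, 45, 50]})

# Outer merge on the assumed key column; indicator=True adds a '_merge'
# column whose values are 'left_only', 'right_only' or 'both'
merged = df1.merge(df2, on='Name', how='outer',
                   suffixes=('_old', '_new'), indicator=True)

# Keys that appear in only one of the two frames
only_in_df1 = merged.loc[merged['_merge'] == 'left_only', 'Name'].tolist()
only_in_df2 = merged.loc[merged['_merge'] == 'right_only', 'Name'].tolist()

# Rows present in both frames whose Age value changed
both = merged[merged['_merge'] == 'both']
changed = both[both['Age_old'] != both['Age_new']]

print("Only in df1:", only_in_df1)
print("Only in df2:", only_in_df2)
print(changed[['Name', 'Age_old', 'Age_new']])
```

With this data, Bob shows up only in df1, Ann only in df2, and Mary is the one shared row whose Age differs (40 versus 45). The Age columns come back as floats because the outer merge fills the missing side with NaN.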
