Python: iterate over a short spreadsheet and copy matching values into a large spreadsheet

Time: 2017-05-31 15:33:48

Tags: python pandas

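I am using pandas and trying to use a list of devices, device_master.xlsx, to automatically populate several columns in my main spreadsheet, detailed_billing.xlsx.

I can read the contents of both sheets and I can apply other transformations to my main spreadsheet, but I am new to this and cannot work out how to iterate over each row of the device_master.xlsx sheet, compare it against the main sheet, and fill in the required columns there.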

So far, this is what I have done to the rest of the sheet:

import numpy as np
import pandas as pd
import csv
#import re

# import the list of devices with work order numbers and project codes:

devmaster_xls = pd.read_excel('device_master.xlsx', 'device master', header = [0], index_col = None)
print('Device Master sheet columns:', devmaster_xls.columns, '\n') #debug, check the columns are right


# import the billing information which will need transforming with work order/ project codes:

data_xls = pd.read_excel('DetailedBilling.xlsx', 'Sheet1', header = [0], index_col = None)
print('Billing sheet columns read in:', data_xls.columns, '\n') #debug, check the columns before

data_xls.insert(13, 'WO Ref', '')
data_xls.insert(14, 'WO Description', '')
data_xls.insert(15, 'Project Code', '')

print('Billing sheet columns with WO additions:', 
data_xls.columns, '\n') #debug, check the columns after

wait = input("Press enter to continue...")


# magic sauce to add work order and cost tracking goes in here
#




data_writer = pd.ExcelWriter('DetailedBilling_edit.xlsx', engine = 'xlsxwriter')
data_xls.to_excel(data_writer, index = False)

#defining the book/sheet to work with
workbook = data_writer.book
worksheet = data_writer.sheets['Sheet1']

# formatting changes
worksheet.set_zoom(85)

server_fmt = workbook.add_format({'font_color': '#800000', 'bold': True})
dollar_fmt = workbook.add_format({'num_format': """_($* #,##0.00_);_($* -#,##0.00;_($* "0.00"_);_(@_)""", 'bold': True})
bold_fmt = workbook.add_format({'bold': True})

worksheet.set_column('A:A', 34, server_fmt)
worksheet.set_column('B:B', 85)
worksheet.set_column('F:F', 28)
worksheet.set_column('G:G', 9)
worksheet.set_column('H:K', 11, dollar_fmt)
worksheet.set_column('L:P', 12.5)
worksheet.set_column('O:O', 85)

#
#what size is this sheet? 
count_row = len(data_xls.index)
#count_row = data_xls.shape[0]
print("Total rows: ", count_row, '\n')

data_writer.close()  # writes the file; ExcelWriter.save() was removed in pandas 2.0

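What I want to express, in pseudo-code, is something like:

data_xls['WO Ref'].loc[(data_xls['Server'] = devmaster_xls['Device Name'])] = devmaster_xls['WO Ref']

I have tried putting this in a loop, but it has not gotten me very far. Any help would be appreciated!

Edit: thanks to @frankyjuang I am now getting the right data, which is great, but for some reason I cannot write it into the spreadsheet. I am doing this:

>>> for index, row in data_xls.iterrows():
...     rowdata = devmaster_xls.loc[devmaster_xls['Device Name'] == row['Server']]
...     print(index, rowdata['WO Ref'])

It looks fine and returns something like: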

555 19    REF###
Name: WO Ref, dtype: object
556 19    REF###
Name: WO Ref, dtype: object
557 19    REF###
Name: WO Ref, dtype: object
558 19    REF###
Name: WO Ref, dtype: object
559 19    REF###
Name: WO Ref, dtype: object
560 19    REF###
Name: WO Ref, dtype: object
561 19    REF###
Name: WO Ref, dtype: object
562 19    REF###
Name: WO Ref, dtype: object
563 19    REF###
Name: WO Ref, dtype: object

But when I try to write the value back like this:

>>> for index, row in data_xls.iterrows():
...     rowdata = devmaster_xls.loc[devmaster_xls['Device Name'] == row['Server']]
...     row['WO Ref'] = rowdata['WO Ref']

the row shows NaN.

2 Answers:

Answer 0: (score: 0)

Iterate over the rows in data_xls with:

for index, row in data_xls.iterrows():
    row['WO Ref']    # Get the data in this way

Find the corresponding row with:

devmaster_xls.loc[devmaster_xls['Device Name'] == some_value]

Combine the two:

for index, row in data_xls.iterrows():
    # match the billing row's server name against the device master
    the_row_you_want = devmaster_xls.loc[devmaster_xls['Device Name'] == row['Server']]
    # do the operations you want

Note:

If you expect to do this many times, it is more efficient to build the index first and then use .loc:

devmaster_xls = devmaster_xls.set_index(['Device Name'])
devmaster_xls.loc[row['Server']]
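The indexed lookup in this note can be sketched on a tiny made-up frame. The server names, the WO Ref values and the helper `lookup_wo_ref` are all hypothetical, and the sketch assumes each device name appears only once in the device master:

```python
import pandas as pd

# Hypothetical stand-in for the device master sheet.
devmaster = pd.DataFrame({
    "Device Name": ["srv1.example.com", "srv2.example.com"],
    "WO Ref": ["REF001", "REF002"],
})

# Build the index once, then look rows up by label.
indexed = devmaster.set_index("Device Name")


def lookup_wo_ref(server, default=""):
    """Return the WO Ref for a server, or `default` if it is not listed."""
    if server in indexed.index:
        # With a unique index, .at returns a single scalar value.
        return indexed.at[server, "WO Ref"]
    return default


found = lookup_wo_ref("srv2.example.com")
missing = lookup_wo_ref("unknown.example.com")
```

Checking membership first avoids the KeyError that a label lookup raises for a server that is not in the device master.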

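Update

Note that rowdata is still a small DataFrame containing a single row, so rowdata['COLUMN'] returns a one-element Series rather than the value itself. Instead, pull out the single row with iloc[0]:

row['WO Ref'] = rowdata.iloc[0]['WO Ref']

Or append iloc[0] to the rowdata = ... line:

rowdata = devmaster_xls.loc[devmaster_xls['Device Name'] == row['Server']].iloc[0]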
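One more thing to watch for: the pandas documentation warns that iterrows may hand back a copy of each row, so assigning to `row` is not guaranteed to change the underlying DataFrame. A minimal sketch of writing through the frame itself with `.at`, using made-up data (the frame names `billing` and `devmaster` and all values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical stand-ins for the billing sheet and the device master.
billing = pd.DataFrame({
    "Server": ["srv1.example.com", "srv3.example.com", "srv2.example.com"],
})
billing["WO Ref"] = ""

devmaster = pd.DataFrame({
    "Device Name": ["srv1.example.com", "srv2.example.com"],
    "WO Ref": ["REF001", "REF002"],
})

for index, row in billing.iterrows():
    rowdata = devmaster.loc[devmaster["Device Name"] == row["Server"]]
    if not rowdata.empty:
        # Write to the DataFrame by label instead of to the `row` copy.
        billing.at[index, "WO Ref"] = rowdata.iloc[0]["WO Ref"]
```

Servers with no match in the device master (srv3 here) keep the empty placeholder, and the `rowdata.empty` check avoids an IndexError from `iloc[0]` on an empty frame.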

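Answer 1: (score: 0)

Because of my inexperience, I struggled with the answer @frankyjuang provided; I could not get from the results I was seeing to the result I wanted. So, after more research, I came up with the following solution, which solved the problem:

First, both spreadsheets need to be indexed on a shared key. In this case it is the server name, in the form servername1.com, servername2.com, and so on. However, I did not want to change my DataFrame permanently, so I create a new column to use as the index.

This copies the Server column, converts it to lowercase to avoid case mismatches later, and sets it as the index: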

data_xls['Serverindex'] = data_xls['Server'].str.lower() 
data_xls.set_index('Serverindex', inplace = True)

Read in my device master sheet:

devmaster_xls = pd.read_excel('device_master.xlsx', 'device master', header = [0], index_col = None)

As above, create an index from an existing column, converting it to lowercase along the way:

devmaster_xls['Devindex'] = devmaster_xls['Device Name'].str.lower() 
devmaster_xls.set_index('Devindex', inplace = True)

Copying the relevant data from the device master sheet into the main sheet is then as simple as:

data_xls.loc[:,'WO Ref'] = devmaster_xls.loc[:,'WO Ref'] 
data_xls.loc[:,'WO Description'] = devmaster_xls.loc[:,'WO Description'] 
data_xls.loc[:,'Project Code'] = devmaster_xls.loc[:,'Project code']

Finally, we do not want to write that index out, so:

data_xls = data_xls.reset_index(drop = True)
devmaster_xls = devmaster_xls.reset_index(drop = True)
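The same lower-cased matching can probably be done without touching either index, by mapping the Server column through a lookup Series. This is a sketch on made-up data, not the code above; it assumes device names are unique in the device master once lower-cased, and the names `billing`, `devmaster` and `wo_ref_by_device` are hypothetical:

```python
import pandas as pd

# Hypothetical stand-ins; note the mixed case in the names.
billing = pd.DataFrame({
    "Server": ["SRV1.example.com", "srv2.example.com", "srv9.example.com"],
})
devmaster = pd.DataFrame({
    "Device Name": ["srv1.example.com", "SRV2.example.com"],
    "WO Ref": ["REF001", "REF002"],
})

# Lookup Series: lower-cased device name -> WO Ref.
wo_ref_by_device = pd.Series(
    devmaster["WO Ref"].values,
    index=devmaster["Device Name"].str.lower(),
)

# Map each lower-cased server name through the lookup; misses become NaN.
billing["WO Ref"] = billing["Server"].str.lower().map(wo_ref_by_device)
```

Because neither frame's index changes, no reset_index step is needed afterwards. The same pattern would repeat for WO Description and Project Code.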

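If this approach is really bad, I would be interested to find out why and what I could do to improve it. But it does solve the problem, and it was quick to implement!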
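For comparison, a left merge on a temporary lower-cased key is a common way to pull several columns across in one step. The sketch below uses made-up data and a hypothetical helper column `_key`, and it assumes the empty placeholder columns (WO Ref and so on) have not already been inserted into the billing frame, since a merge would otherwise produce suffixed duplicate columns:

```python
import pandas as pd

# Hypothetical stand-ins for the two sheets.
billing = pd.DataFrame({
    "Server": ["SRV1.example.com", "srv1.example.com", "srv9.example.com"],
    "Cost": [10.0, 5.0, 7.5],
})
devmaster = pd.DataFrame({
    "Device Name": ["srv1.example.com"],
    "WO Ref": ["REF001"],
    "WO Description": ["Upgrade"],
    "Project code": ["P-100"],
})

# Temporary lower-cased join key on both sides.
left = billing.assign(_key=billing["Server"].str.lower())
right = (
    devmaster.assign(_key=devmaster["Device Name"].str.lower())
    .drop(columns="Device Name")
)

# how="left" keeps every billing row; validate raises if a device
# name appears more than once in the device master.
merged = (
    left.merge(right, on="_key", how="left", validate="many_to_one")
    .drop(columns="_key")
    .rename(columns={"Project code": "Project Code"})
)
```

The `validate="many_to_one"` argument is a guard against duplicate device names silently multiplying billing rows. Whether it suits the real data depends on how clean the device master is.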