Data-cleaning for loop in Python for POS data

Time: 2017-11-09 09:26:59

Tags: python for-loop dataframe data-cleaning

I have POS data from a message shop. The data looks like the attached image.

import pandas as pd
# Read the data from the CSV file
data = pd.read_csv('test1.csv')
# Make a list for each column
sales_id = list(data['sales_id'])
shop_number = list(data['shop_number'])
sales = list(data['sales'])
cashier_no = list(data['cashier_no'])
messager_no = list(data['messager_no'])
type_of_sale = list(data['type_of_sale'])
costomer_ID = list(data['costomer_ID'])
date = list(data['date'])
time = list(data['time'])

I want to build a new list that flags which purchase rows should be deleted. Like this:

data_to_clean= [0,1,0,1,0,0,1,0,1]

To do this, I want to write a for loop:

for i in range(len(type_of_sale)):
    data_to_clean=[]
    if type_of_sale[i] == "purchase":
        data_to_clean = data_to_clean.append(0)
    elif type_of_sale[i] == "return":
        data_to_clean = data_to_clean.append(1)
        ## I want to write code so I can delete the purchase data too,
        # when it has the same shop_number, messager_no, costomer_ID and -price

    return list(data_to_clean)

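There are two main problems with this code. First, it does not run. Second, I don't know how to check shop_number, messager_no and costomer_ID in order to put a 1 or a 0 into my data_to_clean list. Sometimes I have to check the row above, for example for sales_id (1628060), and sometimes the row below, for example for sales_id (1599414). Note that the cashier may differ, but costomer_ID should always be the same.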

The question is how to write code that creates a list or DataFrame of 0s and 1s showing which rows should be deleted.

2 answers:

Answer 0: (score: 0)

If you want to compare data with a string in Python, you should put the string in quotes:

data_to_clean = []  # create the list once, before the loop
for i in range(len(type_of_sale)):
    if type_of_sale[i] == "purchase":  # here
        data_to_clean.append(0)  # append() changes the list in place and returns None
    elif type_of_sale[i] == "return":  # and here
        data_to_clean.append(1)
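
The loop in the question also ends with a `return` statement, which Python only accepts inside a function. A minimal sketch of the same logic wrapped in a function follows; the name `build_flags` is made up for illustration and does not appear in the question.

```python
def build_flags(type_of_sale):
    """Return 1 for every "return" entry and 0 for every "purchase" entry."""
    data_to_clean = []  # created once, before the loop, so it is not reset each time
    for sale_type in type_of_sale:
        if sale_type == "purchase":
            data_to_clean.append(0)  # append() mutates the list; do not reassign its result
        elif sale_type == "return":
            data_to_clean.append(1)
    return data_to_clean  # legal here because we are inside a function


print(build_flags(["purchase", "return", "purchase", "return"]))  # [0, 1, 0, 1]
```

Entries that are neither "purchase" nor "return" are skipped in this sketch, so the result can be shorter than the input if other sale types exist.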

Answer 1: (score: 0)

Check the pandas documentation. Getting the returned orders can be as simple as:
returns = data.loc[data['type_of_sale'] == 'return']

If you want the sales made by cashier 90:
data.loc[(data['type_of_sale'] == 'purchase') & (data['cashier_no'] == 90)]
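
Building on these filters, the sketch below shows one possible way to produce the 0/1 list the question asks for. It rests on assumptions the question does not confirm: that the `sales` column holds the price, that a return carries the negated price of the purchase it cancels, and that each return cancels at most one purchase. The sample rows and the function name `flag_rows_to_clean` are invented for illustration; the real data would come from test1.csv.

```python
import pandas as pd

# Hypothetical sample rows standing in for test1.csv.
data = pd.DataFrame({
    "sales_id": [1, 2, 3, 4, 5],
    "shop_number": [10, 10, 10, 20, 20],
    "messager_no": [7, 7, 7, 8, 8],
    "costomer_ID": [100, 100, 100, 200, 200],
    "sales": [50.0, -50.0, 30.0, 40.0, -15.0],
    "type_of_sale": ["purchase", "return", "purchase", "purchase", "return"],
})


def flag_rows_to_clean(df):
    """Flag every return, plus one matching purchase per return, with 1."""
    is_return = df["type_of_sale"] == "return"
    is_purchase = df["type_of_sale"] == "purchase"
    flags = is_return.astype(int)  # returns start as 1, everything else as 0
    used = []  # purchases already cancelled by an earlier return

    for _, ret in df[is_return].iterrows():
        # Search the whole frame, so the purchase may sit above or below the return.
        # cashier_no is deliberately ignored because it may differ.
        candidates = df[
            is_purchase
            & (df["shop_number"] == ret["shop_number"])
            & (df["messager_no"] == ret["messager_no"])
            & (df["costomer_ID"] == ret["costomer_ID"])
            & (df["sales"] == -ret["sales"])  # assumed: return price is the negated purchase price
            & ~df.index.isin(used)
        ]
        if not candidates.empty:
            match = candidates.index[0]
            used.append(match)
            flags.loc[match] = 1

    return flags.tolist()


data_to_clean = flag_rows_to_clean(data)
print(data_to_clean)  # [1, 1, 0, 0, 1]
```

The row-by-row search keeps the matching rule easy to read but is slow on large files. Exact equality on `sales` can also miss matches when prices are floats with rounding differences, so rounding or integer cents may be needed on real data.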