大熊猫使用dataframe.shift()时表现怪异

时间:2019-06-17 20:09:15

标签: python pandas dataframe

我正在读取一些看起来像这样的数据:

Original data

在此数据集中,许多行在第16列中有null。我需要将此类行中的值向右移动,以便以“ *”开头的值(例如,第16列行) 4,第13列第5行等)将移到它们的右列。 (最终,我将循环执行此操作,以使这些值进入第16列)

这些值左侧的数据也必须移动。例如,当{第7列第16行}中的数据移动到{第8列,第16行}时,{第2列的第16行}中的数据应移动到{第3列第16行}。

但是,我不希望移动第1列(零索引列0)中的数据,因为我将其用作数据的索引。

因此,我的预期输出是这样:

Expected output after iteration 1

我正在使用下面的代码来实现这一目标:

import StringIO
import pandas

# Store the csv string in a variable and turn that into a dataframe
# This string here is the same as the data in the image above.
gps_string = """2010-01-12 18:00:00,$GPGGA,180439,7249.2150,N,11754.4238,W,2.0,10,0.9,-8.1,M,-12.4,M,,*57,,,
2010-01-12 17:30:00,$GPGGA,173439,7249.2160,N,11754.4233,W,2.0,11,0.8,-4.5,M,-12.4,M,,*5B,,,
2010-01-12 17:00:00,$GPGGA,170439,7249.2152,N,11754.4235,W,2.0,11,0.8,-3.1,M,-12.4,M,,*5C,,,
2010-01-12 16:30:00,N,11754.4210,W,2,9.0,1.1,-13.1,M,-12.4,M,,*6C,,,,,,
2010-01-12 16:00:00,N,11754.4229,W,2,10.0,0.9,-2.9,M,-12.4,M,,*53,,,,,,
2010-01-12 15:30:00,N,11754.4269,W,2,9.0,0.8,-4.3,M,-12.4,M,,*54,,,,,,
2010-01-12 15:00:00,N,11754.4267,W,2,10.0,0.8,-1.6,M,-12.4,M,,*56,,,,,,
2010-01-12 14:30:00,$GPGGA,143439,7249.2152,N,11754.4253,W,2.0,11,0.7,-4.3,M,-12.4,M,,*56,,,
2010-01-12 14:00:00,N,11754.4245,W,2,10.0,0.9,-7.0,M,-12.4,M,,*50,,,,,,
2010-01-12 13:30:00,$GPGGA,133439,7249.2134,N,11754.4243,W,2.0,11,0.7,-10.7,M,-12.4,M,,*61,,,
2010-01-12 13:00:00,N,11754.4245,W,2,10.0,0.8,-5.5,M,-12.4,M,,*56,,,,,,
2010-01-12 12:30:00,N,11754.4226,W,2,10.0,0.9,-7.1,M,-12.4,M,,*59,,,,,,
2010-01-12 12:00:00,N,11754.4238,W,2,10.0,0.8,-6.5,M,-12.4,M,,*51,,,,,,
2010-01-12 11:30:00,N,11754.4227,W,2,10.0,0.8,0.1,M,-12.4,M,,*73,,,,,,
2010-01-12 11:00:00,-7.4,M,-12.4,M,,*5F,,,,,,,,,,,,
2010-01-12 10:30:00,N,11754.4271,W,2,8.0,1.1,-8.4,M,-12.4,M,,*5A,,,,,,
""" 
# Read the csv string into a dataframe, with no headers
# Make the first column with timestamp values the index column.
gps_df = pd.read_csv(StringIO.StringIO(gps_string), header=None, 
index_col=0)
rows_to_shift = gps_df[gps_df[15].isnull()].index

# Shift the rows here.
gps_df.loc[rows_to_shift] = gps_df.loc[rows_to_shift].shift(periods=1, axis=1)
gps_df.to_csv("f.csv") # Creates a file after shift to see the output

执行代码后,我得到以下输出文件。

Output after Execution

由此我可以看到,由于某种原因,移位函数在第5列上创建了null(s)列,并且还将原来在第10列中的数据移动到了第15列中,这可能是因为案件?

这可能是dataframe.shift()函数中的错误吗?还是我在这里做错了什么?

1 个答案:

答案 0 :(得分:0)

这是熊猫中的错误,here中有更多详细信息。

似乎移动的对象列将自动移动到具有对象dtype的下一列。

为了解决此问题,我选择了要移位的索引,将数据框中的所有数据转换为字符串,执行移位,再次将数据作为csv字符串获取,然后重新创建数据框以获取以前的数据类型。

以下是我用来解决此问题的代码:

import pandas as pd
import StringIO

gps_string = """
"2010-01-12 18:00:00","$GPGGA","180439","7249.2150","N","11754.4238","W","2","10","0.9","-8.1","M","-12.4","M","","*57","","",""
"2010-01-12 17:30:00","$GPGGA","173439","7249.2160","N","11754.4233","W","2","11","0.8","-4.5","M","-12.4","M","","*5B","","",""
"2010-01-12 17:00:00","$GPGGA","170439","7249.2152","N","11754.4235","W","2","11","0.8","-3.1","M","-12.4","M","","*5C","","",""
"2010-01-12 16:30:00","N","11754.4210","W","2","09","1.1","-13.1","M","-12.4","M","","*6C","","","","","",""
"2010-01-12 16:00:00","N","11754.4229","W","2","10","0.9","-2.9","M","-12.4","M","","*53","","","","","",""
"2010-01-12 15:30:00","N","11754.4269","W","2","09","0.8","-4.3","M","-12.4","M","","*54","","","","","",""
"2010-01-12 15:00:00","N","11754.4267","W","2","10","0.8","-1.6","M","-12.4","M","","*56","","","","","",""
"2010-01-12 14:30:00","$GPGGA","143439","7249.2152","N","11754.4253","W","2","11","0.7","-4.3","M","-12.4","M","","*56","","",""
"2010-01-12 14:00:00","N","11754.4245","W","2","10","0.9","-7.0","M","-12.4","M","","*50","","","","","",""
"2010-01-12 13:30:00","$GPGGA","133439","7249.2134","N","11754.4243","W","2","11","0.7","-10.7","M","-12.4","M","","*61","","",""
"2010-01-12 13:00:00","N","11754.4245","W","2","10","0.8","-5.5","M","-12.4","M","","*56","","","","","",""
"2010-01-12 12:30:00","N","11754.4226","W","2","10","0.9","-7.1","M","-12.4","M","","*59","","","","","",""
"2010-01-12 12:00:00","N","11754.4238","W","2","10","0.8","-6.5","M","-12.4","M","","*51","","","","","",""
"2010-01-12 11:30:00","N","11754.4227","W","2","10","0.8","0.1","M","-12.4","M","","*73","","","","","",""
"2010-01-12 11:00:00","-7.4","M","-12.4","M","","*5F","","","","","","","","","","","",""
"2010-01-12 10:30:00","N","11754.4271","W","2","08","1.1","-8.4","M","-12.4","M","","*5A","","","","","",""

 """

gps_df = pd.read_csv(StringIO.StringIO(gps_string), header=None, index_col=0)
rows_to_shift = gps_df[gps_df[15].isnull()].index  # get the indexes to shift
gps_df_all_strings = gps_df.astype(str)  # Convert all the data to be of type str (string)

# Shift the data
gps_df_all_strings.loc[rows_to_shift] = gps_df_all_strings.loc[rows_to_shift].shift(periods=1, axis=1)
s = gps_df_all_strings.to_csv(header=None)  # Put shifted csv data into a string after shifting.
new_gps_df = pd.read_csv(StringIO.StringIO(s), header=None, index_col=0)  # re read csv data.