我找到了一个对我来说很有效的答案,并且也会发布。
如何获取数据帧列表并创建一个新的df,该df的非索引列中仅包含共享相同值的行?
基本上是一个交集,但是由于多种原因,concat和merge对我来说不起作用。
我查看了以下内容,但没有得到我所需要的:
Finding common rows (intersection) in two Pandas dataframes
How to get intersection of dataframes based on column labels?
Intersection of multiple pandas dataframes
How to find intersection of dataframes based on multiple columns?
答案 0 :(得分:0)
这就是我最终实现的目标。想看看是否有人还有其他更有效的方法。
import copy
dfs_array = [ df1, df2, df3, df4, ... ]
def intersection_of_dfs(dfs_array,col='Ticker'):
if len(dfs_array) <= 1:
# if we only have 1 or 0 elements simply return the origial array
# no error is given, logic must be created for the return value.
return dfs_array
# does a shallow copy only.
dfs = copy.copy(dfs_array)
length = len(dfs)
while length > 1:
df1 = dfs.pop()
df2 = dfs.pop()
df0 = df1.loc[ df1[col].isin( df2[col].values ) ]
dfs.insert(0,df0)
length = len(dfs)
return dfs
使用合并的建议不会起作用,因为它会破坏索引和列标题。
这是合并所提供的:
> [ open_x_x high_x_x low_x_x close_x_x volume_x_x Ticker ... LowAboveShort_y_y ShortAboveLong_y_y Return_y_y DayDiff_y_y
> AboveBelow_y_y ShortToLong_y_y
> 0 52.60 52.68 52.24 52.4779 7632 AADR ... 0.28214 1.087176 0.043298 2.600000 2.0 8.000000
> 1 14.03 14.03 14.03 14.0300 359 AAMC ... 0.17472 0.628733 0.202228 1.333333 7.0 2.600000
> 2 2.15 2.15 1.72 1.9500 10095 AAME ... -0.20068 0.107564 0.114286 1.000000 1.0 0.636364
>
> [3 rows x 61 columns]]
这是下面的代码给出的:
> [ open high low close volume Ticker Difference LongMA ShortMA HighBelowShort LowAboveShort
> ShortAboveLong Return DayDiff AboveBelow ShortToLong
> timestamp
> 2019-12-12 52.60 52.68 52.24 52.4779 7632 AADR 0.1379 50.870684 51.95786 0.72214 0.28214 1.087176 0.043298 2.600000 2.0 8.000000
> 2019-12-12 14.03 14.03 14.03 14.0300 359 AAMC -0.0100 13.226547 13.85528 0.17472 0.17472 0.628733 0.202228 1.333333 7.0 2.600000
> 2019-12-12 2.15 2.15 1.72 1.9500 10095 AAME 0.1900 1.813116 1.92068 0.22932 -0.20068 0.107564 0.114286 1.000000 1.0 0.636364]
请注意如何与列标题一样维护时间戳。