Question

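I found an answer that works well for me, and I will post it as well.

How do I take a list of DataFrames and create a new df that contains only the rows sharing the same value in a non-index column?

It is basically an intersection, but for several reasons concat and merge do not work for me.

I have looked at the following, but none of them gave me what I need: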

Finding common rows (intersection) in two Pandas dataframes

Pandas merge df error

How to get intersection of dataframes based on column labels?

Intersection of multiple pandas dataframes

How to find intersection of dataframes based on multiple columns?

Intersection of pandas dataframe with multiple columns

How to do intersection of dataframes in pandas

Answer 1

This is what I ended up implementing. I would like to see whether anyone has a more efficient approach.

import copy

dfs_array = [ df1, df2, df3, df4, ... ]

def intersection_of_dfs(dfs_array, col='Ticker'):
    if len(dfs_array) <= 1:
        # With 0 or 1 elements, simply return the original list.
        # No error is raised; the caller must handle this return value.
        return dfs_array
    # Shallow copy only: the list is copied, the DataFrames are not.
    dfs = copy.copy(dfs_array)
    while len(dfs) > 1:
        df1 = dfs.pop()
        df2 = dfs.pop()
        # Keep the rows of df1 whose `col` value also appears in df2.
        df0 = df1.loc[df1[col].isin(df2[col].values)]
        dfs.insert(0, df0)
    # A one-element list holding the intersected DataFrame.
    return dfs
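
As a rough usage sketch, here is the function run on three toy frames. The frames, the make_df helper, and the 'ZZZZ' ticker are made up for illustration and are not my real data; the function body is repeated only so the snippet runs on its own.

```python
import copy
import pandas as pd

def intersection_of_dfs(dfs_array, col='Ticker'):
    # Same logic as the function above, repeated so this snippet is self-contained.
    if len(dfs_array) <= 1:
        return dfs_array
    dfs = copy.copy(dfs_array)
    while len(dfs) > 1:
        df1 = dfs.pop()
        df2 = dfs.pop()
        dfs.insert(0, df1.loc[df1[col].isin(df2[col].values)])
    return dfs

def make_df(tickers, closes):
    # Hypothetical helper: a small frame indexed by timestamp.
    index = pd.DatetimeIndex(['2019-12-12'] * len(tickers), name='timestamp')
    return pd.DataFrame({'close': closes, 'Ticker': tickers}, index=index)

df_a = make_df(['AADR', 'AAMC', 'AAME'], [52.4779, 14.03, 1.95])
df_b = make_df(['AADR', 'AAME', 'ZZZZ'], [52.4779, 1.95, 10.0])
df_c = make_df(['AAME', 'AADR'], [1.95, 52.4779])

# The function returns a list, so take its single element.
result = intersection_of_dfs([df_a, df_b, df_c])[0]
print(result)
```

Only the tickers present in all three frames survive, and the rows themselves are taken from one of the input frames (here df_a), so its index and column headers come through unchanged.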

The suggestion to use merge does not work for me, because it destroys the index and the column headers.

This is what merge gives:

>     [   open_x_x  high_x_x  low_x_x  close_x_x  volume_x_x Ticker  ...  LowAboveShort_y_y  ShortAboveLong_y_y  Return_y_y  DayDiff_y_y  AboveBelow_y_y  ShortToLong_y_y
>     0     52.60     52.68    52.24    52.4779        7632   AADR  ...            0.28214            1.087176    0.043298     2.600000             2.0         8.000000
>     1     14.03     14.03    14.03    14.0300         359   AAMC  ...            0.17472            0.628733    0.202228     1.333333             7.0         2.600000
>     2      2.15      2.15     1.72     1.9500       10095   AAME  ...           -0.20068            0.107564    0.114286     1.000000             1.0         0.636364
>     
>     [3 rows x 61 columns]]
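
The effect can be reproduced on a small scale. The following is a minimal sketch with made-up two-row frames (not my real data), assuming the merge is done on 'Ticker' with the default inner join and default suffixes:

```python
import pandas as pd

index = pd.DatetimeIndex(['2019-12-12', '2019-12-12'], name='timestamp')
left = pd.DataFrame({'close': [52.4779, 14.03], 'Ticker': ['AADR', 'AAMC']}, index=index)
right = pd.DataFrame({'close': [52.4779, 1.95], 'Ticker': ['AADR', 'AAME']}, index=index)

# Merging on a column discards both timestamp indexes and
# suffixes every overlapping column name with _x / _y.
merged = left.merge(right, on='Ticker')
print(merged.columns.tolist())   # ['close_x', 'Ticker', 'close_y']
print(merged.index.name)         # None: the timestamp index is gone

# Merging two already-merged frames stacks the suffixes.
nested = merged.merge(merged, on='Ticker')
print(nested.columns.tolist())
# ['close_x_x', 'Ticker', 'close_y_x', 'close_x_y', 'close_y_y']
```

Stacked suffixes of this kind are what show up as open_x_x and ShortToLong_y_y in the output above.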

This is what intersection_of_dfs gives:

>     [             open   high    low    close  volume Ticker  Difference     LongMA   ShortMA  HighBelowShort  LowAboveShort  ShortAboveLong    Return   DayDiff  AboveBelow  ShortToLong
>     timestamp                                                                                                                                                                           
>     2019-12-12  52.60  52.68  52.24  52.4779    7632   AADR      0.1379  50.870684  51.95786         0.72214        0.28214        1.087176  0.043298  2.600000         2.0     8.000000
>     2019-12-12  14.03  14.03  14.03  14.0300     359   AAMC     -0.0100  13.226547  13.85528         0.17472        0.17472        0.628733  0.202228  1.333333         7.0     2.600000
>     2019-12-12   2.15   2.15   1.72   1.9500   10095   AAME      0.1900   1.813116   1.92068         0.22932       -0.20068        0.107564  0.114286  1.000000         1.0     0.636364]

Note how the timestamp index is preserved, along with the column headers.
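
If the goal is only to keep the rows whose Ticker appears in every frame, one possible alternative is to compute the set of shared values once and then filter each frame with isin. This is a sketch under that assumption, not the code I actually run; the helper name filter_to_common and the toy frames are made up for illustration.

```python
from functools import reduce
import pandas as pd

def filter_to_common(dfs, col='Ticker'):
    """Hypothetical helper: in every frame, keep only the rows whose
    `col` value appears in all of the frames."""
    if not dfs:
        return []
    # Intersect the sets of values found in each frame.
    common = reduce(set.intersection, (set(df[col]) for df in dfs))
    # Boolean filtering with isin leaves index and columns as they were.
    return [df.loc[df[col].isin(common)] for df in dfs]

index3 = pd.DatetimeIndex(['2019-12-12'] * 3, name='timestamp')
index2 = pd.DatetimeIndex(['2019-12-12'] * 2, name='timestamp')
frame_a = pd.DataFrame({'close': [52.4779, 14.03, 1.95],
                        'Ticker': ['AADR', 'AAMC', 'AAME']}, index=index3)
frame_b = pd.DataFrame({'close': [1.95, 52.4779],
                        'Ticker': ['AAME', 'AADR']}, index=index2)

filtered = filter_to_common([frame_a, frame_b])
print(filtered[0])
```

Unlike intersection_of_dfs, this returns one filtered frame per input rather than a single frame, so the caller picks whichever frame's rows are wanted.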
