Create a column based on another column's values, by assigning new values to sets of string values from the input column

Time: 2019-02-09 00:28:56

Tags: python python-3.x pandas

My problem seems like it must have a simple solution, but I can't figure it out. I have tried .loc, np.where and df.apply.

#input
datetime             dty  dtx       status
2018-09-16 04:38:17  0.0  0.099854  F-On
2018-09-16 04:38:18  0.0  0.100098  F-On
2018-09-16 04:38:19  0.0  0.000000  S-On
2018-09-16 04:38:20  0.0  0.100098  F-On
2018-09-16 04:38:21  0.0  0.100098  circ
2018-09-16 04:38:22  0.0  0.100098  circInS
2018-09-16 04:38:21  0.0  0.100098  TH
2018-09-16 04:38:21  0.0  0.100098  R
2018-09-16 04:38:21  0.0  0.100098  S

The mapping comes from the domain:

    (F-On,S-On) becomes 'On'
    (circ,TH,circInS) becomes 'fooON'
    (R) stays 'R'
    (S) stays 'S'

#expected output
datetime             dty  dtx       status   grouped_status
2018-09-16 04:38:17  0.0  0.099854  F-On     On
2018-09-16 04:38:18  0.0  0.100098  F-On     On
2018-09-16 04:38:19  0.0  0.000000  S-On     On
2018-09-16 04:38:20  0.0  0.100098  F-On     On
2018-09-16 04:38:21  0.0  0.100098  circ     fooON
2018-09-16 04:38:22  0.0  0.100098  circInS  fooON
2018-09-16 04:38:21  0.0  0.100098  TH       fooON
2018-09-16 04:38:21  0.0  0.100098  R        R
2018-09-16 04:38:21  0.0  0.100098  S        S

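The np.where and .loc attempts below fail with:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I understand that this code compares the whole status column (a Series) with the values in the tuple, which is ambiguous, so it fails. To compare row by row I tried df.apply, but it did not give the expected output.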
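To illustrate why the comparison is ambiguous, here is a minimal sketch (the sample Series and the targets tuple are assumptions built from the question's values). Python's in operator compares the Series with each tuple item using ==, gets back a boolean Series, and then needs a single True or False, which pandas refuses to produce. Series.isin performs the membership test element by element instead.

```python
import pandas as pd

# A few values from the question's 'status' column (sample data)
status = pd.Series(['F-On', 'S-On', 'circ', 'circInS', 'TH', 'R', 'S'])
targets = ('circ', 'TH', 'circInS')

# `status in targets` compares the whole Series with each tuple item via ==,
# which yields a boolean Series, and then asks for one True/False: ValueError.
try:
    status in targets
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)

# Series.isin does the membership test per element, one boolean per row.
mask = status.isin(targets)
print(mask.tolist())  # [False, False, True, True, True, False, False]
```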

If possible, how can I make all three approaches below work, and which is the best way to do a row-wise operation?

#using np.where
df['grouped_status'] = np.where(df['status'] in ('circ','TH','circInS'), 'fooON', df['status'])

#using df.loc
df.loc[df['status'] in ('circ','TH','circInS'),['status']] = 'fooON'
df['grouped_status'] = df['status']

#function for df.apply
def group_status_fn(row):

    val = ""

    if row['grouped_status'] in ('F-On','B-On','S-On'):
        row['grouped_status'] = 'On'
    elif row['grouped_status'] in ('circ','TH','circInS'):
        row['grouped_status'] = 'fooON'

    elif row['grouped_status'] == 'R':
        val = 'R'
    elif row['grouped_status'] == 'S':
        val = 'S'

    return val

#using df.apply
df["grouped_status2"] = df.apply(group_status_fn, axis=1)

#out - output column half empty
datetime             dHD       status  grouped_status  grouped_status2
2018-09-16 04:38:35  0.000000  F-On    F-On
2018-09-16 04:38:36  0.000000  F-On    F-On
2018-09-16 04:38:37  0.000000  S-On    S-On
2018-09-16 04:38:38  0.000000  S-On    S-On
2018-09-16 04:38:39  0.000000  R       R               R
2018-09-16 04:38:40  0.099854  R       R               R
2018-09-16 04:38:41  0.100098  R       R               R
2018-09-16 04:38:42  0.000000  R       R               R
2018-09-16 04:38:43  0.000000  R       R               R
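The half-empty column comes from the apply function: it writes 'On' into row['grouped_status'] but returns val, which is still "" for those rows, and df.apply only uses the return value. Below is a minimal sketch of how each of the three attempts could produce the expected output, assuming the groups listed in the mapping (plus 'B-On' from the function). The sample frame holds only the status column, the column names grouped_np, grouped_loc and grouped_apply are hypothetical, and the function reads 'status' rather than 'grouped_status'.

```python
import numpy as np
import pandas as pd

# Sample built from the question's input; only 'status' matters here
df = pd.DataFrame({'status': ['F-On', 'F-On', 'S-On', 'F-On', 'circ',
                              'circInS', 'TH', 'R', 'S']})

on_values = ('F-On', 'B-On', 'S-On')    # groups taken from the question
foo_values = ('circ', 'TH', 'circInS')

# 1) np.where: isin gives one boolean per row; a nested np.where handles
#    the second group, and every other row keeps its original status.
df['grouped_np'] = np.where(
    df['status'].isin(on_values), 'On',
    np.where(df['status'].isin(foo_values), 'fooON', df['status']),
)

# 2) .loc: start from a copy of 'status', then overwrite the masked rows.
df['grouped_loc'] = df['status'].copy()
df.loc[df['status'].isin(on_values), 'grouped_loc'] = 'On'
df.loc[df['status'].isin(foo_values), 'grouped_loc'] = 'fooON'


# 3) apply: the function must *return* the new value for every row;
#    assigning to row[...] inside it does not fill the new column.
def group_status_fn(row):
    status = row['status']
    if status in on_values:      # status is a single string here, so `in` works
        return 'On'
    if status in foo_values:
        return 'fooON'
    return status  # 'R', 'S' and any other value pass through unchanged


df['grouped_apply'] = df.apply(group_status_fn, axis=1)
print(df)
```

The isin-based versions operate on whole columns at once, while apply(axis=1) calls a Python function once per row, so on large frames the vectorized forms are usually the faster choice.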

1 answer:

Answer 0 (score: 1)

Use map:

# map each raw status to its group; statuses missing from lookup become NaN
lookup = {'F-On': 'On', 'S-On': 'On', 'circ': 'fooON', 'TH': 'fooON', 'circInS': 'fooON', 'R': 'R', 'S': 'S'}
df['grouped_status'] = df.status.map(lookup)
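The lookup in the answer lists every status by hand. A minimal sketch, assuming the groups from the question's mapping (plus 'B-On', which appears only in the question's apply function), builds the same kind of lookup from group definitions and uses fillna so that statuses without an entry keep their original value instead of becoming NaN. The groups dict and the 'unknown' sample value are hypothetical.

```python
import pandas as pd

# Sample statuses; 'unknown' is a hypothetical value that belongs to no group
df = pd.DataFrame({'status': ['F-On', 'S-On', 'circ', 'circInS', 'TH',
                              'R', 'S', 'B-On', 'unknown']})

# Hypothetical group definitions, following the question's mapping
groups = {
    'On': ['F-On', 'B-On', 'S-On'],
    'fooON': ['circ', 'TH', 'circInS'],
}

# Invert {group: [statuses]} into {status: group} for Series.map
lookup = {status: group
          for group, statuses in groups.items()
          for status in statuses}

# map() returns NaN for statuses missing from lookup; fillna() puts the
# original status back, so 'R', 'S' and 'unknown' pass through unchanged.
df['grouped_status'] = df['status'].map(lookup).fillna(df['status'])
print(df)
```

Series.replace(lookup) is a close alternative that leaves unmatched values untouched without the fillna step.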