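How to create function-based calculated columns using the row index

Time: 2014-12-29 05:00:45

Tags: python python-2.7 pandas dataframe calculated-columns

My DataFrame (HMDrr) has the following column dtypes: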

BINS
SKILL      object
LOGIN      object
50.0      float64
100.0     float64
150.0     float64
200.0     float64
250.0     float64
300.0     float64
350.0     float64
400.0     float64
450.0     float64
500.0     float64
550.0     float64
600.0     float64
650.0     float64
700.0     float64
750.0     float64
800.0     float64
850.0     float64
900.0     float64
950.0     float64
1000.0    float64
dtype: object

Here is a sample of the data, from HMDrr.head().values:

array([['Skill1', 'loginA', 0.07090909090909091, 0.25, 0.35,
        0.147619047619047616, 0.057823529411764705, 0.0,
        0.0, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
        nan],
       ['Skill1', 'loginB', nan, nan, nan, nan, nan, nan, nan,
        nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       ['Skill1', 'loginC', 0.15, nan, nan, nan, nan, nan, nan,
        nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       ['Skill1', 'loginD', 0.3333333333333333,
        0.1857142857142857, 0.0, 0.15, 0.1, 0.0, 0.05666666666666667,
        0.06692307692307693, 0.05692307692307693, 0.13529411764705882, 0.1,
        0.0, nan, nan, nan, nan, nan, nan, nan, nan],
       ['Skill1', 'loginE', 0.1, 0.0, nan, nan, nan, nan, nan,
        nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]], dtype=object)

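I have employee (LOGIN) data broken down by job type (SKILL). The numeric columns are bins. Each bin holds the employee's performance result for their first 50 interactions, then the next bin for 100, and so on. I need to calculate a slope and an intercept for each SKILL and LOGIN so that I can create a new employee performance ramp plan.

To do this, I built the following: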

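import numpy as np
from scipy.stats import linregress

# Bins for contacts
startBin = 0.0
stopBin = 1000.0
incrementBin = 50.0
# Bin edges: 0, 50, ..., 1000
sortBins = np.arange(startBin, stopBin + incrementBin, incrementBin)
# Bin labels (upper edges): 50, 100, ..., 1000
binLabels = np.arange(startBin + incrementBin, stopBin + incrementBin, incrementBin)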

# Calculate the logarithmic slope for one row of the HMDrr dataset
def calc_slope(z):
    y = HMDrr.loc[z,binLabels].dropna()
    number = y.count()+1
    y = y.values.astype(float)
    x = np.log(range(1,number,1))
    slope, intercept, r, p, stderr = linregress(x, y)
    return slope
# Calculate the logarithmic intercept for one row of the HMDrr dataset
def calc_intercept(z):
    y = HMDrr.loc[z,binLabels].dropna()
    number = y.count()+1
    y = y.values.astype(float)
    x = np.log(range(1,number,1))
    slope, intercept, r, p, stderr = linregress(x, y)
    return intercept
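For reference, both functions fit the model y = slope * ln(x) + intercept, where x is the ordinal position (1, 2, 3, ...) of each populated bin rather than the bin label itself. The following is a minimal sketch of that fit on made-up data; the `row` values are hypothetical and were chosen so that the true slope is 2 and the true intercept is 1.

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical bin results for one employee; NaN marks bins not yet reached.
# The values follow y = 1 + 2 * ln(x) exactly for x = 1, 2, 3.
row = np.array([1.0, 1.0 + 2.0 * np.log(2), 1.0 + 2.0 * np.log(3), np.nan, np.nan])

y = row[~np.isnan(row)]                # same effect as .dropna() on a Series
x = np.log(np.arange(1, len(y) + 1))   # ln(1), ln(2), ... one per populated bin

result = linregress(x, y)
print(result.slope, result.intercept)
```

Because the made-up data lies exactly on the curve, the fit recovers a slope of 2 and an intercept of 1 up to floating-point rounding.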

When I run it with a manually supplied z value, it works fine:

calc_slope(10)
-0.018236067481219649

I want to use the functions above to create SLOPE and INTERCEPT columns in the df.

I have tried various things, for example:

HMDrr['SLOPE'] = calc_slope(HMDrr.index)

TypeError                                 Traceback (most recent call last)
<ipython-input-717-4a58ad29d7b0> in <module>()
----> 1 HMDrr['SLOPE'] = calc_slope(HMDrr.index)

<ipython-input-704-26a18390e20c> in calc_slope(z)
      7 def calc_slope(z):
      8     y = HMDrr.loc[z,binLabels].dropna()
----> 9     x = np.log(range(1,y.count()+1,1))
     10     slope, intercept, r, p, stderr = linregress(x, y)
     11     return slope

C:\Anaconda\lib\site-packages\pandas\core\series.pyc in wrapper(self)
     67             return converter(self.iloc[0])
     68         raise TypeError(
---> 69             "cannot convert the series to {0}".format(str(converter)))
     70     return wrapper
     71 

TypeError: cannot convert the series to <type 'int'>
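The traceback appears to follow from what `.loc` returns. With a single row label it returns a Series, so `count()` is one integer. With the whole index it returns a DataFrame, so `count()` is a Series of per-column counts, which cannot be turned into the single integer that `range` needs. A small sketch with a hypothetical two-column frame (`df` and `bin_labels` are stand-ins, not the real data):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for HMDrr with two bin columns.
df = pd.DataFrame({50.0: [0.1, np.nan], 100.0: [0.2, 0.3]})
bin_labels = [50.0, 100.0]

one_row = df.loc[0, bin_labels].dropna()          # Series: one row's bins
all_rows = df.loc[df.index, bin_labels].dropna()  # DataFrame: every row at once

print(type(one_row.count()))    # a single integer
print(type(all_rows.count()))   # a Series with one count per column

# Converting a multi-element Series to an int is what fails.
try:
    int(all_rows.count())
    raised = False
except TypeError:
    raised = True
print(raised)
```

The exact wording of the error message may differ between Python and pandas versions, but the cause is the same: the function is written for one row at a time.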

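I have also tried the apply function, but most likely I am doing it wrong. My guess is that either I am not applying the function to the columns correctly, or the values I am getting are not integers. I have been at this for several days, so now I am asking for help...

How can I use the functions above to generate the columns so that each row gets its own row-specific result?

1 Answer:

Answer 0 (score: 0)

It may not be the best approach, but I solved it as follows.

I built a calc_linear function that returns both the slope and the intercept: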

# Calculate the logarithmic slope and intercept for one row of the HMDrr dataset
def calc_linear(z):
    y = HMDrr.loc[z,binLabels].dropna()
    number = y.count()+1
    y = y.values.astype(float)
    x = np.log(range(1,number,1))
    slope, intercept, r, p, stderr = linregress(x, y)
    return slope, intercept

Create empty columns for the data:

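# Create metric columns (NaN keeps them numeric instead of strings)
HMDrr['SLOPE'] = np.nan
HMDrr['INTERCEPT'] = np.nan

Fill the columns using a for loop:

# Loop over the row labels and compute the metrics for each row
for x in HMDrr.index:
    slope, intercept = calc_linear(x)
    # .loc avoids chained assignment, which may write to a copy
    HMDrr.loc[x, 'SLOPE'] = slope
    HMDrr.loc[x, 'INTERCEPT'] = intercept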

If there is a cleaner way, I would be happy to hear it :)
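One possibly cleaner option is to let pandas pass each row to the function with `apply(..., axis=1)`, so no explicit loop or global lookup is needed. The sketch below is an assumption-laden illustration, not the original poster's code: `fit_log_ramp`, `demo`, and `bin_labels` are hypothetical names, and returning NaN for rows with fewer than two populated bins is a choice made here, since a line cannot be fitted through fewer than two points.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

def fit_log_ramp(row):
    """Fit y = slope * ln(x) + intercept over one row's populated bins."""
    y = row.dropna().to_numpy(dtype=float)
    if len(y) < 2:
        # Assumption: rows with fewer than two results get NaN metrics.
        return pd.Series({'SLOPE': np.nan, 'INTERCEPT': np.nan})
    x = np.log(np.arange(1, len(y) + 1))
    fit = linregress(x, y)
    return pd.Series({'SLOPE': fit.slope, 'INTERCEPT': fit.intercept})

# Hypothetical miniature frame standing in for HMDrr.
bin_labels = [50.0, 100.0, 150.0]
demo = pd.DataFrame(
    [['Skill1', 'loginA', 1.0, 1.0 + np.log(2), 1.0 + np.log(3)],
     ['Skill1', 'loginB', np.nan, np.nan, np.nan],
     ['Skill1', 'loginC', 0.15, np.nan, np.nan]],
    columns=['SKILL', 'LOGIN'] + bin_labels,
)

# apply(axis=1) calls fit_log_ramp once per row and stacks the returned
# Series into a two-column DataFrame aligned on the original index.
fits = demo[bin_labels].apply(fit_log_ramp, axis=1)
demo = demo.join(fits)
print(demo[['LOGIN', 'SLOPE', 'INTERCEPT']])
```

On the real data this would presumably be called as `HMDrr[binLabels].apply(fit_log_ramp, axis=1)`. Rows such as loginB (all NaN) and loginC (a single value) in the sample are the cases the length check is meant to cover.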