Adding a loop around an embedded R try/except block

Time: 2018-10-22 14:27:34

Tags: python r rpy2

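I'm new to Python and R and am following the blog post below. I'm having trouble adding a loop to the following code. Right now the code runs to completion, but it only outputs the seasonality flag for one customer. I would like it to loop over all of my customers.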

datamovesme.com/2018/07/01/seasonality-python-code

    ##Here comes the R code piece
    try:
        seasonal = r('''
        fit<-tbats(customerTS, seasonal.periods = 12, use.parallel = TRUE)
        fit$seasonal
        ''')
    except: seasonal = 1
    seasonal_output = seasonal_output.append({'customer_id':customerid, 'seasonal': seasonal}, ignore_index=True)
    print(f' {customerid} | {seasonal} ')
print(seasonal_output)
seasonal_output.to_csv(outfile)

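I have tried many combinations of code to make it loop, too many to list here. The blog shows an existing data frame and a time series object that we can use. I'm not sure which one to use, or how to pass it to the R code.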

Thanks!

1 Answer:

Answer 0: (score: 1)

The code in the linked blog post has the following problems:

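  1. The code is not indented the way Python syntax requires. This may be down to how the website renders spaces or tabs, but it is a disservice to readers, because missing indentation changes the output.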

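  2. The code does not address the inefficiency of appending to a data frame inside a loop. Never call DataFrame.append or pd.concat inside a for-loop: it leads to quadratic copying. Instead, since seasonal is a single value, you can build a list of dictionaries and pass it to the pd.DataFrame() constructor outside the loop.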

With both issues fixed and the whole code block run, your solution should output a data frame covering all customerids.

# ... same above assignments ...
outfile = '[put your file path here].csv'
df_list = []

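# Group by the column name itself (not a one-item list) so that customerid
# is a plain value rather than a 1-tuple on newer pandas versions
for customerid, dataForCustomer in filledIn.groupby('customer_id'):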
    startYear = dataForCustomer.head(1).iloc[0].yr
    startMonth = dataForCustomer.head(1).iloc[0].mnth
    endYear = dataForCustomer.tail(1).iloc[0].yr
    endMonth = dataForCustomer.tail(1).iloc[0].mnth

    #Creating a time series object
    customerTS = stats.ts(dataForCustomer.usage.astype(int),
                          start=base.c(startYear,startMonth),
                          end=base.c(endYear, endMonth), 
                          frequency=12)
    r.assign('customerTS', customerTS)

    ##Here comes the R code piece
    try:
        seasonal = r('''
                        fit<-tbats(customerTS, seasonal.periods = 12, use.parallel = TRUE)
                        fit$seasonal
                     ''')
    except Exception:
        # Fall back to 1 when the R call fails for this customer
        seasonal = 1

    # APPEND DICTIONARY TO LIST (NOT DATA FRAME)
    df_list.append({'customer_id': customerid, 'seasonal': seasonal})
    print(f' {customerid} | {seasonal} ')

seasonal_output = pd.DataFrame(df_list)
print(seasonal_output)
seasonal_output.to_csv(outfile)
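
The loop structure above (try/except per group, append a dictionary to a list, build the data frame once at the end) can be tried in isolation without R. The sketch below is only an illustration under assumptions: seasonal_flag is a hypothetical stand-in for the tbats call, its rule is a toy, and the sample data is made up. Only the column names customer_id and usage are taken from the code above.

```python
import pandas as pd

# Hypothetical stand-in for the R tbats() call (not part of rpy2 or the blog).
def seasonal_flag(usage):
    if len(usage) < 3:
        raise ValueError("series too short to fit")
    # Toy rule for illustration only: flag the series if its values vary.
    return int(max(usage) != min(usage))

# Made-up sample data using the same column names as the code above.
filledIn = pd.DataFrame({
    "customer_id": ["A", "A", "A", "B", "B", "B", "C"],
    "usage": [1, 5, 1, 2, 2, 2, 7],
})

records = []
for customerid, dataForCustomer in filledIn.groupby("customer_id"):
    try:
        seasonal = seasonal_flag(dataForCustomer.usage.tolist())
    except Exception:
        seasonal = 1  # same fallback value as the code above
    # Append a plain dictionary to a list, not a row to a data frame
    records.append({"customer_id": customerid, "seasonal": seasonal})

# Build the data frame once, outside the loop
seasonal_output = pd.DataFrame(records)
print(seasonal_output)
```

Every customer produces exactly one row, including the one whose computation raised an error, which is the behavior the question asks for.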