Using dplyr

Time: 2018-04-13 17:13:42

Tags: r dplyr

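I have financial data with monthly returns for multiple tickers. I want to write a function that calculates the Sharpe ratio when given only the ticker symbol as a string.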

library(dplyr)
library(reshape2)
library(lubridate)

# Define function to calculate sharpe ratio
sharpe_ratio <- function(ticker)
{
  # Create data frame containing annualized returns for each year for ticker 
  # of interest
  df1 %>% subset(ticker == ticker) %>%
     group_by(year(date)) %>% 
      summarize(annual.return = prod(1 + mret.excess) - 1) %>% 
      as.data.frame -> annualized_returns

  # Calculate Sharpe Ratio with annualized metrics
  mu <- mean(annualized_returns$annual.return)
  sigma <- sd(annualized_returns$annual.return)
  return (mu/sigma)
}

However, when I try:

sharpe_ratio("YACKX")

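it does not subset the data frame (df1) at all, and whatever ticker I pass in, I get the average annualized return across all returns. However, if inside the function I replace

... %>% subset(ticker == ticker) %>% ...

with

... %>% subset(ticker == "YACKX") %>% ...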

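the function now subsets my data frame correctly. I'm curious why subsetting with the formal argument doesn't work, while "fixing" it by typing the ticker name in quotes makes the subset behave correctly.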

Here is a sample dataset:

date <- as.Date(c("2000-01-31", "2000-02-29", "2000-03-31", "2000-04-30", "2000-05-31", 
                     "2000-06-30", "2000-07-31", "2000-08-31", "2000-09-30", "2000-10-31", 
                     "2000-11-30", "2000-12-31", "2001-01-31", "2001-02-28", "2001-03-31", 
                     "2001-04-30", "2001-05-31", "2001-06-30", "2001-07-31", "2001-08-31", 
                     "2001-09-30", "2001-10-31", "2001-11-30","2001-12-31", "2000-01-31", 
                     "2000-02-29", "2000-03-31", "2000-04-30", "2000-05-31", "2000-06-30", 
                     "2000-07-31", "2000-08-31", "2000-09-30", "2000-10-31", "2000-11-30", 
                     "2000-12-31", "2001-01-31", "2001-02-28", "2001-03-31", "2001-04-30",
                     "2001-05-31", "2001-06-30", "2001-07-31", "2001-08-31", "2001-09-30", 
                     "2001-10-31", "2001-11-30","2001-12-31"))
tickers <- c(rep("YACKX",24), rep("APIMX",24))
mret.excess <- c(-0.0743128, -0.0798149,  0.0571812, -0.0408150,  0.0277273,  0.0535117, 
                 -0.0181185,  0.0591170, -0.0019288, 0.0786993,  0.0017027,  0.0220814,  
                 0.0170490,  0.0061800, -0.0368087,  0.0216363,  0.0356446, -0.0066351,
                 0.0335736,  0.0006140, -0.0795808,  0.0238521, 0.1076750,  0.0290756, 
                 -0.0566304,  0.0328873,  0.0552739, -0.0458054, -0.0402790,  0.0265851, 
                 -0.0344774,  0.0860904, -0.0575071, -0.0814842, -0.0872155, 0.0028902,
                 0.0470691, -0.1203689, -0.0896772,  0.0995483, -0.0048447, -0.0242168, 
                 -0.0257273, -0.0711448, -0.1155542, 0.0540500,  0.0880436,  0.0202195)
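# Build the data frame from the vectors defined above so the columns are
# named date, tickers and mret.excess
df1 <- data.frame(date, tickers, mret.excess, stringsAsFactors = FALSE)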

For YACKX, my Sharpe ratio output should be [1] 1.997946, but I get [1] -1.186262. Likewise, for APIMX the output should be [1] -7.231879, but I get -1.186262. So I know the data is not being subset correctly.

1 answer:

Answer 0: (score: 2)

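Your function has a couple of errors. Use filter instead of subset, and note that the column in df1 is named tickers, not ticker. Because df1 has no column called ticker, the condition ticker == ticker compares the function argument with itself, which is always TRUE, so every row is kept. See below.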

library(dplyr)
library(lubridate)
sharpe_ratio <- function(ticker)
{
  # Create data frame containing annualized returns for each year for ticker 
  # of interest
  df1 %>% filter(tickers == ticker) %>%
    group_by(year(date)) %>% 
    summarize(annual.return = prod(1 + mret.excess) - 1) %>% 
    as.data.frame -> annualized_returns

  # Calculate Sharpe Ratio with annualized metrics
  mu <- mean(annualized_returns$annual.return)
  sigma <- sd(annualized_returns$annual.return)
  return (mu/sigma)
}

sharpe_ratio("YACKX")
[1] 1.997946
sharpe_ratio("APIMX")
[1] -7.231879
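
To see why the original condition kept every row, here is a minimal base R sketch. The data frame toy and the helpers keep_all and keep_match are hypothetical names made up for illustration; only the column name tickers mirrors the sample data above.

```r
# Toy data frame (hypothetical) with the same column name as df1: `tickers`.
toy <- data.frame(
  tickers = c("YACKX", "YACKX", "APIMX"),
  mret.excess = c(0.01, -0.02, 0.03),
  stringsAsFactors = FALSE
)

keep_all <- function(ticker) {
  # `toy` has no column named `ticker`, so both sides of `==` resolve to the
  # function argument. The condition is a single TRUE, recycled to every row.
  subset(toy, ticker == ticker)
}

keep_match <- function(ticker) {
  # `tickers` is a column and `ticker` is the argument, so each row's value
  # is compared with the string that was passed in.
  subset(toy, tickers == ticker)
}

nrow(keep_all("YACKX"))    # every row survives
nrow(keep_match("YACKX"))  # only the YACKX rows survive
```

If the column had been named ticker as well, ticker == ticker would compare the column with itself and would still be TRUE for every row, so giving the argument a name that differs from every column name is the safer habit.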