Add a new column to df

Time: 2018-06-01 16:26:23

Tags: dataframe datatable

For A, it looks like this:

Name    Date       Value     NewColumn   other columns
A       2000-01      0.5      
A       2001-03      0.4      0
A       2002-02      1.0      1  
A       2003-05      0.9      0
A       2004-06      0.9
A       2006-03      0.4        <- no previous year

2 answers:

Answer 0 (score: 1)

df = read.table(text = "
Name      Date       Value
A       2000-01      0.5
A       2001-03      0.4 
A       2002-02      1.0
A       2003-05      0.9
A       2004-06      0.9
A       2006-03      0.4 
", header=T, stringsAsFactors=F)

library(dplyr)

df %>%
  group_by(Name) %>%                                # for each name
  mutate(change = Value/lag(Value)-1,               # get the change in value (increase or decrease)
         year = as.numeric(substr(Date, 1, 4)),     # get the year from the date
         NewColumn = case_when(change > 0.01 & lag(year) == year-1 ~ 1,         # if the change is more than +1% and the previous row is exactly 1 year earlier, flag as 1
                               change < -0.01 & lag(year) == year-1 ~ 0)) %>%   # if the change is less than -1% and the previous row is exactly 1 year earlier, flag as 0
  ungroup()

# # A tibble: 6 x 6
#   Name  Date    Value  change  year NewColumn
#   <chr> <chr>   <dbl>   <dbl> <dbl>     <dbl>
# 1 A     2000-01   0.5  NA      2000        NA
# 2 A     2001-03   0.4  -0.200  2001         0
# 3 A     2002-02   1     1.5    2002         1
# 4 A     2003-05   0.9  -0.100  2003         0
# 5 A     2004-06   0.9   0      2004        NA
# 6 A     2006-03   0.4  -0.556  2006        NA

You can remove the helper variables (change, year) you don't need. I left them in only to help you see how the process works.
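If you only want NewColumn in the result, one way (a sketch, assuming dplyr is installed) is to append `select(-change, -year)` after `ungroup()`. The data frame below is rebuilt inline with `data.frame()` instead of `read.table()` so the snippet runs on its own; the name `res` is just illustrative.

```r
library(dplyr)

# Same sample data as above, built inline
df <- data.frame(
  Name  = rep("A", 6),
  Date  = c("2000-01", "2001-03", "2002-02", "2003-05", "2004-06", "2006-03"),
  Value = c(0.5, 0.4, 1.0, 0.9, 0.9, 0.4),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(Name) %>%
  mutate(change = Value / lag(Value) - 1,              # relative change vs. previous row
         year = as.numeric(substr(Date, 1, 4)),        # year part of "YYYY-MM"
         NewColumn = case_when(change > 0.01 & lag(year) == year - 1 ~ 1,
                               change < -0.01 & lag(year) == year - 1 ~ 0)) %>%
  ungroup() %>%
  select(-change, -year)                               # drop the helper columns

res
```

Rows that match neither `case_when` condition (the first row, the unchanged 2004 row, and the 2006 row with no 2005 predecessor) get NA.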

Answer 1 (score: 1)

Since the question is tagged data.table, here is a corresponding solution. It uses some tricky arithmetic with NA and logical values:

library(data.table)
setDT(DT)[order(Date), NewColumn := {
  yr <- year(lubridate::ymd(Date, truncated = 1L))
  chg <- Value / shift(Value) - 1.0
  NA^(yr - shift(yr) != 1L) * NA^(!abs(chg) > 0.01) * (sign(chg) / 2.0 + 0.5)
}, by = Name][]
   Name    Date Value NewColumn
1:    A 2000-01   0.5        NA
2:    A 2001-03   0.4         0
3:    A 2002-02   1.0         1
4:    A 2003-05   0.9         0
5:    A 2004-06   0.9        NA
6:    A 2006-03   0.4        NA

The trick here is to use the fact that NA^0 is 1 and NA^1 is NA, and that in arithmetic FALSE counts as 0 and TRUE as 1, so

NA^c(FALSE, TRUE)

returns

[1]  1 NA
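To see how the three factors in the data.table expression combine, here is a small base-R sketch that applies the same formula to hand-picked inputs. The vectors `gap` (year difference to the previous row) and `chg` (relative change) are hypothetical values chosen for illustration, not taken from the question's data.

```r
# NA^FALSE is 1 and NA^TRUE is NA, because logicals become 0/1 in arithmetic
NA^c(FALSE, TRUE)

# Hypothetical inputs for four rows
gap <- c(1, 1, 1, 2)              # years since the previous row
chg <- c(-0.2, 1.5, 0, -0.56)     # relative change vs. the previous row

keep_gap  <- NA^(gap != 1)          # 1 if exactly one year apart, otherwise NA
keep_chg  <- NA^(!abs(chg) > 0.01)  # 1 if |change| > 1%, otherwise NA
direction <- sign(chg) / 2 + 0.5    # sign -1 -> 0, +1 -> 1 (0 -> 0.5, masked by keep_chg)

flag <- keep_gap * keep_chg * direction
flag
```

Any NA factor makes the whole product NA (even `NA * 0` is NA in R), so only rows passing both checks keep their 0/1 direction value; here `flag` is `0 1 NA NA`.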

Data

library(data.table)
DT <- fread("Name      Date       Value
A       2000-01      0.5
A       2001-03      0.4 
A       2002-02      1.0
A       2003-05      0.9
A       2004-06      0.9
A       2006-03      0.4 ")