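Identifying rows in data.frames based on complex rules

Time: 2015-12-09 15:50:05

Tags: regex r dataframe

In two previous questions, I asked how to identify and extract substrings based on complex rules.

The current question is about how to achieve the same thing within a data.frame structure. Suppose you have the following data.frame: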

data <- data.frame(time = 1:10,
                   event = c("FA", "EX", "I1", "FA", "FA", "I3", "EX", "EX", "EX", "I3"),
                   actor = c("John", "Alex", "John", "Alex", "Tim", "Sandra", "Sara", "John", "Eliza", "Alex"))

time event actor
1    FA    John
2    EX    Alex
3    I1    John
4    FA    Alex
5    FA    Tim
6    I3    Sandra
7    EX    Sara
8    EX    John
9    EX    Eliza
10   I3    Alex

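Now I want to move from row 1 to row 10 and group all rows up to and including each I3. That is, I want to return a list of two data.frames: rows 1-6 and rows 7-10 should each form a separate data.frame, stored together in one list. How can I do this?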

3 Answers:

Answer 0: (score: 2)

You can use split.
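As a minimal sketch of how split could be applied here (not necessarily the answerer's exact code), each row can be labelled with the number of I3 rows that occur strictly before it, and the data.frame can then be split on that label. The variable names is_i3, group, and result are illustrative assumptions, and the data.frame from the question is assumed to be stored in data.

```r
# Example data from the question, assumed to be stored in `data`
data <- data.frame(
  time  = 1:10,
  event = c("FA", "EX", "I1", "FA", "FA", "I3", "EX", "EX", "EX", "I3"),
  actor = c("John", "Alex", "John", "Alex", "Tim", "Sandra", "Sara", "John", "Eliza", "Alex")
)

# TRUE for every row that closes a block
is_i3 <- data$event == "I3"

# Group id = number of I3 rows strictly before the current row,
# so each I3 row stays in the block it terminates
group <- cumsum(c(0, head(is_i3, -1)))

# One data.frame per block, collected in a list
result <- split(data, group)
```

With this labelling, any rows that follow the last I3 would end up in a final block of their own rather than being dropped.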

Answer 1: (score: 0)

This also works:

i3.index = which(data$event == "I3")
i3.start = c(1, i3.index[-length(i3.index)]+1)

indexMatrix = cbind(from = i3.start, end = i3.index)

apply(indexMatrix, 1, function(x){data[x[1]:x[2],]})

# [[1]]
# time event  actor
# 1    1    FA   John
# 2    2    EX   Alex
# 3    3    I1   John
# 4    4    FA   Alex
# 5    5    FA    Tim
# 6    6    I3 Sandra
# 
# [[2]]
# time event actor
# 7     7    EX  Sara
# 8     8    EX  John
# 9     9    EX Eliza
# 10   10    I3  Alex

Answer 2: (score: 0)

This works too:

library(dplyr)

data %>%
  arrange(time %>% desc) %>%
  mutate(group = cumsum(event == "I3")) %>%
  arrange(time) %>%
  group_by(group)
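
The pipeline above yields a grouped data frame rather than the list the question asks for. As a hedged sketch, the same reverse cumulative-sum idea can be written in base R and passed to split to obtain a list. The names group and pieces are illustrative assumptions, and the question's data.frame is assumed to be stored in data.

```r
# Example data from the question, assumed to be stored in `data`
data <- data.frame(
  time  = 1:10,
  event = c("FA", "EX", "I1", "FA", "FA", "I3", "EX", "EX", "EX", "I3"),
  actor = c("John", "Alex", "John", "Alex", "Tim", "Sandra", "Sara", "John", "Eliza", "Alex")
)

# Count I3 rows from the bottom up, then restore the original row order;
# every row gets the number of I3 rows at or after its position
group <- rev(cumsum(rev(data$event == "I3")))

# Fix the level order to order of appearance so earlier blocks come first
pieces <- split(data, factor(group, levels = unique(group)))
```

Because the count runs from the bottom, the first block carries the highest group number, which is why the factor levels are set explicitly before splitting.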