Remove rows that do not contain a keyword

Time: 2017-10-27 15:57:44

Tags: r

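I am extracting Facebook posts with Rfacebook based on a time criterion (code: see below), and I want to remove all results (i.e. rows in the data frame) where one column ("message") does not contain a keyword.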

My only solution so far, using grep, leaves me with just the content of that column. Can someone help me?

Code:

#   RETRIEVING DATA
BBCpage <- getPage(page="bbcnews", token=fb_oauth, n=20, since="2017-05-03", feed=FALSE, reactions=TRUE, verbose=TRUE)

BBCpage$message
#Now I only want to keep the rows where the field "message" contains one of my keywords "Brexit" or "European Union"

# possibility 1: not working, since I end up with ONLY the content of "message", not the entire row
       pattern <- "Brexit|European Union"
       grep(pattern, BBCpage, ignore.case=TRUE, perl = FALSE, value = TRUE, fixed = FALSE, useBytes = FALSE, invert = FALSE)



# possibility 2: not working, no filter applied
  matches <- c("Brexit", "European Union")
  BBCfiltered <- BBCpage[!(BBCpage$message %in% matches), ]
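
For context, here is a minimal sketch of why possibility 2 leaves every row in place. `%in%` tests whether a whole string is exactly equal to one of the keywords, while `grepl()` tests whether the pattern occurs anywhere inside the string. The `messages` vector below is made-up toy data, not the real `BBCpage$message` column.

```r
# Toy stand-in for BBCpage$message (hypothetical data, not real posts)
messages <- c("Brexit talks resume in Brussels",
              "Did you get enough sleep last night?",
              "Brexit")

matches <- c("Brexit", "European Union")

# %in% compares each whole string against each keyword (exact equality),
# so a longer sentence that merely mentions "Brexit" is not a match
exact <- messages %in% matches
print(exact)

# grepl() checks whether the pattern occurs anywhere inside each string
partial <- grepl("Brexit|European Union", messages, ignore.case = TRUE)
print(partial)
```

With real posts, almost no message is exactly equal to a keyword, so `%in%` is FALSE for every row and the negation `!` in possibility 2 then keeps all of them.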

Can someone help me figure out how to apply the filter?

Many thanks,

Ivo

- Edit: as requested, this is the output of running the following code:

BBCpage <- getPage(page="bbcnews", token=fb_oauth, n=20, since="2017-05-03", feed=FALSE, reactions=TRUE, verbose=TRUE)

> dput(BBCpage)
structure(list(id = c("228735667216_10155253874762217", "228735667216_10155253984962217", 
"228735667216_10155254016922217", "228735667216_1510422315708643", 
"228735667216_10155254242117217", "228735667216_10155254357457217", 
"228735667216_10155254531807217", "228735667216_10155254645177217", 
"228735667216_10155254739207217", "228735667216_10155254848077217", 
"228735667216_10155255021777217", "228735667216_10155255187982217", 
"228735667216_10155255303912217", "228735667216_10155255312537217", 
"228735667216_10155255092167217", "228735667216_10155256112042217", 
"228735667216_10155256182962217", "228735667216_10155256278057217", 
"228735667216_1993087934041388", "228735667216_10155256481732217"
), likes_count = c(24996, 1385, 1280, 8870, 2104, 5906, 5813, 
15842, 9313, 3315, 944, 6485, 1638, 1638, 2045, 4356, 2098, 1305, 
237, 741), from_id = c("228735667216", "228735667216", "228735667216", 
"228735667216", "228735667216", "228735667216", "228735667216", 
"228735667216", "228735667216", "228735667216", "228735667216", 
"228735667216", "228735667216", "228735667216", "228735667216", 
"228735667216", "228735667216", "228735667216", "228735667216", 
"228735667216"), from_name = c("BBC News", "BBC News", "BBC News", 
"BBC News", "BBC News", "BBC News", "BBC News", "BBC News", "BBC News", 
"BBC News", "BBC News", "BBC News", "BBC News", "BBC News", "BBC News", 
"BBC News", "BBC News", "BBC News", "BBC News", "BBC News"), 
    message = c("The Catalan parliament votes to declare independence from Spain - as Madrid looks set to impose direct rule.", 
    "As Halloween approaches, we are revisiting a spooky American classic. Goosebumps books were a scary children's book series that have been around for 25 years.  We were #LIVE with Tim Jacobus, the artist behind the creepy cover art.", 
    "Do hotel comparison sites really give you the best deal?", 
    "The first official exhibition about the late pop icon Prince has opened in London - with the help of his little sister. <ed><U+00A0><U+00BC><ed><U+00BE><U+00B8><U+2728>  #MyNameisPrince\n\n(via BBC Entertainment News)", 
    "British-born novelist Christina Baker Kline says the ex-president \"squeezed my butt\" as she posed for a photo.", 
    "Ecstatic scenes in Barcelona as Catalonia’s parliament votes to declare independence from Spain - but Madrid has approved direct rule over the region.\n\nbbc.in/2zbEyCn", 
    "Her husband dropped her at a doctor's appointment in 1975 - and that was the last he ever heard of her.", 
    "<ed><U+00A0><U+00BC><ed><U+00BE><U+0083><ed><U+00A0><U+00BD><ed><U+00B0><U+00BE> No tricks, just treats for these animals at Halloween. <ed><U+00A0><U+00BD><ed><U+00B0><U+00BE><ed><U+00A0><U+00BC><ed><U+00BE><U+0083>", 
    "“We are pure. We are strong. We are brave. And we will fight.”\n\nRose McGowan's message to women in her first public remarks since accusing Harvey Weinstein of rape.", 
    "Downing Street said the declaration was based on an illegal vote. But The Scottish Government said it respected Catalonia's position.", 
    "Surely this should have been: \"Eleven things you need to know about Stranger Things\"... <ed><U+00A0><U+00BE><ed><U+00B4><U+00A6><U+200D><U+2640><U+FE0F>", 
    "\"You have no weight problems, that's the good news.\"\n\nPresident Donald J. Trump handed out Halloween treats and the odd trick to journalists' children on their trip to the Oval Office.", 
    "The actresses are the latest women to make allegations against film director James Toback.", 
    "\"Why are you asking me what I wore? It should not happen, no means no.\"", 
    "A pair of US speed climbers have cracked an \"unbeatable\" record for scaling one of the world's best known rock faces - El Capitan.", 
    "Cambridge University say the online repository has \"never seen numbers like this before\".", 
    "Spain's Deputy PM Soraya Saenz de Santamaria is put in charge of Catalonia after its government was dismissed.", 
    "Did you get enough sleep last night?", "\"Sometimes, I think coming into the studio with you John is a bit like going into Harvey Weinstein's bedroom.\"\n\nUK environment secretary Michael Gove apologises for what he says was his \"clumsy attempt at humour\" on a special edition of BBC Radio 4's Today programme. bbc.in/2idoZPk\n\n(Via BBC Politics)", 
    "Rescuers save caimans from a sticky situation in Brazil."
    ), created_time = c("2017-10-27T13:36:37+0000", "2017-10-27T14:32:50+0000", 
    "2017-10-27T14:34:09+0000", "2017-10-27T15:20:00+0000", "2017-10-27T16:13:54+0000", 
    "2017-10-27T17:04:07+0000", "2017-10-27T17:53:05+0000", "2017-10-27T18:44:23+0000", 
    "2017-10-27T19:29:38+0000", "2017-10-27T20:21:24+0000", "2017-10-27T21:09:17+0000", 
    "2017-10-27T22:11:04+0000", "2017-10-27T22:45:09+0000", "2017-10-27T22:50:13+0000", 
    "2017-10-27T23:44:00+0000", "2017-10-28T07:15:39+0000", "2017-10-28T08:17:01+0000", 
    "2017-10-28T09:18:02+0000", "2017-10-28T10:28:12+0000", "2017-10-28T11:14:21+0000"
    ), type = c("link", "video", "link", "video", "link", "video", 
    "link", "video", "video", "link", "link", "video", "link", 
    "link", "video", "link", "link", "link", "video", "video"
    ), link = c("http://bbc.in/2zTuomQ", "https://www.facebook.com/bbcnews/videos/10155253984962217/", 
    "http://bbc.in/2y9oCAc", "https://www.facebook.com/bbcnews/videos/1510422315708643/", 
    "http://bbc.in/2ia2Q4M", "https://www.facebook.com/bbcnews/videos/10155254357457217/", 
    "http://bbc.in/2iaQ3if", "https://www.facebook.com/bbcnews/videos/10155254645177217/", 
    "https://www.facebook.com/bbcnews/videos/10155254739207217/", 
    "http://bbc.in/2zW9sLZ", "http://bbc.in/2z9SHQr", "https://www.facebook.com/bbcnews/videos/10155255187982217/", 
    "http://bbc.in/2zcSkVm", "http://bbc.in/2zUQc1E", "https://www.facebook.com/bbcnews/videos/10155255092167217/", 
    "http://bbc.in/2zelIu3", "http://bbc.in/2zfgXQY", "http://bbc.in/2ybP2S4", 
    "https://www.facebook.com/bbcnews/videos/1993087934041388/", 
    "https://www.facebook.com/bbcnews/videos/10155256481732217/"
    ), story = c(NA, "BBC News was live.", NA, NA, NA, NA, NA, 
    NA, NA, NA, "BBC News shared BBC Entertainment News's post.", 
    NA, NA, NA, NA, NA, NA, NA, NA, NA), comments_count = c(1982, 
    412, 164, 2778, 1069, 963, 246, 727, 707, 896, 97, 3111, 
    198, 167, 232, 100, 385, 158, 147, 18), shares_count = c(10001, 
    198, 235, 2756, 262, 1677, 567, 4358, 1634, 602, 2, 1850, 
    75, 188, 363, 296, 231, 283, 33, 81), love_count = c(2294, 
    203, 23, 2224, 36, 625, NA, 2744, NA, 249, 83, NA, 55, 49, 
    94, NA, NA, NA, 8, NA), haha_count = c(549, 19, 67, 11, 697, 
    148, NA, 605, NA, 224, 26, NA, 24, 9, 4, NA, NA, NA, 73, 
    NA), wow_count = c(6987, 31, 66, 256, 169, 898, NA, 76, NA, 
    136, 7, NA, 101, 30, 249, NA, NA, NA, 13, NA), sad_count = c(392, 
    2, 1, 26, 85, 134, NA, 5, NA, 83, 1, NA, 218, 183, 1, NA, 
    NA, NA, 3, NA), angry_count = c(398, 17, 10, 6, 305, 183, 
    NA, 2, NA, 865, 0, NA, 32, 248, 2, NA, NA, NA, 61, NA)), .Names = c("id", 
"likes_count", "from_id", "from_name", "message", "created_time", 
"type", "link", "story", "comments_count", "shares_count", "love_count", 
"haha_count", "wow_count", "sad_count", "angry_count"), row.names = c(1L, 
2L, 3L, 19L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 12L, 13L, 14L, 11L, 
15L, 16L, 17L, 20L, 18L), class = "data.frame")
> 

- Edit 2: one of the comments worked (see the answer below); thanks, r2evans

1 answer:

Answer 0: (score: 0)

r2evans's suggestion seems to have worked. I modified the code slightly and did this:

            BBC_page_relevant <- BBC_page[grepl(pattern, BBC_page$message, ...),]

This seems to work: it stores the relevant posts in the data.frame BBC_page_relevant.
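
A self-contained sketch of the same idea on toy data, for anyone who wants to try it without a Facebook token. The `posts` data frame and its rows are invented for illustration, and `ignore.case = TRUE` is an assumption carried over from the `grep` call in the question; the answer above does not say which extra arguments were actually passed.

```r
# Toy data frame standing in for the getPage() result (hypothetical rows)
posts <- data.frame(
  id = c("a1", "a2", "a3", "a4"),
  message = c("Brexit negotiations enter a new phase",
              "Rescuers save caimans in Brazil",
              "The european union publishes its budget",
              NA),
  likes_count = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

pattern <- "Brexit|European Union"

# grepl() returns one TRUE/FALSE per row; a missing (NA) message yields FALSE
keep <- grepl(pattern, posts$message, ignore.case = TRUE)

# A logical row index with an empty column index keeps every column
posts_relevant <- posts[keep, ]
print(posts_relevant)
```

The key difference from `grep(..., value = TRUE)` is that `grepl()` yields a logical vector the same length as the column, which can be used directly as a row index, so the whole row survives rather than just the matched text.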

Many thanks for the quick and helpful replies. Best, Ivo
