R: applying a user-defined function to data frame columns

Time: 2015-01-14 09:24:12

Tags: r dataframe apply

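In R, I have defined a function that computes the intersection between two strings (it returns the number of whitespace-separated words they share):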

containedin <- function(t1,t2){
  return length(Reduce(intersect, strsplit(c(t1,t2), "\\s+"))) 
}

I want to apply this function to a data frame that contains two string columns: data.selected[c('keywords', 'title')]

keywords                                                                             title
1  Samsung UN48H6350 48" Samsung UN48H6350 48" Full 1080p Smart HDTV 120Hz with Wi-Fi +$50 Visa Gift Card
2  Samsung UN48H6350 48"     Samsung UN48H6350 48" Full HD Smart LED TV -Bundle- (See Below for Contents)
3  Samsung UN48H6350 48"      Samsung UN48H6350 48" Class Full HD Smart LED TV -BUNDLE- See below Details
4  Samsung UN48H6350 48"     Samsung UN48H6350 48" Full HD Smart LED TV With BD-H5100 Blu-ray Disc Player
5  Samsung UN48H6350 48"                 Samsung UN48H6350 48" Smart 1080p Clear Motion Rate 240 LED HDTV
6  Samsung UN48H6350 48"            Samsung UN48H6350 - 48-Inch Full HD 1080p Smart HDTV 120Hz with Wi-Fi
7  Samsung UN48H6350 48"               Samsung 6350 Series UN48H6350 48" 1080p HD LED LCD Internet TV NEW
8  Samsung UN48H6350 48"  Samsung Un48h6350af 75" 1080p Led-lcd Tv - 16:9 - Hdtv 1080p - (un75h6350afxza)
9  Samsung UN48H6350 48"                         Samsung UN48H6350 - 48" HD 1080p Smart HDTV 120Hz Bundle
10 Samsung UN48H6350 48"   Samsung UN48H6350 - 48-Inch Full HD 1080p Smart HDTV 120Hz with Wi-Fi, (R#416)

How can I use an apply function on these two columns so that it returns a new column with the results?

1 answer:

Answer 0 (score: 4):

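First, your return statement should give you an error: in R, return is a function and has to be called with parentheses, as in return(x). You probably meant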

containedin <- function(t1,t2){
  length(Reduce(intersect, strsplit(c(t1,t2), "\\s+"))) 
}

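In any case, you can use mapply to solve the problem. It calls containedin once per row, pairing each element of the first vector with the corresponding element of the second.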

mapply(containedin, 
       as.character(data.selected[, 'keywords']), 
       as.character(data.selected[, 'title']))

as.character is only needed if class(data.selected[, 'keywords']) is factor (rather than character).
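Since the question asks for the result as a new column, here is a minimal sketch of assigning the mapply output back to the data frame. The two-row data.selected below is hypothetical sample data modeled on the question's table, and the column name overlap is an assumption chosen for illustration. USE.NAMES = FALSE stops mapply from naming the result after the keyword strings.

```r
# Hypothetical sample data with the same two columns as in the question.
data.selected <- data.frame(
  keywords = c("Samsung UN48H6350 48\"", "Samsung UN48H6350 48\""),
  title = c("Samsung UN48H6350 48\" Full HD Smart LED TV",
            "Samsung UN48H6350 - 48-Inch Full HD 1080p Smart HDTV"),
  stringsAsFactors = FALSE  # keep the columns as character, not factor
)

# Number of whitespace-separated words shared by the two strings.
containedin <- function(t1, t2) {
  length(Reduce(intersect, strsplit(c(t1, t2), "\\s+")))
}

# Apply the function row by row and store the counts in a new column
# ("overlap" is an illustrative name, not one from the question).
data.selected$overlap <- mapply(containedin,
                                data.selected$keywords,
                                data.selected$title,
                                USE.NAMES = FALSE)

print(data.selected$overlap)
```

With this sample, the first row shares all three keyword tokens with its title, while the second shares only two, because its title has 48-Inch rather than 48".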