Hadoop MapReduce: finding vice-versa (mutual) follower pairs

Date: 2016-01-06 13:35:00

Tags: hadoop mapreduce

I have the following sample data, which I am using to learn Hadoop MapReduce. It is follower/followee data.

Follower,followee   
    a,b
    a,c
    a,d
    c,b
    b,d
    d,a
    b,c
    b,e
    e,f

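That is, a follows b, a follows c, and so on.

I am trying to process the data so that if a follows b and b also follows a, that pair appears in the output text file. I am new to MapReduce and am trying to find a way to get the following result.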

 a,d
 c,b

1 answer:

Answer 0 (score: 3)

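You can achieve this with a trick.

The trick is to build the key passed to the reducer so that (a,d) and (d,a) get the same key and end up in the same reducer.

When (a,d) arrives:

'a' < 'd', hence emit:
key => a,d
value => a,d

When (d,a) arrives:

'd' > 'a', hence emit:
key => a,d
value => d,a

The key is always formed so that the alphabetically lower letter comes before the higher one. So for both records, the key is "a,d".

The mapper output will therefore be:

Record: a,b
Key = a,b  Value = a,b

Record: a,c
Key = a,c  Value = a,c

Record: a,d
Key = a,d  Value = a,d

Record: c,b
Key = b,c  Value = c,b

Record: b,d
Key = b,d  Value = b,d

Record: d,a
Key = a,d  Value = d,a

Record: b,c
Key = b,c  Value = b,c

Record: b,e
Key = b,e  Value = b,e

Record: e,f
Key = e,f  Value = e,f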
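The map step described above can be sketched in plain Java. This is only a simulation with standard collections, not the Hadoop `Mapper` API; the class name `MapStepSketch` and the method names `canonicalKey` and `map` are hypothetical, and the sketch assumes each input line has exactly the form `follower,followee`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation of the map step; this is not the Hadoop API.
class MapStepSketch {

    // Builds a direction-independent key: the lexicographically smaller name goes first.
    static String canonicalKey(String follower, String followee) {
        return follower.compareTo(followee) <= 0
                ? follower + "," + followee
                : followee + "," + follower;
    }

    // Turns each "follower,followee" line into a {key, value} pair,
    // where the value is the original record, unchanged.
    static List<String[]> map(List<String> records) {
        List<String[]> emitted = new ArrayList<>();
        for (String record : records) {
            String line = record.trim();
            String[] parts = line.split(",");
            emitted.add(new String[] {canonicalKey(parts[0], parts[1]), line});
        }
        return emitted;
    }

    public static void main(String[] args) {
        for (String[] kv : map(Arrays.asList("a,d", "d,a", "c,b"))) {
            System.out.println("Key = " + kv[0] + "  Value = " + kv[1]);
        }
    }
}
```

Keeping the original record as the value is what later lets the reduce side tell the two directions apart.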

Now, in the reducers, the records arrive grouped by key, in the following order:

Record 1: 
    Key = a,b  Value = a,b

Record 2: 
    Key = a,c  Value = a,c

Record 3: 
    Key = a,d  Value = a,d
    Key = a,d  Value = d,a

Record 4: 
    Key = b,c  Value = c,b
    Key = b,c  Value = b,c

Record 5: 
    Key = b,d  Value = b,d

Record 6: 
    Key = b,e  Value = b,e

Record 7: 
    Key = e,f  Value = e,f

So, in the reducer, you only need to handle records 3 and 4, the only keys that received two values:

Record 3: 
    Key = a,d  Value = a,d
    Key = a,d  Value = d,a

Record 4: 
    Key = b,c  Value = c,b
    Key = b,c  Value = b,c

So, the output will be:

a,d
c,b
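As a rough end-to-end check of the grouping idea, here is a minimal plain-Java sketch that simulates map, shuffle and reduce with standard collections. It is not Hadoop code, and `MutualFollowSketch`, `canonicalKey` and `mutualPairs` are hypothetical names. It assumes a pair counts as mutual only when two distinct records share a key, and it emits the canonical key, so the second pair prints as `b,c` rather than `c,b`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java simulation of map, shuffle and reduce; this is not the Hadoop API.
class MutualFollowSketch {

    // Direction-independent key: the lexicographically smaller name goes first.
    static String canonicalKey(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "," + b : b + "," + a;
    }

    static List<String> mutualPairs(List<String> records) {
        // "Shuffle": group the original records by canonical key (TreeMap keeps keys sorted).
        Map<String, Set<String>> grouped = new TreeMap<>();
        for (String record : records) {
            String line = record.trim();
            String[] parts = line.split(",");
            grouped.computeIfAbsent(canonicalKey(parts[0], parts[1]), k -> new HashSet<>())
                   .add(line);
        }
        // "Reduce": two distinct values under one key means both directions were seen.
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : grouped.entrySet()) {
            if (entry.getValue().size() == 2) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "a,b", "a,c", "a,d", "c,b", "b,d", "d,a", "b,c", "b,e", "e,f");
        System.out.println(mutualPairs(sample)); // prints [a,d, b,c]
    }
}
```

Collecting the values in a set rather than counting them means a duplicated input line such as a repeated `a,b` is not mistaken for a mutual follow.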

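This logic also works if you have names instead of single letters. For example, you could use the following logic in the mapper (where s1 is the first string and s2 is the second string):

String key = "";
// Note: this puts the string that compares greater first, the reverse of the
// a,d walkthrough; either order works as long as it is applied consistently.
int compare = s1.compareToIgnoreCase(s2);
if(compare >= 0)
    key = s1 + "," + s2;
else
    key = s2 + "," + s1;

So, if you have:

String s1 = "Stack";
String s2 = "Overflow";

the key will be:

Stack,Overflow

Similarly, if you have:

s1 = "Overflow";
s2 = "Stack";

the key will still be:

Stack,Overflow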
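To check the name-based key in both directions, the answer's comparison logic can be wrapped in a small method. This is a minimal sketch; the class name `NameKeySketch` and the method name `keyFor` are hypothetical, and the ordering convention (the string that compares greater goes first) is taken from the snippet using `compareToIgnoreCase`.

```java
// Wraps the answer's key logic in a method so both input orders can be compared.
class NameKeySketch {

    // Same convention as the answer's snippet: the string that compares greater
    // (ignoring case) goes first.
    static String keyFor(String s1, String s2) {
        int compare = s1.compareToIgnoreCase(s2);
        return compare >= 0 ? s1 + "," + s2 : s2 + "," + s1;
    }

    public static void main(String[] args) {
        System.out.println(keyFor("Stack", "Overflow")); // prints Stack,Overflow
        System.out.println(keyFor("Overflow", "Stack")); // prints Stack,Overflow
    }
}
```

One caveat: two names that differ only in case, such as "Stack" and "stack", compare as equal under `compareToIgnoreCase`, so the two orders would produce different keys. Normalizing case before building the key would be one way to avoid that, if such names can occur in the data.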