Jaccard in k-means clustering

Time: 2017-04-14 05:12:18

Tags: neo4j cypher

In Cypher, how would you modify k-means to use the Jaccard distance Dj instead of the Euclidean distance?

The Jaccard distance is defined as Dj = 1 - |A∩B| / |A∪B|

1 answer:

Answer 0 (score: 0)

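Here is an example of how to compute it in Cypher (from the Recommendations Neo4j Sandbox). Note that the query returns the Jaccard index |A∩B| / |A∪B|, i.e. the similarity; the distance Dj is 1 minus that value: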

// Movies that share at least one genre with "Inception"
MATCH (m:Movie {title: "Inception"})-[:IN_GENRE]->(g:Genre)<-[:IN_GENRE]-(other:Movie)
WITH m, other, COUNT(g) AS intersection
// All genres of the reference movie
MATCH (m)-[:IN_GENRE]->(mg:Genre)
WITH m, other, intersection, COLLECT(mg.name) AS s1
// All genres of the candidate movie
MATCH (other)-[:IN_GENRE]->(og:Genre)
WITH m, other, intersection, s1, COLLECT(og.name) AS s2
// Union = s1 plus the elements of s2 not already in s1
// (a list comprehension is used because filter() was removed in Neo4j 4.0)
WITH m, other, intersection, s1, s2, s1 + [x IN s2 WHERE NOT x IN s1] AS unionSet
RETURN m.title, other.title, s1, s2, ((1.0*intersection)/SIZE(unionSet)) AS jaccard
ORDER BY jaccard DESC LIMIT 100
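
The query above ranks movies by Jaccard index, not by the distance Dj from the question. As a minimal sketch outside Cypher, assuming the genre lists s1 and s2 have been exported to Python, the distance could be derived as follows. The function name jaccard_distance and the sample genre lists are illustrative and not taken from the Sandbox data:

```python
def jaccard_distance(a, b):
    """Return Dj = 1 - |A ∩ B| / |A ∪ B| for two iterables of hashable items."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty sets are treated as identical (distance 0).
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


# Hypothetical genre lists, standing in for s1 and s2 from the query.
s1 = ["Action", "Sci-Fi", "Thriller"]
s2 = ["Action", "Drama", "Thriller"]
print(jaccard_distance(s1, s2))  # 2 shared genres out of 4 in total -> 0.5
```

Inside Cypher the same conversion would just be 1.0 minus the jaccard column.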

Once that is computed, you can use it with the k-means algorithm. How are you running k-means? Also in Cypher?
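
One caveat when plugging a Jaccard distance into k-means: the mean of several genre sets is not itself a genre set, so the centroid update has no direct equivalent. A common substitute, sketched here as an assumption rather than as what the asker actually ran, is a k-medoids-style loop in which each cluster is represented by its most central member. The function name k_medoids, the initial medoids and the toy distance matrix are all hypothetical; in practice the matrix would be filled with pairwise Dj values:

```python
def assign(dist, medoids):
    """Label each item with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, medoids, max_iter=100):
    """Cluster items 0..n-1 given a symmetric distance matrix `dist`.

    `medoids` holds the initial medoid indices, one per cluster.
    Returns (medoids, labels), where labels[i] is the cluster of item i.
    """
    medoids = list(medoids)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                # Keep the previous medoid if a cluster ends up empty.
                new_medoids.append(old)
                continue
            # Replaces the k-means mean: choose the member with the
            # smallest total distance to the rest of its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Toy matrix: items 0-2 are mutually close, items 3-4 are mutually close.
dist = [
    [0.0, 0.2, 0.3, 0.9, 0.9],
    [0.2, 0.0, 0.2, 0.8, 0.8],
    [0.3, 0.2, 0.0, 0.8, 0.9],
    [0.9, 0.8, 0.8, 0.0, 0.1],
    [0.9, 0.8, 0.9, 0.1, 0.0],
]
medoids, labels = k_medoids(dist, medoids=[0, 4])
print(medoids, labels)  # [1, 3] [0, 0, 0, 1, 1]
```

Like k-means, this loop only finds a local optimum, so the result depends on the initial medoids.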