Generating N recommendations for each person in Neo4j

Date: 2016-03-01 15:05:10

Tags: neo4j cypher collaborative-filtering

I am following this tutorial on collaborative filtering in Neo4j. The tutorial starts by creating a toy movie graph:

LOAD CSV WITH HEADERS FROM "https://neo4j-contrib.github.io/developer-resources/cypher/movies_actors.csv" AS line
WITH line
WHERE line.job = "ACTED_IN"
// toInteger() replaces the older toInt(), which current Neo4j versions no longer accept.
MERGE (m:Movie {title:line.title}) ON CREATE SET m.released = toInteger(line.released), m.tagline = line.tagline
MERGE (p:Person {name:line.name}) ON CREATE SET p.born = toInteger(line.born)
MERGE (p)-[:ACTED_IN {roles:split(line.roles,";")}]->(m)
RETURN count(*);
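Before writing any recommendation query, it can help to confirm that the import produced a graph. The following is a minimal sanity check, assuming only the `Movie`, `Person` and `ACTED_IN` names created by the import above; it counts each kind of element.

```cypher
// Count the movies, people and ACTED_IN relationships created by the import.
MATCH (m:Movie)
WITH count(m) AS movies
MATCH (p:Person)
WITH movies, count(p) AS people
MATCH ()-[r:ACTED_IN]->()
RETURN movies, people, count(r) AS actedIn
```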

Next, we suggest five possible collaborators for Tom Hanks:

MATCH (tom:Person)-[:ACTED_IN]->(movie1)<-[:ACTED_IN]-(coActor:Person),
       (coActor)-[:ACTED_IN]->(movie2)<-[:ACTED_IN]-(coCoActor:Person)
WHERE tom.name = "Tom Hanks"
AND   NOT    (tom)-[:ACTED_IN]->()<-[:ACTED_IN]-(coCoActor)
RETURN coCoActor.name, count(distinct coCoActor) as frequency
ORDER BY frequency DESC
LIMIT 5

What if I want to do this for every person who acted in "Apollo 13"? In other words, my task is to suggest 5 possible co-actors for each person who acted in "Apollo 13". How can I do this efficiently?

1 Answer:

Answer 0 (score: 2)

A few things first. This part of the query you pasted doesn't make sense:

RETURN coCoActor.name, COUNT(DISTINCT coCoActor) AS frequency

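The rows are grouped by coCoActor.name, so each group contains exactly one distinct coCoActor. The frequency is therefore always 1, and your ORDER BY has nothing to sort on.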

I think you meant:

RETURN coCoActor.name, COUNT(DISTINCT coActor) AS frequency
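The difference between the two aggregations can be seen without any graph at all. This sketch uses `UNWIND` over a few made-up (co-actor, co-co-actor) rows, where the names "A", "B", "X" and "Y" are purely illustrative, and groups them by the co-co-actor the same way the RETURN clause does.

```cypher
// Hypothetical rows: each one stands for a path tom -> coActor -> coCoActor.
UNWIND [{co: "A", coco: "X"}, {co: "B", coco: "X"}, {co: "A", coco: "Y"}] AS row
// Grouping key is row.coco, so counting distinct row.coco per group is always 1,
// while counting distinct row.co gives the number of shared co-actors.
// Result: X -> wrong = 1, right = 2; Y -> wrong = 1, right = 1
RETURN row.coco AS name,
       count(DISTINCT row.coco) AS wrong,
       count(DISTINCT row.co) AS right
ORDER BY name
```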

Second, you don't need the variables movie1 and movie2; they are never used again in your query.

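Finally, you need to make sure you don't recommend an actor to him- or herself. Otherwise an actor can show up as his own co-co-actor through a second film shared with the same co-actor, and the NOT pattern does not filter him out: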

WHERE actor <> coCoActor
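Putting the three fixes together for the single-actor case gives a sketch like the one below. It is the Tom Hanks query from the question with the frequency fixed to count distinct co-actors, the unused movie variables dropped, and the self-recommendation excluded (here the variable is `tom` rather than `actor`).

```cypher
// Corrected single-actor version of the query from the question.
MATCH (tom:Person {name: "Tom Hanks"})-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coActor:Person),
      (coActor)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coCoActor:Person)
// Skip people Tom has already worked with, and Tom himself.
WHERE NOT (tom)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coCoActor)
  AND tom <> coCoActor
// Frequency = number of distinct co-actors linking Tom to the candidate.
RETURN coCoActor.name, COUNT(DISTINCT coActor) AS frequency
ORDER BY frequency DESC
LIMIT 5
```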

To actually answer your question:

// Find the Apollo 13 actors.
MATCH (actor:Person)-[:ACTED_IN]->(:Movie {title:"Apollo 13"})

// Continue with query.
MATCH (actor)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coActor:Person),
      (coActor)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coCoActor:Person)
WHERE NOT (actor)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coCoActor) AND
      actor <> coCoActor

// Group by actor and coCoActor, counting how many coActors they share as freq.
WITH actor, coCoActor, COUNT(DISTINCT coActor) AS freq

// Order by freq descending so that COLLECT()[..5] grabs the top 5 per actor.
ORDER BY freq DESC

// Get the recommendations.
WITH actor, COLLECT({name: coCoActor.name, freq: freq})[..5] AS recos
RETURN actor.name, recos;
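A small generalization, assuming you want to reuse the query for other movies and other values of N: the same query with the title and the cut-off passed as query parameters. The parameter names `$title` and `$n` are hypothetical choices, not part of the original query. LIMIT would cap the whole result rather than each actor's list, which is why the per-actor cut-off is done by slicing the collected list. A secondary sort on the name is added only to make ties deterministic.

```cypher
// Hypothetical parameters: $title = movie to start from, $n = recommendations per actor.
MATCH (actor:Person)-[:ACTED_IN]->(:Movie {title: $title})
MATCH (actor)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coActor:Person),
      (coActor)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coCoActor:Person)
WHERE NOT (actor)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coCoActor)
  AND actor <> coCoActor
WITH actor, coCoActor, COUNT(DISTINCT coActor) AS freq
// Highest frequency first; break ties by name so results are stable.
ORDER BY freq DESC, coCoActor.name
// Slice each actor's collected list to its first $n entries.
WITH actor, COLLECT({name: coCoActor.name, freq: freq})[..$n] AS recos
RETURN actor.name, recos
```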