Node-creation throughput in Neo4j drops significantly as the number of properties increases

Date: 2019-08-13 07:32:30

Tags: neo4j

I am running an A/B test to measure the throughput of node creation in Neo4j, and I have found that the throughput drops significantly as the number of properties per node increases.

Setup: Neo4j cluster 3.5.7 (3 core instances: one leader, two followers)

TestA: measures the throughput of creating nodes in Neo4j, where each node has 20 properties.

TestB: measures the throughput of creating nodes in the same Neo4j 3.5.7 cluster, where each node has 40 properties.

Result: TestB throughput = 1/2 × TestA throughput

Below is the code I use to generate the load and measure the throughput.

import org.neo4j.driver.v1.*;

import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;


public class UnwindCreateNodes {

    private static final int BATCH_SIZE = 5000;

    Driver driver;
    static int start;
    static int end;

    public UnwindCreateNodes(String uri, String user, String password) {
        Config config = Config.build()
                .withConnectionTimeout(10, TimeUnit.SECONDS)
                .toConfig();
        driver = GraphDatabase.driver(uri, AuthTokens.basic(user, password), config);
    }

    private void addNodes() {
        // Build one property map per node; field5..field40 share the same filler string.
        // (For the 20-property run, the upper bound of the inner loop is 20 instead of 40.)
        List<Map<String, Object>> listOfProperties = new ArrayList<>();
        for (int inner = start; inner < end; inner++) {
            Map<String, Object> properties = new HashMap<>();
            properties.put("name", "Jhon " + inner);
            properties.put("last", "Alan" + inner);
            properties.put("id", 2 + inner);
            properties.put("key", "1234" + inner);
            for (int field = 5; field <= 40; field++) {
                properties.put("field" + field, "kfhc iahf uheguehuguaeghuszjxcb sd");
            }
            listOfProperties.add(properties);
        }

        // Submit the nodes in batches of BATCH_SIZE, one write transaction per batch.
        // Math.min keeps the final partial batch instead of silently dropping it.
        for (int offset = 0; offset < listOfProperties.size(); offset += BATCH_SIZE) {
            List<Map<String, Object>> events = listOfProperties.subList(
                    offset, Math.min(offset + BATCH_SIZE, listOfProperties.size()));
            Map<String, Object> params = new HashMap<>();
            params.put("events", events);
            String query = "UNWIND $events AS event CREATE (a:Label) SET a += event";
            Instant startTime = Instant.now();
            try (Session session = driver.session()) {
                session.writeTransaction(tx -> tx.run(query, params));
            }
            long timeElapsed = Duration.between(startTime, Instant.now()).toMillis();
            System.out.println("######################--timeElapsed NODES--############################");
            System.out.println("no of nodes per batch " + events.size());
            System.out.println(timeElapsed);
            System.out.println("############################--NODES--############################");
        }
    }

    public void close() {
        driver.close();
    }

    public static void main(String... args) {
        start = 200001;
        end = 400001;
        if (args.length == 2) {
            start = Integer.parseInt(args[0]);
            end = Integer.parseInt(args[1]);
        }
        UnwindCreateNodes unwindCreateNodes = new UnwindCreateNodes("bolt+routing://x.x.x.x:7687", "neo4j", "neo4j");
        unwindCreateNodes.addNodes();
        unwindCreateNodes.close();
    }
}
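One detail worth checking in any batching loop like the one above: computing the number of batches with plain integer division drops a trailing partial batch whenever the total is not an exact multiple of the batch size, which skews both the node count and the throughput measurement. A minimal standalone sketch of safe partitioning (the `partition` helper is illustrative, not part of the driver API):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchPartition {

    // Split a list into consecutive batches of at most batchSize elements.
    // Ceiling division ((n + batchSize - 1) / batchSize) ensures the final
    // partial batch is kept rather than dropped.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        int batches = (items.size() + batchSize - 1) / batchSize;
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < batches; i++) {
            int from = i * batchSize;
            int to = Math.min(from + batchSize, items.size());
            result.add(items.subList(from, to));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 12001; i++) items.add(i);
        List<List<Integer>> batches = partition(items, 5000);
        // 12001 items -> batches of 5000, 5000, 2001
        System.out.println(batches.size());        // 3
        System.out.println(batches.get(2).size()); // 2001
    }
}
```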

Below is the graph of the results:

(figure: per-batch insert times; image not included)

Inserting 5000 nodes takes 3.5 seconds when each node has 40 properties.

Inserting 5000 nodes takes 1.8 seconds when each node has 20 properties.
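Taking the two timings above at face value, fitting a simple linear model (batch time ≈ fixed cost + per-property cost × property count) through the two data points gives a rough idea of where 100 properties would land. This is back-of-the-envelope arithmetic on the reported numbers, not a measured result:

```java
public class ExtrapolateBatchTime {

    // Fit time = fixed + perProp * props through the two measured points
    // (20 props -> t20 seconds, 40 props -> t40 seconds) and extrapolate.
    static double perProp(double t20, double t40) {
        return (t40 - t20) / (40 - 20);
    }

    static double fixed(double t20, double t40) {
        return t20 - perProp(t20, t40) * 20;
    }

    static double predict(double t20, double t40, int props) {
        return fixed(t20, t40) + perProp(t20, t40) * props;
    }

    public static void main(String[] args) {
        // Measured: 1.8 s at 20 properties, 3.5 s at 40 properties, per 5000-node batch.
        // => 0.085 s per extra property, ~0.1 s fixed cost,
        //    ~8.6 s per 5000-node batch at 100 properties.
        System.out.printf("predicted t100 = %.1f s%n", predict(1.8, 3.5, 100));
    }
}
```

If batch time really does grow linearly with property count, 100 properties would cost roughly 8.6 seconds per 5000-node batch on this setup.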

That is a significant slowdown, and 40 is not a large number of properties. I have a requirement for up to 100 properties per node; if I cannot scale at 40 properties, I am not sure how I will scale at 100.

Other things I have tried include using apoc.periodic.iterate, taking out the UNWIND and using plain CREATE, and so on, but the behavior persists.
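For reference, the apoc.periodic.iterate variant would look roughly like the sketch below, with the same event maps passed as the `$events` parameter. This assumes the APOC plugin is installed on the cluster; the exact config keys used here (`batchSize`, `parallel`, `params`) come from APOC's documented config map, and the batch size of 1000 is an arbitrary illustration:

```java
public class ApocBatchQuery {

    // Build an apoc.periodic.iterate call that does the same UNWIND/CREATE,
    // but lets the server drive the batching. Illustrative only; requires
    // the APOC plugin and the same $events parameter as the driver code above.
    static String buildQuery() {
        return "CALL apoc.periodic.iterate("
                + "'UNWIND $events AS event RETURN event', "
                + "'CREATE (a:Label) SET a += event', "
                + "{batchSize: 1000, parallel: false, params: {events: $events}})";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery());
    }
}
```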

I do not want to store the properties in some external store such as an RDBMS, because that would complicate things for me: I am building a generic application and do not know in advance which properties will be used.

I cannot use the CSV tooling because my data comes from Kafka, and its structure does not match what the CSV tools expect. So the CSV tools are not an option for me.

Any ideas how to speed this up?

0 Answers:

There are no answers.