How to improve insert performance in ArangoDB

Date: 2014-10-04 12:18:54

Tags: arangodb

My environment is a local machine: Ubuntu 12.04, ArangoDB 2.2.4 or 2.2.3, the Perl driver (ArangoDB), CPU: 3 cores / 6 threads, RAM: 3 GB.

I used the save method. Each save call amounts to one HTTP_GET plus one HTTP_POST. The results are as follows:

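  1. One Perl process inserting 30,000 documents: on average 700 requests/s (350 HTTP_GET and 350 HTTP_POST).
  2. 10 Perl processes inserting 30,000 documents: on average 1,000 requests/s (500 HTTP_GET and 500 HTTP_POST).
  3. After 30 seconds of running, the server starts returning HTTP 500 errors. I modified the Perl driver (ArangoDB) code to retry on these errors, so that I could finish the test.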

When the HTTP 500 errors are reported, the ArangoDB log shows the following:

    2014-10-04T14:46:47Z [26642] DEBUG [./lib/GeneralServer/GeneralServerDispatcher.h:403]   shutdownHandler called, but no handler is known for task
    2014-10-04T14:46:47Z [26642] DEBUG [./lib/GeneralServer/GeneralServerDispatcher.h:403] shutdownHandler called, but no handler is known for task
    

I would like my program to reach an average of 3,000 to 5,000 requests/s and to get fewer HTTP 500 errors. What improvements can I make? Thanks!

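UPDATE 7/10/2014: I replaced the save method with an AQL query; my sample insert script is below. One Perl process inserting 10,000 documents: on average 900 requests/s, 1,000 HTTP_POST/s (no HTTP 500 errors). One Perl process inserting 30,000 documents: on average 700 requests/s, 700 HTTP_POST/s (HTTP 500 errors occur, so retries are needed).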

    #!/usr/bin/perl
    
    use warnings;
    use strict;
    
    use ArangoDB;
    use JSON;    # provides JSON::true
    
    my $itdb = ArangoDB->new(
        {
            host       => '10.211.55.2',
            port       => 8529,
            keep_alive => 1,
        }
    );
    
    # Create the collection (isVolatile: kept in memory only)
    $itdb->create('Node_temp', { isVolatile => JSON::true });
    ImpNodes();
    
    sub ImpNodes{
    
        for(1..30000){
            my $sth = $itdb->query('INSERT {
                "id": "Jony",
                "value": "File",
                "popup": "public",
                "version": "101",
                "machine": "10.20.18.193",
                "text": {
                   "Address": ["center","bold","250","100"]
                },
                "menuitem":[
                {
                    "value": "New",
                    "onclick": "CreateNewDoc",
                    "action": "CreateNewDoc"
                },
                {
                    "value": "Open",
                    "onclick": "OpenNewDoc",
                    "action": "OpenNewDoc"
                },
                {
                    "value": "Close",
                    "onclick": "CloseDoc",
                    "action": "CloseDoc"
                },
                {
                    "value": "Save",
                    "onclick": "SaveDoc",
                    "action": "SaveDoc"
                }]
            } in Node_temp');
    
            my $cursor = $sth->execute({
                do_count => 1,
                batch_size => 10,
            });
        }
    }
    

I modified ArangoDB-0.08 so that the inserts go through despite these errors. This is the change in the http_post method of Connection.pm:

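    # declared outside the loop so they are still visible after it
    my ( $code, $msg, $body );
    my $retries = 100;    # for testing
    for ( 1 .. $retries ) {
        ( undef, $code, $msg, undef, $body ) = $self->{_http_agent}->request(
            %{ $self->{_req_args} },
            method     => 'POST',
            path_query => $path,
            headers    => $headers,
            content    => $data,
        );
        # stop retrying on anything that is not a 5xx server error
        last if ( $code < 500 || $code >= 600 );
        print "The return code is 5xx, retrying http_post!\n";
        print $code, " : ", $msg, " : ", $body, "\n";
        select( undef, undef, undef, 3 );    # sleep 3 seconds before retrying
    }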
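The retry loop above can also be pulled out of Connection.pm into a standalone helper, which makes it easier to test without a running server. This is a minimal sketch, not part of the ArangoDB or Furl distributions: retry_on_5xx is a hypothetical name, and it assumes the HTTP call is wrapped in a code reference that returns the status code as its first value.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ArangoDB or Furl): calls $request->()
# up to $max_tries times, retrying only while the returned HTTP status
# code (first return value) is in the 5xx range.
sub retry_on_5xx {
    my ( $request, $max_tries, $delay ) = @_;
    my @result;
    for my $attempt ( 1 .. $max_tries ) {
        @result = $request->();
        my $code = $result[0];
        # success or client error (anything outside 5xx): stop retrying
        return @result if $code < 500 || $code >= 600;
        last if $attempt == $max_tries;
        warn "attempt $attempt returned $code, retrying in ${delay}s\n";
        # select() with only a timeout acts as a fractional-second sleep
        select( undef, undef, undef, $delay );
    }
    return @result;    # the last 5xx response if every attempt failed
}
```

Inside http_post, the Furl call would be wrapped in a closure such as sub { my ( undef, $code, $msg, undef, $body ) = $self->{_http_agent}->request(...); return ( $code, $msg, $body ) }, passing the same request arguments as in the loop above.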
    

1 Answer

Answer (score: 1)

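I ran strace on the client program and can confirm that it opens a new connection for every request. This results in many system calls. For each request, the strace output looks like this: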

    17300 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
    17300 ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fffaee760c0) = -1 ENOTTY (Inappropriate ioctl for device)
    17300 lseek(3, 0, SEEK_CUR)             = -1 ESPIPE (Illegal seek)
    17300 ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fffaee760c0) = -1 ENOTTY (Inappropriate ioctl for device)
    17300 lseek(3, 0, SEEK_CUR)             = -1 ESPIPE (Illegal seek)
    17300 fcntl(3, F_SETFD, FD_CLOEXEC)     = 0
    17300 setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
    17300 fcntl(3, F_GETFL)                 = 0x2 (flags O_RDWR)
    17300 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
    17300 connect(3, {sa_family=AF_INET, sin_port=htons(8529), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
    17300 select(8, NULL, [3], [3], {299, 999526}) = 1 (out [3], left {299, 999524})
    17300 write(3, "POST /_api/cursor HTTP/1.1\r\nConnection: Keep-Alive\r\nUser-Agent: Furl::HTTP/3.05\r\nHost: 127.0.0.1\r\nContent-Type: application/json\r\nContent-Length: 1032\r\nHost: 127.0.0.1:8529\r\n\r\n", 176) = 176
    17300 write(3, "{\"count\":true,\"query\":\"INSERT {\\n            \\\"id\\\": \\\"Jony\\\",\\n            \\\"value\\\": \\\"File\\\",\\n            \\\"popup\\\": \\\"public\\\",\\n            \\\"version\\\": \\\"101\\\",\\n            \\\"machine\\\": \\\"10.20.18.193\\\",\\n            \\\"text\\\": {\\n               \\"..., 1032) = 1032
    17300 read(3, 0x15f0af0, 10240)         = -1 EAGAIN (Resource temporarily unavailable)
    --
    17300 close(3)                          = 0
    17300 rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
    17300 rt_sigaction(SIGPIPE, {SIG_DFL, [], SA_RESTORER, 0x7faa49b221f0}, {SIG_IGN, [], SA_RESTORER, 0x7faa49b221f0}, 8) = 0
    17300 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    17300 rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
    17300 rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER, 0x7faa49b221f0}, {SIG_DFL, [], SA_RESTORER, 0x7faa49b221f0}, 8) = 0
    17300 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0

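I think you want to avoid opening and closing a connection for every request. Doing so also fixes the problem of the operating system running out of ports.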

To stop the driver from reopening the connection, I had to modify Furl as follows:

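At line 526 of Furl/HTTP.pm, Furl inspects the HTTP response headers returned by the server: it reads the Connection header and compares its value with the string keep-alive. The problem is that this comparison is case-sensitive, although HTTP connection options are case-insensitive. ArangoDB returns the header value Keep-Alive (note the capitalization), so Furl does not recognize it and closes the connection.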

The following change to Furl/HTTP.pm fixes this:

    -    if ($connection_header eq 'keep-alive') {
    +    if (lc($connection_header) eq 'keep-alive') {

With this change, the client no longer closes the connection after every request and no longer runs out of ports.
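A slightly more defensive version of the patched check is sketched below. wants_keep_alive is a hypothetical helper, not a Furl function: like the patch, it compares case-insensitively, and it additionally accepts a Connection header that lists several comma-separated options, which HTTP allows.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Furl): returns 1 if the Connection
# header value contains the keep-alive option, compared case-insensitively.
# The header may list several options, e.g. "Upgrade, Keep-Alive".
sub wants_keep_alive {
    my ($connection_header) = @_;
    return 0 unless defined $connection_header;
    for my $token ( split /,/, $connection_header ) {
        # ignore surrounding whitespace; /i makes the match case-insensitive
        return 1 if $token =~ /^\s*keep-alive\s*$/i;
    }
    return 0;
}
```

In Furl/HTTP.pm the patched line would then read if (wants_keep_alive($connection_header)) { instead of the lc() comparison; the single-token lc() version is enough for ArangoDB, which sends just Keep-Alive.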