Why are shards being initialized and relocated during bulk insert?

Date: 2015-10-30 11:13:03

Tags: elasticsearch

I am trying to bulk insert data into a 4-node Elasticsearch cluster with 3 data nodes.

Data node specs: 16 CPU - 7GB RAM - 500GB SSD

The data is inserted through the non-data node, into an index split across 5 shards with 1 replica. There is about 250GB of data to insert.
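
For reference, an index with this layout can be created with a settings body like the following (a sketch; the index name osm matches the cluster output below, and the ES 1.x create-index syntax is assumed):

~$ curl -XPUT 'http://localhost:9200/osm' -d '{
  "settings" : {
    "number_of_shards" : 5,
    "number_of_replicas" : 1
  }
}'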

However, after about an hour of processing, with roughly 40GB of data inserted on each node and around 60% CPU and 30% RAM usage over the whole span, some shards end up in the initializing state:

~$ curl -XGET 'http://localhost:9200/_cluster/health/osm?level=shards&pretty=true'
{
  "cluster_name" : "elastic_osm",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 5,
  "active_shards" : 9,
  "relocating_shards" : 1,
  "initializing_shards" : 1,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "indices" : {
    "osm" : {
      "status" : "yellow",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 9,
      "relocating_shards" : 1,
      "initializing_shards" : 1,
      "unassigned_shards" : 0,
      "shards" : {
        "0" : {
          "status" : "yellow",
          "primary_active" : true,
          "active_shards" : 1,
          "relocating_shards" : 0,
          "initializing_shards" : 1,
          "unassigned_shards" : 0
        },
        "1" : {
          "status" : "green",
          "primary_active" : true,
          "active_shards" : 2,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 0
        },
        "2" : {
          "status" : "green",
          "primary_active" : true,
          "active_shards" : 2,
          "relocating_shards" : 1,
          "initializing_shards" : 0,
          "unassigned_shards" : 0
        },
        "3" : {
          "status" : "green",
          "primary_active" : true,
          "active_shards" : 2,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 0
        },
        "4" : {
          "status" : "green",
          "primary_active" : true,
          "active_shards" : 2,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 0
        }
      }
    }
  }
}

Digging deeper, I found that one node has a heap space problem:

~$ curl -XGET 'localhost:9200/osm/_search_shards?pretty=true'
{
  "nodes" : {
    "1DpvDUf7SKywJrBgQqs9eg" : {
      "name" : "elastic-osm-node-1",
      "transport_address" : "inet[/xxx.xxx.x.x:xxxx]",
      "attributes" : {
        "master" : "true"
      }
    },
    "FiBYw-v_QfO3nJQfHflf_w" : {
      "name" : "elastic-osm-node-3",
      "transport_address" : "inet[/xxx.xxx.x.x:x]",
      "attributes" : {
        "master" : "true"
      }
    },
    "ibpt8lGiS6yDJf4e09RN9Q" : {
      "name" : "elastic-osm-node-2",
      "transport_address" : "inet[/xxx.xxx.x.x:xxxx]",
      "attributes" : {
        "master" : "true"
      }
    }
  },
  "shards" : [ [ {
    "state" : "STARTED",
    "primary" : true,
    "node" : "ibpt8lGiS6yDJf4e09RN9Q",
    "relocating_node" : null,
    "shard" : 0,
    "index" : "osm"
  }, {
    "state" : "INITIALIZING",
    "primary" : false,
    "node" : "FiBYw-v_QfO3nJQfHflf_w",
    "relocating_node" : null,
    "shard" : 0,
    "index" : "osm",
    "unassigned_info" : {
      "reason" : "ALLOCATION_FAILED",
      "at" : "2015-10-30T10:42:25.539Z",
      "details" : "shard failure [engine failure, reason [already closed by tragic event]][OutOfMemoryError[Java heap space]]"
    }
  } ], [ {
    "state" : "STARTED",
    "primary" : true,
    "node" : "FiBYw-v_QfO3nJQfHflf_w",
    "relocating_node" : null,
    "shard" : 1,
    "index" : "osm"
  }, {
    "state" : "STARTED",
    "primary" : false,
    "node" : "1DpvDUf7SKywJrBgQqs9eg",
    "relocating_node" : null,
    "shard" : 1,
    "index" : "osm"
  } ], [ {
    "state" : "RELOCATING",
    "primary" : false,
    "node" : "FiBYw-v_QfO3nJQfHflf_w",
    "relocating_node" : "1DpvDUf7SKywJrBgQqs9eg",
    "shard" : 2,
    "index" : "osm"
  }, {
    "state" : "STARTED",
    "primary" : true,
    "node" : "ibpt8lGiS6yDJf4e09RN9Q",
    "relocating_node" : null,
    "shard" : 2,
    "index" : "osm"
  }, {
    "state" : "INITIALIZING",
    "primary" : false,
    "node" : "1DpvDUf7SKywJrBgQqs9eg",
    "relocating_node" : "FiBYw-v_QfO3nJQfHflf_w",
    "shard" : 2,
    "index" : "osm"
  } ], [ {
    "state" : "STARTED",
    "primary" : false,
    "node" : "FiBYw-v_QfO3nJQfHflf_w",
    "relocating_node" : null,
    "shard" : 3,
    "index" : "osm"
  }, {
    "state" : "STARTED",
    "primary" : true,
    "node" : "1DpvDUf7SKywJrBgQqs9eg",
    "relocating_node" : null,
    "shard" : 3,
    "index" : "osm"
  } ], [ {
    "state" : "STARTED",
    "primary" : false,
    "node" : "ibpt8lGiS6yDJf4e09RN9Q",
    "relocating_node" : null,
    "shard" : 4,
    "index" : "osm"
  }, {
    "state" : "STARTED",
    "primary" : true,
    "node" : "FiBYw-v_QfO3nJQfHflf_w",
    "relocating_node" : null,
    "shard" : 4,
    "index" : "osm"
  } ] ]
}

However, ES_HEAP_SIZE is set to half of the RAM on the servers:

~$ echo $ES_HEAP_SIZE
7233.0m

and only 5GB is in use:

~$ free -g
             total       used
Mem:            14          5
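
Note that free -g reports memory usage at the OS level, not JVM heap usage. The node stats API (the jvm metric exists in ES 1.x) shows the actual heap pressure per node:

~$ curl -XGET 'http://localhost:9200/_nodes/stats/jvm?pretty'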

If I wait a bit longer, the node leaves the cluster entirely and all of its replicas go into the initializing state, which makes my inserts fail and stop:

{
    "state" : "INITIALIZING",
    "primary" : false,
    "node" : "ibpt8lGiS6yDJf4e09RN9Q",
    "relocating_node" : null,
    "shard" : 3,
    "index" : "osm",
    "unassigned_info" : {
      "reason" : "NODE_LEFT",
      "at" : "2015-10-30T10:53:32.044Z",
      "details" : "node_left[FiBYw-v_QfO3nJQfHflf_w]"
    }
}
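
(A more compact way to watch the shard states while this happens, assuming the _cat API available since ES 1.0, is:

~$ curl -XGET 'http://localhost:9200/_cat/shards/osm?v'

which prints one line per shard with its state, node and size.)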

Conf: To speed up the inserts, I use these parameters in the data nodes' elasticsearch configuration:

refresh_interval: -1, threadpool.bulk.size: 16, threadpool.bulk.queue_size: 1000
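
In elasticsearch.yml form these would look roughly like this (a sketch; in the 1.x flat syntax, refresh_interval is scoped under index.):

index.refresh_interval: -1
threadpool.bulk.size: 16
threadpool.bulk.queue_size: 1000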

Why does this happen? How can I fix it so that my bulk insert succeeds? Do I need more than 50% of the RAM for the max heap size?

EDIT: Since tuning the elasticsearch threadpool parameters is not a good idea, I removed them, but then it works very slowly. Elasticsearch is not designed to ingest this much data this fast.

1 answer:

Answer 0 (score: 0)

Remove these settings:

threadpool.bulk.size: 16
threadpool.bulk.queue_size: 1000

The default values for these settings should be good enough not to overload your cluster.
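
To check whether the defaults keep up, the bulk thread pool's queue and rejection counters can be watched during indexing (via the _cat API of ES 1.x); a steadily growing bulk.rejected count means the client is sending faster than the cluster can absorb:

~$ curl -XGET 'http://localhost:9200/_cat/thread_pool?v'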

Also make sure you size your bulk indexing process correctly, as described here. The bulk requests need to have a certain size, depending on the cluster and the data; you cannot just use whatever values you want in the hope of ingesting as much as possible. Every cluster has its limits, and you should test yours.
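
As a starting point for such a test, a sketch (the file names and batch size here are illustrative, not prescriptive): keep each bulk request at a few thousand documents or a few tens of megabytes, and grow it only while indexing throughput still improves:

# hypothetical: bulk_data.json holds alternating action/source lines,
# so an even line count per chunk keeps the pairs intact
~$ split -l 10000 bulk_data.json batch_
~$ curl -XPOST 'http://localhost:9200/osm/_bulk' --data-binary @batch_aa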