Question

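I want to use Nagios to monitor Elasticsearch. Basically, I want to know whether Elasticsearch is up.

I think I can use the Elasticsearch Cluster Health API (see here)

and use the "status" I get back (green, yellow or red), but I still don't know how to do that from Nagios (Nagios is on one server and Elasticsearch is on another).

Is there another way to do this?

Edit: I just found check_http_json. I think I'll give it a try.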

Answer 1

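After a while, I managed to monitor Elasticsearch using NRPE. I wanted to use the Elasticsearch Cluster Health API, but because of security restrictions I couldn't call it from another machine... So, on the monitoring server, I created a new service whose check_command is check_nrpe!check_elastic. Then, on the remote server where Elasticsearch runs, I edited the nrpe.cfg file and added the following: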

command[check_elastic]=/usr/local/nagios/libexec/check_http -H localhost -u /_cluster/health -p 9200 -w 2 -c 3 -s green

This is allowed because the command runs locally on the remote server, so there is no security issue here...

It works!!! I will still try the check_http_json command I mentioned in my question, but for now my solution is good enough.
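To make the flags in the check_http command above easier to reason about, here is a minimal Python sketch of how I understand them to combine: -s makes the check critical when the expected string is missing from the response body, while -w and -c are response-time thresholds in seconds. The function name evaluate_http_check and its parameters are hypothetical, and this only approximates the plugin's behaviour; it is not its actual source.

```python
# Nagios plugin exit codes
OK, WARNING, CRITICAL = 0, 1, 2


def evaluate_http_check(body, elapsed, warn=2.0, crit=3.0, expect="green"):
    """Approximate the result of check_http with -w, -c and -s.

    body    -- response body returned by /_cluster/health
    elapsed -- response time in seconds
    """
    # -s: the expected string must appear somewhere in the body
    if expect not in body:
        return CRITICAL
    # -c / -w: response-time thresholds, in seconds
    if elapsed > crit:
        return CRITICAL
    if elapsed > warn:
        return WARNING
    return OK
```

Under this reading, a yellow cluster is reported as CRITICAL rather than WARNING, because the string "green" is simply absent. The match is also a plain substring test, so a cluster whose name happens to contain "green" would pass regardless of its status.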

Answer 2

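After reading the suggestions in this thread, I wrote a simple check_elasticsearch script. It returns a status of OK, WARNING or CRITICAL, corresponding to the "status" parameter in the cluster health response ("green", "yellow" and "red" respectively).

It also grabs all the other parameters from the health page and dumps them in the standard Nagios format.

Enjoy!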
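The script itself is not shown here, so the following is only a rough sketch of the approach described: map the "status" field to a Nagios exit code and emit the numeric fields of the health response as performance data after the "|" separator. The names evaluate_health and STATUS_MAP are made up for illustration, and the author's script may well differ.

```python
import json

# Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

STATUS_MAP = {
    "green": (OK, "OK"),
    "yellow": (WARNING, "WARNING"),
    "red": (CRITICAL, "CRITICAL"),
}


def evaluate_health(body):
    """Return (exit_code, output_line) for a _cluster/health JSON body."""
    try:
        health = json.loads(body)
        status = str(health["status"]).lower()
    except (ValueError, KeyError, TypeError):
        return UNKNOWN, "UNKNOWN - could not parse cluster health response"

    code, label = STATUS_MAP.get(status, (UNKNOWN, "UNKNOWN"))

    # Numeric fields (booleans excluded) become performance data
    perfdata = " ".join(
        "%s=%s" % (key, value)
        for key, value in sorted(health.items())
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    )

    line = "%s - cluster status is %s" % (label, status)
    if perfdata:
        line += " | " + perfdata
    return code, line
```

A wrapper would fetch the body over HTTP, print the returned line and call sys.exit with the returned code, which is what Nagios reads.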

Answer 3

Shameless plug: https://github.com/jersten/check-es

You can use it with ZenOSS / Nagios to monitor cluster health, data indexing and per-node heap usage.

Answer 4

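You can use this cool Python script to monitor your Elasticsearch cluster. The script checks the Elasticsearch status at your IP:port. This and other Python scripts for monitoring Elasticsearch can be found here.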

#!/usr/bin/python
from nagioscheck import NagiosCheck, UsageError
from nagioscheck import PerformanceMetric, Status
import urllib2
import optparse

try:
    import json
except ImportError:
    import simplejson as json


class ESClusterHealthCheck(NagiosCheck):

    def __init__(self):

        NagiosCheck.__init__(self)

        self.add_option('H', 'host', 'host', 'The cluster to check')
        self.add_option('P', 'port', 'port', 'The ES port - defaults to 9200')

    def check(self, opts, args):
        host = opts.host
        port = int(opts.port or '9200')

        try:
            response = urllib2.urlopen(r'http://%s:%d/_cluster/health'
                                       % (host, port))
        except urllib2.HTTPError, e:
            raise Status('unknown', ("API failure", None,
                         "API failure:\n\n%s" % str(e)))
        except urllib2.URLError, e:
            # e.reason may be an exception object rather than a string
            raise Status('critical', str(e.reason))

        response_body = response.read()

        try:
            es_cluster_health = json.loads(response_body)
        except ValueError:
            raise Status('unknown', ("API returned nonsense",))

        cluster_status = es_cluster_health['status'].lower()

        if cluster_status == 'red':
            raise Status("CRITICAL", "Cluster status is currently reporting as "
                         "Red")
        elif cluster_status == 'yellow':
            raise Status("WARNING", "Cluster status is currently reporting as "
                         "Yellow")
        else:
            raise Status("OK",
                         "Cluster status is currently reporting as Green")

if __name__ == "__main__":
    ESClusterHealthCheck().run()

Answer 5

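I wrote this a million years ago and it may still be useful: https://github.com/radu-gheorghe/check-es

But it really depends on what you want to monitor. The plugin above checks:

whether Elasticsearch responds over HTTP
whether the ingestion rate drops below a defined level
whether the total number of documents drops by more than a defined amount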
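As an illustration of the last two checks, here is a small sketch that compares two document-count samples taken some seconds apart. It is not the plugin's actual logic: the function names, the thresholds, and the choice to treat a document drop as CRITICAL and a low ingestion rate as WARNING are all assumptions.

```python
# Nagios plugin exit codes
OK, WARNING, CRITICAL = 0, 1, 2


def ingest_rate(prev_count, curr_count, seconds):
    """Documents added per second between two samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (curr_count - prev_count) / float(seconds)


def check_ingest(prev_count, curr_count, seconds, min_rate, max_drop):
    """Return (exit_code, message) for two document-count samples.

    min_rate -- lowest acceptable ingestion rate, in documents per second
    max_drop -- largest acceptable decrease in the total document count
    """
    dropped = prev_count - curr_count
    if dropped > max_drop:
        return CRITICAL, "CRITICAL - document count dropped by %d" % dropped

    rate = ingest_rate(prev_count, curr_count, seconds)
    if rate < min_rate:
        return WARNING, "WARNING - ingestion rate is %.2f docs/s" % rate
    return OK, "OK - ingestion rate is %.2f docs/s" % rate
```

The previous sample has to be stored between runs (for example in a small state file), since Nagios plugins are started fresh on every check.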

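But of course there are many more interesting things to watch, from query times to JVM heap usage. We wrote a blog post about the most important ones here: https://sematext.com/blog/top-10-elasticsearch-metrics-to-watch/

Elasticsearch has APIs for all of these, so you can use the generic check_http_json to fetch the metrics you need. Alternatively, you may want to use something like Sematext Monitoring for Elasticsearch, which provides these metrics out of the box, and then forward threshold/anomaly alerts to Nagios. (Disclosure: I work for Sematext)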
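To show what pulling one metric out of these APIs can look like, here is a minimal sketch that reads jvm.mem.heap_used_percent for each node from a node stats (_nodes/stats) response body and compares it with thresholds. The helper names and the 75/90 thresholds are assumptions chosen for illustration, and this is not how check_http_json itself is implemented.

```python
import json

# Nagios plugin exit codes
OK, WARNING, CRITICAL = 0, 1, 2


def lookup(data, path):
    """Follow a dotted path such as 'jvm.mem.heap_used_percent' through nested dicts."""
    for key in path.split("."):
        data = data[key]
    return data


def check_heap(body, warn=75, crit=90):
    """Return (exit_code, message) for a node stats JSON body."""
    nodes = json.loads(body)["nodes"]
    worst, parts = OK, []
    for node_id, stats in sorted(nodes.items()):
        pct = lookup(stats, "jvm.mem.heap_used_percent")
        if pct >= crit:
            worst = max(worst, CRITICAL)
        elif pct >= warn:
            worst = max(worst, WARNING)
        # Fall back to the node id when the node has no name field
        parts.append("{0}={1}%".format(stats.get("name", node_id), pct))
    label = ["OK", "WARNING", "CRITICAL"][worst]
    return worst, "{0} - heap used: {1}".format(label, ", ".join(parts))
```

The overall result is the worst status across all nodes, so a single node near its heap limit is enough to raise the alert.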

How to monitor Elasticsearch with Nagios

5 answers: