Thursday, November 17, 2016

Monitor RDS with Nagios

This plugin is written in Python and uses the boto module (a Python interface to Amazon Web Services) to fetch various RDS metrics from CloudWatch and compare them against thresholds.


Install the package: yum install python-boto or apt-get install python-boto

Create a config /etc/boto.cfg or ~nagios/.boto with your AWS API credentials. See http://code.google.com/p/boto/wiki/BotoConfig

The plugin is meant to be run by Nagios, i.e. under the nagios user, so that user must have permission to read the config /etc/boto.cfg or ~nagios/.boto.

Example:

[root@centos6 ~]# cat /etc/boto.cfg
[Credentials]
aws_access_key_id = THISISATESTKEY
aws_secret_access_key = thisisatestawssecretaccesskey

If you do not share this config with other tools such as our Cacti script, you can secure the file as follows:

[root@centos6 ~]# chown nagios /etc/boto.cfg
[root@centos6 ~]# chmod 600 /etc/boto.cfg
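
Before wiring the plugin into Nagios, it can help to confirm that the credentials file parses and is not readable by other users. The following is a minimal sketch using only the Python standard library, not part of the plugin itself; the function name check_boto_config is hypothetical, and the checks simply mirror the example file and chmod 600 step above.

```python
import configparser
import os
import stat


def check_boto_config(path):
    """Return a list of problems found in a boto-style credentials file.

    Hypothetical helper for illustration; the plugin does not ship it.
    """
    # RawConfigParser avoids '%' interpolation, which secret keys may contain.
    parser = configparser.RawConfigParser()
    if not parser.read(path):
        # read() returns the list of files it could open; empty means failure.
        return ["cannot read %s" % path]

    problems = []
    if not parser.has_section("Credentials"):
        problems.append("missing [Credentials] section")
    else:
        for key in ("aws_access_key_id", "aws_secret_access_key"):
            if not parser.has_option("Credentials", key):
                problems.append("missing %s" % key)

    # Flag any permission bits granted to group or other (i.e. not 600/400).
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append("file mode %o is accessible by group/other" % mode)
    return problems
```

Run it as the nagios user, for example by calling check_boto_config("/etc/boto.cfg") from a Python shell, so that the result reflects what the plugin will actually be able to read. An empty list means no problems were found.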

DESCRIPTION

The plugin provides 4 checks and some options to list and print RDS details:

RDS Status
RDS Load Average
RDS Free Storage
RDS Free Memory

To get the list of all RDS instances under the AWS account:

# ./aws-rds-nagios-check.py -l

To get the detailed status of the RDS instance identified as blackbox:

# ./aws-rds-nagios-check.py -i blackbox -p

Nagios check for the overall status. Useful if you want to make the rest of the checks dependent on this one:

# ./aws-rds-nagios-check.py -i blackbox -m status
OK mysql 5.1.63. Status: available

Nagios check for CPU utilization. Specify thresholds as percentages for the 1-minute, 5-minute, and 15-minute averages, in that order:

# ./aws-rds-nagios-check.py -i blackbox -m load -w 90,85,80 -c 98,95,90
OK Load average: 18.36%, 18.51%, 15.95% | load1=18.36;90.0;98.0;0;100 load5=18.51;85.0;95.0;0;100 load15=15.95;80.0;90.0;0;100
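
To make the threshold format concrete, here is a minimal sketch of how three comma-separated thresholds could be compared against the three averages and turned into a Nagios status line with performance data. It is an illustration, not the plugin's actual code: the function names are hypothetical, and treating a value equal to the threshold as a breach (>=) is an assumption.

```python
def parse_thresholds(text):
    """Turn a string such as '90,85,80' into [90.0, 85.0, 80.0]."""
    return [float(part) for part in text.split(",")]


def check_load(values, warn, crit):
    """Compare 1-, 5- and 15-minute averages against per-interval thresholds.

    Assumption: a value equal to or above a threshold counts as a breach.
    """
    status = "OK"
    for value, w, c in zip(values, warn, crit):
        if value >= c:
            status = "CRITICAL"
            break  # nothing can be worse than CRITICAL
        if value >= w:
            status = "WARNING"

    labels = ("load1", "load5", "load15")
    # Nagios perfdata format: label=value;warn;crit;min;max
    perfdata = " ".join(
        "%s=%s;%s;%s;0;100" % (label, value, w, c)
        for label, value, w, c in zip(labels, values, warn, crit)
    )
    summary = ", ".join("%.2f%%" % value for value in values)
    return "%s Load average: %s | %s" % (status, summary, perfdata)


if __name__ == "__main__":
    # Values taken from the example output above.
    print(check_load([18.36, 18.51, 15.95],
                     parse_thresholds("90,85,80"),
                     parse_thresholds("98,95,90")))
```

Each interval gets its own warning and critical value, which is why -w and -c take three numbers each; the part after the pipe is what Nagios graphing add-ons consume.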

Nagios check for free memory. Specify thresholds as a percentage or, with -u GB, in GB:

# ./aws-rds-nagios-check.py -i blackbox -m memory -w 5 -c 2
OK Free memory: 5.90 GB (9%) of 68 GB | free_memory=8.68;5.0;2.0;0;100
# ./aws-rds-nagios-check.py -i blackbox -m memory -u GB -w 4 -c 2
OK Free memory: 5.90 GB (9%) of 68 GB | free_memory=5.9;4.0;2.0;0;68

Nagios check for free storage space. Specify thresholds as a percentage or, with -u GB, in GB:

# ./aws-rds-nagios-check.py -i blackbox -m storage -w 10 -c 5
OK Free storage: 162.55 GB (33%) of 500.0 GB | free_storage=32.51;10.0;5.0;0;100
# ./aws-rds-nagios-check.py -i blackbox -m storage -u GB -w 10 -c 5
OK Free storage: 162.55 GB (33%) of 500.0 GB | free_storage=162.55;10.0;5.0;0;500.0
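
The free memory and free storage checks differ from the load check in that a lower value is worse, and the -u GB option changes which number is compared and reported in the performance data. The sketch below illustrates that logic under stated assumptions: check_free is a hypothetical name, the comparison uses <=, and the formatting is chosen to match the sample output above rather than taken from the plugin source.

```python
def check_free(label, free_gb, total_gb, warn, crit, unit="percent"):
    """Build a Nagios status line for a 'free resource' check.

    label is 'memory' or 'storage'. With unit='GB' the thresholds and
    perfdata are in GB; otherwise they are percentages of total_gb.
    Assumption: a value at or below a threshold counts as a breach.
    """
    percent = free_gb / total_gb * 100
    if unit == "GB":
        value, maximum = free_gb, total_gb
    else:
        value, maximum = round(percent, 2), 100

    # Lower is worse, so the comparison is the reverse of the load check.
    if value <= crit:
        status = "CRITICAL"
    elif value <= warn:
        status = "WARNING"
    else:
        status = "OK"

    return "%s Free %s: %.2f GB (%.0f%%) of %s GB | free_%s=%s;%s;%s;0;%s" % (
        status, label, free_gb, percent, total_gb,
        label, value, warn, crit, maximum)


if __name__ == "__main__":
    # Figures taken from the example output above.
    print(check_free("storage", 162.55, 500.0, 10.0, 5.0))
    print(check_free("storage", 162.55, 500.0, 10.0, 5.0, unit="GB"))
```

Note that the human-readable part of the line is the same in both modes; only the perfdata value and its maximum change, which matters if you graph the metric.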

CONFIGURATION

Here is an excerpt of a possible Nagios config:

define servicedependency{
      hostgroup_name                  mysql-servers
      service_description             RDS Status
      dependent_service_description   RDS Load Average, RDS Free Storage, RDS Free Memory
      execution_failure_criteria      w,c,u,p
      notification_failure_criteria   w,c,u,p
      }

define service{
      use                             active-service
      hostgroup_name                  mysql-servers
      service_description             RDS Status
      check_command                   check_rds!status!0!0
      }

define service{
      use                             active-service
      hostgroup_name                  mysql-servers
      service_description             RDS Load Average
      check_command                   check_rds!load!90,85,80!98,95,90
      }

define service{
      use                             active-service
      hostgroup_name                  mysql-servers
      service_description             RDS Free Storage
      check_command                   check_rds!storage!10!5
      }

define service{
      use                             active-service
      hostgroup_name                  mysql-servers
      service_description             RDS Free Memory
      check_command                   check_rds!memory!5!2
      }

define command{
      command_name    check_rds
      command_line    $USER1$/aws-rds-nagios-check.py -i $HOSTALIAS$ -m $ARG1$ -w $ARG2$ -c $ARG3$
      }
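
To see how a check_command value maps onto the command line, recall that Nagios splits it on "!" and substitutes the pieces for $ARG1$, $ARG2$ and so on, while $HOSTALIAS$ and $USER1$ come from the host definition and resource file. The sketch below imitates that substitution for debugging purposes; it is not how Nagios is implemented, the function name is hypothetical, and the plugin directory and host alias used in the example are assumptions.

```python
def expand_check_command(check_command, command_line, macros):
    """Imitate Nagios macro substitution for a single service check.

    check_command: e.g. 'check_rds!load!90,85,80!98,95,90'
    command_line:  the command_line from the matching 'define command'
    macros:        values for non-argument macros such as HOSTALIAS
    Returns (command_name, expanded_command_line).
    """
    name, *args = check_command.split("!")
    expanded = command_line
    for index, arg in enumerate(args, start=1):
        expanded = expanded.replace("$ARG%d$" % index, arg)
    for macro, value in macros.items():
        expanded = expanded.replace("$%s$" % macro, value)
    return name, expanded


if __name__ == "__main__":
    # Script name as used in the command-line examples earlier in this post.
    command_line = ("$USER1$/aws-rds-nagios-check.py "
                    "-i $HOSTALIAS$ -m $ARG1$ -w $ARG2$ -c $ARG3$")
    # Assumed values: adjust to your resource.cfg and host definition.
    macros = {"USER1": "/usr/lib64/nagios/plugins", "HOSTALIAS": "blackbox"}
    print(expand_check_command("check_rds!load!90,85,80!98,95,90",
                               command_line, macros)[1])
```

Running the expanded command by hand as the nagios user is a quick way to tell a plugin problem apart from a Nagios configuration problem. Because -i takes $HOSTALIAS$, the alias of each host in the mysql-servers hostgroup has to match the RDS instance identifier.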

6 comments:

  2. It works only in the terminal. In Nagios it shows no output data, just "OK Unable to get RDS details and statistics". Could you please help if possible?

    Thanks in advance

  3. check_nrpe is not working for this, so I am not able to get the details on the Nagios dashboard.

  4. Hi,

    Create a config /etc/boto.cfg or ~nagios/.boto with your AWS API credentials. See http://code.google.com/p/boto/wiki/BotoConfig

    I don't understand the step above. Could you please explain?

  5. Hello Tushar,
    Where can I download this plugin? Please help.
