How to Set Up Rancher Server Monitoring with the TIG Stack

Learn how to build your own Rancher server monitoring dashboard with multiple metrics and alerts, in less than 15 minutes.

Introduction

Server monitoring is as important as the content of your server. It puts all the necessary information about the server's resources in a single view and warns you when they are about to run out, before you get an unpleasant phone call from your boss.

In this article, I will show you how to set up monitoring of your Rancher infrastructure using the TIG Stack (Telegraf, InfluxDB and Grafana). If you think that you do not have time for server monitoring, you are wrong.

In less than 15 minutes we will have a working dashboard showing how much disk space a mobile or web app consumes, its memory usage, network traffic, and Docker container metrics, plus a simple alert for high memory usage.

<span class="colorbox1" fs-test-element="box1"><p><strong>Note:</strong></p><p>This tutorial assumes that you have basic knowledge of Docker and Rancher.</p></span>

What we need:

  • Telegraf – for collecting, processing, aggregating and writing metrics
  • InfluxDB – for storing data
  • Grafana – for visualization of metrics and creating alerts

How to set up Rancher server monitoring? – Tutorial

#1 Create a new stack on Rancher

Step 1

Create a docker-compose.yml file:

Setting up Rancher monitoring — Step 1 code block.

All the necessary volumes are defined at the top of docker-compose.yml. We stick to the convention that all volumes are stored inside the /home/docker/ directory, so change the paths if you follow a different convention. If you do not have SMTP configured, remove all GF_SMTP_* environment variables from the grafana container.
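The original compose file is not reproduced in this copy, so the following is only a minimal sketch of what such a file might look like. Only the /home/docker/ convention and the GF_SMTP_* prefix come from the text above; the image tags, published ports, container paths, and SMTP values are assumptions, and plain bind mounts are used for brevity. Telegraf is installed on the host later, so it is not part of this stack.

```shell
# Write a minimal docker-compose.yml for InfluxDB and Grafana.
# Image tags, ports, and SMTP values below are illustrative assumptions.
cat > docker-compose.yml <<'EOF'
version: '2'
services:
  influxdb:
    # A 1.x tag is assumed: the influx CLI steps later in this
    # tutorial target the InfluxDB 1.x line
    image: influxdb:1.8
    ports:
      - "8086:8086"
    volumes:
      # Host directory follows the /home/docker/ convention
      - /home/docker/influxdb:/var/lib/influxdb
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
    environment:
      # Remove the GF_SMTP_* entries if SMTP is not configured
      GF_SMTP_ENABLED: "true"
      GF_SMTP_HOST: "smtp.example.com:587"
      GF_SMTP_USER: "grafana@example.com"
      GF_SMTP_PASSWORD: "change-me"
      GF_SMTP_FROM_ADDRESS: "grafana@example.com"
    volumes:
      - /home/docker/grafana:/var/lib/grafana
EOF
```

Running the snippet leaves a docker-compose.yml in the current directory, ready to be adjusted and uploaded in the next step.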

Step 2

Add a new stack in Rancher and upload the docker-compose.yml file.

#2 Update Grafana

Your Grafana container probably reports some errors. View the container logs and you will most likely find a permissions issue. To solve it, do the following:

Step 1

Enter the Grafana container and make a directory grafana/data

Step 2

Change the owner of this directory

Setting up Rancher monitoring — Step 2 code block.

Step 3

Restart the grafana container

We will come back to Grafana later on.

#3 Set up InfluxDB

Step 1

Create an influxdb.conf file:

Setting up Rancher monitoring — Step 3 code block.
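The original configuration file is not reproduced in this copy. As a rough sketch, a minimal InfluxDB 1.x configuration could look like the one below. The directory paths are assumptions based on the default data location, and authentication is left disabled at this stage because the first user has not been created yet.

```shell
# Write a minimal InfluxDB 1.x configuration file.
# Directory paths are assumptions matching the default data location.
cat > influxdb.conf <<'EOF'
[meta]
  dir = "/var/lib/influxdb/meta"

[data]
  dir = "/var/lib/influxdb/data"
  wal-dir = "/var/lib/influxdb/wal"

[http]
  enabled = true
  bind-address = ":8086"
  # Authentication stays off until the first user has been created
  auth-enabled = false
EOF
```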

Step 2

Enter the InfluxDB container

Step 3

Copy the configuration file to /var/lib/influxdb/influxdb.conf

Step 4

Restart the influx container

Step 5

Enter the container again

Step 6

Type influx to start using the influx CLI

Step 7

Add a new user

Setting up Rancher monitoring — Step 4 code block.
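The exact statement used by the author is not preserved here. A minimal sketch, assuming an admin user is wanted, is shown below; the function name, user name, and password are hypothetical. The helper only prints the InfluxQL statement so that it can be pasted at the influx prompt.

```shell
# Print the InfluxQL statement that creates an admin user.
# Arguments: $1 = user name, $2 = password (both hypothetical values).
create_admin_statement() {
  printf "CREATE USER \"%s\" WITH PASSWORD '%s' WITH ALL PRIVILEGES\n" "$1" "$2"
}

# Show the statement for an example user.
create_admin_statement "monitoring" "change-me"
```

Outside the interactive prompt, the same statement could be run in one go with `influx -execute "$(create_admin_statement monitoring change-me)"`.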

Step 8

Type exit to exit the CLI

Step 9

Open influxdb.conf again and change auth-enabled to true in the [http] section

Setting up Rancher monitoring — Step 5 code block.
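If you prefer not to edit the file by hand, the change can be sketched as a one-line sed substitution. The function name is hypothetical, and the in-place flag assumes GNU or BusyBox sed, as found in typical Linux containers.

```shell
# Flip auth-enabled from false to true in the given InfluxDB config file.
# Argument: $1 = path to the config file.
# The pattern is anchored to the start of the line, so settings such as
# pprof-auth-enabled are left untouched.
enable_influx_auth() {
  sed -i 's/^\([[:space:]]*auth-enabled[[:space:]]*=[[:space:]]*\)false/\1true/' "$1"
}

# Example call, using the path from the earlier step:
#   enable_influx_auth /var/lib/influxdb/influxdb.conf
```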

Step 10

Exit and restart the influx container

<span class="colorbox1" fs-test-element="box1"><p><strong>Note:</strong></p><p>The next time you use the influx CLI, you will have to specify the username and password.</p></span>

Setting up Rancher monitoring — Step 6 code block.

Now your InfluxDB is ready to collect server monitoring data.

#4 Install Telegraf

Step 1

Install Telegraf on your server. On Ubuntu, you can use the following commands:

Setting up Rancher monitoring — Step 7 code block.

For more details about installation, see the Telegraf installation guide.

Step 2

Create a telegraf.conf file:

Setting up Rancher monitoring — Step 8 code block.

Step 3

Make sure that you entered the correct urls, username, and password for your InfluxDB in the [[outputs.influxdb]] section

Setting up Rancher monitoring — Step 9 code block.
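The author's telegraf.conf is not reproduced in this copy. The sketch below shows one plausible minimal configuration covering the metrics mentioned in the introduction (disk, memory, network, and Docker). The collection intervals and the selection of input plugins are assumptions, and the angle-bracket placeholders must be replaced with your own InfluxDB details.

```shell
# Write a minimal telegraf.conf; replace the <...> placeholders before use.
cat > telegraf.conf <<'EOF'
[agent]
  interval = "10s"
  flush_interval = "10s"

# Where metrics are sent
[[outputs.influxdb]]
  urls = ["http://<INFLUXDB_ADDRESS>:8086"]
  database = "<INFLUXDB_DATABASENAME>"
  username = "<INFLUXDB_USERNAME>"
  password = "<INFLUXDB_PASSWORD>"

# Host metrics
[[inputs.cpu]]
[[inputs.mem]]
[[inputs.disk]]
[[inputs.net]]

# Docker container metrics, read from the local Docker socket
[[inputs.docker]]
  endpoint = "unix:///var/run/docker.sock"
EOF
```

Once Telegraf is installed, `telegraf --config telegraf.conf --test` gathers the metrics once and prints them without writing to InfluxDB, which is a quick way to check the input plugins.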

Step 4

Upload telegraf.conf to your server. In my case, the config file is stored in /etc/telegraf/telegraf.conf

Step 5

Start the Telegraf service. On Ubuntu, you can start it with sudo service telegraf start

Step 6

Add the telegraf user to the docker group so that the agent can read Docker container metrics.

Setting up Rancher monitoring — Code block.
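A minimal sketch of this step follows, assuming the package created a system user named telegraf and that the docker group already exists on the host. The check at the top is only a safeguard for hosts where Telegraf is not installed yet.

```shell
# Add the telegraf user to the docker group, then restart the agent
# so that the new group membership takes effect.
if id telegraf >/dev/null 2>&1; then
  sudo usermod -aG docker telegraf
  sudo service telegraf restart
else
  echo "telegraf user not found; install Telegraf first" >&2
fi
```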

#5 Update Grafana again

Step 1

Open Grafana. It listens on port 3000 by default.

  • Default login: admin
  • Default password: admin

Step 2

Add a datasource:

  • Open http://<GRAFANA_ADDRESS>:3000/datasources
  • Select InfluxDB
  • Set the URL to http://<INFLUXDB_ADDRESS>:8086
  • In InfluxDB Details, enter:
      • Database: <INFLUXDB_DATABASENAME>
      • User: <INFLUXDB_USERNAME>
      • Password: <INFLUXDB_PASSWORD>
  • Save and Test

Step 3

Import dashboard:

  • Open http://localhost:3000/dashboard/import
  • Enter 928 in the Grafana Dashboard field
  • Load

You can also pick any other published Grafana dashboard. Dashboard 2738 is quite good for Rancher.

Step 4

Add a notification channel for your server monitoring:

E-mail:

  • Open Alert => Notification Channels
  • Name: monitoring
  • Type: Email
  • Add Email Addresses
  • Save and test

Slack or Mattermost by using a webhook:

  • Name: slack
  • Type: Slack
  • Url: <SLACK_WEBHOOK_URL>
  • Save and test. For more information, see the Slack documentation on incoming webhooks.

Step 5

Add a sample alert for high memory usage:

  • Add a new dashboard
  • Pick Graph
  • Click on Panel title -> Edit
  • On the bottom panel Graph -> Metrics
  • In Data Source pick InfluxDB
  • Click the hamburger button next to the query builder and select Toggle Edit Mode
  • Enter the query SELECT mean("used_percent") FROM "mem" WHERE ("host" = '<YOUR_SERVER_NAME>') AND $timeFilter GROUP BY time($__interval) fill(none)
  • Switch to the Axes tab
  • Set Y-Min to 0 and Y-Max to 100
  • Switch to the Alert tab
  • Create a new alert
  • Enter 80 in the IS ABOVE field
  • Select notification channels
  • Save the dashboard by clicking on the disk icon from the top bar
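Before relying on the alert, it can help to confirm that InfluxDB actually returns memory data for your host. The sketch below builds a simplified version of the query (a fixed 5-minute window replaces the Grafana-only $timeFilter and $__interval macros) and sends it to the InfluxDB 1.x HTTP API. The function names and the INFLUX_* variables are hypothetical, and web01 is just an example host name.

```shell
# Print the InfluxQL query for one host, limited to the last 5 minutes.
# Argument: $1 = host tag value as reported by Telegraf.
mem_query() {
  printf 'SELECT mean("used_percent") FROM "mem" WHERE "host" = '\''%s'\'' AND time > now() - 5m\n' "$1"
}

# Send the query to the InfluxDB 1.x /query endpoint with basic auth.
# INFLUX_URL, INFLUX_DB, INFLUX_USER, and INFLUX_PASS are hypothetical
# variables that must be set before calling this function.
run_mem_query() {
  curl -sG "${INFLUX_URL}/query" \
    -u "${INFLUX_USER}:${INFLUX_PASS}" \
    --data-urlencode "db=${INFLUX_DB}" \
    --data-urlencode "q=$(mem_query "$1")"
}

# Show the query text for an example host.
mem_query "web01"
```

With the variables exported, `run_mem_query "<YOUR_SERVER_NAME>"` should return a JSON document; an empty result usually means the host tag or database name does not match what Telegraf writes.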

Now you will be informed when memory usage stays above 80% for more than 5 minutes. Was it so hard to set up a fully working server monitoring system for your Rancher? You are now ready to install more Telegraf instances on your dev or test servers and see everything in a single Grafana instance.

Summary

TIG Stack is a combination of really great tools for server monitoring that can save you a lot of time, problems and effort. Combined with the ELK Stack (Elasticsearch, Logstash and Kibana) it gives you a nice set of tools for both visualizing metrics and analyzing log messages. You can continue further by adding a load balancer instead of exposing ports.

A working project can be downloaded from GitHub.

Szczepan Blaszkiewicz, Full Stack Developer
Bianka Pluszczewska, Editor
