Server monitoring is as important as the content of your server. It gives you all the necessary information about the server's resources in a single view and warns you when they are about to run out, before you get an unpleasant phone call from your boss.
In this article, I will show you how to set up monitoring of your Rancher infrastructure using the TIG Stack (Telegraf, InfluxDB and Grafana). If you think that you do not have time for server monitoring, you are wrong.
In less than 15 minutes we will have a working dashboard showing how much disk space a mobile or web app consumes, its memory usage, network traffic, and Docker container metrics, plus a simple alert for high memory usage.
<span class="colorbox1" fs-test-element="box1"><p><strong>Note:</strong></p><p>This tutorial assumes that you have basic knowledge of Docker and Rancher.</p></span>
What we need:
Create a docker-compose.yml file:
All the necessary volumes are defined at the top of docker-compose.yml. We stick to the convention that all volumes are stored inside the /home/docker/ directory, so change the paths if you follow a different convention. If you do not have SMTP configured, remove all GF_SMTP_* environment variables from the grafana container.
Add a new stack in Rancher and upload the docker-compose.yml file.
Your Grafana container probably has some errors. View logs from the container and you will most likely find an issue with permissions. To solve it do the following:
Enter the Grafana container and make a directory grafana/data
Change the owner of this directory
Restart the grafana container
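The three steps above can also be run from the Docker host instead of the Rancher shell. This is only a minimal sketch: the function name is made up for illustration, and the data path /var/lib/grafana and the grafana user and group are assumptions based on the official Grafana image, so adjust them to your own volume mapping.

```shell
# Sketch: fix ownership of Grafana's data directory from the Docker host.
# Assumptions: data lives in /var/lib/grafana and the image ships a
# "grafana" user and group. Override both if your setup differs.
fix_grafana_permissions() {
  container="$1"
  data_dir="${2:-/var/lib/grafana}"   # assumed path inside the container
  owner="${3:-grafana:grafana}"       # assumed user:group

  # Create the data directory as root, hand it over to the Grafana user,
  # then restart the container so Grafana picks up the writable directory.
  docker exec -u root "$container" mkdir -p "$data_dir/data" &&
    docker exec -u root "$container" chown -R "$owner" "$data_dir" &&
    docker restart "$container"
}
```

Call it with the container name shown by docker ps, for example `fix_grafana_permissions my-grafana-container`.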
We will come back to Grafana later on.
Create an influxdb.conf file:
Enter the InfluxDB container
Copy the configuration file to /var/lib/influxdb/influxdb.conf
Restart the influx container
Enter the container again
Type influx to start using the influx CLI
Add a new user
Type exit to exit the CLI
Open influxdb.conf again and change auth-enabled to true in the [http] section
Exit and restart the influx container
<span class="colorbox1" fs-test-element="box1"><p><strong>Note:</strong></p><p>The next time you use the influx CLI, you will have to specify the username and password.</p></span>
Now your InfluxDB is ready to collect server monitoring data.
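As a rough sketch, the user creation and the authenticated CLI call can be scripted from the Docker host. It assumes InfluxDB 1.x (InfluxQL syntax) and a simple password without quote characters. The function names, container name and credentials are placeholders, not values from this setup.

```shell
# Sketch for InfluxDB 1.x. Run create_influx_admin while auth-enabled is
# still false, then use influx_query once authentication is switched on.

# Create an admin user through the influx CLI inside the container.
create_influx_admin() {
  container="$1"; user="$2"; password="$3"
  docker exec "$container" influx -execute \
    "CREATE USER \"$user\" WITH PASSWORD '$password' WITH ALL PRIVILEGES"
}

# With auth-enabled = true, every CLI call needs credentials.
influx_query() {
  container="$1"; user="$2"; password="$3"; query="$4"
  docker exec "$container" influx \
    -username "$user" -password "$password" -execute "$query"
}
```

For example, `influx_query my-influxdb-container admin 'my-password' 'SHOW DATABASES'` should list the databases only when the credentials are correct. Keep in mind that a password passed on the command line is visible in the process list.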
Install Telegraf on your server. For Ubuntu you can use the following commands:
To read more about installation, read the Telegraf installation guide
Create a telegraf.conf file:
Make sure that you entered correct urls, username and password to your InfluxDB in the [[outputs.influxdb]] section
Upload telegraf.conf to your server. In my case, the config file is stored in /etc/telegraf/telegraf.conf
Start the Telegraf service. On Ubuntu you can start it with sudo service telegraf start
Add the telegraf user to the docker group so that Telegraf can collect metrics from your Docker containers.
Open Grafana. It listens on port 3000 by default.
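The last steps could look roughly like this on Ubuntu. It is a sketch that assumes the Telegraf package created a telegraf system user, that the docker group exists, and that the config sits in /etc/telegraf/telegraf.conf as in my case. The function name is only illustrative.

```shell
# Sketch: give Telegraf access to Docker and check the config in one go.
enable_telegraf_docker_metrics() {
  config="${1:-/etc/telegraf/telegraf.conf}"

  # The docker group grants access to the Docker socket.
  sudo usermod -aG docker telegraf || return 1

  # Group membership applies only to new processes, so restart the service.
  sudo service telegraf restart || return 1

  # Run every configured input once as the telegraf user and print the
  # metrics to stdout instead of sending them to InfluxDB.
  sudo -u telegraf telegraf --config "$config" --test
}
```

If the --test output contains docker measurements and no permission errors, the agent can read container metrics.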
Add a datasource:
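If you prefer the command line over the UI, the datasource can probably be created through Grafana's HTTP API as well. Treat this as a sketch under assumptions: the default admin:admin login, the datasource name InfluxDB, and a top-level password field, which older Grafana versions accept while newer ones expect the password under secureJsonData. All URLs and credentials are placeholders.

```shell
# Sketch: register InfluxDB as a Grafana datasource via POST /api/datasources.
# GRAFANA_AUTH defaults to admin:admin, which is an assumption about your setup.
add_influx_datasource() {
  grafana_url="$1"; influx_url="$2"; db="$3"; user="$4"; password="$5"

  # Older Grafana versions read "password" at the top level; newer versions
  # expect it inside "secureJsonData", so adjust the payload if needed.
  payload="{\"name\":\"InfluxDB\",\"type\":\"influxdb\",\"access\":\"proxy\","
  payload="$payload\"url\":\"$influx_url\",\"database\":\"$db\","
  payload="$payload\"user\":\"$user\",\"password\":\"$password\"}"

  curl -sS -X POST "$grafana_url/api/datasources" \
    -u "${GRAFANA_AUTH:-admin:admin}" \
    -H "Content-Type: application/json" \
    -d "$payload"
}
```

A call might look like `add_influx_datasource http://localhost:3000 http://influxdb:8086 telegraf my-user 'my-password'`, where the database name telegraf is Telegraf's default output database.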
You can pick one of the ready-made Grafana dashboards. Dashboard 2738 works quite well for Rancher.
Add a notification channel for your server monitoring:
Slack or Mattermost by using a webhook:
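Before wiring the webhook into Grafana, it can help to check that the URL works at all. Here is a small sketch: incoming webhooks in both Slack and Mattermost accept a JSON body with a text field. The function name and the default message are made up, and the message must not contain double quotes in this simple form.

```shell
# Sketch: send a test message to a Slack or Mattermost incoming webhook.
send_test_notification() {
  webhook_url="$1"
  message="${2:-Test message from the monitoring setup}"

  # Both services accept a JSON payload of the form {"text": "..."}.
  curl -sS -X POST -H "Content-Type: application/json" \
    -d "{\"text\":\"$message\"}" "$webhook_url"
}
```

Pass the webhook URL generated in your Slack or Mattermost integration settings as the first argument. If the message shows up in the channel, the same URL should work in the Grafana notification channel.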
Add a sample alert for high memory usage:
Now you will be informed when memory usage is above 80% for more than 5 minutes. Was it so hard to set up a fully working server monitoring system for your Rancher? Now you are ready to install multiple Telegraf instances on your dev or test servers and see everything in a single Grafana.
TIG Stack is a combination of really great tools for server monitoring that can save you a lot of time, problems and effort. Combined with the ELK Stack (Elasticsearch, Logstash and Kibana) it gives you a nice set of tools for both visualizing metrics and analyzing log messages. You can continue further by adding a load balancer instead of exposing ports.
A working project can be downloaded from GitHub.