Automate your NOC World Map at scale

Managing hundreds of devices with your monitoring system can be a tedious task, especially with GUI-based device onboarding. So why not let your config management tool of choice take care of it? This blog post presents a declarative Ansible playbook that generates Telegraf configuration files for the ping plugin and populates a Grafana World Map.

First of all: You need a working installation of the TIG-Stack (Telegraf, InfluxDB, Grafana) and an Ansible automation host as well.


Geohash is an open source geocode system, basically a way to represent a coordinate by a single alphanumerical value. The precision is a matter of character count, reaching +/- 20m with eight characters. This demo is based on worldwide AWS Regions. Just visit the GeohashExplorer or a geohash translator and add an additional variable to every ‘host’ in the Ansible inventory.
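
If you would rather compute the values yourself than look them up, the encoding is short enough to sketch in plain Python. This is a minimal version of the standard geohash algorithm, not part of the playbook. The function names are my own, and the sample coordinate is the commonly used reference point, not one of the AWS regions.

```python
# Minimal geohash encoder (standard algorithm), standard library only.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet: no a, i, l, o


def geohash_encode(lat: float, lon: float, precision: int = 4) -> str:
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0          # bits collected for the current character
    value = 0         # numeric value of the current character (0..31)
    use_lon = True    # bits alternate, starting with longitude

    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid   # keep the upper half
        else:
            value <<= 1
            rng[1] = mid   # keep the lower half
        use_lon = not use_lon
        bits += 1
        if bits == 5:      # five bits make one base32 character
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


def cell_error_degrees(precision: int) -> tuple:
    """Return the (+/- latitude, +/- longitude) error in degrees for a given length."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2   # longitude gets the extra bit on odd totals
    lat_bits = total_bits // 2
    return 90.0 / (2 ** lat_bits), 180.0 / (2 ** lon_bits)


if __name__ == "__main__":
    print(geohash_encode(57.64911, 10.40744, 5))  # prints u4pru
    print(cell_error_degrees(8))
```

Four characters, as used in the inventory below, are plenty for a world map. Each additional character narrows the cell by five more bits.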

[aws]
geohash=dqb0
geohash=dpjn
geohash=9qcp
geohash=9rf7
geohash=wecp
geohash=te7u
geohash=wydm
geohash=w21z
geohash=r3gx
geohash=xn7t
geohash=xn0h
geohash=f244
geohash=u0yh
geohash=gc7r
geohash=gcpu
geohash=u09t
geohash=u6sc
geohash=theu
geohash=6gyc

Yep, and that’s the only manual task left …
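
Since this is the one manual step, a typo in a geohash is the most likely mistake. Here is a small sketch that checks an INI-style inventory before the playbook runs. It only covers the simple `host key=value` form used here, and the hostnames in the sample are hypothetical placeholders.

```python
# Sketch: check the geohash host variables in an INI-style inventory before a run.
# The hostnames below are hypothetical placeholders, not the ones from this post.
import re

GEOHASH_RE = re.compile(r"^[0-9b-hjkmnp-z]+$")  # base32 without a, i, l, o

INVENTORY = """\
[aws]
host-a.example.com geohash=dqb0
host-b.example.com geohash=u0yh
host-c.example.com geohash=oops
host-d.example.com
"""


def check_inventory(text: str) -> dict:
    """Map each host to 'ok', 'invalid' or 'missing' depending on its geohash."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("[", "#", ";")):
            continue  # skip blanks, group headers and comments
        host, *pairs = line.split()
        variables = dict(p.split("=", 1) for p in pairs if "=" in p)
        geohash = variables.get("geohash")
        if geohash is None:
            result[host] = "missing"   # the playbook skips hosts without a geohash
        elif GEOHASH_RE.match(geohash):
            result[host] = "ok"
        else:
            result[host] = "invalid"
    return result


if __name__ == "__main__":
    for host, status in check_inventory(INVENTORY).items():
        print(f"{host}: {status}")
```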

Ansible playbook

The playbook and all additional files are available on GitHub. Just set the variable ‘telegrafhost’ according to your environment. I use a little Raspberry Pi as my TIG-Stack host, so it is ‘grafanapi’, as defined in the Ansible hosts file, including credentials (but please use Ansible Vault or another form of secure credential handling in production).

  # Render one ping config per inventory host on the Telegraf host
  - template:
      src: grafanawm.j2
      dest: /etc/telegraf/telegraf.d/{{inventory_hostname}}_wm.conf
    delegate_to: "{{ telegrafhost }}"
    when: geohash is defined

This task generates one config file per host in the Telegraf configuration directory, but only if the additional variable ‘geohash’ is defined. It uses the following Jinja2 template, located in the local working directory, to activate the Telegraf ping plugin.

# {{ ansible_managed }} by {{ template_host }}
[[inputs.ping]]
  urls = ["{{ inventory_hostname }}"]
  interval = "60s"
  count = 4
  ping_interval = 1.0
  timeout = 1.0
  deadline = 10
  [inputs.ping.tags]
     geohash="{{ geohash }}"
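
To make the effect of this task more tangible, here is a rough Python sketch of what happens per host. It is not how Ansible does it internally: `string.Template` stands in for Jinja2, the function name and hostnames are made up, and I assume the rendered file is a Telegraf `[[inputs.ping]]` section with the geohash as a tag.

```python
# Sketch of the per-host rendering, using only the standard library.
# The real playbook uses Ansible's template module with Jinja2; string.Template
# is a stand-in here, and the host names are hypothetical.
from string import Template

PING_CONF = Template("""\
[[inputs.ping]]
  urls = ["$host"]
  interval = "60s"
  count = 4
  ping_interval = 1.0
  timeout = 1.0
  deadline = 10
  [inputs.ping.tags]
    geohash = "$geohash"
""")


def render_configs(inventory: dict) -> dict:
    """Return {file name: file content} for every host that defines a geohash."""
    files = {}
    for host, hostvars in inventory.items():
        if "geohash" not in hostvars:
            continue  # mirrors `when: geohash is defined`
        files[f"{host}_wm.conf"] = PING_CONF.substitute(
            host=host, geohash=hostvars["geohash"]
        )
    return files


if __name__ == "__main__":
    demo = {
        "host-a.example.com": {"geohash": "u0yh"},
        "host-b.example.com": {},  # no geohash, so no file is written
    }
    for name, content in render_configs(demo).items():
        print(f"--- {name}\n{content}")
```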

The next two tasks are required to render this playbook declarative. Simple changes to existing config files are handled by the idempotent Ansible template module itself. But the goal is to use the Ansible inventory as a single source of truth for the world map – meaning orphaned Telegraf config files are not allowed!

  # List all world map config files on the Telegraf host
  - find:
      paths: /etc/telegraf/telegraf.d/
      file_type: file
      recurse: no
      patterns: "*wm.conf"
    delegate_to: "{{ telegrafhost }}"
    register: files_matched
    run_once: true

  # Delete config files whose host is no longer in the inventory
  - file:
      path: "{{ item.path }}"
      state: absent
    loop: "{{ files_matched.files|flatten(levels=1) }}"
    loop_control:
      label: "{{ item.path }}"
    delegate_to: "{{ telegrafhost }}"
    when: (item.path | basename | regex_replace('_wm.conf') not in ansible_play_hosts_all)
    run_once: true

It first generates a list of all files in the /etc/telegraf/telegraf.d/ directory that match the suffix ‘_wm.conf’. We use a dedicated worldmap suffix because there might be other config files for the same host using other Telegraf plugins. The next task loops over this list (files_matched) and deletes every file without a corresponding entry in the Ansible inventory. All the magic hides behind the conditional ‘when’:

item.path
item.path | basename
item.path | basename | regex_replace('_wm.conf')
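
If the filter chain looks cryptic, the same logic can be written in a few lines of Python. This is only an illustration of the condition, not code from the playbook. The function name and the sample paths are hypothetical, and `play_hosts` plays the role of `ansible_play_hosts_all`.

```python
# Sketch of the `when` condition: which config files have no matching inventory host?
import os
import re


def orphaned_configs(paths, play_hosts):
    """Return the paths whose derived host name is not in play_hosts."""
    orphans = []
    for path in paths:
        filename = os.path.basename(path)          # item.path | basename
        host = re.sub("_wm.conf", "", filename)    # ... | regex_replace('_wm.conf')
        if host not in play_hosts:                 # not in ansible_play_hosts_all
            orphans.append(path)
    return orphans


if __name__ == "__main__":
    found = [
        "/etc/telegraf/telegraf.d/host-a.example.com_wm.conf",
        "/etc/telegraf/telegraf.d/host-b.example.com_wm.conf",
    ]
    inventory_hosts = ["host-a.example.com"]
    print(orphaned_configs(found, inventory_hosts))
```

Only the file for the host that vanished from the inventory ends up in the list, which is exactly the set of files the second task removes.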

The handler at the bottom of the playbook reloads the telegraf service, of course only when config files have changed. IMPORTANT: Add the telegraf reload command to /etc/sudoers with the NOPASSWD tag for the desired user group, in my case: %pi.

%pi ALL= NOPASSWD: /bin/systemctl reload telegraf.service

This asciinema recording demonstrates the initial onboarding of all AWS regions, whereas this one shows the declarative behaviour when random regions are deleted from the Ansible hosts file.

Grafana – World Map plugin

At this point Telegraf sends metrics for every host with a geohash tag to the InfluxDB of your TIG-Stack.

pi@grafanapi:~ $ influx
> use telegraf
Using database telegraf
> select * from ping where geohash !='' AND time > now() -2m limit 5
name: ping
time                average_response_ms geohash host      location maximum_response_ms minimum_response_ms packets_received packets_transmitted percent_packet_loss result_code standard_deviation_ms url
----                ------------------- ------- ----      -------- ------------------- ------------------- ---------------- ------------------- ------------------- ----------- --------------------- ---
1586155683000000000 260.195             6gyc    grafanapi          260.416             259.98              4                4                   0                   0           0.532
1586155683000000000 199.53              9qcp    grafanapi          223.627             191.327             4                4                   0                   0           13.916
1586155683000000000 112.548             dpjn    grafanapi          112.679             112.481             4                4                   0                   0           0.249
1586155683000000000 20.428              gcpu    grafanapi          20.65               20.278              4                4                   0                   0           0.145
1586155683000000000 261.581             wecp    grafanapi          262.124             261.21              4                4                   0                   0           0.335

Now it’s time to get the Grafana Worldmap panel installed.

grafana-cli plugins install grafana-worldmap-panel
sudo service grafana-server restart

A simple Grafana dashboard to start with can be found in the Git repo, but here are the steps to create it on your own. Just add a new panel and open the (Add) Query configuration.

Toggle to Text edit mode and enter the query.

SELECT percent_packet_loss AS "metric" FROM "ping" WHERE ("geohash" != '') AND time > now() - 2m GROUP BY "geohash", "url" LIMIT 1

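It selects a percent_packet_loss value from the last two minutes for each ‘ping’ series with a geohash tag and renames it to ‘metric’. InfluxQL returns points in ascending time order by default, so add ORDER BY time DESC before LIMIT 1 if you strictly need the newest point. The table-formatted result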

tags: geohash=6gyc,

feeds into the Worldmap plugin, which can be selected under the Visualization Tab. The Map Visual Options are pretty self-explanatory, only the Map Data Options need some attention. We expect the Location Data in table format with the following Field Mapping:

Give the panel a meaningful title under the ‘General’ Tab and there you go:

AWS Regions – Packet loss


If the playbook runs frequently as a cron job (every hour, for instance), you now have a fully automated World Map Dashboard reflecting all the hosts in your Ansible inventory that carry an additional geohash value.
