Zabbix 4 Network Monitoring

Active items

Passive Zabbix items are fine if you can connect to all of the monitored hosts from the Zabbix server, but what if you can't allow incoming connections to the monitored hosts because of security or network topology reasons?

This is where active items come into play. As opposed to passive items, for active items, it's the agent that connects to the server; the server never connects to the agent. When connecting, the agent downloads a list of items to check and then reports the new data to the server periodically. Let's create an active item, but this time, we'll try to use some help when selecting the item key:

  1. Go to Configuration | Hosts
  2. Click on Items next to Another host
  3. Click on Create item

For now, use these values:

  • Name: Incoming traffic on interface $1
  • Type: Zabbix agent (active)
  • Update interval: 60s
  • History storage period: 7d

We'll do something different with the Key field this time.

Click on the Select button and, in the upcoming dialog that we saw before, click on net.if.in[if,<mode>]. This will fill in the chosen string, as follows:

Replace the content in the square brackets with the name of your network card, so that the field contents read net.if.in[enp0s3]. When you're done, click on the Add button at the bottom. Never leave placeholders such as <mode>—they will be interpreted as literal values and the item will not work as intended.

If your system has a different network interface name, use that here instead. You can find out the interface names with the ifconfig or ip addr show commands. In many modern distributions, the standard ethX naming scheme has been replaced by one that results in interface names such as enp0s3 and em1. Wherever eth0 appears in the examples that follow, substitute the correct interface name.
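As a small illustration of the two points above, the following Python sketch lists the interface names on the machine it runs on, builds the matching net.if.in key for each, and flags keys that still contain a placeholder such as <mode>. The function names are made up for this example and are not part of Zabbix; the script would have to be run on the monitored host itself.

```python
import re
import socket
from typing import List

# Matches documentation-style placeholders such as <mode> left in a key.
PLACEHOLDER = re.compile(r"<[^<>]+>")


def has_placeholder(key: str) -> bool:
    """Return True if the item key still contains a <placeholder>."""
    return PLACEHOLDER.search(key) is not None


def incoming_traffic_key(interface: str) -> str:
    """Build the net.if.in key for one interface name."""
    return f"net.if.in[{interface}]"


def local_interfaces() -> List[str]:
    """List interface names known to the local system (not a remote host)."""
    return [name for _index, name in socket.if_nameindex()]


if __name__ == "__main__":
    for name in local_interfaces():
        print(incoming_traffic_key(name))
```

A key copied straight from the selection dialog, such as net.if.in[if,<mode>], would be flagged by has_placeholder, while net.if.in[enp0s3] would not.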

Go to Monitoring | Latest data and check whether new values have arrived.

Well, it doesn't look like they have. You could wait a bit to be completely sure, but most likely, no data will appear for this new active item, which means we're in for another troubleshooting session.

First, we should test basic network connectivity. Remember, active agents connect to the server, so we have to know which port they use (by default, it's port 10051). So, let's start by testing whether the remotely monitored machine can connect to the Zabbix server:

$ telnet <Zabbix server IP or DNS name> 10051

This should produce output similar to the following:

Trying <Zabbix server IP>...
Connected to <Zabbix server IP or DNS name>.
Escape character is '^]'.

Press Ctrl + ] and enter quit in the resulting prompt:

telnet> quit
Connection closed.

Such a sequence indicates that the network connection is working properly. If it isn't, verify possible network configuration issues, including network firewalls and the local firewall on the Zabbix server. Make sure to allow incoming connections on port 10051:

# To check your local firewall rules, run:
# For iptables
$ iptables -S

# For firewalld
$ firewall-cmd --list-all

Both agent and server ports for Zabbix are registered with the Internet Assigned Numbers Authority (IANA).
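If telnet is not installed on the monitored machine, the same connectivity test can be approximated with a few lines of Python. This is only a sketch: can_connect is a name chosen for this example, and it checks nothing more than whether a TCP connection to the given port can be opened.

```python
import socket


def can_connect(host: str, port: int = 10051, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

Calling can_connect with the IP address or DNS name of your Zabbix server should return True when the network path and firewalls allow the connection.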

So, there might be something wrong with the agent; let's take a closer look. We could try to look at the agent daemon's log file, so find the LogFile configuration parameter. If you're using the default configuration files from the source archive, it should be set to log to /tmp/zabbix_agentd.log. If you installed from packages, it is likely to be in /var/log/zabbix or similar. Open this log file and look for any interesting messages regarding active checks. Each line is prefixed with the PID and a timestamp in the format PID:YYYYMMDD:HHMMSS.mmm. You'll probably see lines similar to these:

15794:20141230:153731.992 active check configuration update from [127.0.0.1:10051] started to fail (cannot connect to [[127.0.0.1]:10051]: [111] Connection refused)

The agent is trying to request the active check list, but the connection fails. The target address looks wrong: our Zabbix server should be on a different system, not on the localhost. Let's see how we can fix this. On the remote machine, open the zabbix_agentd.conf configuration file and check the ServerActive parameter. (This file can probably be located under /etc/zabbix/.) The default configuration file will have a line like this:

ServerActive=127.0.0.1

This parameter tells the agent where it should connect to for active items. In our case, the localhost will not work as the Zabbix server is on a remote machine, so we should modify this. Replace 127.0.0.1 with the IP address or DNS name of the Zabbix server, and then restart the agent either using a systemd script or the manual method: killall.

While you have the configuration file open, take a look at another parameter there: StartAgents. This parameter controls how many processes are handling incoming connections for passive items. If set to 0, it will prevent the agent from listening for incoming connections from the server. This enables you to customize agents to support either or both of the methods. Disabling passive items can be better from a security perspective, but they are very handy for testing and debugging various problems. Also, some items will only work as passive items. Active items can be disabled by not specifying (commenting out) ServerActive. Disabling both active and passive items won't work; the agent daemon will complain and refuse to start up, and it's correct to do so, as starting with both disabled would be pointless. Take a look:

zabbix-agentd [16208]: ERROR: either active or passive checks must be enabled
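The rule behind this error can be sketched in Python as a quick sanity check on a configuration file before restarting the agent. The parser below is a simplified assumption about the file format (plain key=value lines with # comments), and it assumes that StartAgents defaults to 3 when unset; the function names are invented for this example.

```python
from typing import Dict, Tuple


def parse_agent_config(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blank lines and # comments."""
    params: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def check_modes(params: Dict[str, str]) -> Tuple[bool, bool]:
    """Return (active_enabled, passive_enabled) for the parsed parameters."""
    # Active checks need a non-empty ServerActive value.
    active = bool(params.get("ServerActive"))
    # Assumption: StartAgents defaults to 3 when it is not set.
    passive = int(params.get("StartAgents", "3")) > 0
    return active, passive
```

If check_modes returns (False, False), the configuration matches the situation in which the agent refuses to start.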

We could wait for values to appear on the frontend again, but again, they would not. Let's return to the agent daemon log file and see whether there is any hint about what's wrong:

15938:20141230:154544.559 no active checks on server [192.168.1.3:10051]: host [Zabbix server] not monitored 

If we carefully read the entry, we will notice that the agent is reporting its hostname as Zabbix server, but that is the hostname of the default host, which we decided not to use and left disabled. The log message agrees: it says that the host is not monitored.

If we look at the startup messages, there's even another line mentioning this:

15931:20141230:154544.552 Starting Zabbix Agent [Zabbix server]. Zabbix 4.0.0 (revision 85308)  
You might or might not see the SVN revision in this message depending on how the agent was compiled. If it's missing, don't worry about it as it does not affect the ability of the agent to operate.
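Since every log line seen so far starts with the same PID and timestamp prefix, it is easy to split lines into fields when searching a large log. The following Python sketch does that, assuming the prefix has the form PID:YYYYMMDD:HHMMSS.mmm as in the samples; LogEntry and parse_log_line are names made up for this illustration.

```python
import re
from datetime import datetime, timedelta
from typing import NamedTuple, Optional

# PID:YYYYMMDD:HHMMSS.mmm followed by the message text.
LOG_LINE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\.(\d{3})\s+(.*?)\s*$")


class LogEntry(NamedTuple):
    pid: int
    timestamp: datetime
    message: str


def parse_log_line(line: str) -> Optional[LogEntry]:
    """Split one agent log line into PID, timestamp, and message."""
    match = LOG_LINE.match(line)
    if match is None:
        return None  # Not in the expected format.
    pid, date, time, millis, message = match.groups()
    stamp = datetime.strptime(date + time, "%Y%m%d%H%M%S")
    stamp += timedelta(milliseconds=int(millis))
    return LogEntry(int(pid), stamp, message)
```

Filtering the parsed entries for messages that contain "active check" is one way to find the relevant lines quickly.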

As that is not the hostname we want to use, let's check the agent daemon configuration file again. There's a parameter named Hostname, which currently reads Zabbix server. Given that the comment for this parameter says Required for active checks and must match hostname as configured on the server, it has to be what we're after. Change the agent configuration parameter to Another host, save and close the configuration file, and then restart the Zabbix agent daemon. Check for new entries in the zabbix_agentd.log file; there should be no more errors.

While we're at it, let's update the agent configuration on A test host as well. Modify zabbix_agentd.conf, set Hostname=A test host, and restart the agent daemon.

If there still are errors about the host not being found on the server, double-check that the hostname in the Zabbix frontend host properties and agent daemon configuration file (the one we just changed) match.

This hostname is case sensitive.
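To see why the hostname has to match exactly, it helps to look at what the agent sends when it asks for its item list. The Python sketch below builds such a request as the author of this example understands the Zabbix protocol: a ZBXD header, a flags byte, an 8-byte little-endian length, and a JSON body that carries the configured hostname. Treat the layout as an assumption to verify against the protocol documentation; the function names are invented here.

```python
import json
import struct

HEADER_FORMAT = "<4sBQ"  # "ZBXD", flags byte, 8-byte little-endian length
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def build_active_checks_request(hostname: str) -> bytes:
    """Build the packet an agent would send to request its active checks."""
    body = json.dumps({"request": "active checks", "host": hostname})
    payload = body.encode("utf-8")
    return struct.pack(HEADER_FORMAT, b"ZBXD", 1, len(payload)) + payload


def parse_packet(packet: bytes) -> dict:
    """Decode a packet built by build_active_checks_request."""
    magic, _flags, length = struct.unpack(HEADER_FORMAT, packet[:HEADER_SIZE])
    if magic != b"ZBXD":
        raise ValueError("not a Zabbix packet")
    return json.loads(packet[HEADER_SIZE:HEADER_SIZE + length].decode("utf-8"))
```

The server looks up the host by the string in the host field, so a request carrying another host would not match a host configured as Another host.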

It's now time to return to the frontend and see whether data has started flowing in at the Monitoring | Latest data section:

Notice how the system in this screenshot actually has an interface named enp0s3, not eth0. We will find out how to allow Zabbix to worry about interface names and discover them automatically in Chapter 11, Automating Configuration.

If you see no data and the item shows up unsupported in the configuration section, check the network interface name.

Great, data is indeed flowing, but the values look really weird. If you wait for a while, you'll see how the number in the Last Value column just keeps on increasing. So, what is it? Well, network traffic keys gather data from interface counters, that is, the network interface adds up all traffic, and this total data is fed into the Zabbix database. This has one great advantage—even when data is polled at large intervals, traffic spikes will not go unnoticed as the counter data is present, but it also makes data pretty much unreadable for us, and graphs would also look like an ever-growing line (if you feel like it, click on the Graph link for this item). We could even call them hill graphs:

Luckily, Zabbix provides a built-in capability to deal with data counters like this:

  1. Go to Configuration | Hosts
  2. Click on Items next to Another host
  3. Click on Incoming traffic on interface eth0 in the Name column
  4. Go to the Preprocessing tab and change the Preprocessing steps to Changes per second
  5. Click on Update:

We will have to wait a bit for the changes to take effect, so now is a good moment to discuss our choice for the Type of information option for this item. We set it to Numeric (unsigned), which accepts integers. The values that this item originally receives are indeed integers: they are counter values denoting how many bytes have been received on this interface. The Changes per second preprocessing step (Delta speed per second in previous versions), though, will almost always produce a decimal part, as it divides the difference between two values by the number of seconds that passed between them. In cases where Zabbix has a decimal number and has to store it in an integer field, the behavior will differ depending on how it got that decimal value, as follows:

  • If the decimal value arrived from a Zabbix agent source such as a system.cpu.load item, the item will turn up unsupported
  • If Zabbix received an integer but further calculations resulted in a decimal number appearing, like with our network item, the decimal part will be discarded

This behavior is depicted in the following diagram:

Why is there a difference like this, and why did we leave this item as an integer if doing so results in a loss of precision? Decimal values in the Zabbix database schema have a smaller number of significant digits available before the decimal point than integer values. On a loaded high-speed interface, we might overflow that limit, and it would result in values being lost completely. It is usually better to lose a tiny bit of precision (the decimal part) than the whole value. Note that precision is lost on the smallest unit: a byte or bit. Even if Zabbix shows 5 Gbps in the frontend, the decimal part will be truncated from this value in bits; hence, this loss of precision should be really, really insignificant. It is suggested to use integers for items that have a risk like this, at least until database schema limits are increased.
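The calculation and the truncation can be illustrated with a short Python sketch. It is a simplified model rather than the server's actual code: change_per_second is a made-up name, timestamps are whole seconds, and returning None when the counter goes backwards (for example, after a reboot) is an assumption about how such samples are best handled.

```python
from typing import Optional


def change_per_second(prev_value: int, prev_clock: int,
                      value: int, clock: int) -> Optional[int]:
    """Per-second change between two counter samples, stored as an integer."""
    elapsed = clock - prev_clock
    if elapsed <= 0 or value < prev_value:
        # Assumption: skip samples that cannot yield a meaningful rate.
        return None
    # Integer division drops the decimal part, like an unsigned item would.
    return (value - prev_value) // elapsed
```

For two samples of 1000 and 2000 bytes taken 60 seconds apart, the exact rate is about 16.67 bytes per second, and the stored integer value would be 16.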

Check out Monitoring | Latest data again. You will see that the number in the Change column is negative, as we are now calculating a change per second instead of storing an ever-increasing value, so the newly received value will probably be lower than the previous one.

Keep in mind that, in the worst case scenario, configuration changes might take up to three minutes to propagate to the Zabbix agent: one minute to get into the server configuration cache and two minutes until the agent refreshes its own item list. On top of this delay, this item is different from the others we created: it needs to gather two values to compute the per-second value we are interested in; hence, we will also have to wait for whatever the item interval is before the first value appears in the frontend.
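The worst-case wait described here is simple addition, sketched below in Python. The default arguments are the figures from the paragraph above plus the 60-second update interval chosen for this item; the function name is invented for the example.

```python
def worst_case_first_value_delay(cache_update: int = 60,
                                 refresh_active_checks: int = 120,
                                 item_interval: int = 60) -> int:
    """Upper bound, in seconds, before the first per-second value shows up."""
    # Server cache reload + agent item list refresh + one extra sample.
    return cache_update + refresh_active_checks + item_interval
```

With the defaults, the result is 240 seconds, that is, the three minutes of propagation plus one item interval.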

That's better; Zabbix now automatically calculates the change between every two checks (that's what the delta is for) and stores it, but the values still don't seem to be too user friendly. Maybe they're better in the graph; let's click on the Graph link to find out:

Ouch. While we can clearly see the effect our change had, it has also left us with very ugly historical data. The scale of the Y-axis is dictated by the old total counter values (which show the total since the monitored system was started up), so the correct per-second (delta) data is squashed flat along the X-axis. You can also take a look at the values numerically: go to the drop-down menu in the upper-right portion, which currently reads Graph, and choose 500 latest values. You'll get the following screenshot:

In this list, we can nicely see the change in data representation as well as the exact time when the change was performed. But those huge values have come from the counter data, and they pollute our nice, clean graph by being so far out of scale, so we have to get rid of them:

  1. Go to Configuration | Hosts.
  2. Click on Items next to Another host.
  3. Mark the checkbox next to the Incoming traffic on interface enp0s3 (or whatever interface you have) item, and look at the buttons positioned at the bottom of the item list:

The fourth button from the left, named Clear history, probably does what we want. Notice the 1 selected text to the left of the activity buttons; it shows the number of entries selected, so we always know how many elements we are operating on. Click on the Clear history button. You should get a JavaScript popup asking for confirmation to continue. While history cleaning can take a long time with large datasets, in our case, it should be nearly instant, so click on the OK button to continue. This should get rid of all history values for this item, including the huge ones.

Still, looking at the Y axis in that graph, we see the incoming values being represented as a number without any explanation of what it is, and larger values get K, M, and other multiplier identifiers applied. It would be so much better if Zabbix knew how to calculate it in bytes or a similar unit:

  1. Navigate to Configuration | Hosts.
  2. Click on Items next to Another host.
  3. Click on Incoming traffic on interface enp0s3 (or whatever your interface is) in the Name column. Edit the Units field and enter Bps.
  4. Click on Update.

Let's check whether there's any improvement in the Monitoring | Latest data:

Wonderful; data is still arriving. Even better, notice how Zabbix now automatically calculates KB, MB, and so on where appropriate. Well, it would in our example host if there were more traffic. Let's look at the network traffic; click on Graph:

Take a look at the Y-axis: if you have more traffic, units will be calculated there as well to make the graph readable, and unit calculations are retroactively applied to the previously gathered values.

Units do not affect stored data the way the Changes per second preprocessing step did, so we do not have to clear the previous values this time.
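Because units only change how a stored number is displayed, the conversion can be pictured as a pure formatting step. The Python sketch below is an approximation, not the frontend's code: it assumes a base of 1024 for the B and Bps units and 1000 for everything else, and the two-decimal output format is chosen for this example only.

```python
def format_with_units(value: float, units: str) -> str:
    """Scale a raw value and attach a K/M/G/T prefix to its unit."""
    # Assumption: byte-based units use 1024, all other units use 1000.
    base = 1024 if units in ("B", "Bps") else 1000
    prefixes = ["", "K", "M", "G", "T"]
    index = 0
    scaled = float(value)
    while abs(scaled) >= base and index < len(prefixes) - 1:
        scaled /= base
        index += 1
    return f"{scaled:.2f} {prefixes[index]}{units}"
```

The stored value stays the same; only the string returned for display changes, which is why older values pick up the new units as well.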

One parameter that we set, the update interval, could have been smaller, hence resulting in a better-looking graph. But it is important to remember that the smaller the intervals you have on your items, the more data Zabbix has to retrieve, the more data has to be inserted into the database each second, and the more calculations have to be performed when displaying this data. While it would have made no notable difference on our test system, you should try to keep intervals as large as possible.
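The effect of the update interval on load can be estimated with a little arithmetic, sketched here in Python. The function names are made up for this illustration, and the estimate ignores everything except how often each item produces a value.

```python
from typing import List


def new_values_per_second(intervals_seconds: List[int]) -> float:
    """Estimate how many values per second a set of items produces."""
    return sum(1 / interval for interval in intervals_seconds)


def values_per_day(interval_seconds: int) -> int:
    """Number of values a single item stores in 24 hours."""
    return 86400 // interval_seconds
```

For example, 600 items polled every 60 seconds produce about 10 new values per second, and a single item with a 60-second interval stores 1440 values per day.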

So far, we have created items that gathered numeric data, either integers or decimal values. Let's create another one, a bit different this time:

  1. As usual, go to Configuration | Hosts.
  2. Click on Items next to Another host. Before continuing with item creation, let's look at what helpful things are available in the configuration section, particularly for items. If we look above the item list, we can see the navigation and information bar.

This area provides quick and useful information about the currently selected host—the hostname, whether the host is monitored, and its availability. Even more importantly, on the right-hand side, it provides quick shortcuts back to the host list and other elements associated with the current host—applications, items, triggers, graphs, discovery rules, and web scenarios. This is a handy way to switch between element categories for a single host without going through the host list all the time. But that's not all yet.

  1. Click on the Filter button to open the filter we got thrown in our face before. The sophisticated filter appears again:

Using this filter, we can make complex rules about what items to display. Looking at the top-left corner of the filter, we can see that we are not limited to viewing items from a single host; we can also choose a Host group. When we need to, we can make filter choices and click on the Filter link underneath. Currently, it has only one condition—the Host field contains Another host, so the Items link from the host list we used was the one that set this filter:

  1. Clear out the Host field
  2. Choose Linux servers from the Host group field
  3. Click on the Apply button below the filter
Host information and the quick link bar is only available when items are filtered for a single host.

Now, look right below the main item filter; that is a Subfilter, which, as its header informs, only affects data already filtered by the main filter.

The entries in the subfilter work like toggles; if we switch one on, it works as a filter on the data in addition to all other toggled subfilter controls. Let's click on Zabbix agent (active) now. Notice how the item list now contains only one item; this is what the number 1 represented next to this Subfilter toggle. But the subfilter itself now also looks different:

The option we enabled, Zabbix agent (active), has been highlighted. Numeric (float), on the other hand, is grayed out and disabled, as activating this toggle in addition to already active ones results in no items being displayed at all. While the Numeric (unsigned) toggle still has 1 listed next to it, which shows that enabling it will result in that many items being displayed, the Zabbix agent toggle instead has +3 next to it. This form represents the fact that activating this toggle will display three more items than are currently being displayed, and it is used for toggles in the same category. Currently, the subfilter has five entries, as it only shows existing values. Once we have additional and different items configured, this subfilter will expand. We have finished exploring these filters, so choose Another host from the Host field, click on the Filter button under the filter, and click on Create item.
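The numbers next to the subfilter toggles can be modeled with a short Python sketch. This is a hypothetical reconstruction of the counting rule described above, not the frontend's code: the four sample items are made up, toggles in the same category are combined with OR, and different categories are combined with AND.

```python
from typing import Dict, List, Set

ITEMS: List[Dict[str, str]] = [  # Made-up sample data for the illustration.
    {"type": "Zabbix agent", "value_type": "Numeric (float)"},
    {"type": "Zabbix agent", "value_type": "Numeric (unsigned)"},
    {"type": "Zabbix agent", "value_type": "Numeric (unsigned)"},
    {"type": "Zabbix agent (active)", "value_type": "Numeric (unsigned)"},
]


def visible(items: List[Dict[str, str]],
            selected: Dict[str, Set[str]]) -> List[Dict[str, str]]:
    """Items that match every category with at least one active toggle."""
    return [
        item for item in items
        if all(item[cat] in values for cat, values in selected.items() if values)
    ]


def toggle_label(items: List[Dict[str, str]], selected: Dict[str, Set[str]],
                 category: str, value: str) -> str:
    """Number shown next to a toggle: a plain count or a +N increment."""
    current = selected.get(category, set())
    trial = {cat: set(values) for cat, values in selected.items()}
    trial.setdefault(category, set()).add(value)
    shown_now = len(visible(items, selected))
    shown_after = len(visible(items, trial))
    if current and value not in current:
        # Same category as an active toggle: show how many items are added.
        return f"+{shown_after - shown_now}"
    return str(shown_after)  # "0" corresponds to a grayed-out toggle.
```

With only Zabbix agent (active) toggled on, this model yields 0 for Numeric (float), 1 for Numeric (unsigned), and +3 for Zabbix agent, matching the behavior described in the text.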

When you have many different hosts monitored by Zabbix, it's quite easy to forget which version of the Zabbix agent daemon each host has, and even if you have automated software deploying in place, it is nice to be able to see which version each host is at, all in one place.

Use the following values:

  • Name: Enter Zabbix agent version
  • Type: Select Zabbix agent (active) (we're still creating active items)
  • Key: Click on Select and then choose the third entry from the list—agent.version
  • Type of information: Choose Character
  • Update interval: Enter 86400s

When done, click on the Add button. There are two notable things we did. Firstly, we set the information type to Character, which reloaded the form, slightly changing available options. Most notably, fields that are relevant for numeric information were hidden, such as units, multiplier, and trends.

Secondly, we entered a very large update interval, 86400, which is equivalent to 24 hours. While this might seem excessive, remember what we will be monitoring here, the Zabbix agent version, so it probably (hopefully) won't be changing several times per day. Depending on your needs, you might set it to even larger values, such as a week.
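Intervals such as 60s, 7d, and 86400s all boil down to a number of seconds. The Python sketch below converts values with the s, m, h, d, and w suffixes; it is a simplified illustration that does not cover every interval syntax Zabbix accepts, and interval_to_seconds is a name invented for this example.

```python
import re

_SUFFIX_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}
_INTERVAL = re.compile(r"^(\d+)([smhdw]?)$")


def interval_to_seconds(text: str) -> int:
    """Convert an interval such as 60s, 1d, or 86400 to seconds."""
    match = _INTERVAL.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported interval: {text!r}")
    number, suffix = match.groups()
    # A bare number without a suffix is taken as seconds.
    return int(number) * _SUFFIX_SECONDS.get(suffix, 1)
```

Under these rules, 86400s and 1d describe the same 24-hour interval, and a weekly check could be written as 1w.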

To check out the results of our work, go to Monitoring | Latest data.

If you don't see the data, wait a while; it should appear eventually. When it does, you should see the version of the Zabbix agent installed on the listed remote machine, and it might be a higher number than displayed here, as newer versions of Zabbix have been released. Notice one minor difference—while all the items we added previously have links named Graph on the right-hand side, the last one has one called History. The reason is simple—for textual items, graphs can't be drawn, so Zabbix does not even attempt to do that.

Now, about that waiting: why did we have to wait for the data to appear? Well, remember how active items work? The agent queries the server for the item list it should report on and then sends in data periodically, but this checking of the item list is also done periodically. To find out how often, open the zabbix_agentd.conf configuration file on the remote machine and look for the RefreshActiveChecks parameter. The default is two minutes; the value is configured in seconds, so it is listed as 120.

So, in the worst case, you might have had to wait for nearly three minutes to see any data as opposed to normal or passive items, where the server would have queried the agent as soon as the configuration change was available in its cache. In a production environment with many agents using active items, it might be a good idea to increase this value. Usually, item parameters aren't changed that often.