Thursday, May 3, 2007

Cacti Performance Tuning

When using the Thold plug-in under Cacti, the three background pollers are:

1. poller.php
2. cactid
3. thold

Poller.php is the master process, run by cron. It kicks off cactid to do the actual data collection, then creates the graphs and calls functions in the Thold plug-in so that the plug-in can update its data set.

II. Speed Bottlenecks

When using Cactid and the Thold plug-in, the dominant bottlenecks come from how they use MySQL.

I. Cactid

Cactid is performance-limited to some extent by a lack of internal caching. However, this can be partially ameliorated by increasing the value of MAX_MYSQL_BUF_SIZE in cactid's util.h and recompiling. This value sets the size of a buffer of MySQL updates that are kept in RAM before being sent out as a block write. The larger this number, the more writes are cached internally and handed to MySQL as a single block, which is more efficient than making many individual calls.
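To make the buffering idea concrete, here is a minimal sketch in Python rather than cactid's C. It is not cactid's code: the BufferedWriter class, the table layout, and the use of a row count in place of a buffer size are all assumptions for illustration, and sqlite3 stands in for MySQL so the example runs on its own.

```python
import sqlite3


class BufferedWriter:
    """Collects rows in memory and writes them in one batch when full."""

    def __init__(self, conn, max_buffered_rows):
        self.conn = conn
        # Hypothetical stand-in for MAX_MYSQL_BUF_SIZE: a row count, not a byte size.
        self.max_buffered_rows = max_buffered_rows
        self.pending = []
        self.flush_count = 0

    def add(self, local_data_id, value):
        # Queue the row; only touch the database when the buffer is full.
        self.pending.append((local_data_id, value))
        if len(self.pending) >= self.max_buffered_rows:
            self.flush()

    def flush(self):
        # Write everything queued so far as one batch.
        if not self.pending:
            return
        self.conn.executemany(
            "INSERT INTO poller_output (local_data_id, output) VALUES (?, ?)",
            self.pending,
        )
        self.conn.commit()
        self.pending.clear()
        self.flush_count += 1


def demo(rows=10, max_buffered_rows=4):
    """Insert `rows` rows and report (rows stored, number of batch writes)."""
    conn = sqlite3.connect(":memory:")
    # Illustrative table layout, not the real Cacti schema.
    conn.execute("CREATE TABLE poller_output (local_data_id INTEGER, output TEXT)")
    writer = BufferedWriter(conn, max_buffered_rows)
    for i in range(rows):
        writer.add(i, str(i * 1.5))
    writer.flush()  # write whatever is left in the buffer
    stored = conn.execute("SELECT COUNT(*) FROM poller_output").fetchone()[0]
    return stored, writer.flush_count
```

With a buffer of four rows, ten inserts reach the database as three batch writes instead of ten separate ones. A larger buffer means fewer round trips, at the cost of more RAM held per poller.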

More could be done to improve Cactid's MySQL performance, but this would require more in-memory caching and database connections that are held open globally.

The following should also be noted:

The dynamics of threading (configurable through the Cacti Settings/Poller UI)

If you increase the number of threads in Cactid, it will run significantly faster, but the impact on MySQL will be commensurately greater. You need to balance the load on MySQL carefully in order to keep the rest of the server usable.

II. Poller.php

Poller.php's dominant bottleneck is that it is written in PHP. If you watch the server while it is working through cactid's result set, PHP becomes the top CPU consumer.

III. Thold Plug-in

The easiest way to make the Thold plug-in faster is to allow a greater number of concurrent poller processes. The poller calls the plug-ins and blocks until each plug-in returns. The more concurrent pollers you have, the more concurrent plug-in instances you have; it is a one-to-one correspondence.

Thold is very inefficient internally. It performs no internal caching and instead works through its datastore one host at a time, one threshold at a time, which requires a large number of discrete MySQL calls. In contrast, poller.php pulls down a full set of data sources in a single MySQL call, caches them in an internal array, and then performs its operations from that array.
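The difference between the two access patterns can be sketched as follows. This is an illustration in Python, not Thold's or poller.php's actual code: the thresholds and readings tables, their columns, and both function names are hypothetical, and sqlite3 stands in for MySQL so the example runs on its own.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database with made-up thresholds and readings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE thresholds (id INTEGER PRIMARY KEY, host_id INTEGER, hi REAL)")
    conn.execute("CREATE TABLE readings (threshold_id INTEGER, value REAL)")
    conn.executemany(
        "INSERT INTO thresholds VALUES (?, ?, ?)",
        [(1, 1, 80.0), (2, 1, 50.0), (3, 2, 90.0)],
    )
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        [(1, 85.0), (2, 10.0), (3, 95.0)],
    )
    return conn


def breached_one_at_a_time(conn):
    """One threshold at a time: two extra queries for every threshold."""
    queries = 1
    ids = [row[0] for row in conn.execute("SELECT id FROM thresholds ORDER BY id")]
    breached = []
    for tid in ids:
        hi = conn.execute(
            "SELECT hi FROM thresholds WHERE id = ?", (tid,)
        ).fetchone()[0]
        value = conn.execute(
            "SELECT value FROM readings WHERE threshold_id = ?", (tid,)
        ).fetchone()[0]
        queries += 2
        if value > hi:
            breached.append(tid)
    return breached, queries


def breached_bulk(conn):
    """Fetch everything in one query, then work from the in-memory list."""
    rows = conn.execute(
        "SELECT t.id, t.hi, r.value FROM thresholds t "
        "JOIN readings r ON r.threshold_id = t.id ORDER BY t.id"
    ).fetchall()
    breached = [tid for tid, hi, value in rows if value > hi]
    return breached, 1
```

Both functions return the same list of breached thresholds, but the first issues a number of queries that grows with the number of thresholds, while the second always issues one. Against a networked MySQL server, each of those extra queries also pays a round trip.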

In the longer term, making Thold more scalable would require more internal caching within the plug-in's code. It is not performing a complex task; it is simply doing that task quite slowly.

Ameliorating Chaos through Structure

Working on the edge of open source development efforts can be a hair-raising experience. You are so often plugging into code that is on the bleeding edge, committed to SVN an hour ago with a bunch of cool, new, and completely undocumented features. A person becomes an expert in the art of integration -- integrating multiple chaotic systems with meaningful structure to create a working unit. It is akin to a meditation exercise, where you practice within a formal structure, allow chaos to exist, and embrace its perfect nowness while maintaining strong form.

That strong form, in terms of software architecture, must be present in doing edge-work, such as mining from the here-and-now beta development efforts popping up on the internet. Many of these efforts have tremendous worth, and so you can't just pass them up!

Logrotate.d and compressoptions

logrotate has a convenient directive, "compressoptions", which you can use in the configuration files under logrotate.d to pass options to gzip (or to another compressor if you are using something different).

One handy use of this is to give your rotated log files names that include the date, which is convenient both for appearance and for running scripts and reports against them.

You can do this (with gzip as your compressor) as follows:

compressoptions -S .`date -I`.gz

This causes logrotate to pass the -S (suffix) option to gzip, so that the compressed file's suffix includes the date in ISO 8601 format.
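As a rough illustration of the resulting file names, the sketch below builds the name gzip would produce for a given log file and date. It assumes that the backquoted date is expanded as described above and that gzip appends the suffix to the input file name. The rotated_name helper and the file name messages.1 are hypothetical.

```python
import datetime


def rotated_name(log_name, day):
    """Name gzip would give log_name when run with -S .<ISO date>.gz."""
    # date.isoformat() yields YYYY-MM-DD, the same form that `date -I` prints.
    suffix = "." + day.isoformat() + ".gz"
    return log_name + suffix


example = rotated_name("messages.1", datetime.date(2007, 5, 3))
print(example)  # messages.1.2007-05-03.gz
```

Because the date sorts lexically in this format, a script can pick out a range of days with a simple string comparison on the file name.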