Chapter 3. Configuration

Table of Contents

Basics/Definitions
Nodes
Tasks
Vitals
Queries
Notifications
Walkthrough
Global/Main
Modules
Nodes
Tasks
Notifications
Disk
RRD
Soft States

Basics/Definitions

Configuring State means writing an XML configuration file that tells State what its jobs are and which properties it should use when getting things going. To do this, you'll need some definitions and some commentary on how things are organized; then we'll get to the actual configuration file and its internals. Note that some example template configuration files are provided, and these should give you a good foundation to get going.

Nodes

Node is just a general term for machines that have IP addresses and live on the network. Printers, routers, computers, and special-purpose servers are all examples of nodes. State needs to know about a node if you plan to monitor anything related to it. It is safe to think of nodes as appliances on the network: if it's got an IP, then it's a node. It is important to realize that nodes are not just computers, although they almost always are. I think I've bashed that point to death, so we move on.

To simplify other areas of configuration that use nodes, nodes can be grouped into Node Groups. Each node group has a unique name that is used to refer to it in other areas of the system. How nodes are grouped is up to you. Some possible groups in an academic installation would be:

  • Servers

  • Administrator Machines

  • User/Lab Machines

  • Research Machines

  • Printers

  • Routers

  • Room/Location

Nodes can belong to more than one node group. It is a good idea to keep a set of node groups for nodes that share a common set of services that need to be checked. Creating good node groups will come up a few more times in the coming sections, and after some experience checking nodes you'll gain a better understanding of how best to group them. Better-organized node groups mean that your tasks can be simplified and combined.
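To make the grouping idea concrete, here is a minimal Python sketch of overlapping node groups. It is not State's configuration format or internal data structure; the group names, the IP addresses, and the helper functions groups_for and nodes_in are all made up for illustration. Nodes are keyed by IP, and a node that appears in several groups is only returned once when groups are combined.

```python
# Hypothetical node groups: group name -> set of node IPs.
# None of these names or addresses come from a real State installation.
node_groups = {
    "Servers": {"192.168.0.25", "192.168.0.223"},
    "Printers": {"192.168.0.40"},
    "Room 101": {"192.168.0.40", "192.168.0.223"},
}


def groups_for(ip, groups):
    """Return the sorted names of every group that contains the node."""
    return sorted(name for name, members in groups.items() if ip in members)


def nodes_in(group_names, groups):
    """Return the union of the named groups, listing each node once."""
    nodes = set()
    for name in group_names:
        nodes |= groups[name]
    return sorted(nodes)


if __name__ == "__main__":
    print(groups_for("192.168.0.223", node_groups))
    print(nodes_in(["Servers", "Room 101"], node_groups))
```

The second helper is the interesting one: anything that operates on several groups at once wants the combined set of nodes, not one pass per group.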

Note

Internally, State always uses IPs to refer to nodes. The resolved names of machines are mostly stored for display purposes. If State is referring to a node, it's always going to use the IP. Now, this doesn't mean the names aren't or won't be used. For example, some day in the near future I intend to write a mod_dns which will allow administrators to ensure that names are resolving to the proper IPs and such.

Tasks

Having your nodes grouped into useful sets gives you the chance to operate on them: checking their status and gathering information. All of this is done with Tasks. A task is just that, a job that State carries out at a specified interval. Tasks can be very complicated, and it is with tasks that much of the setup and configuration of State lies.

A task is always associated with at least one node. Tasks can also be associated with multiple node groups, hence the value of building node groups out of similar nodes. The nodes associated with a task are the nodes that the task operates on.

Tasks are executed by State at defined intervals. They are also scheduled internally so that the load on the monitoring server is minimal, since many tasks will operate at similar intervals. Inside each task is a collection of Queries, which are executed to perform the bulk of the processing related to the task. Tasks can also have properties/attributes that are used by extension modules for things not directly related to executing the tasks. These are explained in the extension module documentation and in the section that walks through the configuration file.
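Why does scheduling matter when many tasks share an interval? If they all fire at the same moment, the monitoring server sees a spike once per interval and sits idle the rest of the time. The sketch below shows one simple way to avoid that, by spreading first-run times evenly across the interval. This is only an illustration of the idea, not State's actual scheduler; the function names and the task names are hypothetical.

```python
def staggered_offsets(task_names, interval):
    """Spread tasks that share one interval evenly across that interval.

    Returns a dict mapping each task name to its first-run offset in seconds.
    """
    count = len(task_names)
    return {name: i * interval / count for i, name in enumerate(task_names)}


def run_times(offset, interval, horizon):
    """List one task's run times from its offset up to (not including) horizon."""
    times = []
    t = offset
    while t < horizon:
        times.append(t)
        t += interval
    return times


if __name__ == "__main__":
    # Three hypothetical tasks, all on a 60 second interval.
    offsets = staggered_offsets(["ping", "df", "loadavg"], 60)
    for name, offset in offsets.items():
        print(name, run_times(offset, 60, 180))
```

Each task still runs once per interval; only the moment within the interval differs, so the work arrives in a steady trickle instead of a burst.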

Vitals

State revolves around the concept of Vitals. A vital is the object that changes to reflect changes in the health of the network. Every vital is associated with a node. It's the history and evolution of the vitals that we as system administrators are interested in. Vitals have a number of interesting properties. They have a status which describes how healthy they are, such as NORMAL, ALIVE, or WARNING. Each one has a simple text message called a note, which describes its status in human terms and gives details we can use to diagnose problems. Vitals also fall into categories or types, which make grouping them easier for certain jobs and purposes. Each vital also has a name, which identifies the vital. Names are unique on a node: no node can have two vitals with the same name. Finally, there are a number of other details that track when and how often the vital is being updated and when the vital last changed.

Every vital has a history, a recorded series of changes in status. Using this information we can determine how often the vital is healthy and how often it is unhealthy. In addition, we can get a glimpse of how things were at a specific instant in time. All we need is a little creative SQL and a little one-on-one with the PostgreSQL database.
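The properties of a vital and the questions you can ask of its history can be sketched in a few lines of Python. This is a rough in-memory model only: State keeps this information in PostgreSQL, and the class, field, and function names here are invented for illustration. Treating NORMAL and ALIVE as the healthy statuses is also an assumption.

```python
from dataclasses import dataclass, field

HEALTHY = {"NORMAL", "ALIVE"}  # assumption: which statuses count as healthy


@dataclass
class Vital:
    node_ip: str
    name: str
    category: str
    status: str = "NORMAL"
    note: str = ""
    history: list = field(default_factory=list)  # (timestamp, status) pairs

    def update(self, timestamp, status, note=""):
        """Store the latest status and note; log history only on a change."""
        if not self.history or self.history[-1][1] != status:
            self.history.append((timestamp, status))
        self.status, self.note = status, note

    def status_at(self, timestamp):
        """Return the status in effect at timestamp, or None if none yet."""
        current = None
        for when, status in self.history:
            if when > timestamp:
                break
            current = status
        return current

    def healthy_fraction(self, end):
        """Fraction of the time from the first record to end spent healthy.

        Assumes end is at or after the last recorded change.
        """
        if not self.history or end <= self.history[0][0]:
            return None
        spans = zip(self.history, self.history[1:] + [(end, None)])
        healthy = sum(stop - start for (start, status), (stop, _) in spans
                      if status in HEALTHY)
        return healthy / (end - self.history[0][0])


vitals = {}  # (node_ip, name) -> Vital, so names stay unique per node


def register(vital):
    """Add a vital, refusing a second vital with the same name on a node."""
    key = (vital.node_ip, vital.name)
    if key in vitals:
        raise ValueError("duplicate vital %r on node %s" % (vital.name, vital.node_ip))
    vitals[key] = vital
```

Because only changes are logged, repeated updates with the same status cost nothing in the history, and both "what was the status at time T" and "how healthy has this been" fall out of a single pass over the list.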

Queries

Each Task contains one or more Queries, which are executed to actually gather information. Queries look very much like simple URIs, much like web URLs, only they are much, much more. There are two types of Query URIs: local and remote. Local Query URIs are executed on the local State server; remote Query URIs are executed on remote State instances using HTTP. Here are some simple examples of Query URIs:

/state/kernel/loadavg
state://192.168.0.223:3434/state/network/ping
/state/network/ping
state://192.168.0.25/state/filesys/df
    

Pretty simple, actually. The path portion of the Query URI is the only required part; everything else is either assumed or derived. The path decides which handler is invoked on the State server. Handlers are provided by extension modules. By default, the handler tree, which can be thought of as a simple file system like proc, is completely empty. Only when you begin loading modules into a State instance are valid paths created and handlers installed. You can also write and install your own custom handlers. All State URI paths begin with /state; it is the root of the hierarchy.
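Here is a small Python sketch of how a Query URI breaks apart and how a path might be looked up in a handler tree that starts out empty. It is an illustration under stated assumptions, not State's implementation: the function names, the dictionary-based handler table, and the fake loadavg handler are all hypothetical, and no default port is assumed for remote URIs that omit one.

```python
from urllib.parse import urlsplit

handlers = {}  # path -> callable; empty until a "module" registers something


def register_handler(path, func):
    """Install a handler for a path, as a loaded module might."""
    handlers[path] = func


def parse_query_uri(uri):
    """Split a Query URI into its remote flag, host, port, and path."""
    parts = urlsplit(uri)
    if parts.path != "/state" and not parts.path.startswith("/state/"):
        raise ValueError("Query URI paths must begin with /state: %r" % uri)
    return {
        "remote": parts.scheme == "state" and bool(parts.netloc),
        "host": parts.hostname,  # None for local URIs
        "port": parts.port,      # None when the URI gives no port
        "path": parts.path,
    }


def dispatch_local(uri):
    """Run the handler for a local Query URI, if one is installed."""
    query = parse_query_uri(uri)
    if query["remote"]:
        raise ValueError("not a local Query URI: %r" % uri)
    handler = handlers.get(query["path"])
    if handler is None:
        raise LookupError("no handler installed for %s" % query["path"])
    return handler()


if __name__ == "__main__":
    print(parse_query_uri("state://192.168.0.223:3434/state/network/ping"))
    # A stand-in handler returning made-up numbers, for demonstration only.
    register_handler("/state/kernel/loadavg", lambda: (0.10, 0.20, 0.30))
    print(dispatch_local("/state/kernel/loadavg"))
```

Note that dispatching any path before a handler is registered fails, which mirrors the point above: a path only becomes valid once something has installed a handler for it.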

Notifications

Notifications are your eyes and ears into what's going on with your State installation. Typically, you'll receive notifications as e-mails, produced by mod_mail's interaction with mod_notifs. Because e-mail is a common gateway to paging, mod_mail is a common module to load and let handle your notifications. Each task is placed into a Notification Type, which determines how notifications for problems arising from that task are handled. Notifications are quirky, so you need to understand how they work in order to design them to notify you in the most effective way. For a description of notifications, see the documentation on mod_notifs; it's all in there.