Supermon is a high-speed cluster monitoring system that emphasizes LowPerturbation, high SamplingRates, and an extensible DataProtocol and ProgrammingInterface. It has been shown to scale from the most basic single-processor machines to large-scale, 1024 (2048) processor clusters. Although it is most frequently used on Linux-based clusters, the infrastructure is in no way limited to this operating system, provided that data can be extracted from the operating system and presented in the correct format.
The project page containing file downloads and the CVS repository is here.
A list of FrequentlyAskedQuestions is available. Some PublishedPapers (and presentations) are also available. We are currently in the process of compiling performance results and comparisons with other monitoring systems to help users choose which monitoring system best fits their needs.
Two small server programs move data from the kernel to clients, and
provide that data via TCP at both single and multiple node levels. At a
single node, a kernel module provides data in its two /proc entries
(see above). The mon server acts as a filter between /proc
and the TCP clients: It parses the s-expressions found in /proc, adds a
minimal amount of information, and passes that data to clients on
demand. For each client that connects to it, mon maintains a bitmask
reflecting the data fields that particular client requests in a sample.
That way, mon filters data and reduces wasteful network traffic. A
second server, Supermon, lets clients see a snapshot of a set of nodes
in each sample. Supermon connects to nodes that run mon servers, and
concentrates their data. It then presents the data sampled from many
mon servers in a single data sample. The data format provided to
clients by Supermon is identical to mon's data format. That allows many
Supermon servers to be created, each sampling from a subset of the
nodes within a cluster. New Supermon servers could then be started to
connect to the Supermon servers already monitoring portions of the
cluster.
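
The per-client bitmask filtering that mon performs can be sketched as
follows. This is a minimal Python illustration, not Supermon's actual
code: the field names, the bit assignments, and the helpers make_mask,
filter_sample and to_sexpr are all hypothetical, and the s-expression
layout only approximates the idea of the data format.

```python
# Hypothetical field list; the real fields come from the kernel module's
# /proc entries and are not specified here.
FIELDS = ["cpu_user", "cpu_system", "mem_free", "net_rx", "net_tx"]

# Assign one bit per field (assumed layout: bit i <-> FIELDS[i]).
FIELD_BIT = {name: 1 << i for i, name in enumerate(FIELDS)}


def make_mask(requested):
    """Build the bitmask a client would register for its requested fields."""
    mask = 0
    for name in requested:
        mask |= FIELD_BIT[name]
    return mask


def filter_sample(sample, mask):
    """Keep only the fields whose bit is set in the client's mask."""
    return {name: value for name, value in sample.items()
            if FIELD_BIT[name] & mask}


def to_sexpr(node, sample):
    """Render a filtered sample as a simple s-expression string."""
    parts = [node] + [f"({name} {value})" for name, value in sample.items()]
    return "(" + " ".join(parts) + ")"
```

A client that asked only for cpu_user and mem_free would register
make_mask(["cpu_user", "mem_free"]), and every later sample sent to it
would pass through filter_sample with that mask, so unrequested fields
never reach the network.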
Hierarchical Supermon servers improve performance in situations
where a cluster has many nodes and sampling rates are high. Like mon,
Supermon keeps a bitmask-based filter for each client, which is used to
improve efficiency on both the Supermon/mon and Supermon/client
connections.
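
Because Supermon presents data in the same format as mon, concentrators
can be stacked. The sketch below illustrates that idea in Python under
stated assumptions: MonNode and Concentrator are hypothetical classes,
samples are modelled as in-memory dictionaries rather than
s-expressions over TCP, and the field names are invented for the
example.

```python
# Hypothetical fields and bit layout, for illustration only.
FIELD_BIT = {"cpu_user": 1 << 0, "mem_free": 1 << 1}


class MonNode:
    """Stands in for a mon server on one node."""

    def __init__(self, name, values):
        self.name = name
        self.values = values  # dict: field name -> current value

    def sample(self, mask):
        # Apply the client's bitmask before returning any data.
        kept = {f: v for f, v in self.values.items() if FIELD_BIT[f] & mask}
        return {self.name: kept}


class Concentrator:
    """Stands in for a Supermon server: merges samples from its sources.

    A source is anything with a sample(mask) method, so a Concentrator
    can sample from MonNode objects or from other Concentrators.
    """

    def __init__(self, sources):
        self.sources = list(sources)

    def sample(self, mask):
        merged = {}
        for source in self.sources:
            # Pass the mask down so filtering happens as close to the
            # node as possible.
            merged.update(source.sample(mask))
        return merged
```

Since both classes expose the same sample(mask) interface, a top-level
Concentrator built from two lower-level ones returns one combined
sample covering every node beneath it, and the mask is forwarded at
each level so only the requested fields travel upward.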