Now mon is ready to work. You need to create your own mon.cf file,
in which you specify the resources mon should watch and
the actions mon should take when a resource fails and when it
becomes available again.
All monitoring scripts are in /usr/lib/mon/mon.d/.
At the beginning of every script you can find an explanation of how to use it.
All alert scripts are placed in /usr/lib/mon/alert.d/.
These are the scripts triggered when something goes wrong.
If you are using ipvs, its homepage
(www.linuxvirtualserver.org) has scripts for adding and
removing servers from an ipvs list.
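As a rough illustration of the above, a minimal mon.cf might look like the sketch below. The hostgroup name "www", the address 10.10.10.2, and the mail recipient are placeholders chosen for this example, and it assumes your mon installation ships http.monitor and mail.alert in the directories mentioned above. Check the comments at the top of each script before relying on it.

```
# mon.cf (sketch; names and addresses below are placeholders)
alertdir = /usr/lib/mon/alert.d
mondir   = /usr/lib/mon/mon.d

# The resource to watch: one web server
hostgroup www 10.10.10.2

watch www
    service http
        interval 30s
        monitor http.monitor
        period wd {Sun-Sat}
            # run when the service fails
            alert mail.alert root@localhost
            # run when the service is available again
            upalert mail.alert root@localhost
```

The alert line covers the "dysfunction" case and the upalert line covers the case where the resource comes back.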
Yes! Use the ipfail plug-in. For each interface you wish to monitor, specify one or more "ping" nodes in your configuration. Each node in your cluster will monitor these ping nodes. Should one node detect a failure in one of these ping nodes, it will contact the other node in order to determine whether it or the ping node has the problem. If the cluster node has the problem, it will try to fail over its resources (if it has any).
To use ipfail, you will need to add the following to your /etc/ha.d/ha.cf
files:
respawn hacluster /usr/lib/heartbeat/ipfail
ping <IPaddr1> <IPaddr2> ... <IPaddrN>
See Kevin's documentation for more details on the concepts.
IPaddr1..N are your ping nodes. NOTE: ipfail requires the "nice_failback
on" option.
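Put together, the relevant part of ha.cf might look like the fragment below. This is only a sketch: the two ping node addresses are made up for the example (a router and a switch are typical choices), and the rest of your ha.cf is assumed to be already in place.

```
# /etc/ha.d/ha.cf (fragment; same lines on every cluster node)
# ipfail requires nice_failback to be on
nice_failback on
# ping nodes (example addresses, replace with your own)
ping 10.10.10.254 10.10.10.253
# start ipfail as user hacluster and restart it if it exits
respawn hacluster /usr/lib/heartbeat/ipfail
```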
This isn't a problem with heartbeat, but rather is caused by various versions of net-tools. Upgrade to the most recent version of net-tools and it will go away. You can test it with ifconfig manually.
If your system was not under moderate to heavy load when it got this message, you probably have the kernel bug. The 2.4.18 Linux kernel had a bug in it which would cause it to not schedule heartbeat for very long periods of time when the system was idle, or nearly so. If this is the case, you need to get a kernel that isn't broken.
server1 10.10.10.1 mysql
server2 10.10.10.2 apache
In this case, the IP address 10.10.10.1 should be replaced with the IP address you want to contact the mysql server at, and 10.10.10.2 should be replaced with the IP address you want users to use to contact the web server. Any time server1 is up, it will run the mysql service. Any time server2 is up, it will run the apache service. If both server1 and server2 are up, both servers will be active. Note that this setup conflicts with the "nice_failback on" parameter (this is being fixed), which in turn prohibits the use of ipfail.
This means you need to attach 6 or 8 files. Include 6 if your debug output goes into the same file as your normal output and 8 otherwise. For each machine you need to send:
Rev 0.0.5
(c) 2000 Rudy Pawul rpawul@iso-ne.com
(c) 2001 Dusan Djordjevic dj.dule@linux.org.yu