@@ -51,7 +51,7 @@ As a result, here is the standard upgrade plan for pg_auto_failover:
5151 1. Upgrade the pg_auto_failover package on all the nodes, monitor
5252 included.
5353
54- When using a debian based OS, this looks like the following command when
54+ When using a debian based OS, this looks like the following command when
5555 upgrading from 1.4 to 1.5::
5656
5757 sudo apt-get remove pg-auto-failover-cli-enterprise-1.4 postgresql-11-auto-failover-enterprise-1.4
@@ -361,6 +361,43 @@ The monitor reports every state change decision to a LISTEN/NOTIFY channel
361361 named ``state``. PostgreSQL logs on the monitor are also stored in a table,
362362 ``pgautofailover.event``, and broadcast by NOTIFY in the channel ``log``.
363363
364+ .. _replacing_monitor_online:
365+
366+ Replacing the monitor online
367+ ----------------------------
368+
369+ When the monitor node is no longer available, it is possible to create a
370+ new monitor node and then switch the existing nodes to the new monitor by
371+ using the following commands.
372+
373+ 1. Apply the STONITH approach on the old monitor to make sure this node is
374+ not going to show up again during the procedure. This step is sometimes
375+ referred to as “fencing”.
376+
377+ 2. On every node, ending with the (current) Postgres primary node for each
378+ group, disable the monitor while ``pg_autoctl`` is still running::
379+
380+ $ pg_autoctl disable monitor --force
381+
382+ 3. Create a new monitor node::
383+
384+ $ pg_autoctl create monitor ...
385+
386+ 4. For each group in your formation(s), start with the current primary
387+ node, so that it is registered first and still as a primary, and enable
388+ the monitor online again::
389+
390+ $ pg_autoctl enable monitor --monitor postgresql://...
391+
392+ 5. On every other (secondary) node, enable the monitor online again::
393+
394+ $ pg_autoctl enable monitor --monitor postgresql://...
395+
396+ This operation relies on the fact that ``pg_autoctl`` can be operated
397+ without a monitor, and that when reconnecting to a new monitor, the process
398+ resets the parts of the node state that come from the monitor, such as the
399+ node identifier.
400+
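The ordering constraints in the steps above (primary last when disabling,
primary first when enabling) can be sketched as a small dry-run helper. This
is only an illustration under stated assumptions: the function name
``plan_monitor_replacement`` and the host names ``node-a``, ``node-b``,
``node-c`` and ``new-monitor`` are hypothetical, the sketch covers a single
group, and it only prints the commands instead of running them. The elided
``...`` arguments are left exactly as in the steps above, and fencing the old
monitor (step 1) still has to be done first, outside of this sketch.

```shell
#!/bin/sh
# Dry-run sketch: prints the pg_autoctl commands in the documented order.
# Nothing is executed; host names are hypothetical placeholders.

plan_monitor_replacement() {
    primary="$1"   # current Postgres primary of the group
    shift          # remaining arguments: the secondary nodes of that group

    # Step 2: disable the monitor on every node, ending with the primary.
    for node in "$@" "$primary"; do
        echo "[$node] pg_autoctl disable monitor --force"
    done

    # Step 3: create the new monitor (options elided, as in the steps above).
    echo "[new-monitor] pg_autoctl create monitor ..."

    # Steps 4 and 5: enable the monitor on the primary first, then on the
    # secondaries, so that the primary registers first.
    for node in "$primary" "$@"; do
        echo "[$node] pg_autoctl enable monitor --monitor postgresql://..."
    done
}

# Example: node-a is the current primary, node-b and node-c are secondaries.
plan_monitor_replacement node-a node-b node-c
```

Printing the plan before touching any node makes it easy to review the order
for each group; with several groups, the helper would be called once per
group with that group's own primary.
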
364401 Trouble-Shooting Guide
365402 ----------------------
366403