Commit 84576fd

DimCitus and JelteF authored
Feature/watch (#809)
* Implement pg_autoctl watch command.

  The idea is that our users and customers could have an interactive
  dashboard without needing to build one themselves from the watch(1)
  command and other utilities.

* Add a --watch option to pg_autoctl show state|events.

* Add libncurses to the Dockerfile dependencies.

* Per review, show logs when failing to contact the monitor.

  To enable that, we switch the terminal back to "cooked" mode where we can
  read the logs on stderr. When the connection to the monitor could be
  established again, we switch back to the "raw" mode with the previous
  settings and continue displaying our dashboard there.

Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
1 parent 87acba1 commit 84576fd

25 files changed

Lines changed: 2845 additions & 10 deletions

Dockerfile

Lines changed: 4 additions & 1 deletion
@@ -20,6 +20,8 @@ RUN apt-get update \
 libxml2-dev \
 libxslt1-dev \
 libselinux1-dev \
+libncurses-dev \
+libncurses6 \
 make \
 openssl \
 pipenv \
@@ -84,7 +86,8 @@ RUN apt-get update \
 make \
 sudo \
 tmux \
-watch \
+watch \
+libncurses6 \
 lsof \
 psutils \
 dnsutils \

docs/architecture-multi-standby.rst

Lines changed: 2 additions & 0 deletions
@@ -85,6 +85,8 @@ following three replication settings:
 - Replication quorum
 - Candidate priority

+.. _number_sync_standbys:
+
 Number Sync Standbys
 ^^^^^^^^^^^^^^^^^^^^

docs/conf.py

Lines changed: 7 additions & 0 deletions
@@ -417,6 +417,13 @@ def setup(app):
         [author],
         1,
     ),
+    (
+        "ref/pg_autoctl_watch",
+        "pg_autoctl watch",
+        "pg_autoctl watch",
+        [author],
+        1,
+    ),
     (
         "ref/pg_autoctl_stop",
         "pg_autoctl stop",

docs/faq.rst

Lines changed: 14 additions & 0 deletions
@@ -7,6 +7,20 @@ and your question and its answer might make it to this FAQ.

 __ https://github.com/citusdata/pg_auto_failover/issues_

+I stopped the primary and no failover is happening for 20s to 30s, why?
+-----------------------------------------------------------------------
+
+In order to avoid spurious failovers when the network connectivity is not
+stable, pg_auto_failover implements a timeout of 20s before acting on a node
+that is known to be unavailable. This needs to be added to the delay between
+health checks and the retry policy.
+
+See the :ref:`configuration` part for more information about how to set up
+the different delays and timeouts that are involved in the decision making.
+
+See also :ref:`pg_autoctl_watch` to have a dashboard that helps
+understand the system and what's going on at the moment.
+
 The secondary is blocked in the CATCHING_UP state, what should I do?
 --------------------------------------------------------------------
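The FAQ entry above says the 20s timeout adds to the health-check delay and the retry policy. As a rough illustration of that arithmetic, here is a minimal Python sketch. Only the 20s timeout comes from the text. The health-check period, retry count and retry delay are placeholder assumptions, the function name is hypothetical, and the simple additive model is a simplification rather than the monitor's actual decision logic. The configuration reference has the real settings.

```python
def estimated_failover_delay(
    unhealthy_timeout_s: float = 20.0,   # the 20s timeout stated in the FAQ entry
    health_check_period_s: float = 5.0,  # ASSUMED value, for illustration only
    max_retries: int = 2,                # ASSUMED value, for illustration only
    retry_delay_s: float = 2.0,          # ASSUMED value, for illustration only
) -> float:
    """Rough upper bound on the delay before the monitor acts on a node.

    Simplified additive model: wait for the next health check, exhaust the
    retries, then wait for the unhealthy timeout to elapse.
    """
    return health_check_period_s + max_retries * retry_delay_s + unhealthy_timeout_s
```

With these placeholder values, ``estimated_failover_delay()`` returns ``29.0``, which falls within the 20s to 30s window mentioned in the question.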

docs/how-to.rst

Lines changed: 9 additions & 0 deletions
@@ -96,6 +96,15 @@ formation with the following command::

   $ pg_autoctl show state

+The ``pg_autoctl show state`` command outputs the current state of the
+system only once. Sometimes it would be nice to have an auto-updated display
+like the one provided by common tools such as `watch(1)` or `top(1)` and the
+like. For that, the following commands are available (see also
+:ref:`pg_autoctl_watch`)::
+
+  $ pg_autoctl watch
+  $ pg_autoctl show state --watch
+
 To analyze what's been happening to get to the current state, it is possible
 to review the past events generated by the pg_auto_failover monitor with the
 following command::
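For comparison with the built-in auto-updating display added in this hunk, the following Python sketch shows the do-it-yourself polling approach that ``pg_autoctl watch`` is meant to replace. It is not how the command is implemented. The helper names ``show_state`` and ``poll`` are hypothetical, and the sketch assumes ``pg_autoctl`` is on the ``PATH`` and can find its node (for example through ``PGDATA``).

```python
import subprocess
import time
from typing import Callable, Optional


def show_state() -> str:
    """Run `pg_autoctl show state` once and return what it printed."""
    result = subprocess.run(
        ["pg_autoctl", "show", "state"],
        capture_output=True,
        text=True,
        check=False,  # keep polling even when the monitor is unreachable
    )
    return result.stdout or result.stderr


def poll(
    fetch: Callable[[], str] = show_state,
    interval_s: float = 0.5,
    iterations: Optional[int] = None,  # None means "until interrupted"
    display: Callable[[str], None] = print,
) -> int:
    """Call fetch() repeatedly, sleeping interval_s between two calls."""
    count = 0
    while iterations is None or count < iterations:
        display(fetch())
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval_s)
    return count
```

Calling ``poll()`` with no arguments prints the state twice per second until interrupted with Ctrl-C. Unlike the dashboard, this loop does not clear the screen or adapt to the terminal size.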

docs/ref/configuration.rst

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+.. _configuration:
+
 Configuring pg_auto_failover
 ============================

docs/ref/manual.rst

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@ have their own manual page.
 pg_autoctl_perform
 pg_autoctl_do
 pg_autoctl_run
+pg_autoctl_watch
 pg_autoctl_stop
 pg_autoctl_reload
 pg_autoctl_status

docs/ref/pg_autoctl.rst

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ pg_autoctl provides the following commands::
   + set       Set a pg_auto_failover node, or formation setting
   + perform   Perform an action orchestrated by the monitor
     run       Run the pg_autoctl service (monitor or keeper)
+    watch     Display a dashboard to watch monitor's events and state
     stop      signal the pg_autoctl service for it to stop
     reload    signal the pg_autoctl for it to reload its configuration
     status    Display the current status of the pg_autoctl service

docs/ref/pg_autoctl_show_events.rst

Lines changed: 11 additions & 0 deletions
@@ -18,6 +18,7 @@ about state changes of the pg_auto_failover nodes managed by the monitor::
   --formation  formation to query, defaults to 'default'
   --group      group to query formation, defaults to all
   --count      how many events to fetch, defaults to 10
+  --watch      display an auto-updating dashboard
   --json       output data in the JSON format

 Options
@@ -46,6 +47,16 @@ Options

   By default only the last 10 events are printed.

+--watch
+
+  Take control of the terminal and display the current state of the system
+  and the last events from the monitor. The display is updated automatically
+  every 500 milliseconds (half a second) and reacts properly to window size
+  change.
+
+  Depending on the terminal window size, a different set of columns is
+  visible in the state part of the output. See :ref:`pg_autoctl_watch`.
+
 --json

   Output a JSON formated data instead of a table formatted list.
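The ``--watch`` description above says the visible columns depend on the terminal window size. One way such a selection could work is sketched below in Python. The width thresholds, the column sets and the function name are all hypothetical and chosen for illustration only; the actual command is built on ncurses and defines its own layouts.

```python
import shutil
from typing import List

# HYPOTHETICAL column sets, widest first: (minimum terminal width, columns).
COLUMN_SETS = [
    (120, ["Name", "Node", "Host:Port", "LSN", "Reported State", "Assigned State"]),
    (80, ["Name", "Node", "Reported State", "Assigned State"]),
    (0, ["Name", "Reported State"]),
]


def columns_for_width(width: int) -> List[str]:
    """Return the widest column set that fits in `width` characters."""
    for min_width, columns in COLUMN_SETS:
        if width >= min_width:
            return columns
    # Fall back to the narrowest set for nonsensical (negative) widths.
    return COLUMN_SETS[-1][1]


def columns_for_terminal() -> List[str]:
    """Pick columns for the current terminal, assuming 80x24 when unknown."""
    width = shutil.get_terminal_size(fallback=(80, 24)).columns
    return columns_for_width(width)
```

A dashboard would call ``columns_for_terminal()`` again on each refresh, or on each window-resize notification, so that the layout follows the window size.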

docs/ref/pg_autoctl_show_state.rst

Lines changed: 12 additions & 1 deletion
@@ -18,6 +18,7 @@ registered to the pg_auto_failover monitor::
   --formation  formation to query, defaults to 'default'
   --group      group to query formation, defaults to all
   --local      show local data, do not connect to the monitor
+  --watch      display an auto-updating dashboard
   --json       output data in the JSON format

 Options
@@ -51,14 +52,24 @@ Options

   Print the local state information without connecting to the monitor.

+--watch
+
+  Take control of the terminal and display the current state of the system
+  and the last events from the monitor. The display is updated automatically
+  every 500 milliseconds (half a second) and reacts properly to window size
+  change.
+
+  Depending on the terminal window size, a different set of columns is
+  visible in the state part of the output. See :ref:`pg_autoctl_watch`.
+
 --json

   Output a JSON formated data instead of a table formatted list.

 Description
 -----------

-The ``pg_autoctl show state`` outputs includes the following columns:
+The ``pg_autoctl show state`` output includes the following columns:

 - Name
