doc/README.adoc
Lines changed: 68 additions & 13 deletions
@@ -2,19 +2,28 @@
2
2
:doctype: book
3
3
:toc: left
4
4
5
-
The troubleshooting panel displays a graph of related resources and observability signals.
6
-
Clicking on a node in the graph opens a console page showing the details of each resource or signal.
7
-
Nodes in the graph represent a type of resource or signal, while edges represent relationships.
5
+
The troubleshooting panel displays a graph of resources and observability signals related to whatever is
6
+
shown in the main console window.
7
+
Nodes in the graph represent a type of resource or signal, and edges represent relationships.
8
8
9
-
The panel provides a map of related information, so you can navigate more quickly to relevant data.
10
-
It may also help to find related information that you were not aware of.
9
+
Clicking on a node in the graph opens the console page showing details of that resource or signal.
10
+
Clicking the "Focus" button re-calculates the graph starting from the current contents of the main window.
11
11
12
-
Consider an example of troubleshooting an Alert on the OCP console.
12
+
The panel provides a map of related information to help you navigate more quickly to relevant data,
13
+
or to discover relevant data you may not have been aware of.
14
+
15
+
We will show an example of troubleshooting an Alert.
16
+
17
+
NOTE: You can re-create this example alert on your own cluster by following the instructions xref:example-alert[here].
18
+
You can also experiment by using the panel with existing resources in your own cluster.
13
19
14
20
== Opening the panel
15
21
16
-
First open the alert of interest in the console.
17
-
Now open the troubleshooting panel [FIXME screenshot of global button ]
22
+
Open the troubleshooting panel with the "Signal Correlation" entry in the troubleshooting section of
23
+
the "launcher" menu, found at the top right of the screen:
24
+
25
+
[.border]
26
+
image::images/launcher.png[]
18
27
19
28
Opening the panel shows a _neighbourhood_ of the resource currently displayed in the console.
20
29
A neighbourhood is a graph that starts at the current resource, and includes related objects up to
@@ -23,6 +32,8 @@ A neighbourhood is a graph that starts at the current resource, and includes rel
23
32
NOTE: Not all resource types are currently supported; more will be added in future.
24
33
For an unsupported resource, the panel will be empty.
25
34
35
+
For example, here is the panel for a `KubeContainerWaiting` alert.
36
+
26
37
[.border]
27
38
image::images/panel-graph.png[]
28
39
@@ -34,9 +45,8 @@ image::images/panel-graph.png[]
34
45
<4> Metrics(105): There are always many metrics associated with every Pod.
35
46
<6> Network(6): There are network events associated with the pod, which means it has communicated with other resources in the cluster.
36
47
The remaining Service, Deployment and DaemonSet nodes are the resources that the pod has communicated with.
37
-
<7> Focus: Clicking nodes changes what is shown in the main console. You can navigate using links on the console page while the panel is open.
38
-
The graph will not change until you click "Focus".
39
-
This will draw a new graph starting from the resource shown in the main console.
48
+
<7> Focus: Clicking this button will re-calculate the graph starting from the current contents of the main console window.
49
+
The contents may have changed because you clicked nodes in the graph, or used any other links, menus or navigation features of the console.
40
50
<8> Show Query: enables experimental features detailed below.
41
51
42
52
NOTE: Clicking on a node may sometimes show fewer results than are indicated on the graph.
@@ -48,15 +58,16 @@ This is a known issue that will be addressed in future.
48
58
image::images/query-details.png[]
49
59
50
60
<1> Hide Query hides the experimental features.
51
-
<2> The query that identifies the starting point for the graph. The format of this query is experimental and may change in future.
61
+
<2> The query that identifies the starting point for the graph. This is normally derived automatically from the contents of the main console window.
62
+
You can enter queries manually, but the format of this query language is experimental and likely to change in future.
52
63
footnote:[This query language is part of https://korrel8r.github.io/korrel8r[Korrel8r], the correlation engine used to create the graphs]
53
64
The "Focus" button updates the query to match the resources in the main console window.
54
65
<3> Neighbourhood depth: increase or decrease to see a smaller or larger neighbourhood.
55
66
Note: setting a large value in a large cluster may cause the query to fail if the number of results is too big.
56
67
<4> Goal class: Selecting this option will do a _goal directed search_ instead of a neighbourhood search.
57
68
A goal directed search will show all paths from the starting point to the goal _class_ , which indicates a type of resource or signal.
58
69
59
-
The format of the goal class is experimental and may change. The valid goal classes are:
70
+
The format of the goal class is experimental and may change. Currently the valid goal classes are:
60
71
61
72
[horizontal]
62
73
`k8s:__resource[.version.[group]]__` :: Kind of Kubernetes resource. For example `k8s:Pod` or `k8s:Deployment.apps.v1`.
@@ -69,7 +80,51 @@ The format of the goal class is experimental and may change. The valid goal clas
69
80
70
81
The troubleshooting panel relies on the observability signal stores installed in your cluster.
71
82
Kubernetes resources, alerts and metrics are available by default in an OCP cluster.
83
+
72
84
Other types of signal require optional components to be installed:
73
85
74
86
- Logs: "Red Hat OpenShift Logging" (collection) and "Loki Operator provided by Red Hat" (store)
75
87
- Network Events: "Network Observability provided by Red Hat" (collection) and "Loki Operator provided by Red Hat" (store)
88
+
89
+
== Creating the example alert
90
+
[id="example-alert"]
91
+
92
+
You can reproduce the example alert shown above as follows.
93
+
94
+
.Procedure
95
+
96
+
. Run the following command to create a broken deployment in a system namespace:
97
+
+
98
+
[source,terminal]
99
+
----
100
+
kubectl apply -f - << EOF
101
+
apiVersion: apps/v1
102
+
kind: Deployment
103
+
metadata:
104
+
  name: bad-deployment
105
+
  namespace: default <1>
106
+
spec:
107
+
  selector:
108
+
    matchLabels:
109
+
      app: bad-deployment
110
+
  template:
111
+
    metadata:
112
+
      labels:
113
+
        app: bad-deployment
114
+
    spec:
115
+
      containers: <2>
116
+
      - name: bad-deployment
117
+
        image: quay.io/openshift-logging/vector:5.8
118
+
----
119
+
<1> The deployment must be in a system namespace (such as `default`) to cause the desired alerts.
120
+
<2> This container deliberately tries to start a `vector` server with no configuration file. The server will log a few messages, and then exit with an error. Any container could be used for this.
121
+
122
+
. View the alerts:
123
+
.. Go to *Observe* -> *Alerting* and click *clear all filters*. View the `Pending` alerts.
124
+
+
125
+
[IMPORTANT]
126
+
====
127
+
Alerts first appear in the `Pending` state. They do not start `Firing` until the container has been crashing for some time. By showing `Pending` alerts you can see them much more quickly.
128
+
====
129
+
.. Look for `KubeContainerWaiting`, `KubePodCrashLooping`, or `KubePodNotReady` alerts.
130
+
.. Select one such alert and open the troubleshooting panel, or click the "Focus" button if it is already open.
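To confirm that the example deployment is failing as intended, or to remove it afterwards, a minimal sketch of two shell helpers follows. The function names `show_bad_deployment_pods` and `delete_bad_deployment` are hypothetical and not part of the troubleshooting panel. Both assume the manifest above was applied unchanged (namespace `default`, label `app=bad-deployment`).

```shell
# Hypothetical helpers for the example alert; the names are illustrative only.
# Assumption: the example manifest was applied unchanged, so the deployment is
# named "bad-deployment", lives in "default", and labels its pods app=bad-deployment.

# List the pods created by the example deployment with their current status.
show_bad_deployment_pods() {
  kubectl get pods --namespace default --selector app=bad-deployment
}

# Delete the example deployment once you have finished experimenting.
delete_bad_deployment() {
  kubectl delete deployment bad-deployment --namespace default
}
```

Run `show_bad_deployment_pods` after applying the manifest to check that the container is not staying up; the exact status text depends on your cluster. Run `delete_bad_deployment` when you are finished so that the example alerts can resolve.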