:doctype: book
:toc: left

The troubleshooting panel displays a graph of resources and observability signals related to whatever is
shown in the main console window.
Nodes in the graph represent a type of resource or signal; edges represent relationships between them.

Clicking on a node in the graph opens the console page showing details of that resource or signal.
Clicking the "Focus" button re-calculates the graph starting from the current contents of the main window.

The panel provides a map of related information to help you navigate more quickly to relevant data,
or to discover relevant data you may not have been aware of.

We will show an example of troubleshooting an alert.

NOTE: You can re-create this example alert on your own cluster by following the instructions xref:example-alert[here].
You can also experiment by using the panel with existing resources in your own cluster.

== Opening the panel

Open the troubleshooting panel with the "Signal Correlation" entry in the troubleshooting section of
the "launcher" menu, found at the top right of the screen:

[.border]
image::images/launcher.png[]

Opening the panel shows a _neighbourhood_ of the resource currently displayed in the console.
A neighbourhood is a graph that starts at the current resource and includes related objects up to
3 steps away from the starting point.
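
A neighbourhood can be thought of as a depth-limited breadth-first search: start at one node and collect everything reachable in at most _depth_ steps. The following is a minimal, illustrative sketch in POSIX shell, not how the console or Korrel8r actually computes the graph. The `neighbourhood` function and the `EDGES` list are hypothetical; the edges are assumed from the example `KubeContainerWaiting` graph shown on this page, and are treated as directed.

```shell
# Hypothetical, directed edges modelled on the example graph on this page.
# This is NOT real Korrel8r output; it only illustrates what a neighbourhood is.
EDGES="Alert:Pod Pod:Event Pod:Logs Pod:Metrics Pod:Network Network:Service Network:Deployment Network:DaemonSet"

# neighbourhood START DEPTH
# Prints START and every node reachable from it in at most DEPTH steps,
# using a breadth-first search over EDGES. Each node is printed once.
neighbourhood() {
  start=$1 depth=$2
  seen=" $start "          # space-delimited set of visited nodes
  frontier=$start          # nodes discovered in the previous step
  echo "$start"
  d=1
  while [ "$d" -le "$depth" ]; do
    next=""
    for node in $frontier; do
      for edge in $EDGES; do
        from=${edge%%:*}
        to=${edge#*:}
        [ "$from" = "$node" ] || continue
        case $seen in
          *" $to "*) ;;    # already visited, skip it
          *) seen="$seen$to "
             next="$next $to"
             echo "$to" ;;
        esac
      done
    done
    frontier=$next
    d=$((d + 1))
  done
}
```

For example, `neighbourhood Alert 2` prints `Alert`, `Pod`, `Event`, `Logs`, `Metrics` and `Network`; a depth of 3 also reaches `Service`, `Deployment` and `DaemonSet`. Each extra step can add many more nodes, which is why a large depth in a large cluster can produce too many results.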

NOTE: Not all resource types are currently supported; more will be added in the future.
For an unsupported resource, the panel will be empty.

For example, here is the panel for a `KubeContainerWaiting` alert:

[.border]
image::images/panel-graph.png[]

<1> Alert(1): This node represents the starting point, a `KubeContainerWaiting` alert that was displayed in the console.
<2> Pod(1): This node indicates there is a single Pod resource associated with this alert. Clicking on this node shows the pod details in the console.
<3> Event(2): There are two Kubernetes events associated with the Pod, and you can see them by clicking this node.
<4> Logs(74): The pod has emitted 74 lines of logs. Click to show them.
<5> Metrics(105): There are 105 metrics associated with the pod. Every pod normally has many associated metrics.
<6> Network(6): There are network events associated with the pod, which means it has communicated with other resources in the cluster.
    The remaining Service, Deployment and DaemonSet nodes are the resources that the pod has communicated with.
<7> Focus: Clicking this button re-calculates the graph starting from the current contents of the main console window.
    Those contents may have changed because you clicked nodes in the graph, or used any other links, menus or navigation features of the console.
<8> Show Query: enables the experimental features described below.

NOTE: Clicking on a node may sometimes show fewer results than are indicated on the graph.
This is a known issue that will be addressed in the future.

== Experimental features

[.border]
image::images/query-details.png[]

<1> Hide Query: hides the experimental features.
<2> The query that identifies the starting point for the graph. This is normally derived automatically from the contents of the main console window.
    You can enter queries manually, but the format of this query language is experimental and likely to change in the future.
    footnote:[This query language is part of https://korrel8r.github.io/korrel8r[Korrel8r], the correlation engine used to create the graphs.]
    The "Focus" button updates the query to match the resources in the main console window.
<3> Neighbourhood depth: increase or decrease to see a larger or smaller neighbourhood.
    Note: setting a large value in a large cluster may cause the query to fail if the number of results is too big.
<4> Goal class: Selecting this option performs a _goal directed search_ instead of a neighbourhood search.
    A goal directed search shows all paths from the starting point to the goal _class_, which indicates a type of resource or signal.

The format of the goal class is experimental and may change. Currently the valid goal classes are:

[horizontal]
`k8s:__resource[.version.[group]]__`:: Kind of Kubernetes resource. For example `k8s:Pod` or `k8s:Deployment.apps.v1`.
`alert:alert`:: Any alert.
`metric:metric`:: Any metric.
`netflow:network`:: Any network observability event.
`log:__log_type__`:: Stored logs; __log_type__ must be `application`, `infrastructure` or `audit`.
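
The goal class formats listed above are regular enough to check mechanically. The following sketch is a hypothetical helper, `is_goal_class`, that succeeds when a string has one of the listed shapes. It is a local illustration only, not part of the console or of Korrel8r, and because the format is experimental it may drift from what the panel actually accepts. For `k8s:` classes it only checks the overall shape (a kind, optionally followed by dot-separated version and group parts), not the order of those parts or whether the kind exists.

```shell
# is_goal_class CLASS
# Succeeds (exit status 0) if CLASS has one of the goal class shapes listed above.
# Hypothetical helper for illustration only.
is_goal_class() {
  case $1 in
    alert:alert|metric:metric|netflow:network) return 0 ;;
    log:application|log:infrastructure|log:audit) return 0 ;;
  esac
  # k8s: followed by a kind name, optionally followed by dot-separated version/group parts.
  printf '%s\n' "$1" | grep -Eq '^k8s:[A-Za-z][A-Za-z0-9]*(\.[A-Za-z0-9.-]*)?$'
}
```

For example, `is_goal_class log:application` succeeds, while `is_goal_class log:debug` fails because `debug` is not one of the stored log types.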

== Optional signal stores

The troubleshooting panel relies on the observability signal stores installed in your cluster.
Kubernetes resources, alerts and metrics are available by default in an OpenShift Container Platform (OCP) cluster.

Other types of signal require optional components to be installed:

- Logs: "Red Hat OpenShift Logging" (collection) and "Loki Operator provided by Red Hat" (store)
- Network Events: "Network Observability provided by Red Hat" (collection) and "Loki Operator provided by Red Hat" (store)
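
One way to see which of these components are present is to list the installed operator ClusterServiceVersions (CSVs). The following is a minimal sketch of a hypothetical `report_signal_stores` function that reads CSV names on standard input and reports whether each optional signal type looks available. It assumes the CSV names of the three operators start with `cluster-logging.`, `loki-operator.` and `network-observability-operator.` (or `netobserv-operator.`); these prefixes are assumptions, so compare them with the names in your own cluster before relying on the result.

```shell
# report_signal_stores
# Reads ClusterServiceVersion names (one per line) on stdin and reports whether
# the optional components for logs and network events appear to be installed.
# ASSUMPTION: the CSV name prefixes below match the operators listed above.
# Example usage (needs the oc CLI and cluster access; CSVs can be copied into
# many namespaces, hence sort -u):
#   oc get csv -A -o custom-columns=NAME:.metadata.name --no-headers | sort -u | report_signal_stores
report_signal_stores() {
  csvs=$(cat)
  logging=no loki=no netobserv=no
  case $csvs in *cluster-logging.*) logging=yes ;; esac
  case $csvs in *loki-operator.*) loki=yes ;; esac
  case $csvs in *network-observability-operator.*|*netobserv-operator.*) netobserv=yes ;; esac

  # Logs need a collector (OpenShift Logging) and a store (Loki).
  if [ "$logging" = yes ] && [ "$loki" = yes ]; then
    echo "logs: available"
  else
    echo "logs: missing components"
  fi
  # Network events need a collector (Network Observability) and a store (Loki).
  if [ "$netobserv" = yes ] && [ "$loki" = yes ]; then
    echo "network events: available"
  else
    echo "network events: missing components"
  fi
}
```

The check only looks for names, so it cannot tell whether an installed operator is configured and healthy.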

[id="example-alert"]
== Creating the example alert

You can reproduce the example alert shown above as follows.

.Procedure

. Run the following command to create a broken deployment in a system namespace:
+
[source,terminal]
----
kubectl apply -f - << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bad-deployment
  namespace: default <1>
spec:
  selector:
    matchLabels:
      app: bad-deployment
  template:
    metadata:
      labels:
        app: bad-deployment
    spec:
      containers: <2>
      - name: bad-deployment
        image: quay.io/openshift-logging/vector:5.8
EOF
----
<1> The deployment must be in a system namespace (such as `default`) to cause the desired alerts.
<2> This container deliberately tries to start a `vector` server with no configuration file. The server logs a few messages and then exits with an error. Any container that exits with an error could be used instead.

. View the alerts:
.. Go to *Observe* -> *Alerting* and click *clear all filters*. View the `Pending` alerts.
+
[IMPORTANT]
====
Alerts first appear in the `Pending` state. They do not start `Firing` until the container has been crashing for some time. By showing `Pending` alerts you can see them much more quickly.
====
.. Look for `KubeContainerWaiting`, `KubePodCrashLooping`, or `KubePodNotReady` alerts.
.. Select one such alert and open the troubleshooting panel, or click the "Focus" button if it is already open.

See the downstream documentation: https://docs.redhat.com/en/documentation/red_hat_openshift_cluster_observability_operator/1-latest/html/ui_plugins_for_red_hat_openshift_cluster_observability_operator/troubleshooting-ui-plugin[Chapter 5. Troubleshooting UI plugin].