This repository was archived by the owner on Oct 13, 2023. It is now read-only.

Commit c17e991

daemon: add grpc.WithBlock option
WithBlock makes sure that the containerd requests that follow the dial are reliable.

In one edge case under high load pressure, the kernel kills dockerd, containerd and the containerd-shims because of OOM. When both dockerd and containerd restart, containerd takes time to recover all the existing containers. Until containerd is serving, dockerd's requests fail with a gRPC error. The bad part is that the restore action ignores any non-NotFound error and returns a running state for containers that have already stopped. This is unexpected behavior, and dockerd has to be restarted to make sure everything is OK, which is painful.

Adding WithBlock prevents this edge case. In the common case, containerd will be serving shortly, so it does no harm to add WithBlock to the containerd connection.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
(cherry picked from commit 9f73396dabf087a8dd5fa74296c2cd4c188ff889)
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Upstream-commit: 9ed0504592d338890a37e18999f98d69d7103f2d
Component: engine
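
The failure mode described in the message can be modeled in a few lines. This is a simplified sketch, not Docker's actual restore code: `restoredRunning`, `errNotFound` and `errUnavailable` are hypothetical names chosen for illustration. It shows how ignoring every error except NotFound lets a stale "running" state survive when containerd is not yet serving.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for the real gRPC error codes.
var (
	errNotFound    = errors.New("container not found")
	errUnavailable = errors.New("containerd not serving yet")
)

// restoredRunning models a restore step that trusts the recorded state
// whenever the probe fails with anything other than NotFound.
func restoredRunning(recorded bool, probe func() (alive bool, err error)) bool {
	alive, err := probe()
	if err == nil {
		return alive
	}
	if errors.Is(err, errNotFound) {
		return false
	}
	// Every other error is ignored, so the stale recorded state is kept.
	return recorded
}

func main() {
	// The container is actually stopped, but containerd cannot answer yet.
	unavailable := func() (bool, error) { return false, errUnavailable }
	fmt.Println(restoredRunning(true, unavailable)) // prints: true

	// Once containerd is serving, it reports that the container is gone.
	gone := func() (bool, error) { return false, errNotFound }
	fmt.Println(restoredRunning(true, gone)) // prints: false
}
```

Under this model, waiting until the server is ready before probing removes the `errUnavailable` path, which is the effect the commit attributes to WithBlock.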
1 parent d9f362f commit c17e991

1 file changed

Lines changed: 18 additions & 0 deletions

components/engine/daemon/daemon.go

@@ -866,6 +866,24 @@ func NewDaemon(ctx context.Context, config *config.Config, pluginStore *plugin.S
 	registerMetricsPluginCallback(d.PluginStore, metricsSockPath)
 
 	gopts := []grpc.DialOption{
+		// WithBlock makes sure that the following containerd request
+		// is reliable.
+		//
+		// NOTE: In one edge case under high load pressure, the kernel
+		// kills dockerd, containerd and the containerd-shims due to OOM.
+		// When both dockerd and containerd restart, containerd takes
+		// time to recover all the existing containers. Before
+		// containerd is serving, dockerd requests fail with a gRPC error.
+		// The bad thing is that the restore action still ignores
+		// any non-NotFound errors and returns a running state for
+		// already stopped containers. This is unexpected behavior, and
+		// we need to restart dockerd to make sure that everything is OK.
+		//
+		// It is painful. Adding WithBlock can prevent the edge case. And
+		// in the common case, containerd will be serving shortly.
+		// It does no harm to add WithBlock for the containerd connection.
+		grpc.WithBlock(),
+
 		grpc.WithInsecure(),
 		grpc.WithBackoffMaxDelay(3 * time.Second),
 		grpc.WithDialer(dialer.Dialer),
