root@d8766a2befac:/# ping my-web.1.1aj142fcfz7ltg0h23pc8om42 -c 3

Same issue on CentOS 7 after upgrading from 19.03.14 to 20.10.3. But they are unreachable.

Feb 24 04:07:45 p1 dockerd[10001]: time="2022-02-24T04:07:45.506170587Z" level=info msg="ignoring event" container=f3a9592298684ed2915e91fbfe3e6927fa8c18ffff79be748c19d159e63fa69c module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"

The same Docker version (20.10.12, build e91ed57) shows the problem on some systems and not on others.

After following the instructions to set up a Docker swarm on CNAT:

Feb 24 04:07:29 p1 dockerd[10001]: time="2022-02-24T04:07:29Z" level=error msg="enabling default vlan on bridge br0 failed open /sys/class/net/br0/bridge/default_pvid: permission denied"

The problem appears as soon as I try to run the same containers as a service in swarm mode.

However, we later determined that it wasn't the Docker upgrade at all; it was the reboot we performed while doing it (which loaded the new kernel on our test environments).

I am facing the same issue on CentOS 7 and CentOS 8. Could everyone affected perhaps provide some debugging output, so we can check whether we are all actually facing the same problem? This issue appears to affect CentOS 8 as well.

Some distributions work fine with the latest Docker Swarm; on others, the network had an issue with responses larger than ~1400 bytes.

From docker info: Operating System: CentOS Linux 8; Number of Old Snapshots to Retain: 0
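The ~1400-byte threshold reported here is consistent with VXLAN encapsulation overhead. The sketch below, assuming an IPv4 underlay and unencrypted VXLAN (the encrypted overlay option adds IPsec overhead on top), computes the largest inner packet and the largest `ping -s` payload that fit without fragmentation. The helper names are illustrative, not part of Docker.

```python
# Sketch: room left for container traffic inside a VXLAN overlay.
# Assumes an IPv4 underlay and plain (unencrypted) VXLAN; encrypted overlays
# add IPsec overhead on top. Helper names are illustrative, not part of Docker.

OUTER_IPV4_HEADER = 20  # outer IPv4 header
OUTER_UDP_HEADER = 8    # outer UDP header (VXLAN uses UDP port 4789)
VXLAN_HEADER = 8        # VXLAN header carrying the 24-bit VNI
INNER_ETHERNET = 14     # encapsulated inner Ethernet header

VXLAN_OVERHEAD = OUTER_IPV4_HEADER + OUTER_UDP_HEADER + VXLAN_HEADER + INNER_ETHERNET


def overlay_mtu(underlay_mtu: int) -> int:
    """Largest inner IP packet that fits in one underlay IP packet."""
    return underlay_mtu - VXLAN_OVERHEAD


def max_ping_payload(mtu: int) -> int:
    """Largest `ping -s` payload fitting in `mtu` (IPv4 20 + ICMP 8 bytes)."""
    return mtu - 20 - 8


if __name__ == "__main__":
    for underlay in (1500, 1450, 1400):
        inner = overlay_mtu(underlay)
        print(f"underlay {underlay}: overlay MTU {inner}, "
              f"max ping payload {max_ping_payload(inner)}")
```

With a 1500-byte underlay this yields 1450, the default overlay MTU mentioned in this thread. If the underlay MTU is smaller, the overlay MTU has to be lowered to match, for example with the `com.docker.network.driver.mtu` option when creating the network.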
From docker info: Init Binary: docker-init; Server Version: 20.10.3; Backing Filesystem: xfs

I'm trying to reach a service on port 2002 exposed through an overlay network on my swarm cluster.

PING my-web.1.1aj142fcfz7ltg0h23pc8om42 (10.202.0.10) 56(84) bytes of data.

I used the overlay1 network, created in the previous step, to start the services.

That would be a very interesting test, if anyone could do it.

Latest version of docker-ce.

Related issues and links:
- [BUG] Exception while creating PDO object:could not find driver
- https://github.com/sorintlab/pollon/blob/248c68238c160c056cd51d0e782276cef5c64ce4/pollon.go#L130
- The container on a specific node miss console button
- https://mails.dpdk.org/archives/dev/2018-September/111646.html
- portainer swarm docker not compatible with kernel 5+
- [Question] migration to ubuntu 20.04/fresh os
- (Swarm) Connections to services alternately failing and succeeding
- Non-ICMP Traffic Timesout for Non-Local Docker Swarm Nodes
- https://www.reddit.com/r/docker/comments/ua1jxz/encrypted_overlay_network_not_work/
- Photon OS 4 R2 - Docker Swarm - Issue with Routing Mesh not routing Published Port on all Nodes
- Overlay network broken when outside network mtu is smaller than default (1450)

Reproduction and troubleshooting steps:
- Install Docker 19.03 on Ubuntu 20 or CentOS 8.
- Start some services with docker stack deploy.
- Remove docker_gwbridge from the "trusted" zone in firewalld.
- Delete the folder /var/lib/docker/network/files/.
- Install the latest version of docker and containerd.
- Networks of bridge IP, docker_gw, ingress.
- Create a basic service attached to the swarm mesh.
- Create a stack of 2 services, then bash into one and curl/ping the other and the service VIP (and provide the container IPs and service VIP).

Everything is OK if I downgrade the kernel.
Step 9: Let's check how traffic is redirected from the client to the vote application containers.

Our issue was resolved by a different solution than those presented in this thread, so I'm posting it here for completeness and awareness.

EDIT: I will try to recreate the ingress network with default options.

This problem seems related to the MTU of the internal veth, which was set to 1450; I then reconfigured the stack.

I would definitely prefer not to have to configure this. This issue is blocking our update of all systems to Debian 11, and I'm not sure I want to proceed with just this workaround.

A similar problem: flannel-io/flannel#1279. Any truth to that?

I use overlay networks for my swarm services, and it's very common for my services to be attached to several networks. I have no custom daemon.json applied.

From docker info: Task History Retention Limit: 5

# docker inspect bridge
[ { "Name": "bridge", "Id": "ffd79f446e2db01624773bcca03273943e7840c00f7e1a722818a15fc4df1e9e", "Created": "2021-02-24T11:10:33.710699325-05:00", "Scope": "local", "Driver": "bridge", "EnableIPv6": false, "IPAM": { "Driver": "default", "Options": null, "Config": [ { "Subnet": "172.17.0.0/16", "Gateway": "172.17.0.1" } ] }, "Internal": false, "Attachable": false, "Ingress": false, "ConfigFrom": { "Network": "" }, "ConfigOnly": false, "Containers": {}, "Options": { "com.docker.network.bridge.default_bridge": "true", "com.docker.network.bridge.enable_icc": "true", "com.docker.network.bridge.enable_ip_masquerade": "true", "com.docker.network.bridge.host_binding_ipv4": "0.0.0.0", "com.docker.network.bridge.name": "docker0", "com.docker.network.driver.mtu": "1500" }, "Labels": {} } ]

# docker inspect docker_gwbridge
[ { "Name": "docker_gwbridge", "Id": "782eac048eced5098b6385cfd09907b0044139f778267046df2bfb1230cb0ffc", "Created": "2021-02-23T12:28:01.780059612-05:00", "Scope": "local", "Driver": "bridge", "EnableIPv6": false, "IPAM": { "Driver": "default", "Options": null, "Config": [ { "Subnet": "172.18.0.0/16", "Gateway": "172.18.0.1" } ] }, "Internal": false, "Attachable": false, "Ingress": false, "ConfigFrom": { "Network": "" }, "ConfigOnly": false, "Containers": { "7fab1e416ceff98ff5cfdc0755ebdad6addd7142c5213069a53d261b8d4c705f": { "Name": "gateway_6bbd9caf63c0", "EndpointID": "2669848c2ef5b01c61b0dae07961aabf5c358954ef5890dd49aeacf59c8864a6", "MacAddress": "02:42:ac:12:00:03", "IPv4Address": "172.18.0.3/16", "IPv6Address": "" }, "d4eeee7b07e78bffb712811f6742d24d1f63a0ef582dfe7ff74bab476425f2b6": { "Name": "gateway_d97df4f123eb", "EndpointID": "3cd5aad794d7b4c43afc2fd969833b6ba64792f25aa64dfbcd4a33f698e77bcc", "MacAddress": "02:42:ac:12:00:04", "IPv4Address": "172.18.0.4/16", "IPv6Address": "" }, "ingress-sbox": { "Name": "gateway_ingress-sbox", "EndpointID": "e34ef0a04d734acfe74732899edffb33e5a29ad6edf3abe979d388140ec71ac3", "MacAddress": "02:42:ac:12:00:02", "IPv4Address": "172.18.0.2/16", "IPv6Address": "" } }, "Options": { "com.docker.network.bridge.enable_icc": "false", "com.docker.network.bridge.enable_ip_masquerade": "true", "com.docker.network.bridge.name": "docker_gwbridge" }, "Labels": {} } ]

# docker network inspect ingress
[ { "Name": "ingress", "Id": "mty0tbdrmvuqhvr444bvfgydd", "Created": "2021-02-24T11:10:35.14459346-05:00", "Scope": "swarm", "Driver": "overlay", "EnableIPv6": false, "IPAM": { "Driver": "default", "Options": null, "Config": [ { "Subnet": "10.0.0.0/24", "Gateway": "10.0.0.1" } ] }, "Internal": false, "Attachable": false, "Ingress": true, "ConfigFrom": { "Network": "" }, "ConfigOnly": false, "Containers": { "7fab1e416ceff98ff5cfdc0755ebdad6addd7142c5213069a53d261b8d4c705f": { "Name": "my-web.2.yh8cxiw301exaw2n3x1w8rs3l", "EndpointID": "64d2edb8126b55a328c01d08c54e796ed6e5ec6077a7735e41d5c5d967daf8a2", "MacAddress": "02:42:0a:00:00:05", "IPv4Address": "10.0.0.5/24", "IPv6Address": "" }, "d4eeee7b07e78bffb712811f6742d24d1f63a0ef582dfe7ff74bab476425f2b6": { "Name": "my-web.1.1aj142fcfz7ltg0h23pc8om42", "EndpointID": "f30f109f639fe5248ffc99dbc04bb5613ac3199e55879c7eea861495a5820b5d", "MacAddress": "02:42:0a:00:00:0a", "IPv4Address": "10.0.0.10/24", "IPv6Address": "" }, "ingress-sbox": { "Name": "ingress-endpoint", "EndpointID": "d4b5be804a3ac18477b1c14d7acf62133fbac997a96cec66de6939ed6a5a2db2", "MacAddress": "02:42:0a:00:00:02", "IPv4Address": "10.0.0.2/24", "IPv6Address": "" } }, "Options": { "com.docker.network.driver.overlay.vxlanid_list": "4096" }, "Labels": {}, "Peers": [ { "Name": "fb673deb11cb", "IP": "192.168.37.201" }, { "Name": "523e4d83611d", "IP": "192.168.37.202" }, { "Name": "40154e99b45c", "IP": "192.168.37.203" } ] } ]

# docker service inspect my-web | jq
[ { "ID": "8j2n9mzt9cb6nw9b0rjgx9shj", "Version": { "Index": 200 }, "CreatedAt": "2021-02-23T20:52:45.817797547Z", "UpdatedAt": "2021-02-24T16:10:34.144883727Z", "Spec": { "Name": "my-web", "Labels": {}, "TaskTemplate": { "ContainerSpec": { "Image": "nginx:latest@sha256:f3693fe50d5b1df1ecd315d54813a77afd56b0245a404055a946574deb6b34fc", "Init": false, "StopGracePeriod": 10000000000, "DNSConfig": {}, "Isolation": "default" }, "Resources": { "Limits": {}, "Reservations": {} }, "RestartPolicy": { "Condition": "any", "Delay": 5000000000, "MaxAttempts": 0 }, "Placement": { "Platforms": [ { "Architecture": "amd64", "OS": "linux" }, { "OS": "linux" }, { "OS": "linux" }, { "Architecture": "arm64", "OS": "linux" }, { "Architecture": "386", "OS": "linux" }, { "Architecture": "mips64le", "OS": "linux" }, { "Architecture": "ppc64le", "OS": "linux" }, { "Architecture": "s390x", "OS": "linux" } ] }, "Networks": [ { "Target": "1iem30yxy7aztaqzxtxgqxq95" } ], "ForceUpdate": 0, "Runtime": "container" }, "Mode": { "Replicated": { "Replicas": 4 } }, "UpdateConfig": { "Parallelism": 1, "FailureAction": "pause", "Monitor": 5000000000, "MaxFailureRatio": 0, "Order": "stop-first" }, "RollbackConfig": { "Parallelism": 1, "FailureAction": "pause", "Monitor": 5000000000, "MaxFailureRatio": 0, "Order": "stop-first" }, "EndpointSpec": { "Mode": "vip", "Ports": [ { "Protocol": "tcp", "TargetPort": 80, "PublishedPort": 8080, "PublishMode": "ingress" } ] } }, "PreviousSpec": { "Name": "my-web", "Labels": {}, "TaskTemplate": { "ContainerSpec": { "Image": "nginx:latest@sha256:f3693fe50d5b1df1ecd315d54813a77afd56b0245a404055a946574deb6b34fc", "Init": false, "DNSConfig": {}, "Isolation": "default" }, "Resources": { "Limits": {}, "Reservations": {} }, "Placement": { "Platforms": [ { "Architecture": "amd64", "OS": "linux" }, { "OS": "linux" }, { "OS": "linux" }, { "Architecture": "arm64", "OS": "linux" }, { "Architecture": "386", "OS": "linux" }, { "Architecture": "mips64le", "OS": "linux" }, { "Architecture": "ppc64le", "OS": "linux" }, { "Architecture": "s390x", "OS": "linux" } ] }, "Networks": [ { "Target": "1iem30yxy7aztaqzxtxgqxq95" } ], "ForceUpdate": 0, "Runtime": "container" }, "Mode": { "Replicated": { "Replicas": 2 } }, "EndpointSpec": { "Mode": "vip", "Ports": [ { "Protocol": "tcp", "TargetPort": 80, "PublishedPort": 8080, "PublishMode": "ingress" } ] } }, "Endpoint": { "Spec": { "Mode": "vip", "Ports": [ { "Protocol": "tcp", "TargetPort": 80, "PublishedPort": 8080, "PublishMode": "ingress" } ] }, "Ports": [ { "Protocol": "tcp", "TargetPort": 80, "PublishedPort": 8080, "PublishMode": "ingress" } ], "VirtualIPs": [ { "NetworkID": "mty0tbdrmvuqhvr444bvfgydd", "Addr": "10.0.0.4/24" }, { "NetworkID": "1iem30yxy7aztaqzxtxgqxq95", "Addr": "10.202.0.2/24" } ] } } ]

See: https://mails.dpdk.org/archives/dev/2018-September/111646.html

This is the same issue described on GitHub here. Is there an update on this issue? It is as if there were no network path from the container on the other machine to my machine.

We also recreated all networks that were using the encrypted option.
The same setup works perfectly fine on GCP and AWS (same OS, same components, provisioned with Terraform by the same script).

We have many VMs in swarms/clusters that are on v19, and we are holding off on updating to v20 until this is resolved.

If I disable iptables, the issue goes away.

This doesn't work: docker run --rm -it alpine ping -c 1 8.8.8.8
This works: docker run --rm -it --network=host alpine ping -c 1 8.8.8.8

Also, I use service names as aliases, like in your case.

Before you deploy the stack to the swarm, create a Docker network with the overlay driver (note that network names must be unique). This creates an overlay network that spans the entire swarm.

We initially thought the connectivity loss was related to a Docker Swarm upgrade (specifically to 20.10).

iptables shows the MARK it sets on any traffic addressed to 10.0.0.4 (remember, this is one of the VIPs assigned to the vote service on the myoverlay1 network); in this case it sets the hex value 0x103.

Cheers, @txtdevelop!

I have the same problem on a fresh CentOS 8.3 install (also Stream, from netinstall). After swarm init (3 managers, no workers) and creating a test service, curl always fails on every node except the node where the replica is actually running. I set SELinux to permissive and disabled firewalld to rule them out as the cause.

Following my comment above, we recreated the ingress overlay network without the encrypted option (only the default options).

I believe it's because they automatically updated to a 5.x kernel.

From docker info: Debug Mode: false; Cgroup Version: 1; init version: de40ad0; Autolock Managers: false; Profile: default; Logging Driver: json-file
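The MARK value iptables displays is hexadecimal, while IPVS listings such as `ipvsadm -ln` show firewall-mark services in decimal (`FWM <n>`), so 0x103 corresponds to FWM 259. A minimal sketch for converting the mark, assuming rule lines that end in `MARK set 0x103` as printed by `iptables -t mangle -nL`; the regular expression is an assumption about that output format, and the function name is hypothetical.

```python
import re
from typing import Optional

# Assumed format: iptables -L prints the MARK target's action as "MARK set 0x...".
MARK_SET = re.compile(r"MARK set (0x[0-9a-fA-F]+)")


def fwmark_from_iptables(line: str) -> Optional[int]:
    """Return the firewall mark an iptables rule line sets, as a decimal int."""
    match = MARK_SET.search(line)
    return int(match.group(1), 16) if match else None


if __name__ == "__main__":
    rule = ("MARK       all  --  0.0.0.0/0            10.0.0.4"
            "             MARK set 0x103")
    # 0x103 is 259, the number an IPVS listing would show as "FWM 259".
    print(fwmark_from_iptables(rule))
```

Matching the decimal mark against the IPVS table is a quick way to confirm that traffic to the VIP is being steered to the expected backend set.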
From docker info: Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog; Snapshot Interval: 10000; Supports d_type: true

Without much proof, I'm inclined to say there is an incompatibility between the built-in encryption on overlay networks and their recreation after Docker or kernel upgrades.

To show the network troubleshooting, I am going to start a vote application with a replica count of 2 and a client container.

The swarm API address, on the other hand, is reachable.

It's not.

In my case it's a socket.io server service; these need a persistent connection, and they started losing connections after the migration to 20.10.

I've been trying this many times now. Did anything change regarding the default networks used for bip, gw or ingress? Tested with Docker 20.10 and 19.03.

Docker 20.10.x, by contrast, returns ip1 and ip2 alternately (round-robin).

I've attached a simplified stack.yaml file for reference.

There are errors in the logs.

The services running in the containers are not accessible through the swarm mode routing mesh, only through the explicit host IP.

After some investigation, we found that the problem is related to the UDP port 4789 packets that Docker uses to carry requests within the swarm: these packets are dropped by the source node and never reach the destination node.
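The UDP 4789 packets mentioned above are VXLAN frames (RFC 7348): each carries an 8-byte VXLAN header with an "I" flag and a 24-bit VXLAN Network Identifier (VNI). The sketch below packs and parses that header, which can help when checking captured payloads for the expected VNI. The example VNI 4096 is taken from the ingress network's `com.docker.network.driver.overlay.vxlanid_list` option in this thread; the function names are hypothetical.

```python
import struct

VXLAN_FLAG_I = 0x08  # "I" flag: the VNI field is valid (RFC 7348)


def build_vxlan_header(vni: int) -> bytes:
    """Pack an 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + 24 reserved bits.
    # Word 2: VNI (24 bits) + 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)


def parse_vxlan_header(header: bytes) -> int:
    """Return the VNI from the first 8 bytes of a VXLAN UDP payload."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_I:
        raise ValueError("I flag not set; not a valid VXLAN header")
    return vni_word >> 8
```

For VNI 4096 the header is `08 00 00 00 00 10 00 00`, so a capture on port 4789 whose payloads do not carry that VNI belongs to a different overlay network.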
Feb 24 04:07:50 p1 dockerd[10001]: time="2022-02-24T04:07:50.137140295Z" level=error msg="fatal task error" error="task: non-zero exit (139)" module=node/agent/taskmanager

Services lose connectivity between each other in swarm mode.

The problem appeared as soon as we updated from Debian 10 to 11 (which also jumps from kernel 4.19 to 5.10). In our 3-host swarm, all hosts struggle to send large packets to each other.

Step 1: Create the overlay network that will be used to start the vote and client applications.

A cloud VM installation on Hetzner also did not have this issue, despite the same versions.

NETWORK ID     NAME              DRIVER    SCOPE
a83971678cc1   bridge            bridge    local
2a694f044af5   docker_gwbridge   bridge    local
e7b636b2a419   host              host      local
42fc537a84ec   none              null      local

Client:
 Version: 20.10.3
 API version: 1.41
 Go version: go1.13.15
 Git commit: 48d30b5
 Built: Fri Jan 29 14:33:21 2021
 OS/Arch: linux/amd64
 Context: default
 Experimental: true

Engine:
 Version: 20.10.3
 API version: 1.41 (minimum version 1.12)
 Go version: go1.13.15
 Git commit: 46229ca
 Built: Fri Jan 29 14:31:32 2021
 OS/Arch: linux/amd64
 Experimental: false
containerd:
 Version: 1.4.3
 GitCommit: 269548fa27e0089a8b8278fc4fc781d7f65a939b
runc:
 Version: 1.0.0-rc92
 GitCommit: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
docker-init:
 Version: 0.19.0
 GitCommit: de40ad0

There is one thing that should be considered, though.

From docker info: Paused: 0

I use swarm, and I had network connectivity issues right after migrating to Docker 20.10.x. Published ports show as available with netstat -lt.

Firewalls usually forward or block, but they don't selectively drop connections when more data is transferred.
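Because the reports point to large packets being dropped while small ones get through, one way to confirm it is to search for the largest ping payload that still succeeds. The sketch below is a plain binary search over payload sizes; `ping_probe` wraps Linux iputils `ping` with `-M do` (set Don't Fragment, so oversized probes fail instead of being fragmented). Both function names are hypothetical, and the search assumes success is monotonic in size.

```python
import subprocess
from typing import Callable


def ping_probe(host: str) -> Callable[[int], bool]:
    """Build a probe that sends one non-fragmentable ping of a given payload size.

    Uses Linux iputils options: -M do (set DF), -c 1 (one packet), -W 1 (wait 1 s).
    """
    def probe(size: int) -> bool:
        cmd = ["ping", "-M", "do", "-c", "1", "-W", "1", "-s", str(size), host]
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        return result.returncode == 0
    return probe


def largest_working_size(probe: Callable[[int], bool], low: int, high: int) -> int:
    """Binary-search the largest size in [low, high] for which probe succeeds.

    Assumes probe(low) succeeds and that every size below a working one also works.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low
```

For example, `largest_working_size(ping_probe("10.202.0.10"), 0, 1422)` run inside a container on a 1450-byte overlay should return 1422 on a healthy path; a noticeably smaller result suggests packets are being dropped below the expected size.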
This seems to be a problem only with the VMware virtual NIC when used with the VMXNET3 driver.

These are the package versions we're using, but they are probably not the cause, since one environment has them and works.

64 bytes from my-web.1.1aj142fcfz7ltg0h23pc8om42.nginx-services (10.202.0.10): icmp_seq=1 ttl=64 time=0.606 ms

Step 5: Log in to each container to see the number of network interfaces present inside it.

You need to have some containers running to trigger overlay traffic.

So all I did was install Ubuntu and docker-ce.

Now, here comes the interesting part: Docker 19.03.x and Docker 20.10.x behave differently when resolving the IP of the host server1.

Yeah, it sounds like you don't have an overlay network attached to the containers in your swarm.

To resolve this issue, we had to disable the following offload feature: ethtool -K [network] tx-checksum-ip-generic off

Update: it fails when going through localhost, but it works when targeting a remote node.

With every cloud provider I tried (AWS) it works, and with Vagrant too, but where I actually need it (on-premise vSphere, VM installed via PXE/8-stream) it doesn't work. The failing OS was self-installed (not a cloud image); the kernel module setup is identical.

Keep a note of the IP address (10.0.0.2/32) assigned to the loopback interface of the client container; this is a VIP. These are the same IPs that are assigned to the lo interface of the vote application.

Everything worked as expected.

If it is later determined to be the same issue, we can rejoin them then.

I have tried disabling kernel updates, and will post my findings.

From docker info: app: Docker App (Docker Inc., v0.9.1-beta3); buildx: Build with BuildKit (Docker Inc., v0.5.1-docker); Managers: 1; Expiry Duration: 3 months
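Before applying the `ethtool -K [network] tx-checksum-ip-generic off` workaround reported in this thread, it can help to check the feature's current state. The sketch below parses the text printed by `ethtool -k <interface>`, assuming one `feature: on|off` pair per line, optionally followed by `[fixed]`; the function name and the sample interface name are hypothetical.

```python
from typing import Dict, Tuple


def parse_ethtool_features(output: str) -> Dict[str, Tuple[bool, bool]]:
    """Map feature name -> (enabled, fixed) from `ethtool -k` output."""
    features: Dict[str, Tuple[bool, bool]] = {}
    for line in output.splitlines():
        name, sep, rest = line.strip().partition(":")
        if not sep:
            continue  # no "name: value" pair on this line
        words = rest.split()
        if not words or words[0] not in ("on", "off"):
            continue  # e.g. the "Features for <interface>:" header line
        features[name] = (words[0] == "on", "[fixed]" in words)
    return features
```

If `tx-checksum-ip-generic` reports `(True, False)`, it is enabled and changeable, which makes it a candidate for the workaround; features marked `[fixed]` cannot be toggled with `ethtool -K`.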
This seems to be reboot-safe.

Interested to know if this is getting any attention.

Feb 24 04:07:51 p1 dockerd[10001]: time="2022-02-24T04:07:51.765275512Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint q1jpb5oeysmfn3k6uoq4p3maz 7848fd2496431f838fac9506f0f3f8e686a1fcdf7a54fbd6936b6b3c62ea0715], retrying."

The alternating behavior described above would certainly break it.

We recently encountered a similar issue on Azure (the issue happens only occasionally): they have no outside connections and can't ping.

For me it worked until "Docker version 20.10.4, build d3cb89e".

I'm encountering this same issue, with the caveat that the tx-checksum-ip-generic off fix doesn't seem to work for me; I'm still struggling with it on Debian Bullseye running on GCP.

OK, I've resolved the issue, but I'm not sure what the root cause was.

The way to test this is with tcpdump: when it's broken, you only see packets going out and no packets coming in.

traceroute to my-web.1.1aj142fcfz7ltg0h23pc8om42 (10.202.0.10), 30 hops max, 60 byte packets

root@d8766a2befac:/# dig my-web.1.1aj142fcfz7ltg0h23pc8om42
; <<>> DiG 9.11.5-P4-5.1+deb10u3-Debian <<>> my-web.1.1aj142fcfz7ltg0h23pc8om42
;; WHEN: Wed Feb 24 17:25:37 UTC 2021

The two backend IP addresses present correspond to the two vote container IPs.

For anyone needing it: (ETHTOOL_OPTS= does not seem to be recognized on CentOS 8 Stream when using NetworkManager)

Works for us on the Docker Swarm worker node with CentOS 8.3 and Docker 20.10.5. Thank you, @sgohl.

From docker info: Docker Root Dir: /var/lib/docker; Containers: 8
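The tcpdump test described in this thread (packets leaving but none arriving) can be summarised with a small counter. This sketch assumes capture lines in tcpdump's default `-n` IPv4 format, for example from `tcpdump -n -i <interface> udp port 4789`, where endpoints print as `a.b.c.d.port`; the format assumption and the function name are ours, and the node IP is a value you supply.

```python
import re
from collections import Counter
from typing import Iterable

# tcpdump -n prints IPv4 endpoints as a.b.c.d.port, e.g.
# "IP 192.168.37.201.38211 > 192.168.37.202.4789: VXLAN, ..."
FLOW = re.compile(r"IP (\d+\.\d+\.\d+\.\d+)\.\d+ > (\d+\.\d+\.\d+\.\d+)\.\d+:")


def count_directions(lines: Iterable[str], node_ip: str) -> Counter:
    """Count packets leaving and arriving at node_ip in tcpdump -n text.

    Lines that do not match the pattern, or do not involve node_ip, are ignored.
    """
    counts = Counter(outgoing=0, incoming=0)
    for line in lines:
        match = FLOW.search(line)
        if match is None:
            continue
        src, dst = match.groups()
        if src == node_ip:
            counts["outgoing"] += 1
        elif dst == node_ip:
            counts["incoming"] += 1
    return counts
```

A result with many outgoing and zero incoming packets on port 4789 matches the broken state described above, while a healthy node shows traffic in both directions.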
root@d8766a2befac:/# curl http://my-web.1.1aj142fcfz7ltg0h23pc8om42:8080/ --max-time 15
curl: (28) Connection timed out after 15001 milliseconds

rtt min/avg/max/mdev = 0.418/0.443/0.469/0.032 ms

PING my-web.1.1aj142fcfz7ltg0h23pc8om42 (10.202.0.10) 56(84) bytes of data.

On my server I'm running Ubuntu Linux 20.04. The symptom is that the overlay network doesn't work.

From docker info: Data Path Port: 4789; Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
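The curl timeout above is the typical symptom reported here: the published port answers on some nodes but not others. A sketch of a plain TCP connect check that can be run against the same published port on every node; the function names are hypothetical, and a successful handshake does not prove that larger HTTP responses make it back, since those may still be dropped.

```python
import socket
from typing import Dict, Iterable


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError)
        # and name-resolution failures.
        return False


def check_nodes(nodes: Iterable[str], port: int) -> Dict[str, bool]:
    """Probe the same published port on every node of the swarm."""
    return {node: tcp_reachable(node, port) for node in nodes}
```

For the ingress peers listed in this thread, that would be `check_nodes(["192.168.37.201", "192.168.37.202", "192.168.37.203"], 8080)`; with a healthy routing mesh every node should report `True`.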