Troubleshooting / Container Fails to Run Properly

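1. Container Restarts Repeatedly in a Docker Stack Environment

This issue is usually caused by incorrect configuration, firewall rules, or allowlist settings.

Symptoms:

  1. The page cannot be opened in a browser
  2. `sudo docker ps -a` shows that the container keeps restarting
  3. Running `curl http://localhost:8088` on the deployment server itself returns the error `curl: (7) Failed to connect to localhost port 8088: Connection refused`
  4. The log file keeps printing error stack traces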
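
The second symptom can be checked without reading the container list by eye. The following is a minimal sketch, not part of DataFlux Func: the function name `find_unstable_containers` is hypothetical, and it assumes the input text was produced by `sudo docker ps -a --format "{{.Names}}\t{{.Status}}"`, one container per line.

```python
def find_unstable_containers(ps_output):
    """Return (name, status) pairs whose status suggests a crash loop.

    Assumes tab-separated "name<TAB>status" lines, as printed by
    docker ps -a --format "{{.Names}}\t{{.Status}}".
    """
    unstable = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        # docker ps reports "Restarting (...)" for a container in a restart
        # loop and "Exited (...)" for one that has stopped.
        if status.startswith(("Restarting", "Exited")):
            unstable.append((name.strip(), status.strip()))
    return unstable
```

Any container listed by this function is a candidate for closer inspection with `sudo docker logs`.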

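Possible causes and solutions:

Cause                                                                                   Solution
-----                                                                                   --------
The configuration was edited manually and contains errors                               Review the modified configuration files, for example the YAML syntax and the database connection settings
The configuration points to an external server, but that server is not reachable        Check the firewall, Alibaba Cloud security group settings, database connection allowlists, and similar settings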
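
To tell a blocked network path apart from a service that is simply not listening, a plain TCP probe from the deployment server is often enough. This is a rough sketch under assumptions: `probe_tcp` is a hypothetical helper, and the host and port you pass in (for example your database address) come from your own configuration, not from this document.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Try to open a TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.gaierror:
        # The host name could not be resolved.
        return "dns-failure"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer at all; often a firewall or security group dropping packets.
        return "timeout"
    except OSError as exc:
        return "error: {}".format(exc)
```

A `timeout` result usually points toward firewall, security group, or allowlist rules, while `refused` suggests the target service itself is down or bound to a different port.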

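2. Container Does Not Exist in a Docker Stack Environment

This issue is usually caused by an incorrect runtime environment.

Symptoms:

  1. `sudo docker stack ls` lists dataflux-func
  2. `sudo docker ps -a` does not show the corresponding containers
  3. `sudo docker stack ps dataflux-func --no-trunc` shows the containers in an abnormal state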

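Possible causes and solutions:

Cause                                                Solution
-----                                                --------
The snap version of Docker is installed              Uninstall the snap version of Docker, then reinstall Docker from an official source or use the Docker bundled with the script
Other                                                Investigate based on the ERROR column in the output of `sudo docker stack ps dataflux-func --no-trunc`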
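
Both rows of the table can be checked with a few lines of code. The sketch below is illustrative only: `failed_tasks` and `looks_like_snap_docker` are hypothetical names, the first assumes input from `sudo docker stack ps dataflux-func --no-trunc --format "{{.Name}}\t{{.CurrentState}}\t{{.Error}}"`, and the second relies on the heuristic that snap packages expose their binaries under `/snap/`.

```python
def failed_tasks(stack_ps_output):
    """Return the tasks whose Error column is not empty.

    Assumes tab-separated "name<TAB>state<TAB>error" lines.
    """
    failures = []
    for line in stack_ps_output.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # not a complete task line, or no error column present
        name, state, error = (field.strip() for field in fields)
        if error:
            failures.append({"name": name, "state": state, "error": error})
    return failures


def looks_like_snap_docker(docker_path):
    """Heuristic: a docker binary under /snap/ was probably installed by snap."""
    return docker_path.startswith("/snap/")
```

The path for the second function can be obtained with `shutil.which("docker")` on the affected machine.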

A typical example is a container that cannot start because of insufficient disk space.
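
Free disk space can be confirmed quickly with the Python standard library. This is a small sketch with assumed values: `free_space_report` is a hypothetical helper, the 10% threshold is arbitrary, and the path to check depends on where Docker stores its data on your host (commonly `/var/lib/docker` on Linux).

```python
import shutil


def free_space_report(path="/", min_free_ratio=0.10):
    """Report free space for the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    # Guard against pseudo filesystems that report a total size of zero.
    free_ratio = usage.free / usage.total if usage.total else 0.0
    return {
        "path": path,
        "free_gib": round(usage.free / 2**30, 2),
        "free_ratio": round(free_ratio, 4),
        "low": free_ratio < min_free_ratio,
    }
```

If `low` is true for the Docker data directory, freeing space there is a reasonable first step before redeploying the stack.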

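3. Container Fails to Start in a k8s Environment

This issue is usually caused by a problem with the host machine or the k8s cluster.

The following errors may appear in k8s: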

Events:
  Type     Reason   Age                   From     Message
  ----     ------   ----                  ----     -------
  Warning  Failed   36m                   kubelet  Error: failed to start container "func-server": Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: rootfs_linux.go:60: mounting "/home/cce/kubelet/pods/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/volume-subpaths/user-config/func-server/1" to rootfs at "/home/cce/docker/overlay2/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/merged/data/user-config-template.yaml" caused: no such file or directory: unknown
  Warning  Failed   36m                   kubelet  Error: failed to start container "func-server": Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: rootfs_linux.go:60: mounting "/home/cce/kubelet/pods/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/volume-subpaths/user-config/func-server/1" to rootfs at "/home/cce/docker/overlay2/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/merged/data/user-config-template.yaml" caused: no such file or directory: unknown
  Warning  Failed   36m                   kubelet  Error: failed to start container "func-server": Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: rootfs_linux.go:60: mounting "/home/cce/kubelet/pods/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/volume-subpaths/user-config/func-server/1" to rootfs at "/home/cce/docker/overlay2/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/merged/data/user-config-template.yaml" caused: no such file or directory: unknown
  Normal   Created  35m (x5 over 118d)    kubelet  Created container func-server
  Warning  Failed   35m                   kubelet  Error: failed to start container "func-server": Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: rootfs_linux.go:60: mounting "/home/cce/kubelet/pods/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/volume-subpaths/user-config/func-server/1" to rootfs at "/home/cce/docker/overlay2/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/merged/data/user-config-template.yaml" caused: no such file or directory: unknown
  Normal   Pulled   33m (x6 over 118d)    kubelet  Container image "harbor.xxx.com/obs/dataflux/dataflux-func:2.7.0" already present on machine
  Warning  BackOff  2m8s (x157 over 36m)  kubelet  Back-off restarting failed container
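
When the event list is long, counting events by type and reason makes the dominant failure easier to spot. The sketch below assumes the text has the column layout shown above (such as the Events section printed by `kubectl describe pod`; the source does not name the command), with columns separated by two or more spaces. `summarize_events` is a hypothetical name.

```python
import re
from collections import Counter


def summarize_events(events_text):
    """Count events by (type, reason) in an Events table."""
    counts = Counter()
    for line in events_text.splitlines():
        # Columns are separated by runs of 2+ spaces; the Age column may
        # contain single spaces, e.g. "35m (x5 over 118d)".
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=4)
        if len(parts) != 5 or parts[0] not in ("Normal", "Warning"):
            continue  # skip the header, the separator row, and other text
        event_type, reason = parts[0], parts[1]
        counts[(event_type, reason)] += 1
    return counts
```

For the output above, the `Failed` warnings all carry the same mount error, which is the line to investigate on the host.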

One of the Func services may report the following error:

Traceback (most recent call last):
  File "_config.py", line 11, in <module>
    CONFIG = yaml_resource.load_config(os.path.join(BASE_PATH, './config.yaml'))
  File "/usr/src/app/worker/utils/yaml_resources.py", line 83, in load_config
    user_config_content = _f.read()
OSError: [Errno 5] Input/output error

This is not a problem with DataFlux Func. Check the host machine and the k8s cluster; if a NAS is involved, also check whether the NAS itself has a problem.
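
One way to confirm that the storage layer is at fault is to read the same file outside of DataFlux Func. The following is a minimal sketch: `check_readable` is a hypothetical helper, and the path you pass in is whatever file your deployment mounts as the user configuration.

```python
import errno


def check_readable(path):
    """Try to read a file fully and classify the result."""
    try:
        with open(path, "rb") as f:
            f.read()
        return True, "ok"
    except FileNotFoundError:
        return False, "missing"
    except OSError as exc:
        if exc.errno == errno.EIO:
            # Errno 5: the kernel could not complete the read, which points
            # at the underlying disk or network storage rather than the app.
            return False, "I/O error (check the underlying storage, e.g. NAS)"
        return False, "os error: {}".format(exc.strerror)
```

If this also fails with an I/O error when run on the host or in another pod using the same volume, the fault is very likely in the storage and not in the application.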