k8s-doc
1. Node join failure
Log output:
......
[control-plane] Creating static Pod manifest for "kube-controller-manager"
W0329 00:01:51.364121 19209 manifests.go:214] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[control-plane] Creating static Pod manifest for "kube-scheduler"
W0329 00:01:51.373807 19209 manifests.go:214] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://10.8.18.105:2379
with maintenance client: context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher
The key message "error execution phase check-etcd" shows that the join fails during the etcd health check: one of the endpoints recorded in the etcd cluster (https://10.8.18.105:2379 above) cannot be reached, so the master cannot rejoin the existing Kubernetes cluster. This usually means a control-plane node was removed or reinstalled while its old etcd member entry was left behind.
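Before removing anything, it can help to confirm which etcd endpoint is actually unreachable. The check below is only a sketch run from a surviving master: the etcd pod name and the failing endpoint are taken from the examples in these notes and must be adjusted to your cluster (it assumes the etcd image ships etcdctl on its PATH):
$ kubectl -n kube-system exec etcd-k8s-master01 -- etcdctl \
    --endpoints=https://10.8.18.105:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    endpoint health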
Solution:
1) List the etcd pods
[root@k8s-master01 ~]# kubectl get pods -n kube-system | grep etcd
etcd-k8s-master01 1/1 Running 4 4d20h
etcd-k8s-master03 1/1 Running 1 4d20h
2) Exec into an etcd container and remove the stale member
Pick either of the two etcd pods above and open a shell inside it with kubectl:
[root@k8s-master01 ~]# kubectl exec -it -n kube-system etcd-k8s-master01 sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
Inside the container, run the following steps:
## Set up the environment
# export ETCDCTL_API=3
# alias etcdctl='etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key'
## List the etcd cluster members
# etcdctl member list
a9b6a1341829d62a, started, k8s-master03, https://172.20.5.13:2380, https://172.20.5.13:2379, false
d1c737a26ea4dd70, started, k8s-master01, https://172.20.5.11:2380, https://172.20.5.11:2379, false
fe2d4a2a33304913, started, k8s-master02, https://172.20.5.12:2380, https://172.20.5.12:2379, false
## Remove the stale etcd member k8s-master02
# etcdctl member remove fe2d4a2a33304913
Member fe2d4a2a33304913 removed from cluster 36067d1f1ca3f1db
## Exit the container
# exit
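Optionally, verify from outside the pod that only the surviving members remain before re-joining. This is a sketch that assumes the same etcd pod name and certificate paths used above:
$ kubectl -n kube-system exec etcd-k8s-master01 -- sh -c \
    'ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
     --cacert=/etc/kubernetes/pki/etcd/ca.crt \
     --cert=/etc/kubernetes/pki/etcd/server.crt \
     --key=/etc/kubernetes/pki/etcd/server.key member list'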
3) Rejoin the cluster
Use kubeadm to add the k8s-master02 node back to the cluster. Before doing so, log in to the k8s-master02 server and clean up any previous kubeadm state:
$ kubeadm reset
Then try to join the Kubernetes cluster again:
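The exact join command is not recorded in these notes; a hypothetical control-plane join looks like the following, where the endpoint, token, CA hash and certificate key are placeholders that must be generated on an existing master (for example with "kubeadm token create --print-join-command" and "kubeadm init phase upload-certs --upload-certs"):
$ kubeadm join <control-plane-endpoint>:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <certificate-key>
The join then proceeds and prints roughly the following output: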
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.18" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
{"level":"warn","ts":"2020-12-22T11:26:23.560+0800","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///https://172.20.5.12:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[mark-control-plane] Marking the node k8s-master02 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node k8s-master02 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]

This node has joined the cluster and a new control plane instance was created:

* Certificate signing request was sent to apiserver and approval was received.
* The Kubelet was informed of the new secure connection details.
* Control plane (master) label and taint were applied to the new node.
* The Kubernetes control plane instances scaled up.
* A new etcd member was added to the local/stacked etcd cluster.

To start administering your cluster from this node, you need to run the following as a regular user:

    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config

Run 'kubectl get nodes' to see this node join the cluster.
[root@k8s-master01 ~]# kubectl get nodes
NAME           STATUS   ROLES    AGE     VERSION
k8s-master01   Ready    master   4d20h   v1.18.2
k8s-master02   Ready    master   7m38s   v1.18.2
k8s-master03   Ready    master   4d20h   v1.18.2
k8s-node01     Ready    worker   4d18h   v1.18.2
2. Service naming issue prevents ingress-nginx from mapping TCP ports
When mapping TCP ports through ingress-nginx, the target must be referenced by a Service name of the form xxx-svc, and the name must not contain underscores: Kubernetes resource names are DNS-1123 labels, which only allow lowercase letters, digits and hyphens.
# Wrong
metadata:
  name: imau1_conf
# Correct
metadata:
  name: imau1-conf
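For context, ingress-nginx exposes TCP services through a tcp-services ConfigMap whose values have the form "namespace/service:port" (and the controller must be started with --tcp-services-configmap pointing at it). A minimal sketch using the corrected Service name above; the namespace and the two port numbers are placeholders:
$ kubectl -n ingress-nginx create configmap tcp-services \
    --from-literal=3306="default/imau1-conf:3306"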
3. Deployment fails to create pods
Reference: https://developer.aliyun.com/article/791261
Summary: a Deployment failed to create its pod, and "kubectl describe deployment ${DEPLOY_NAME}" did not show the specific reason; the actual error was eventually found with "kubectl edit deployment ${DEPLOY_NAME}".
Pod creation through the Deployment fails
In the cluster, the Deployment started but never created a pod. "kubectl describe deployment ${DEPLOY_NAME}" shows the following, where the only hint is "ReplicaFailure True FailedCreate" with no reason for the failure:
> kubectl describe deployment ${DEPLOY_NAME}
----------------------------------------------
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetCreated
Available False MinimumReplicasUnavailable
ReplicaFailure True FailedCreate
OldReplicaSets: <none>
NewReplicaSet: james-mtfnwnbu4z7v5umk-67cc5d6b98 (0/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 40s deployment-controller Scaled up replica set james-mtfnwnbu4z7v5umk-67cc5d6b98 to 1
The actual reason is hidden in the Deployment object itself and can be seen with "kubectl edit deployment":
> kubectl edit deployment ${DEPLOY_NAME}
--------------------------------------------------------------
'pods "james-mtfnwnbu4z7v5umk-67cc5d6b98" is forbidden: error looking up service account ns-james/davis: serviceaccount "davis" not found'
The cause is now clear: the pod must be created with a specific serviceaccount, but that serviceaccount was never created in the cluster, so pod creation fails.
Create the serviceaccount
A namespace only gets a default serviceaccount; any other serviceaccount has to be created manually:
root@titum:~# kubectl create sa ${SA_NAME} -n ns-james
serviceaccount/davis created
root@titum:~# kubectl get sa -n ns-james
NAME SECRETS AGE
default 1 94s
davis 1 2s
Then redeploy the workload so the Deployment can create the pod; a sketch follows.
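A minimal way to force the Deployment to retry immediately (the name and namespace are the placeholders used above). Note that the ReplicaSet controller also retries failed pod creations on its own with backoff, so the pod may appear after a short wait even without this:
$ kubectl -n ns-james rollout restart deployment ${DEPLOY_NAME}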
4. Debug pod
kubectl run sugar --image=serialt/debug --restart=Never
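Once the pod is running, open a shell in it for troubleshooting and delete it when finished; this assumes the serialt/debug image ships a shell at sh:
$ kubectl exec -it sugar -- sh
$ kubectl delete pod sugar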