
This note records how to repair a faulty member of an etcd cluster.

  • Check the health status of the cluster members:
etcdctl --endpoints=https://172.19.121.60:2379 \
  --ca-file=/opt/kubernetes/ssl/ca.pem \
  --cert-file=/opt/kubernetes/ssl/etcd.pem \
  --key-file=/opt/kubernetes/ssl/etcd-key.pem cluster-health
  • The command returns output like the following:
member 179d106ba852032b is healthy: got healthy result from https://172.19.121.62:2379
member 1f4b42d355aaf7b9 is healthy: got healthy result from https://172.19.121.60:2379
member dcab8e2ed4e917fd is unreachable: no available published client urls
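
In a larger cluster, picking the faulty member ID out of this output by eye is error-prone. As a minimal sketch, the helper below filters `cluster-health` output for member lines that are not reported as healthy and prints their IDs. The function name `unreachable_members` is hypothetical (it is not part of etcdctl), and the sketch assumes the output format shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads `etcdctl cluster-health` output on stdin and
# prints the ID (second field) of every "member ..." line that does not
# contain " is healthy:". Assumes the output format shown above.
unreachable_members() {
    awk '$1 == "member" && $0 !~ / is healthy:/ { print $2 }'
}

# Example run against the output shown above.
printf '%s\n' \
    'member 179d106ba852032b is healthy: got healthy result from https://172.19.121.62:2379' \
    'member 1f4b42d355aaf7b9 is healthy: got healthy result from https://172.19.121.60:2379' \
    'member dcab8e2ed4e917fd is unreachable: no available published client urls' |
    unreachable_members
```

The printed ID is the value to pass to `member remove` in the next step.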
  • Remove the faulty member:
etcdctl --endpoints=https://172.19.121.60:2379 \
    --ca-file=/opt/kubernetes/ssl/ca.pem \
    --cert-file=/opt/kubernetes/ssl/etcd.pem \
    --key-file=/opt/kubernetes/ssl/etcd-key.pem \
    member remove dcab8e2ed4e917fd
  • Check and update the configuration on the faulty node:
# vim /etc/etcd/etcd.conf
ETCD_INITIAL_CLUSTER_STATE="new"
change to
ETCD_INITIAL_CLUSTER_STATE="existing"
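
The same change can be made without opening an editor. This is a minimal sketch, assuming the variable sits on its own line in the file exactly as shown above. The function name `set_cluster_state_existing` is hypothetical, and a `.bak` copy of the original file is kept.

```shell
#!/bin/sh
# Hypothetical helper: switch ETCD_INITIAL_CLUSTER_STATE from "new" to
# "existing" in the given config file. The original is saved as <file>.bak.
set_cluster_state_existing() {
    conf="$1"
    sed -i.bak \
        's/^ETCD_INITIAL_CLUSTER_STATE="new"/ETCD_INITIAL_CLUSTER_STATE="existing"/' \
        "$conf"
}

# Usage on the faulty node (path taken from the step above):
# set_cluster_state_existing /etc/etcd/etcd.conf
```

Lines that do not match the pattern are left untouched, so running the helper a second time has no further effect on the setting.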
  • Delete the old etcd data on the faulty node (stop etcd there first if it is still running):
rm -fr /var/lib/etcd/*
  • Add the node back to the cluster:
etcdctl --endpoints=https://172.19.121.60:2379 \
    --ca-file=/opt/kubernetes/ssl/ca.pem \
    --cert-file=/opt/kubernetes/ssl/etcd.pem \
    --key-file=/opt/kubernetes/ssl/etcd-key.pem \
    member add etcd-node2 https://172.19.121.61:2380
  • Start etcd on that node:
systemctl start etcd
  • Check the cluster status again:
etcdctl --endpoints=https://172.19.121.60:2379 \
  --ca-file=/opt/kubernetes/ssl/ca.pem \
  --cert-file=/opt/kubernetes/ssl/etcd.pem \
  --key-file=/opt/kubernetes/ssl/etcd-key.pem cluster-health
  • Output like the following means the cluster is healthy again:
member 179d106ba852032b is healthy: got healthy result from https://172.19.121.62:2379
member 1f4b42d355aaf7b9 is healthy: got healthy result from https://172.19.121.60:2379
member 793e602a959dabe5 is healthy: got healthy result from https://172.19.121.61:2379
cluster is healthy
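
For use in a script, the final check can be reduced to looking for the summary line. This is a minimal sketch, assuming the output ends with the exact line `cluster is healthy` as shown above. The function name `cluster_is_healthy` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `etcdctl cluster-health` output on stdin and
# succeeds (exit status 0) only if one line is exactly "cluster is healthy".
cluster_is_healthy() {
    grep -qx 'cluster is healthy'
}

# Usage with the command from the step above:
# etcdctl --endpoints=https://172.19.121.60:2379 \
#     --ca-file=/opt/kubernetes/ssl/ca.pem \
#     --cert-file=/opt/kubernetes/ssl/etcd.pem \
#     --key-file=/opt/kubernetes/ssl/etcd-key.pem cluster-health |
#     cluster_is_healthy && echo "repair complete"
```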