I. Core Goal and Concept Clarification
1. Core Goal
Enable `type: LoadBalancer` Services in K8s to connect, on demand, to multiple custom cloud platforms (e.g. "Cloud 1" and "Cloud 2"). Each Service selects its target cloud and load-balancing policy through annotations. Only LoadBalancer-type Services trigger the custom logic; all other Service types are unaffected.
2. Key Concept Corrections (Common Points of Confusion)
| Misleading statement | Precise definition |
|---|---|
| "Change the config so every Service goes through the load balancer" | K8s invokes the cloud-provider (CCM) logic only for `type: LoadBalancer` Services; ClusterIP/NodePort/ExternalName Services follow the native path entirely, so no extra isolation is needed |
| "A Service selects the custom cloud via taints" | Taints apply to node scheduling and have nothing to do with which cloud a Service targets; a Service passes its parameters (cloud platform name, load policy, etc.) through `annotations`, which the custom CCM parses before calling the matching cloud API |
| "The kube-controller-manager cloud config affects all Services" | The `--cloud-provider` flag only matters where cloud-infrastructure interaction is required; for Services that is only the LoadBalancer type, so all other Services are unaffected |
II. End-to-End Development Steps
1. Prerequisites
- Tech stack: Go 1.19+ (for CCM development), a K8s 1.24+ cluster, a private image registry (e.g. Harbor), and existing multi-cloud load-balancer APIs (REST/gRPC, with create/delete/update/query LB capabilities).
- Core dependencies: the K8s libraries `k8s.io/apimachinery`, `k8s.io/client-go`, `k8s.io/cloud-provider`, and friends.
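The dependency list above can be pinned in `go.mod`. A minimal sketch, assuming k8s 1.27-era module versions (the exact versions should track your cluster's minor release):

```plaintext
module github.com/your-name/my-cloud-controller-manager

go 1.20

require (
    k8s.io/api v0.27.4
    k8s.io/apimachinery v0.27.4
    k8s.io/client-go v0.27.4
    k8s.io/cloud-provider v0.27.4
    k8s.io/component-base v0.27.4
    k8s.io/klog/v2 v2.90.1
)
```

The `k8s.io/*` staging modules are versioned as `v0.<minor>.<patch>`, matching Kubernetes `v1.<minor>.<patch>`; mixing minors across these modules tends to break compilation.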
2. Custom CCM Plugin Development (Core Stage)
The CCM (Cloud Controller Manager) is the K8s component that bridges to cloud vendors. It is built against the standard K8s interfaces and implements the flow "parse Service annotations → select the cloud platform → call the matching cloud API → write the LB IP back".
2.1 Standardized Project Layout

```plaintext
my-cloud-controller-manager/
├── cmd/
│   └── manager/
│       └── main.go              # CCM entry point: registers the provider and starts up
├── pkg/
│   ├── cloudprovider/           # Core cloud-provider implementation
│   │   └── mycloud/
│   │       ├── client.go        # Multi-cloud API client (Cloud 1 / Cloud 2)
│   │       ├── loadbalancer.go  # LoadBalancer interface implementation (core logic)
│   │       └── provider.go      # Registers the custom cloud provider
│   └── utils/                   # Helper functions
│       ├── node.go              # Extract node IPs
│       └── service.go           # Parse Service annotations
├── deploy/                      # Deployment manifests
│   ├── rbac.yaml                # CCM RBAC permissions
│   └── ccm-deployment.yaml     # CCM Deployment
├── Dockerfile                   # Image build file
├── go.mod                       # Go dependency management
└── go.sum
```

2.2 Core Code
(1) Multi-Cloud API Client (pkg/cloudprovider/mycloud/client.go)
Wraps the calls to the "Cloud 1" and "Cloud 2" APIs and forwards annotation-derived parameters (LB spec, bandwidth, etc.):

```go
package mycloud

import (
	"bytes"
	"encoding/json"
	"errors"
	"net/http"
	"time"
)

// LBCreateResponse mirrors the cloud API's create-LB response.
type LBCreateResponse struct {
	LBId string `json:"lbId"`
	IP   string `json:"ip"`
	Err  string `json:"err"`
}

// MyLBClient is the multi-cloud client.
type MyLBClient struct {
	cloud1BaseURL string // Cloud 1 API endpoint
	cloud2BaseURL string // Cloud 2 API endpoint
	httpClient    *http.Client
}

// NewMyLBClient initializes the client.
func NewMyLBClient(cloud1URL, cloud2URL string) *MyLBClient {
	return &MyLBClient{
		cloud1BaseURL: cloud1URL,
		cloud2BaseURL: cloud2URL,
		httpClient:    &http.Client{Timeout: 30 * time.Second},
	}
}

// CreateCloud1LB calls Cloud 1's create-LB API.
func (c *MyLBClient) CreateCloud1LB(serviceName string, nodeIPs []string, port, nodePort int, spec, bandwidth string) (*LBCreateResponse, error) {
	reqBody := map[string]interface{}{
		"name":      serviceName,
		"nodes":     nodeIPs,
		"port":      port,
		"nodePort":  nodePort,
		"spec":      spec,
		"bandwidth": bandwidth,
	}
	return c.callLBAPI(c.cloud1BaseURL+"/api/v1/lb/create", "POST", reqBody)
}

// CreateCloud2LB calls Cloud 2's create-LB API.
func (c *MyLBClient) CreateCloud2LB(serviceName string, nodeIPs []string, port, nodePort int, spec, bandwidth string) (*LBCreateResponse, error) {
	reqBody := map[string]interface{}{
		"name":      serviceName,
		"nodes":     nodeIPs,
		"port":      port,
		"nodePort":  nodePort,
		"spec":      spec,
		"bandwidth": bandwidth,
	}
	return c.callLBAPI(c.cloud2BaseURL+"/api/v1/lb/create", "POST", reqBody)
}

// callLBAPI is the shared request helper.
func (c *MyLBClient) callLBAPI(url, method string, reqBody interface{}) (*LBCreateResponse, error) {
	reqBytes, err := json.Marshal(reqBody)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(method, url, bytes.NewBuffer(reqBytes))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := c.httpClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var lbResp LBCreateResponse
	if err := json.NewDecoder(resp.Body).Decode(&lbResp); err != nil {
		return nil, err
	}
	if lbResp.Err != "" {
		return nil, errors.New(lbResp.Err) // avoid fmt.Errorf with a non-constant format string
	}
	return &lbResp, nil
}

// Delete/update/query methods omitted.
```

(2) LoadBalancer Interface Implementation (pkg/cloudprovider/mycloud/loadbalancer.go)
Implements the standard K8s `cloudprovider.LoadBalancer` interface. Core logic: parse annotations → select the cloud → call its API → write the IP back:
```go
package mycloud

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/klog/v2"

	"github.com/your-name/my-cloud-controller-manager/pkg/utils"
)

type MyLoadBalancer struct {
	lbClient   *MyLBClient
	kubeClient kubernetes.Interface
}

func NewMyLoadBalancer(lbClient *MyLBClient, kubeClient kubernetes.Interface) *MyLoadBalancer {
	return &MyLoadBalancer{lbClient: lbClient, kubeClient: kubeClient}
}

// EnsureLoadBalancer is the core method. Note the standard interface has
// no "Create": K8s calls EnsureLoadBalancer repeatedly, so it must be idempotent.
func (m *MyLoadBalancer) EnsureLoadBalancer(
	ctx context.Context,
	clusterName string,
	service *v1.Service,
	nodes []*v1.Node,
) (*v1.LoadBalancerStatus, error) {
	// 1. Gate: only LoadBalancer Services carrying the enable annotation
	// trigger the custom logic.
	if service.Annotations["mycloud.com/enable"] != "true" {
		return nil, fmt.Errorf("my-cloud not enabled for service %s", service.Name)
	}

	// 2. Parse annotation parameters (cloud platform / spec / bandwidth).
	provider := service.Annotations["mycloud.com/provider"] // cloud1/cloud2
	lbSpec := service.Annotations["mycloud.com/lb-spec"]
	lbBandwidth := service.Annotations["mycloud.com/lb-bandwidth"]
	if provider == "" {
		return nil, fmt.Errorf("missing mycloud.com/provider annotation")
	}
	if lbSpec == "" {
		lbSpec = "standard"
	}
	if lbBandwidth == "" {
		lbBandwidth = "5M"
	}

	// 3. Extract node IPs and ports.
	nodeIPs := utils.ExtractNodeIPs(nodes)
	if len(nodeIPs) == 0 {
		return nil, fmt.Errorf("no node IPs extracted")
	}
	port := service.Spec.Ports[0].Port
	nodePort := service.Spec.Ports[0].NodePort
	if nodePort == 0 {
		return nil, fmt.Errorf("nodePort not allocated for service %s", service.Name)
	}

	// 4. Dispatch to the matching cloud platform's API.
	var lbResp *LBCreateResponse
	var err error
	switch provider {
	case "cloud1":
		lbResp, err = m.lbClient.CreateCloud1LB(service.Name, nodeIPs, int(port), int(nodePort), lbSpec, lbBandwidth)
	case "cloud2":
		lbResp, err = m.lbClient.CreateCloud2LB(service.Name, nodeIPs, int(port), int(nodePort), lbSpec, lbBandwidth)
	default:
		return nil, fmt.Errorf("unsupported provider: %s", provider)
	}
	if err != nil {
		return nil, err
	}

	// 5. Record the LB ID in a Service annotation for later delete/update.
	utils.SetLBIdAnnotation(service, lbResp.LBId)
	if _, err := m.kubeClient.CoreV1().Services(service.Namespace).Update(ctx, service, metav1.UpdateOptions{}); err != nil {
		klog.Warningf("failed to record LB ID on service %s: %v", service.Name, err) // log only; the LB itself exists
	}

	// 6. Return the LB IP; K8s writes it into the Service's EXTERNAL-IP.
	return &v1.LoadBalancerStatus{
		Ingress: []v1.LoadBalancerIngress{{IP: lbResp.IP}},
	}, nil
}

// Remaining interface methods (sketches):

func (m *MyLoadBalancer) GetLoadBalancer(ctx context.Context, clusterName string, service *v1.Service) (*v1.LoadBalancerStatus, bool, error) {
	// Query the cloud API by the LB ID annotation (omitted).
	return nil, false, nil
}

func (m *MyLoadBalancer) GetLoadBalancerName(ctx context.Context, clusterName string, service *v1.Service) string {
	return "mylb-" + string(service.UID)
}

func (m *MyLoadBalancer) UpdateLoadBalancer(ctx context.Context, clusterName string, service *v1.Service, nodes []*v1.Node) error {
	// Read the LB ID annotation → call the matching cloud's update API (omitted).
	return nil
}

func (m *MyLoadBalancer) EnsureLoadBalancerDeleted(ctx context.Context, clusterName string, service *v1.Service) error {
	// Read the LB ID annotation → call the matching cloud's delete API → remove the annotation (omitted).
	return nil
}
```

(3) Register the Custom Cloud Provider (pkg/cloudprovider/mycloud/provider.go)
```go
package mycloud

import (
	"io"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	cloudprovider "k8s.io/cloud-provider"
	"k8s.io/klog/v2"
)

type MyCloudProvider struct {
	loadBalancer *MyLoadBalancer
	initialized  bool
}

// NewMyCloudProvider initializes the cloud provider.
func NewMyCloudProvider(config io.Reader) (cloudprovider.Interface, error) {
	// 1. Load the in-cluster K8s config.
	kubeConfig, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}
	kubeClient, err := kubernetes.NewForConfig(kubeConfig)
	if err != nil {
		return nil, err
	}
	// 2. Initialize the multi-cloud client (replace with your real endpoints).
	lbClient := NewMyLBClient("http://cloud1-api:8080", "http://cloud2-api:8080")
	// 3. Initialize the LoadBalancer implementation.
	loadBalancer := NewMyLoadBalancer(lbClient, kubeClient)
	return &MyCloudProvider{
		loadBalancer: loadBalancer,
		initialized:  true,
	}, nil
}

// cloudprovider.Interface implementation: expose only the LoadBalancer capability.
func (m *MyCloudProvider) Initialize(_ cloudprovider.ControllerClientBuilder, _ <-chan struct{}) {}
func (m *MyCloudProvider) LoadBalancer() (cloudprovider.LoadBalancer, bool) {
	return m.loadBalancer, m.initialized
}
func (m *MyCloudProvider) Instances() (cloudprovider.Instances, bool)     { return nil, false }
func (m *MyCloudProvider) InstancesV2() (cloudprovider.InstancesV2, bool) { return nil, false }
func (m *MyCloudProvider) Zones() (cloudprovider.Zones, bool)             { return nil, false }
func (m *MyCloudProvider) Clusters() (cloudprovider.Clusters, bool)       { return nil, false }
func (m *MyCloudProvider) Routes() (cloudprovider.Routes, bool)           { return nil, false }
func (m *MyCloudProvider) ProviderName() string                           { return "my-cloud" }
func (m *MyCloudProvider) HasClusterID() bool                             { return true }

// init registers the provider automatically on import.
func init() {
	cloudprovider.RegisterCloudProvider("my-cloud", NewMyCloudProvider)
	klog.Info("my-cloud provider registered")
}
```

(4) CCM Entry Point (cmd/manager/main.go)
```go
package main

import (
	"os"

	"k8s.io/apimachinery/pkg/util/wait"
	cloudprovider "k8s.io/cloud-provider"
	"k8s.io/cloud-provider/app"
	cloudcontrollerconfig "k8s.io/cloud-provider/app/config"
	"k8s.io/cloud-provider/names"
	"k8s.io/cloud-provider/options"
	cliflag "k8s.io/component-base/cli/flag"
	"k8s.io/klog/v2"

	// Blank import triggers the provider's init() registration.
	_ "github.com/your-name/my-cloud-controller-manager/pkg/cloudprovider/mycloud"
)

func main() {
	ccmOptions, err := options.NewCloudControllerManagerOptions()
	if err != nil {
		klog.Fatalf("unable to initialize command options: %v", err)
	}
	// Defaults: provider name, single instance, Service controller only.
	ccmOptions.KubeCloudShared.CloudProvider.Name = "my-cloud"
	ccmOptions.Generic.LeaderElection.LeaderElect = false // enable in production
	ccmOptions.Generic.Controllers = []string{"service"}  // run only the service controller

	cloudInitializer := func(config *cloudcontrollerconfig.CompletedConfig) cloudprovider.Interface {
		cloud, err := cloudprovider.InitCloudProvider(
			config.ComponentConfig.KubeCloudShared.CloudProvider.Name,
			config.ComponentConfig.KubeCloudShared.CloudProvider.CloudConfigFile)
		if err != nil {
			klog.Fatalf("cloud provider could not be initialized: %v", err)
		}
		return cloud
	}

	// NOTE: this signature follows the v1.27-era k8s.io/cloud-provider sample
	// (sample/basic_main.go); it differs slightly between releases.
	command := app.NewCloudControllerManagerCommand(ccmOptions, cloudInitializer,
		app.DefaultInitFuncConstructors, names.CCMControllerAliases(),
		cliflag.NamedFlagSets{}, wait.NeverStop)
	if err := command.Execute(); err != nil {
		klog.Errorf("run CCM failed: %v", err)
		os.Exit(1)
	}
}
```

2.3 Helper Functions (pkg/utils/)
- node.go: extract node internal/external IPs;
- service.go: parse/set Service annotations (e.g. the LB ID).
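The two helpers can be sketched as below. To keep the sketch self-contained it mirrors the relevant `v1.Node`/`v1.Service` fields with plain local types and maps instead of importing `k8s.io/api`; the real utils package would operate on `*v1.Node` and `*v1.Service`, and the `mycloud.com/lb-id` annotation key is an assumed name:

```go
package main

import "fmt"

// Simplified mirror of the fields used from k8s.io/api/core/v1
// (assumption: real code uses *v1.Node and v1.NodeAddress).
type nodeAddress struct {
	Type    string // "InternalIP" or "ExternalIP"
	Address string
}
type node struct {
	Addresses []nodeAddress
}

// extractNodeIPs returns each node's first InternalIP,
// mirroring utils.ExtractNodeIPs in the article.
func extractNodeIPs(nodes []node) []string {
	var ips []string
	for _, n := range nodes {
		for _, addr := range n.Addresses {
			if addr.Type == "InternalIP" {
				ips = append(ips, addr.Address)
				break
			}
		}
	}
	return ips
}

const lbIDAnnotation = "mycloud.com/lb-id" // assumed key

// setLBIdAnnotation records the cloud LB ID in the annotations map,
// mirroring utils.SetLBIdAnnotation (which would write service.Annotations).
func setLBIdAnnotation(annotations map[string]string, lbID string) map[string]string {
	if annotations == nil {
		annotations = map[string]string{}
	}
	annotations[lbIDAnnotation] = lbID
	return annotations
}

func main() {
	nodes := []node{{Addresses: []nodeAddress{{"InternalIP", "10.0.0.11"}, {"ExternalIP", "203.0.113.5"}}}}
	fmt.Println(extractNodeIPs(nodes)) // [10.0.0.11]
	fmt.Println(setLBIdAnnotation(nil, "lb-123")[lbIDAnnotation]) // lb-123
}
```

Keeping these as pure functions makes them easy to unit-test without a cluster.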
3. Image Build and Deployment
3.1 Build the Image

```dockerfile
# Build stage
FROM golang:1.20-alpine AS builder
WORKDIR /workspace
COPY . .
ENV GOPROXY=https://goproxy.cn,direct
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o my-cloud-controller-manager ./cmd/manager

# Runtime stage
FROM alpine:3.18
COPY --from=builder /workspace/my-cloud-controller-manager /usr/bin/
RUN apk add --no-cache ca-certificates
ENTRYPOINT ["/usr/bin/my-cloud-controller-manager"]
```

Build and push the image:
```bash
docker build -t harbor.your-domain.com/my-ccm:v1 .
docker push harbor.your-domain.com/my-ccm:v1
```

3.2 Deploy the CCM to the K8s Cluster
(1) RBAC Permissions (deploy/rbac.yaml)

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-cloud-controller-manager
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: my-cloud-controller-manager-role
rules:
- apiGroups: [""]
  # services/status is needed so the service controller can write EXTERNAL-IP
  resources: ["services", "services/status", "nodes", "events"]
  verbs: ["get", "list", "watch", "update", "patch", "create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-cloud-controller-manager-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: my-cloud-controller-manager-role
subjects:
- kind: ServiceAccount
  name: my-cloud-controller-manager
  namespace: kube-system
```

(2) CCM Deployment (deploy/ccm-deployment.yaml)
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-cloud-controller-manager
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-ccm
  template:
    metadata:
      labels:
        app: my-ccm
    spec:
      serviceAccountName: my-cloud-controller-manager
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      containers:
      - name: my-ccm
        image: harbor.your-domain.com/my-ccm:v1
        imagePullPolicy: Always
        command:
        - /usr/bin/my-cloud-controller-manager
        - --cloud-provider=my-cloud
        - --leader-elect=false
        - --controllers=service
        - --v=4
        resources:
          limits:
            cpu: 100m
            memory: 128Mi
          requests:
            cpu: 50m
            memory: 64Mi
```

(3) Apply the Manifests
```bash
kubectl apply -f deploy/rbac.yaml
kubectl apply -f deploy/ccm-deployment.yaml

# Verify the CCM is running
kubectl get pods -n kube-system -l app=my-ccm
kubectl logs -n kube-system <my-ccm-pod-name>   # should log "my-cloud provider registered"
```

4. K8s Cluster Configuration
With an out-of-tree CCM, kube-controller-manager must hand cloud work off rather than load the provider itself; it has no built-in "my-cloud" provider, so the correct flag value is `external`. Edit the static-Pod manifest (/etc/kubernetes/manifests/kube-controller-manager.yaml):

```yaml
spec:
  containers:
  - name: kube-controller-manager
    command:
    - kube-controller-manager
    # Add/modify this flag; keep all other existing flags
    - --cloud-provider=external
```

Kubelets should likewise run with `--cloud-provider=external`. After the manifest is saved, kubelet restarts the kube-controller-manager Pod automatically and the change takes effect.
5. Test the Custom LoadBalancer Service
5.1 Deploy a Test Application

```yaml
# nginx-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-demo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-demo
  template:
    metadata:
      labels:
        app: nginx-demo
    spec:
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
        - containerPort: 80
```

5.2 Create the Custom LoadBalancer Services
```yaml
# nginx-lb-cloud1.yaml (targets Cloud 1)
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb-cloud1
  annotations:
    mycloud.com/enable: "true"
    mycloud.com/provider: "cloud1"
    mycloud.com/lb-spec: "standard"
    mycloud.com/lb-bandwidth: "10M"
spec:
  selector:
    app: nginx-demo
  ports:
  - port: 80
    targetPort: 80
  type: LoadBalancer
---
# nginx-lb-cloud2.yaml (targets Cloud 2)
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb-cloud2
  annotations:
    mycloud.com/enable: "true"
    mycloud.com/provider: "cloud2"
    mycloud.com/lb-spec: "high-performance"
    mycloud.com/lb-bandwidth: "20M"
spec:
  selector:
    app: nginx-demo
  ports:
  - port: 80
    targetPort: 80
  type: LoadBalancer
---
# nginx-clusterip.yaml: plain ClusterIP Service (unaffected)
apiVersion: v1
kind: Service
metadata:
  name: nginx-clusterip
spec:
  selector:
    app: nginx-demo
  ports:
  - port: 80
    targetPort: 80
  type: ClusterIP
```

5.3 Verify the Results
```bash
# Create the resources
kubectl apply -f nginx-deploy.yaml
kubectl apply -f nginx-lb-cloud1.yaml
kubectl apply -f nginx-lb-cloud2.yaml
kubectl apply -f nginx-clusterip.yaml

# Watch the Service status
watch kubectl get svc
# Expected:
# - nginx-lb-cloud1/nginx-lb-cloud2 get an EXTERNAL-IP equal to the LB IP returned by the respective cloud;
# - nginx-clusterip is a ClusterIP Service with no EXTERNAL-IP, completely unaffected.

# Access test
curl http://<nginx-lb-cloud1-EXTERNAL-IP>   # exercises Cloud 1's load policy
curl http://<nginx-lb-cloud2-EXTERNAL-IP>   # exercises Cloud 2's load policy
```

III. Key Control Logic and Fault Tolerance
1. Only LoadBalancer Services Trigger the Custom Logic
K8s invokes the CCM's LoadBalancer interface only for `type: LoadBalancer` Services; ClusterIP/NodePort and other types follow the native path entirely, with no extra isolation needed.
2. Fine-Grained Control (Only Explicitly Annotated LB Services)
The `mycloud.com/enable: "true"` annotation acts as a switch: only LB Services that opt in explicitly trigger the custom logic. Other LB Services get an error back (their EXTERNAL-IP stays pending), which leaves room to fall back to a native cloud vendor's logic where needed.
3. Fault Tolerance
- API call timeout/failure: the CCM retries and logs the error, so LB create/delete logic must be idempotent;
- Annotation parse failure: return a clear error and avoid a wasted API call;
- Service annotation update failure: log a warning only; the core LB logic is unaffected.
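To make the first bullet concrete: the service controller re-invokes EnsureLoadBalancer after every failure and on periodic resyncs, so the create path must tolerate repeats. A minimal query-before-create sketch, run against a hypothetical in-memory cloud API; the /query and /create routes, parameters, and JSON shapes are illustrative assumptions, not the real multi-cloud endpoints:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// lbRecord is the fake cloud's view of one load balancer.
type lbRecord struct {
	LBId string `json:"lbId"`
	IP   string `json:"ip"`
}

// newFakeCloud stands up an in-memory cloud LB API with
// /query?name= and /create?name= routes (hypothetical endpoints).
func newFakeCloud() *httptest.Server {
	var mu sync.Mutex
	lbs := map[string]lbRecord{}
	mux := http.NewServeMux()
	mux.HandleFunc("/query", func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		defer mu.Unlock()
		if rec, ok := lbs[r.URL.Query().Get("name")]; ok {
			json.NewEncoder(w).Encode(rec)
			return
		}
		http.Error(w, "not found", http.StatusNotFound)
	})
	mux.HandleFunc("/create", func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		defer mu.Unlock()
		name := r.URL.Query().Get("name")
		rec, ok := lbs[name]
		if !ok { // create at most once per name
			rec = lbRecord{LBId: "lb-" + name, IP: "203.0.113.10"}
			lbs[name] = rec
		}
		json.NewEncoder(w).Encode(rec)
	})
	return httptest.NewServer(mux)
}

// ensureLB makes "create" idempotent: query by name first and reuse
// an existing LB, so controller retries never produce duplicates.
func ensureLB(baseURL, name string) (lbRecord, error) {
	var rec lbRecord
	resp, err := http.Get(baseURL + "/query?name=" + name)
	if err != nil {
		return rec, err
	}
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusOK { // already exists: reuse it
		return rec, json.NewDecoder(resp.Body).Decode(&rec)
	}
	resp2, err := http.Post(baseURL+"/create?name="+name, "application/json", nil)
	if err != nil {
		return rec, err
	}
	defer resp2.Body.Close()
	return rec, json.NewDecoder(resp2.Body).Decode(&rec)
}

func main() {
	srv := newFakeCloud()
	defer srv.Close()
	first, _ := ensureLB(srv.URL, "nginx-lb-cloud1")
	second, _ := ensureLB(srv.URL, "nginx-lb-cloud1") // simulated retry
	fmt.Println(first.LBId == second.LBId)            // true
}
```

The same pattern applies to deletion: treat "LB already gone" as success so EnsureLoadBalancerDeleted can be retried safely.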
IV. Summary
- Development core: build a plugin on the standard K8s CCM framework, implement the LoadBalancer interface, parse Service annotations to distinguish the clouds, and call the matching cloud API to obtain an LB IP;
- Deployment core: run the CCM in the kube-system namespace with RBAC permissions, and configure kube-controller-manager to delegate to the external cloud provider;
- Control core: K8s natively distinguishes Service types, so only `type: LoadBalancer` reaches the CCM logic, and the annotation switch adds finer-grained control on top;
- Isolation core: the custom CCM implements only the LoadBalancer interface and reports every other capability as unimplemented, so the rest of the cluster (node management, storage, etc.) is unaffected.