K8s Distributed Dynamic Storage Volumes: the Rook-Ceph Approach

Rook itself supports multiple storage backends, such as Ceph, S3 object storage, and NFS, which makes it a good storage platform for Kubernetes.

This guide therefore uses the Rook project: https://github.com/rook/rook/

The version deployed here is v1.4.

Deployment

1. Preparation

Disk requirements

Make sure every node machine has at least one fresh raw disk (with no filesystem on it).

If the disk already contains a filesystem, run the following steps (this destroys all data on the disk):

dmsetup remove_all
dd if=/dev/zero of=/dev/<your-disk> bs=1M count=100
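Before wiping, you can check whether the disk actually carries a filesystem signature; a minimal check (the device name below is just a placeholder):

# A clean disk should show no FSTYPE or partition entries
lsblk -f /dev/<your-disk>
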
System requirements

The kernel must be at least version 4.10. With an older kernel, the PV and PVC are both created successfully but cannot be mounted into the pod. This happens because the Ceph mounts managed by Rook are namespaced and require the mount option namespace=xxxx; kernels that are too old do not support it, so the mount fails and the pod stays stuck in a starting state.

Kernel version on all machines in the cluster:

4.18.16-1.el7.elrepo.x86_64
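A quick way to confirm the running kernel on every node straight from kubectl (a small sketch using the standard nodeInfo fields):

kubectl get nodes -o custom-columns=NAME:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion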

2. Installation

Base components
# Install the base components
git clone https://github.com/rook/rook.git rook-git
cd rook-git
git checkout release-1.4
cd cluster/examples/kubernetes/ceph
kubectl create -f common.yaml
kubectl create -f operator.yaml
kubectl create -f cluster.yaml
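It takes a few minutes for the operator to bring the cluster up after cluster.yaml is applied; one way to watch progress (a sketch, assuming the default rook-ceph namespace):

# Watch the pods come up, then check the health the operator reports for the CephCluster
kubectl -n rook-ceph get pod -w
kubectl -n rook-ceph get cephcluster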
Eventually you should see pods like the following:
>>> kubectl -n rook-ceph get pod
NAME                                                              READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-2vrd9                                            3/3     Running     3          5h8m
csi-cephfsplugin-8r5vm                                            3/3     Running     3          5h55m
csi-cephfsplugin-m9qxq                                            3/3     Running     3          5h9m
csi-cephfsplugin-provisioner-598854d87f-dhnkk                     6/6     Running     0          21m
csi-cephfsplugin-provisioner-598854d87f-m9zpg                     6/6     Running     0          21m
csi-cephfsplugin-wk994                                            3/3     Running     3          5h12m
csi-rbdplugin-5zhcj                                               3/3     Running     3          5h55m
csi-rbdplugin-7mqh5                                               3/3     Running     3          5h12m
csi-rbdplugin-h9kph                                               3/3     Running     3          5h8m
csi-rbdplugin-phx9q                                               3/3     Running     3          5h9m
csi-rbdplugin-provisioner-dbc67ffdc-2zqhb                         6/6     Running     0          52m
csi-rbdplugin-provisioner-dbc67ffdc-njzbw                         6/6     Running     0          52m
rook-ceph-crashcollector-ywkf-node01-120.31.139.242-6f5874j9pjk   1/1     Running     0          76m
rook-ceph-crashcollector-ywkf-node02-120.31.139.187-7d8475zswn5   1/1     Running     0          20m
rook-ceph-crashcollector-ywkf-node03-120.31.139.245-6dccc9vw9j4   1/1     Running     0          21m
rook-ceph-crashcollector-ywkf-node04-120.31.68.226-f4fc47f7jtt6   1/1     Running     0          52m
rook-ceph-mds-myfs-a-89cd9bb89-c4v6n                              1/1     Running     0          20m
rook-ceph-mds-myfs-b-787f86fcb9-dnt67                             1/1     Running     0          20m
rook-ceph-mgr-a-578bf44795-8x487                                  1/1     Running     1          5h8m
rook-ceph-mon-a-7c94c87658-j7wd8                                  1/1     Running     0          5h8m
rook-ceph-mon-b-7775859d99-dvthw                                  1/1     Running     1          5h53m
rook-ceph-mon-c-7fc4c55b74-8fv8x                                  1/1     Running     0          5h9m
rook-ceph-operator-667756ddb6-gclrt                               1/1     Running     0          71m
rook-ceph-osd-0-746847b458-cd7b6                                  1/1     Running     0          5h12m
rook-ceph-osd-1-57f88d7848-pbmtk                                  1/1     Running     0          5h9m
rook-ceph-osd-2-54bc678f89-zxx54                                  1/1     Running     0          5h8m
rook-ceph-osd-3-65f78f7875-bk6qh                                  1/1     Running     1          5h47m
rook-ceph-osd-prepare-ywkf-node01-120.31.139.242-p7ggf            0/1     Completed   0          71m
rook-ceph-osd-prepare-ywkf-node02-120.31.139.187-w4899            0/1     Completed   0          71m
rook-ceph-osd-prepare-ywkf-node03-120.31.139.245-gzcxm            0/1     Completed   0          71m
rook-ceph-osd-prepare-ywkf-node04-120.31.68.226-v2g4w             0/1     Completed   0          71m
rook-ceph-tools-6b4889fdfd-5qzt9                                  1/1     Running     0          5h6m
rook-discover-7m662                                               1/1     Running     2          5h8m
rook-discover-k7tpv                                               1/1     Running     1          5h9m
rook-discover-nfg8l                                               1/1     Running     1          5h55m
rook-discover-tqgj6                                               1/1     Running     1          5h12m
>>> 
Configure an Ingress to access the dashboard
#ingress.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: rook-ceph-mgr-dashboard
  namespace: rook-ceph
  annotations:
    kubernetes.io/ingress.class: "nginx"
    kubernetes.io/tls-acme: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
    nginx.ingress.kubernetes.io/server-snippet: |
            proxy_ssl_verify off;
spec:
  tls:
   - hosts:
     - ywkf-ceph-dev.gz4399.com
     secretName: ywkf-ceph-dev.gz4399.com
  rules:
  - host: ywkf-ceph-dev.gz4399.com
    http:
      paths:
      - path: /
        backend:
          serviceName: rook-ceph-mgr-dashboard
          servicePort: https-dashboard
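Apply the Ingress and confirm it was created (the file name is whatever you saved the manifest as; ingress.yaml here is an example):

kubectl create -f ingress.yaml
kubectl -n rook-ceph get ingress rook-ceph-mgr-dashboard
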
# View the dashboard password
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo

# Change the dashboard password
# Changing it in the dashboard UI does not stick: the password is injected from a k8s secret,
# so it reverts once the pod restarts.
echo -n "<your-new-password>" | base64
kubectl -n rook-ceph edit secret rook-ceph-dashboard-password
# Edit:
# data:
#   password: MTIzNDU2
# setting password to the base64 value generated above
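Equivalently, the secret can be updated in one step with kubectl patch instead of editing it by hand; a sketch (the new password is a placeholder):

kubectl -n rook-ceph patch secret rook-ceph-dashboard-password \
  -p "{\"data\":{\"password\":\"$(echo -n '<your-new-password>' | base64)\"}}"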

Access the dashboard at http://ywkf-ceph-dev.gz4399.com/#/dashboard with the account admin and the password you just set.

You should see a page like this:

[screenshot: image-20200918161645711]

Install the toolbox

The toolbox is a container with the Ceph command-line tools, which makes it easier to troubleshoot and configure the cluster.

#toolbox.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rook-ceph-tools
  namespace: rook-ceph
  labels:
    app: rook-ceph-tools
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rook-ceph-tools
  template:
    metadata:
      labels:
        app: rook-ceph-tools
    spec:
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: rook-ceph-tools
        image: rook/ceph:master
        command: ["/tini"]
        args: ["-g", "--", "/usr/local/bin/toolbox.sh"]
        imagePullPolicy: IfNotPresent
        env:
          - name: ROOK_CEPH_USERNAME
            valueFrom:
              secretKeyRef:
                name: rook-ceph-mon
                key: ceph-username
          - name: ROOK_CEPH_SECRET
            valueFrom:
              secretKeyRef:
                name: rook-ceph-mon
                key: ceph-secret
        volumeMounts:
          - mountPath: /etc/ceph
            name: ceph-config
          - name: mon-endpoint-volume
            mountPath: /etc/rook
      volumes:
        - name: mon-endpoint-volume
          configMap:
            name: rook-ceph-mon-endpoints
            items:
            - key: data
              path: mon-endpoints
        - name: ceph-config
          emptyDir: {}
      tolerations:
        - key: "node.kubernetes.io/unreachable"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 5
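Once the toolbox Deployment is running, the ceph CLI can be used from inside it; a minimal sketch:

kubectl create -f toolbox.yaml
# Open the toolbox pod (selected by the app=rook-ceph-tools label) and check cluster health
TOOLS_POD=$(kubectl -n rook-ceph get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}')
kubectl -n rook-ceph exec -it "$TOOLS_POD" -- ceph status
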
Configure the StorageClass

The configuration below creates a StorageClass named ceph-disks, backed by a CephFilesystem, with support for native Kubernetes volume expansion.

apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph
spec:
  metadataPool:
    replicated: # 2 replicas
      size: 2
  dataPools:
    - replicated:
        size: 2
  preservePoolsOnDelete: false # remove the underlying pools automatically when the filesystem is deleted
  metadataServer:
    activeCount: 1
    activeStandby: true
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-disks # PVCs that request storageClassName ceph-disks get volumes provisioned automatically
# Change "rook-ceph" provisioner prefix to match the operator namespace if needed
provisioner: rook-ceph.cephfs.csi.ceph.com
allowVolumeExpansion: true # allow volume expansion
parameters:
  # clusterID is the namespace where operator is deployed.
  clusterID: rook-ceph

  # CephFS filesystem name into which the volume shall be created
  fsName: myfs

  # Ceph pool into which the volume shall be created
  # Required for provisionVolume: "true"
  pool: myfs-data0

  # Root path of an existing CephFS volume
  # Required for provisionVolume: "false"
  # rootPath: /absolute/path

  # The secrets contain Ceph admin credentials. These are generated automatically by the operator
  # in the same namespace as the cluster.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph

reclaimPolicy: Delete # PV reclaim behavior
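Apply the manifest and verify that both the filesystem and the StorageClass exist (the file name is an example):

kubectl create -f storageclass.yaml
kubectl -n rook-ceph get cephfilesystem myfs
kubectl get storageclass ceph-disks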

Patch the plugins for the master nodes

If the csi-cephfsplugin and csi-rbdplugin DaemonSets are not patched with the corresponding tolerations, Ceph volumes cannot be mounted on the master nodes; a sketch of such a patch is shown below.
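A minimal sketch of such a patch, assuming the masters carry the standard node-role.kubernetes.io/master:NoSchedule taint. Note that the operator may reconcile these DaemonSets and overwrite a manual patch; if your operator.yaml exposes CSI toleration settings (e.g. CSI_PLUGIN_TOLERATIONS), configuring them there is the more durable route.

# Tolerate the master taint so the CSI plugin DaemonSets are also scheduled on master nodes
# (this JSON patch replaces any tolerations already present on the pod template)
kubectl -n rook-ceph patch daemonset csi-cephfsplugin --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/tolerations","value":[{"key":"node-role.kubernetes.io/master","operator":"Exists","effect":"NoSchedule"}]}]'
kubectl -n rook-ceph patch daemonset csi-rbdplugin --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/tolerations","value":[{"key":"node-role.kubernetes.io/master","operator":"Exists","effect":"NoSchedule"}]}]'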


3. Usage

Create a PVC
# Create a CephFS-backed PVC
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: cephfs-test
spec:
  storageClassName: ceph-disks # the ceph-disks StorageClass created above
  # ReadWriteOnce (RWO): read-write, mountable by a single node only
  # ReadOnlyMany  (ROX): read-only, mountable by multiple nodes
  # ReadWriteMany (RWX): read-write, mountable by multiple nodes
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      # note the unit format: "GB" is not a valid suffix here
      storage: 4096Mi
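Apply it and check that the claim binds (the file name is an example):

kubectl create -f cephfs-test-pvc.yaml
# STATUS should change to Bound once the CSI provisioner has created the volume
kubectl get pvc cephfs-test
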
Create a pod that uses this PVC
apiVersion: v1
kind: Pod
metadata:
  name: temp-centos
spec:
  containers:
    - name: centos
      image: centos:7
      command: ["sh", "-c", "while true;do date;sleep 30;done"]
      imagePullPolicy: IfNotPresent
      volumeMounts:
        - name: cephfs-test # mounted at /cephfs-data
          mountPath: "/cephfs-data"
  volumes:
    - name: cephfs-test
      persistentVolumeClaim:
        claimName: cephfs-test
  restartPolicy: Always
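Once the pod is Running, you can verify that the CephFS volume is really mounted and writable (a quick check):

kubectl exec -it temp-centos -- df -h /cephfs-data
kubectl exec -it temp-centos -- sh -c 'echo hello > /cephfs-data/test && cat /cephfs-data/test'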

4. Performance benchmarks

RBD test (this storage class can only be consumed by a single container)
# Random small-block writes, 2G
fio -ioengine=libaio -bs=2k -direct=1 -thread -rw=randwrite -size=2G -filename=./a.img -name=Ceph_test -iodepth=8 -runtime=30
WRITE: bw=117KiB/s (119kB/s), 117KiB/s-117KiB/s (119kB/s-119kB/s), io=3520KiB (3604kB), run=30187-30187msec

# Random small-block reads, 2G
fio -ioengine=libaio -bs=2k -direct=1 -thread -rw=randread -size=2G -filename=./a.img -name=Ceph_test -iodepth=8 -runtime=30
READ: bw=7180KiB/s (7353kB/s), 7180KiB/s-7180KiB/s (7353kB/s-7353kB/s), io=210MiB (221MB), run=30004-30004msec

# Sequential large-block writes, 2G
fio -ioengine=libaio -bs=1024k -direct=1 -thread -rw=write -size=2G -filename=./a.img -name=Ceph_test -iodepth=8 -runtime=30
WRITE: bw=43.6MiB/s (45.7MB/s), 43.6MiB/s-43.6MiB/s (45.7MB/s-45.7MB/s), io=1327MiB (1391MB), run=30417-30417msec

# Sequential large-block reads, 2G
fio -ioengine=libaio -bs=1024k -direct=1 -thread -rw=read -size=2G -filename=./a.img -name=Ceph_test -iodepth=8 -runtime=30
READ: bw=105MiB/s (110MB/s), 105MiB/s-105MiB/s (110MB/s-110MB/s), io=2048MiB (2147MB), run=19527-19527msec
CephFilesystem storage class test (can be consumed by multiple containers at the same time)
# Random small-block writes, 2G
fio -ioengine=libaio -bs=2k -direct=1 -thread -rw=randwrite -size=2G -filename=./a.img -name=Ceph_test -iodepth=8 -runtime=30
WRITE: bw=704KiB/s (721kB/s), 704KiB/s-704KiB/s (721kB/s-721kB/s), io=20.6MiB (21.6MB), run=30020-30020msec

# Random small-block reads, 2G
fio -ioengine=libaio -bs=2k -direct=1 -thread -rw=randread -size=2G -filename=./a.img -name=Ceph_test -iodepth=8 -runtime=30
READ: bw=22.9MiB/s (24.0MB/s), 22.9MiB/s-22.9MiB/s (24.0MB/s-24.0MB/s), io=687MiB (720MB), run=30001-30001msec

# Sequential large-block writes, 2G
fio -ioengine=libaio -bs=1024k -direct=1 -thread -rw=write -size=2G -filename=./a.img -name=Ceph_test -iodepth=8 -runtime=30
WRITE: bw=53.4MiB/s (56.0MB/s), 53.4MiB/s-53.4MiB/s (56.0MB/s-56.0MB/s), io=1620MiB (1699MB), run=30329-30329msec

# Sequential large-block reads, 2G
fio -ioengine=libaio -bs=1024k -direct=1 -thread -rw=read -size=2G -filename=./a.img -name=Ceph_test -iodepth=8 -runtime=30
READ: bw=177MiB/s (186MB/s), 177MiB/s-177MiB/s (186MB/s-186MB/s), io=2048MiB (2147MB), run=11551-11551msec
Local SSD test (for reference)
# Random small-block writes, 2G
fio -ioengine=libaio -bs=2k -direct=1 -thread -rw=randwrite -size=2G -filename=./a.img -name=Ceph_test -iodepth=8 -runtime=30
WRITE: bw=5940KiB/s (6083kB/s), 5940KiB/s-5940KiB/s (6083kB/s-6083kB/s), io=174MiB (182MB), run=30001-30001msec

# Random small-block reads, 2G
fio -ioengine=libaio -bs=2k -direct=1 -thread -rw=randread -size=2G -filename=./a.img -name=Ceph_test -iodepth=8 -runtime=30
READ: bw=6658KiB/s (6818kB/s), 6658KiB/s-6658KiB/s (6818kB/s-6818kB/s), io=195MiB (205MB), run=30005-30005msec

# Sequential large-block writes, 2G
fio -ioengine=libaio -bs=1024k -direct=1 -thread -rw=write -size=2G -filename=./a.img -name=Ceph_test -iodepth=8 -runtime=30
WRITE: bw=101MiB/s (105MB/s), 101MiB/s-101MiB/s (105MB/s-105MB/s), io=2048MiB (2147MB), run=20373-20373msec

# Sequential large-block reads, 2G
fio -ioengine=libaio -bs=1024k -direct=1 -thread -rw=read -size=2G -filename=./a.img -name=Ceph_test -iodepth=8 -runtime=30
READ: bw=100MiB/s (105MB/s), 100MiB/s-100MiB/s (105MB/s-105MB/s), io=2048MiB (2147MB), run=20384-20384msec

5. Debugging

Mounting Ceph on the host
# 1. Inside the toolbox container, read the secret
cat /etc/ceph/keyring
# 2. Mount on the host
mount -t ceph -o mds_namespace=myfs,name=admin,secret=<secret from the keyring above> <any mon IP>:6789:/xxx /<your mount point>
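The mon addresses do not have to be guessed; they can be read from the cluster, for example via the mon Services or the rook-ceph-mon-endpoints ConfigMap that the toolbox also mounts (a sketch, assuming the standard app=rook-ceph-mon label):

kubectl -n rook-ceph get svc -l app=rook-ceph-mon
kubectl -n rook-ceph get configmap rook-ceph-mon-endpoints -o jsonpath='{.data.data}' && echo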
Updated: 2021-02-09