Ceph cluster
Creating a Ceph cluster
A Ceph cluster can be created with an ordinary (non-root) account.
```
export username="ceph-admin"
export passwd="ceph-admin"
export node1="node1"
export node2="node2"
export node3="node3"
export node1_ip="192.168.122.101"
export node2_ip="192.168.122.102"
export node3_ip="192.168.122.103"
```
Create the deployment user and passwordless SSH login
```
useradd ${username}
echo "${passwd}" | passwd --stdin ${username}
echo "${username} ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/${username}
chmod 0440 /etc/sudoers.d/${username}
sudo mkdir /etc/ceph
sudo chown -R ceph-admin.ceph-admin /etc/ceph
```
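The block above creates the deployment user but does not show the passwordless-SSH setup the heading refers to. A minimal sketch, assuming it is run on the deploy node as the ceph-admin user and that `ssh-copy-id` is installed; the host names come from the variables defined earlier:
```
# generate a key pair without a passphrase, then push the public key to each node
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
for node in ${node1} ${node2} ${node3}; do
    ssh-copy-id ${username}@${node}
done
```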
Install ceph-deploy and upgrade pip
```
sudo yum install -y python-pip
pip install --upgrade pip
pip install ceph-deploy
```
Deploy the nodes
Create a working directory; deploying the nodes generates a lot of files and output there.
```
mkdir my-cluster
cd my-cluster
ceph-deploy new $node1 $node2 $node3
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
...
[node2][INFO ] Running command: /usr/sbin/ip addr show
[ceph_deploy.new][DEBUG ] Writing initial config to ceph.conf
...
```
Edit the ceph.conf configuration file and add the cluster and public networks
```
# ls
ceph.conf  ceph-deploy-ceph.log  ceph.mon.keyring
vim ceph.conf
[global]
fsid = 07ef58d8-3457-4cac-aa45-95166c738c16
mon_initial_members = node1, node2, node3
mon_host = 192.168.122.101,192.168.122.102,192.168.122.103
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public network = 192.168.122.0/24
cluster network = 192.168.122.0/24
```
Install the Ceph packages
Using a mirror repository is recommended.
This replaces `ceph-deploy install node1 node2`, but the commands below have to be run on every node.
```
sudo wget -O /etc/yum.repos.d/ceph.repo https://raw.githubusercontent.com/aishangwei/ceph-demo/master/ceph-deploy/ceph.repo
sudo yum install -y ceph ceph-radosgw
```
Configure the initial monitor(s) and generate all keys
```
ceph-deploy mon create-initial
ls -l *.keyring
-rw------- 1 root root 71 3月 12 12:53 ceph.bootstrap-mds.keyring
-rw------- 1 root root 71 3月 12 12:53 ceph.bootstrap-mgr.keyring
-rw------- 1 root root 71 3月 12 12:53 ceph.bootstrap-osd.keyring
-rw------- 1 root root 71 3月 12 12:53 ceph.bootstrap-rgw.keyring
-rw------- 1 root root 63 3月 12 12:53 ceph.client.admin.keyring
-rw------- 1 root root 73 3月 12 12:50 ceph.mon.keyring
```
Copy the configuration to each node
```
ceph-deploy admin $node1 $node2 $node3
```
Configure the OSDs
```
for node in node{1..3};do ceph-deploy disk zap $node /dev/vdc;done
for node in node{1..3};do ceph-deploy osd create $node --data /dev/vdc;done
```
Deploy mgr
```
ceph-deploy mgr create node{1..3}
```
Enable the dashboard module so the cluster can be viewed in a UI
```
sudo ceph mgr module enable dashboard
curl http://localhost:7000
```
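If the dashboard does not answer on port 7000, on Luminous the listen address and port can usually be set through `config-key`; treat the exact key names below as an assumption and check the dashboard documentation for your release:
```
sudo ceph config-key set mgr/dashboard/server_addr 0.0.0.0   # assumed Luminous-style dashboard settings
sudo ceph config-key set mgr/dashboard/server_port 7000
sudo systemctl restart ceph-mgr@node1                        # restart the active mgr so the change takes effect
```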
Create a Ceph block-client user and its authentication key
```
sudo ceph auth get-or-create client.rbd mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=rbd'|tee ./ceph.client.rbd.keyring
```
- Copy the keyring file to the clients
```
for node in node{1..3};do scp ceph.client.rbd.keyring /etc/ceph/ceph.conf $node:/etc/ceph/;done
```
Creating pools
Before creating a pool, the default pg_num usually needs to be overridden. The official recommendations are:
- Fewer than 5 OSDs: set pg_num to 128.
- 5 to 10 OSDs: set pg_num to 512.
- 10 to 50 OSDs: set pg_num to 4096.
- More than 50 OSDs: use pgcalc to work out the value.
The PG and PGP counts must be sized according to the number of OSDs. The formula is below; round the result to the nearest (or an exact) power of two.
Total PGs = (Total_number_of_OSD * 100) / max_replication_count
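As a quick illustration of the formula (the rounding helper below is a hypothetical example, using the 15-OSD, 3-replica case worked through later in this section):
```
# compute Total PGs and round up to the nearest power of two
osds=15; replicas=3
pgs=$(( osds * 100 / replicas ))                              # 15 * 100 / 3 = 500
pow2=1; while [ $pow2 -lt $pgs ]; do pow2=$(( pow2 * 2 )); done
echo "raw=$pgs  rounded=$pow2"                                # raw=500  rounded=512
```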
修改ceph.conf文件
[ceph-admin@v31 my-cluster]$ cat ceph.conf[global]fsid = 61b3125d-1a74-4901-997e-2cb4625367abmon_initial_members = v31, v32, v33mon_host = 192.168.4.31,192.168.4.32,192.168.4.33auth_cluster_required = cephxauth_service_required = cephxauth_client_required = cephxosd pool default pg num = 1024osd pool default pgp num = 1024[ceph-admin@v31 my-cluster]$ ceph-deploy --overwrite-conf config push v31 v32 v33[ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph-admin/.cephdeploy.conf[ceph_deploy.cli][INFO ] Invoked (2.0.1): /bin/ceph-deploy --overwrite-conf config push v31 v32 v33[ceph_deploy.cli][INFO ] ceph-deploy options:[ceph_deploy.cli][INFO ] username : None[ceph_deploy.cli][INFO ] verbose : False[ceph_deploy.cli][INFO ] overwrite_conf : True[ceph_deploy.cli][INFO ] subcommand : push[ceph_deploy.cli][INFO ] quiet : False[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f89fc4c9128>[ceph_deploy.cli][INFO ] cluster : ceph[ceph_deploy.cli][INFO ] client : ['v31', 'v32', 'v33'][ceph_deploy.cli][INFO ] func : <function config at 0x7f89fc6f7c08>[ceph_deploy.cli][INFO ] ceph_conf : None[ceph_deploy.cli][INFO ] default_release : False[ceph_deploy.config][DEBUG ] Pushing config to v31[v31][DEBUG ] connection detected need for sudo[v31][DEBUG ] connected to host: v31[v31][DEBUG ] detect platform information from remote host[v31][DEBUG ] detect machine type[v31][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf[ceph_deploy.config][DEBUG ] Pushing config to v32[v32][DEBUG ] connection detected need for sudo[v32][DEBUG ] connected to host: v32[v32][DEBUG ] detect platform information from remote host[v32][DEBUG ] detect machine type[v32][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf[ceph_deploy.config][DEBUG ] Pushing config to v33[v33][DEBUG ] connection detected need for sudo[v33][DEBUG ] connected to host: v33[v33][DEBUG ] detect platform information from remote host[v33][DEBUG ] detect machine type[v33][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
- Do not edit /etc/ceph/ceph.conf directly on an individual node. Edit ceph.conf on the deploy node and push it out; that is both more convenient and safer. After the change, push the conf file to every node and then restart the monitor service on each node:
```
ceph-deploy --overwrite-conf config push v31 v32 v33
systemctl restart ceph-mon@{hostname}.service
```
For example, with 15 OSDs and a replica count of 3 the formula gives 500; the nearest power of two is 512, so pg_num and pgp_num of the pool (volumes) should both be set to 512.
```
ceph osd pool set volumes pg_num 512
ceph osd pool set volumes pgp_num 512
```
Ceph pools come in two types, replicated pools and erasure-coded (EC) pools, and their creation differs slightly.
Create a replicated pool
```
ceph osd pool create testpool 128 128
pool 'testpool' created
```
Create an EC pool
Set up an erasure-code profile
```
[root@v31 ~]# ceph osd erasure-code-profile set EC-profile k=3 m=1 ruleset-failure-domain=osd
[root@v31 ~]# ceph osd erasure-code-profile get EC-profile
crush-device-class=
crush-failure-domain=osd
crush-root=default
jerasure-per-chunk-alignment=false
k=3
m=1
plugin=jerasure
technique=reed_sol_van
w=8
```
Create the pool
```
[root@v31 ~]# ceph osd pool create ecpool 1024 1024 erasure EC-profile
For better initial performance on pools expected to store a large number of objects, consider supplying the expected_num_objects parameter when creating the pool.
[root@v31 ~]# ceph df
GLOBAL:
    SIZE     AVAIL    RAW USED  %RAW USED
    8.17TiB  8.13TiB  36.3GiB   0.43
POOLS:
    NAME    ID  USED     %USED  MAX AVAIL  OBJECTS
    kube    14  1.55GiB  0.06   2.57TiB    612
    ecpool  20  0B       0      5.79TiB    0
```
```
$ sudo ceph osd pool create pool-name pg_num pgp_num erasure
```
For example:
```
$ ceph osd pool create ecpool 12 12 erasure
pool 'ecpool' created
```
Create MDS and a Ceph filesystem (CephFS)
Create the MDS service
To use CephFS, the cluster must have an MDS service.
```
[ceph-admin@v31 my-cluster]$ ceph-deploy mds create v33
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph-admin/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /bin/ceph-deploy mds create v33
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] subcommand : create
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7fbc3c5e05f0>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] func : <function mds at 0x7fbc3c82eed8>
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] mds : [('v33', 'v33')]
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.mds][DEBUG ] Deploying mds, cluster ceph hosts v33:v33
[v33][DEBUG ] connection detected need for sudo
[v33][DEBUG ] connected to host: v33
[v33][DEBUG ] detect platform information from remote host
[v33][DEBUG ] detect machine type
[ceph_deploy.mds][INFO ] Distro info: CentOS Linux 7.6.1810 Core
[ceph_deploy.mds][DEBUG ] remote host will use systemd
[ceph_deploy.mds][DEBUG ] deploying mds bootstrap to v33
[v33][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[v33][WARNIN] mds keyring does not exist yet, creating one
[v33][DEBUG ] create a keyring file
[v33][DEBUG ] create path if it doesn't exist
[v33][INFO ] Running command: sudo ceph --cluster ceph --name client.bootstrap-mds --keyring /var/lib/ceph/bootstrap-mds/ceph.keyring auth get-or-create mds.v33 osd allow rwx mds allow mon allow profile mds -o /var/lib/ceph/mds/ceph-v33/keyring
[v33][INFO ] Running command: sudo systemctl enable ceph-mds@v33
[v33][WARNIN] Created symlink from /etc/systemd/system/ceph-mds.target.wants/ceph-mds@v33.service to /usr/lib/systemd/system/ceph-mds@.service.
[v33][INFO ] Running command: sudo systemctl start ceph-mds@v33
[v33][INFO ] Running command: sudo systemctl enable ceph.target
```
Create the pools
```
[ceph-admin@v31 my-cluster]$ ceph osd pool create cluster_data_metadata 1024 1024
For better initial performance on pools expected to store a large number of objects, consider supplying the expected_num_objects parameter when creating the pool.
[ceph-admin@v31 my-cluster]$ ceph osd pool create cluster_data 1024 1024
For better initial performance on pools expected to store a large number of objects, consider supplying the expected_num_objects parameter when creating the pool.
[ceph-admin@v31 my-cluster]$ ceph fs new cephfs cluster_data_metadata cluster_data
new fs with metadata pool 11 and data pool 12
[ceph-admin@v31 my-cluster]$ ceph df
GLOBAL:
    SIZE     AVAIL    RAW USED  %RAW USED
    8.17TiB  8.14TiB  30.7GiB   0.37
POOLS:
    NAME                   ID  USED  %USED  MAX AVAIL  OBJECTS
    cluster_data_metadata  11  0B    0      2.58TiB    0
    cluster_data           12  0B    0      2.58TiB    0
[ceph-admin@v31 my-cluster]$ ceph mds stat
cephfs-0/0/1 up
[ceph-admin@v31 my-cluster]$ ceph osd pool ls
cluster_data_metadata
cluster_data
[ceph-admin@v31 my-cluster]$ ceph fs ls
name: cephfs, metadata pool: cluster_data_metadata, data pools: [cluster_data ]
```
If pg_num is too high for the number of OSDs, pool creation fails with ERANGE and a smaller pg_num has to be used:
```
[ceph-admin@v31 ~]$ ceph osd pool create cluster_data_metadata 1024 1024 replicated_rule 1
pool 'cluster_data_metadata' created
[ceph-admin@v31 ~]$ ceph osd pool create cluster_data_data 1024 1024 replicated_rule 1
Error ERANGE: pg_num 1024 size 3 would mean 9216 total pgs, which exceeds max 7500 (mon_max_pg_per_osd 250 * num_in_osds 30)
[ceph-admin@v31 ~]$ ceph df
GLOBAL:
    SIZE     AVAIL    RAW USED  %RAW USED
    8.17TiB  8.13TiB  36.4GiB   0.43
POOLS:
    NAME                   ID  USED     %USED  MAX AVAIL  OBJECTS
    kube                   14  1.57GiB  0.06   2.57TiB    614
    cluster_data_metadata  21  0B       0      2.57TiB    0
[ceph-admin@v31 ~]$ ceph osd pool create cluster_data_data 1024 replicated_rule 1
Error ERANGE: pg_num 1024 size 3 would mean 9216 total pgs, which exceeds max 7500 (mon_max_pg_per_osd 250 * num_in_osds 30)
[ceph-admin@v31 ~]$ ceph osd pool create cluster_data_data 100 replicated_rule 1
pool 'cluster_data_data' created
[ceph-admin@v31 ~]$ ceph df
GLOBAL:
    SIZE     AVAIL    RAW USED  %RAW USED
    8.17TiB  8.13TiB  36.4GiB   0.43
POOLS:
    NAME                   ID  USED     %USED  MAX AVAIL  OBJECTS
    kube                   14  1.57GiB  0.06   2.57TiB    614
    cluster_data_metadata  21  0B       0      2.57TiB    0
    cluster_data_data      22  0B       0      2.57TiB    0
```
Create OSD storage pools
```
ceph osd pool create rbd 50
ceph osd pool create kube 50
# tag the pools with an application so the cluster stops warning about untagged pools
ceph osd pool application enable kube mon
ceph osd pool application enable rbd mon
```
Create a user (optional)
```
ceph auth get-or-create client.cephfs mon 'allow r' mds 'allow r, allow rw path=/' osd 'allow rw pool=cephfs_data' -o ceph.client.cephfs.keyring
scp ceph.client.cephfs.keyring <node>:/etc/ceph/
```
Get the client key on the corresponding Ceph server
```
ceph auth get-key client.cephfs
```
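For later mounting it can be handy to store this key in a plain secret file. A small sketch; the path /etc/ceph/cephfs.secret is a hypothetical choice, and note that the client.cephfs caps above must actually match the data pool name of your filesystem:
```
ceph auth get-key client.cephfs | sudo tee /etc/ceph/cephfs.secret
sudo chmod 600 /etc/ceph/cephfs.secret   # can then be used with mount -o name=cephfs,secretfile=/etc/ceph/cephfs.secret
```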
The admin account's keyring can also be used directly here.
```
cat ceph.client.admin.keyring
[client.admin]
key = AQBcs4hcKPiaChAAk12oiD79FpIjeGo1PmXFXw==
```
Mount CephFS through the kernel driver
Install ceph-fuse
```
yum install ceph-fuse -y
```
Confirm that the kernel has loaded the ceph module
```
lsmod | grep ceph
ceph                  358802  0
libceph               306625  1 ceph
dns_resolver           13140  2 nfsv4,libceph
libcrc32c              12644  4 ip_vs,libceph,nf_nat,nf_conntrack
```
Create the mount directory
```
mkdir -p /data
```
Mount
```
[ceph-admin@v31 my-cluster]$ sudo mount -t ceph v31:6789:/ /data -o name=admin,secret=AQBcs4hcKPiaChAAk12oiD79FpIjeGo1PmXFXw==
[ceph-admin@v31 my-cluster]$ df -Th |grep ceph
192.168.4.31:6789:/ ceph      2.6T     0  2.6T   0% /data
```
Add an entry to /etc/fstab
```
[ceph-admin@v31 my-cluster]$ cd /etc/ceph/
[ceph-admin@v31 ceph]$ cp ceph.client.admin.keyring cephfs.key
[ceph-admin@v31 ceph]$ vim cephfs.key        # keep only the key itself in this file
AQBcs4hcKPiaChAAk12oiD79FpIjeGo1PmXFXw==
echo "v31:6789:/ /data ceph name=admin,secretfile=/etc/ceph/cephfs.key,noatime,_netdev 0 0 " >>/etc/fstab
```
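To confirm the fstab entry is valid before the next reboot, a quick verification sketch (assuming the manual mount above is still active):
```
sudo umount /data      # drop the manual mount first
sudo mount -a          # mounts everything in /etc/fstab and reports errors in the new entry
df -Th | grep ceph
```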
CephFS performance testing
fio
随机读测试
[root@v31 ~]# fio -filename=/mnt/data/test1 -direct=1 -iodepth 1 -thread -rw=randread -ioengine=psync -bs=16k -size=10G -numjobs=10 -runtime=1000 -group_reporting -name=mytestmytest: (g=0): rw=randread, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1...fio-3.1Starting 10 threadsmytest: Laying out IO file (1 file / 10240MiB)Jobs: 8 (f=8): [r(1),_(1),r(2),_(1),r(5)][99.8%][r=160MiB/s,w=0KiB/s][r=10.2k,w=0 IOPS][eta 00m:02s]mytest: (groupid=0, jobs=10): err= 0: pid=3824106: Tue Mar 26 09:13:04 2019read: IOPS=7359, BW=115MiB/s (121MB/s)(100GiB/890546msec)clat (usec): min=155, max=215229, avg=1355.08, stdev=1870.48lat (usec): min=155, max=215229, avg=1355.40, stdev=1870.48clat percentiles (usec):| 1.00th=[ 200], 5.00th=[ 217], 10.00th=[ 231], 20.00th=[ 265],| 30.00th=[ 486], 40.00th=[ 578], 50.00th=[ 660], 60.00th=[ 799],| 70.00th=[ 1037], 80.00th=[ 1893], 90.00th=[ 3982], 95.00th=[ 5080],| 99.00th=[ 7701], 99.50th=[ 9110], 99.90th=[15664], 99.95th=[19530],| 99.99th=[28705]bw ( KiB/s): min= 3040, max=28610, per=10.01%, avg=11782.72, stdev=3610.33, samples=17792iops : min= 190, max= 1788, avg=736.38, stdev=225.63, samples=17792lat (usec) : 250=16.50%, 500=14.76%, 750=25.96%, 1000=11.70%lat (msec) : 2=11.60%, 4=9.52%, 10=9.62%, 20=0.30%, 50=0.04%lat (msec) : 100=0.01%, 250=0.01%cpu : usr=0.39%, sys=1.82%, ctx=6694389, majf=0, minf=5367IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%issued rwt: total=6553600,0,0, short=0,0,0, dropped=0,0,0latency : target=0, window=0, percentile=100.00%, depth=1Run status group 0 (all jobs):READ: bw=115MiB/s (121MB/s), 115MiB/s-115MiB/s (121MB/s-121MB/s), io=100GiB (107GB), run=890546-890546msec
Sequential read test
```
[root@v33 ~]# fio -filename=/mnt/data/test2 -direct=1 -iodepth 1 -thread -rw=read -ioengine=psync -bs=16k -size=10G -numjobs=30 -runtime=1000 -group_reporting -name=mytest
mytest: (g=0): rw=read, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
...
fio-3.1
Starting 30 threads
mytest: Laying out IO file (1 file / 10240MiB)
Jobs: 30 (f=30): [R(30)][100.0%][r=138MiB/s,w=0KiB/s][r=8812,w=0 IOPS][eta 00m:00s]
mytest: (groupid=0, jobs=30): err= 0: pid=411789: Tue Mar 26 09:33:03 2019
  read: IOPS=10.0k, BW=156MiB/s (164MB/s)(153GiB/1000005msec)
    clat (usec): min=141, max=38416, avg=2992.85, stdev=2478.50
     lat (usec): min=141, max=38416, avg=2993.14, stdev=2478.52
    clat percentiles (usec):
     | 1.00th=[ 161], 5.00th=[ 174], 10.00th=[ 188], 20.00th=[ 260],
     | 30.00th=[ 652], 40.00th=[ 1467], 50.00th=[ 2999], 60.00th=[ 3949],
     | 70.00th=[ 4490], 80.00th=[ 5342], 90.00th=[ 6325], 95.00th=[ 7111],
     | 99.00th=[ 8848], 99.50th=[ 9503], 99.90th=[10814], 99.95th=[11731],
     | 99.99th=[18482]
   bw ( KiB/s): min= 1472, max=47743, per=3.34%, avg=5349.53, stdev=4848.75, samples=60000
   iops : min= 92, max= 2983, avg=334.11, stdev=303.03, samples=60000
  lat (usec) : 250=19.25%, 500=7.26%, 750=5.25%, 1000=3.04%
  lat (msec) : 2=9.00%, 4=17.07%, 10=38.87%, 20=0.26%, 50=0.01%
  cpu : usr=0.17%, sys=1.04%, ctx=14529895, majf=0, minf=3600
  IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=10015571,0,0, short=0,0,0, dropped=0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=156MiB/s (164MB/s), 156MiB/s-156MiB/s (164MB/s-164MB/s), io=153GiB (164GB), run=1000005-1000005msec
```
Random write test
```
[root@v31 ~]# fio -filename=/mnt/data/test3 -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=psync -bs=4k -size=10G -numjobs=30 -runtime=1000 -group_reporting -name=mytest_4k_10G_randwrite
mytest_4k_10G_randwrite: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
...
fio-3.1
Starting 30 threads
mytest_4k_10G_randwrite: Laying out IO file (1 file / 10240MiB)
Jobs: 30 (f=30): [w(30)][100.0%][r=0KiB/s,w=11.8MiB/s][r=0,w=3009 IOPS][eta 00m:00s]
mytest_4k_10G_randwrite: (groupid=0, jobs=30): err= 0: pid=3852817: Tue Mar 26 09:59:25 2019
  write: IOPS=3107, BW=12.1MiB/s (12.7MB/s)(11.9GiB/1000067msec)
    clat (usec): min=922, max=230751, avg=9651.32, stdev=16589.93
     lat (usec): min=923, max=230751, avg=9651.74, stdev=16589.93
    clat percentiles (usec):
     | 1.00th=[ 1188], 5.00th=[ 1319], 10.00th=[ 1418], 20.00th=[ 1565],
     | 30.00th=[ 1745], 40.00th=[ 1991], 50.00th=[ 2343], 60.00th=[ 3097],
     | 70.00th=[ 6325], 80.00th=[ 11994], 90.00th=[ 30278], 95.00th=[ 46924],
     | 99.00th=[ 79168], 99.50th=[ 91751], 99.90th=[121111], 99.95th=[130548],
     | 99.99th=[158335]
   bw ( KiB/s): min= 112, max= 1162, per=3.34%, avg=414.50, stdev=92.93, samples=60000
   iops : min= 28, max= 290, avg=103.60, stdev=23.22, samples=60000
  lat (usec) : 1000=0.01%
  lat (msec) : 2=40.50%, 4=23.80%, 10=13.13%, 20=7.97%, 50=10.27%
  lat (msec) : 100=4.00%, 250=0.32%
  cpu : usr=0.06%, sys=0.30%, ctx=3110768, majf=0, minf=141484
  IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,3107281,0, short=0,0,0, dropped=0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=12.1MiB/s (12.7MB/s), 12.1MiB/s-12.1MiB/s (12.7MB/s-12.7MB/s), io=11.9GiB (12.7GB), run=1000067-1000067msec
```
Sequential write test
```
[root@v33 ~]# fio -filename=/mnt/data/test4 -direct=1 -iodepth 1 -thread -rw=write -ioengine=psync -bs=16k -size=10G -numjobs=30 -runtime=1000 -group_reporting -name=mytest
mytest: (g=0): rw=write, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
...
fio-3.1
Starting 30 threads
mytest: Laying out IO file (1 file / 10240MiB)
Jobs: 30 (f=30): [W(30)][100.0%][r=0KiB/s,w=50.3MiB/s][r=0,w=3219 IOPS][eta 00m:00s]
mytest: (groupid=0, jobs=30): err= 0: pid=454215: Tue Mar 26 10:19:27 2019
  write: IOPS=3322, BW=51.9MiB/s (54.4MB/s)(50.7GiB/1000007msec)
    clat (usec): min=1130, max=121544, avg=9026.88, stdev=2132.29
     lat (usec): min=1131, max=121545, avg=9027.49, stdev=2132.30
    clat percentiles (usec):
     | 1.00th=[ 4047], 5.00th=[ 6325], 10.00th=[ 7308], 20.00th=[ 7963],
     | 30.00th=[ 8291], 40.00th=[ 8586], 50.00th=[ 8848], 60.00th=[ 9110],
     | 70.00th=[ 9503], 80.00th=[10028], 90.00th=[10814], 95.00th=[11600],
     | 99.00th=[17171], 99.50th=[20317], 99.90th=[25035], 99.95th=[26608],
     | 99.99th=[44303]
   bw ( KiB/s): min= 896, max= 3712, per=3.34%, avg=1772.81, stdev=213.20, samples=60000
   iops : min= 56, max= 232, avg=110.76, stdev=13.32, samples=60000
  lat (msec) : 2=0.08%, 4=0.88%, 10=79.28%, 20=19.23%, 50=0.53%
  lat (msec) : 100=0.01%, 250=0.01%
  cpu : usr=0.06%, sys=0.55%, ctx=3581559, majf=0, minf=4243
  IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,3322270,0, short=0,0,0, dropped=0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=51.9MiB/s (54.4MB/s), 51.9MiB/s-51.9MiB/s (54.4MB/s-54.4MB/s), io=50.7GiB (54.4GB), run=1000007-1000007msec
```
Mixed random read/write test
```
fio -filename=/data/test5 -direct=1 -iodepth 1 -thread -rw=randrw -rwmixread=70 -ioengine=psync -bs=16k -size=10G -numjobs=30 -runtime=100 -group_reporting -name=mytest -ioscheduler=noop
```
Synchronous I/O (sequential write) test
```
[root@v31 data]# fio -filename=/mnt/data/test6 -direct=1 -iodepth 1 -thread -rw=write -ioengine=psync -bs=4k -size=50G -numjobs=10 -runtime=1000 -group_reporting -name=mytest
mytest: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
...
fio-3.1
Starting 10 threads
mytest: Laying out IO file (1 file / 51200MiB)
Jobs: 10 (f=10): [W(10)][100.0%][r=0KiB/s,w=25.6MiB/s][r=0,w=6549 IOPS][eta 00m:00s]
mytest: (groupid=0, jobs=10): err= 0: pid=3883680: Tue Mar 26 10:48:08 2019
  write: IOPS=6180, BW=24.1MiB/s (25.3MB/s)(23.6GiB/1000001msec)
    clat (usec): min=825, max=176948, avg=1615.44, stdev=989.83
     lat (usec): min=826, max=176949, avg=1615.81, stdev=989.83
    clat percentiles (usec):
     | 1.00th=[ 1020], 5.00th=[ 1106], 10.00th=[ 1188], 20.00th=[ 1303],
     | 30.00th=[ 1369], 40.00th=[ 1434], 50.00th=[ 1500], 60.00th=[ 1565],
     | 70.00th=[ 1647], 80.00th=[ 1778], 90.00th=[ 2024], 95.00th=[ 2245],
     | 99.00th=[ 2933], 99.50th=[ 4817], 99.90th=[18744], 99.95th=[19268],
     | 99.99th=[21890]
   bw ( KiB/s): min= 1280, max= 3920, per=10.00%, avg=2473.24, stdev=365.21, samples=19998
   iops : min= 320, max= 980, avg=618.27, stdev=91.30, samples=19998
  lat (usec) : 1000=0.63%
  lat (msec) : 2=88.90%, 4=9.90%, 10=0.26%, 20=0.30%, 50=0.01%
  lat (msec) : 100=0.01%, 250=0.01%
  cpu : usr=0.27%, sys=1.59%, ctx=6286666, majf=0, minf=1148
  IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,6180315,0, short=0,0,0, dropped=0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=24.1MiB/s (25.3MB/s), 24.1MiB/s-24.1MiB/s (25.3MB/s-25.3MB/s), io=23.6GiB (25.3GB), run=1000001-1000001msec
```
Asynchronous I/O (sequential write) test
```
fio -filename=/data/test7 -direct=1 -iodepth 1 -thread -rw=write -ioengine=libaio -bs=4k -size=50G -numjobs=10 -runtime=1000 -group_reporting -name=mytest
```
Raw disk performance test
To have a baseline for the CephFS numbers, a single disk was also tested. To keep the comparison realistic, the disk chosen is one that backs a single OSD.
Random read test (single disk)
```
fio -filename=/var/lib/ceph/osd/ceph-4/disktest/dlw1 -direct=1 -iodepth 1 -thread -rw=randread -ioengine=psync -bs=16k -size=10G -numjobs=10 -runtime=1000 -group_reporting -name=mytest
```
RADOS performance testing
4M write test
```
rados bench -p cluster_data_data 60 write -t 32 --no-cleanup
Total time run:         60.717291
Total writes made:      2238
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     147.437
Stddev Bandwidth:       20.1603
Max bandwidth (MB/sec): 168
Min bandwidth (MB/sec): 48
Average IOPS:           36
Stddev IOPS:            5
Max IOPS:               42
Min IOPS:               12
Average Latency(s):     0.865663
Stddev Latency(s):      0.40126
Max latency(s):         3.58639
Min latency(s):         0.185036
```
4K write test
```
rados bench -p cluster_data_data 60 write -t 32 -b 4096 --no-cleanup
Total time run:         60.035923
Total writes made:      201042
Write size:             4096
Object size:            4096
Bandwidth (MB/sec):     13.0808
Stddev Bandwidth:       1.10742
Max bandwidth (MB/sec): 17.1133
Min bandwidth (MB/sec): 9.71875
Average IOPS:           3348
Stddev IOPS:            283
Max IOPS:               4381
Min IOPS:               2488
Average Latency(s):     0.00955468
Stddev Latency(s):      0.0164307
Max latency(s):         0.335681
Min latency(s):         0.00105769
```
Sequential read test (it reads back the 4K objects written above, hence the 4096 read size)
```
rados bench -p cluster_data_data 60 seq -t 32 --no-cleanup
Total time run:       22.129977
Total reads made:     201042
Read size:            4096
Object size:          4096
Bandwidth (MB/sec):   35.4867
Average IOPS:         9084
Stddev IOPS:          1278
Max IOPS:             14011
Min IOPS:             7578
Average Latency(s):   0.0035112
Max latency(s):       0.181241
Min latency(s):       0.000287577
```
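The write benchmarks above use `--no-cleanup` so that the seq test has objects to read. A small cleanup sketch for afterwards, using the same pool name as the benchmarks:
```
rados -p cluster_data_data cleanup   # remove the objects left behind by rados bench
```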
Deleting CephFS
```
[root@v32 ~]# ceph df
GLOBAL:
    SIZE     AVAIL    RAW USED  %RAW USED
    8.17TiB  7.91TiB  262GiB    3.13
POOLS:
    NAME                   ID  USED     %USED  MAX AVAIL  OBJECTS
    cluster_data_metadata  11  231MiB   0      2.49TiB    51638
    cluster_data           12  65.2GiB  2.49   2.49TiB    317473
[root@v32 ~]# ceph osd pool delete cluster_data_metadata cluster_data_metadata --yes-i-really-really-mean-it
Error EBUSY: pool 'cluster_data_metadata' is in use by CephFS
[root@v32 ~]# ceph fs ls
name: cephfs, metadata pool: cluster_data_metadata, data pools: [cluster_data ]
[root@v32 ~]# ceph fs rm cephfs --yes-i-really-mean-it
Error EINVAL: all MDS daemons must be inactive before removing filesystem
[root@v33 ~]# systemctl stop ceph-mds@v33.service
[root@v33 ~]# systemctl desable ceph-mds@v33.service
Unknown operation 'desable'.
[root@v33 ~]# systemctl disable ceph-mds@v33.service
Removed symlink /etc/systemd/system/ceph-mds.target.wants/ceph-mds@v33.service.
[root@v32 ~]# ceph fs rm cephfs --yes-i-really-mean-it
[root@v32 ~]# ceph osd pool delete cluster_data_metadata cluster_data_metadata --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must first set the mon_allow_pool_delete config option to true before you can destroy a pool
[root@v32 ~]# cat /etc/ceph/ceph.conf
[global]
...
[mon]
mon allow pool delete = true
[root@v32 ~]# ceph osd pool delete cluster_data_metadata cluster_data_metadata --yes-i-really-really-mean-it
pool 'cluster_data_metadata' removed
[root@v32 ~]# ceph osd pool delete cluster_data cluster_data --yes-i-really-really-mean-it
pool 'cluster_data' removed
```
CRUSH map
1. Extract the existing CRUSH map. With `-o`, ceph writes the compiled CRUSH map to the file you specify: `ceph osd getcrushmap -o crushmap.txt`
2. Decompile the CRUSH map. With `-d`, crushtool decompiles it into the file given by `-o`: `crushtool -d crushmap.txt -o crushmap-decompile`
3. Edit the CRUSH map with an editor: `vi crushmap-decompile`
4. Recompile the new CRUSH map: `crushtool -c crushmap-decompile -o crushmap-compiled`
5. Apply the new CRUSH map to the cluster: `ceph osd setcrushmap -i crushmap-compiled`
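For convenience, the five steps as one runnable sequence; the file names are arbitrary choices, and note that despite the `.txt` suffix used above, `getcrushmap` writes a compiled (binary) map:
```
ceph osd getcrushmap -o crushmap.bin          # compiled (binary) map
crushtool -d crushmap.bin -o crushmap-decompile
vi crushmap-decompile                         # edit buckets/rules as needed
crushtool -c crushmap-decompile -o crushmap-compiled
ceph osd setcrushmap -i crushmap-compiled
```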
Reference: https://blog.csdn.net/heivy/article/details/50592244
Inspecting pools
List all pools
```
[ceph-admin@v31 my-cluster]$ ceph df
GLOBAL:
    SIZE     AVAIL    RAW USED  %RAW USED
    8.17TiB  8.14TiB  30.5GiB   0.37
POOLS:
    NAME                   ID  USED  %USED  MAX AVAIL  OBJECTS
    cluster_data_metadata  2   0B    0      2.58TiB    0
[ceph-admin@v31 my-cluster]$ rados lspools
cluster_data_metadata
```
Delete the cluster_data_metadata pool
View a pool's detailed configuration
```
[ceph-admin@v31 my-cluster]$ ceph osd pool ls detail
pool 2 'cluster_data_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 146 flags hashpspool stripe_width 0
```
```
[ceph-admin@v31 my-cluster]$ ceph osd dump|grep pool
pool 2 'cluster_data_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 146 flags hashpspool stripe_width 0
```
View per-pool space usage and I/O
```
[root@v32 ~]# rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD      WR_OPS WR
kube      36B  4       0      12     0                  0       0        5538   34.1MiB 142769 10.4GiB

total_objects    4
total_used       31.8GiB
total_avail      8.14TiB
total_space      8.17TiB
```
Get pool parameters
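This heading had no commands attached; a brief sketch of reading individual parameters of one of the pools shown above:
```
ceph osd pool get cluster_data_metadata pg_num
ceph osd pool get cluster_data_metadata size
ceph osd pool get cluster_data_metadata crush_rule
```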
View the OSD distribution
```
ceph osd tree
ID CLASS WEIGHT  TYPE NAME     STATUS REWEIGHT PRI-AFF
-1       8.16879 root default
-3       2.72293     host v31
 0   hdd 0.27229         osd.0     up  1.00000 1.00000
 1   hdd 0.27229         osd.1     up  1.00000 1.00000
 2   hdd 0.27229         osd.2     up  1.00000 1.00000
 3   hdd 0.27229         osd.3     up  1.00000 1.00000
 4   hdd 0.27229         osd.4     up  1.00000 1.00000
 5   hdd 0.27229         osd.5     up  1.00000 1.00000
 6   hdd 0.27229         osd.6     up  1.00000 1.00000
 7   hdd 0.27229         osd.7     up  1.00000 1.00000
24   hdd 0.27229         osd.24    up  1.00000 1.00000
25   hdd 0.27229         osd.25    up  1.00000 1.00000
-5       2.72293     host v32
 8   hdd 0.27229         osd.8     up  1.00000 1.00000
 9   hdd 0.27229         osd.9     up  1.00000 1.00000
10   hdd 0.27229         osd.10    up  1.00000 1.00000
11   hdd 0.27229         osd.11    up  1.00000 1.00000
12   hdd 0.27229         osd.12    up  1.00000 1.00000
13   hdd 0.27229         osd.13    up  1.00000 1.00000
14   hdd 0.27229         osd.14    up  1.00000 1.00000
15   hdd 0.27229         osd.15    up  1.00000 1.00000
27   hdd 0.27229         osd.27    up  1.00000 1.00000
29   hdd 0.27229         osd.29    up  1.00000 1.00000
-7       2.72293     host v33
16   hdd 0.27229         osd.16    up  1.00000 1.00000
17   hdd 0.27229         osd.17    up  1.00000 1.00000
18   hdd 0.27229         osd.18    up  1.00000 1.00000
19   hdd 0.27229         osd.19    up  1.00000 1.00000
20   hdd 0.27229         osd.20    up  1.00000 1.00000
21   hdd 0.27229         osd.21    up  1.00000 1.00000
22   hdd 0.27229         osd.22    up  1.00000 1.00000
23   hdd 0.27229         osd.23    up  1.00000 1.00000
26   hdd 0.27229         osd.26    up  1.00000 1.00000
28   hdd 0.27229         osd.28    up  1.00000 1.00000
```
Delete a pool
```
sudo ceph osd pool delete {pool-name} {pool-name} --yes-i-really-really-mean-it
```
```
sudo ceph osd pool delete cluster_data_metadata cluster_data_metadata --yes-i-really-really-mean-it
```
If deleting a pool reports an error, see the section on fixing the pool-deletion error.
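The error in question is usually the disabled `mon_allow_pool_delete` option shown in the "Deleting CephFS" section above. Besides editing ceph.conf, a commonly used runtime approach is the sketch below (same pool name as the example):
```
ceph tell mon.\* injectargs '--mon-allow-pool-delete=true'
sudo ceph osd pool delete cluster_data_metadata cluster_data_metadata --yes-i-really-really-mean-it
ceph tell mon.\* injectargs '--mon-allow-pool-delete=false'   # turn the guard back on afterwards
```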
Adding OSDs to the cluster
```
[ceph-admin@v31 my-cluster]$ ceph -s
  cluster:
    id:     ffdda80f-a48a-431a-a71b-525e5f1965d9
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum v31,v32,v33
    mgr: v31(active), standbys: v32, v33
    osd: 24 osds: 24 up, 24 in
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0B
    usage:   24.3GiB used, 6.51TiB / 6.54TiB avail
    pgs:
```
- Background: OSD states
up: the daemon is running and can serve I/O; down: the daemon is not running and cannot serve I/O; in: the OSD holds data; out: the OSD holds no data.
List all disks
```
[root@v33 ~]# sudo ceph-disk list
/dev/dm-0 other, ext4, mounted on /
/dev/dm-1 other, swap
/dev/dm-2 other, unknown
/dev/dm-3 other, unknown
/dev/dm-4 other, unknown
/dev/dm-5 other, unknown
/dev/dm-6 other, unknown
/dev/dm-7 other, unknown
/dev/dm-8 other, unknown
/dev/dm-9 other, unknown
/dev/sda :
 /dev/sda1 other, vfat, mounted on /boot/efi
 /dev/sda2 other, xfs, mounted on /boot
 /dev/sda3 other, LVM2_member
/dev/sdb other, unknown
/dev/sdc other, unknown
/dev/sdd other, LVM2_member
/dev/sde other, LVM2_member
/dev/sdf other, LVM2_member
/dev/sdg other, LVM2_member
/dev/sdh other, LVM2_member
/dev/sdi other, LVM2_member
/dev/sdj other, LVM2_member
/dev/sdk other, LVM2_member
```
Errors when adding the OSD
```
[ceph-admin@v31 my-cluster]$ ceph-deploy osd create v31 --data /dev/sdb
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph-admin/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /bin/ceph-deploy osd create v31 --data /dev/sdb
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] bluestore : None
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7fe69d002830>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] fs_type : xfs
[ceph_deploy.cli][INFO ] block_wal : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] journal : None
[ceph_deploy.cli][INFO ] subcommand : create
[ceph_deploy.cli][INFO ] host : v31
[ceph_deploy.cli][INFO ] filestore : None
[ceph_deploy.cli][INFO ] func : <function osd at 0x7fe69d2478c0>
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] zap_disk : False
[ceph_deploy.cli][INFO ] data : /dev/sdb
[ceph_deploy.cli][INFO ] block_db : None
[ceph_deploy.cli][INFO ] dmcrypt : False
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] dmcrypt_key_dir : /etc/ceph/dmcrypt-keys
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] debug : False
[ceph_deploy.osd][DEBUG ] Creating OSD on cluster ceph with data device /dev/sdb
[v31][DEBUG ] connection detected need for sudo
[v31][DEBUG ] connected to host: v31
[v31][DEBUG ] detect platform information from remote host
[v31][DEBUG ] detect machine type
[v31][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO ] Distro info: CentOS Linux 7.6.1810 Core
[ceph_deploy.osd][DEBUG ] Deploying osd to v31
[v31][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.osd][ERROR ] RuntimeError: config file /etc/ceph/ceph.conf exists with different content; use --overwrite-conf to overwrite
[ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs
[ceph-admin@v31 my-cluster]$ ceph-deploy --overwrite-conf osd create v31 --data /dev/sdb
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph-admin/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /bin/ceph-deploy --overwrite-conf osd create v31 --data /dev/sdb
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] bluestore : None
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7ff6abd72830>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] fs_type : xfs
[ceph_deploy.cli][INFO ] block_wal : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] journal : None
[ceph_deploy.cli][INFO ] subcommand : create
[ceph_deploy.cli][INFO ] host : v31
[ceph_deploy.cli][INFO ] filestore : None
[ceph_deploy.cli][INFO ] func : <function osd at 0x7ff6abfb78c0>
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] zap_disk : False
[ceph_deploy.cli][INFO ] data : /dev/sdb
[ceph_deploy.cli][INFO ] block_db : None
[ceph_deploy.cli][INFO ] dmcrypt : False
[ceph_deploy.cli][INFO ] overwrite_conf : True
[ceph_deploy.cli][INFO ] dmcrypt_key_dir : /etc/ceph/dmcrypt-keys
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] debug : False
[ceph_deploy.osd][DEBUG ] Creating OSD on cluster ceph with data device /dev/sdb
[v31][DEBUG ] connection detected need for sudo
[v31][DEBUG ] connected to host: v31
[v31][DEBUG ] detect platform information from remote host
[v31][DEBUG ] detect machine type
[v31][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO ] Distro info: CentOS Linux 7.6.1810 Core
[ceph_deploy.osd][DEBUG ] Deploying osd to v31
[v31][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[v31][DEBUG ] find the location of an executable
[v31][INFO ] Running command: sudo /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdb
[v31][WARNIN] usage: ceph-volume lvm create [-h] --data DATA [--filestore]
[v31][WARNIN] [--journal JOURNAL] [--bluestore]
[v31][WARNIN] [--block.db BLOCK_DB] [--block.wal BLOCK_WAL]
[v31][WARNIN] [--osd-id OSD_ID] [--osd-fsid OSD_FSID]
[v31][WARNIN] [--cluster-fsid CLUSTER_FSID]
[v31][WARNIN] [--crush-device-class CRUSH_DEVICE_CLASS]
[v31][WARNIN] [--dmcrypt] [--no-systemd]
[v31][WARNIN] ceph-volume lvm create: error: GPT headers found, they must be removed on: /dev/sdb
[v31][ERROR ] RuntimeError: command returned non-zero exit status: 2
[ceph_deploy.osd][ERROR ] Failed to execute command: /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdb
[ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs
[ceph-admin@v31 my-cluster]$ /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdb
--> Falling back to /tmp/ for logging. Can't use /var/log/ceph/ceph-volume.log
--> [Errno 13] Permission denied: '/var/log/ceph/ceph-volume.log'
 stderr: error: /dev/sdb: Permission denied
--> SuperUserError: This command needs to be executed with sudo or as root
[ceph-admin@v31 my-cluster]$ sudo /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdb
usage: ceph-volume lvm create [-h] --data DATA [--filestore]
                              [--journal JOURNAL] [--bluestore]
                              [--block.db BLOCK_DB] [--block.wal BLOCK_WAL]
                              [--osd-id OSD_ID] [--osd-fsid OSD_FSID]
                              [--cluster-fsid CLUSTER_FSID]
                              [--crush-device-class CRUSH_DEVICE_CLASS]
                              [--dmcrypt] [--no-systemd]
ceph-volume lvm create: error: GPT headers found, they must be removed on: /dev/sdb
```
Convert the disk label to MBR
```
[ceph-admin@v31 my-cluster]$ sudo parted -s /dev/sdb mklabel msdos
```
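Wiping the offending GPT headers is an alternative to relabelling with parted; a sketch, assuming `ceph-deploy` is run from the deploy node (disk zap was already used earlier in this document) or that the gdisk package is installed on the target node:
```
ceph-deploy disk zap v31 /dev/sdb      # from the deploy node
# or, directly on the node:
sudo sgdisk --zap-all /dev/sdb
```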
Format the disk again
```
[root@v32 ~]# /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdc
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new d47d7861-9a83-4879-847d-693e3aa794b6
Running command: vgcreate --force --yes ceph-fad7bf25-dd60-4eff-a932-970c376af00b /dev/sdc
 stdout: Wiping dos signature on /dev/sdc.
 stdout: Physical volume "/dev/sdc" successfully created.
 stdout: Volume group "ceph-fad7bf25-dd60-4eff-a932-970c376af00b" successfully created
Running command: lvcreate --yes -l 100%FREE -n osd-block-d47d7861-9a83-4879-847d-693e3aa794b6 ceph-fad7bf25-dd60-4eff-a932-970c376af00b
 stdout: Logical volume "osd-block-d47d7861-9a83-4879-847d-693e3aa794b6" created.
Running command: /bin/ceph-authtool --gen-print-key
Running command: mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-27
Running command: restorecon /var/lib/ceph/osd/ceph-27
Running command: chown -h ceph:ceph /dev/ceph-fad7bf25-dd60-4eff-a932-970c376af00b/osd-block-d47d7861-9a83-4879-847d-693e3aa794b6
Running command: chown -R ceph:ceph /dev/dm-10
Running command: ln -s /dev/ceph-fad7bf25-dd60-4eff-a932-970c376af00b/osd-block-d47d7861-9a83-4879-847d-693e3aa794b6 /var/lib/ceph/osd/ceph-27/block
Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-27/activate.monmap
 stderr: got monmap epoch 2
Running command: ceph-authtool /var/lib/ceph/osd/ceph-27/keyring --create-keyring --name osd.27 --add-key AQB4yohcMW8eLhAAkdQmhIavIcF+FcPjkKooSQ==
 stdout: creating /var/lib/ceph/osd/ceph-27/keyring
 stdout: added entity osd.27 auth auth(auid = 18446744073709551615 key=AQB4yohcMW8eLhAAkdQmhIavIcF+FcPjkKooSQ== with 0 caps)
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-27/keyring
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-27/
Running command: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 27 --monmap /var/lib/ceph/osd/ceph-27/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-27/ --osd-uuid d47d7861-9a83-4879-847d-693e3aa794b6 --setuser ceph --setgroup ceph
--> ceph-volume lvm prepare successful for: /dev/sdc
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-27
Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-fad7bf25-dd60-4eff-a932-970c376af00b/osd-block-d47d7861-9a83-4879-847d-693e3aa794b6 --path /var/lib/ceph/osd/ceph-27
Running command: ln -snf /dev/ceph-fad7bf25-dd60-4eff-a932-970c376af00b/osd-block-d47d7861-9a83-4879-847d-693e3aa794b6 /var/lib/ceph/osd/ceph-27/block
Running command: chown -h ceph:ceph /var/lib/ceph/osd/ceph-27/block
Running command: chown -R ceph:ceph /dev/dm-10
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-27
Running command: systemctl enable ceph-volume@lvm-27-d47d7861-9a83-4879-847d-693e3aa794b6
 stderr: Created symlink from /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-27-d47d7861-9a83-4879-847d-693e3aa794b6.service to /usr/lib/systemd/system/ceph-volume@.service.
Running command: systemctl enable --runtime ceph-osd@27
 stderr: Created symlink from /run/systemd/system/ceph-osd.target.wants/ceph-osd@27.service to /usr/lib/systemd/system/ceph-osd@.service.
Running command: systemctl start ceph-osd@27
--> ceph-volume lvm activate successful for osd ID: 27
--> ceph-volume lvm create successful for: /dev/sdc
[root@v32 ~]# /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdd
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new fade79d7-8bee-49c6-85f8-d6c141e6bd4e
Running command: vgcreate --force --yes ceph-fc851010-f2c6-43f7-9c12-843d3a023a65 /dev/sdd
 stderr: Physical volume '/dev/sdd' is already in volume group 'ceph-00af8489-3599-427d-bae6-de1e61c4c38a'
 Unable to add physical volume '/dev/sdd' to volume group 'ceph-00af8489-3599-427d-bae6-de1e61c4c38a'
 /dev/sdd: physical volume not initialized.
--> Was unable to complete a new OSD, will rollback changes
--> OSD will be fully purged from the cluster, because the ID was generated
Running command: ceph osd purge osd.29 --yes-i-really-mean-it
 stderr: purged osd.29
--> RuntimeError: command returned non-zero exit status: 5
ceph-deploy --overwrite-conf osd create v31 --data /dev/sdb
```
Inspecting RBD data
```
[root@v32 ~]# rados ls -p kube
rbd_id.kubernetes-dynamic-pvc-28e6ad1e-4675-11e9-8901-6602f4085af9
rbd_directory
rbd_info
rbd_header.149046b8b4567
```
Delete an RBD image
```
[root@v32 ~]# rados -p kube rm rbd_id.kubernetes-dynamic-pvc-28e6ad1e-4675-11e9-8901-6602f4085af9 rbd_directory rbd_info rbd_header.149046b8b4567
```
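Deleting the underlying RADOS objects works, but the usual way to drop an image is `rbd rm`; a sketch using the image name from the listing above:
```
rbd -p kube ls
rbd -p kube rm kubernetes-dynamic-pvc-28e6ad1e-4675-11e9-8901-6602f4085af9
```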
Troubleshooting
Using the Ceph cluster as Kubernetes storage
https://akomljen.com/using-existing-ceph-cluster-for-kubernetes-persistent-storage/
Create permissions for the kube account on the Ceph kube pool
```
ceph --cluster ceph auth get-or-create client.kube mon 'allow r' osd 'allow rwx pool=kube'
[client.kube]
key = AQBHTIdcSqnbMhAATB4BJJXyA8K5NFQlE815/A==
```
Grant RBAC permissions for rbd-provisioner in the kube-system namespace and create its pod
```
vim rbd-provisioner.yaml
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rbd-provisioner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
  - apiGroups: [""]
    resources: ["services"]
    resourceNames: ["kube-dns","coredns"]
    verbs: ["list", "get"]
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rbd-provisioner
subjects:
  - kind: ServiceAccount
    name: rbd-provisioner
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: rbd-provisioner
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: rbd-provisioner
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: rbd-provisioner
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: rbd-provisioner
subjects:
  - kind: ServiceAccount
    name: rbd-provisioner
    namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rbd-provisioner
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: rbd-provisioner
spec:
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: rbd-provisioner
    spec:
      containers:
        - name: rbd-provisioner
          image: ivano/rbd-provisioner
          env:
            - name: PROVISIONER_NAME
              value: ceph.com/rbd
      serviceAccount: rbd-provisioner
```
```
kubectl -n kube-system apply -f rbd-provisioner.yaml
```
- When creating the rbd-provisioner pod, check the Ceph version inside the container image; it should match the cluster, which here is luminous:
```
ceph -v
ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable)
docker history ivano/rbd-provisioner:latest|grep CEPH_VERSION
```
## `rbd-provisioner` authorization on the Ceph storage cluster
The RBD volume provisioner needs Ceph's admin key to provision storage.
```
ceph --cluster ceph auth get-key client.admin
AQBDO4dcUktZLxAAwByPxao2QROhQpoWYAWsGg==
```
## Add the Ceph cluster admin account credentials
Create a secret from the Ceph admin account key above:
```
kubectl create secret generic ceph-secret \
    --type="kubernetes.io/rbd" \
    --from-literal=key='AQBDO4dcUktZLxAAwByPxao2QROhQpoWYAWsGg==' \
    --namespace=kube-system
```
## Create the Ceph storage pool
```
sudo ceph --cluster ceph osd pool create kube 1024 1024
sudo ceph --cluster ceph auth get-or-create client.kube mon 'allow r' osd 'allow rwx pool=kube'
sudo ceph --cluster ceph auth get-key client.kube
```
## Add the Ceph cluster kube account credentials
```
ceph --cluster ceph auth get-key client.kube
AQBHTIdcSqnbMhAATB4BJJXyA8K5NFQlE815/A==
```
```
kubectl create secret generic ceph-secret-kube \
    --type="kubernetes.io/rbd" \
    --from-literal=key='AQBHTIdcSqnbMhAATB4BJJXyA8K5NFQlE815/A==' \
    --namespace=kube-system
```
### Check the secrets
```
kubectl get secrets -n kube-system |grep ceph
ceph-secret        kubernetes.io/rbd   1   54m
ceph-secret-kube   kubernetes.io/rbd   1   51m
```
## Create the StorageClass bound to the Ceph cluster; later pods reference it directly via `storageClassName`
```
vim fast-rbd.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-rbd
provisioner: ceph.com/rbd
parameters:
  monitors: 192.168.122.101:6789, 192.168.122.102:6789, 192.168.122.103:6789
  adminId: admin
  adminSecretName: ceph-secret
  adminSecretNamespace: kube-system
  pool: kube
  userId: kube
  userSecretName: ceph-secret-kube
  userSecretNamespace: kube-system
  imageFormat: "2"
  imageFeatures: layering
```
```
kubectl create -f fast-rbd.yaml
```
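A quick verification step (not from the original) to confirm the StorageClass exists:
```
kubectl get storageclass fast-rbd
```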
## Examples
### Create a PVC
```
cat <<EOF | kubectl create -f -
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  storageClassName: fast-rbd
EOF
```
Check whether the claim is bound
```
kubectl get pvc myclaim
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
myclaim   Bound    pvc-5ef52eea-44a7-11e9-9d6f-525400d7a6ef   8Gi        RWO            fast-rbd       52m
```
### Example pod
```
cat test-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: ceph-pod1
spec:
  containers:
    - name: ceph-busybox
      image: busybox
      command: ["sleep", "60000"]
      volumeMounts:
        - name: ceph-vol1
          mountPath: /usr/share/busybox
          readOnly: false
  volumes:
    - name: ceph-vol1
      persistentVolumeClaim:
        claimName: ceph-claim
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ceph-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: fast-rbd
```
Check that the PV and PVC have both been created:
```
kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS        CLAIM                    STORAGECLASS        REASON   AGE
pvc-278c2462-448d-11e9-b632-525400804e1e   8Gi        RWO            Delete           Terminating   jx/myclaim               fast-rbd                     129m
pvc-5ef52eea-44a7-11e9-9d6f-525400d7a6ef   8Gi        RWO            Delete           Bound         default/myclaim          fast-rbd                     66m
pvc-d3a5095a-4225-11e9-8d3b-525400d7a6ef   8Gi        RWO            Delete           Bound         default/jenkins          nfs-dynamic-class            3d5h
pvc-ed9c1211-44af-11e9-9d6f-525400d7a6ef   2Gi        RWO            Delete           Bound         default/ceph-claim       fast-rbd                     4m59s
pvc-f25b4ce2-44a1-11e9-9d6f-525400d7a6ef   2Gi        RWO            Delete           Terminating   kube-system/ceph-claim   ceph-rbd                     96m
```
```
kubectl get pvc
NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS        AGE
ceph-claim   Bound    pvc-ed9c1211-44af-11e9-9d6f-525400d7a6ef   2Gi        RWO            fast-rbd            5m2s
jenkins      Bound    pvc-d3a5095a-4225-11e9-8d3b-525400d7a6ef   8Gi        RWO            nfs-dynamic-class   3d5h
myclaim      Bound    pvc-5ef52eea-44a7-11e9-9d6f-525400d7a6ef   8Gi        RWO            fast-rbd            66m
```
On the Ceph side, check the RBD images that were created and their details:
```
rbd ls --pool rbd
kubernetes-dynamic-pvc-1e569f60-44a3-11e9-8e60-fa9f2d515699
```
```
rbd ls --pool kube
kubernetes-dynamic-pvc-2e7038b1-449d-11e9-bd3d-46e50dc4cee6
kubernetes-dynamic-pvc-6038cc76-44a7-11e9-a834-029380302ed2
kubernetes-dynamic-pvc-84a5d823-449e-11e9-bd3d-46e50dc4cee6
kubernetes-dynamic-pvc-edb72324-44af-11e9-a834-029380302ed2
```
```
rbd info kube/kubernetes-dynamic-pvc-2e7038b1-449d-11e9-bd3d-46e50dc4cee6
rbd image 'kubernetes-dynamic-pvc-2e7038b1-449d-11e9-bd3d-46e50dc4cee6':
    size 8GiB in 2048 objects
    order 22 (4MiB objects)
    block_name_prefix: rbd_data.11136b8b4567
    format: 2
    features: layering
    flags:
    create_timestamp: Tue Mar 12 16:02:30 2019
```
Check the filesystem mount and usage inside the busybox pod to confirm everything works:
```
kubectl exec -it ceph-pod1 mount |grep rbd
/dev/rbd0 on /usr/share/busybox type ext4 (rw,relatime,stripe=1024,data=ordered)
kubectl exec -it ceph-pod1 df |grep rbd
/dev/rbd0  1998672  6144  1976144  0% /usr/share/busybox
```
