614K IOPS for a Local NVMe Kubernetes Pod

Jerry(이정훈)
11 min read · Feb 16, 2021


The main reason we currently use OpenEBS Local PV is performance. Because it uses local disks, it significantly outperforms networked and distributed storage such as Ceph.

The downside is that data is lost when a server goes down. We deal with this by 1) application-level replication (typically three copies), 2) backing up to external NAS storage, and 3) local-disk RAID 6 or node-to-node data replication.

To push performance as far as possible, I measured IOPS on NVMe disks with the fio tool. In the test, a single pod reached 614K read IOPS and 667K write IOPS (the disks were empty, so real production numbers will likely be quite different). A previous test on ordinary SSDs gave roughly 80K, so this is about a 7x improvement.
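The actual manifests are not included in this post. As a minimal sketch of the kind of setup described, a fio StatefulSet backed by an OpenEBS Local PV could look like the following; the StorageClass name (openebs-device), the image, the volume size, and the fio-job ConfigMap name are assumptions, not the exact values used in this test.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: fio
spec:
  serviceName: fio
  replicas: 1
  selector:
    matchLabels:
      app: fio
  template:
    metadata:
      labels:
        app: fio
    spec:
      containers:
      - name: fio
        image: fio-benchmark:latest   # placeholder: any image with fio installed
        command: ["sleep", "infinity"] # keep the pod running so fio can be started interactively
        volumeMounts:
        - name: data
          mountPath: /data            # fio lays out its test files here
        - name: fio-job
          mountPath: /configs         # fio job files from the ConfigMap
      volumes:
      - name: fio-job
        configMap:
          name: fio-job
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: openebs-device   # assumption: OpenEBS Local PV on the NVMe disk
      resources:
        requests:
          storage: 400Gi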

This configuration is a very good fit for I/O-heavy database workloads. An NVMe server typically costs around 10 to 20 million KRW, so it can be built far more cost-effectively than public cloud offerings such as AWS or external NVMe all-flash storage such as Pure Storage.

The detailed test results are below.

This is the read test; the fio job file is kept in a ConfigMap for reuse.
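The job file itself is not reproduced here, but its parameters can be read off the output header below: randrw at a 100% read mix, 4k blocks, libaio, iodepth 16, 16 jobs with a 1GiB file each, and a 120-second time-based run with group reporting. A ConfigMap along these lines would produce matching output; direct=1, the /data directory, and the ConfigMap/file names are assumptions.

apiVersion: v1
kind: ConfigMap
metadata:
  name: fio-job
data:
  fio-read.conf: |
    [global]
    ; libaio with direct I/O (direct=1 is an assumption)
    ioengine=libaio
    direct=1
    ; random mixed workload at a 100% read mix, as shown in the output header
    rw=randrw
    rwmixread=100
    bs=4k
    iodepth=16
    ; 16 jobs, one 1GiB file each, time-based 120s run, grouped reporting
    numjobs=16
    size=1g
    runtime=120
    time_based
    group_reporting
    directory=/data
    [read]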

# fio fio-read.conf --output=randread.out
# cat randread.out
read: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.19
Starting 16 processes
read: Laying out IO file (1 file / 1024MiB)
read: Laying out IO file (1 file / 1024MiB)
read: Laying out IO file (1 file / 1024MiB)
read: Laying out IO file (1 file / 1024MiB)
read: Laying out IO file (1 file / 1024MiB)
read: Laying out IO file (1 file / 1024MiB)
read: Laying out IO file (1 file / 1024MiB)
read: Laying out IO file (1 file / 1024MiB)
read: Laying out IO file (1 file / 1024MiB)
read: Laying out IO file (1 file / 1024MiB)
read: Laying out IO file (1 file / 1024MiB)
read: Laying out IO file (1 file / 1024MiB)
read: Laying out IO file (1 file / 1024MiB)
read: Laying out IO file (1 file / 1024MiB)
read: Laying out IO file (1 file / 1024MiB)
read: Laying out IO file (1 file / 1024MiB)
read: (groupid=0, jobs=16): err= 0: pid=170: Mon Feb 15 23:14:05 2021
read: IOPS=614k, BW=2399MiB/s (2516MB/s)(281GiB/120002msec)
slat (nsec): min=1373, max=2591.1k, avg=6844.14, stdev=8974.66
clat (usec): min=22, max=9466, avg=408.53, stdev=247.68
lat (usec): min=53, max=9471, avg=415.57, stdev=247.61
clat percentiles (usec):
| 1.00th=[ 129], 5.00th=[ 172], 10.00th=[ 198], 20.00th=[ 235],
| 30.00th=[ 269], 40.00th=[ 302], 50.00th=[ 338], 60.00th=[ 388],
| 70.00th=[ 449], 80.00th=[ 537], 90.00th=[ 701], 95.00th=[ 873],
| 99.00th=[ 1336], 99.50th=[ 1565], 99.90th=[ 2180], 99.95th=[ 2507],
| 99.99th=[ 3228]
bw ( MiB/s): min= 2360, max= 2436, per=100.00%, avg=2400.71, stdev= 0.83, samples=3824
iops : min=604381, max=623768, avg=614575.29, stdev=213.45, samples=3824
lat (usec) : 50=0.01%, 100=0.12%, 250=24.50%, 500=51.76%, 750=15.45%
lat (usec) : 1000=5.05%
lat (msec) : 2=2.96%, 4=0.16%, 10=0.01%
cpu : usr=12.00%, sys=37.14%, ctx=29914918, majf=0, minf=17701
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=73707322,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
Run status group 0 (all jobs):
READ: bw=2399MiB/s (2516MB/s), 2399MiB/s-2399MiB/s (2516MB/s-2516MB/s), io=281GiB (302GB), run=120002-120002msec
Disk stats (read/write):
nvme0n1: ios=73698190/11, merge=0/3, ticks=29491462/0, in_queue=34596187, util=100.00%
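The write run below is executed from inside the pod against a job file mounted at /configs/fio.job. Judging from its output header, it is the same job with the read/write mix flipped to 100% writes. A sketch of the extra data entry the fio-job ConfigMap above would need, under the same assumptions as the read job:

  fio.job: |
    [global]
    ioengine=libaio
    direct=1
    rw=randrw
    ; 100% random writes this time
    rwmixread=0
    bs=4k
    iodepth=16
    numjobs=16
    size=1g
    runtime=120
    time_based
    group_reporting
    directory=/data
    [write]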
[root@localhost ~]# kubectl exec -it fio-0 -- sh
# fio /configs/fio.job --output=randwrite.out
# cat randwrite.out
write: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.19
Starting 16 processes
write: Laying out IO file (1 file / 1024MiB)
write: Laying out IO file (1 file / 1024MiB)
write: Laying out IO file (1 file / 1024MiB)
write: Laying out IO file (1 file / 1024MiB)
write: Laying out IO file (1 file / 1024MiB)
write: Laying out IO file (1 file / 1024MiB)
write: Laying out IO file (1 file / 1024MiB)
write: Laying out IO file (1 file / 1024MiB)
write: Laying out IO file (1 file / 1024MiB)
write: Laying out IO file (1 file / 1024MiB)
write: Laying out IO file (1 file / 1024MiB)
write: Laying out IO file (1 file / 1024MiB)
write: Laying out IO file (1 file / 1024MiB)
write: Laying out IO file (1 file / 1024MiB)
write: Laying out IO file (1 file / 1024MiB)
write: Laying out IO file (1 file / 1024MiB)
write: (groupid=0, jobs=16): err= 0: pid=80: Mon Feb 15 20:06:24 2021
write: IOPS=667k, BW=2604MiB/s (2731MB/s)(305GiB/120001msec); 0 zone resets
slat (usec): min=2, max=1824, avg= 9.58, stdev= 5.99
clat (nsec): min=502, max=3827.1k, avg=373454.12, stdev=250664.46
lat (usec): min=10, max=3836, avg=383.18, stdev=250.50
clat percentiles (usec):
| 1.00th=[ 23], 5.00th=[ 56], 10.00th=[ 85], 20.00th=[ 130],
| 30.00th=[ 178], 40.00th=[ 251], 50.00th=[ 359], 60.00th=[ 453],
| 70.00th=[ 519], 80.00th=[ 578], 90.00th=[ 668], 95.00th=[ 791],
| 99.00th=[ 1090], 99.50th=[ 1221], 99.90th=[ 1516], 99.95th=[ 1729],
| 99.99th=[ 2606]
bw ( MiB/s): min= 2355, max= 2751, per=100.00%, avg=2607.29, stdev= 5.49, samples=3808
iops : min=602876, max=704348, avg=667462.10, stdev=1406.09, samples=3808
lat (nsec) : 750=0.01%, 1000=0.01%
lat (usec) : 2=0.01%, 4=0.01%, 10=0.02%, 20=0.75%, 50=3.53%
lat (usec) : 100=9.04%, 250=26.64%, 500=27.15%, 750=26.70%, 1000=4.58%
lat (msec) : 2=1.56%, 4=0.03%
cpu : usr=8.42%, sys=46.02%, ctx=35804771, majf=0, minf=15206
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,80008279,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
Run status group 0 (all jobs):
WRITE: bw=2604MiB/s (2731MB/s), 2604MiB/s-2604MiB/s (2731MB/s-2731MB/s), io=305GiB (328GB), run=120001-120001msec
Disk stats (read/write):
nvme0n1: ios=0/79864283, merge=0/2, ticks=0/29430890, in_queue=37016620, util=100.00%

Compared with a public cloud environment, the difference in performance and cost would likely be enormous.
