Micron 6500 ION provides massive WEKA performance on AMD-based servers

Sujit Somandepalli | July 2023

With the launch of the Micron 6500 ION NVMe SSD, we recently had the opportunity to run some interesting scaling studies with WEKA™¹ on 4th Gen AMD EPYC™ 9554-based (64-core) Supermicro platforms.

WEKA is a high-performance, NVMe™-based, software-defined storage solution that is commonly used in large-scale file storage deployments for a variety of use cases, including high-performance computing (HPC) and artificial intelligence (AI).

We tested on a cluster of six Supermicro AS-1115CS-TNR² single-socket servers based on AMD EPYC 9004 Series processors, with Micron DDR5 memory and 400GbE-capable, PCIe® Gen5 networking. This server is well suited to a clustered storage system with WEKA because of its high-performance Zen 4 cores and simplified single-socket design. Each server supports up to 10 NVMe SSDs, for a total of 60 drives in the test cluster.

This six-node WEKA storage cluster was connected to 12 clients, each of which ran the flexible input/output tester (fio)³ with 32 jobs at various IO depths (queue depths).
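To give a feel for what such a sweep can look like on one client, here is a minimal Python sketch that builds fio command lines for a range of IO depths. Only the 32 jobs per client come from the setup described above. The mount point `/mnt/weka`, the file size, the runtime, the `libaio` engine, and the list of IO depths are assumptions for illustration and not the settings used in our testing.

```python
import shlex

# Hypothetical values for illustration; only numjobs=32 comes from the article.
IO_DEPTHS = [1, 2, 4, 8, 16, 32]


def fio_command(rw, block_size, iodepth, directory="/mnt/weka", numjobs=32):
    """Build the fio argument list for one data point of an IO-depth sweep."""
    return [
        "fio",
        f"--name={rw}-{block_size}-qd{iodepth}",
        f"--rw={rw}",                # e.g. read, write, randread, randwrite
        f"--bs={block_size}",        # e.g. 1m, 128k, 4k
        f"--numjobs={numjobs}",      # 32 jobs per client, as in the test setup
        f"--iodepth={iodepth}",      # queue depth per job
        "--ioengine=libaio",         # assumed asynchronous engine on Linux
        "--direct=1",                # bypass the client page cache
        f"--directory={directory}",  # assumed mount point of the file system
        "--size=10g",                # assumed per-job file size
        "--time_based",
        "--runtime=60",              # assumed run length in seconds
        "--group_reporting",         # one aggregated result per run
    ]


if __name__ == "__main__":
    # Print one command per IO depth for a 4KB random read sweep.
    for depth in IO_DEPTHS:
        print(shlex.join(fio_command("randread", "4k", depth)))
```

The same helper covers the sequential cases by passing `"read"` or `"write"` with a `1m` or `128k` block size; running the printed commands on all 12 clients at once is left to whatever orchestration is already in place.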

Our initial testing with this cluster used 36 drives (six drives in each of the six nodes), which we then expanded to 60 drives (10 drives in each of the six nodes). The results speak for themselves.

Sequential performance

In the 1MB sequential read workloads, going from six drives to 10 drives per node gave us up to 229GB/s, up from about 164GB/s. This result represents a nearly 40% improvement in sequential reads. For 128KB sequential read workloads, we observed almost double the performance of the six-drive configuration.
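The arithmetic behind the 1MB read figures can be checked with a few lines of Python. This is only a sketch: it treats the rounded numbers quoted above (about 164GB/s and up to 229GB/s) as exact, and the per-node and per-drive values are simple averages across the cluster, not measurements of any single server or drive.

```python
NODES = 6
READ_GBPS_6_DRIVES = 164.0   # about 164GB/s with six drives per node (36 total)
READ_GBPS_10_DRIVES = 229.0  # up to 229GB/s with 10 drives per node (60 total)


def percent_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


gain = percent_gain(READ_GBPS_6_DRIVES, READ_GBPS_10_DRIVES)
per_node_10 = READ_GBPS_10_DRIVES / NODES        # average GB/s per server
per_drive_6 = READ_GBPS_6_DRIVES / (NODES * 6)   # average GB/s per drive, 36 drives
per_drive_10 = READ_GBPS_10_DRIVES / (NODES * 10)  # average GB/s per drive, 60 drives

print(f"1MB sequential read gain: {gain:.1f}%")           # 39.6%
print(f"Average per node (10 drives): {per_node_10:.1f} GB/s")
print(f"Average per drive: {per_drive_6:.2f} GB/s -> {per_drive_10:.2f} GB/s")
```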

The write performance for the sequential workloads maxes out around 106GB/s, which is limited by the compute on the WEKA backends.

1M Seq Read - 12 Clients Graph
128K Seq Read - 12 Clients Graph
128K Seq Write - 12 Clients Graph

4KB random performance

We also ran small-block random IO testing (four corners, 4KB) and observed that, in addition to delivering higher input/output operations per second (IOPS), the 10-drive configuration provides lower latency for each operation. For the 100% random read workload, the 10-drive configuration achieved more than 16 million IOPS at 0.59ms average read latency, and the 100% random write workload achieved more than 3.3 million IOPS at .19ms average write latency.
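IOPS and average latency are tied together by the number of requests in flight. As a rough sanity check, Little's law (mean requests in flight = throughput × mean latency) can be applied to the random read figures. The sketch below assumes steady state and treats "more than 16 million IOPS" as exactly 16 million, so the result is an estimate and not a measured queue depth.

```python
CLIENTS = 12
JOBS_PER_CLIENT = 32


def outstanding_ios(iops, avg_latency_ms):
    """Little's law: mean requests in flight = throughput x mean latency."""
    return iops * (avg_latency_ms / 1000.0)


# 4KB random read on the 10-drive configuration, using the quoted figures.
in_flight = outstanding_ios(16_000_000, 0.59)
per_job = in_flight / (CLIENTS * JOBS_PER_CLIENT)

print(f"Estimated reads in flight across the cluster: {in_flight:.0f}")  # about 9440
print(f"Estimated reads in flight per fio job: {per_job:.1f}")           # about 24.6
```

An estimate of roughly 25 requests in flight per job is in the range one would expect when each of the 384 jobs runs at a double-digit IO depth.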

4K Random Read and Avg Latency (ms) - 12 Clients Graph
4K Random Write and Avg Latency (ms) - 12 Clients Graph

Conclusion

We saw that WEKA provides near-linear performance scaling as we moved from six to 10 drives per node. This result points to an easy way to grow your WEKA deployments on 4th Gen AMD EPYC 9004 Series processors by using cost-competitive 30TB Micron 6500 ION NVMe SSDs.

  1. For more information on WEKA, see http://www.weka.io/
  2. For details on the Supermicro AS-1115CS-TNR platform, see http://www.supermicro.com/en/products/system/clouddc/1u/as-1115cs-tnr
  3. For details on fio, see http://fio.readthedocs.io/en/latest/fio_doc.html

Principal Storage Solutions Engineer

Sujit Somandepalli

Sujit Somandepalli is Principal Storage Solutions Engineer at Micron Technology.