WebP Cloud Services Blog

The performance review of Hetzner's CAX-line ARM64 servers and the practical experience of WebP Cloud Services on them.

· Nova Kwok

This article is also available in Simplified Chinese: Hetzner CAX 系列 ARM64 服务器性能简评以及 WebP Cloud Services 在其上的实践

TL;DR:

  • Hetzner ARM64 performs very well: the CAX21 (ARM64, 4 cores, 8GB RAM) takes only about 7% longer in our WebP conversion test than the CPX21 (AMD64, 3 cores, 4GB RAM), while costing 14% less (8.40 USD/mo vs 9.76 USD/mo). Additionally, the CAX21 offers twice the amount of RAM compared to the CPX21.
  • Due to the impressive performance of ARM64 in testing, we have migrated all WebP Cloud Services to Hetzner’s ARM64 servers.
  • Hetzner Volumes are not exceptionally fast: in our tests they reached roughly 10% to 30% of LocalSSD throughput, depending on block size. Their advantage lies in higher data durability.

A long time ago, in 2015, Scaleway introduced its C1 servers, which were based on ARM processors. The C1 was built around a Marvell Armada 370/XP quad-core ARM Cortex-A9 SoC (a 32-bit ARMv7 design rather than ARM64) and featured 2GB of RAM. These servers were designed by Scaleway themselves and were sold as bare metal, without any virtualization. The official price was approximately $3 per month. Here is the physical appearance of the machine hardware:

https://twitter.com/edouardb_/status/787212549628526592

Due to the use of their self-developed motherboards and other components, C1 servers had a very high density within their chassis. In the publicly available images by Scaleway, the internal layout of their chassis looked like this:

This made Scaleway the first significant provider of ARM-based servers in an era when almost everyone used Intel Xeon. Benchmarks such as https://browser.geekbench.com/geekbench2/2576212 and https://medium.com/amarao/scaleway-arm-servers-50f85c4cefbe show that this ARM processor was far behind mainstream AMD64 servers, but it also made people realize that the ARM architecture could have a role to play in the server field. At that time, I even made a dedicated tweet about it: https://twitter.com/n0vad3v/status/931344460633403394.

However, three years later, in 2020, Scaleway issued a statement announcing the discontinuation of ARM64 machines:

In response to that, I tweeted again: https://twitter.com/n0vad3v/status/1253577191280930817.

Nevertheless, in 2023, three years after discontinuing its earlier ARM machines, Scaleway resumed offering servers with ARM64 processors: the AMP series, built on the Ampere Altra Max processor.

Between 2020 and 2023, among the mainstream cloud service providers, only AWS continued to offer ARM64 machines, using its own Graviton processors. Although Graviton instances are more cost-effective than AWS’s traditional machines, they are still not cheap. As of now (June 2023), the cheapest ARM64 instance, t4g.nano (2 cores, 0.5 GiB RAM), costs $0.0042 USD per hour, or about $3 per month. For real workloads 0.5 GiB of RAM is unlikely to be sufficient; a more usable minimum is around 2 GiB of RAM, which corresponds to t4g.small (2 cores, 2 GiB RAM) at $0.0168 USD per hour, or about $12 per month. This does not include fees for traffic, storage, or other resources. These are also burstable performance instances, so sustained high CPU usage may result in throttling or additional charges.
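As a quick sanity check on the monthly figures, the sketch below converts the quoted hourly on-demand rates into an approximate monthly cost. It assumes 730 hours per month (8,760 hours per year divided by 12) and ignores traffic, storage, and burst charges; the function name is only for illustration.

```python
HOURS_PER_MONTH = 730  # assumption: 8760 hours per year / 12 months


def monthly_cost(hourly_usd: float, hours: int = HOURS_PER_MONTH) -> float:
    """Convert an hourly on-demand price to an approximate monthly price."""
    return round(hourly_usd * hours, 2)


# Hourly rates as quoted in the paragraph above (June 2023).
rates = {"t4g.nano": 0.0042, "t4g.small": 0.0168}
monthly = {name: monthly_cost(rate) for name, rate in rates.items()}

for name, cost in monthly.items():
    print(f"{name}: ~${cost:.2f}/month")
```

Changing the hours-per-month assumption (for example to 24 * 30 = 720) shifts the result by only a few cents.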

Therefore, we have compiled a table listing the currently popular service providers that offer ARM64 processing capabilities:

Service Provider | Machine Name | Disk Space | Price (Monthly, USD) | Link | Additional Description
---------------- | ------------ | ---------- | -------------------- | ---- | ----------------------
Hetzner | CAX21 | 80GB | 8.38 | Hetzner Cloud | Starting now, we also have four brand new Hetzner Cloud server plans which we’ve built around innovative Arm technology. You can get your hands on up to 16 vCPUs based on Ampere® Altra® processors.
AWS | a1.xlarge | Additional | 73 | AWS EC2 Pricing |
Scaleway | AMP2-C4 | 10GB | 15 | Scaleway AMP2 Instances | Please note that these Instances are currently in a trial phase. It is not recommended to use them to host critical services.
Oracle Cloud | VM.Standard.A1.Flex | Additional | 0 (Free Tier) | Oracle Cloud Cost Estimator | Each tenancy gets the first 3,000 OCPU hours and 18,000 GB hours per month for free to create Ampere A1 Compute instances using the VM.Standard.A1.Flex shape (equivalent to 4 OCPUs and 24 GB of memory).
Alibabacloud | ARM General purpose instance ecs.g8y.xlarge | Additional | 92.26 | Alibaba Cloud ECS |

As we can see, excluding the Oracle Cloud Free Tier, Hetzner offers the lowest price and, unlike Scaleway, does not treat its ARM64 machines as an experimental product without an SLA guarantee.

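All the mentioned providers use Ampere processors, except AWS (its own Graviton) and, as far as we can tell, Alibaba Cloud (whose g8y instances use the in-house Yitian 710).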

In a news article by Hetzner on April 23, 2023, titled “ARM64 Cloud” ( https://www.hetzner.com/news/arm64-cloud ), they publicly introduced their ARM64 cloud servers under the CAX line for the first time, based on Ampere Altra processors. However, the specific model is not mentioned. In their news article about ARM64 dedicated servers ( https://www.hetzner.com/news/07-22-rx-line/ ), we know that the RX line servers utilize Ampere Altra Q80-30 SoC. Therefore, we can speculate that the CAX line might use the same processor.

Hetzner ARM64 Pricing

From the Pricing page, we can see that ARM64 servers offer excellent value for money, with a 4-core 8GB machine available for just 7.73 EUR/mo.

At the WebP Cloud Services team, we are very interested in the benefits of using ARM64 machines and are willing to test our products on ARM64 platforms. Therefore, we conducted some tests on different machines and shared the results in this article for readers with similar needs to reference.

Test Machines

We have five machines:

  • A dedicated server with a Xeon E3-1230 v3 @ 3.30GHz CPU (4 cores, 8 threads), 32GB DDR3 memory, priced at $30 USD per month, referred to as Xeon for simplicity.
  • Hetzner CPX21, with a virtualized AMD EPYC 2.4GHz CPU, 3 cores (vCPU), 4GB memory, priced at $9.76 USD per month, referred to as CPX21 for simplicity.
  • Hetzner CAX11, with a virtualized ARM64 processor, 2 cores (vCPU), 4GB memory, priced at $4.91 USD per month, referred to as CAX11 for simplicity.
  • Hetzner CAX21, with a virtualized ARM64 processor, 4 cores (vCPU), 8GB memory, priced at $8.40 USD per month, referred to as CAX21 for simplicity.
  • An Oracle Cloud instance, with a virtualized ARM64 processor, 4 cores (vCPU), 20GB memory. As it falls under the Free Tier, the monthly price is $0. It is referred to as Oracle for simplicity.

The test script used is located at https://github.com/masonr/yet-another-bench-script, and the command is:

curl -sL yabs.sh | bash -s -- -i

This script utilizes fio for disk performance testing, iperf3 for network performance testing, and Geekbench for CPU/memory performance testing. However, since we have a 1Gbps bandwidth provided by the service provider, we will skip the network testing and only perform Geekbench and disk testing.

GeekBench Test

We begin with GeekBench 6 tests.

The scores for Xeon are:

Processor  : Intel(R) Xeon(R) CPU E3-1230 v3 @ 3.30GHz
CPU cores  : 8 @ 3700.000 MHz
AES-NI     : ✔ Enabled
VM-x/AMD-V : ✔ Enabled

Single Core     | 1103
Multi Core      | 3353

The scores for CPX21 are:

Processor  : AMD EPYC Processor
CPU cores  : 3 @ 2495.310 MHz
AES-NI     : ✔ Enabled
VM-x/AMD-V : ❌ Disabled

Single Core     | 1222
Multi Core      | 3107

The scores for CAX11 are:

Processor  : Neoverse-N1
CPU cores  : 2 @ ??? MHz
AES-NI     : ✔ Enabled
VM-x/AMD-V : ❌ Disabled

Single Core     | 1072
Multi Core      | 1921

The scores for CAX21 are:

Processor  : Neoverse-N1
CPU cores  : 4 @ ??? MHz
AES-NI     : ✔ Enabled
VM-x/AMD-V : ❌ Disabled

Single Core     | 1068
Multi Core      | 3444

The scores for Oracle are:

Processor  : Neoverse-N1
CPU cores  : 4 @ ??? MHz
AES-NI     : ✔ Enabled
VM-x/AMD-V : ❌ Disabled

Single Core     | 1066
Multi Core      | 2666

We can draw some conclusions:

  • The performance of a server CPU is not solely determined by clock frequency. The Xeon, with a 3.3GHz base clock (reported at 3.7GHz) and 8 threads, scores only about 8% higher in multi-core than the virtualized AMD EPYC with 3 cores at roughly 2.4GHz (3353 vs 3107), and lower in single-core (1103 vs 1222).
  • The performance of the Ampere Altra, based on the Neoverse N1 architecture, is noteworthy. In the GeekBench 6 multi-core test, the 4-core CAX21 outperforms the 3-core virtualized AMD processor (3444 vs 3107), although its single-core score is lower (1068 vs 1222).
  • The ARM64 cores of Oracle Cloud, also labeled as Neoverse-N1, match Hetzner in single-core score (1066 vs 1068) but fall clearly behind in multi-core (2666 vs 3444). This could be due to the high number of users on the Free Tier, causing resource contention.
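To relate these scores to cost, the sketch below divides each machine's multi-core score by its monthly price, using only the numbers listed in this article. Oracle is left out because its Free Tier price of 0 makes the ratio undefined; the metric itself is just one rough way to compare, not an official benchmark.

```python
# Multi-core Geekbench 6 scores and monthly prices (USD) from this article.
machines = {
    "Xeon": {"multi": 3353, "price": 30.00},
    "CPX21": {"multi": 3107, "price": 9.76},
    "CAX11": {"multi": 1921, "price": 4.91},
    "CAX21": {"multi": 3444, "price": 8.40},
}


def score_per_dollar(multi: int, price: float) -> float:
    """Multi-core Geekbench points obtained per USD of monthly cost."""
    return multi / price


# Sort machine names from best to worst value.
ranking = sorted(
    machines,
    key=lambda name: score_per_dollar(**machines[name]),
    reverse=True,
)

for name in ranking:
    print(f"{name}: {score_per_dollar(**machines[name]):.0f} points per USD")
```

On this metric both CAX machines come out ahead of the CPX21, and the dedicated Xeon is last by a wide margin.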

WebP Encode Test

Since we plan to run our services on ARM64, let’s discuss our situation. Currently, WebP Cloud Services has two services:

  • Public Service
    • Provides a reverse proxy for Gravatar and GitHub Avatar, solving two problems:
    • This public service is completely free and currently has a large number of users, including but not limited to CNX Software and Indienova.
  • WebP Cloud
    • This is our recently launched new service, which has the following main features:
      • It allows users to convert their website’s images to WebP format and serve them through a new domain provided by WebP Cloud, without the need to host our open-source component, WebP Server Go (especially suitable for static blogs such as Hugo or Hexo).
      • By registering an account on WebP Cloud and providing your website address, WebP Cloud will provide you with a new domain. When users access the images on your website using the new domain and the original image’s URI, WebP Cloud will convert the images to the WebP format and deliver them. This process significantly reduces the image size without compromising the image quality, resulting in faster overall website loading speed.
      • For example, if the original image URL of your website is https://yyets.dmesg.app/api/user/avatar/Benny, WebP Cloud will provide a new URL like https://vz4w427.webp.ee. By accessing https://vz4w427.webp.ee/api/user/avatar/Benny, you can see the compressed and optimized version of the image.
      • All the served images are automatically cached in WebP Cloud. This means that after the initial access, all subsequent accesses are served directly from WebP Cloud without going back to the origin server, reducing the traffic and bandwidth load on the source server.
    • During the initial Alpha phase, free users get a daily quota of 2000 images, which is sufficient for websites/blogs with moderate traffic. Additional paid quota can be purchased at a low price.
    • We also support Custom Domain, which means you can use your own domain name to serve the images. For example, two of our users, Keshane’s Simple Blog and STRRL’s backyard, are using their respective domain names, https://webp.keshane.moe and https://webp.strrl.dev, to access WebP Cloud.
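The URL mapping described in the example above (same path, new domain) can be sketched as follows. This only illustrates how a site owner might rewrite image links on their own pages; the helper name is hypothetical and this is not WebP Cloud's own code.

```python
from urllib.parse import urlsplit


def to_proxy_url(original_url: str, proxy_host: str) -> str:
    """Keep the path and query of the original image URL, swap in the proxy domain."""
    parts = urlsplit(original_url)
    return parts._replace(scheme="https", netloc=proxy_host).geturl()


# Example values taken from the article.
original = "https://yyets.dmesg.app/api/user/avatar/Benny"
rewritten = to_proxy_url(original, "vz4w427.webp.ee")
print(rewritten)
```

With a Custom Domain, the same helper would simply be called with that domain (for example `webp.keshane.moe`) as `proxy_host`.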

Since our services (excluding the frontend) are written in Golang and our CI/CD pipeline is built using GitHub Actions, we have built images for both AMD64 and ARM64 architectures from the beginning. Therefore, testing the services simply involves migrating and starting the containers without the need to modify the image names.

Among the two services mentioned above, the most important and resource-intensive part is the WebP conversion (Encode) process. We can easily test the conversion speed on different machines using the Prefetch feature of WebP Server Go. To evaluate machine performance, we used a set of test images totaling 2.4 GB. Around 80% of the images were taken with a Sony A7 camera, with file sizes averaging around 15 MiB. The remaining 20% were smaller images with sizes ranging from 1 MiB to 5 MiB.

The testing command is as follows:

./webp-server-go -prefetch

The shorter the execution time, the better the performance in this aspect.

Prefetch time on the Xeon server:

Prefetching... 100% |██████████████████████████████████████████████████████| (438/438, 10 it/s)         
Prefetch completeY(^_^)Y in 44.414660644s

Prefetch time on CPX21:

Prefetching... 100% |██████████████████████████████████████████████████| (438/438, 6 it/s)          
Prefetch completeY(^_^)Y in 1m9.87966334s

Due to insufficient memory, CAX11 encountered an out-of-memory (OOM) issue and was not included in this round of testing.

The Prefetch time for CAX21 is:

Prefetching... 100% |█████████████████████████████████████████████████████| (438/438, 6 it/s)           
Prefetch completeY(^_^)Y in 1m15.080679651s

The Prefetch time for Oracle is:

Prefetching... 100% |███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (438/438, 5 it/s)
Prefetch completeY(^_^)Y in 1m24.932026118s

From the Prefetch results, the Xeon(R) CPU E3-1230 v3 @ 3.30GHz dedicated server (4 cores, 8 threads) takes the lead with a conversion time of 44s, which is 36% less time than CPX21 needs. However, CPX21 has a better single-core Geekbench score than the Xeon and a multi-core score within about 8% of it, which indicates that we cannot blindly rely on Geekbench scores as the sole indicator when comparing performance. It is necessary to test against our own workload.
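The elapsed times in the logs above are printed in Go's duration format (for example `1m9.87966334s`). The sketch below is a small helper of our own, supporting only the h/m/s units seen here, that turns them into seconds and derives images per second for the 438-image test set.

```python
import re

_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0}


def parse_go_duration(text: str) -> float:
    """Parse a Go-style duration such as '1m9.87966334s' into seconds (h/m/s only)."""
    matches = re.findall(r"(\d+(?:\.\d+)?)(h|m|s)", text)
    # Reject anything that is not fully covered by value+unit pairs (e.g. 'ms').
    if not matches or "".join(value + unit for value, unit in matches) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(float(value) * _UNIT_SECONDS[unit] for value, unit in matches)


IMAGES = 438  # number of images reported by the progress bar

# Prefetch durations copied from the logs above.
prefetch = {
    "Xeon": "44.414660644s",
    "CPX21": "1m9.87966334s",
    "CAX21": "1m15.080679651s",
    "Oracle": "1m24.932026118s",
}
seconds = {name: parse_go_duration(d) for name, d in prefetch.items()}

for name, secs in seconds.items():
    print(f"{name}: {secs:.1f}s, {IMAGES / secs:.1f} images/s")
```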

On the other hand, the performance of ARM64 is quite impressive. The 4-core CAX21 takes only about 7% longer than the 3-core CPX21 (1m15s vs 1m10s), while costing 14% less. Additionally, CAX21 has twice the memory of CPX21.
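The comparison between the two Hetzner machines can be reproduced from the numbers in this article. The sketch below uses CPX21 as the baseline for both time and price; picking the other machine as baseline would give slightly different percentages, so treat the output as approximate.

```python
def percent_more(value: float, baseline: float) -> float:
    """How much larger `value` is than `baseline`, in percent (negative if smaller)."""
    return (value - baseline) / baseline * 100


# Prefetch time (seconds, rounded), monthly price (USD) and RAM from this article.
cpx21 = {"seconds": 69.88, "price": 9.76, "ram_gb": 4}
cax21 = {"seconds": 75.08, "price": 8.40, "ram_gb": 8}

extra_time = percent_more(cax21["seconds"], cpx21["seconds"])
price_saving = -percent_more(cax21["price"], cpx21["price"])
ram_ratio = cax21["ram_gb"] / cpx21["ram_gb"]

print(f"CAX21 takes {extra_time:.1f}% longer than CPX21")
print(f"CAX21 costs {price_saving:.1f}% less than CPX21")
print(f"CAX21 has {ram_ratio:.0f}x the RAM of CPX21")
```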

Disk Testing

Whether it’s Public Services or WebP Cloud, for subsequent requests, all images are served from cache. Our cache is persisted on disk, so testing the disk performance is crucial in this context.

The yabs command above already includes fio disk tests. The script runs four random read/write fio tests with 4k, 64k, 512k, and 1m block sizes. They are designed to evaluate disk throughput in near-real-world (random access) scenarios with a 50/50 split (50% reads and 50% writes per test).
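For readers who want to run a similar mixed test by hand, the sketch below builds one fio command per block size. It is an approximation under our own assumptions (file size, runtime, file name, I/O engine), not the exact invocation used by the yabs script, and it only prints the commands rather than executing them.

```python
import shlex
from typing import List

BLOCK_SIZES = ["4k", "64k", "512k", "1m"]


def fio_command(block_size: str, filename: str = "fio_test_file") -> List[str]:
    """Build a 50/50 random read/write fio command for one block size."""
    return [
        "fio",
        f"--name=rand_rw_{block_size}",
        "--ioengine=libaio",   # assumption: Linux async I/O engine
        "--rw=randrw",         # random mixed reads and writes
        "--rwmixread=50",      # 50% reads, 50% writes
        f"--bs={block_size}",
        "--direct=1",          # bypass the page cache
        "--size=1G",           # assumption: test file size
        "--runtime=30",        # assumption: seconds per test
        "--time_based",
        f"--filename={filename}",
        "--group_reporting",
    ]


commands = [fio_command(bs) for bs in BLOCK_SIZES]
for cmd in commands:
    print(shlex.join(cmd))
```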

First, here is the disk performance of Oracle Cloud. The test was conducted on the machine’s built-in disk.

fio Disk Speed Tests (Mixed R/W 50/50):
---------------------------------
Block Size | 4k            (IOPS) | 64k           (IOPS)
  ------   | ---            ----  | ----           ----
Read       | 75.84 MB/s   (18.9k) | 228.99 MB/s   (3.5k)
Write      | 75.79 MB/s   (18.9k) | 235.80 MB/s   (3.6k)
Total      | 151.63 MB/s  (37.9k) | 464.79 MB/s   (7.2k)
           |                      |
Block Size | 512k          (IOPS) | 1m            (IOPS)
  ------   | ---            ----  | ----           ----
Read       | 143.03 MB/s    (279) | 141.04 MB/s    (137)
Write      | 155.27 MB/s    (303) | 157.35 MB/s    (153)
Total      | 298.31 MB/s    (582) | 298.39 MB/s    (290)

Hetzner Cloud offers two types of disks: LocalSSD, which refers to the disks that come with the machines, and Volumes. Hetzner describes Volumes as follows:

  • Volumes offer highly available and reliable SSD storage for your cloud servers. You can expand each Volume to up to 10 TB at any time, and you can connect them to your Hetzner cloud servers.

    Our Volumes are based on the networked block storage model, and every block of data is stored on three different physical servers at our Hetzner data centers.

On Hetzner Cloud, the AMD and ARM64 machines showed similar disk performance for both LocalSSD and Volumes, so we report a single set of results for each disk type.

The test results for LocalSSD are as follows:

fio Disk Speed Tests (Mixed R/W 50/50):
---------------------------------
Block Size | 4k            (IOPS) | 64k           (IOPS)
  ------   | ---            ----  | ----           ---- 
Read       | 146.57 MB/s  (36.6k) | 1.13 GB/s    (17.7k)
Write      | 146.48 MB/s  (36.6k) | 1.17 GB/s    (18.3k)
Total      | 293.06 MB/s  (73.2k) | 2.30 GB/s    (36.0k)
           |                      |                     
Block Size | 512k          (IOPS) | 1m            (IOPS)
  ------   | ---            ----  | ----           ---- 
Read       | 2.50 GB/s     (4.8k) | 2.65 GB/s     (2.5k)
Write      | 2.71 GB/s     (5.3k) | 2.95 GB/s     (2.8k)
Total      | 5.21 GB/s    (10.1k) | 5.60 GB/s     (5.4k)

The test results for Volumes are as follows:

fio Disk Speed Tests (Mixed R/W 50/50):
---------------------------------
Block Size | 4k            (IOPS) | 64k           (IOPS)
  ------   | ---            ----  | ----           ---- 
Read       | 29.70 MB/s    (7.4k) | 314.04 MB/s   (4.9k)
Write      | 29.68 MB/s    (7.4k) | 323.38 MB/s   (5.0k)
Total      | 59.38 MB/s   (14.8k) | 637.43 MB/s   (9.9k)
           |                      |                     
Block Size | 512k          (IOPS) | 1m            (IOPS)
  ------   | ---            ----  | ----           ---- 
Read       | 298.03 MB/s    (582) | 288.82 MB/s    (282)
Write      | 323.52 MB/s    (631) | 322.23 MB/s    (314)
Total      | 621.56 MB/s   (1.2k) | 611.05 MB/s    (596)

It can be seen that although Volumes have the advantage of triple replication, their throughput is only about 11% to 28% of LocalSSD depending on block size (for example, 611 MB/s vs 5.60 GB/s at 1m, and 637 MB/s vs 2.30 GB/s at 64k). Therefore, additional attention is required when planning applications, especially database applications.

Additionally, we can observe a significant difference in speed between Hetzner’s LocalSSD and Oracle Cloud’s SSD.
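To put numbers on these comparisons, the sketch below computes each disk's total throughput relative to Hetzner LocalSSD, using the "Total" rows from the fio tables above. It assumes 1 GB/s equals 1000 MB/s when converting the LocalSSD figures, which may differ slightly from how the benchmark script rounds its output.

```python
# Total mixed R/W throughput in MB/s, copied from the fio tables above.
# Assumption: GB/s values are converted with 1 GB/s = 1000 MB/s.
totals = {
    "LocalSSD": {"4k": 293.06, "64k": 2300.0, "512k": 5210.0, "1m": 5600.0},
    "Volume": {"4k": 59.38, "64k": 637.43, "512k": 621.56, "1m": 611.05},
    "Oracle": {"4k": 151.63, "64k": 464.79, "512k": 298.31, "1m": 298.39},
}


def ratio_to_local(disk: str) -> dict:
    """Throughput of `disk` as a fraction of Hetzner LocalSSD, per block size."""
    return {
        block: totals[disk][block] / totals["LocalSSD"][block]
        for block in totals[disk]
    }


volume_ratio = ratio_to_local("Volume")
oracle_ratio = ratio_to_local("Oracle")

for block in totals["LocalSSD"]:
    print(
        f"{block}: Volume {volume_ratio[block]:.0%}, "
        f"Oracle {oracle_ratio[block]:.0%} of LocalSSD"
    )
```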


From the above results, we can see that if we don’t want to go bankrupt, we won’t consider AWS. Among the remaining ARM64 service providers, Scaleway doesn’t guarantee SLA, so we are hesitant to choose them. Oracle Cloud’s disk and processor performance are not satisfactory, and there is a possibility of account closure due to being on the Free Tier. Alibaba Cloud is closely tied to the Chinese company Alibaba, and as a European service provider, we won’t consider them. Additionally, their prices are also very high. In the end, we have chosen Hetzner’s CAX series machines as our server provider.

Due to the excellent cost-performance ratio shown in these tests and our fondness for ARM64 machines, WebP Cloud Services has fully migrated to Hetzner’s CAX series ARM64 servers as of this article’s publication. We did encounter some peculiar incidents during the migration, such as the CPU inexplicably spiking when using the alpine ClickHouse image, which was resolved by switching to a non-alpine image.

Apart from that, ARM64 machines have performed well in terms of response latency and compatibility. If we make any new discoveries in the future, we will be sure to share them.


If you’re interested in Hetzner’s ARM64 machines after reading this article, you can try using our referral link to sign up for Hetzner and experience it: https://hetzner.cloud/?ref=6moYBzkpMb9s

By registering through our link, you can directly receive a €20 credit upon successful registration, and we will also receive a €10 reward. This way, you can support the development of our product as well.

However, it’s important to note that Hetzner has strict risk controls. Using a VPN during registration or intentionally providing incorrect information may easily result in your account being banned. This can be seen as both a disadvantage and an advantage. The disadvantage is that the registration threshold is relatively high, but the advantage is that Hetzner’s customers are relatively “clean” compared to mainstream service providers that offer large credit limits (such as DO and Vultr), without noisy and disruptive neighbors. Moreover, from our observations, once an account is successfully registered and has a few successful paid orders, there is generally no issue with account closure.

References

  1. Scaleway C2 and ARM64 instances will reach end-of-life in December 2020
  2. Scaleway ARM servers
  3. RETHINK YOUR CLOUD. RELY ON OUR NEW ARM64 CAX SERVERS
  4. Scaleway Provides Dedicated ARM Servers for 10 Euros per Month, 0.02 Euro per Hour - CNX Software

The WebP Cloud Services team is a small team of three individuals from Shanghai and Helsingborg. Since we are not funded and have no profit pressure, we remain committed to doing what we believe is right. We strive to do our best within the scope of our resources and capabilities. We also engage in various activities without affecting the services we provide to the public, and we continuously explore novel ideas in our products.

If you find this service interesting, feel free to log in to the WebP Cloud Dashboard to experience it. If you’re curious about other magical features it offers, take a look at our WebP Cloud Services Docs. We hope everyone enjoys using it!

