Hybrid Deployment of GitHub Actions Runner: Multi-Arch Image Building Speed Soars 10x!
This article is also available in Simplified Chinese: 混合部署 GitHub Actions Runner:Multi Arch 镜像构建速度飙升 10 倍!
At WebP Cloud Services, all our components are containerized, and both code hosting and CI/CD processes are handled on GitHub and GitHub Actions. This modern workflow has significantly reduced our workload and costs.
Containerized deployment lets us focus on what each service does instead of spending a lot of time dealing with environment issues on different machines.
Since we are using containerized deployment, we must mention image hosting. Due to our deep integration with GitHub, we directly use GHCR (GitHub Container Registry) to host our images instead of services like ECR or GCR that could be costly for us.
Background Information
As mentioned earlier, all our services are containerized. For example, the service that provides the user API is named `webppt`. If you’ve looked at our API documentation, you might have noticed that our API address is different from mainstream API service providers. It is `https://webppt.webp.se` instead of `https://api.webp.se`, and the reason behind this is that the service is named `webppt`.
- We have an interesting story behind this decision, and we plan to share the whole story of WebP Cloud Services later on.
Our organization on GitHub is named `webp-pt` (please do not follow this Org as it does not contain any public repositories). Therefore, the image name for the `webppt` component naturally becomes `ghcr.io/webp-pt/webppt`.
Before we explored and discovered the advantages of ARM64 (as mentioned in the article “The performance review of Hetzner’s CAX-line ARM64 servers and the practical experience of WebP Cloud Services on them.”), our infrastructure was based on AMD64 architecture dedicated servers. Our workflow was as follows:
- All code changes went through pull requests for review. GitHub Actions ran all CI tests and `trivy` scans on the images (used to detect any obvious vulnerabilities).
- After the code was merged into the `master` branch (we prefer `master` over `main`), GitHub Actions built an image named `ghcr.io/webp-pt/webppt:latest`.
- When we decided to release a version to production, we would create a `tag` on a specific commit, e.g., `31`. Then GitHub Actions would build an image named `ghcr.io/webp-pt/webppt:31`.
The corresponding GitHub Actions steps were straightforward, and they looked something like this:
    - name: Login to GitHub Container Registry
      uses: docker/login-action@v2
      with:
        registry: ghcr.io
        username: ${{ github.repository_owner }}
        password: ${{ secrets.GITHUB_TOKEN }}
    - name: Build and push latest images
      if: github.ref_name == 'master'
      uses: docker/build-push-action@v4
      with:
        context: .
        platforms: linux/amd64
        push: true
        tags: |
          ghcr.io/${{ github.event.repository.full_name }}:latest
    - name: Build and push tagged images
      if: startsWith(github.ref, 'refs/tags/')
      uses: docker/build-push-action@v3
      with:
        context: .
        platforms: linux/amd64
        push: true
        tags: |
          ghcr.io/${{ github.event.repository.full_name }}:${{ steps.imageTag.outputs.tag }}
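The tagged build reads `steps.imageTag.outputs.tag`, but the step with the id `imageTag` is not part of the excerpt. A minimal sketch of what such a step could look like, assuming the image tag should simply be the Git tag that triggered the run (the step name and the shell expression are illustrative, not taken from the original pipeline):

```yaml
# Hypothetical step: derive the image tag from the pushed Git tag.
# Assumption: on a tag push, GITHUB_REF has the form "refs/tags/31".
- name: Get image tag
  id: imageTag
  if: startsWith(github.ref, 'refs/tags/')
  # Strip the "refs/tags/" prefix and expose the rest as the step output "tag".
  run: echo "tag=${GITHUB_REF#refs/tags/}" >> "$GITHUB_OUTPUT"
```

Step outputs are scoped to the job that produced them, so every job that references `steps.imageTag.outputs.tag` would need its own copy of a step like this.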
At that time, the image build speed looked like this:
After discovering the cost-effective ARM64 machines at Hetzner and migrating our services to ARM64, we wanted to build ARM64 images alongside AMD64 images. Since most of our components (except the frontend part) were written in Golang, it was as simple as modifying the platforms in the GitHub Actions step like this:
platforms: linux/amd64, linux/arm64
This change allowed us to continue using the GitHub and GitHub Actions + GHCR workflow while supporting both AMD64 and ARM64 architectures. Moreover, GHCR’s private repositories seemed to have no capacity restrictions, and the many image versions we built did not appear to count toward any storage usage. In practice, this meant we could use GHCR for free.
However, we soon noticed a new problem: the image build speed became incredibly slow.
The reason for the slowdown was apparent. By default, GitHub Actions used Runner machines with a 2-core CPU (x86_64) and 7 GB of RAM (corresponding to the Azure Standard_DS2_v2 instance type). When an ARM64 platform was specified, QEMU was used to emulate ARM64, which significantly affected the speed.
Attempting to build both AMD64 and ARM64 images on these low-spec machines, along with QEMU emulation, naturally resulted in slow speeds.
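For reference, an emulated multi-arch build on a single x86_64 Runner usually needs QEMU and Buildx set up before the build step. The following is a sketch under that assumption, using the `docker/setup-qemu-action` and `docker/setup-buildx-action` actions. The original pipeline excerpt does not show these steps, so their versions and placement here are illustrative:

```yaml
# Sketch: emulated multi-arch build on one x86_64 Runner.
- name: Set up QEMU
  # Registers emulators so that arm64 binaries can run on the x86_64 host.
  uses: docker/setup-qemu-action@v2
- name: Set up Docker Buildx
  # Multi-platform builds go through a Buildx builder.
  uses: docker/setup-buildx-action@v2
- name: Build and push latest images
  uses: docker/build-push-action@v4
  with:
    context: .
    # Both platforms are built on the same machine; the arm64 half runs under QEMU.
    platforms: linux/amd64,linux/arm64
    push: true
    tags: |
      ghcr.io/${{ github.event.repository.full_name }}:latest
```

Every instruction executed for the `linux/arm64` half goes through emulation, which is why compile-heavy builds suffer the most in this setup.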
Hence, our problems and requirements were clear:
- We needed to build both AMD64 and ARM64 images.
- We wanted to continue using the GitHub and GitHub Actions + GHCR workflow to minimize mental burden on the pipeline.
- We couldn’t accept build times of 20+ minutes.
Self-hosted Runner
So we quickly came up with the first idea, based on Nova Kwok’s previous experience: “Accelerate Multi-Arch Image Building on GitHub Actions with Multiple Parallel Jobs.” We decided to use multiple parallel GitHub Actions Runners to build ARM64 and AMD64 images separately and then merge them together. The process looks like this:
- Runner 1 builds an image named `ghcr.io/webp-pt/webppt:31-amd64`.
- Runner 2 builds an image named `ghcr.io/webp-pt/webppt:31-arm64`.
- Runner 3 combines the two images using the `manifest` operation after Runners 1 and 2 have finished their tasks. The new image name becomes `ghcr.io/webp-pt/webppt:31`.
- In this way, we obtain a Multi-Arch image `ghcr.io/webp-pt/webppt:31`, which allows us to pull the corresponding architecture-specific images on both AMD64 and ARM64 using the same image name.
This approach is currently being used to build Multi-Arch Runner images on https://github.com/knatnetwork/github-runner, an open-source project by Nova Kwok. However, based on actual usage results, we have observed that even with a dedicated Runner for the ARM64 image, that build still runs under QEMU emulation and is far slower than the AMD64 build on https://github.com/knatnetwork/github-runner, as shown in the following figure:
Therefore, this approach may not be as elegant and optimal as initially thought.
Instead, we came up with a new approach. Since Nova Kwok’s https://github.com/knatnetwork/github-runner could be easily deployed, and we had plenty of idle Hetzner ARM64 resources, why not natively build the ARM64 part of the image on ARM64 machines? So, we designed the following workflow:
- GitHub Actions’ official Runner (`amd64` job) builds an image named `ghcr.io/webp-pt/webppt:31-amd64`.
- Our self-hosted Runner (`arm64` job) builds an image named `ghcr.io/webp-pt/webppt:31-arm64`.
- After both `amd64` and `arm64` jobs finish, the `combine-two-images` job merges the two images into a single Multi-Arch image named `ghcr.io/webp-pt/webppt:31`.
Let’s do this!
Spin up runner
Creating a Self-hosted Runner can be both straightforward and complex. For simple scenarios, it can be done easily. However, for more complex environments, such as those involving Kubernetes (K8s) and requiring elastic scaling, you can utilize the GitHub open-source project called https://github.com/actions/actions-runner-controller.
However, our goal is to keep things as simple as possible. We don’t want to deal with Kubernetes or set up multiple controllers. So, we decided to use Nova Kwok’s open-source project https://github.com/knatnetwork/github-runner directly. To get started, we selected an idle ARM64 machine and created an empty directory. In this directory, we wrote a `docker-compose.yml` file with the following content:
    version: '3'
    services:
      runner:
        image: knatnetwork/github-runner:latest
        restart: always
        environment:
          RUNNER_REGISTER_TO: 'webp-pt'
          RUNNER_LABELS: 'docker,webpcloud'
          KMS_SERVER_ADDR: 'http://kms:3000'
          ADDITIONAL_FLAGS: '--ephemeral'
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock
      kms:
        image: knatnetwork/github-runner-kms:latest
        restart: always
        environment:
          PAT_webp-pt: 'ghp_kh4GxxxxxxC'
After modifying the `RUNNER_REGISTER_TO` and `PAT_webp-pt` with the appropriate values (the organization and the GitHub Personal Access Token), we used `docker-compose up -d` to start the Runner, and it quickly registered successfully.
We can’t remember the last time a setup went this smoothly.
Re-write some stuff
Next, we modified our GitHub Actions pipeline to schedule the jobs on the respective Self-hosted Runner. The pipeline structure looked like this:
    name: Build docker images and push
    on:
      push:
        branches:
          - 'master'
        tags:
          - "*"
        paths-ignore:
          - '**.md'
          - '*.yml'
    jobs:
      amd64:
        runs-on: ubuntu-latest
        steps:
          - name: Build and push tagged images
            if: startsWith(github.ref, 'refs/tags/')
            uses: docker/build-push-action@v3
            with:
              context: .
              platforms: linux/amd64
              push: true
              provenance: false
              sbom: false
              tags: |
                ghcr.io/${{ github.event.repository.full_name }}:${{ steps.imageTag.outputs.tag }}-amd64
      arm64:
        runs-on: self-hosted
        steps:
          - name: Build and push tagged images
            if: startsWith(github.ref, 'refs/tags/')
            uses: docker/build-push-action@v3
            with:
              context: .
              platforms: linux/arm64
              push: true
              provenance: false
              sbom: false
              tags: |
                ghcr.io/${{ github.event.repository.full_name }}:${{ steps.imageTag.outputs.tag }}-arm64
      combine-two-images:
        runs-on: ubuntu-latest
        needs:
          - arm64
          - amd64
        steps:
          - name: Combine two tagged images
            if: startsWith(github.ref, 'refs/tags/')
            run: |
              docker manifest create ghcr.io/${{ github.event.repository.full_name }}:${{ steps.imageTag.outputs.tag }} --amend ghcr.io/${{ github.event.repository.full_name }}:${{ steps.imageTag.outputs.tag }}-amd64 --amend ghcr.io/${{ github.event.repository.full_name }}:${{ steps.imageTag.outputs.tag }}-arm64
              docker manifest push ghcr.io/${{ github.event.repository.full_name }}:${{ steps.imageTag.outputs.tag }}
With three jobs:
- `arm64`: Running on the self-hosted Runner (our ARM64 machine).
- `amd64`: Running on GitHub’s official Runner.
- `combine-two-images`: Depends on the `arm64` and `amd64` jobs and merges the two images together.
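The `arm64` job selects its Runner with the generic `self-hosted` label only. If an organization later registers Runners of more than one architecture, the job could be pinned more tightly by listing several labels. A sketch, assuming the Runner carries the `webpcloud` label set through `RUNNER_LABELS` in the Compose file, plus the `ARM64` label that GitHub normally assigns to ARM64 self-hosted Runners (whether the Runner image keeps that default label is an assumption here):

```yaml
jobs:
  arm64:
    # A job is only scheduled on a Runner that has every label in this list.
    runs-on: [self-hosted, ARM64, webpcloud]
    steps:
      - name: Check out the repository
        uses: actions/checkout@v3
      - name: Print the machine architecture
        # Quick sanity check that the job landed on the intended host.
        run: uname -m
```

With a single self-hosted Runner, as in the setup described here, `runs-on: self-hosted` is sufficient; the extra labels only matter once several Runners compete for the same jobs.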
The actual runtime results were impressive, taking around 2 minutes on average to complete.
This was a significant improvement from the initial 22+ minutes and represented roughly a 10x speedup!
Of course, we encountered some small issues during the process, such as the error `ghcr.io/webp-pt/webppt:latest-amd64 is a manifest list`. However, Nova Kwok had previously encountered a similar issue and documented the solution in their post “Docker Buildx Attestations Check Maintenance”. By adding the `provenance: false` and `sbom: false` flags in the `docker/build-push-action@v3` step, we resolved the problem.
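The error arises because, with attestations enabled, each per-architecture tag is pushed as an image index (a manifest list) rather than a plain image manifest, and `docker manifest create` refuses to nest one list inside another. One alternative, not tested in this pipeline, is `docker buildx imagetools create`, which can combine existing manifests and manifest lists under a new tag. A sketch, assuming the tag name can be taken from `github.ref_name` instead of the `imageTag` step:

```yaml
# Sketch: merge the per-architecture tags with Buildx instead of `docker manifest`.
# Assumes this job has already logged in to ghcr.io and that Buildx is available on the Runner.
- name: Combine two tagged images
  if: startsWith(github.ref, 'refs/tags/')
  run: |
    docker buildx imagetools create \
      --tag ghcr.io/${{ github.event.repository.full_name }}:${{ github.ref_name }} \
      ghcr.io/${{ github.event.repository.full_name }}:${{ github.ref_name }}-amd64 \
      ghcr.io/${{ github.event.repository.full_name }}:${{ github.ref_name }}-arm64
```

Disabling provenance and SBOM, as done above, remains the simpler fix when the attestations are not needed.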
The WebP Cloud Services team is a three-person team from Shanghai and Helsingborg. As we are not funded and have no profit pressure, we focus on doing what we believe is right, striving to do our best within our available resources and capabilities. We also engage in various activities without affecting our external services and try out various exciting new things on our products.
As you can see, this time we achieved a 10x speedup in our GitHub Actions pipeline using a hybrid Runner deployment. This improvement allowed our product deployment to become more agile.
If you find Hetzner’s ARM64 machines interesting after reading this article, you can try using our referral link to register with Hetzner: https://hetzner.cloud/?ref=6moYBzkpMb9s (using our referral link will provide you with 20EUR usable credit after successful registration, and we will receive a 10EUR reward, which will support our product development).
If you find this service interesting, feel free to log in to the WebP Cloud Dashboard to experience it. If you’re curious about other magical features it offers, take a look at our WebP Cloud Services Docs. We hope everyone enjoys using it!