WebP Cloud Services Blog

Using Cloudflare Workers at the edge to enable services to source data from nearby locations – reducing global average latency.

· Nova Kwok

If you have a service and wish to achieve fast response times globally, what would be your initial thought? CDN? GeoDNS?

Let’s start with a simple example. Suppose you have a service whose purpose is to provide a UUID to people who visit it. Imagine that your service is deployed in Germany. You configure DNS to point uuid.example.com to your server’s IP address, so that when someone visits uuid.example.com, they receive a UUID.

This is the first step: when anyone accesses uuid.example.com, it resolves to the corresponding IP address. The browser then initiates a request and, a moment later, receives the desired result. However, users in different parts of the world may experience very different delays. For instance, the latency from mainland China might look like this:

PING uuid.example.com (x.x.x.x) 56(84) bytes of data.
64 bytes from x.x.x.x: icmp_seq=1 ttl=37 time=373 ms
64 bytes from x.x.x.x: icmp_seq=2 ttl=37 time=423 ms
64 bytes from x.x.x.x: icmp_seq=6 ttl=37 time=388 ms

While the latency from the United States appears as follows:

PING uuid.example.com (x.x.x.x) 56(84) bytes of data.
64 bytes from x.x.x.x: icmp_seq=1 ttl=56 time=114 ms
64 bytes from x.x.x.x: icmp_seq=2 ttl=56 time=113 ms
64 bytes from x.x.x.x: icmp_seq=4 ttl=56 time=113 ms

To avoid exposing your server’s IP address, which could invite attacks, and to “accelerate access,” many people choose to put a CDN such as Cloudflare in front. When you integrate your domain with Cloudflare, you will notice that uuid.example.com no longer resolves to your server’s IP but instead points to an IP owned by Cloudflare.

At this point, when visitors check your service with the ping command, they will observe a latency of around 10 ms:

PING uuid.example.com (x.x.x.x) 56(84) bytes of data.
64 bytes from x.x.x.x: icmp_seq=1 ttl=62 time=10.8 ms
64 bytes from x.x.x.x: icmp_seq=2 ttl=62 time=10.2 ms
64 bytes from x.x.x.x: icmp_seq=3 ttl=62 time=10.3 ms

At this stage, the latency appears to have decreased based on the ping results, leading many to believe that the actual service latency has improved. However, it’s important to note that this latency is only from your visitors to Cloudflare’s Anycast nodes. While the ping times are low, the actual HTTP requests still need to traverse the public internet to reach your origin server. To measure the actual latency, you need to look at the Time To First Byte (TTFB).
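One quick way to see this split is curl's timing variables; the wrapper function and target URL below are only illustrative:

```shell
# time_connect covers the TCP handshake (usually just to the nearest CDN
# edge); time_starttransfer is when the first response byte arrives, i.e.
# the TTFB, which includes the round trip from the edge to the origin.
measure_ttfb() {
  curl -s -o /dev/null \
    -w 'connect: %{time_connect}s  ttfb: %{time_starttransfer}s\n' \
    "$1"
}
# Usage: measure_ttfb https://uuid.example.com/
```

A low `connect` time combined with a high `ttfb` is exactly the CDN-in-front-of-a-distant-origin pattern described above.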

KeyCDN provides a Performance Test that makes it easy to assess a service’s latency from various major regions. Taking WebP Cloud Services—Public Service as an example: since our servers are all located at Hetzner in Germany, the CONNECT latency is low thanks to Cloudflare’s CDN. However, when examining TTFB, you’ll notice that only the German region is low, while other regions experience latencies above 100 ms.

WebP Cloud Services—Public Service offers a reverse proxy for Gravatar and GitHub Avatars, addressing two main issues:

  1. Chinese mainland users cannot directly access Gravatar, for example, at this address: https://www.gravatar.com/avatar/09eba3a443a7ea91cf818f6b27607d66.
  2. When serving these images, it provides WebP conversion, significantly reducing image file sizes with minimal impact on image quality, thereby accelerating overall website loading speed.

Additionally, this service is public and completely free, with a substantial user base, including but not limited to websites like CNX Software and Indienova.

From Cloudflare’s statistics dashboard, we can see that in the past 30 days, the service has handled over 6 million requests, with the majority coming from the United States and China:

Cloudflare Stats

It’s worth noting that, except for China Mobile users, most Chinese visitors are routed to Cloudflare’s Western US nodes, typically located in SJC.

Based on our theory, approximately half of our users first access Cloudflare’s US West nodes and then traverse the public internet to reach our origin servers in Germany, resulting in an additional latency of over 110ms. This gives users the impression of slow service response times.

So, how should we address this?

In this scenario, several implicit conditions exist:

  • We need to continue using Cloudflare to protect our origin server’s address and perform certain computations and WAF rules at Cloudflare’s edge.
  • We cannot directly “migrate” the service to the United States because we still have European users. Thus, we need servers in both the United States and Europe.
  • Given that the majority of visitors are from the United States and China, with Chinese users routed to US nodes, our optimization focus should be on improving access speed in the United States.
  • Our goal is to direct users from the United States and China to servers in the United States, European users to servers in Germany, and users from other regions to the nearest servers.

With these considerations in mind, we’ve come up with several solutions:

  1. Using Private ASN + IPv6: Deploy nodes in the United States and Europe using services like Vultr, and use BGP Anycast for load balancing. This approach is similar to the one discussed in Nova’s blog post, “Simulate Argo——Building an IPv6 AnyCast Network Behind Cloudflare.” The cost includes ASN fees, IPv6 costs, Vultr fees, and the overhead of maintaining the network, making it somewhat complex.

  2. Directly Using BuyVM’s Anycast Service: Purchase a VPS in each of BuyVM’s three Anycast locations (in the US and Europe) and use BuyVM’s Anycast service for load balancing. This option involves the cost of BuyVM’s VPSes, approximately $10.5 per month.

  3. Using Cloudflare Load Balancer for Geo Load Balancing: Utilize Cloudflare Load Balancer for geo load balancing. The cost includes Cloudflare Load Balancer fees, roughly $5 per month for 500,000 requests, with an additional $0.5 per month for every 500,000 requests beyond that. If we aim to achieve region-based routing as per our requirements, the cost would be $20.5 per month, based on our 6 million monthly requests.

  4. Using Cloudflare Workers: Leverage Cloudflare Workers, a serverless service deployed in all of Cloudflare’s data centers. The cost involves Cloudflare Workers fees, approximately $5 per month (handling up to 10 million requests per month, exceeding our monthly request count).

Of the options above, Cloudflare Workers appears to be the most worry-free solution. Pay up and let’s roll!

Cloudflare Workers

Cloudflare Workers is a serverless service provided by Cloudflare that allows us to deploy code in all of Cloudflare’s data centers. It also enables us to store data using Cloudflare Workers KV, allowing us to deploy code globally and read/write data worldwide.

For our specific use case, the primary logic for using Cloudflare Workers is as follows:

  • Given a request, determine the visitor’s country on the Workers platform from the source IP address (the executing Workers machine is very likely located close to the visitor).
  • Based on a predefined mapping, route the request to the physically closest server.
  • Additionally, handle various types of abnormal requests and implement automatic failover logic.

From https://developers.cloudflare.com/fundamentals/reference/http-request-headers/#cf-ipcountry, we learn that for each request we can read the ISO country code of the request’s origin from the CF-IPCountry header. In our case, for simplicity, we route traffic by continent, so we can quickly create a simple mapping as follows:

function getContinentByISOCode(isoCode) {
    const continentMap = {
        'AD': 'Europe',
        'AE': 'Asia',
        // ...
        'CN': 'North America', // China should be Asia, but we're using North America because China users are routed to the North America Edge
        // ...
        'ZW': 'Africa',
    };
    const continent = continentMap[isoCode];
    if (continent) {
        return continent;
    } else {
        return 'Unknown';
    }
}

The next step is to launch services in various regions, plan the service’s endpoint addresses, and map our actual backend services in Workers based on continents as follows:

const BACKEND_MAP = {
    'Europe': 'https://eu-west-2-entrance.webp.se',
    'North America': 'https://us-west-2-entrance.webp.se',
    'Unknown': 'https://eu-west-1-entrance.webp.se'
};

Finally, our Workers code can look something like this:

export default {
	async fetch(request, env, ctx) {
		const url = new URL(request.url);
		// Original Path: /avatar/?d=mm, full path with query string
		const original_path = url.pathname + url.search;
		const CF_IP_COUNTRY = request.headers.get('cf-ipcountry');

		const continent = getContinentByISOCode(CF_IP_COUNTRY);
		const backend_url = BACKEND_MAP[continent];

		return handleProxy(request, backend_url, original_path, url.hostname, CF_IP_COUNTRY);
	}
};

The handleProxy function is as follows:

async function handleProxy(request, backend_url, path, secret_host, CF_IP_COUNTRY) {
	// Host: eu-public-service.webp.se
	// some-secret-header-to-backend: gravatar.webp.se
	const headers = {
		'Accept': request.headers.get('Accept'),
		'User-Agent': request.headers.get('User-Agent'),
		'Referer': request.headers.get('Referer'),
		'x-real-ip': request.headers.get('x-real-ip'),
		'some-secret-header-to-backend': secret_host,
		'country-code': CF_IP_COUNTRY,
	};

	const backend_path = backend_url + path;

	const timeoutPromise = new Promise((resolve, reject) => {
		setTimeout(() => {
			reject(new Error('Timed out'));
		}, 2500);
	});
	const fetchPromise = fetch(backend_path, {
		headers: headers
	});

	let res;
	try {
		res = await Promise.race([fetchPromise, timeoutPromise]);
		if (res.ok) {
			return res;
		}
	} catch (error) {
		// Timed out, continue with backup backends
	}

	// Some other failover logic here...

	return res;
}

Isn’t it straightforward?
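The failover step elided in the snippet above could, for instance, walk an ordered list of backup backends. This is only a sketch under our own assumptions (AbortController-based timeout, hypothetical function names), not the production logic:

```javascript
// Hypothetical failover sketch: try each backend in order until one
// answers within the timeout; names here are illustrative only.
async function fetchWithTimeout(url, headers, ms) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await fetch(url, { headers, signal: controller.signal });
  } finally {
    clearTimeout(timer);
  }
}

async function fetchWithFailover(backends, path, headers) {
  for (const backend of backends) {
    try {
      const res = await fetchWithTimeout(backend + path, headers, 2500);
      if (res.ok) return res;
    } catch (e) {
      // timed out or network error: fall through to the next backend
    }
  }
  return new Response('All backends failed', { status: 502 });
}
```

Ordering the `backends` list per continent (primary first, then the other region) keeps the happy path identical to the simple version while still surviving an origin outage.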

Note that handleProxy(request, backend_url, path, url.hostname, CF_IP_COUNTRY); receives url.hostname in addition to backend_url and path. This is because our service is accessed externally via addresses like gravatar.webp.se rather than eu-west-2-entrance.webp.se, yet when Workers fetches the origin with fetch(), it can only use the latter. So we pass an extra header in fetch() to tell the backend service which domain was actually requested.

For example, in a fetch request, the Host header is eu-west-2-entrance.webp.se, while the some-secret-header-to-backend header carries gravatar.webp.se. When our backend detects this header, it treats its value as the Host for evaluation.
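On the origin side, that override might look like this sketch (the helper name and header-object shape are ours, not the actual backend code):

```javascript
// Hypothetical sketch of the origin's host resolution: when the header
// set by the Worker is present, its value wins over the transport Host.
const SECRET_HEADER = 'some-secret-header-to-backend';

function effectiveHost(headers) {
  // `headers` is assumed to be a plain object with lower-cased names
  return headers[SECRET_HEADER] || headers['host'];
}
```

With `host: 'eu-west-2-entrance.webp.se'` and the secret header set to `gravatar.webp.se`, `effectiveHost` returns `gravatar.webp.se`, so virtual-host routing behaves as if the client had hit that domain directly.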

Effect Comparison

Before using Workers, all origin servers were in Hetzner Germany.

After using Workers, the origin servers are in Hetzner Germany and Hetzner Hillsboro. You can see a significant decrease in TTFB at both testing points in the United States, from 300+ms to 100+ms.

You can determine from the x-powered-by header which region’s node is serving the request:

Hetzner Germany

Hetzner Hillsboro

As of the time of this article’s publication, we have been running on this architecture for nearly 3 days. Based on monitoring data, the HIO node took over approximately half of the traffic immediately upon startup, which aligns with our expectations.

Stats from Workers:


References

  1. https://developers.cloudflare.com/fundamentals/reference/http-request-headers/#cf-ipcountry
  2. Simulate Argo——Building an IPv6 AnyCast Network Behind Cloudflare

The WebP Cloud Services team is a small team of three individuals from Shanghai and Helsingborg. Since we are not funded and have no profit pressure, we remain committed to doing what we believe is right. We strive to do our best within the scope of our resources and capabilities. We also engage in various activities without affecting the services we provide to the public, and we continuously explore novel ideas in our products.

If you find this service interesting, feel free to log in to the WebP Cloud Dashboard to experience it. If you’re curious about other magical features it offers, take a look at our WebP Cloud Services Docs. We hope everyone enjoys using it!

Discuss on Hacker News