How to achieve cross-zone highly available Layer 7 load balancing in Google Cloud

AllCloud Blog:
Cloud Insights and Innovation

Google Cloud rolled out its Layer 7 load balancing service a while ago, offering a means of balancing your requests at the HTTP level, based on URL patterns.

You can use this type of load balancing to distribute your traffic across zones, across regions, based on the requested content, or with a custom combination of all three.

Please be aware that while this feature has been long awaited on GCP and you may feel the urge to try it out immediately (like we did), there are still some limitations:

  • The service is not yet generally available, and given that it’s subject to possibly backward-incompatible changes, Google stresses that it should not be used in production.
  • Because it’s still in preview, the service is not configurable from the developers’ console; you need to use the latest gcloud tools to set it up.
  • For now it only supports balancing incoming HTTP requests on port 80. HTTPS is not supported yet.

In this tutorial I will provide an example of how to configure a simple cross-zone HTTP load balancer. It assumes you have already created the VM instances you want to distribute the requests between.

Pre-reqs:

  • Install the latest Google Cloud SDK, which provides the gcloud tool:

curl | bash

If you already have it installed, once you try to use gcloud you will be notified if there’s a more recent version available.

  • If it’s the first time you’re using gcloud, make sure you’re authenticated and your default project is set:
gcloud auth login
gcloud config set project <project-id>
  • Install the preview component:
gcloud components update preview

For the purposes of this how-to, let’s assume we have these 2 VMs, one in us-central1-a and the other in us-central1-b: pink-1a and floyd-1b.

Now let’s start setting up the load balancer:

1. We first need to create a resource view for each zone we’re aiming to use. Resource views are used to group your VMs so that you can target specific operations at them. In our load balancing scenario, we create them so that the backend service can reference these groups later.

gcloud preview resource-views create pf-resources --zone us-central1-a
gcloud preview resource-views create pf-resources --zone us-central1-b

Once these are created, let’s add the VMs accordingly:

gcloud preview resource-views resources --resourceview pf-resources addinstance pink-1a --zone us-central1-a
gcloud preview resource-views resources --resourceview pf-resources addinstance floyd-1b --zone us-central1-b
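If you have more than a couple of zones, typing these commands by hand gets tedious. Here is a minimal dry-run sketch that only prints the step 1 commands for each zone and instance pair, so you can review them before running anything. The `PAIRS` list and the function name are my own illustration; the printed commands are exactly the ones used above.

```shell
#!/bin/sh
# Dry-run sketch: prints the step 1 commands instead of running them.
# VIEW and PAIRS hold this tutorial's example names; adjust them to your setup.
VIEW="pf-resources"
PAIRS="us-central1-a:pink-1a us-central1-b:floyd-1b"

print_view_commands() {
  for pair in $PAIRS; do
    zone="${pair%%:*}"      # text before the first colon
    instance="${pair#*:}"   # text after the first colon
    echo "gcloud preview resource-views create $VIEW --zone $zone"
    echo "gcloud preview resource-views resources --resourceview $VIEW addinstance $instance --zone $zone"
  done
}

print_view_commands
```

Once the output looks right, you can paste the printed lines into your terminal or pipe them to `sh`.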

2. Make a note of the URIs corresponding to the above resource views, as we’ll need them later on:

gcloud preview resource-views list --zone us-central1-b
gcloud preview resource-views list --zone us-central1-a

You should get something similar to:

<project-id>/zones/us-central1-b/resourceViews/pf-resources
<project-id>/zones/us-central1-a/resourceViews/pf-resources

3. If you do not already have a health check, create one at this step, so you can pair it later with the backend service that you’re going to configure:

gcloud compute http-health-checks create pf-hc

This will set up a health check named pf-hc, requesting / on the default port 80.

You can further edit it from the dev console, or by using:

gcloud compute http-health-checks edit pf-hc

4. The next object we are creating is the backend service. This will allow us to define groups of instances (the resource views we configured at step #1), as well as their service capacities, which can be based either on CPU utilization or on RPS (requests per second).

Issue:

gcloud compute backend-services create pf-service --http-health-check pf-hc

Note that we specified the health check created at the previous step.

5. Now we can go ahead and add our resource views to the backend service. To do so we will edit our backend service:

gcloud compute backend-services edit pf-service

In the file opened with your default text editor you can already see the health check settings, presented similar to:

port: 80
timeoutSec: 30

What you need to do is update the backend values by uncommenting the following lines, inserting the URIs you collected at step #2, and choosing the desired balancing mode. For the balancingMode, you can choose between RATE (requests per second) and UTILIZATION (CPU). In the example below we are using RATE, with a maxRate of 20000 RPS for each resource group.

- balancingMode: RATE
  maxRate: 20000
- balancingMode: RATE
  maxRate: 20000

Pay attention to the indentation of the file: it’s in YAML format, so you can use an online YAML validator to check it.
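To avoid indentation mistakes, you can generate the backend entries instead of typing them. The sketch below prints one entry per resource view URI passed as an argument. The `group` key is the one that shows up in the overview gcloud prints after saving; the `backends:` parent key is my assumption about the layout of the edited file, and the URIs are placeholders for the values you collected at step #2.

```shell
#!/bin/sh
# Prints one RATE backend entry per resource view URI given as an argument.
# MAX_RATE mirrors the 20000 RPS used in this tutorial.
MAX_RATE=20000

backend_stanza() {
  echo "backends:"   # assumed parent key of the entries
  for uri in "$@"; do
    echo "- balancingMode: RATE"
    echo "  group: $uri"
    echo "  maxRate: $MAX_RATE"
  done
}

# Placeholders: replace with the URIs collected at step #2.
backend_stanza "<view-uri-zone-a>" "<view-uri-zone-b>"
```

Paste the output into the editor in place of the commented-out backend lines.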

Save & quit, and you should now get an overview of what you have set up, similar to:

Updated [<project-id>/global/backendServices/pf-service].
backends:
- balancingMode: RATE
  capacityScaler: 1.0
  description: ''
  group: us-central1-b/resourceViews/pf-resources
  maxRate: 20000
- balancingMode: RATE
  capacityScaler: 1.0
  description: ''
  group: us-central1-a/resourceViews/pf-resources
  maxRate: 20000
creationTimestamp: '2014-08-12T12:34:56.445-07:00'
fingerprint: mpibG2i6bJo=
healthChecks:
- pf-hc
id: '7997606913310935188'
kind: compute#backendService
name: pf-service
port: 80
protocol: HTTP
timeoutSec: 5

One important thing to mention: you do not have to create any extra configuration to ensure cross-zone high availability. As long as the backend service contains resource groups belonging to 2 or more different zones within the same region (like in our example), the load will be spread among all the instances that have been added to the respective resource groups, following the distribution algorithm we defined above.
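It helps to do the capacity arithmetic before you rely on this. The sketch below sums maxRate × capacityScaler over the backends, under my assumption that capacityScaler acts as a simple multiplier on maxRate and that maxRate is a balancing target rather than a hard cap. The function name and the pair format are my own.

```shell
#!/bin/sh
# Sums maxRate * capacityScaler over "maxRate:capacityScaler" arguments.
total_capacity() {
  printf '%s\n' "$@" | awk -F: '{ sum += $1 * $2 } END { printf "%d\n", sum }'
}

total_capacity 20000:1.0 20000:1.0   # both zones serving
total_capacity 20000:1.0             # one zone lost
```

With the values from this tutorial, the first call prints 40000 and the second 20000, so plan for the remaining zone to absorb the full load if the other one fails.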

6. Having created the backend service, you are ready to deliver traffic to it. For this you need to create a URL map, to map all the incoming requests to it.

gcloud compute url-maps create pf-map --default-service pf-service

Note that you are using the --default-service flag, since you are only creating a basic cross-zone load balancer, and not splitting your traffic based on URL pattern matching.

7. Following this, you have to create a target proxy to route the requests to the above map. Do so by issuing:

gcloud compute target-http-proxies create pf-proxy --url-map pf-map

8. The last step is to create a global forwarding rule to handle the incoming requests:

gcloud compute forwarding-rules create pf-rule --global --target-http-proxy pf-proxy --port-range 80
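Steps 6 to 8 depend on each other: the forwarding rule needs the proxy, and the proxy needs the URL map. Here is a minimal sketch that chains them in order and stops at the first failure. The `run` and `create_frontend` helpers and the `DRY_RUN` switch are my own; the gcloud commands are the ones shown above, with the pf prefix turned into a parameter.

```shell
#!/bin/sh
# Steps 6-8 in dependency order. DRY_RUN defaults to 1, so the commands are
# only printed; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

create_frontend() {
  name="$1"   # common prefix, "pf" in this tutorial
  run gcloud compute url-maps create "${name}-map" --default-service "${name}-service" &&
  run gcloud compute target-http-proxies create "${name}-proxy" --url-map "${name}-map" &&
  run gcloud compute forwarding-rules create "${name}-rule" --global \
    --target-http-proxy "${name}-proxy" --port-range 80
}

create_frontend pf
```

Defaulting to a dry run keeps the script safe to try; review the printed commands, then rerun with DRY_RUN=0.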

This concludes setting up the L7 load balancer. Let’s now check its health, like so:

gcloud compute backend-services get-health pf-service

You should get a reply similar to:



- healthState: HEALTHY
  port: 80
kind: compute#backendServiceGroupHealth

- healthState: HEALTHY
  port: 80
kind: compute#backendServiceGroupHealth
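If you have many instances, reading this output by eye gets old quickly. The sketch below counts the healthState lines and prints the number of healthy instances followed by the total. It assumes the output keeps the `healthState:` lines shown above; the `count_health` name is my own.

```shell
#!/bin/sh
# Reads get-health output on stdin and prints "<healthy> <total>".
count_health() {
  awk '/healthState:/ { total++; if ($NF == "HEALTHY") healthy++ }
       END { printf "%d %d\n", healthy, total }'
}

# Usage (needs a configured gcloud):
#   gcloud compute backend-services get-health pf-service | count_health
```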

You can go ahead and test the high availability by simulating outages on your machines; check the above command’s output to see the transition between the HEALTHY and UNHEALTHY states, while of course checking your access logs too.
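While you simulate an outage, it’s also useful to watch what clients actually see. Here is a minimal sketch that sends a batch of requests to the load balancer with curl and tallies the HTTP status codes. The function names are my own, and the IP address you pass in is the external address of your forwarding rule, which you need to look up yourself.

```shell
#!/bin/sh
# Tallies HTTP status codes read one per line on stdin.
tally_codes() {
  sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Sends N requests (default 20) to http://<lb-ip>/ and tallies the responses.
# curl reports 000 when a request fails or times out (5 second limit).
probe() {
  lb_ip="$1"
  n="${2:-20}"
  i=0
  while [ "$i" -lt "$n" ]; do
    curl -s -o /dev/null -m 5 -w '%{http_code}\n' "http://$lb_ip/"
    i=$((i + 1))
  done | tally_codes
}

# Usage: probe <forwarding-rule-ip> 50
```

If the failover works, you should keep seeing mostly 200 responses while one of the VMs is down.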

To sum up this how-to, these are the concepts you should be familiar with after completing it: resource views, health checks, backend services, URL maps, target proxies, and global forwarding rules.

Wrapping up our tutorial, you are now ready to play around with HTTP Load Balancing and try more complex scenarios, like cross-region or content-based balancing.

Full documentation on using the HTTP Load Balancing service can be found in Google’s official Compute Engine documentation.

Lahav Savir

Founder and CTO, Cloud Platforms
