Last year was a year of GCP for me. I have maintained a list of its unique, productive and performant features and products. While it is easy to find or produce computational benchmarks (e.g. CPU benchmark points per instance type or network throughput between two VMs), the real performance of any cloud is not in compute but in productivity. Those seeking global maximums for CPU speed, network latency or storage IOPS will probably find them in their own datacentres. However, the productivity of individuals, teams and organisations as a whole is much harder to quantify and significantly more critical and valuable. Hardware is a commodity, so in the long run cloud vendors will offer no significant advantage over one another in CPU performance or amount of available RAM. Storage and networking are more open to interpretation, so vendors will differ more in these domains.
Having migrated a rather large estate of on-prem applications and VMs to GCP, and having experience with two other cloud vendors, I am strongly convinced that GCP is the most productivity-focused cloud platform available. Therefore I present to you my incomplete and highly biased list of features and products that make GCP the cloud platform of choice.
Edit 27th of April, 2019
A week after publishing I realised that I had overlooked a GCP product that easily makes my top 5 favourites list. Therefore I fall back on computer science to allow me to introduce item number 0.
Cloud SQL Proxy allows for secure, TLS 1.2 encrypted connectivity to your Cloud SQL instances from anywhere (no more IP whitelisting) using localhost or a socket connection on the host. Its value comes from speed, simplicity, a lightweight resource footprint and a plethora of possible applications. Oftentimes security is overlooked due to a barrier to entry in terms of knowledge, know-how and time to implement. Cloud SQL Proxy is a perfect example where the secure approach is also the most convenient, which helps to break down those barriers. I can run Cloud SQL Proxy as a tiny docker container on my local laptop and connect to it using 127.0.0.1:3306, while in reality I am connected to a remote database over a secure TLS encrypted connection. You can use it as a robust cloud migration tool by running it on-prem alongside your application while connecting to managed Cloud SQL databases; all you need is a binary or container with json credentials. You can also run it as a side-car container in each Kubernetes application pod, ensuring security and ease of configuration.
Summary of features:
- Run as a binary or docker container, in any environment. Local, on-prem, cloud.
- TLS 1.2 Encrypted connectivity with Cloud SQL
- No more IP whitelisting
- Better auditability as Cloud SQL Proxy generates IAM audit logging
- Grant access via credentials.json or Service Account
- Connect to multiple Cloud SQL databases via one Cloud SQL Proxy
- Connect to databases in different regions, projects, organisations via one Cloud SQL Proxy
- Supports TCP and socket connectivity
- Simplifies application logic and configuration by pushing security to Cloud SQL Proxy
See an example of a Cloud SQL Proxy sidecar below.
- name: cloudsql-proxy
  image: gcr.io/cloudsql-docker/gce-proxy:1.14
  command:
    - /cloud_sql_proxy
    - -instances=gcp-project-name:region:cloud-sql-database-name=tcp:3306
    - -credential_file=/secrets/cloudsql/credentials.json
  securityContext:
    runAsUser: 2
    allowPrivilegeEscalation: false
  volumeMounts:
    - name: cloudsql-instance-credentials
      mountPath: /secrets/cloudsql
      readOnly: true
1. Simple instance network egress estimation
Each core is subject to a 2Gbps cap for peak performance, up to a maximum of 32Gbps per instance. You’ll never need to consult the documentation and recheck instance specifications.
GCP’s HTTP/HTTPS load-balancers are global by default. Google uses a single anycast IP. This allows you to run the backends of your application in different regions and even continents, while Google routes traffic to the nearest, lowest-latency backend. This feature is also great if you want to build a DR site in a nearby region with the least possible effort.
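The rule of thumb above fits in a couple of lines. This is only a sketch using the two figures quoted in this post (2Gbps per core, 32Gbps ceiling); the function and constant names are my own, and the limits may differ for newer machine types.

```python
# Figures quoted in this post; treat them as assumptions and check current docs.
PER_VCPU_GBPS = 2
MAX_INSTANCE_GBPS = 32


def estimated_egress_gbps(vcpus: int) -> int:
    """Estimate peak egress bandwidth for an instance with `vcpus` cores."""
    if vcpus < 1:
        raise ValueError("an instance needs at least one vCPU")
    # Bandwidth grows linearly with cores until it hits the per-instance cap.
    return min(vcpus * PER_VCPU_GBPS, MAX_INSTANCE_GBPS)


if __name__ == "__main__":
    for cores in (1, 4, 16, 64):
        print(cores, "vCPU ->", estimated_egress_gbps(cores), "Gbps")
```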
3. Global VPC
Unlike other cloud providers, GCP’s VPC is global by default. You don’t need to configure cross-subnet, cross-zone or cross-region connectivity. Given the same VPC, every VM in every zone, subnet and region will be able to talk to every other via Google’s fast fibre backbone (assuming correct firewall rules are in place).
4. Network tags
Most commonly, firewall rules are configured by specifying the source and destination IP or IP range. While it’s an industry standard, there are a few problems with this approach, more so in the world of cloud and ephemeral infrastructure. Predefined IPs and IP ranges need ongoing maintenance as you add new source or destination IPs/ranges. IPs are not descriptive and require a separate knowledge store or comments on the firewall rules. Firewall rules that involve entire subnets are easier to manage. GCP has taken this to a whole new level with network tags. You can apply up to 64 network tags to each VM, which in turn can be used in firewall rules. A natted network tag is a great example of how you can tag instances in all your projects, GCE and GKE, to achieve the same, coherent result: NATing of the ingress/egress traffic. Another great firewall rule selector in GCP is the service account. It is more secure than network tags (a service account can only be changed when the instance is shut down), but an instance can have only 1 active service account.
Combined with Global VPC, you get the most complete and direct single-carrier private network for your traffic. You can rest assured that your GCP resources on different continents will experience the least latency by utilising Google’s fibre for connectivity.
6. Internal DNS
Essentially, you can communicate with other instances on the same network just by knowing their instance names, e.g. ssh sample-instance. This can save a considerable amount of time, as you don’t need to create DNS records manually or remember IP addresses. DNS names get longer once you start to cater for Global (project-wide) DNS - find out more from the link.
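For reference, here is a small sketch of what the fully qualified internal names look like. The two formats are how I understand the Compute Engine internal DNS documentation (zonal and project-wide naming), so treat them as an assumption and verify against your own project; the helper names, project and zone are hypothetical.

```python
def zonal_dns_name(instance: str, zone: str, project: str) -> str:
    """Fully qualified name under zonal DNS (assumed format)."""
    return f"{instance}.{zone}.c.{project}.internal"


def global_dns_name(instance: str, project: str) -> str:
    """Fully qualified name under global, project-wide DNS (assumed format)."""
    return f"{instance}.c.{project}.internal"


if __name__ == "__main__":
    # "my-project" and "europe-west2-a" are placeholder values.
    print(zonal_dns_name("sample-instance", "europe-west2-a", "my-project"))
    print(global_dns_name("sample-instance", "my-project"))
```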
7. Shared VPC
It allows you to separate networking from the day-to-day compute administration of projects. You can have a Host project which is well interconnected with on-prem, VPNs and other clouds, while Shared VPC service projects are only users of the subnets shared with them by the Host project. This is great for security and the principle of least privilege. It can also yield cost savings, as expensive resources like VPNs and interconnects can be shared across many projects.
GCP subnets are region-wide by default. You won’t have to repeat work for each zone, nor worry that newly added zones won’t be available to you. This dramatically simplifies network setup.
Another paradigm-shifting feature: subnet expansion. As long as you are within the IP range of your VPC and not overlapping with other subnets, you can expand an existing subnet’s IP range, e.g. turning a /24 into a /16 network with a single command. Mistakes made during the early stages of network planning can often be easily rectified.
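Before running the expand command it is worth sanity-checking the new range. The sketch below uses Python’s standard ipaddress module to test the two conditions mentioned above: the new range must contain the old one and must not overlap any other subnet. The function name and the sample ranges are my own; this is a local pre-check, not what GCP runs internally.

```python
import ipaddress
from typing import Iterable


def can_expand(current: str, proposed: str, other_subnets: Iterable[str]) -> bool:
    """Return True if `proposed` is a valid expansion of `current`."""
    cur = ipaddress.ip_network(current)
    new = ipaddress.ip_network(proposed)
    # The new range has to be strictly larger and fully contain the old one.
    if new.prefixlen >= cur.prefixlen or not cur.subnet_of(new):
        return False
    # The new range must not collide with any other subnet.
    return not any(new.overlaps(ipaddress.ip_network(o)) for o in other_subnets)


if __name__ == "__main__":
    print(can_expand("10.0.0.0/24", "10.0.0.0/16", ["10.1.0.0/16"]))
    print(can_expand("10.0.0.0/24", "10.0.0.0/16", ["10.0.5.0/24"]))
```

The first call is a clean expansion; the second is rejected because the enlarged range would swallow 10.0.5.0/24.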
10. No pre-warming
Spiky traffic? No problem. No pre-warming is needed for the load-balancers. One less operational aspect to worry about in production.
11. No hard-cap on inbound traffic
Per VPC Resource Quotas page: “GCP does not impose bandwidth caps for ingress traffic. The amount of traffic a VM can handle depends on its machine type and operating system. Ingress data rates are not affected by the number of network interfaces a VM has, or any alias IP addresses it uses”. Works for me!
12. Alias IP
It allows you to assign an IP per service on a single machine by creating IP:service pairs. It is also the technology making Shared VPC and Cluster Native Networking possible.
13. Encryption at rest and Encryption in-transit by default
It sounds simple, but achieving anything similar with on-prem would require enormous time and resource commitment. With GCP it’s one less worry to have.
14. Docker image security scanning with Container Registry Vulnerability Scanning
A very simple and intuitive way to add image scanning to your CI/CD. It is especially helpful if you are using Google Container Registry.
Free managed base images save you the time and effort of CVE patching. More time to focus on what matters!
GCP has a clear built-in hierarchy of resources. You can use GSuite and IAM to minimise the setup time of privileges. Google’s own illustration explains it better than I can.
GSuite is a very convenient free tool for managing your users, groups and security. You might say it isn’t free, but I am not talking about the entire GSuite set of productivity tools. The user and security management tools are free and can be used to set up any number of users. Enforcing and auditing 2SV is very simple too.
You may find just about any application being shipped in the form of a Docker image. Docker Hub is one of the favourite places to find an image for the problem at hand. Security-wise this could be a considerable risk, as there is little guarantee that public images are not going to expose and compromise your entire cluster and network. While we can build automation around CI/CD to prevent this from happening, it will take time to implement and maintain. Instead, Binary Authorization can be used to whitelist images with acceptable prefixes, e.g. Google’s registry.
A free tool allowing you to run vulnerability scans against your GCE/GKE/App Engine applications. During service evaluation, I was able to discover previously unnoticed vulnerabilities and get them resolved. You can run it ad-hoc or on a schedule.
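To make the prefix idea concrete, here is a toy model of the check. It is not the Binary Authorization API or policy format, just plain string matching to illustrate the concept; the project name and allowed prefixes are hypothetical.

```python
# Hypothetical allow-list; a real policy would live in Binary Authorization itself.
ALLOWED_PREFIXES = (
    "gcr.io/my-project/",
    "eu.gcr.io/my-project/",
)


def is_image_allowed(image: str) -> bool:
    """Return True if the image reference starts with an approved registry prefix."""
    # str.startswith accepts a tuple and matches any of its members.
    return image.startswith(ALLOWED_PREFIXES)


if __name__ == "__main__":
    print(is_image_allowed("gcr.io/my-project/api:1.4.2"))
    print(is_image_allowed("docker.io/someone/random-image:latest"))
```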
20. Managed Istio
GCP and IBM Cloud are the only places where you can get managed Istio at this moment. It is a cutting edge platform for microservices, integrating flawlessly with GKE. Istio allows users to codify security and network policies between applications. It also provides a plethora of features around monitoring, load-balancing, traffic management, rate limiting and circuit breaking.
Excellent product. Using IAP you can expose applications publicly but lock down access to authorised cloud-identity or GSuite groups. In practice, users of your internal tools avoid the hassle of VPNs, while your admins have a simplified yet very secure method of user access management that is well integrated with GCP. When a user leaves the organisation, they lose access to IAP-fronted resources automatically, considerably simplifying off-boarding.
DevOps & Productivity
22. Cloud Build
Container-based managed CI/CD service. It looks very minimalistic at first glance and does not solve a million and one problems like Jenkins might with its never-ending list of plugins. However, its narrow focus is also its biggest strength. Cloud Build is tightly integrated with GitHub, which allows for a seamless experience for your developers. Cloud Build provides plenty of free daily build minutes, so smaller projects are likely to get most of their builds done free of charge. Lastly, it is well integrated with GCP features like GCR, Container Security Scanning, IAM and GKE. I think that Cloud Build is one of the most undervalued products in GCP. I will blog more deeply about its capabilities soon.
23. Usable default instance monitoring
Default instance monitoring provides CPU, Network Bytes and Packets and Disk I/O operations and bytes metrics. Monitoring dashboards are accessible per instance or group of instances. The sampling rate is sufficiently high to help investigate common problems, while 30-day retention will likely suffice for the majority of investigations.
The gcloud CLI is fantastic as it is. Yet GCP topped it by releasing gcloud with interactive mode. It provides real-time auto-complete with documentation at the bottom. You can also open the GCP documentation page relevant to the command you are typing by pressing F8.
This is hardly the most exciting thing for most of the readers. However, when you operate a large number of projects with various resource needs, you find yourself in a position where you regularly have to bump-up quotas. This process is super easy, and often your quota will be approved before you have time to check your email with approval confirmation. More significant requests (I am talking hundreds of IPs, thousands of cores, tens of terabytes of SSD storage) may take hours to days, but usually, it is still pretty quick. My experience around artificial limitations with other cloud vendors was rather painful, so GCP is a substantial improvement.
26. Cloud Shell
Are you using different workstations at work and at home? Want to check up on your GCP project using a friend’s laptop? I bet you’ll miss your finely tuned terminal aliases, local snippets and custom tools. With Google Cloud Shell you have a container-based VM available to you via a browser. It comes with 5GB of persistent storage, so nothing valuable goes missing between sessions. gcloud, gsutil, kubectl, docker and a bunch of build tools come pre-packaged and continuously upgraded for you. A super powerful little gem of a feature!
GCP allows you to securely SSH to your instances via the browser. There are some limitations, but generally, it works great for ad-hoc connectivity.
Compute & Kubernetes
28. Almost homogeneous machine types
GCP does a great job of making the end user’s life easy. All you need to know is the number of CPUs and the amount of RAM that you require. Check which predefined machine type suits you best and you are good to go. The underlying VM will use the CPU family available at the time; should you need to enforce a specific CPU family, you can do so with an extra line of configuration. Networking and storage work the same across the board. I say almost because of memory-optimized machine types and the recently announced Compute-Optimised Machine Types.
If the simplicity of homogeneous machine types wasn’t enough to bring a smile to your face, there are custom machine types to do so. Feel like you are forced into a higher tier of predefined machine types just because your application needs an extra GB of RAM or a few more cores? No problem: you can request 1 or any even number of CPUs and between 900MB and 6.5GB of RAM per core. Quite a lot of options.
# Google's example
# Create an instance with 4 vCPUs and 5 GB of memory
gcloud compute instances create my-vm --custom-cpu 4 --custom-memory 5
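If you generate custom shapes programmatically, a quick pre-check saves a failed API call. The sketch below encodes only the two rules stated above (1 or an even number of CPUs, 0.9GB to 6.5GB of RAM per core); the function name is my own, and real validation has further constraints that I am not modelling here.

```python
MIN_GB_PER_VCPU = 0.9   # "900MB" per core, as quoted in this post
MAX_GB_PER_VCPU = 6.5


def is_valid_custom_shape(vcpus: int, memory_gb: float) -> bool:
    """Check a custom machine shape against the rules described in this post."""
    # vCPU count must be 1 or any even number.
    if vcpus < 1 or (vcpus != 1 and vcpus % 2 != 0):
        return False
    # Memory per vCPU must stay inside the allowed band.
    per_vcpu = memory_gb / vcpus
    return MIN_GB_PER_VCPU <= per_vcpu <= MAX_GB_PER_VCPU


if __name__ == "__main__":
    print(is_valid_custom_shape(4, 5))    # Google's example: 1.25GB per core
    print(is_valid_custom_shape(3, 6))    # odd core count
    print(is_valid_custom_shape(2, 20))   # 10GB per core
```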
30. Local SSDs
NVMe SSD on any VM. GCP gives you the ability to add up to eight 375GB local SSDs to any instance. Be mindful that these are not persistent, hence the much lower price and higher performance. GPUs can similarly be attached to any instance.
31. Live migration
From time to time, cloud vendors need to perform maintenance on the physical hardware hosting your VM instance. You’ll probably get a notice requesting you to restart your instance by a certain date to get it rescheduled on a different host. Not a thing in GCP: there is a clever multi-step process which allows running instances to be migrated from one host to another without noticeable impact on the VM. I wish I could do this with my pods.
I know it’s an entire product, but it’s also an industry standard for managed Kubernetes. If you are planning to run applications on Kubernetes, I’d question any choice other than GKE. GKE is a first-class citizen of GCP, and it is seamlessly integrated with networking (Alias IP, Host/Service project networking, container-native load-balancing), load-balancing, CDN, Cloud Storage, GCR, Stackdriver, Deployment Manager, certificates, billing and IAM. Moving from a self-managed bare-metal cluster to GKE was one of the greatest things that happened to me in the last few years. The magnitude of change is comparable with a migration from on-prem to cloud.
You are only given one type of persistent SSD storage, and it’s fast. While larger SSDs benefit from increased throughput and IO, there are no classes of persistent SSD disks. Also, instances with more CPU cores might be able to drive more throughput/IO from large SSD disks than less mighty VMs. Persistent SSDs simply perform well without costing you extra.
Google Cloud Storage has consistent retrieval time across all storage classes. Therefore you can benefit from cost savings on rarely accessed files without worrying about slow retrieval times when you do need them. Keep in mind that Coldline is not cost effective when files are retrieved more than once per year, and Nearline is not cost effective when files are retrieved more than once per month.
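The rule of thumb above can be written down as a tiny helper. This is a sketch based purely on the two thresholds in this post, not on actual pricing; "STANDARD" here simply stands for the regular, non-archival classes, and the function name is my own.

```python
def suggest_storage_class(reads_per_year: float) -> str:
    """Suggest a storage class from how often an object is expected to be read."""
    if reads_per_year < 0:
        raise ValueError("reads_per_year cannot be negative")
    if reads_per_year <= 1:
        return "COLDLINE"   # retrieved at most once per year
    if reads_per_year <= 12:
        return "NEARLINE"   # retrieved at most once per month
    return "STANDARD"       # anything accessed more often


if __name__ == "__main__":
    for reads in (0, 1, 6, 12, 52):
        print(reads, "reads/year ->", suggest_storage_class(reads))
```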
GCP makes commitments (reservations) very simple. You specify a region, a quantity of CPUs and GBs of RAM, and choose a duration of 1 or 3 years. That’s it. No instance types, zones or partial payment plans. Standard machine types can receive up to a 57% discount vs the on-demand price. While reservations are not unique to GCP, they are the most streamlined here.
Are you continuously running instances in GCP, but 1 or 3-year commitments are not an option? You can still benefit from cost reductions through the sustained use discount. It is applied automatically once a VM has been running for 25%, 50% or 75%+ of the month. The longer you run, the cheaper it gets. You don’t have to keep the same VM running continuously. GCP will collapse your timeline of VMs and will aim to provide the most significant qualifying discount. You can get up to 30% off your compute instances with no effort at all.
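Here is a rough sketch of how the tiers add up to the 30% figure. The incremental rates per quarter of the month (100%, 80%, 60%, 40% of the base price) are what I recall from the pricing documentation for standard machine types, so treat them as an assumption and check the current pricing page; the function name is my own.

```python
# Assumed incremental rates charged for each successive quarter of the month.
INCREMENTAL_RATES = (1.0, 0.8, 0.6, 0.4)
TIER_WIDTH = 0.25


def effective_price_fraction(usage_fraction: float) -> float:
    """Fraction of the on-demand rate paid when running `usage_fraction` of the month."""
    if not 0 < usage_fraction <= 1:
        raise ValueError("usage_fraction must be in (0, 1]")
    billed = 0.0
    for tier, rate in enumerate(INCREMENTAL_RATES):
        # Portion of the usage that falls inside this quarter of the month.
        in_tier = min(max(usage_fraction - tier * TIER_WIDTH, 0.0), TIER_WIDTH)
        billed += in_tier * rate
    return billed / usage_fraction


if __name__ == "__main__":
    for usage in (0.25, 0.5, 0.75, 1.0):
        discount = 1 - effective_price_fraction(usage)
        print(f"{usage:.0%} of month -> {discount:.0%} discount")
```

Under these assumed rates, a VM running for the whole month pays 70% of the on-demand price, which matches the "up to 30%" above.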
GCP sells its over-capacity using preemptible VMs. They are even cheaper than 3-year commitments. Your VM can be taken away at any time with only 30 seconds’ notice. Otherwise, your VM will be terminated after a 24-hour period. When VMs are preempted within the first 10 minutes, you are not charged at all. As the icing on the cake, GCP made it super easy to add preemptible VMs to managed instance groups, GKE node pools and Dataproc worker pools. Preemptible VMs should be the primary cost-saving avenue for adventurous, shoestring-budget projects. Unless you know what you are doing, preemptible VMs with local SSDs are a recipe for disaster.
GCP will monitor the utilisation of your VMs (having the Stackdriver agent installed makes it even better) and produce recommendations on whether VMs need to be downsized or increased in size. In my case, the vast majority of the recommendations are to downsize. GCP even tells you the potential savings from the change.
For those of you running GKE workloads, Cluster Autoscaler is a great way to save money. It automatically detects that additional nodes are needed to schedule all of the pods and hence scales up an appropriate number of nodes. Cluster autoscaler can also optimise node utilisation by rescheduling currently running pods to other nodes. Once an underutilised node is clear of application pods it gets cordoned and terminated. Lastly, you can further improve autoscaling by introducing Horizontal Pod Autoscalers which will scale Kubernetes deployment pods based on CPU load or other custom metrics.
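For the Horizontal Pod Autoscaler part, the core scaling rule from the Kubernetes documentation is a simple ratio. The sketch below implements just that ratio; it leaves out the tolerance band, min/max replica bounds and stabilisation windows that the real controller applies, and the function name is my own.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Core HPA rule: scale replicas by the ratio of current to target metric."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    return math.ceil(current_replicas * current_metric / target_metric)


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60))
    # 3 pods averaging 50% CPU against a 100% target.
    print(desired_replicas(3, 50, 100))
```

When the HPA adds pods that no longer fit on existing nodes, Cluster Autoscaler is what brings up the extra nodes, so the two work well together.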
A one-of-a-kind serverless data warehouse and analytics product. It provides very cheap storage and pay-as-you-go retrieval and analytics capabilities. GCP constantly enhances BigQuery’s feature set by providing new integrations and improving performance, ML capabilities and streaming data processing.
41. Cloud Spanner
Another one-of-a-kind product developed by Google to bring SQL into the era of global-scale datastores. It is a strongly consistent, highly available, horizontally scalable and global database, but it comes at a cost. A 1 petabyte global database will set you back over $4m per month, but then again, it’s a petabyte of globally available, strongly consistent SQL data.
Google’s open-sourced end-to-end machine learning platform. While GCP comes with VM images, Marketplace products and easy GPU integrations to maximise TensorFlow’s usefulness, it also has something very unique: Tensor Processing Units, purpose-built ASICs that beat even GPUs on the power/performance/cost spectrum.