GCP FinOps #001: Google Cloud Storage - Location, Location, Location
This post marks the start of a series about Google Cloud Platform and FinOps.
Cloud object storage is often labeled as affordable, fast, and durable - something you should consume without a second thought. Object storage like GCS is exactly that: at $20 per terabyte per month, you'll have a hard time building your own solution to match 4 nines of availability and 11 nines of durability. So unless you are building the next Google Drive, you should just use GCS.
Yet, there are avoidable ways to make it very expensive. Let’s explore those.
Location
A property observed across the entire GCP product range is pricing based on the geographical location of the resource. GCS is no exception.
Storing data at rest in Iowa is the cheapest at $20 per TB per month. The same data in London will cost you $23 (a 15% increase), and in São Paulo you'll pay $35 (a 75% increase).
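To put numbers on those premiums, here is a tiny Python sketch using the rates quoted above. The city-to-region mapping is my own, and GCS prices change over time, so treat these figures as illustrative rather than current list prices:

```python
# GCS standard storage rates quoted in this post (USD per TB per month).
rates_per_tb = {
    "us-central1 (Iowa)": 20.0,
    "europe-west2 (London)": 23.0,
    "southamerica-east1 (São Paulo)": 35.0,
}

baseline = rates_per_tb["us-central1 (Iowa)"]
for region, rate in rates_per_tb.items():
    premium = (rate / baseline - 1) * 100
    print(f"{region}: ${rate:.0f}/TB/month (+{premium:.0f}% vs Iowa)")
```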
Data storage locality can be a sensitive subject due to the various regulations around data residency, so put these regulations above any cost considerations.
Store local
Far-away, cheaper regions are often a cost trap. They only work if you can avoid egress costs. For example, if my data (say 1 TB) originates in London, it will cost me upwards of $120 per TB to move it from a GCE instance to an Iowa bucket. Knowing that we save $3 per month on 1 TB stored in Iowa vs London, it will take over 3 years (40 months) to recuperate the egress cost alone. A 15% storage saving is attractive, but the preceding 40 months of negative ROI far outweigh the benefit. Very few cloud users are confident making 40-month retention commitments.
Unless you have a massive data set, long time horizons, and cheaper egress, cross-region data transfer is a bad idea.
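To make the break-even arithmetic explicit, here is a minimal sketch using the figures above (the $120/TB fee is the inter-continental egress assumption stated later in this post):

```python
# Break-even for moving 1 TB of London-origin data to an Iowa bucket.
egress_per_tb = 120.0    # one-off inter-continental egress, $/TB (assumed)
storage_london = 23.0    # $/TB/month
storage_iowa = 20.0      # $/TB/month

monthly_saving = storage_london - storage_iowa      # $3/TB/month
breakeven_months = egress_per_tb / monthly_saving   # 40 months

print(f"Saving ${monthly_saving:.0f}/TB/month; "
      f"break-even after {breakeven_months:.0f} months")
```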
Serve local
Rules are there to be broken. While I've explained why storing local is typically cheaper, there are exceptions. The above calculations were made for an at-rest scenario where a large amount of data is stored and consumed within the same region.
Imagine a scenario where I do store 10 TB of historical raw data in London. This data is needed by multiple ML teams in Iowa and Oregon as they combine it with similar-sized local datasets to build a model.
Knowing that ML models are seldom good on the first try, I can safely assume that the entire 10 TB dataset will be ingested multiple times. We can calculate ROI for 1, 10, and 100 ingestions - in the real world, you don't know how many times ML model training will be run. Luckily, we only need to know that if training runs more than once, we achieve positive ROI on the egress alone.
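A minimal sketch of that intuition, counting egress only (the recurring storage cost of the local copy is handled in the options below):

```python
# Egress cost of N full ingestions of a 10 TB London dataset from Iowa.
tb = 10
inter_egress = 120.0  # $/TB inter-continental (this post's assumption)

def read_remotely(n):
    """Train against the London bucket n times; pay egress every run."""
    return n * tb * inter_egress

def copy_once():
    """Transfer once to Iowa; subsequent local reads incur no egress."""
    return tb * inter_egress

for n in (1, 2, 10, 100):
    print(f"{n:>3} runs: remote ${read_remotely(n):>7,.0f} "
          f"vs copy-once ${copy_once():,.0f}")
```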
Option 1:
A simple approach in this situation would be to transfer (Storage Transfer Service is awesome) 10 TB to a bucket in Iowa (a $1200 operation) and to a bucket in Oregon (another $1200 operation). Model training then references the local buckets and incurs no egress charges. Cost: $2400 egress + $400/month recurring storage fee x 12 months (the assumed useful life of this dataset) = $7200
Option 2:
A more reasonable approach would be to transfer 10 TB to a bucket in Iowa (a $1200 operation) and from there to a bucket in Oregon (a $100 operation, since intra-continental egress is far cheaper). Cost: $1300 egress + $400/month recurring storage fee x 12 months = $6100
Option 3:
The ultimate solution in terms of cost would be a US multi-region bucket. This stores the data in multiple US regions, increasing the storage cost per TB to $26; however, you no longer need to duplicate the data in Iowa and Oregon, and you only pay for egress once.
Cost: $1200 egress + $260/month recurring storage fee x 12 months = $4320
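Putting all three options into one sketch, using this post's assumed rates:

```python
# Totals for the three options (10 TB, 12 months, rates from this post).
tb, months = 10, 12
inter, intra = 120.0, 10.0    # $/TB egress: inter- / intra-continental
regional, multi = 20.0, 26.0  # $/TB/month: regional / US multi-region

option1 = 2 * tb * inter + months * 2 * tb * regional        # $7,200
option2 = tb * (inter + intra) + months * 2 * tb * regional  # $6,100
option3 = tb * inter + months * tb * multi                   # $4,320

for name, cost in [("Two regional copies", option1),
                   ("Relay via Iowa", option2),
                   ("US multi-region", option3)]:
    print(f"{name}: ${cost:,.0f}")
```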
The above examples demonstrate that cost considerations can alter cloud architecture. In our case, the cheapest solution (40% cheaper than the initial proposal) is also the simplest and comes with elevated availability guarantees - an all-around win.
Lastly, all of the above calculations assumed a $120 per TB inter-continental and $10 per TB intra-continental egress fee. In future posts, I’ll cover options and methods to reduce the egress cost.
Partner local
You probably know that intra-region egress within GCP is free within your organisation. You can have a bucket in europe-west2 in Project A that is consumed by a GCE instance in Project B, which also runs in europe-west2. Your egress will cost $0.
Now imagine you need to share 1 TB of daily data with an analytics partner running in GCP to gain new insights. It can be worthwhile to store a copy of your data in the location where the partner consumes and processes it, avoiding a hefty $120 daily egress fee.
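For bulk, recurring transfers, Storage Transfer Service (mentioned above) is the right tool. As a minimal illustration of the idea, here is a server-side copy into a partner-colocated bucket using the google-cloud-storage client; the bucket names and prefix are hypothetical:

```python
from google.cloud import storage

client = storage.Client()
src = client.bucket("my-data-europe-west2")       # hypothetical source bucket
dst = client.bucket("partner-share-us-central1")  # hypothetical, in partner's region

# Server-side copy: the bytes never flow through this machine.
for blob in client.list_blobs(src, prefix="daily/"):
    src.copy_blob(blob, dst, new_name=blob.name)
```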
Summary
The location of your GCS buckets impacts both the cost of storing your data and the cost of transferring (egressing) it. Choose your locations wisely.