Starting on the 29th of March, Google Cloud professionals attempting the Data Engineer certification are subject to the latest version of the certification exam. The most recognisable difference is the removal of case studies. I was lucky enough to be challenged with, and pass, the updated version of the Data Engineer certification on the 2nd of April. In my preparation I used materials and an exam guide aimed at v1 of the exam, so to say I was underprepared for the certification is an understatement. While Data Engineering is not something I'd consider a large part of my day-to-day toolkit, I do use BigQuery, Dataproc, Data Studio, GCS, PubSub, Datastore, Cloud SQL and related IAM. The topics of Big Data and Machine Learning are therefore fascinating, novel and challenging to me.
My GCP experience to date
I’ve been learning about GCP since around February 2018 and got fully committed to it from May onwards. I chose GCP over AWS as it was better suited for the current and future needs of the company, Loveholidays.com. We’ve successfully migrated from a physical-server environment with KVM VMs to GCP, where we run most of our workloads inside GKE. Since May I’ve worked with HTTP(S), TCP and Internal load balancers, firewall rules, regular and hosted/service network VPCs, GKE (including VPC-Native alias IP clusters), VPNs, NAT instances, Cloud Routers, Cloud Armor, Hybrid Connectivity with Interconnect, GCE instances with private IPs, network tagging, static routes, VPC peering, Stackdriver and a few more technologies.
While hands-on experience is invaluable, you sometimes miss the bigger picture of the available infrastructure when you find yourself working with only a subset of GCP’s technologies. Here are the study materials I used:
- David das Neves: Awesome GCP Certifications
- Coursera: Data Engineering on Google Cloud Platform Specialization - not a quick course, but it covers the material well and gives you an opportunity to practise what you’ve learned in Qwiklabs.
- LinuxAcademy: Google Cloud Certified Professional Data Engineer
- Qwiklabs: Data Engineering
- YouTube: Introduction to Google Cloud Machine Learning (Google Cloud Next ‘17)
- YouTube: Auto-awesome: advanced data science on Google Cloud Platform (Google Cloud Next ‘17)
- YouTube: Lifecycle of a machine learning model (Google Cloud Next ‘17)
The actual exam
Two hours, 50 questions. The first pass took me an hour, and I had 21 out of 50 questions marked for review (which shows the amount of self-doubt this exam will inflict upon you). I finished the entire exam in 1 hour 15 minutes and was presented with a much-doubted “pass”.
I do not intend to share any of the actual questions, as that would go against the certification’s mission. The topics below are the ones I found harder, or felt less prepared for, even after taking all of the training above.
Key topics: BigQuery, BigTable, Dataflow, PubSub
- Streaming data into BigQuery - know it well.
- High-rate streaming
- Serving large datasets to a BI dashboard (focus on data freshness and cost efficiency)
- Benefits of partitions
- From the point of view of a BigQuery administrator, know the best practices for giving various teams access to team-specific datasets without cross access.
- Methods to increase the number of concurrent slots
- How to verify that an ETL pipeline migrated to BigQuery produces identical results
- Point-in-time snapshots in BigQuery
- Integration with BigQuery ML
- Understand the pros and cons of denormalised data in the context of BigQuery
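Point-in-time reads are worth having at your fingertips: standard SQL lets you query a table as it existed at an earlier timestamp with `FOR SYSTEM_TIME AS OF` (within the travel window, up to seven days). A minimal sketch of building such a query - the project/dataset/table name is made up, and the resulting string would be passed to the `bq` CLI or a client library:

```python
from datetime import datetime, timedelta, timezone

def snapshot_query(table: str, hours_ago: int) -> str:
    """Build a standard-SQL query that reads a BigQuery table
    as it existed `hours_ago` hours in the past (max 7 days)."""
    ts = datetime.now(timezone.utc) - timedelta(hours=hours_ago)
    return (
        f"SELECT * FROM `{table}` "
        f"FOR SYSTEM_TIME AS OF TIMESTAMP '{ts:%Y-%m-%d %H:%M:%S}+00'"
    )

# Hypothetical table name, purely for illustration
print(snapshot_query("my-project.sales.orders", hours_ago=1))
```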
- Understand BigTable’s architecture and the key reasons for its high performance
- Know Key Visualizer
- Know when to scale BigTable
- Know performant key/schema design
- Scaling up BigTable
- If you need to double your reads for a prolonged period, what can you do to guarantee the same read latency?
- Dev to Prod cluster promotion
- HDD to SSD data migration
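On key design: a common anti-pattern is leading the row key with a timestamp, which sends all writes to one node. One pattern I found useful to remember is field promotion plus a reversed timestamp, so writes spread across the keyspace and the newest rows for an entity sort first. A sketch (the field names are illustrative, not from any particular schema):

```python
import datetime

# Any constant larger than epoch-millis for the foreseeable future
MAX_TS = 10**13

def row_key(device_id: str, event_time: datetime.datetime) -> str:
    """Compose a BigTable row key that avoids hotspotting:
    lead with a high-cardinality field (device id) rather than a
    timestamp, and reverse the timestamp so newer rows sort first."""
    millis = int(event_time.timestamp() * 1000)
    return f"{device_id}#{MAX_TS - millis}"
```

Because the key starts with the device id, sequential events for many devices land on different tablets instead of hammering a single one.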
- Understand the Apache Beam building blocks: Pipeline, PCollection, PTransform, ParDo
- Know Side Inputs
- Exactly once processing of PubSub messages
- Handling invalid inputs
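For invalid inputs, the pattern to know is the dead-letter output: route unparseable records to a side output for later inspection instead of failing the whole pipeline (in Beam this would be a ParDo with a tagged output). The idea, sketched in plain Python rather than Beam, with made-up record contents:

```python
import json

def partition_records(raw_records):
    """Dead-letter pattern: collect records that fail parsing into a
    separate output instead of crashing on the first bad record."""
    valid, dead_letter = [], []
    for raw in raw_records:
        try:
            valid.append(json.loads(raw))
        except json.JSONDecodeError:
            dead_letter.append(raw)
    return valid, dead_letter

good, bad = partition_records(['{"id": 1}', 'not json', '{"id": 2}'])
```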
- Migrate from Kafka to PubSub
- Know the potential reasons for PubSub-ingesting applications being busier than initially planned
- What PubSub metrics are available in Stackdriver and how to debug producers/consumers
- Ordering messages
- Dealing with duplicate messages
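The key fact behind the duplicates question is that PubSub guarantees at-least-once delivery, so consumers must be idempotent. A minimal sketch of deduplicating by message ID - in production the `seen` set would live in a fast external store (e.g. Datastore or Redis), not in process memory:

```python
def deduplicate(messages, seen=None):
    """Drop messages whose ID has already been processed.
    `messages` is a list of (message_id, payload) pairs here;
    the real PubSub client exposes message_id on each message."""
    seen = set() if seen is None else seen
    unique = []
    for message_id, payload in messages:
        if message_id not in seen:
            seen.add(message_id)
            unique.append((message_id, payload))
    return unique
```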
- Know when to use the Data Transfer Appliance. Hint: slow network, huge dataset, no in-between refreshes.
- When to use the Storage Transfer Service and what its limitations are.
- Know the cost of storage and availability for various products (BigQuery, BigTable, Cloud SQL, GCS) so you can find the cheapest product for a given set of availability/durability criteria.
- How does Dedicated Interconnect impact your data transfer decisions?
- How to continuously sync data between on-prem and GCP
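A back-of-the-envelope calculation makes the appliance-vs-network decision concrete. The link speed and dataset size below are example numbers, not from any exam question:

```python
def transfer_days(dataset_tb: float, link_mbps: float,
                  utilisation: float = 0.8) -> float:
    """Estimate days needed to copy `dataset_tb` terabytes over a
    `link_mbps` link at the given sustained utilisation."""
    bits = dataset_tb * 8 * 10**12            # decimal TB -> bits
    seconds = bits / (link_mbps * 10**6 * utilisation)
    return seconds / 86400

# 100 TB over a 100 Mbps link at 80% utilisation: roughly 116 days,
# which is why a shipped appliance wins for huge, static datasets.
days = transfer_days(100, 100)
```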
- Know best practices for training ML models (training, test, overfitting detection)
- Speeding up TensorFlow applications
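On the training side, the concept to internalise is the train/validation split and what overfitting looks like: training loss keeps falling while validation loss turns upward. A toy sketch of spotting that turning point (the loss values and `patience` threshold are invented for illustration, in the spirit of early stopping):

```python
def overfitting_epoch(val_loss, patience=2):
    """Return the epoch at which validation loss has failed to improve
    for `patience` consecutive epochs (a classic overfitting signal,
    commonly used as an early-stopping trigger), or None."""
    best = float("inf")
    stale = 0
    for epoch, v in enumerate(val_loss):
        if v < best:
            best = v
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None

# Validation loss improves, then rises for two epochs in a row
epoch = overfitting_epoch([0.8, 0.6, 0.55, 0.6, 0.7])  # -> 4
```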
- Know Cloud Data Loss Prevention API
- Know Cloud Natural Language API
- Know Cloud Vision API
- How to allow cross team data access to BigQuery and GCS in a large organisation