Google Cloud Professional Machine Learning Engineer Certification Preparation Guide // Dmitri Lerko

Last updated on 11th of October, 2020. Certification is expected to go into GA on 15th of October.

Authors

Dmitri - knew GCP, didn’t know any of the ML stuff

Steven - knew ML, didn’t know any of the GCP stuff

Perfect reason to collaborate!

New certification

Google announced a new Machine Learning Engineer beta certification in July, with certification exams taking place from 15th of July to 21st of August. Dmitri attempted it on 16th of August. Most of the material below was written before the attempt and is a true reflection of a study guide, not an exam dump. Dmitri is a strong advocate of fair certifications, so this guide is meant to motivate a wider audience to learn more about ML and GCP. Steven

Prior knowledge

Dmitri has worked with GCP professionally for over 2 years. I have 7 GCP certificates, including the Data Engineer and DevOps Engineer ones, which I find most helpful for this certification. I knew nothing about ML and Google’s ML production prior to starting my ~3-week preparation in the evenings and at weekends. Steven is a recent graduate who has taken a number of Machine Learning modules, so he has strong theoretical knowledge but lacked practical experience with GCP.

Approach taken with this guide

We used Google’s own certification guide as a bedrock for our studies. It gave us a broad range of topics to try and find knowledge for. Additionally, we are listing the materials that we used to prepare, and lastly some final tips that don’t fall into either of these categories. The domain of ML and GCP is absolutely huge, so unless you are an expert in ML and only need to brush up on GCP, give this certification plenty of time. Even if you learn everything listed below, it won’t guarantee a pass. Apply your newly acquired knowledge in practice and learn from your mistakes!

Learning resources used

Google Cloud Solutions (Free)

Relevant Open Source tools

Machine Learning Glossary

Things you should know really well even if they are not in the guide

Differences between ANNs, CNNs, RNNs and LSTMs. Which is appropriate?
How to appropriately split your data into Train/Validate/Test examples.
Neuron Activations: Many possibilities, which is appropriate?
Output Activation: How many outputs? Is the output binary? Probabilistic?
Gradient Descent: Batch, Mini-Batch, Stochastic. What are the differences? What are the consequences of these differences?
Optimisers: Know the differences, and relative strengths of each.
Evaluating model performance. Accuracy is rarely a meaningful metric and frequently misleading.
How to recognise and deal with over- and underfitting, e.g. markedly better performance on training data than on test data suggests overfitting; poor performance on both suggests underfitting.
How to recognise and deal with Class Imbalance. Can look like overfitting.
Hyperparameters: Learning Rate, Dropouts etc. Are they appropriate?
Regularisation: L1, L2 what are the differences? Is regularisation appropriate?
How to recognise and deal with Vanishing/Exploding Gradients.
Collaborative filtering
Saving costs on developing, training and serving ML models in GCP
tf.data
TFRecord
AI Platform, Dataflow - learn and try out all you can
Optimising for training speed, serving speed
Dealing with TensorFlow and machine Out of Memory issues
Re-using existing models in Google products
Where and why to use more CPU, memory, GPU or TPU.
Understand customer lifetime value
Study as many solutions as you can. They are well written, full of useful content and a great bridge between theory and practice.
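
For the Train/Validate/Test item in the list above, here is a minimal sketch of a random three-way split in plain Python. The function name `split_dataset`, its parameters and the 80/10/10 proportions are our own illustrative assumptions, not something taken from Google's guide. A purely random split like this is usually not appropriate for time-ordered data, where you would split by time instead.

```python
import random


def split_dataset(examples, val_fraction=0.1, test_fraction=0.1, seed=42):
    """Shuffle once, then cut into train / validation / test lists.

    Hypothetical helper for illustration only.
    """
    if not 0 <= val_fraction + test_fraction < 1:
        raise ValueError("val_fraction + test_fraction must be in [0, 1)")
    shuffled = list(examples)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    n_test = int(len(shuffled) * test_fraction)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 80 10 10
```

The validation set is what you tune hyperparameters against; the test set should only be touched once, at the end, so that it stays an honest estimate of performance on unseen data.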
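
For the Output Activation item in the list above, the two most common choices can be sketched in a few lines. This is a plain-Python illustration with helper names of our own (`sigmoid`, `softmax`), not the TensorFlow API: a single sigmoid output suits a binary yes/no probability, while softmax turns one score per class into probabilities that sum to 1 for mutually exclusive classes.

```python
import math


def sigmoid(z):
    """Map one raw score to a probability in (0, 1); used for binary outputs."""
    # Two branches so that math.exp never overflows for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(logits):
    """Map one raw score per class to probabilities that sum to 1."""
    m = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


print(sigmoid(0.0))              # 0.5
print(softmax([1.0, 2.0, 3.0]))  # largest probability goes to the last class
```

If an example can belong to several classes at once (multi-label), the usual choice is one independent sigmoid per class rather than a softmax.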
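
For the Gradient Descent item in the list above, the three variants differ only in how many examples feed each weight update. The sketch below fits a one-parameter model `y = w * x` on noiseless toy data; the function `fit_slope`, the learning rate and the data are all assumptions made up for illustration.

```python
import random


def fit_slope(xs, ys, batch_size, lr=0.1, epochs=200, seed=0):
    """Fit y = w * x by gradient descent on mean squared error.

    batch_size == len(xs)  -> batch gradient descent
    batch_size == 1        -> stochastic gradient descent
    anything in between    -> mini-batch gradient descent
    Returns (w, number_of_weight_updates).
    """
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    w, updates = 0.0, 0
    for _ in range(epochs):
        rng.shuffle(indices)  # new example order every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this batch only.
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad
            updates += 1
    return w, updates


xs = [i / 10 for i in range(1, 11)]
ys = [3.0 * x for x in xs]  # true slope is 3
for name, size in [("batch", len(xs)), ("mini-batch", 4), ("stochastic", 1)]:
    w, updates = fit_slope(xs, ys, batch_size=size)
    print(f"{name:>10}: w={w:.4f} after {updates} updates")
```

The consequence to remember: smaller batches mean more (and noisier) updates per epoch and less memory per step, while larger batches give smoother gradients and make better use of accelerators. On real, noisy data the stochastic variant will jitter around the optimum rather than settle on it.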
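
For the items above on evaluating model performance and class imbalance, a tiny worked example shows why accuracy alone misleads. The dataset (10 positives out of 1,000) and the helper `classification_metrics` are invented for illustration; the "model" simply predicts the majority class every time.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# 1% positive class, and a "model" that always predicts negative.
y_true = [1] * 10 + [0] * 990
always_negative = [0] * 1000
print(classification_metrics(y_true, always_negative))
# accuracy is 0.99, yet recall is 0.0: not a single positive was found
```

This is why precision, recall, F1 or AUC are usually better choices than accuracy when one class is rare.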