google_cloud_data_engineer

Introduction

This is a repository/log/journal for my first ultralearning project: passing the Google Cloud Professional Data Engineer Certification Exam in six months. The first commit to this repository was February 25, 2021, and my goal is to pass the certification exam by August 25, 2021. I plan to put all of my notes, resources, and projects related to this goal in this repository. This will help me reach it by reinforcing the following:

  • Organizing my thoughts, notes, learnings, etc. using a simple markdown-based format.
  • Getting me in the habit of committing to GitHub every day.
  • Improving my writing capabilities.
  • Logging everything I do to keep myself accountable and also provide a detailed account to others who might want to take on a similar ultralearning project.

Inspiration

I have two big inspirations for doing this:

  1. Scott Young, the author of Ultralearning. I've always been an avid learner and love doing it, but since graduate school my learning has been very unfocussed (I have been busy after all!). I read the book this year (February 2021), and it rekindled the fire for learning.
  2. Daniel Bourke. Daniel is a self-taught machine learning engineer, writer, and creator who began diving into machine learning five years ago and hasn't looked back. He's a great example of ultralearning principles applied to my field.

Project path and plan

All while organizing my notes as I'm learning and posting progress to Medium on a bi-monthly basis.

Learning habits

Here is the general day-to-day/week-to-week approach for this project:

  1. I'll commit to giving this five hours a week. Hopefully I can dedicate a little more to it, but I'm a busy husband, dad, and employee, and an hour each weekday morning is about all that I can commit.
  2. This project will get an hour of each morning's attention. My morning routine will be waking up at 5:30, a light exercise routine of 500 steps and 100 push-ups or 50 pull-ups, then diving in for an hour. I'll work in two 30-minute bursts separated by a five-minute break so I can get some more steps in. After all, life isn't fun if you are sitting in front of a screen all day.
  3. The hour of work will be structured as follows:
    1. Coursework/reading for 30 minutes.
    2. Five-minute break.
    3. Building something for 30 minutes. This could either be labs or a project that applies what I am learning.
    4. Five-minute break.
    5. Summarize what I did in the daily log, save and commit notes/code, then on with my day!
  4. I'll create a system of spaced-repetition learning notes so that I can remember what I've learned, but I haven't figured out exactly what that is going to look like yet ...

One key thing I am really curious about is showing myself (and setting an example for others) what I can accomplish with just five hours a week. All it takes is focussed, intentional study and a plan. Surely everyone has five hours a week, right?

Work log

  • March 12, 2021 - 45 minutes. Coursera lab on using Cloud AutoML to detect different types of clouds.
  • March 11, 2021 - 40 minutes. Coursera course on ML with unstructured datasets, covering three different ways to do ML on GCP: pre-built AI services, modifying a pre-built AI with AutoML, and creating my own models with Keras or BigQuery ML.
  • March 9, 2021 - 45 minutes. Worked on open weather forecasts, grabbing data from THREDDS and parsing it.
  • March 8, 2021 - 1 hour. Coursera lab on using Dataflow to stream messages from a NY taxi Pub/Sub topic into BigQuery.
  • March 5, 2021 - 30 minutes. Coursera course on data pipelines with Pub/Sub, Dataflow, and Data Studio.
  • March 4, 2021 - 1 hour. Coursera lab on using BigQuery ML to predict whether a return visitor will make a purchase on an ecommerce site. Completed week 1 of GCP Big Data and ML Fundamentals.
  • March 3, 2021 - 1 hour. Explored BigQuery public datasets, looking for something where I could explore BigQuery ML, but no home runs yet. I think tomorrow I'll just look at the ml_datasets dataset and cook something up to start.
  • March 2, 2021 - 1 hour. Coursera course discussing BigQuery and BigQuery ML.
  • March 1, 2021 - 1.5 hours. Coursera course on using Cloud SQL, Cloud Storage, and Cloud Dataproc to create a recommendation engine and a relational database that serves house rental recommendations to users. Completed the lab.
  • February 27, 2021 - 30 minutes. Began a post on working with JSON-formatted data in BigQuery.
  • February 26, 2021 - 2 hours. Updated this README, constructed a loose project plan and daily habit plan to be able to accomplish the project. 30 minutes of Coursera work. Restructured BigQuery notes.
  • February 25, 2021 - 2 hours. Created the repo, made it through 1.5 hours of the Google Data Engineering Coursera course.
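
To check the log against the five-hours-a-week commitment, the entries above can be tallied with a few lines of Python. This is only a rough sketch: the `WORK_LOG` list, the `TARGET_HOURS_PER_WEEK` constant, and the `summarize` helper are hypothetical names made up for illustration, and the durations are hand-transcribed from the log (the February 26 entry is assumed to be 2 hours in total, with the 30 minutes of Coursera work included in it).

```python
from datetime import date

# Minutes per entry, hand-transcribed from the work log (newest first).
# Assumption: the February 26 entry counts as 120 minutes in total.
WORK_LOG = [
    (date(2021, 3, 12), 45),
    (date(2021, 3, 11), 40),
    (date(2021, 3, 9), 45),
    (date(2021, 3, 8), 60),
    (date(2021, 3, 5), 30),
    (date(2021, 3, 4), 60),
    (date(2021, 3, 3), 60),
    (date(2021, 3, 2), 60),
    (date(2021, 3, 1), 90),
    (date(2021, 2, 27), 30),
    (date(2021, 2, 26), 120),
    (date(2021, 2, 25), 120),
]

TARGET_HOURS_PER_WEEK = 5  # the weekly commitment from "Learning habits"


def summarize(log):
    """Return (total_hours, weeks_elapsed, hours_per_week) for a log."""
    total_hours = sum(minutes for _, minutes in log) / 60
    first = min(day for day, _ in log)
    last = max(day for day, _ in log)
    # Count both endpoints so a single-day log spans one day, not zero.
    weeks_elapsed = ((last - first).days + 1) / 7
    return total_hours, weeks_elapsed, total_hours / weeks_elapsed


if __name__ == "__main__":
    total, weeks, pace = summarize(WORK_LOG)
    print(f"Logged {total:.1f} h over {weeks:.1f} weeks ({pace:.1f} h/week)")
    print("On pace" if pace >= TARGET_HOURS_PER_WEEK else "Behind pace")
```

Under those assumptions the log adds up to 760 minutes (about 12.7 hours) across the 16 days from February 25 to March 12, which is a little above the five-hour weekly target.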