TravisH0301/learning

Learning

Repository containing brief notes made during learning.

Table of Contents

  1. Software Engineering
  2. Backend Engineering
  3. Data Engineering
  4. Data Science / Machine Learning
  5. Miscellaneous

1. Software Engineering

Back to table of contents

Python

  • Context Manager: Using context managers to manage external resources in Python
  • pre-commit: How to set up Git hooks with pre-commit to check code automatically before each commit
  • Multiprocessing and Ray on Python: How to implement parallel processing in Python using multiprocessing and Ray, and how the two compare
  • Pandas Parallelism via Modin: Instruction on how to run Pandas operations in parallel by using Modin
  • Concurrency: How to achieve concurrency to process multiple tasks asynchronously using threading and asyncio in Python
  • Recursion: Recursion in Python with examples
  • Python Style Guides: Summary of recommended style guides from PEP 8, Google and Black
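
As a quick illustration of the context manager note listed above, here is a minimal sketch using the standard library's `contextlib.contextmanager`. The `managed_resource` function and the `events` log are hypothetical names made up for this example; they stand in for a real external resource such as a file or database connection.

```python
from contextlib import contextmanager


@contextmanager
def managed_resource(name, log):
    """Hypothetical resource that records its lifecycle in `log`."""
    log.append(f"acquire {name}")      # setup runs on entering the with block
    try:
        yield name                     # value bound by `as` in the with statement
    finally:
        log.append(f"release {name}")  # cleanup runs even if the body raises


events = []
with managed_resource("db", events) as res:
    events.append(f"use {res}")
```

The `try`/`finally` around `yield` is what guarantees the release step runs when the body of the `with` block raises an exception.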

Network

  • SSH: How to establish an SSH session between a server and a client using public key authentication, and how to transfer files using SFTP
  • Cloud Networking in AWS: Basic networking concepts in AWS

Security

DevOps

  • Git: How to do version control with Git
  • GitOps: How GitOps streamlines continuous deployment for systems with declarative desired states (e.g. Kubernetes)
  • Git Workflows: Covers different types of development workflows using Git
  • Codefresh: What is Codefresh and its CI/CD pipeline with examples
  • Test-Driven Development (TDD): Definition of Test-Driven Development with examples of unit tests in Python using the unittest module
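
To give a flavour of the unit-testing note listed above, here is a minimal sketch with the standard `unittest` module. The `normalise_whitespace` function is a hypothetical example target, not something taken from the linked notes. In a test-driven workflow the test class would be written first, and the function filled in until the tests pass.

```python
import unittest


def normalise_whitespace(text):
    """Hypothetical function under test: collapse whitespace runs to one space."""
    return " ".join(text.split())


class TestNormaliseWhitespace(unittest.TestCase):
    def test_collapses_repeated_spaces(self):
        self.assertEqual(normalise_whitespace("a   b"), "a b")

    def test_strips_leading_and_trailing_whitespace(self):
        self.assertEqual(normalise_whitespace("  a b \n"), "a b")

    def test_empty_string_stays_empty(self):
        self.assertEqual(normalise_whitespace(""), "")
```

Saved to a file, the tests can be run with `python -m unittest <filename>`.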

2. Backend Engineering

Back to table of contents

Internet

  • Internet: Basic explanation of what the internet is, and how information is communicated across it through the different protocol layers
  • HTTP: Characteristics of HTTP, how communication is made between a client and a server using HTTP request and HTTP response, and HTTP/2 & HTTP/3

API

  • REST API: Architectural constraints of REST API

Authentication

  • OAuth: How OAuth works to delegate access to applications

3. Data Engineering

Back to table of contents

Database

  • Database Engine & API: Definition of the database engine in a database management system, and an introduction to database engine APIs such as Open Database Connectivity (ODBC) and Object Linking and Embedding, Database (OLE DB)
  • Distributed Database: Pros & Cons of distributed database with an introduction to the distributed NoSQL database, Apache Cassandra
  • MPP Database: Introduction to Massively Parallel Processing (MPP) and its grid computing and clustering architectures, plus methods of table partitioning (distribution style and sorting key)
  • Partitioning in Teradata: How data is partitioned in Teradata and how to optimise for queries by further partitioning data in nodes and collecting statistics
  • Query Optimisation in Modern Data Warehouses: Query optimisation methods used in modern data warehouses

Data Modelling

  • Database vs Data Warehouse vs Data Lake: Definition of relational database (OLTP & OLAP), data warehousing (architecture - Kimball's & Inmon's, dimensional data modelling, ETL vs ELT & OLAP Cube) and data lake
  • Data Modelling: How to do data modelling (Entity Relationship Diagram) and aspects of relational database & non-relational (NoSQL) database
  • Relational Data Model: How to structure normalised/denormalised data models
  • Types of Fact tables: Different types of fact tables and what they are used for
  • Star Schema & Snowflake Schema: Introduction to star schema & snowflake schema
  • Slowly Changing Dimension (SCD): Types of slowly changing dimensions (SCDs) to adapt to changes in the data source
  • Data Vault: Data vault architecture and its components, and how data vault fits into the medallion architecture
  • Semantic Layer: What a semantic layer is, and how it differs from a metrics layer, a metrics store, and headless BI

Data Pipeline

Data Governance

  • Data Governance: What is data governance? Key components of data governance - processes, people & technology

Event Streaming

Spark

SQL

dbt

Storage

  • Delta Lake: Mechanisms of how Delta Lake works and its benefits

4. Data Science / Machine Learning

Back to table of contents

Statistics

Machine Learning

Deep Learning

5. Miscellaneous

Back to table of contents

Computer Science

  • Binary, Bit & Byte: Explanation of binary, bit and byte, and how they are used in modern computer architecture and character encoding
  • Encoding and Schema: Types of encoding and schema (Avro as an example)

Linux

  • Linux Server: How to connect to a remote Linux server, with some basic Linux terminal commands

Anaconda

Geographic Information System

On-prem SharePoint API
