Repository containing brief notes made during learning.
- Software Engineering
- Backend Engineering
- Data Engineering
- Data Science / Machine Learning
- Miscellaneous
- Context Manager: Use of context managers to manage external resources in Python
- pre-commit: How to set up Git hooks with pre-commit to review code automatically before the commit
- Multiprocessing and Ray on Python: Instructions on implementing parallel processing in Python using Multiprocessing and Ray, and a comparison of the two
- Pandas Parallelism via Modin: Instructions on how to run Pandas operations in parallel using Modin
- Concurrency: How to achieve concurrency to process multiple tasks asynchronously using threading and asyncio in Python
- Recursion: Recursion in Python with examples
- Python Style Guides: Summary of recommended style guides from PEP 8, Google and Black
- SSH: How to establish an SSH session between a server and a client using public key authentication, and how to transfer files using SFTP
- Cloud Networking in AWS: Basic networking concepts in AWS
- IAM in AWS: Basic IAM concepts in AWS
- Git: Instructions for version control using Git
- GitOps: Information on how GitOps streamlines continuous deployment for a system with declarative desired states (ex. Kubernetes)
- Git Workflows: Covers different types of development workflows using Git
- Codefresh: What is Codefresh and its CI/CD pipeline with examples
- Test-Driven Development (TDD): Definition of Test-Driven Development with examples of unit tests in Python using the unittest module
- Internet: Basic explanation of what the internet is, and how information is communicated over the internet through different protocol layers
- HTTP: Characteristics of HTTP, how communication is made between a client and a server using HTTP request and HTTP response, and HTTP/2 & HTTP/3
- REST API: Architectural constraints of REST API
- OAuth: How OAuth works to delegate access to applications
- Database Engine & API: Definition of a database engine in a database management system and introduction to database engine APIs such as Open Database Connectivity (ODBC) and Object Linking and Embedding, Database (OLE DB)
- Distributed Database: Pros & Cons of distributed database with an introduction to the distributed NoSQL database, Apache Cassandra
- MPP Database: Introduction to Massively Parallel Processing (MPP) and its architectures of grid computing and clustering | Methods of table partitioning: Distribution style & Sorting key
- Partitioning in Teradata: How data is partitioned in Teradata and how to optimise for queries by further partitioning data in nodes and collecting statistics
- Query Optimisation in Modern Data Warehouses: Query optimisation methods used in modern data warehouses
- Database vs Data Warehouse vs Data Lake: Definition of relational database (OLTP & OLAP), data warehousing (architecture - Kimball's & Inmon's, dimensional data modelling, ETL vs ELT & OLAP Cube) and data lake
- Data Modelling: How to do data modelling (Entity Relationship Diagram) and aspects of relational database & non-relational (NoSQL) database
- Relational Data Model: How to structure normalised/denormalised data models
- Types of Fact tables: Different types of fact tables and what they are used for
- Star Schema & Snowflake Schema: Introduction to star schema & snowflake schema
- Slowly Changing Dimension (SCD): Types of slowly changing dimensions (SCDs) to adapt to changes in the data source
- Data Vault: Data vault architecture and its components, and how data vault fits into the medallion architecture
- Semantic Layer: What a semantic layer is, and how it differs from a metrics layer, a metrics store, and headless BI
- Data Pipeline and Airflow: Introduction of Directed Acyclic Graphs (DAGs) in data pipeline and building DAGs with Apache Airflow
- Data Lineage & Quality in Airflow: Managing data lineage and data quality in Apache Airflow
- Great Expectations: Basic instructions to spin up Great Expectations to implement a validation layer in a data project
- Outbox Pattern in Event-Driven Architecture: Using the outbox pattern in event-driven architecture to address the challenge of data inconsistency
- Data Governance: What is data governance? Key components of data governance - processes, people & technology
- Apache Kafka: Basics of Apache Kafka
- Apache Spark: Basics of Apache Spark
- SQL Join: Examples for SQL joins; Inner Join, Left Join, Right Join, Full Join, Anti-Join & Cross Join
- Window Functions in SQL: Introduction to window functions in SQL with examples
- SQL Update using a Table: Demonstrates how a target table can be updated using a source table
- Querying Hierarchical Data using Recursive Query: Demonstrates how to design hierarchical data using the adjacency list method and how to query it using a recursive query
- Jinja 101: Covers basic Jinja syntax and functionalities
- Slowly Changing Data Type 1 & 2 in dbt: Demonstrates how to update tables with slowly changing data type 1 & 2 in dbt using Incremental materialization & Snapshot
- Delta Lake: Mechanisms of how Delta Lake works and its benefits
- Measure of Skewness and Kurtosis: Understanding of skewness and kurtosis
- Statistical Feature Selection Methods: Reference to feature selection methods for numerical and categorical data
- Average of Average: Interpretation of different average of average methods
- Image Segmentation by K-Means Clustering: Unsupervised image segmentation by k-means clustering
- Time Series Forecasting: Time series forecasting using statistical modelling
- Neural Network Optimisation: Optimisation methods for neural networks
- Convolutional Neural Network: Explanation of convolutional neural network
- Convolutional Encoder Decoder: Variation of convolutional neural network
- VGG model: Variation of convolutional neural network
- Binary, Bit & Byte: Explanation of binary, bit and byte, and how they are used in modern computer architecture and character encoding
- Encoding and Schema: Types of encoding and schema (Avro as an example)
- Linux Server: Description of how to connect to a remote Linux server, with some basic Linux terminal commands
- Anaconda Virtual Environment: Instructions on how to set up an Anaconda virtual environment
- Geographic Coordinate System: Explanation of commonly used geographic coordinate systems
- Authenticating on-prem SharePoint API via NTLM & AD FS: Steps on how to authenticate on-prem SharePoint API via NTLM (NT LAN Manager) & AD FS (Active Directory Federation Services)