Skip to content

Latest commit

 

History

History
60 lines (28 loc) · 4.25 KB

development-best-practices.md

File metadata and controls

60 lines (28 loc) · 4.25 KB

Software Development Best Practices

Introduction

This guide aims to establish a set of best practices and guidelines to ensure consistent, maintainable, and efficient code for our ETL projects. By adhering to these principles, we aim to foster collaboration, reduce bugs, and enhance the understandability of our codebase.

Guidelines

  1. Don't Repeat Yourself (DRY): Ensure that a piece of logic exists in one place. Reuse code components to avoid redundancy and make maintenance easier.

  2. Single Responsibility Principle (SRP): Each function or module should only handle a specific task. If a function is doing multiple unrelated jobs, consider splitting it.

  3. Descriptive Naming: Use meaningful names for variables, functions, and modules. For example, instead of naming a variable data_dict, consider processed_data_dictionary.

  4. Organize Related Code: Group related functions and modules together. This makes it easier to navigate the codebase and understand related functionalities.

  5. Avoid Redundant Function Wrapping: Don't create functions that merely wrap another function without adding value.

  6. Shared Libraries for Generic Functions: Only universally useful functions should reside in shared libraries to ensure their reusability across different parts of the pipeline.

  7. Leverage Existing Libraries: Before implementing common functionalities like logging or file handling, check for existing libraries or modules that can serve the purpose.

  8. Keep File Operations Transparent: Ensure file operations like opening, reading, or writing are not buried inside complex functions. This enhances clarity and reduces potential errors.

  9. Avoid Hardcoded Values: Always externalize configuration details such as paths, URLs, or database credentials. Consider using configuration files or environment variables.

  10. Constants Over Variables: If a value doesn't change throughout the pipeline, treat it as a constant rather than passing it around as a variable.

  11. Function Documentation: Each function should be accompanied by a docstring that describes its purpose, parameters, return values, and any exceptions it might raise.

  12. Prioritize Unit Testing: Aim to write unit tests for every piece of code. If a function seems hard to test, reconsider its structure and logic.

  13. Avoid Magic Numbers and Strings: Replace special numbers or strings with named constants. For example, instead of using the number 7 directly in the code, use a constant like DAYS_IN_A_WEEK.

  14. Consistent Error Handling and Logging: Ensure there's a standardized approach to error handling. Log meaningful messages to aid debugging.

  15. Prioritize Readability: Although it's tempting to write condensed code, always prioritize clarity and understandability.

  16. Immutable Data Structures: Prefer using immutable data structures or avoid directly altering inputs to minimize unexpected side effects.

  17. Limit Global State: Minimize the use of global variables as they can make the codebase harder to debug and maintain.

  18. Decompose Large Functions: Break down large functions into smaller, more focused ones to enhance readability and maintainability.

  19. Code Reviews: Regularly review code to ensure adherence to these guidelines and catch potential issues early.

  20. Explicit Dependencies: Make dependencies, especially in ETL jobs, evident and not hidden within functions.

  21. Limit Function Parameter Count: Too many parameters can complicate a function. If a function requires numerous inputs, consider using structured data types or classes.

  22. Inline Comments for Context: While function and module documentation is crucial, inline comments within code blocks are invaluable for providing context and explaining complex logic. Ensure comments are concise and directly related to the code they accompany. They should clarify 'why' something is done (if not immediately obvious) rather than 'what' is being done.


By integrating these best practices into our daily development workflow, we aim to build a robust, efficient, and maintainable ETL pipeline that meets our project's requirements. Remember, these are guidelines to direct our development, but always use your best judgment for specific scenarios.