Coding best practices ensure that your code is readable, maintainable, and scalable. Moreover, following the best engineering practices will make your code has less or no bug and streamline the code review process. This article summarizes the best practices and conventions of programming in Python.
PyCharm, Anaconda/Spyder and VS Code are among the most popular IDEs for Python programming. Personally, I use PyCharm Professional version since I enjoyed its features such as Docker integration, PEP 8 style checking, Jupyter notebook support, interactive Python console, VCS support, test-driving coding support, and database & SQL support, etc. BTW, PyCharm Professional version is free for students and open source project contributors.
Every project should be self-contained. I highly recommend to use Docker to containerize your application, and unify your development, test, and production environments. If this is not possible, you should at least create a virtual environment for each project. This will avoid any clashes, since different projects may need different versions of python or library.
Before we implement anything, it is important to have the architect and design mindset in the first place. We need to think about questions like who will use our system or service, how do we use it, how does the component to-be-implemented integrate with the rest of the system, how can we make code more modular, do we need a base class, what abstract methods do we need to have, how can we make functions as independent and decoupled as possible, what test cases should I have, what could possibly go wrong with my code, etc.
Simplicity, readability, consistency, extensibility, modularity, and scalability are key factors to consider when programming.
PEP 8 is the de facto coding style guide for Python.
Importing libraries should be at the top of the file, just after any module comments and docstrings, and before module globals and constants. Different modules should be on separate lines, and imports should be grouped in the following order and separated by a blank line to make it clear where each module is coming from:
- Python standard libraries (e.g., os, sys, and time, etc.)
- Third party libraries (e.g., pandas, numpy, and seaborn, etc.)
- Self developed modules
Absolute imports are recommended, and wildcard imports (from import *) should be avoided.
A good example of library imports are shown as below:
import os
import sys
import pandas as pd
import numpy as np
from utils import get_logger, get_config
from foo.bar.your_class import YourClassAvoid one-letter variables such as l, O, I since they are indistinguishable from the numerals one and zero in
some fonts.
Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.
Class and exception/error names should normally use the CapWords convention, e.g., MyClass.
Function names should be lowercase, with words separated by underscores if necessary. Additionally, use single leading
underscore for protected and internal methods (e.g., _single_leading_underscore(self, ...)) and double leading
underscore for private methods (e.g., __double_leading_underscore(self, ...)).
Variable names follow the same convention as function names.
Constants are usually defined on a module level and written in all capital letters with underscores separating words.
Examples include MAX_OVERFLOW and TOTAL.
- Top-level functions and class definitions should be separated by two blank lines.
- Method definitions inside a class should be separated by one blank line.
- Blank lines can be used in functions, sparingly, to indicate logical sections.
- The end of a python script should be followed by one blank line.
Although it is not required, having typing for public functions will substantially improve readability and reusability. Below is an example of python typing:
def greeting(name: str) -> str:
return 'Hello ' + namePlease refer to python typing module for various types of typings such
as List, Tuple, Dict, Iterable, Callable and Optional, etc.
Please refer to PEP 257 for docstring guidelines. Use one-line docstrings for obvious functions. For example,
"""Return the pathname of ``foo``."""Multiline docstrings should include the following items:
- Summary line
- Use case
- Arguments
- Return type
"""Train a model to classify Foos and Bars.
Usage::
>>> import klassify
>>> data = [("green", "foo"), ("orange", "bar")]
>>> classifier = klassify.train(data)
:param train_data: A list of tuples of the form ``(color, label)``.
:rtype: A :class:`Classifier <Classifier>`
"""Limit all lines to a maximum of 79 characters (72 for docstrings/comments). The preferred way of wrapping long lines is by using Python's implied line continuation inside parentheses, brackets and braces. Long lines can be broken over multiple lines by wrapping expressions in parentheses. These should be used in preference to using a backslash for line continuation.
Personally, I recommend to use PyCharm IDE, which will automatically check PEP 8 conventions for your code. Open source libraries such as autopep8, pep8, and flake8 are also very helpful to check if your code follows the PEP 8 best practices and conventions.
In reality, few reviewer will spend more than 20 minutes to review your PR or CR. To facilitate the review process, make sure your PR or CR focuses on one thing and your commits are small and incremental. To enable this, it is a good practice to make your user story or task simple, small, and independent, and connected with the PR or CR. It is also helpful to add description, acceptance criteria, and test cases, etc. to facilitate reviewers, scrum master, and testers to review and test your code.
NEVER put passwords, API keys, database connections string or other credentials in your code. Additionally, do not commit and push any sensitive data into your repositories. These confidential info will be seen by others when they are pushed to your remote repositories. Use online service such as Azure Key Vault and Azure Data Lake or Blob storage instead.
It is extremely important to write unit tests for your code. For each function you implement, you should add test cases such as edge cases, special cases, normal cases, and any test cases could possibly lead to a bug. Strive for 100% code coverage, but don't get obsessed with the coverage score. It is a good practice to make each function independent of other functions, and make sure a unit test focuses on one tiny bit of functionality and your tests are isolated. Moreover, don't interact with a real database or network. Use a separate in-memory or test database that gets torn down or use mock objects. Use mock and patch when you want to control or replace parts of your code under test. Generally, each class or model should have one test case class.
- Keeping commented code
- Having unused imports
- Inconsistency in naming
- Long and involved function
- Missing whitespace around operator or after comma in arguments, etc.
- Using
print()in production code