We're happy you want to write a Python sample! Like a lot of Pythonistas, we're opinionated and fussy. This guide is a reference for the format and style expected of samples contributed to the python-docs-samples repo. The guidelines below are intended to ensure that all Python samples meet the following goals:
- Copy-paste-runnable. A developer should be able to copy and paste the code into their own environment and run it with as few modifications as possible.
- Teach through code. Each sample should demonstrate best practices for interacting with Google Cloud libraries, APIs, or services.
- Idiomatic. Each sample should follow widely accepted Python best practices as covered below.
This section covers guidelines for Python samples. Note that Testing Guidelines are covered separately below.
Each sample should be in a folder under the top-level folder of python-docs-samples that corresponds to the Google Cloud service or API used by the sample. For example, a sample demonstrating how to work with Bigtable should be in a subfolder under the python-docs-samples/bigtable folder.
Conceptually related samples under a service or API should be grouped into a subfolder. For example, App Engine Standard samples are under the appengine/standard folder, and App Engine Flex samples are under the appengine/flexible folder.
If your sample is a set of discrete code snippets that each demonstrate a
single operation, these should be grouped into a snippets folder. For
example, see the snippets in the bigtable/snippets/writes folder.
If your sample is a quickstart — intended to demonstrate how to quickly get started with using a service or API — it should be in a quickstart folder.
Samples should support Python 3.6, 3.7, and 3.8.
If the API or service your sample works with has specific Python version requirements different from those mentioned above, the sample should support those requirements.
Source code files should always begin with an Apache 2.0 license header. See the instructions in the repo license file on how to apply the Apache license to your work. For example, see the license header for the Datastore client quickstart sample.
If, and only if, your sample application is a command-line application, then include a shebang as the first line. Separate the shebang line from the rest of the application with a blank line. The shebang line for a Python application should always be:
#!/usr/bin/env python
Don't include shebang lines in web applications or test files.
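As a rough illustration of the layout described above, here is a minimal sketch of a command-line sample. The build_greeting function, the name argument, and its default value are hypothetical and exist only for this illustration; a real sample would carry the full license header where the placeholder comment sits.

```python
#!/usr/bin/env python

# (The Apache 2.0 license header would go here.)

"""Hypothetical command-line sample that prints a greeting."""

import argparse


def build_greeting(name):
    """Builds the greeting text for the given name."""
    return f"Hello, {name}!"


def main(argv=None):
    """Parses command-line arguments and prints the greeting."""
    parser = argparse.ArgumentParser(description=__doc__)
    # The positional argument is optional so the sketch runs without input.
    parser.add_argument(
        "name", nargs="?", default="World", help="Who to greet.")
    args = parser.parse_args(argv)
    print(build_greeting(args.name))


if __name__ == "__main__":
    main()
```

Note the blank line that separates the shebang from everything that follows.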
All Python samples should follow the best practices defined in the PEP 8 style guide and the Google Python Style Guide. The automated linting process for Python samples uses flake8 to verify conformance to common Python coding standards, so the use of flake8 is recommended.
If you prefer to use pylint, note that Python samples for this repo are not required to conform to pylint’s default settings outside the scope of PEP 8, such as the “too many arguments” or “too many local variables” warnings.
The use of Black to standardize code formatting and simplify diffs is recommended, but optional.
In addition to the syntax guidelines covered in PEP 8, samples should strive to follow the Pythonic philosophy outlined in the PEP 20 - Zen of Python as well as the readability tenets presented in Donald Knuth's Literate Programming. Notably, your sample program should be self-contained, readable from top to bottom, and fairly self-documenting. Prefer descriptive names, and use comments and docstrings only as needed to further clarify the code’s intent. Always introduce functions and variables before they are used. Prefer less indirection. Prefer imperative programming as it is easier to understand.
Very few samples will require authoring classes. Prefer functions whenever possible. See this video for some insight into why classes aren't as necessary as you might think in Python. Classes also introduce cognitive load. If you do write a class in a sample, be prepared to justify its existence during code review.
Always prefer descriptive function names, even if they are long.
For example: upload_file, upload_encrypted_file, and list_resource_records.
Similarly, prefer long and descriptive parameter names. For example:
source_file_name, dns_zone_name, and base64_encryption_key.
Here's an example of a top-level function in a command-line application:
def list_blobs(bucket_name):
    """Lists all the blobs in the bucket."""
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)

    blobs = bucket.list_blobs()

    for blob in blobs:
        print(blob.name)
Notice the simple docstring and descriptive argument name (bucket_name
implies a string, whereas just bucket could imply a class instance).
This particular function is intended to be the "top of the stack" - the function executed when the command-line sample is run by the user. As such, notice that it prints the blobs instead of returning. In general, top of the stack functions in command-line applications should print, but use your best judgment.
Here's an example of a more complicated top-level function in a command-line application:
def download_encrypted_blob(
        bucket_name, source_blob_name, destination_file_name,
        base64_encryption_key):
    """Downloads a previously-encrypted blob from Google Cloud Storage.

    The encryption key provided must be the same key provided when uploading
    the blob.
    """
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(source_blob_name)

    # Encryption key must be an AES256 key represented as a bytestring with
    # 32 bytes. Since it's passed in as a base64 encoded string, it needs
    # to be decoded.
    encryption_key = base64.b64decode(base64_encryption_key)

    blob.download_to_filename(
        destination_file_name, encryption_key=encryption_key)

    print(f'Blob {source_blob_name} downloaded to {destination_file_name}.')
Note the verbose parameter names and the extended description that helps the
user form context. If there were more parameters or if the parameters had
complex context, then it might make sense to expand the docstring to include
an Args section such as:
Args:
    bucket_name: The name of the Cloud Storage bucket.
    source_blob_name: The name of the blob in the bucket to download.
    destination_file_name: The blob will be downloaded to this path.
    base64_encryption_key: A base64-encoded AES256 encryption key. Must be the
        same key used to encrypt the file.
Generally, however, it's rarely necessary to exhaustively document the parameters this way. Lean towards unsurprising arguments with descriptive names. A docstring like this may be extremely accurate, but it comes at the cost of high redundancy, a low signal-to-noise ratio, and increased cognitive load.
Argument types should be documented using Python type annotations as introduced in PEP 484. For example:
def hello_world(name: str):
    print(f"Hello {name}!")
If there is an Args section within the function's docstring, consider
documenting the argument types there as well. For example:
Args:
    credentials (google.oauth2.credentials.Credentials): Credentials
        authorized for the current user.
When documenting primitive types, be sure to note if they have a particular set
of constraints. For example, "A base64-encoded string" or "Must be between 0
and 10".
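Putting the points above together, here is a hedged sketch of a function that combines type annotations with an Args section that documents a constraint. The decode_encryption_key function and its 32-byte check are assumptions made for this illustration, not part of any Google Cloud library.

```python
import base64


def decode_encryption_key(base64_encryption_key: str) -> bytes:
    """Decodes a base64-encoded encryption key.

    Args:
        base64_encryption_key (str): A base64-encoded string. Must decode
            to exactly 32 bytes.

    Returns:
        bytes: The decoded 32-byte key.

    Raises:
        ValueError: If the decoded key is not 32 bytes long.
    """
    encryption_key = base64.b64decode(base64_encryption_key)
    if len(encryption_key) != 32:
        raise ValueError(
            f"Expected a 32-byte key, got {len(encryption_key)} bytes.")
    return encryption_key
```

The annotation tells the reader the type; the docstring carries the constraint that the type alone cannot express.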
Each sample should have a README.md file that provides instructions for how
to install, configure, and run the sample. Setup steps that cover creating
Google Cloud projects and resources should link to appropriate pages in the
Google Cloud Documentation, to avoid duplication and simplify maintenance.
Every sample should include a requirements.txt file that lists all of its dependencies, to enable others to re-create the environment that was used to create and test the sample. All dependencies should be pinned to a specific version, as in this example:
Flask==1.1.1
PyMySQL==0.9.3
SQLAlchemy==1.3.12
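If you want a quick local check that every dependency is pinned, something like the following sketch could help. It is a simplified illustration, not a tool used by this repo: the find_unpinned_requirements function is hypothetical, and it only looks for == on each line, so it does not understand pip options such as -r or environment markers.

```python
def find_unpinned_requirements(requirements_text: str) -> list:
    """Returns the requirement lines that are not pinned with '=='."""
    unpinned = []
    for line in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        requirement = line.split("#", 1)[0].strip()
        if not requirement:
            continue
        if "==" not in requirement:
            unpinned.append(requirement)
    return unpinned


example = "Flask==1.1.1\nPyMySQL>=0.9\n# a comment\nSQLAlchemy\n"
print(find_unpinned_requirements(example))
```

For the example text above, the two entries without an exact version (PyMySQL>=0.9 and SQLAlchemy) are reported.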
If a sample has testing requirements that differ from its runtime requirements
(such as dependencies on pytest or other testing libraries), the testing
requirements may be listed in a separate requirements-test.txt file instead of
the main requirements.txt file.
Sample code may be integrated into Google Cloud Documentation through the use
of region tags, which are comments added to the source code to identify code
blocks that correspond to specific topics covered in the documentation. For
example, see this sample; the region tags are the comments that begin with
[START or [END.
The use of region tags is beyond the scope of this document, but if you’re using region tags they should start after the source code header (license/copyright information), imports, and global configuration such as initializing constants.
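To make the placement concrete, here is a minimal sketch of where a region tag pair could sit in a source file. The tag name example_format_bucket_url, the URL_SCHEME constant, and the format_bucket_url function are all hypothetical and chosen only for this illustration.

```python
# (The Apache 2.0 license header would go here.)

import os

# Global configuration is initialized before the region tag opens.
URL_SCHEME = os.environ.get("EXAMPLE_URL_SCHEME", "gs")


# [START example_format_bucket_url]
def format_bucket_url(bucket_name: str) -> str:
    """Builds the URL for the given bucket name."""
    return f"{URL_SCHEME}://{bucket_name}/"
# [END example_format_bucket_url]
```

Only the code between the [START and [END comments would be pulled into the documentation, so the header, imports, and constants stay outside the tagged block.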
Sample code should use standard Python exception handling techniques as covered in the Google Python Style Guide.
Samples should include tests to verify that the sample runs correctly and generates the intended output. Follow these guidelines while writing your tests:
- Use pytest-style tests and plain asserts. Don't use unittest-style tests or assertX methods.
- Whenever possible, tests should allow for future changes or additions to APIs that are unrelated to the code being tested. For example, if a test is intended to verify a JSON payload returned from an endpoint, it should only check for the existence of the expected keys and values, and the test should continue to work correctly if the order of keys changes or new keys are added to the response in a future version of the API. In some cases, it may make sense for tests to simply verify that an API call was successful rather than checking the response payload.
- Samples that use App Engine Standard should use the App Engine testbed for system testing, as shown in this example.
- All tests should be independent of one another and order-independent.
- We use parallel processing for tests, so tests should be capable of running in parallel with one another.
- Use pytest fixtures for resource setup and teardown, instead of doing that work in the test itself.
- Avoid infinite loops.
- Retry RPCs
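As an illustration of the first two guidelines, here is a sketch of a pytest-style test that uses plain asserts and checks only the keys it depends on. The response_body string is a hypothetical payload that stands in for a live API response purely to keep the sketch self-contained; a real test would obtain it from the API.

```python
import json


def test_response_contains_expected_fields():
    # Hypothetical payload; a real test would receive this from the API.
    response_body = (
        '{"extra": true, "status": "DONE", "name": "operation-123"}')

    payload = json.loads(response_body)

    # Check only the expected keys and values. The test keeps passing if
    # the key order changes or the API adds new keys such as "extra".
    assert payload["name"] == "operation-123"
    assert payload["status"] == "DONE"
```

Comparing the whole payload against a fixed dictionary would break as soon as the API added a field, which is what this style avoids.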
Tests for samples should follow the “Arrange, Act, Assert” structure:
- Arrange: create and configure the components required for the test. Avoid nesting; prioritize readability and simplicity over efficiency. For Python tests, typical "arrange" steps include imports, copying environment variables to local variables, and so on.
- Act: execute the code to be tested, such as sending a request to an API and receiving a response.
- Assert: verify that the test results match what is expected, using an assert statement.
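The three steps can be sketched as follows. The format_resource_name function is a hypothetical stand-in for the sample code under test, and the fallback value example-project is an assumption that keeps the sketch runnable when GCLOUD_PROJECT is not set.

```python
import os


def format_resource_name(project_id: str, resource_id: str) -> str:
    """Hypothetical sample function under test."""
    return f"projects/{project_id}/resources/{resource_id}"


def test_format_resource_name():
    # Arrange: copy environment variables to local variables.
    project_id = os.environ.get("GCLOUD_PROJECT", "example-project")
    resource_id = "test-resource"

    # Act: execute the code being tested.
    resource_name = format_resource_name(project_id, resource_id)

    # Assert: verify the result with a plain assert statement.
    assert resource_name == f"projects/{project_id}/resources/test-resource"
```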
Whenever possible, tests should run against the live production version of cloud APIs and resources. This ensures that any breaking changes in those resources are identified by the tests.
External resources that must exist prior to the test (for example, a Cloud SQL instance) should be identified and passed in through an environment variable. If specific data needs to exist within such infrastructure resources, however, the test should create this data as part of its Arrange steps and then clean up when the test is completed.
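One way to read such a variable is sketched below. The get_required_env helper and the EXAMPLE_INSTANCE_NAME variable are hypothetical; the idea is only that a missing variable should fail fast with a clear message rather than surface later as a confusing API error.

```python
import os


def get_required_env(variable_name: str) -> str:
    """Returns the value of an environment variable that a test needs.

    Raises:
        RuntimeError: If the variable is unset or empty.
    """
    value = os.environ.get(variable_name)
    if not value:
        raise RuntimeError(
            f"Set the {variable_name} environment variable "
            "before running the tests.")
    return value
```

A test could then call get_required_env("EXAMPLE_INSTANCE_NAME") in its Arrange step and create any data it needs inside that pre-existing resource.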
Creating mocks for external resources is strongly discouraged. Tests should verify the validity of the sample against the APIs, and not against a mock that embodies assumptions about the behavior of the APIs.
When tests need temporary resources (such as a temp file or folder), they
should create reasonable names for these resources with a UUID attached to
ensure uniqueness. Use the Python uuid module from the standard
library to generate UUIDs for resource names. For example:
glossary_id = f'test-glossary-{uuid.uuid4()}'
or:
# If full uuid4 is too long, use its hex representation.
encrypted_disk_name = f'test-disk-{uuid.uuid4().hex}'
# If the hex representation is also too long, slice it.
encrypted_disk_name = f'test-disk-{uuid.uuid4().hex[:5]}'
All temporary resources should be explicitly deleted when testing is complete. Use pytest fixtures to clean up these resources instead of doing it in the test itself.
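Here is a minimal sketch of that pattern, assuming a temporary local folder as the resource. The create_temp_folder helper, the temp_folder fixture, and the test name are hypothetical; a real sample would create and delete a cloud resource in the same places.

```python
import os
import shutil
import tempfile
import uuid

import pytest


def create_temp_folder() -> str:
    """Creates a uniquely named temporary folder and returns its path."""
    folder_name = f"test-folder-{uuid.uuid4().hex}"
    folder_path = os.path.join(tempfile.gettempdir(), folder_name)
    os.makedirs(folder_path)
    return folder_path


@pytest.fixture
def temp_folder():
    # Setup: create the resource before the test runs.
    folder_path = create_temp_folder()

    yield folder_path

    # Teardown: runs after the test finishes, whether it passed or failed.
    shutil.rmtree(folder_path, ignore_errors=True)


def test_temp_folder_starts_empty(temp_folder):
    assert os.listdir(temp_folder) == []
```

Because the cleanup lives after the yield, the test body stays focused on the behavior being verified.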
If the sample prints output to the console, the test should capture stdout to a file and verify that the captured output contains the key information that is expected. Strive to verify the content of the output rather than the syntax. For example, the test might verify that a string is included in the output, without taking a dependency on where that string occurs in the output.
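A sketch of this kind of check, using only the standard library, is shown below. The list_items function is a hypothetical stand-in for a sample that prints, and an in-memory buffer plays the role of the file; pytest's built-in capsys fixture is another common way to capture output.

```python
import contextlib
import io


def list_items(items):
    """Hypothetical sample function that prints each item on its own line."""
    for item in items:
        print(item)


def test_list_items_prints_expected_item():
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        list_items(["alpha", "beta"])

    # Verify that the key information appears somewhere in the output,
    # without depending on where it appears or how it is formatted.
    assert "beta" in captured.getvalue()
```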
Never put potential infinite loops in the test code path. A typical
example involves gRPC's LongRunningOperations. Make sure you pass the
timeout parameter to the result() call.
Good:
# Raises a timeout error if the operation is still running after 60 seconds.
operation.result(timeout=60)
Bad:
operation.result() # this could wait forever.
We recommend choosing a timeout that gives you a success rate of more than 90%. Don't make the timeout too long.
Such a test is inevitably flaky, so consider marking the test as flaky as follows:
@pytest.mark.flaky(max_runs=3, min_passes=1)
def test_my_flaky_operation():
    # Test that involves LRO polling with the timeout.
    ...
This combination will give you a very high success rate with a fixed test execution time (0.999 success rate and 180 seconds of operation wait time in the worst case in this example).
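The figures above can be reproduced with a little arithmetic. This sketch assumes a 90% success rate per run, a 60-second timeout, and runs that fail independently of one another; real failures may be correlated, so treat the result as an estimate.

```python
per_run_success_rate = 0.9  # Assumed success rate of a single run.
max_runs = 3                # Matches max_runs in the flaky marker.
timeout_seconds = 60        # Matches the timeout passed to result().

# The test fails overall only if every one of the runs fails.
overall_success_rate = 1 - (1 - per_run_success_rate) ** max_runs

# In the worst case, every run waits for the full timeout.
worst_case_wait_seconds = max_runs * timeout_seconds

print(f"Overall success rate: {overall_success_rate:.3f}")
print(f"Worst-case wait: {worst_case_wait_seconds} seconds")
```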
All RPCs are inevitably flaky; they can fail for many reasons. The
google-cloud Python client retries requests automatically in most cases.
The old api-client doesn't retry automatically, so consider using backoff
for retrying. Here is a simple example:
import backoff
from googleapiclient.errors import HttpError
import pytest


@pytest.fixture(scope='module')
def test_resource():
    @backoff.on_exception(backoff.expo, HttpError, max_time=60)
    def create_resource():
        try:
            return client.projects().imaginaryResource().create(
                name=resource_id, body=body).execute()
        except HttpError as e:
            if '409' in str(e):
                # Ignore this case and get the existing one.
                return client.projects().imaginaryResource().get(
                    name=resource_id).execute()
            else:
                raise

    resource = create_resource()

    yield resource

    # cleanup
    ...
Because all tests are system tests that use live resources, running tests requires a Google Cloud project with billing enabled, as covered under Creating and Managing Projects.
Once you have created and configured your project, you'll need to set
environment variables to identify the project and resources to be used
by tests. See testing/test-env.tmpl.sh for a list of all environment
variables used by all tests. Not every test needs all of these variables.
All required environment variables should be listed in the README and
testing/test-env.tmpl.sh. If you find one is missing, please add
instructions for setting it as part of your PR.
We suggest that you copy this file as follows:
$ cp testing/test-env.tmpl.sh testing/test-env.sh
$ editor testing/test-env.sh # change the value of `GCLOUD_PROJECT`.
You can then source this file to export the environment variables.
This repository supports two ways to run tests locally:
- nox: This is the recommended way. Setup takes a little more effort than with Docker, but test execution is faster.
- Docker: Setup is easier because you only need to install Docker, but test execution is a bit slower than with nox.
Please read the MAC Setup Guide.
Automated testing for samples in python-docs-samples is managed by nox.
Nox allows us to run a variety of tests, including the flake8 linter,
Python 2.7, Python 3.x, and App Engine tests, as well as automated README
generation.
Note: As a temporary workaround, each project currently uses the first
noxfile-template.py found in a parent folder above the current sample. To
simulate this locally, copy the parent noxfile-template.py and rename it to
noxfile.py in the folder of the project (the folder containing the
requirements.txt file) that you want to test:
cd python-docs-samples
cp noxfile-template.py PATH/TO/YOUR/PROJECT/noxfile.py
cd PATH/TO/YOUR/PROJECT/
To use nox, install it globally with pip:
$ pip install nox
To run style checks on your samples:
nox -s lint
To run tests with a specific Python version, use the corresponding py-3.* session:
nox -s py-3.6
To run a specific file:
nox -s py-3.7 -- snippets_test.py
To run a specific test from a specific file:
nox -s py-3.7 -- snippets_test.py::test_list_blobs
If you have Docker installed and runnable by the local user, you can use
the scripts/run_tests_local.sh helper script to run the tests. For example,
if you want to modify the code in the cdn directory, you can run:
$ cd cdn
$ ../scripts/run_tests_local.sh .
# This will run the default sessions; lint, py-3.6, and py-3.7
$ ../scripts/run_tests_local.sh . lint
# Running only lint
If your test needs a service account, you have to create a service
account and download the JSON key to testing/service-account.json.
On macOS systems, you also need to install coreutils to use
scripts/run_tests_local.sh. Here is how to install it with brew:
$ brew install coreutils
Certain samples require integration with Google Cloud Storage (GCS), most
commonly for APIs that read files from GCS. To run the tests for these
samples, configure your GCS bucket name via the CLOUD_STORAGE_BUCKET
environment variable.
The resources required by tests can usually be found in the ./resources
folder inside the sample directory, as in this example.
You can upload those resources to your own GCS bucket to run the tests with
gsutil. For example:
gsutil cp ./resources/* gs://$CLOUD_STORAGE_BUCKET/