Skip to content

hunyadi/asyncpg_typed

Repository files navigation

Type-safe queries for asyncpg

asyncpg is a high-performance database client to connect to a PostgreSQL server, and execute SQL statements using the async/await paradigm in Python. The library exposes a Connection object, which has methods like execute and fetch that run SQL queries against the database. Unfortunately, these methods take the query as a plain str, arguments as object, and the resultset is exposed as a Record, which is a tuple/dict hybrid whose get and indexer have a return type of Any. There is no mechanism to check compatibility of input or output arguments, even if their types are preliminarily known.

This Python library provides "compile-time" validation for SQL queries that linters and type checkers can enforce. By creating a generic SQL object and associating input and output type information with the query, the signatures of execute and fetch reveal the exact expected and returned types.

Motivating example

Using a plain tuple

# create a typed object, setting expected and returned types
select_where_sql = sql(
    """--sql
    SELECT boolean_value, integer_value, string_value
    FROM sample_data
    WHERE boolean_value = $1 AND integer_value > $2
    ORDER BY integer_value;
    """,
    args=tuple[bool, int],
    resultset=tuple[bool, int, str | None],
)

conn = await asyncpg.connect(host="localhost", port=5432, user="postgres", password="postgres")
try:
    # ✅ Valid signature
    rows = await select_where_sql.fetch(conn, False, 2)

    # ✅ Type of "rows" is "list[tuple[bool, int, str | None]]"
    reveal_type(rows)

    # ⚠️ Expected 1 more positional argument
    rows = await select_where_sql.fetch(conn, False)

    # ⚠️ Argument of type "float" cannot be assigned to parameter "arg2" of type "int" in function "fetch"; "float" is not assignable to "int"
    rows = await select_where_sql.fetch(conn, False, 3.14)

finally:
    await conn.close()

Using a named tuple

# capture resultset column names and data types as fields of a named tuple
class Resultset(NamedTuple):
    boolean_value: bool
    integer_value: int
    string_value: str | None


# create a typed object, declaring return types with the named tuple
select_sql = sql(
    """--sql
    SELECT boolean_value, integer_value, string_value
    FROM sample_data
    ORDER BY integer_value;
    """,
    resultset=Resultset,
)

conn = await asyncpg.connect(host="localhost", port=5432, user="postgres", password="postgres")
try:
    rows = await select_sql.fetch(conn)

    # ✅ Type of "rows" is "list[Resultset]"
    reveal_type(rows)

    for row in rows:
        # use dot notation to access properties
        if row.string_value is not None:
            print(f"#{row.integer_value}: {row.string_value}")

        # ✅ Type of "row.boolean_value" is "bool"
        reveal_type(row.boolean_value)

        # unpack named tuple
        b, i, s = row

        # ✅ Type of "s" is "str | None"
        reveal_type(s)

finally:
    await conn.close()

Syntax

Creating a SQL object

Instantiate a SQL object with the sql function:

def sql(
    stmt: LiteralString | string.templatelib.Template,
    *,
    args: None | type[tuple[P1, P2]] | type[tuple[P1, P2, P3]] | ... = None,
    resultset: None | type[tuple[R1, R2]] | type[tuple[R1, R2, R3]] | ... = None,
    arg: None | type[P] = None,
    result: None | type[R] = None,
) -> _SQL: ...

Parameters to factory function

The parameter stmt represents a SQL expression, either as a literal string or a template (i.e. a t-string).

If the expression is a string, it can have PostgreSQL parameter placeholders such as $1, $2 or $3:

"INSERT INTO table_name (col_1, col_2, col_3) VALUES ($1, $2, $3);"

If the expression is a t-string, it can have replacement fields that evaluate to integers:

t"INSERT INTO table_name (col_1, col_2, col_3) VALUES ({1}, {2}, {3});"

The parameters args and resultset take a tuple of several types Px or Rx.

The parameters arg and result take a single type P or R. Passing a simple type (e.g. type[T]) directly via arg and result is for convenience, and is equivalent to passing a one-element tuple of the same simple type (i.e. type[tuple[T]]) via args and resultset.

The number of types in args must correspond to the number of query parameters. (This is validated on calling sql(...) for the t-string syntax.) The number of types in resultset must correspond to the number of columns returned by the query.

Argument and resultset types

When passing Python types via the parameters args and resultset, each type may be any of the following:

  • (required) simple type
  • optional simple type (T | None)
  • special union type

Simple types include:

  • bool
  • numeric types:
    • int
    • float
    • decimal.Decimal
  • date and time types:
    • datetime.date
    • datetime.time
    • datetime.datetime
    • datetime.timedelta
  • str
  • bytes
  • uuid.UUID
  • types defined in the module ipaddress:
    • ipaddress.IPv4Address
    • ipaddress.IPv6Address
    • ipaddress.IPv4Network
    • ipaddress.IPv6Network
  • asyncpg representations of PostgreSQL geometric types:
    • asyncpg.Point
    • asyncpg.Line
    • asyncpg.LineSegment
    • asyncpg.Box
    • asyncpg.Path
    • asyncpg.Polygon
    • asyncpg.Circle
  • concrete types of asyncpg.Range:
    • asyncpg.Range[int]
    • asyncpg.Range[Decimal]
    • asyncpg.Range[date]
    • asyncpg.Range[datetime]
  • a user-defined enumeration class that derives from StrEnum

Custom Python types corresponding to PostgreSQL scalar or composite types are permitted. These types need to be pre-registered with set_type_codec passing an encoder, a decoder and typically format="tuple".

In general, union types are not allowed. However, there are notable exceptions. Special union types are as follows:

  • JsonType to represent an object reconstructed from a JSON string
  • IPv4Address | IPv6Address to denote either an IPv4 or IPv6 address
  • IPv4Network | IPv6Network to denote either an IPv4 or IPv6 network definition

Types are grouped together with tuple:

tuple[bool, int, str | None]

Both args and resultset types must be compatible with their corresponding PostgreSQL query parameter types and resultset column types, respectively. The following table shows the mapping between PostgreSQL and Python types. When there are multiple options separated by a slash, either of the types can be specified as a source or target type.

PostgreSQL type Python type
bool bool
smallint int
integer int
bigint int
real/float4 float
double/float8 float
decimal/numeric Decimal
date date
time time (naive)
timetz time (tz)
timestamp datetime (naive)
timestamptz datetime (tz)
interval timedelta
char(N) str
varchar(N) str
text str
bytea bytes
uuid UUID
cidr IPvXNetwork
inet IPvXNetwork/IPvXAddress
macaddr str
macaddr8 str
json str/JsonType
jsonb str/JsonType
xml str
any enumeration type E: StrEnum
point asyncpg.Point
line asyncpg.Line
lseg asyncpg.LineSegment
box asyncpg.Box
path asyncpg.Path
polygon asyncpg.Polygon
circle asyncpg.Circle
int4range asyncpg.Range[int]
int8range asyncpg.Range[int]
numrange asyncpg.Range[Decimal]
tsrange asyncpg.Range[datetime] (naive)
tstzrange asyncpg.Range[datetime] (tz)
daterange asyncpg.Range[date]

PostgreSQL types json and jsonb are returned by asyncpg as Python type str. However, if we specify the union type JsonType in args or resultset, the JSON string is parsed as if by calling json.loads(). If the library orjson is present, its faster routines are invoked instead of the slower standard library implementation in the module json.

JsonType is defined in the module asyncpg_typed as follows:

JsonType = None | bool | int | float | str | dict[str, "JsonType"] | list["JsonType"]

IPvXNetwork is a shorthand for either of the following:

  • IPv4Network
  • IPv6Network
  • their union type IPv4Network | IPv6Network

IPvXAddress stands for either of the following:

  • IPv4Address
  • IPv6Address
  • their union type IPv4Address | IPv6Address

SQL statement as an f-string

In addition to the sql function, SQL objects can be created with the functionally identical unsafe_sql function. As opposed to its safer alternative, the first parameter of unsafe_sql has the plain type str, allowing us to pass an f-string. This can prove useful if we want to inject the value of a Python variable at location where binding parameters are not permitted by PostgreSQL syntax, e.g. substitute the name of a database table to dynamically create a SQL statement.

Using a SQL object

The function sql returns an object that derives from the base class _SQL and is specific to the number and types of parameters passed in args and resultset.

The following functions are available on SQL objects:

async def execute(self, connection: Connection, *args: *P) -> None: ...
async def executemany(self, connection: Connection, args: Iterable[tuple[*P]]) -> None: ...
async def fetch(self, connection: Connection, *args: *P) -> list[tuple[*R]]: ...
async def fetchmany(self, connection: Connection, args: Iterable[tuple[*P]]) -> list[tuple[*R]]: ...
async def fetchrow(self, connection: Connection, *args: *P) -> tuple[*R] | None: ...
async def fetchval(self, connection: Connection, *args: *P) -> R1: ...

Connection may be an asyncpg.Connection or an asyncpg.pool.PoolConnectionProxy acquired from a connection pool.

*P and *R denote several types (a type pack) corresponding to those listed in args and resultset, respectively.

Only those functions are prompted on code completion that make sense in the context of the given number of input and output arguments. Specifically, fetchval is available only for a single type passed to resultset, and executemany and fetchmany are available only if the query takes (one or more) parameters.

Run-time behavior

When a call such as sql.executemany(conn, records) or sql.fetch(conn, param1, param2) is made on a SQL object at run time, the library invokes connection.prepare(sql) to create a PreparedStatement and compares the actual statement signature against the expected Python types. If the expected and actual signatures don't match, a TypeMismatchError exception is raised.

If the resultset type has been declared with a subclass of NamedTuple, the field names of the tuple are compared against the column names of the PostgreSQL resultset. Should there be a mismatch, a NameMismatchError is raised. Field and column order is relevant.

The set of values for an enumeration type is validated when a prepared statement is created. The string values declared in a Python StrEnum are compared against the values listed in PostgreSQL CREATE TYPE ... AS ENUM by querying the system table pg_enum. If there are missing or extra values on either side, an EnumMismatchError exception is raised.

Unfortunately, PostgreSQL doesn't propagate nullability via prepared statements: resultset types that are declared as required (e.g. T as opposed to T | None) have to be validated at run time. When a None value is encountered for a required type, a NoneTypeError exception is raised.

PostgreSQL doesn't differentiate between IPv4 and IPv6 network definitions, or IPv4 and IPv6 addresses in the types cidr and inet. This means that semantically a union type is returned. If you specify a more restrictive type, the resultset data is validated dynamically at run time.