Commit 2091240

Merge pull request #5 from octoenergy/rebase-to-302

Rebase to 302

2 parents: ac5ebee + 17ce223


64 files changed (+14,855 / -3,010 lines)

CHANGELOG.md

Lines changed: 42 additions & 2 deletions
@@ -1,24 +1,64 @@
 # Release History
 
-## 2.9.4 (Unreleased)
+## 3.0.2 (2024-01-25)
+
+- SQLAlchemy dialect now supports table and column comments (thanks @cbornet!)
+- Fix: SQLAlchemy dialect now correctly reflects TINYINT types (thanks @TimTheinAtTabs!)
+- Fix: `server_hostname` URIs that included `https://` would raise an exception
+- Other: pinned to `pandas<=2.1` and `urllib3>=1.26` to avoid runtime errors in dbt-databricks (#330)
+
+## 3.0.1 (2023-12-01)
+
+- Other: updated docstring comment about default parameterization approach (#287)
+- Other: added tests for reading complex types and revised docstrings and type hints (#293)
+- Fix: SQLAlchemy dialect raised DeprecationWarning due to `dbapi` classmethod (#294)
+- Fix: SQLAlchemy dialect could not reflect TIMESTAMP_NTZ columns (#296)
+
+## 3.0.0 (2023-11-17)
+
+- Remove support for Python 3.7
+- Add support for native parameterized SQL queries. Requires DBR 14.2 and above. See docs/parameters.md for more info.
+- Completely rewritten SQLAlchemy dialect
+  - Adds support for SQLAlchemy >= 2.0 and drops support for SQLAlchemy 1.x
+  - Full e2e test coverage of all supported features
+  - Detailed usage notes in `README.sqlalchemy.md`
+  - Adds support for:
+    - New types: `TIME`, `TIMESTAMP`, `TIMESTAMP_NTZ`, `TINYINT`
+    - `Numeric` type scale and precision, like `Numeric(10,2)`
+    - Reading and writing `PrimaryKeyConstraint` and `ForeignKeyConstraint`
+    - Reading and writing composite keys
+    - Reading and writing from views
+    - Writing `Identity` to tables (i.e. autoincrementing primary keys)
+    - `LIMIT` and `OFFSET` for paging through results
+    - Caching metadata calls
+- Enable cloud fetch by default. To disable, set `use_cloud_fetch=False` when building `databricks.sql.client`.
+- Add integration tests for Databricks UC Volumes ingestion queries
+- Retries:
+  - Add `_retry_max_redirects` config
+  - Set `_enable_v3_retries=True` and warn if users override it
+- Security: bump minimum pyarrow version to 14.0.1 (CVE-2023-47248)
 
 ## 2.9.3 (2023-08-24)
 
 - Fix: Connections failed when urllib3~=1.0.0 is installed (#206)
 
 ## 2.9.2 (2023-08-17)
 
+__Note: this release was yanked from PyPI on 13 September 2023 due to compatibility issues with environments where `urllib3<=2.0.0` was installed. The log changes are incorporated into version 2.9.3 and greater.__
+
 - Other: Add `examples/v3_retries_query_execute.py` (#199)
 - Other: suppress log message when `_enable_v3_retries` is not `True` (#199)
 - Other: make this connector backwards compatible with `urllib3>=1.0.0` (#197)
 
 ## 2.9.1 (2023-08-11)
 
+__Note: this release was yanked from PyPI on 13 September 2023 due to compatibility issues with environments where `urllib3<=2.0.0` was installed.__
+
 - Other: Explicitly pin urllib3 to ^2.0.0 (#191)
 
 ## 2.9.0 (2023-08-10)
 
-- Replace retry handling with DatabricksRetryPolicy. This is disabled by default. To enable, set `enable_v3_retries=True` when creating `databricks.sql.client` (#182)
+- Replace retry handling with DatabricksRetryPolicy. This is disabled by default. To enable, set `_enable_v3_retries=True` when creating `databricks.sql.client` (#182)
 - Other: Fix typo in README quick start example (#186)
 - Other: Add autospec to Client mocks and tidy up `make_request` (#188)

CONTRIBUTING.md

Lines changed: 7 additions & 0 deletions
@@ -107,6 +107,8 @@ End-to-end tests require a Databricks account. Before you can run them, you must
 export host=""
 export http_path=""
 export access_token=""
+export catalog=""
+export schema=""
 ```
 
 Or you can write these into a file called `test.env` in the root of the repository:

@@ -141,6 +143,11 @@ The `PySQLLargeQueriesSuite` namespace contains long-running query tests and is
 The `PySQLStagingIngestionTestSuite` namespace requires a cluster running DBR version > 12.x which supports staging ingestion commands.
 
 The suites marked `[not documented]` require additional configuration which will be documented at a later time.
+
+#### SQLAlchemy dialect tests
+
+See README.tests.md for details.
+
 ### Code formatting
 
 This project uses [Black](https://pypi.org/project/black/).

README.md

Lines changed: 3 additions & 4 deletions
@@ -3,15 +3,15 @@
 [![PyPI](https://img.shields.io/pypi/v/databricks-sql-connector?style=flat-square)](https://pypi.org/project/databricks-sql-connector/)
 [![Downloads](https://pepy.tech/badge/databricks-sql-connector)](https://pepy.tech/project/databricks-sql-connector)
 
-The Databricks SQL Connector for Python allows you to develop Python applications that connect to Databricks clusters and SQL warehouses. It is a Thrift-based client with no dependencies on ODBC or JDBC. It conforms to the [Python DB API 2.0 specification](https://www.python.org/dev/peps/pep-0249/) and exposes a [SQLAlchemy](https://www.sqlalchemy.org/) dialect for use with tools like `pandas` and `alembic` which use SQLAlchemy to execute DDL.
+The Databricks SQL Connector for Python allows you to develop Python applications that connect to Databricks clusters and SQL warehouses. It is a Thrift-based client with no dependencies on ODBC or JDBC. It conforms to the [Python DB API 2.0 specification](https://www.python.org/dev/peps/pep-0249/) and exposes a [SQLAlchemy](https://www.sqlalchemy.org/) dialect for use with tools like `pandas` and `alembic` which use SQLAlchemy to execute DDL. Use `pip install databricks-sql-connector[sqlalchemy]` to install with SQLAlchemy's dependencies. `pip install databricks-sql-connector[alembic]` will install alembic's dependencies.
 
 This connector uses Arrow as the data-exchange format, and supports APIs to directly fetch Arrow tables. Arrow tables are wrapped in the `ArrowQueue` class to provide a natural API to get several rows at a time.
 
 You are welcome to file an issue here for general use cases. You can also contact Databricks Support [here](help.databricks.com).
 
 ## Requirements
 
-Python 3.7 or above is required.
+Python 3.8 or above is required.
 
 ## Documentation
 

@@ -47,8 +47,7 @@ connection = sql.connect(
 access_token=access_token)
 
 cursor = connection.cursor()
-
-cursor.execute('SELECT * FROM RANGE(10)')
+cursor.execute('SELECT :param `p`, * FROM RANGE(10)', {"param": "foo"})
 result = cursor.fetchall()
 for row in result:
   print(row)
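
For context, a minimal runnable sketch of the quick start as revised above; the environment variable names are illustrative placeholders, not part of this diff:

```python
import os
from databricks import sql

# Hypothetical env var names -- substitute your own workspace details.
with sql.connect(
    server_hostname=os.getenv("DATABRICKS_SERVER_HOSTNAME"),
    http_path=os.getenv("DATABRICKS_HTTP_PATH"),
    access_token=os.getenv("DATABRICKS_TOKEN"),
) as connection:
    cursor = connection.cursor()
    # Native named parameter, matching the revised quick start line
    cursor.execute("SELECT :param `p`, * FROM RANGE(10)", {"param": "foo"})
    for row in cursor.fetchall():
        print(row)
    cursor.close()
```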

docs/parameters.md

Lines changed: 255 additions & 0 deletions
@@ -0,0 +1,255 @@
# Using Native Parameters

This connector supports native parameterized query execution. When you execute a query that includes variable markers, you can pass a collection of parameters which are sent separately to Databricks Runtime for safe execution. This prevents SQL injection and can improve query performance.

This behaviour is distinct from legacy "inline" parameterized execution in versions below 3.0.0. The legacy behavior is preserved behind a flag called `use_inline_params`, which will be removed in a future release. See [Using Inline Parameters](#using-inline-parameters) for more information.

See **[below](#migrating-to-native-parameters)** for details about updating your client code to use native parameters.

See `examples/parameters.py` in this repository for a runnable demo.

## Requirements

- `databricks-sql-connector>=3.0.0`
- A SQL warehouse or all-purpose cluster running Databricks Runtime >=14.2

## Limitations

- A query executed with native parameters can contain at most 255 parameter markers
- The maximum size of all parameterized values cannot exceed 1MB

## SQL Syntax

Variables in your SQL query can use one of three PEP-249 [paramstyles](https://peps.python.org/pep-0249/#paramstyle). A parameterized query can use exactly one paramstyle.

|paramstyle|example|comment|
|-|-|-|
|`named`|`:param`|Parameters must be named|
|`qmark`|`?`|Parameter names are ignored|
|`pyformat`|`%(param)s`|Legacy syntax. Will be deprecated. Parameters must be named.|

#### Example

```sql
-- named paramstyle
SELECT * FROM table WHERE field = :value

-- qmark paramstyle
SELECT * FROM table WHERE field = ?

-- pyformat paramstyle (legacy)
SELECT * FROM table WHERE field = %(value)s
```

## Python Syntax

This connector follows the [PEP-249 interface](https://peps.python.org/pep-0249/#id20). The expected structure of the parameter collection follows the paramstyle of the variables in your query.

### `named` paramstyle Usage Example

When your SQL query uses `named` paramstyle variable markers, you need to specify a name for each value that corresponds to a variable marker in your query.

Generally, you do this by passing `parameters` as a dictionary whose keys match the variables in your query. The length of the dictionary must exactly match the count of variable markers or an exception will be raised.

```python
from databricks import sql

with sql.connect(...) as conn:
    with conn.cursor() as cursor:
        query = "SELECT field FROM table WHERE field = :value1 AND another_field = :value2"
        parameters = {"value1": "foo", "value2": 20}
        result = cursor.execute(query, parameters=parameters).fetchone()
```

This paramstyle is a drop-in replacement for the `pyformat` paramstyle which was used in connector versions below 3.0.0. It should be used going forward.

### `qmark` paramstyle Usage Example

When your SQL query uses `qmark` paramstyle variable markers, you only need to specify a value for each variable marker in your query.

You do this by passing `parameters` as a list. The order of values in the list corresponds to the order of `qmark` variables in your query. The length of the list must exactly match the count of variable markers in your query or an exception will be raised.

```python
from databricks import sql

with sql.connect(...) as conn:
    with conn.cursor() as cursor:
        query = "SELECT field FROM table WHERE field = ? AND another_field = ?"
        parameters = ["foo", 20]
        result = cursor.execute(query, parameters=parameters).fetchone()
```

The result of the above two examples is identical.

### Legacy `pyformat` paramstyle Usage Example

Databricks Runtime expects variable markers to use either `named` or `qmark` paramstyles. Historically, this connector used `pyformat`, which Databricks Runtime does not support. So to assist customers transitioning their codebases from `pyformat` to `named`, we can dynamically rewrite the variable markers before sending the query to Databricks. This happens only when `use_inline_params=False`.

This dynamic rewrite will be deprecated in a future release. New queries should be written using the `named` paramstyle instead, and users should update their client code to replace `pyformat` markers with `named` markers.

For example:

```sql
-- a query written for databricks-sql-connector==2.9.3 and below

SELECT field1, field2, %(param1)s FROM table WHERE field4 = %(param2)s

-- rewritten for databricks-sql-connector==3.0.0 and above

SELECT field1, field2, :param1 FROM table WHERE field4 = :param2
```

**Note:** While named `pyformat` markers are transparently replaced when `use_inline_params=False`, un-named inline `%s`-style markers are ignored. If your client code makes extensive use of `%s` markers, these queries will need to be updated to use `?` markers before you can execute them when `use_inline_params=False`. See [When to use inline parameters](#when-to-use-inline-parameters) for more information.
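
To make that update concrete, here is a minimal sketch (connection details elided, table and column names illustrative, following the placeholder style used above) of moving an un-named `%s` marker to a `?` marker so the query can run with native parameters:

```python
from databricks import sql

# Inline-only form used with connector versions below 3.0.0:
#   "SELECT field FROM table WHERE field = %s"
# Rewritten with a qmark marker for native execution:
query = "SELECT field FROM table WHERE field = ?"

with sql.connect(...) as conn:  # use_inline_params defaults to False
    with conn.cursor() as cursor:
        result = cursor.execute(query, parameters=["foo"]).fetchone()
```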

### Type inference

Under the covers, parameter values are annotated with a valid Databricks SQL type. As shown in the examples above, this connector accepts primitive Python types like `int`, `str`, and `Decimal`. When this happens, the connector infers the corresponding Databricks SQL type (e.g. `INT`, `STRING`, `DECIMAL`) automatically. This means that the parameters passed to `cursor.execute()` are always wrapped in a `TDbsqlParameter` subtype prior to execution.

Automatic inference is sufficient for most usages. But you can bypass the inference by explicitly setting the Databricks SQL type in your client code. All supported Databricks SQL types have `TDbsqlParameter` implementations which you can import from `databricks.sql.parameters`.

`TDbsqlParameter` objects must always be passed within a list. Either paramstyle (`:named` or `?`) may be used. However, if your query uses the `named` paramstyle, all `TDbsqlParameter` objects must be provided a `name` when they are constructed.

```python
from databricks import sql
from databricks.sql.parameters import StringParameter, IntegerParameter

# with `named` markers
with sql.connect(...) as conn:
    with conn.cursor() as cursor:
        query = "SELECT field FROM table WHERE field = :value1 AND another_field = :value2"
        parameters = [
            StringParameter(name="value1", value="foo"),
            IntegerParameter(name="value2", value=20)
        ]
        result = cursor.execute(query, parameters=parameters).fetchone()

# with `?` markers
with sql.connect(...) as conn:
    with conn.cursor() as cursor:
        query = "SELECT field FROM table WHERE field = ? AND another_field = ?"
        parameters = [
            StringParameter(value="foo"),
            IntegerParameter(value=20)
        ]
        result = cursor.execute(query, parameters=parameters).fetchone()
```

In general, we recommend using `?` markers when passing `TDbsqlParameter` objects directly.

**Note**: When using `?` markers, you can bypass inference for _some_ parameters by passing a list containing both primitive Python types and `TDbsqlParameter` objects. `TDbsqlParameter` objects can never be passed in a dictionary.
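
As an illustration (connection details elided, following the placeholder style above), a list can mix an inferred primitive with an explicitly typed parameter:

```python
from databricks import sql
from databricks.sql.parameters import IntegerParameter

with sql.connect(...) as conn:
    with conn.cursor() as cursor:
        query = "SELECT field FROM table WHERE field = ? AND another_field = ?"
        # "foo" is inferred as STRING; the second value is explicitly typed as an integer
        parameters = ["foo", IntegerParameter(value=20)]
        result = cursor.execute(query, parameters=parameters).fetchone()
```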

# Using Inline Parameters

Since its initial release, this connector's `cursor.execute()` method has supported passing a sequence or mapping of parameter values. Prior to Databricks Runtime introducing native parameter support, however, "parameterized" queries could not be executed in a guaranteed safe manner. Instead, the connector made a best effort to escape parameter values and render those strings inline with the query.

This approach has several drawbacks:

- It's not guaranteed to be safe from SQL injection
- The server could not boost performance by caching prepared statements
- The parameter marker syntax conflicted with SQL syntax in some cases

Nevertheless, this behaviour is preserved in version 3.0.0 and above for legacy purposes. It will be removed in a subsequent major release. To enable this legacy code path, you must now construct your connection with `use_inline_params=True`.

## Requirements

Rendering parameters inline is supported on all versions of DBR since these queries are indistinguishable from ad-hoc query text.

## SQL Syntax

Variables in your SQL query can look like `%(param)s` or like `%s`.

#### Example

```sql
-- pyformat paramstyle is used for named parameters
SELECT * FROM table WHERE field = %(value)s

-- %s is used for positional parameters
SELECT * FROM table WHERE field = %s
```

## Python Syntax

This connector follows the [PEP-249 interface](https://peps.python.org/pep-0249/#id20). The expected structure of the parameter collection follows the paramstyle of the variables in your query.

### `pyformat` paramstyle Usage Example

Parameters must be passed as a dictionary.

```python
from databricks import sql

with sql.connect(..., use_inline_params=True) as conn:
    with conn.cursor() as cursor:
        query = "SELECT field FROM table WHERE field = %(value1)s AND another_field = %(value2)s"
        parameters = {"value1": "foo", "value2": 20}
        result = cursor.execute(query, parameters=parameters).fetchone()
```

The above query would be rendered into the following SQL:

```sql
SELECT field FROM table WHERE field = 'foo' AND another_field = 20
```

### `%s` paramstyle Usage Example

Parameters must be passed as a list.

```python
from databricks import sql

with sql.connect(..., use_inline_params=True) as conn:
    with conn.cursor() as cursor:
        query = "SELECT field FROM table WHERE field = %s AND another_field = %s"
        parameters = ["foo", 20]
        result = cursor.execute(query, parameters=parameters).fetchone()
```

The result of the above two examples is identical.

**Note**: `%s` is not compliant with PEP-249 and only works due to the specific implementation of our inline renderer.

**Note:** This `%s` syntax overlaps with valid SQL syntax around the usage of `LIKE` DML. For example, if your query includes a clause like `WHERE field LIKE '%sequence'`, the parameter inlining function will raise an exception because this string appears to include an inline marker but none is provided. This means that with connector versions below 3.0.0 it has been impossible to execute a query that included both parameters and LIKE wildcards. When `use_inline_params=False`, we will pass `%s` occurrences along to the database, allowing it to be used as expected in `LIKE` statements.
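
For example, a minimal sketch (connection details elided, table and column names illustrative) combining a `LIKE` wildcard with a native parameter, which the inline renderer could not express:

```python
from databricks import sql

with sql.connect(...) as conn:  # native parameters: use_inline_params defaults to False
    with conn.cursor() as cursor:
        # The literal '%' wildcard passes through untouched because the parameter
        # is bound natively rather than rendered inline.
        query = "SELECT field FROM table WHERE field LIKE '%suffix' AND another_field = :value"
        result = cursor.execute(query, parameters={"value": "foo"}).fetchone()
```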

### Passing sequences as parameter values

Parameter values can also be passed as a sequence. This is typically used when writing `WHERE ... IN` clauses:

```python
from databricks import sql

with sql.connect(..., use_inline_params=True) as conn:
    with conn.cursor() as cursor:
        query = "SELECT field FROM table WHERE field IN %(value_list)s"
        parameters = {"value_list": [1,2,3,4,5]}
        result = cursor.execute(query, parameters=parameters).fetchone()
```

Output:

```sql
SELECT field FROM table WHERE field IN (1,2,3,4,5)
```

**Note**: this behavior is not specified by PEP-249 and only works due to the specific implementation of our inline renderer.

### Migrating to native parameters

Native parameters are meant to be a drop-in replacement for inline parameters. In most use-cases, upgrading to `databricks-sql-connector>=3.0.0` will grant an immediate improvement to safety. Plus, native parameters allow you to use SQL LIKE wildcards (`%`) in your queries, which is impossible with inline parameters. Future improvements to parameterization (such as support for binding complex types like `STRUCT`, `MAP`, and `ARRAY`) will only be available when `use_inline_params=False`.

To completely migrate, you need to [revise your SQL queries](#legacy-pyformat-paramstyle-usage-example) to use the new paramstyles.

### When to use inline parameters

You should only set `use_inline_params=True` in the following cases:

1. Your client code passes more than 255 parameters in a single query execution
2. Your client code passes parameter values greater than 1MB in a single query execution
3. Your client code makes extensive use of [`%s` positional parameter markers](#s-paramstyle-usage-example)
4. Your client code uses [sequences as parameter values](#passing-sequences-as-parameter-values)

We expect limitations (1) and (2) to be addressed in a future Databricks Runtime release.
