Commit 29cf81a

Manul from Pathway, embe-pw, janchorowski, XGendre, and dxtrous committed
Release 0.3.4

Co-authored-by: Michał Bartoszkiewicz <embe@pathway.com>
Co-authored-by: Jan Chorowski <janek@pathway.com>
Co-authored-by: Xavier Gendre <xavier@pathway.com>
Co-authored-by: Adrian Kosowski <adrian@pathway.com>
Co-authored-by: Jakub Kowalski <kuba@pathway.com>
Co-authored-by: Sergey Kulik <sergey@pathway.com>
Co-authored-by: Mateusz Lewandowski <mateusz@pathway.com>
Co-authored-by: Mohamed Malhou <mohamed@pathway.com>
Co-authored-by: Krzysztof Nowicki <krzysiek@pathway.com>
Co-authored-by: Richard Pelgrim <richard.pelgrim@pathway.com>
Co-authored-by: Kamil Piechowiak <kamil@pathway.com>
Co-authored-by: Paweł Podhajski <pawel.podhajski@pathway.com>
Co-authored-by: Olivier Ruas <olivier@pathway.com>
Co-authored-by: Przemysław Uznański <przemek@pathway.com>
Co-authored-by: Sebastian Włudzik <sebastian.wludzik@pathway.com>
GitOrigin-RevId: 8bf711247a07692d3403c797159218d1f5c0aa2d
1 parent 1fbd828 commit 29cf81a

40 files changed, +556 -164 lines changed

.github/workflows/release.yml

Lines changed: 21 additions & 18 deletions
@@ -122,7 +122,7 @@ jobs:
 with:
 name: pathway-arm64
 path: ./target/wheels/
-
+
 - name: Upload artifact
 if: ${{ matrix.os == needs.start-runner.outputs.label }}
 uses: actions/upload-artifact@v3
@@ -153,13 +153,14 @@ jobs:
 - name: Install and verify Linux package
 run: |
 set -ex
-ENV_NAME="env_"${{ matrix.python-version }}""
-rm -rf $ENV_NAME
-python -m venv ${ENV_NAME}
-source ${ENV_NAME}/bin/activate
-pip install --prefer-binary ./wheels/pathway-*.whl
-pip"${{ matrix.python-version }}" install py
-python -m py --confcutdir $ENV_NAME --pyargs pathway
+ENV_NAME="testenv_${{ matrix.python-version }}"
+rm -rf "${ENV_NAME}"
+python -m venv "${ENV_NAME}"
+source "${ENV_NAME}/bin/activate"
+WHEEL=(public/pathway/target/wheels/pathway-*.whl)
+pip install --prefer-binary "${WHEEL}[tests]"
+# --confcutdir anything below to avoid picking REPO_TOP_DIR/conftest.py
+python -m pytest --confcutdir "${ENV_NAME}" --doctest-modules --pyargs pathway
 
 Verify_ARM_ARCH:
 needs:
@@ -196,18 +197,20 @@ jobs:
 run: |
 set -ex
 # Pathway http monitoring set port
-ENV_NAME="env_${{ matrix.python-version }}"
-rm -rf $ENV_NAME
-python"${{ matrix.python-version }}" -m venv ${ENV_NAME}
-source ${ENV_NAME}/bin/activate
-pip"${{ matrix.python-version }}" install --prefer-binary ./wheels/pathway-*.whl
-pip"${{ matrix.python-version }}" install py
-python"${{ matrix.python-version }}" -m py --confcutdir $ENV_NAME --pyargs pathway
+source .github/workflows/bash_scripts/PATHWAY_MONITORING_HTTP_PORT.sh
+ENV_NAME="testenv_${{ matrix.python-version }}"
+rm -rf "${ENV_NAME}"
+python"${{ matrix.python-version }}" -m venv "${ENV_NAME}"
+source "${ENV_NAME}/bin/activate"
+WHEEL=(public/pathway/target/wheels/pathway-*.whl)
+pip install --prefer-binary "${WHEEL}[tests]"
+# --confcutdir anything below to avoid picking REPO_TOP_DIR/conftest.py
+python -m pytest --confcutdir "${ENV_NAME}" --doctest-modules --pyargs pathway
 env:
 MACOSX_DEPLOYMENT_TARGET: "10.15"
 DEVELOPER_DIR: /Library/Developer/CommandLineTools
 SDKROOT: /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk
-
+
 - name: post cleanup
 run: rm -rf ./wheels
 
@@ -257,7 +260,7 @@ jobs:
 path: .
 
 - name: Create Release
-uses: ncipollo/release-action@v1.12.0
+uses: ncipollo/release-action@v1.12.0
 with:
 draft: true
 artifacts: "./wheels/*.whl"
@@ -270,7 +273,7 @@ jobs:
 with:
 password: ${{ secrets.PYPI_TOKEN }}
 packages-dir: './wheels/'
-
+
 - name: Publish package to s3
 uses: prewk/s3-cp-action@v2
 with:
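Both install steps now resolve the built wheel through a bash array (`WHEEL=(public/pathway/target/wheels/pathway-*.whl)`). Since `"${WHEEL}"` expands to the array's first element, `"${WHEEL}[tests]"` becomes a requirement for the first matching wheel plus its `tests` extra. The Python sketch below approximates that resolution under stated assumptions: `wheel_requirement` is a hypothetical helper, `sorted()` stands in for bash's (locale-dependent) glob ordering, and, unlike bash without `nullglob`, it raises instead of passing the literal pattern through when nothing matches.

```python
import glob
import os


def wheel_requirement(wheel_dir: str, extra: str = "tests") -> str:
    """Return "<first matching wheel>[<extra>]" for pathway-*.whl in wheel_dir.

    Hypothetical helper approximating the workflow's "${WHEEL}[tests]":
    bash sorts glob matches and "${WHEEL}" expands to the first one.
    """
    pattern = os.path.join(wheel_dir, "pathway-*.whl")
    matches = sorted(glob.glob(pattern))
    if not matches:
        # bash without nullglob would keep the literal pattern; fail loudly instead.
        raise FileNotFoundError(f"no wheel matches {pattern}")
    return f"{matches[0]}[{extra}]"
```

With a single `pathway-*.whl` in the directory, the helper returns that path followed by `[tests]`, which is the shape of the argument the workflow hands to `pip install`.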

CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -6,6 +6,11 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 ## [Unreleased]
 
+## [0.3.4] - 2023-09-18
+
+### Fixed
+- Incompatible `beartype` version is now excluded from dependencies.
+
 ## [0.3.3] - 2023-09-14
 
 ### Added

Cargo.lock

Lines changed: 1 addition & 1 deletion
Generated file; its diff is not rendered by default.

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [package]
 name = "pathway"
-version = "0.3.3"
+version = "0.3.4"
 edition = "2021"
 publish = false
 rust-version = "1.71.0"

integration_tests/s3/test_s3_interops.py

Lines changed: 72 additions & 0 deletions
@@ -4,6 +4,7 @@
 import os
 import pathlib
 import time
+import uuid
 
 import boto3
 import pandas as pd
@@ -300,3 +301,74 @@ class InputSchema(pw.Schema):
 output_contents = read_jsonlines_fields(output_path, ["key", "value"])
 output_contents.sort(key=lambda entry: entry["key"])
 assert output_contents == third_input_part
+
+
+def test_s3_bytes_read(tmp_path: pathlib.Path):
+input_path = (
+f"integration_tests/test_s3_bytes_read/{time.time()}-{uuid.uuid4()}/input.txt"
+)
+input_full_contents = "abc\n\ndef\nghi\njkl"
+output_path = tmp_path / "output.json"
+
+put_aws_object(input_path, input_full_contents)
+table = pw.io.s3.read(
+input_path,
+aws_s3_settings=pw.io.s3_csv.AwsS3Settings(
+bucket_name="aws-integrationtest",
+access_key="AKIAX67C7K343BP4QUWN",
+secret_access_key=os.environ["AWS_S3_SECRET_ACCESS_KEY"],
+region="eu-central-1",
+),
+format="binary",
+mode="static",
+autocommit_duration_ms=1000,
+)
+pw.io.jsonlines.write(table, output_path)
+pw.run()
+
+with open(output_path, "r") as f:
+result = json.load(f)
+assert result["data"] == [ord(c) for c in input_full_contents]
+
+
+def test_s3_empty_bytes_read(tmp_path: pathlib.Path):
+base_path = (
+f"integration_tests/test_s3_empty_bytes_read/{time.time()}-{uuid.uuid4()}/"
+)
+
+put_aws_object(base_path + "input", "")
+put_aws_object(base_path + "input2", "")
+
+table = pw.io.s3.read(
+base_path,
+aws_s3_settings=pw.io.s3_csv.AwsS3Settings(
+bucket_name="aws-integrationtest",
+access_key="AKIAX67C7K343BP4QUWN",
+secret_access_key=os.environ["AWS_S3_SECRET_ACCESS_KEY"],
+region="eu-central-1",
+),
+format="binary",
+mode="static",
+autocommit_duration_ms=1000,
+)
+
+rows = []
+
+def on_change(key, row, time, is_addition):
+rows.append(row)
+
+def on_end(*args, **kwargs):
+pass
+
+pw.io.subscribe(table, on_change=on_change, on_end=on_end)
+pw.run()
+
+assert (
+rows
+== [
+{
+"data": b"",
+}
+]
+* 2
+)
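The two new integration tests pin down what a `format="binary"` S3 read produces: the whole object arrives as a single row whose `data` column holds raw bytes (so the newlines in `input_full_contents` do not split it), an empty object still yields one row with `b""`, and the JSON Lines output is asserted to contain those bytes as a list of integers. The plain-Python sketch below restates those expectations without S3 or Pathway; `objects_to_rows` and `row_to_jsonline` are hypothetical helpers, and the integer-list encoding is inferred from the test's assertion, not from the writer's source.

```python
import json
from typing import Dict, List


def objects_to_rows(objects: Dict[str, bytes]) -> List[dict]:
    """Hypothetical model of a FULL read: one row per object, whole content as bytes."""
    return [{"data": content} for _, content in sorted(objects.items())]


def row_to_jsonline(row: dict) -> str:
    """Serialize a row, writing bytes as a list of integers (as the test expects)."""

    def encode(value):
        # json.dumps calls this only for values it cannot serialize natively.
        if isinstance(value, bytes):
            return list(value)
        raise TypeError(f"cannot serialize {type(value).__name__}")

    return json.dumps(row, default=encode)
```

For ASCII input such as `"abc\n\ndef\nghi\njkl"`, `list(contents.encode())` equals `[ord(c) for c in contents]`, which is why the test can compare against `ord` values directly.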

pyproject.toml

Lines changed: 7 additions & 1 deletion
@@ -23,12 +23,18 @@ dependencies = [
 "pyarrow >= 10.0.0",
 "requests >= 2.31.0",
 "python-sat >= 0.1.8.dev",
-"beartype >= 0.14.0",
+"beartype >= 0.14.0, < 0.16.0",
 "rich >= 12.6.0",
 "diskcache >= 5.2.1",
 "exceptiongroup >= 1.1.3; python_version < '3.11'",
 ]
 
+[project.optional-dependencies]
+tests = [
+"pytest == 7.4.0",
+"pytest-xdist == 3.3.1",
+]
+
 [project.urls]
 "Homepage" = "https://pathway.com/"
 "Source code" = "https://github.com/pathwaycom/pathway/"

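The new specifier `beartype >= 0.14.0, < 0.16.0` keeps accepting 0.14.x and 0.15.x releases while excluding 0.16.0 and everything after it, which is how this commit keeps out the incompatible release the changelog refers to. A minimal sketch of that range check, assuming plain numeric `X.Y.Z` versions; pip evaluates such specifiers under PEP 440, which also covers pre-releases, local versions, and other forms this hypothetical helper ignores:

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain numeric version such as '0.15' into (0, 15, 0)."""
    parts = [int(part) for part in version.split(".")]
    # Pad to three components so '0.14' compares equal to '0.14.0'.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def satisfies_beartype_pin(version: str) -> bool:
    """Check the range '>= 0.14.0, < 0.16.0' declared in pyproject.toml."""
    return (0, 14, 0) <= parse_version(version) < (0, 16, 0)
```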
python/pathway/conftest.py

Lines changed: 17 additions & 0 deletions
@@ -11,3 +11,20 @@
 def parse_graph_teardown():
 yield
 parse_graph.G.clear()
+
+
+@pytest.fixture(autouse=True)
+def environment_variables(monkeypatch):
+monkeypatch.setenv("KAFKA_USERNAME", "pathway")
+monkeypatch.setenv("KAFKA_PASSWORD", "Pallas'sCat")
+monkeypatch.setenv("BEARER_TOKEN", "42")
+monkeypatch.setenv("MINIO_S3_ACCESS_KEY", "Otocolobus")
+monkeypatch.setenv("MINIO_S3_SECRET_ACCESS_KEY", "manul")
+monkeypatch.setenv("S3_ACCESS_KEY", "Otocolobus")
+monkeypatch.setenv("S3_SECRET_ACCESS_KEY", "manul")
+monkeypatch.setenv("DO_S3_ACCESS_KEY", "Otocolobus")
+monkeypatch.setenv("DO_S3_SECRET_ACCESS_KEY", "manul")
+monkeypatch.setenv("WASABI_S3_ACCESS_KEY", "Otocolobus")
+monkeypatch.setenv("WASABI_S3_SECRET_ACCESS_KEY", "manul")
+monkeypatch.setenv("OVH_S3_ACCESS_KEY", "Otocolobus")
+monkeypatch.setenv("OVH_S3_SECRET_ACCESS_KEY", "manul")
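Because the fixture is declared with `autouse=True`, pytest applies it to every test within the scope of this `conftest.py` without the tests requesting it, and `monkeypatch` restores the original environment after each test. Outside pytest, the same set-then-restore behavior can be sketched with the standard library's `unittest.mock.patch.dict`; `TEST_ENV` copies three of the fixture's placeholder values, and `read_kafka_credentials` and `run_with_test_env` are hypothetical helpers.

```python
import os
from unittest import mock

# Placeholder values copied from the fixture above.
TEST_ENV = {
    "KAFKA_USERNAME": "pathway",
    "KAFKA_PASSWORD": "Pallas'sCat",
    "BEARER_TOKEN": "42",
}


def read_kafka_credentials() -> tuple:
    """Hypothetical consumer that reads credentials from the environment."""
    return os.environ["KAFKA_USERNAME"], os.environ["KAFKA_PASSWORD"]


def run_with_test_env(func):
    """Call func with TEST_ENV applied; os.environ is restored on exit."""
    with mock.patch.dict(os.environ, TEST_ENV):
        return func()
```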

python/pathway/engine.pyi

Lines changed: 4 additions & 0 deletions
@@ -35,6 +35,10 @@ class ConnectorMode(Enum):
 SIMPLE_STREAMING: ConnectorMode
 STREAMING_WITH_DELETIONS: ConnectorMode
 
+class ReadMethod(Enum):
+BY_LINE: ReadMethod
+FULL: ReadMethod
+
 class Universe:
 @property
 def id_column(self) -> Column: ...

python/pathway/internals/api.py

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@
 int,
 float,
 str,
+bytes,
 bool,
 BasePointer,
 datetime.datetime,

python/pathway/io/_utils.py

Lines changed: 11 additions & 1 deletion
@@ -10,7 +10,7 @@
 from pathway.internals import api
 from pathway.internals import dtype as dt
 from pathway.internals._io_helpers import _form_value_fields
-from pathway.internals.api import ConnectorMode, PathwayType
+from pathway.internals.api import ConnectorMode, PathwayType, ReadMethod
 from pathway.internals.schema import ColumnDefinition, Schema, SchemaProperties
 
 STATIC_MODE_NAME = "static"
@@ -28,6 +28,7 @@
 "plaintext": "identity",
 "json": "jsonlines",
 "raw": "identity",
+"binary": "identity",
 }
 
 _PATHWAY_TYPE_MAPPING: Dict[PathwayType, Any] = {
@@ -49,6 +50,7 @@
 "json",
 "plaintext",
 "raw",
+"binary",
 ]
 )
 
@@ -91,6 +93,12 @@ def internal_connector_mode(mode: str | api.ConnectorMode) -> api.ConnectorMode:
 return internal_mode
 
 
+def internal_read_method(format: str) -> ReadMethod:
+if format == "binary":
+return ReadMethod.FULL
+return ReadMethod.BY_LINE
+
+
 class CsvParserSettings:
 """Class representing settings for the CSV parser."""
 
@@ -248,6 +256,7 @@ def construct_schema_and_data_format(
 format_type=data_format_type,
 key_field_names=None,
 value_fields=[api.ValueField("data", PathwayType.ANY)],
+parse_utf8=(format != "binary"),
 )
 schema, api_schema = read_schema(
 schema=schema,
@@ -300,6 +309,7 @@ def construct_s3_data_storage(
 path=path,
 aws_s3_settings=rust_engine_s3_settings,
 mode=internal_connector_mode(mode),
+read_method=internal_read_method(format),
 persistent_id=persistent_id,
 )
 
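In `_utils.py`, `"binary"` is added to the list of formats next to `"raw"` and mapped to the `identity` data format like `"plaintext"` and `"raw"`. What sets it apart is `internal_read_method`, which selects `ReadMethod.FULL` instead of `ReadMethod.BY_LINE`, and `parse_utf8=(format != "binary")`, which keeps the payload undecoded. The self-contained sketch below restates that logic; the local `ReadMethod` enum only mirrors the member names in `engine.pyi`, and `read_records` is a hypothetical stand-in for the Rust engine that assumes `BY_LINE` splits on line breaks and `FULL` emits each object as one record.

```python
from enum import Enum
from typing import List, Union


class ReadMethod(Enum):
    # Mirrors the member names declared in engine.pyi; values are arbitrary here.
    BY_LINE = "by_line"
    FULL = "full"


# Subset of the format-name mapping shown in the diff.
FORMAT_TO_DATA_FORMAT = {
    "plaintext": "identity",
    "json": "jsonlines",
    "raw": "identity",
    "binary": "identity",
}


def internal_read_method(format: str) -> ReadMethod:
    # Same rule as the helper added to _utils.py.
    if format == "binary":
        return ReadMethod.FULL
    return ReadMethod.BY_LINE


def read_records(content: bytes, format: str) -> List[Union[str, bytes]]:
    """Hypothetical engine stand-in: split (or not), then decode unless binary."""
    parse_utf8 = format != "binary"  # mirrors parse_utf8=(format != "binary")
    if internal_read_method(format) is ReadMethod.FULL:
        chunks = [content]  # the whole object, even when it is empty
    else:
        chunks = content.splitlines()
    return [chunk.decode("utf-8") for chunk in chunks] if parse_utf8 else chunks
```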

python/pathway/io/csv/__init__.py

Lines changed: 1 addition & 4 deletions
@@ -83,16 +83,14 @@ def read(
 use the `pw.io.csv.read` method:
 
 >>> import pathway as pw
-...
 >>> class InputSchema(pw.Schema):
 ... owner: str
 ... pet: str
-...
 >>> t = pw.io.csv.read("dataset.csv", schema=InputSchema, mode="static")
 
 Then, you can output the table in order to check the correctness of the read:
 
->>> pw.debug.compute_and_print(t, include_id=False)
+>>> pw.debug.compute_and_print(t, include_id=False) # doctest: +SKIP
 owner pet
 Alice dog
 Bob dog
@@ -119,7 +117,6 @@ def read(
 >>> class InputSchema(pw.Schema):
 ... ip: str
 ... login: str
-...
 >>> t = pw.io.csv.read("logs/", schema=InputSchema, mode="static")
 
 The only difference is that you specified the name of the directory instead of the
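The release workflow now runs `pytest --doctest-modules`, so every `>>>` example in these docstrings is executed as a test. The `# doctest: +SKIP` directive keeps an example in the documentation while telling doctest not to run it, presumably because `compute_and_print` would need the `dataset.csv` file from the example. The stdlib-only sketch below shows the effect with `doctest.DocTestFinder` and `doctest.DocTestRunner`; the `example` function and its deliberately wrong skipped output are made up for illustration.

```python
import doctest


def example():
    """Toy docstring: the first example is skipped, the second one runs.

    >>> 1 + 1  # doctest: +SKIP
    3
    >>> 2 * 21
    42
    """


def run_docstring_examples(func) -> tuple:
    """Run func's docstring examples; return (failed, attempted)."""
    runner = doctest.DocTestRunner(verbose=False)
    failed = attempted = 0
    for test in doctest.DocTestFinder().find(func):
        result = runner.run(test)
        failed += result.failed
        attempted += result.attempted
    return failed, attempted
```

Skipped examples are not counted as attempted, so `run_docstring_examples(example)` reports no failures and one attempted example even though the skipped example's expected output is wrong.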

python/pathway/io/debezium/__init__.py

Lines changed: 2 additions & 3 deletions
@@ -100,13 +100,12 @@ def read(
 
 Now, using the settings you can set up a connector. It is as simple as:
 
-
+>>> import pathway as pw
 >>> class InputSchema(pw.Schema):
 ... id: str = pw.column_definition(primary_key=True)
 ... age: int
 ... owner: str
 ... pet: str
-
 >>> t = pw.io.debezium.read(
 ... rdkafka_settings,
 ... topic_name="pets",
@@ -123,7 +122,7 @@ def read(
 data_storage = api.DataStorage(
 storage_type="kafka",
 rdkafka_settings=rdkafka_settings,
-topics=[topic_name],
+topic=topic_name,
 persistent_id=persistent_id,
 )
 schema, data_format_definition = read_schema(

python/pathway/io/elasticsearch/__init__.py

Lines changed: 4 additions & 3 deletions
@@ -74,18 +74,19 @@ def write(table: Table, host: str, auth: ElasticSearchAuth, index_name: str) ->
 
 Now suppose we want to send a Pathway table pets to this local instance of
 Elasticsearch.
+>>> import pathway as pw
+>>> pets = pw.debug.parse_to_table("age owner pet \\n 1 10 Alice dog \\n 2 9 Bob cat \\n 3 8 Alice cat")
 
 It can be done as follows:
 
->>> import pathway as pw
->>> t = pw.io.elasticsearch.write(
+>>> pw.io.elasticsearch.write(
 ... table=pets,
 ... host="http://localhost:9200",
 ... auth=pw.io.elasticsearch.ElasticSearchAuth.basic("admin", "admin"),
 ... index_name="animals",
 ... )
 
-All the updates of table t will be indexed to "animals" as well.
+All the updates of table ```pets`` will be indexed to "animals" as well.
 """
 
 data_storage = api.DataStorage(
