Fix handling of prepared collections #38

hampuskraft · 2023-11-17T21:01:18Z

I experienced some issues regarding type marshaling of maps, such as map<int, bigint>:

scyllapy.exceptions.ScyllaPyDBError: Database returned an error: The query is syntactically correct but invalid, Error message: Exception while binding column permissions: marshaling error: Validation failed for type org.apache.cassandra.db.marshal.LongType: got 4 bytes

After digging through the code, I identified the issue: if column_type is a Map, such as Map(Int, BigInt) in this case, the PyDict case of py_to_value would pass ColumnType::Map along as the column_type to the recursive call, and since the default match in the PyInt case was int32, it would incorrectly pass an int32 to a bigint field.
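To illustrate the mismatch described above, here is a minimal Python model of it. This is not the actual Rust code from py_to_value: the classes Int, BigInt, and Map and the helpers encode_int, encode_map_buggy, and encode_map_fixed are hypothetical stand-ins, and the map is returned as a plain list of encoded pairs rather than in the real wire format.

```python
import struct
from dataclasses import dataclass


# Hypothetical stand-ins for the driver's column types.
@dataclass(frozen=True)
class Int:
    pass


@dataclass(frozen=True)
class BigInt:
    pass


@dataclass(frozen=True)
class Map:
    key: object
    value: object


def encode_int(value, column_type):
    """Encode a Python int according to the column type it is bound to."""
    if isinstance(column_type, BigInt):
        return struct.pack(">q", value)  # 8 bytes
    # Models the old implicit default: anything else is treated as int32.
    return struct.pack(">i", value)  # 4 bytes


def encode_map_buggy(mapping, column_type):
    # Bug: the container type (Map) is handed to the children, so every
    # int falls through to the 4-byte default.
    return [
        (encode_int(k, column_type), encode_int(v, column_type))
        for k, v in mapping.items()
    ]


def encode_map_fixed(mapping, column_type):
    # Fix: children are encoded with the map's key and value types.
    return [
        (encode_int(k, column_type.key), encode_int(v, column_type.value))
        for k, v in mapping.items()
    ]


column = Map(Int(), BigInt())
buggy = encode_map_buggy({1: 2}, column)
fixed = encode_map_fixed({1: 2}, column)
```

In this model the buggy variant produces a 4-byte value for the bigint position, which is the same size the server complained about for LongType, while the fixed variant produces 8 bytes.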

To make issues like these simpler to debug in the future, I also refactored the PyInt and PyFloat cases not to choose an implicit default but instead return a BindingError if an unsupported type were to be bound.
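The idea of failing loudly instead of choosing an implicit default can be sketched as follows. Again, this is a Python model rather than the Rust change itself: the BindingError class here, the bind_int helper, and the string type names are assumptions made for the example.

```python
import struct


class BindingError(Exception):
    """Hypothetical stand-in for the driver's binding error."""


# Explicit table of integer column types; there is no fallback entry.
_INT_FORMATS = {
    "tinyint": ">b",
    "smallint": ">h",
    "int": ">i",
    "bigint": ">q",
}


def bind_int(value, type_name):
    """Encode a Python int for the given column type, or raise BindingError."""
    fmt = _INT_FORMATS.get(type_name)
    if fmt is None:
        # No silent int32 default: an unsupported target type is an error.
        raise BindingError(
            f"cannot bind Python int to column type {type_name!r}"
        )
    return struct.pack(fmt, value)
```

With this shape, a call such as bind_int(1, "map") raises immediately on the client, instead of sending 4 bytes and surfacing later as a marshaling error from the database.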

hampuskraft · 2023-11-20T12:32:09Z

I also noticed that sets weren't working as expected, since that case, too, passed the wrong column type when serializing its children. The same goes for tuples. However, tuples may also contain mixed data types. The two test cases I added for tuples are failing. Let me know if you have any ideas why that might be! Thanks.

scylla = <builtins.Scylla object at 0x1020c1730>, type_name = 'TUPLE<INT, INT>', test_val = (1, 2), cast_func = <class 'tuple'>

E       scyllapy.exceptions.ScyllaPyDBError: Database returned an error: The query is syntactically correct but invalid, Error message: Exception while binding column coll: marshaling error: Validation failed for type org.apache.cassandra.db.marshal.Int32Type: got 2 bytes

(And similarly for the TUPLE<INT, TEXT, FLOAT> test case, although there it got 3 bytes.)
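Because a tuple can mix types, each element has to be paired with the type at its own position rather than with a single child type. A rough Python sketch of that pairing is below; encode_tuple, the _ENCODERS table, and the string type names are hypothetical, the result is a plain list of encoded elements rather than the real wire format, and the sketch does not attempt to explain the 2-byte and 3-byte sizes in the failing tests.

```python
import struct

# Hypothetical per-type encoders for the element types used in the tests.
_ENCODERS = {
    "int": lambda v: struct.pack(">i", v),
    "float": lambda v: struct.pack(">f", v),
    "text": lambda v: v.encode("utf-8"),
}


def encode_tuple(values, element_types):
    """Encode each tuple element with the type declared for its position."""
    if len(values) != len(element_types):
        raise ValueError(
            f"expected {len(element_types)} elements, got {len(values)}"
        )
    return [
        _ENCODERS[type_name](value)
        for value, type_name in zip(values, element_types)
    ]


same_types = encode_tuple((1, 2), ("int", "int"))
mixed_types = encode_tuple((1, "a", 2.5), ("int", "text", "float"))
```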

pull-request-quantifier-deprecated · 2023-11-21T02:16:25Z

This PR has 91 quantified lines of changes. In general, a change size of up to 200 lines is ideal for the best PR experience!

Quantification details

Label      : Small
Size       : +81 -10
Percentile : 36.4%

Total files changed: 3

Change summary by file extension:
.toml : +1 -1
.py : +43 -0
.rs : +37 -9

Change counts above are quantified counts, based on the PullRequestQuantifier customizations.

Why proper sizing of changes matters

Optimal pull request sizes drive a more predictable PR flow as they strike a
balance between PR complexity and PR review overhead. PRs within the
optimal size (typically small or medium-sized PRs) mean:

Fast and predictable releases to production:
- Optimal size changes are more likely to be reviewed faster with fewer
  iterations.
- Similarity in low PR complexity drives similar review times.
Review quality is likely higher as complexity is lower:
- Bugs are more likely to be detected.
- Code inconsistencies are more likely to be detected.
Knowledge sharing is improved within the participants:
- Small portions can be assimilated better.
Better engineering practices are exercised:
- Solving big problems by dividing them into well-contained, smaller problems.
- Exercising separation of concerns within the code changes.

What can I do to optimize my changes

Use the PullRequestQuantifier to quantify your PR accurately
- Create a context profile for your repo using the context generator
- Exclude files that are not necessary to be reviewed or do not increase the review complexity. Example: Autogenerated code, docs, project IDE setting files, binaries, etc. Check out the Excluded section from your prquantifier.yaml context profile.
- Understand your typical change complexity, drive towards the desired complexity by adjusting the label mapping in your prquantifier.yaml context profile.
- Only use the labels that matter to you, see context specification to customize your prquantifier.yaml context profile.
Change your engineering behaviors
- For PRs that fall outside of the desired spectrum, review the details and check if:
  - Your PR could be split into smaller, self-contained PRs instead
  - Your PR only solves one particular issue. (For example, don't refactor and code new features in the same PR).

How to interpret the change counts in git diff output

One line was added: +1 -0
One line was deleted: +0 -1
One line was modified: +1 -1 (git diff doesn't know about modified, it will
interpret that line like one addition plus one deletion)
Change percentiles: Change characteristics (addition, deletion, modification)
of this PR in relation to all other PRs within the repository.


s3rius · 2023-11-25T10:11:38Z

Hello! Thanks for your contribution. Have you tried using BigInt instead of Python's integer? There are extra types which you can use: https://github.com/Intreecom/scyllapy#extra-types

The column_type parameter is only set for prepared queries; it won't work for ordinary queries.

Agree about floats.

hampuskraft · 2023-11-25T10:46:33Z

Yeah.

I'm only using prepared queries in my app to avoid having to transform my query parameters to use ScyllaPy's extra types, and I noticed that a few things did not work since it passed the container type rather than the type of the contained value when converting them—I believe this would be on the right path? Automatic type conversion would indeed only be possible with prepared queries, so I'm only testing those.

The logic for handling tuples would need to be fixed to pass the prepared collections test since ScyllaDB currently complains about getting 2-3 bytes instead of the expected 4 bytes for an Int32Type when serializing tuple<int, int>, for instance. Nevertheless, I'm currently using my fork in my application without any issues since I'm not yet using tuples, but I do use sets & maps and would prefer not to cast the types manually.

I unfortunately don't have much time to work on this at the moment—I would appreciate any ideas you might have to finalize this and perhaps make it a bit cleaner.

Thanks!

s3rius · 2023-11-25T17:02:57Z

I will see what I can do to fix that.

hampuskraft added 2 commits November 17, 2023 21:41

Fix prepared statement tests and map type handling

5d36647

Fix type binding errors in py_to_value function

5a5a8a0

pull-request-quantifier-deprecated bot added the Small label Nov 17, 2023

hampuskraft added 2 commits November 20, 2023 13:28

Attempt at getting sets and tuples working

c8dcf27

Make Ruff linter work again

2d55152

hampuskraft marked this pull request as draft November 20, 2023 12:28

hampuskraft changed the title ~~Fix prepared statement tests and map type handling~~ Fix handling of prepared collections Nov 20, 2023
