0.65.0 (2025-01-26)
0.64.0 (2025-01-12)
- additional boolean comparison functions (#764) (2d8b1b6)
- introduce Iceberg table type using metadata file (#758) (7434e2f)
- run pytest in pr workflow to check function test coverage (#765) (7bfc37c)
- bump flake8 version to 7.0.0 (#768) (57770b6)
- update the doc to clarify that function names are case-sensitive (#757) (203e6e4)
0.63.1 (2024-12-22)
0.63.0 (2024-12-15)
- The encoding of FetchRel has changed in a strictly backwards incompatible way. The change involves transitioning offset and count from a standalone int64 field to a oneof structure, where the original int64 field is marked as deprecated, and a new field of Expression type is introduced. Using a oneof may cause ambiguity between unset and set-to-zero states in older messages. However, the fields are defined such that their logical meaning remains indistinguishable, ensuring consistency across encodings.
- add expression support for count and offset in the fetch operator (#748) (bd4b431)
- add simple linking to the examples (#702) (4c00b1c)
- support missing variants for regexp string functions (#750) (3410a3e)
0.62.0 (2024-11-24)
- add readme for testcase file format (#746) (708a7b8)
- port function testcases from bft (#738) (d84ccd1)
0.61.0 (2024-11-17)
- add substrait test files to go embedded fs (#740) (e3a7773)
- handle parsing of list arguments in func testcases (#737) (1f9c710)
- update operator to update a table (#734) (adb1079)
0.60.0 (2024-11-10)
- add antlr grammar for test file format (#728) (752aa63)
- add CreateMode for CTAS in WriteRel (#715) (2e13d0b)
- update test file format to support aggregate functions (#736) (c18c0c1)
0.59.0 (2024-11-03)
- changes the message type for Expressions field in VirtualTable
0.58.0 (2024-10-13)
- define sideband optimization hints (#705) (e386a29)
- enhance VirtualTable to have expression as value (#711) (954bcbc)
- specify row_number start (#722) (#723) (a0388ff)
0.57.1 (2024-10-06)
0.57.0 (2024-10-02)
- This PR changes the definition of grouping sets in AggregateRel to consist of references into a list of grouping expressions instead of consisting of expressions directly.
With the previous definition, consumers had to deduplicate the expressions in the grouping sets in order to execute the query or even derive the output schema (which is problematic, as explained below). With this change, the responsibility of deduplicating expressions is now on the producer. Concretely, consumers are now expected to be simpler: The list of grouping expressions immediately provides the information needed to derive the output schema and the list of grouping sets explicitly and unambiguously provides the equality of grouping expressions. Producers now have to specify the grouping sets explicitly. If their internal representation of grouping sets consists of full grouping expressions (rather than references), then they must deduplicate these expressions according to their internal notion of expression equality in order to produce grouping sets consisting of references to these deduplicated expressions.
If the previous format is desired, it can be obtained from the new format by (1) deduplicating the grouping expressions (according to the previously applicable definition of expression equality), (2) re-establishing the duplicates using the emit clause, and (3) "dereferencing" the references in the grouping sets, i.e., by replacing each reference in the grouping sets with the expression it refers to.
The previous version was problematic because it required the consumers to deduplicate the expressions from the grouping sets. This, in turn, requires parsing and understanding 100% of these expressions even in cases where that understanding is otherwise optional, which is in opposition to the general philosophy of allowing for simple-minded consumers. The new version avoids that problem and, thus, allows consumers to be
- change grouping expressions in AggregateRel to references (#706) (65a7d38), closes #700
- clarify behaviour of SetRel operations (#708) (f796521)
- make substrait repo a go module (#712) (3dca9b5)
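The 0.57.0 change to grouping sets can be illustrated with a minimal Python sketch. It assumes expressions are modeled as plain hashable values (strings here) and that Python equality stands in for the producer's own notion of expression equality. The function names are hypothetical and not part of any Substrait library.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def to_references(
    expression_sets: Sequence[Sequence[Hashable]],
) -> Tuple[List[Hashable], List[List[int]]]:
    """Producer side: deduplicate expressions and build grouping sets of references.

    Returns the deduplicated list of grouping expressions and, for each
    grouping set, the indices into that list.
    """
    expressions: List[Hashable] = []
    index_of: Dict[Hashable, int] = {}
    reference_sets: List[List[int]] = []
    for expr_set in expression_sets:
        refs: List[int] = []
        for expr in expr_set:
            if expr not in index_of:
                # First time this expression is seen: append it to the shared list.
                index_of[expr] = len(expressions)
                expressions.append(expr)
            refs.append(index_of[expr])
        reference_sets.append(refs)
    return expressions, reference_sets


def dereference_grouping_sets(
    grouping_expressions: Sequence[Hashable],
    grouping_sets: Sequence[Sequence[int]],
) -> List[List[Hashable]]:
    """Step (3) of the conversion back: replace each reference with its expression."""
    return [[grouping_expressions[ref] for ref in refs] for refs in grouping_sets]
```

A consumer that only needs the output schema can read the deduplicated expression list directly, without comparing expressions for equality.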
0.56.0 (2024-09-15)
- add optional metadata containing field names to RelCommon (#696) (5a73281)
- define mark join (#682) (bc1b93f)
0.55.0 (2024-08-18)
0.54.0 (2024-08-11)
- The encoding of IntervalDay literals has changed in a strictly backwards incompatible way. However, the logical meaning across encoding is maintained using a oneof. Moving a field into a oneof makes unset/set to zero unclear with older messages but the fields are defined such that the logical meaning of the two is indistinct. If neither microseconds nor precision is set, the value can be considered a precision 6 value. If you aren't using IntervalDay type, you will not need to make any changes.
- TypeExpression and Parameterized type protobufs (used to serialize output derivation) are updated to match the now compound nature of IntervalDay. If you use protobuf to serialize output derivations that refer to the IntervalDay type, you will need to rework that logic.
- JoinRel's type enum now has LEFT_SINGLE instead of SINGLE. Similarly there is now LEFT_ANTI and LEFT_SEMI. Other values are available in all join type enums. This affects JSON and text formats only (binary plans -- the interoperable part of Substrait -- will still be compatible before and after this change).
- add arithmetic function "power" with decimal type (#660) (9af2d66)
- add CSV (text) file support (#646) (5d49e04)
- add precision to IntervalDay and new IntervalCompound type (#665) (e41eff2), closes #664
- normalize the join types (#662) (bed84ec)
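The fallback rule for IntervalDay literals in 0.54.0 (treat a literal with neither microseconds nor precision set as a precision 6 value) can be sketched as follows. This is an illustration only: the literal is modeled with plain optional integers rather than the generated protobuf classes, and the function name and the `subseconds` parameter are assumptions made for the example.

```python
from typing import Optional, Tuple


def resolve_interval_day_fraction(
    microseconds: Optional[int] = None,
    precision: Optional[int] = None,
    subseconds: int = 0,
) -> Tuple[int, int]:
    """Return (precision, fractional value) for an IntervalDay literal.

    `microseconds` models the older field and `precision` the newer one;
    at most one of them is expected to be set, as in a oneof.
    """
    if precision is not None:
        # Newer encoding: the precision is explicit.
        return precision, subseconds
    # Older encoding, or nothing set: interpret as a precision 6 value.
    # An unset field and an explicit zero yield the same result.
    return 6, microseconds or 0
```

Because unset and zero map to the same result, the ambiguity introduced by the oneof does not change the logical value.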
0.53.0 (2024-08-04)
- PrecisionTimestamp(Tz) literal's value is now int64 instead of uint64
- add aggregate count functions with decimal return type (#670) (2aa516b)
- add arithmetic function "sqrt" and "factorial" with decimal type (#674) (e4f5b68)
- add arithmetic function for bitwise(AND/OR/XOR) operation with decimal arguments (#675) (a70cf72)
- add logarithmic functions with decimal type args (#669) (d9fb1e3)
- add precision timestamp datetime fn variants (#666) (60c93d2)
- clarify the meaning of plans (#616) (c1553df), closes #612 #613
0.52.0 (2024-07-14)
- changes the message type for Literal PrecisionTimestamp and PrecisionTimestampTZ
The PrecisionTimestamp and PrecisionTimestampTZ literals were introduced
- include precision information in PrecisionTimestamp and PrecisionTimestampTZ literals (#659) (f9e5f9c), closes #594 /github.com/substrait-io/substrait/pull/594#discussion_r1471844566
0.51.0 (2024-07-07)
- add "initcap" function (#656) (95bc6ba), closes /github.com/Blizzara/substrait/blob/70d1eb71623ca0754157dd5d87348bae51d420c4/extensions/functions_string.yaml#L1023
- add null input handling options for any_value (#652) (1890e6a)
- allow naming/aliasing relations (#649) (4cf8108), closes #648 #571
- define SetRel output nullability derivation (#558) (#654) (612123a)
0.50.0 (2024-06-30)
- consumers must now check for multiple optimization messages within an AdvancedExtension
0.49.0 (2024-05-23)
- ci: pin conventional-changelog-conventionalcommits to 7.0.2 (#644) (9528bd2)
- specify a minimum length for the options of enum args (#642) (8e65af5), closes /github.com/substrait-io/substrait-rs/pull/185#discussion_r1603513149
0.48.0 (2024-04-25)
- min:ts has been moved to functions_datetime
- max:ts has been moved to functions_datetime
0.47.0 (2024-04-18)
- add i64 variant for exp, ln, log10, log2 and logb functions (#628) (fef2253)
- allow FetchRel to specify a return of ALL results (#622) (#627) (37f43b4)
- index_in has wrong return type (#632) (4cd2089)
- use any1 instead of T in function extensions (#629) (0bddf68)
0.46.0 (2024-04-14)
0.45.0 (2024-03-24)
0.44.0 (2024-03-03)
- Adding a NULL option to the on_domain_errors.
SQLite returns null for some inputs such as negative infinity
- add extra option for on domain errors in log functions (#536) (cbec079)
- add ignore nulls options to concat function (#605) (55db05b)
0.43.0 (2024-02-25)
0.42.1 (2024-01-28)
0.42.0 (2024-01-21)
- add custom equality behavior to the hash/merge join (#585) (daeac31)
- add interval multiplication (#580) (c1254ac)
- add min/max for datetime types (#584) (5c8fa04)
0.41.0 (2023-12-24)
- Renamed modulus to modulo.
Added options and documentation for the modulo operator as defined in math and comp sci.
0.40.0 (2023-12-17)
- The enum WriteRel::OutputMode had an option change from OUTPUT_MODE_MODIFIED_TUPLES to OUTPUT_MODE_MODIFIED_RECORDS
- The message AggregateFunction.ReferenceRel has moved to ReferenceRel.
0.39.0 (2023-11-26)
- Map keys may be repeated.
- Map keys must not be NULL.
- The map key type may be nullable.
This is based on the current restrictions found in the wild.
DuckDB, Velox, Spark, and Acero all reject attempts to provide NULL as a key.
Although DuckDB specifically calls out that keys must be unique in its implementation, other implementations such as Velox and Acero do not require keys to be unique, so we cannot require map keys to be 1:1 with map values.
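The map key rules from 0.39.0 can be checked with a small Python sketch. It assumes a map is modeled as a sequence of (key, value) pairs and that Python's None stands in for NULL. The function name is hypothetical.

```python
from typing import Any, Iterable, Tuple


def validate_map_entries(entries: Iterable[Tuple[Any, Any]]) -> None:
    """Raise ValueError if any key is NULL; repeated keys are accepted."""
    for position, (key, _value) in enumerate(entries):
        if key is None:
            raise ValueError(f"map key at position {position} is NULL")
    # No uniqueness check on purpose: map keys may be repeated.
```

Values are not inspected at all, so a NULL value under a non-NULL key passes this check.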
0.38.0 (2023-11-05)
0.37.0 (2023-10-22)
0.36.0 (2023-10-08)
0.35.0 (2023-10-01)
- nullability of is_not_distinct_from has changed
- The minimum precision for floating point numbers is now mandated.
- add approval guidelines for documentation updates (#553) (da4b32a)
- add geometric data types and functions (#543) (db52bbd)
- add geometry editor functions (#554) (727467c)
- adding geometry accessor functions (#552) (784fa9b)
- explicitly reference IEEE 754 and mandate precision as well as range (#449) (54e3d52), closes #447
0.34.0 (2023-09-17)
- add more window functions (#534) (f2bfe15)
- allow agg functions to be used in windows (#540) (565a1ef)
0.33.0 (2023-08-27)
0.32.0 (2023-08-21)
- plans referencing functions using simple names (e.g. not vs not:bool) will no longer be valid.
- add ExchangeRel as a type in Rel (#518) (89b0c62)
- add expand rel (#368) (98380b0)
- add options to substring for start parameter being negative (#508) (281dc0f)
- add windowrel support in proto (#399) (bd14e0e)
- require compound functions names in extension references (#537) (2503beb)
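A producer migrating to 0.32.0 might want to find function references that still use simple names. The sketch below is a rough heuristic, assuming that a compound name is recognizable by the colon separating the function name from its argument types (as in not:bool). The helper name is hypothetical.

```python
from typing import Iterable, List


def find_simple_names(function_names: Iterable[str]) -> List[str]:
    """Return the names that lack a signature suffix (no colon present)."""
    return [name for name in function_names if ":" not in name]
```

For example, given the names not, not:bool and concat:str, only not would be reported as needing an update.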
0.31.0 (2023-07-02)
- add a two-arg variant of substring (#513) (a6ead70)
- add timestamp types to max/min function (#511) (6943400)
0.30.0 (2023-05-14)
- This adds an option to control indexing of components
0.29.0 (2023-04-23)
- text: mark name and structure property of type extension item as required (#495)
- referenced simple extension in tutorial (set instead of string) (#494) (b5d7ed2)
- text: mark name and structure property of type extension item as required (#495) (7246102)
0.28.2 (2023-04-16)
0.28.1 (2023-04-09)
0.28.0 (2023-04-02)
- adding BibTex entry to cite Substrait (#481) (425e7f8), closes #480
- adding SUM0 definition for aggregate functions (#465) (73228b4), closes #259
0.27.0 (2023-03-26)
- group argument added to regexp_match_substring function
Add regexp_match_substring_all function
Resolves #466
- ci: fix link to conventional commits spec (#482) (45b4e48)
- remove duplication in simple extensions schema (#404) (b7df38d)
0.26.0 (2023-03-05)
- add script to re-namespace .proto files for internal use in public libraries (#207) (a6f24db)
- add temporal functions (#272) (beb104b), closes #222
0.25.0 (2023-02-26)
- (add/subtract)ing an interval to a timestamp_tz now requires a time zone and returns a timestamp_tz
- correct return of temporal add and subtract and add timezone parameter (#337) (1b184cc)
- extension: fix typo in scalar function argument type (#445) (7d7ddf1)
0.24.0 (2023-02-12)
0.23.0 (2023-01-22)
- add extended expression for expression only evaluation (#405) (d35f0ed)
- spec: add physical plans for hashJoin and mergeJoin (#336) (431651e)
0.22.0 (2022-12-18)
0.21.1 (2022-12-04)
0.21.0 (2022-11-27)
- add nested type constructor expressions (#351) (b64d30b)
- add title to simple extensions schema (#387) (2819ecc)
0.20.0 (2022-11-20)
- optional arguments are no longer allowed to be specified as a part of FunctionArgument messages. Instead they are now specified separately as part of the function invocation.
- optional arguments are now specified separately from required arguments in the YAML specification.
Co-authored-by: Benjamin Kietzman bengilgit@gmail.com
- add best effort filter to read rel and clarify that the pre-masked schema should be used (#271) (4beff87)
- optional args are now specified separately from required args (#342) (bd29ea3)
0.19.0 (2022-11-06)
0.18.0 (2022-10-09)
0.17.0 (2022-10-02)
0.16.0 (2022-09-25)
- add any_value aggregate function (#321) (6f603d3)
- support constant function arguments (#305) (6021030)
0.15.0 (2022-09-18)
- options were added to division and logarithmic functions.
0.14.0 (2022-09-11)
- option argument added to std_dev and variance aggregate functions
- add bool_and and bool_or aggregate functions (#314) (52fa523)
- add corr and mode aggregation functions (#296) (96b13d7)
- add median and count_distinct aggregation functions (#278) (9be62e5)
- add population option to variance and standard deviation functions (#295) (c47fffa)
- add quantile aggregate function (#279) (de6bc9f)
- add string_agg aggregate function (#297) (fbe5e09)
- mark string_agg aggregate as being sensitive to input order (#312) (683faaa)
- naming: add missing arg names in functions_arithmetic.yaml (#315) (d433a06)
- naming: add missing arg names in functions_datetime.yaml (#318) (b7347d1)
- naming: add missing arg names in functions_logarithmic.yaml and functions_set.yaml (#319) (1c14d27)
- naming: add/replace arg names in functions_boolean.yaml (#317) (809a2f4)
- revert addition of count_distinct aggregate function (#311) (90d7c0d)
0.13.0 (2022-09-04)
- nullability behavior of is_nan, is_finite, and is_infinite has changed
- compound name for concat has changed to concat:str and concat:vchar (one argument) to make it 1+ variadic
- add center function (#282) (7697d39)
- add coalesce function (#301) (63c5da0)
- add dwrf file format (#304) (0f7c2ea)
- add exp function (#299) (7ed31f6)
- add factorial scalar function (#300) (a4d6f35)
- add hyperbolic functions (#290) (4252824)
- add log1p function (#273) (55e8275)
- add regexp_match_substring, regexp_strpos, and regexp_count_substring (#293) (6b8191f)
- add regexp_replace function (#281) (433d049)
- add string transform functions (#267) (ff2f7f1)
- clarify behavior of is_null, is_not_null, is_nan, is_finite, and is_infinite for nulls (#285) (cb25124)
0.12.0 (2022-08-28)
- add between function (#287) (aad6f63)
- add case_sensitivity option to string functions (#289) (4c354de)
0.11.0 (2022-08-21)
0.10.0 (2022-08-14)
- add and_not boolean function (#276) (8af3fe0)
- add is_finite and is_infinite (#286) (01d5428)
- add support for DDL and INSERT/DELETE/UPDATE operations (#252) (cbb6c26)
0.9.0 (2022-07-31)
- arithmetic: Options SILENT, SATURATE, ERROR are no longer valid for use with floating point arguments to add, subtract, multiply or divide
- function argument bindings were open to interpretation before, and were often produced incorrectly; therefore, this change semantically shifts some responsibilities from the consumers to the producers.
- the grouping set index column now only exists if there is more than one grouping set.
- Existing plans that are modeling cast with the cast function (as opposed to the cast expression) will no longer be valid. All producers/consumers should use the cast expression type.
- add functions for arithmetic, rounding, logarithmic, and string transformations (#245) (f7c5da5)
- add standard deviation functions (#257) (1339534)
- add string containment functions (#256) (d6b9b34)
- add string trimming and padding functions (#248) (8a8f65d)
- add trigonometry functions (#241) (d83d566)
- add variance function (#263) (b6c3772)
- arithmetic: add abs and sign to scalar function extensions (#244) (1b9a45f)
- support window functions (#224) (4b2072a)
- message: commit lint issue (#250) (34ec8f5)
- removes cast function definition (#253) (66a3476), closes #88 #152
- specify how function arguments are to be bound (#231) (d4cfbe0)
0.8.0 (2022-07-17)
- The signature of divide functions for multiple types now specifies an enumeration prior to specifying operands.
0.7.0 (2022-07-11)
0.6.0 (2022-06-26)
0.5.0 (2022-06-12)
- The substrait/ReadRel/LocalFiles/format field is deprecated. This will cause a hard break in compatibility. Newer consumers will not be able to read older files. Older consumers will not be able to read newer files. One should now express format concepts using the file_format oneof field.
Co-authored-by: Jacques Nadeau jacques@apache.org
- add aggregate function min/max support (#219) (48b6b12)
- add Arrow and Orc file formats (#169) (43be00a)
- support nullable and non-default variation user-defined types (#217) (5851b02)
0.4.0 (2022-06-05)
- there was an accidental inclusion of a binary not function with unspecified behavior. This function was removed. Use the unary not function to return the complement of an input argument.
0.3.0 (2022-05-22)
- support type function arguments in protobuf (#161) (df98816)
- define APPROX_COUNT_DISTINCT in new yaml for approximate aggregate functions (#204) (8e206b9)
- literals for extension types (#197) (296c266)
- support fractional seconds for interval_day literals (#199) (129e52f)
0.2.0 (2022-05-15)
- add flag FailureBehavior in Cast expression (#186) (a3d3b2f)
- add invocation property to AggregateFunction message for specifying distinct vs all (#191) (373b33f)