forked from memsql/singlestore-spark-connector
-
Notifications
You must be signed in to change notification settings - Fork 0
/
CHANGELOG
213 lines (163 loc) · 9 KB
/
CHANGELOG
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
2024-06-14 Version 4.1.8
* Changed retry during reading from result table to use exponential backoff
* Used ForkJoinPool instead of FixedThreadPool
* Added more logging
2024-05-13 Version 4.1.7
* Fixed bug that caused reading from the wrong result table when the task was restarted
2024-04-11 Version 4.1.6
* Changed LoadDataWriter to send data in batches
* Added numPartitions parameter to specify exact number of resulting partition during parallel read
2023-10-05 Version 4.1.5
* Added support of Spark 3.5
* Updated dependencies
2023-07-18 Version 4.1.4
* Added support of Spark 3.4
* Added connection attributes
* Fixed conflicts of result table names during parallel read
* Updated version of the SingleStore JDBC driver
2023-03-31 Version 4.1.3
* Updated version of the SingleStore JDBC driver
* Fixed error handling when `onDuplicateKeySQL` option is used
2023-02-21 Version 4.1.2
* Fixed an issue that would cause a `Table has reached its quota of 1 reader(s)`` error to be displayed when a parallel read was retried
2022-07-13 Version 4.1.1
* Added clientEndpoint option for Cloud deployment of the SingleStoreDB
* Fixed bug in the error handling that caused deadlock
* Added support of the Spark 3.3
2022-06-22 Version 4.1.0
* Added support of more SQL expressions in pushdown
* Added multi-partition to the parallel read
* Updated SingleStore JDBC Driver to 1.1.0
* Added JWT authentication
* Added connection pooling
2022-01-20 Version 4.0.0
* Changed connector to use SingleStore JDBC Driver instead of MariaDB JDBC Driver
2021-12-23 Version 3.2.2
* Added possibility to repartition result by columns in parallel read from aggregators
* Replaced usages of `transformDown` with `transform` in order to make connector work with Databricks 9.1 LTS
2021-12-14 Version 3.2.1
* Added support of the Spark 3.2
* Fixed links in the README
2021-11-29 Version 3.2.0
* Added support for reading in parallel from aggregator nodes instead of leaf nodes
2021-09-16 Version 3.1.3
* Added Spark 3.1 support
* Deleted Spark 2.3 and 2.4 support
2021-04-29 Version 3.1.2
* Added using external host and port by default while using `useParallelRead`
2021-02-05 Version 3.1.1
* Added support of `com.memsql.spark` data source name for backward compatibility
2021-01-22 Version 3.1.0
* Rebranded `memsql-spark-connector` to `singlestore-spark-connector`
* Spark data source format changed from `memsql` to `singlestore`
* Configuration prefix changed from `spark.datasource.memsql.<config_name>` to `spark.datasource.singlestore.<config_name>`
2020-10-19 Version 3.0.5
* Fixed bug with load balance connections to dml endpoint
2020-09-29 Version 3.1.0-beta
* Added Spark 3.0 support
* Fixed bugs in pushdowns
* Fixed bug with wrong SQL code generation of attribute names that contains special characters
* Added methods that allow you to run SQL queries on a MemSQL database directly
2020-08-20 Version 3.0.4
* Added trim pushdown
2020-08-14 Version 3.0.3
* Fixed bug with pushdown of the join condition
2020-08-03 Version 3.0.2
* added maxErrors option
* changed aliases in SQL queries to be more deterministic
* disabled comments inside of the SQL queries when logging level is not TRACE
2020-06-12 Version 3.0.1
* The connector now updates task metrics with the number of records written during write operations
2020-05-27 Version 3.0.0
* Introduces SQL Optimization & Rewrite for most query shapes and compatible expressions
* Implemented as a native Spark SQL plugin
* Supports both the DataSource and DataSourceV2 API for maximum support of current and future functionality
* Contains deep integrations with the Catalyst query optimizer
* Is compatible with Spark 2.3 and 2.4
* Leverages MemSQL LOAD DATA to accelerate ingest from Spark via compression, vectorized cpu instructions, and optimized segment sizes
* Takes advantage of all the latest and greatest features in MemSQL 7.x
2020-05-06 Version 3.0.0-rc1
* Support writing into MemSQL reference tables
* Deprecated truncate option in favor of overwriteBehavior
* New option overwriteBehavior allows you to specify how to overwrite or merge rows during ingest
* The Ignore SaveMode now correctly skips all duplicate key errors during ingest
2020-04-30 Version 3.0.0-beta12
* Improved performance of new batch insert functionality for `ON DUPLICATE KEY UPDATE` feature
2020-04-30 Version 3.0.0-beta11
* Added support for merging rows on ingest via `ON DUPLICATE KEY UPDATE`
* Added docker-based demo for running a Zeppelin notebook using the Spark connector
2020-04-20 Version 3.0.0-beta10
* Additional functions supported in SQL Pushdown: toUnixTimestamp, unixTimestamp, nextDay, dateDiff, monthsAdd, hypot, rint
* Now tested against MemSQL 6.7, and all tests use SSL
* Fixed bug with disablePushdown
2020-04-09 Version 3.0.0-beta9
* Add null handling to address Spark bug which causes incorrect handling of null literals (https://issues.apache.org/jira/browse/SPARK-31403)
2020-04-01 Version 3.0.0-beta8
* Added support for more datetime expressions:
* addition/subtraction of datetime objects
* to_utc_timestamp, from_utc_timestamp
* date_trunc, trunc
2020-03-25 Version 3.0.0-beta7
* The connector now respects column selection when loading dataframes into MemSQL
2020-03-24 Version 3.0.0-beta6
* Fix bug when you use an expression in an explicit query
2020-03-23 Version 3.0.0-beta5
* Increase connection timeout to increase connector reliability
2020-03-20 Version 3.0.0-beta4
* Set JDBC driver to MariaDB explicitely to avoid issues with the mysql driver
2020-03-19 Version 3.0.0-beta3
* Created tables default to Columnstore
* User can override keys attached to new tables
* New parallelRead option which enables reading directly from MemSQL leaf nodes
* Created tables now set case-sensitive collation on all columns
to match Spark semantics
* More SQL expressions supported in pushdown (tanh, sinh, cosh)
2020-02-08 Version 3.0.0-beta2
* Removed options: masterHost and masterPort
* Added ddlEndpoint and ddlEndpoints options
* Added path option to support specifying the dbtable via `.load("mytable")` when creating a dataframe
2020-01-30 Version 3.0.0-beta
* Full re-write of the Spark Connector
2019-02-27 Version 2.0.7
* Add support for EXPLAIN JSON in MemSQL versions 6.7 and later to fix partition pushdown.
2018-09-14 Version 2.0.6
* Force utf-8 encoding when loading data into MemSQL
2018-01-18 Version 2.0.5
* Explicitly sort MemSQLRDD partitions due to MemSQL 6.0 no longer returning partitions in sorted order by ordinal.
2017-08-31 Version 2.0.4
* Switch threads in LoadDataStrategy so that the parent thread reads from the RDD and the new thread writes
to MemSQL so that Spark has access to the thread-local variables it expects
2017-07-19 Version 2.0.3
* Handle special characters column names in query
* Add option to enable jdbc connector to stream result sets row-by-row
* Fix groupby queries incorrectly pushed down to leaves
* Add option to write to master aggregator only
* Add support for reading MemSQL columns of type unsigned bigint and unsigned int
2017-04-17
* Pull MemSQL configuration from runtime configuration in sparkSession.conf instead of static config in sparkContext
* Fix connection pooling bug where extraneous connections were created
* Add MemSQL configuration to disable partition pushdown
2017-02-06 Version 2.0.1
* Fixed bug to enable partition pushdown for MemSQL DataFrames loaded from a custom user query
2017-02-01 Version 2.0.0
* Compatible with Apache Spark 2.0.0+
* Removed experimental strategy SQL pushdown to instead use the more stable Data Sources API for reading
data from MemSQL
* Removed memsql-spark-interface, memsql-etl
2015-12-15 Version 1.2.1
* Python support for extractors and transformers
* More extensive SQL pushdown for DataFrame operations
* Use DataFrames as common interface between extractor, transformer, and loader
* Rewrite connectorLib internals to support SparkSQL relation provider API
* Remove RDD.saveToMemSQL
2015-11-19 Version 1.1.1
* Set JDBC login timeout to 10 seconds
2015-11-02 Version 1.1.0
* Available on Maven Central Repository
* More events for batches
* Deprecated the old Kafka extractor and replaced it with a new one that takes in a Zookeeper quorum address
* Added a new field to pipeline API responses indicating whether or not a pipeline is currently running
* Renamed projects: memsqlsparkinterface -> memsql-spark-interface, memsqletl -> memsql-etl, memsqlrdd -> memsql-connector.
* Robustness and bug fixes
2015-09-24 Version 1.0.0
* Initial release of MemSQL Streamliner