3.4.1 lyft #56

Closed
wants to merge 7,555 commits into from
8e63ca9
[SPARK-42559][CONNECT][TESTS][FOLLOW-UP] Disable ANSI in several test…
HyukjinKwon Mar 7, 2023
1847eba
[SPARK-42656][CONNECT][FOLLOWUP] Spark Connect Shell
zhenlineo Mar 7, 2023
f40c1d0
[SPARK-42656][SPARK SHELL][CONNECT][FOLLOWUP] Add same `ClassNotFound…
LuciferYang Mar 7, 2023
c47fc2d
[SPARK-42692][CONNECT] Implement `Dataset.toJSON`
LuciferYang Mar 7, 2023
b310827
[SPARK-42022][CONNECT][PYTHON] Fix createDataFrame to autogenerate mi…
ueshin Mar 8, 2023
ef20d70
[SPARK-42705][CONNECT] Fix spark.sql to return values from the command
ueshin Mar 8, 2023
b3673f0
[SPARK-39399][CORE][K8S] Fix proxy-user authentication for Spark on k…
shrprasa Mar 8, 2023
3a29be4
[SPARK-42700][BUILD] Add `h2` as test dependency of connect-server mo…
LuciferYang Mar 8, 2023
824b78a
[SPARK-42707][CONNECT][DOCS] Update developer documentation about API…
HyukjinKwon Mar 8, 2023
0e959a5
[SPARK-42643][CONNECT][PYTHON] Register Java (aggregate) user-defined…
xinrong-meng Mar 8, 2023
60a4041
[SPARK-41775][PYTHON][FOLLOW-UP] Updating error message for training …
rithwik-db Mar 8, 2023
9c10cd9
[SPARK-42712][PYTHON][DOC] Improve docstring of mapInPandas and mapIn…
xinrong-meng Mar 8, 2023
fa0f0e6
[SPARK-42266][PYTHON] Remove the parent directory in shell.py executi…
HyukjinKwon Mar 8, 2023
1fcf497
[SPARK-42713][PYTHON][DOCS] Add '__getattr__' and '__getitem__' of Da…
zhengruifeng Mar 8, 2023
ee3daec
[SPARK-42684][SQL] v2 catalog should not allow column default value b…
cloud-fan Mar 8, 2023
fd97de4
[SPARK-42555][CONNECT][FOLLOWUP] Add the new proto msg to support the…
beliefer Mar 8, 2023
5aae970
[SPARK-42709][PYTHON] Remove the assumption of `__file__` being avail…
HyukjinKwon Mar 8, 2023
5eb4edf
[SPARK-42480][SQL] Improve the performance of drop partitions
wecharyu Mar 9, 2023
8655dfe
[SPARK-42722][CONNECT][PYTHON] Python Connect def schema() should not…
amaliujia Mar 9, 2023
fc29b07
[SPARK-42656][CONNECT][FOLLOWUP] Fix the spark-connect script
zhenlineo Mar 9, 2023
006e838
[SPARK-42723][SQL] Support parser data type json "timestamp_ltz" as T…
gengliangwang Mar 9, 2023
afced91
[SPARK-42697][WEBUI] Fix /api/v1/applications to return total uptime …
yaooqinn Mar 9, 2023
0191a5b
[SPARK-42690][CONNECT] Implement CSV/JSON parsing functions for Scala…
LuciferYang Mar 9, 2023
74cf1a3
[SPARK-42724][CONNECT][BUILD] Upgrade buf to v1.15.1
panbingkun Mar 9, 2023
f3e69a1
[SPARK-42710][CONNECT][PYTHON] Rename FrameMap proto to MapPartitions
xinrong-meng Mar 9, 2023
e38e619
[SPARK-42630][CONNECT][PYTHON] Introduce UnparsedDataType and delay p…
ueshin Mar 9, 2023
7e4f870
[SPARK-42733][CONNECT][PYTHON] Fix DataFrameWriter.save to work witho…
ueshin Mar 10, 2023
94a2afc
[SPARK-42726][CONNECT][PYTHON] Implement `DataFrame.mapInArrow`
xinrong-meng Mar 10, 2023
8012728
[SPARK-42702][SPARK-42623][SQL] Support parameterized query in subque…
cloud-fan Mar 10, 2023
12c7e75
Revert "[SPARK-42702][SPARK-42623][SQL] Support parameterized query i…
HyukjinKwon Mar 10, 2023
a01f4d6
[SPARK-42725][CONNECT][PYTHON] Make LiteralExpression support array p…
zhengruifeng Mar 10, 2023
49cf58e
[SPARK-42739][BUILD] Ensure release tag to be pushed to release branch
xinrong-meng Mar 10, 2023
4000d68
Preparing Spark release v3.4.0-rc4
xinrong-meng Mar 10, 2023
bc16710
Preparing development version 3.4.1-SNAPSHOT
xinrong-meng Mar 10, 2023
0357c9f
[SPARK-42702][SPARK-42623][SQL] Support parameterized query in subque…
cloud-fan Mar 10, 2023
4bbdcbc
[SPARK-42745][SQL] Improved AliasAwareOutputExpression works with DSv2
peter-toth Mar 10, 2023
24f3d4d
[SPARK-42743][SQL] Support analyze TimestampNTZ columns
gengliangwang Mar 10, 2023
7e71378
Revert "[SPARK-41498] Propagate metadata through Union"
cloud-fan Mar 10, 2023
6d8ea2f
[SPARK-42398][SQL][FOLLOWUP] DelegatingCatalogExtension should overri…
cloud-fan Mar 10, 2023
d79d102
[SPARK-42667][CONNECT][FOLLOW-UP] SparkSession created by newSession …
amaliujia Mar 10, 2023
516a202
[SPARK-42721][CONNECT] RPC logging interceptor
rangadi Mar 10, 2023
67ccd8f
[SPARK-42721][CONNECT][FOLLOWUP] Apply scalafmt to LoggingInterceptor
dongjoon-hyun Mar 11, 2023
d3fd9ff
[SPARK-42691][CONNECT][PYTHON] Implement Dataset.semanticHash
beliefer Mar 11, 2023
cb7ae04
[SPARK-42747][ML] Fix incorrect internal status of LoR and AFT
zhengruifeng Mar 11, 2023
2e4238c
[SPARK-42679][CONNECT][PYTHON] createDataFrame doesn't work with non-…
panbingkun Mar 13, 2023
def02cb
[SPARK-42755][CONNECT] Factor literal value conversion out to `connec…
zhengruifeng Mar 13, 2023
ea25bda
[SPARK-42756][CONNECT][PYTHON] Helper function to convert proto liter…
zhengruifeng Mar 13, 2023
dadb5a5
[SPARK-42773][DOCS][PYTHON] Minor update to 3.4.0 version change mess…
Mar 14, 2023
a352507
[SPARK-42777][SQL] Support converting TimestampNTZ catalog stats to p…
gengliangwang Mar 14, 2023
e93b59f
[SPARK-42496][CONNECT][DOCS] Adding Spark Connect to the Spark 3.4 do…
Mar 14, 2023
f6b5fd9
[SPARK-42733][CONNECT][FOLLOWUP] Write without path or table
zhenlineo Mar 14, 2023
d5a62e9
[SPARK-42785][K8S][CORE] When spark submit without `--deploy-mode`, a…
zwangsheng Mar 14, 2023
f36325d
[SPARK-42770][CONNECT] Add `truncatedTo(ChronoUnit.MICROS)` to make `…
LuciferYang Mar 14, 2023
24cdae8
[SPARK-42754][SQL][UI] Fix backward compatibility issue in nested SQL…
linhongliu-db Mar 14, 2023
1c7d780
[SPARK-42731][CONNECT][DOCS] Document Spark Connect configurations
HyukjinKwon Mar 14, 2023
3b4fd1d
[SPARK-42757][CONNECT] Implement textFile for DataFrameReader
panbingkun Mar 14, 2023
bf9c4b9
[SPARK-42793][CONNECT] `connect` module requires `build_profile_flags`
dongjoon-hyun Mar 14, 2023
4c17885
[SPARK-42796][SQL] Support accessing TimestampNTZ columns in CachedBatch
gengliangwang Mar 15, 2023
cc66287
[SPARK-42797][CONNECT][DOCS] Grammatical improvements for Spark Conne…
Mar 15, 2023
d92e5a5
[SPARK-42765][CONNECT][PYTHON] Enable importing `pandas_udf` from `py…
xinrong-meng Mar 15, 2023
ab7c4f8
[SPARK-42801][CONNECT][TESTS] Ignore flaky `write jdbc` test of `Clie…
dongjoon-hyun Mar 15, 2023
9c1cb47
[SPARK-42706][SQL][DOCS][3.4] Document the Spark SQL error classes in…
itholic Mar 15, 2023
492aa38
[SPARK-42799][BUILD] Update SBT build `xercesImpl` version to match w…
dongjoon-hyun Mar 15, 2023
fb729ad
[MINOR][PYTHON] Change TypeVar to private symbols
MaicoTimmerman Mar 15, 2023
0f71caa
[SPARK-42496][CONNECT][DOCS][FOLLOW-UP] Addressing feedback to remove…
Mar 15, 2023
07324b8
[SPARK-42818][CONNECT][PYTHON] Implement DataFrameReader/Writer.jdbc
ueshin Mar 16, 2023
fc43fa1
[SPARK-42818][CONNECT][PYTHON][FOLLOWUP] Add versionchanged
ueshin Mar 16, 2023
9f1e8af
[SPARK-42767][CONNECT][TESTS] Add a precondition to start connect ser…
LuciferYang Mar 16, 2023
89e7e3d
[SPARK-42820][BUILD] Update ORC to 1.8.3
williamhyun Mar 16, 2023
60320f0
[SPARK-42812][CONNECT] Add client_type to AddArtifactsRequest protobu…
vicennial Mar 16, 2023
62d6a3b
[SPARK-42817][CORE] Logging the shuffle service name once in Applicat…
otterc Mar 16, 2023
833599c
[SPARK-42826][PS][DOCS] Add migration notes for update to supported p…
itholic Mar 17, 2023
ca75340
[SPARK-42824][CONNECT][PYTHON] Provide a clear error message for unsu…
itholic Mar 17, 2023
c29cf34
[SPARK-42823][SQL] `spark-sql` shell supports multipart namespaces fo…
yaooqinn Mar 17, 2023
5bd7b09
[SPARK-42778][SQL][3.4] QueryStageExec should respect supportsRowBased
ulysses-you Mar 17, 2023
8804803
[SPARK-42020][CONNECT][PYTHON] Support UserDefinedType in Spark Connect
ueshin Mar 20, 2023
613be4b
[SPARK-42848][CONNECT][PYTHON] Implement DataFrame.registerTempTable
ueshin Mar 20, 2023
a0993ba
[SPARK-41818][SPARK-41843][CONNECT][PYTHON][TESTS] Enable more parity…
ueshin Mar 20, 2023
ba132ef
[SPARK-42247][CONNECT][PYTHON] Fix UserDefinedFunction to have return…
ueshin Mar 20, 2023
666eb65
[SPARK-42852][SQL] Revert NamedLambdaVariable related changes from Eq…
peter-toth Mar 20, 2023
57c9691
[MINOR][TEST] Fix spelling of 'regex' for RegexFilter
yaooqinn Mar 20, 2023
a28b1ab
[SPARK-42557][CONNECT][FOLLOWUP] Remove `broadcast` `ProblemFilters.e…
LuciferYang Mar 20, 2023
602aaff
[SPARK-42875][CONNECT][PYTHON] Fix toPandas to handle timezone and ma…
ueshin Mar 21, 2023
5222cfd
[SPARK-42864][ML][3.4] Make `IsotonicRegression.PointsAccumulator` pr…
zhengruifeng Mar 21, 2023
ed797bb
[SPARK-42340][CONNECT][PYTHON][3.4] Implement Grouped Map API
xinrong-meng Mar 21, 2023
8cffa5c
[SPARK-42876][SQL] DataType's physicalDataType should be private[sql]
amaliujia Mar 21, 2023
6d5414f
[SPARK-42851][SQL] Guard EquivalentExpressions.addExpr() with support…
rednaxelafx Mar 21, 2023
d123bc6
[MINOR][DOCS] Remove SparkSession constructor invocation in the example
HyukjinKwon Mar 22, 2023
ca260cc
[SPARK-42888][BUILD] Upgrade `gcs-connector` to 2.2.11
cnauroth Mar 22, 2023
61b85ae
[SPARK-42816][CONNECT] Support Max Message size up to 128MB
grundprinzip Mar 22, 2023
5622981
[SPARK-42889][CONNECT][PYTHON] Implement cache, persist, unpersist, a…
ueshin Mar 22, 2023
56d9ccf
[SPARK-42893][PYTHON][3.4] Block Arrow-optimized Python UDFs
xinrong-meng Mar 22, 2023
f73b5c5
[SPARK-42894][CONNECT] Support `cache`/`persist`/`unpersist`/`storage…
LuciferYang Mar 22, 2023
a1b853c
[SPARK-42899][SQL] Fix DataFrame.to(schema) to handle the case where …
ueshin Mar 23, 2023
253cb7a
[SPARK-42901][CONNECT][PYTHON] Move `StorageLevel` into a separate fi…
LuciferYang Mar 23, 2023
a25c0ea
[SPARK-42878][CONNECT] The table API in DataFrameReader could also ac…
amaliujia Mar 23, 2023
827eeb4
[SPARK-42903][PYTHON][DOCS] Avoid documenting None as a return val…
HyukjinKwon Mar 23, 2023
4d06299
[SPARK-42900][CONNECT][PYTHON] Fix createDataFrame to respect inferen…
ueshin Mar 23, 2023
88eaaea
[SPARK-42903][PYTHON][DOCS] Avoid documenting None as a return val…
xinrong-meng Mar 23, 2023
f20a269
[SPARK-42202][CONNECT][TEST][FOLLOWUP] Loop around command entry in S…
juliuszsompolski Mar 24, 2023
d44c7c0
[SPARK-42904][SQL] Char/Varchar Support for JDBC Catalog
yaooqinn Mar 24, 2023
b74f792
[SPARK-42861][SQL] Use private[sql] instead of protected[sql] to avoi…
cloud-fan Mar 24, 2023
3122d4f
[SPARK-42891][CONNECT][PYTHON][3.4] Implement CoGrouped Map API
xinrong-meng Mar 24, 2023
f5f53e4
[SPARK-42917][SQL] Correct getUpdateColumnNullabilityQuery for DerbyD…
yaooqinn Mar 24, 2023
594c8fe
[SPARK-42884][CONNECT] Add Ammonite REPL integration
hvanhovell Mar 24, 2023
1b95b4d
[SPARK-42911][PYTHON][3.4] Introduce more basic exceptions
ueshin Mar 27, 2023
31ede73
[SPARK-42920][CONNECT][PYTHON] Enable tests for UDF with UDT
ueshin Mar 27, 2023
aba1c3b
[SPARK-42899][SQL][FOLLOWUP] Project.reconcileColumnType should use K…
ueshin Mar 27, 2023
dde9de6
[SPARK-42924][SQL][CONNECT][PYTHON] Clarify the comment of parameteri…
MaxGekk Mar 27, 2023
c701859
[SPARK-42930][CORE][SQL] Change the access scope of `ProtobufSerDe` r…
LuciferYang Mar 27, 2023
d7f2a6b
[SPARK-42934][BUILD] Add `spark.hadoop.hadoop.security.key.provider.p…
LuciferYang Mar 27, 2023
604fee6
[SPARK-42906][K8S] Replace a starting digit with `x` in resource name…
pan3793 Mar 27, 2023
46866aa
[SPARK-42936][SQL] Fix LCA bug when the having clause can be resolved…
anchovYu Mar 28, 2023
61293fa
[SPARK-41876][CONNECT][PYTHON] Implement DataFrame.toLocalIterator
ueshin Mar 28, 2023
2cd341f
[SPARK-42922][SQL] Move from Random to SecureRandom
Mar 28, 2023
0c4ad50
[SPARK-42908][PYTHON] Raise RuntimeError when SparkContext is require…
xinrong-meng Mar 28, 2023
0620b56
[SPARK-42928][SQL] Make resolvePersistentFunction synchronized
allisonwang-db Mar 28, 2023
a2dd949
[SPARK-42937][SQL] `PlanSubqueries` should set `InSubqueryExec#should…
bersprockets Mar 28, 2023
3124470
[SPARK-42927][CORE] Change the access scope of `o.a.spark.util.Iterat…
LuciferYang Mar 28, 2023
a9cacc1
[SPARK-42946][SQL] Redact sensitive data which is nested by variable …
yaooqinn Mar 29, 2023
2256459
[SPARK-42957][INFRA] `release-build.sh` should not remove SBOM artifacts
dongjoon-hyun Mar 29, 2023
dc834d4
[SPARK-42895][CONNECT] Improve error messages for stopped Spark sessions
allisonwang-db Mar 29, 2023
4e20467
[SPARK-42957][INFRA][FOLLOWUP] Use 'cyclonedx' instead of file extens…
dongjoon-hyun Mar 29, 2023
ce36692
[SPARK-42631][CONNECT][FOLLOW-UP] Expose Column.expr to extensions
tomvanbussel Mar 29, 2023
f39ad61
Preparing Spark release v3.4.0-rc5
xinrong-meng Mar 30, 2023
6a6f504
Preparing development version 3.4.1-SNAPSHOT
xinrong-meng Mar 30, 2023
6e4fcf7
[SPARK-42971][CORE] Change to print `workdir` if `appDirs` is null wh…
LuciferYang Mar 30, 2023
e586527
Revert "[SPARK-41765][SQL] Pull out v1 write metrics to WriteFiles"
cloud-fan Mar 30, 2023
e20b55b
[SPARK-42967][CORE][3.2][3.3][3.4] Fix SparkListenerTaskStart.stageAt…
jiangxb1987 Mar 30, 2023
98f00ea
[SPARK-42970][CONNECT][PYTHON][TESTS][3.4] Reuse pyspark.sql.tests.te…
ueshin Mar 30, 2023
68fa8ca
[SPARK-42969][CONNECT][TESTS] Fix the comparison the result with Arro…
ueshin Mar 30, 2023
df858d3
[SPARK-42998][CONNECT][PYTHON] Fix DataFrame.collect with null struct
ueshin Apr 1, 2023
807abf9
[SPARK-43004][CORE] Fix typo in ResourceRequest.equals()
thyecust Apr 3, 2023
beb8928
[SPARK-42519][CONNECT][TESTS] Add More WriteTo Tests In Spark Connect…
Hisoka-X Apr 3, 2023
9244afb
[SPARK-43005][PYSPARK] Fix typo in pyspark/pandas/config.py
thyecust Apr 3, 2023
54d1b62
[SPARK-43006][PYSPARK] Fix typo in StorageLevel __eq__()
thyecust Apr 3, 2023
47b2912
[MINOR][DOCS] Add Java 8 types to value types of Scala/Java APIs
MaxGekk Apr 3, 2023
ce6d5eb
[SPARK-43006][PYTHON][TESTS] Fix DataFrameTests.test_cache_dataframe
ueshin Apr 3, 2023
a647fef
[SPARK-42974][CORE][3.4] Restore `Utils.createTempDir` to use the `Sh…
LuciferYang Apr 4, 2023
9b1f2db
[SPARK-43011][SQL] `array_insert` should fail with 0 index
zhengruifeng Apr 4, 2023
444053f
[SPARK-42655][SQL] Incorrect ambiguous column reference error
shrprasa Apr 4, 2023
532d446
[SPARK-43009][SQL][3.4] Parameterized `sql()` with `Any` constants
MaxGekk Apr 5, 2023
d373661
[MINOR][PYTHON][CONNECT][DOCS] Deduplicate versionchanged directive i…
HyukjinKwon Apr 5, 2023
c79fc94
[SPARK-42983][CONNECT][PYTHON] Fix createDataFrame to handle 0-dim nu…
ueshin Apr 5, 2023
34c7c3b
[MINOR][CONNECT][DOCS] Clarify Spark Connect option in Spark scripts
HyukjinKwon Apr 5, 2023
f2900f8
[SPARK-43018][SQL] Fix bug for INSERT commands with timestamp literals
dtenedor Apr 6, 2023
9037642
[SPARK-43041][SQL] Restore constructors of exceptions for compatibili…
aokolnychyi Apr 6, 2023
28d0723
Preparing Spark release v3.4.0-rc6
xinrong-meng Apr 6, 2023
1d974a7
Preparing development version 3.4.1-SNAPSHOT
xinrong-meng Apr 6, 2023
b2ff4c4
[SPARK-39696][CORE] Fix data race in access to TaskMetrics.externalAc…
eejbyfeldt Apr 7, 2023
87a5442
Preparing Spark release v3.4.0-rc7
xinrong-meng Apr 7, 2023
ada16f6
Log first successful container request (#20)
catalinii Dec 9, 2020
e2d9c39
Do not localize s3 files
Apr 14, 2021
2ce9f3c
Automatic staging committer conflict-mode for dynamic partition overw…
dzhi-lyft Apr 14, 2021
c2ceeb5
Backport [SPARK-30707][SQL]Window function set partitionSpec as order…
Jun 29, 2021
54264af
[SPARK-28098][SQL]Support read partitioned Hive tables with (#40)
catalinii Aug 16, 2021
ce0407b
Trunc function does not contain the Q string (for quarter) (#45)
catalinii Jan 24, 2022
0ef54ce
Use IP address on the executor side instead of hostname and random po…
catalinii Oct 12, 2022
032f2f2
[SPARK-32838][SQL]Check DataSource insert command path with actual path
Oct 25, 2022
6707629
Fix Long type in the schema and int32 parquet type
Oct 7, 2021
57f90d7
Remove unnecessary logging statements
andelink Nov 4, 2022
ed7a392
Preparing development version 3.4.1-SNAPSHOT
xinrong-meng Apr 7, 2023
f19c37b
[SPARK-43069][BUILD] Use `sbt-eclipse` instead of `sbteclipse-plugin`
dongjoon-hyun Apr 7, 2023
d7f5c4c
[SPARK-43067][SS] Correct the location of error class resource file i…
HeartSaVioR Apr 8, 2023
2f6725d
[SPARK-43075][CONNECT] Change `gRPC` to `grpcio` when it is not insta…
bjornjorgensen Apr 9, 2023
a0938cf
[MINOR][SQL][TESTS] Tests in `SubquerySuite` should not drop view cre…
bersprockets Apr 10, 2023
5f53a44
[SPARK-43072][DOC] Include TIMESTAMP_NTZ type in ANSI Compliance doc
gengliangwang Apr 10, 2023
7f31b98
[SPARK-43071][SQL] Support SELECT DEFAULT with ORDER BY, LIMIT, OFFSE…
dtenedor Apr 10, 2023
83484c5
[SPARK-43083][SQL][TESTS] Mark `*StateStoreSuite` as `ExtendedSQLTest`
dongjoon-hyun Apr 10, 2023
933abb8
[SPARK-43085][SQL] Support column DEFAULT assignment for multi-part t…
dtenedor Apr 13, 2023
ede226b
[SPARK-43126][SQL] Mark two Hive UDF expressions as stateful
cloud-fan Apr 14, 2023
89d3e39
[SPARK-43125][CONNECT] Fix Connect Server Can't Handle Exception With…
Hisoka-X Apr 14, 2023
e04bdbe
[SPARK-43050][SQL] Fix construct aggregate expressions by replacing g…
wangyum Apr 15, 2023
e077310
[SPARK-43139][SQL][DOCS] Fix incorrect column names in sql-ref-syntax…
wangyum Apr 16, 2023
64afee8
[SPARK-42475][DOCS][FOLLOW-UP] Fix PySpark connect Quickstart binder …
HyukjinKwon Apr 17, 2023
b0e1263
[SPARK-43158][DOCS] Set upperbound of pandas version for Binder integ…
HyukjinKwon Apr 17, 2023
6a799f0
[SPARK-42475][DOCS][FOLLOW-UP] Fix the version string with dev0 to wo…
HyukjinKwon Apr 17, 2023
de79e2c
Revert "[SPARK-42475][DOCS][FOLLOW-UP] Fix the version string with de…
HyukjinKwon Apr 17, 2023
3dff7ba
[SPARK-43141][BUILD] Ignore generated Java files in checkstyle
HyukjinKwon Apr 16, 2023
29730dd
[SPARK-42078][PYTHON][FOLLOWUP] Add `CapturedException` to utils
itholic Apr 17, 2023
4686fe8
[SPARK-43113][SQL] Evaluate stream-side variables when generating cod…
bersprockets Apr 18, 2023
404259d
[SPARK-43098][SQL] Fix correctness COUNT bug when scalar subquery has…
jchen5 Apr 19, 2023
8bda273
[SPARK-37829][SQL] Dataframe.joinWith outer-join should return a null…
kings129 Apr 19, 2023
b8bb32d
Revert [SPARK-39203][SQL] Rewrite table location to absolute URI base…
cloud-fan Apr 21, 2023
279da72
[MINOR][CONNECT][PYTHON][DOCS] Fix the doc of parameter `num` in `Dat…
zhengruifeng Apr 21, 2023
c5172ad
[SPARK-43113][SQL][FOLLOWUP] Add comment about copying steam-side var…
bersprockets Apr 21, 2023
d3f1eec
[SPARK-43249][CONNECT] Fix missing stats for SQL Command
grundprinzip Apr 24, 2023
8f52bbd
[SPARK-43293][SQL] `__qualified_access_only` should be ignored in nor…
cloud-fan Apr 27, 2023
3b681ff
[SPARK-43156][SQL][3.4] Fix `COUNT(*) is null` bug in correlated scal…
Hisoka-X May 2, 2023
75be2ac
[SPARK-43336][SQL] Casting between Timestamp and TimestampNTZ require…
gengliangwang May 2, 2023
27b2797
[SPARK-43313][SQL] Adding missing column DEFAULT values for MERGE INS…
dtenedor May 4, 2023
0bba750
[SPARK-43378][CORE] Properly close stream objects in deserializeFromC…
eejbyfeldt May 5, 2023
02aa835
[SPARK-43284] Switch back to url-encoded strings
databricks-david-lewis May 5, 2023
3544ee6
[SPARK-43284][SQL][FOLLOWUP] Return URI encoded path, and add a test
databricks-david-lewis May 5, 2023
8b1b153
[SPARK-43337][UI][3.4] Asc/desc arrow icons for sorting column does n…
maytasm May 5, 2023
ff81db5
[SPARK-43340][CORE] Handle missing stack-trace field in eventlogs
amahussein May 5, 2023
d7c034b
[SPARK-43374][INFRA] Move protobuf-java to BSD 3-clause group and upd…
yaooqinn May 5, 2023
eea5dac
[SPARK-43395][BUILD] Exclude macOS tar extended metadata in make-dist…
pan3793 May 6, 2023
b824fc5
[SPARK-43342][K8S] Revert SPARK-39006 Show a directional error messag…
dcoliversun May 7, 2023
0e92da5
[SPARK-43414][TESTS] Fix flakiness in Kafka RDD suites due to port bi…
JoshRosen May 8, 2023
81de599
[SPARK-43425][SQL][3.4] Add `TimestampNTZType` to `ColumnarBatchRow`
Fokko May 11, 2023
2921bb6
[SPARK-43441][CORE] `makeDotNode` should not fail when DeterministicL…
TQJADE May 11, 2023
689e35d
[SPARK-43471][CORE] Handle missing hadoopProperties and metricsProper…
dongjoon-hyun May 11, 2023
e820b23
[SPARK-43483][SQL][DOCS] Adds SQL references for OFFSET clause
beliefer May 15, 2023
5f5459a
[SPARK-43281][SQL] Fix concurrent writer does not update file metrics
ulysses-you May 16, 2023
7234334
[SPARK-43517][PYTHON][DOCS] Add a migration guide for namedtuple monk…
HyukjinKwon May 16, 2023
318ceb0
[SPARK-43043][CORE] Improve the performance of MapOutputTracker.updat…
jiangxb1987 May 16, 2023
dbceb0f
[SPARK-43527][PYTHON] Fix `catalog.listCatalogs` in PySpark
zhengruifeng May 16, 2023
482dce6
[SPARK-42826][3.4][FOLLOWUP][PS][DOCS] Update migration notes for pan…
itholic May 18, 2023
c19e0f3
[SPARK-43547][3.4][PS][DOCS] Update "Supported Pandas API" page to po…
itholic May 18, 2023
f6db4f5
[SPARK-43157][SQL] Clone InMemoryRelation cached plan to prevent clon…
robreeves May 18, 2023
7324cf2
[SPARK-43522][SQL] Fix creating struct column name with index of array
Hisoka-X May 18, 2023
38ae900
[SPARK-43450][SQL][TESTS] Add more `_metadata` filter test cases
olaky May 18, 2023
ce79b99
Revert "[SPARK-43313][SQL] Adding missing column DEFAULT values for M…
cloud-fan May 18, 2023
2681707
[SPARK-43541][SQL][3.4] Propagate all `Project` tags in resolving of …
MaxGekk May 18, 2023
b071461
[SPARK-43587][CORE][TESTS] Run `HealthTrackerIntegrationSuite` in a d…
dongjoon-hyun May 19, 2023
e23d149
[SPARK-43589][SQL] Fix `cannotBroadcastTableOverMaxTableBytesError` t…
dongjoon-hyun May 19, 2023
fc9401e
[SPARK-43718][SQL] Set nullable correctly for keys in USING joins
bersprockets May 23, 2023
21c12e2
[SPARK-43719][WEBUI] Handle `missing row.excludedInStages` field
dongjoon-hyun May 23, 2023
963a368
[MINOR][PS][TESTS] Fix `SeriesDateTimeTests.test_quarter` to work pro…
itholic May 23, 2023
382a0fe
[SPARK-43758][BUILD] Upgrade snappy-java to 1.1.10.0
sunchao May 24, 2023
68d34dd
[SPARK-43758][BUILD][FOLLOWUP][3.4] Update Hadoop 2 dependency manifest
dongjoon-hyun May 24, 2023
2d792a0
[SPARK-43759][SQL][PYTHON] Expose TimestampNTZType in pyspark.sql.types
ueshin May 24, 2023
d8b79c7
[SPARK-43751][SQL][DOC] Document `unbase64` behavior change
pan3793 May 26, 2023
177d172
[SPARK-43802][SQL][3.4] Fix codegen for unhex and unbase64 with failO…
Kimahriman May 27, 2023
3b3ffe3
[SPARK-42421][CORE] Use the utils to get the switch for dynamic alloc…
jiwq May 29, 2023
80a396f
[SPARK-43894][PYTHON] Fix bug in df.cache()
grundprinzip May 31, 2023
4248442
[SPARK-43760][SQL][3.4] Nullability of scalar subquery results
agubichev Jun 1, 2023
fd6397d
[SPARK-43949][PYTHON] Upgrade cloudpickle to 2.2.1
HyukjinKwon Jun 2, 2023
9db9002
[SPARK-43956][SQL][3.4] Fix the bug doesn't display column's sql for …
beliefer Jun 3, 2023
c8884e8
[SPARK-43911][SQL] Use toSet to deduplicate the iterator data to prev…
mcdull-zhang Jun 4, 2023
71f3bbc
Revert "[SPARK-43911][SQL] Use toSet to deduplicate the iterator data…
HyukjinKwon Jun 6, 2023
f7c4f1f
[SPARK-43973][SS][UI] Structured Streaming UI should display failed q…
rednaxelafx Jun 6, 2023
d7532d5
[SPARK-43510][YARN] Fix YarnAllocator internal state when adding runn…
manuzhang Jun 6, 2023
c435245
[SPARK-43976][CORE] Handle the case where modifiedConfigs doesn't exi…
dongjoon-hyun Jun 6, 2023
0f6c5da
[SPARK-43973][SS][UI][TESTS][FOLLOWUP][3.4] Fix compilation by switch…
dongjoon-hyun Jun 6, 2023
c74b99c
[MINOR][SQL][TESTS] Move ResolveDefaultColumnsSuite to 'o.a.s.sql'
dongjoon-hyun Jun 8, 2023
45812eb
[SPARK-42290][SQL] Fix the OOM error can't be reported when AQE on
Hisoka-X Jun 8, 2023
445e3ed
[SPARK-43404][SS][3.4] Skip reusing sst file for same version of Rock…
anishshri-db Jun 9, 2023
1f5d7da
[SPARK-43398][CORE] Executor timeout should be max of idle shuffle an…
warrenzhu25 Jun 12, 2023
9a2bdbe
[SPARK-32559][SQL] Fix the trim logic did't handle ASCII control char…
Jun 13, 2023
ea87ac5
[SPARK-44031][BUILD] Upgrade silencer to 1.7.13
dongjoon-hyun Jun 13, 2023
f1c8b0d
Revert "[SPARK-44031][BUILD] Upgrade silencer to 1.7.13"
dongjoon-hyun Jun 13, 2023
af8425e
[SPARK-44038][DOCS][K8S] Update YuniKorn docs with v1.3
dongjoon-hyun Jun 13, 2023
ead8e1f
[SPARK-44053][BUILD][3.4] Update ORC to 1.8.4
guiyanakuang Jun 14, 2023
729396d
[SPARK-44040][SQL] Fix compute stats when AggregateExec node above Qu…
wangyum Jun 16, 2023
15e69ee
[SPARK-44070][BUILD] Bump snappy-java 1.1.10.1
wangyum Jun 16, 2023
4134bac
[MINOR][K8S][DOCS] Fix all dead links for K8s doc
wangyum Jun 17, 2023
ffa7e68
[SPARK-44018][SQL] Improve the hashCode and toString for some DS V2 E…
beliefer Jun 19, 2023
9e54608
Preparing Spark release v3.4.1-rc1
dongjoon-hyun Jun 19, 2023
The diff you're trying to view is too large. We only load the first 3000 changed files.
11 changes: 10 additions & 1 deletion .asf.yaml
Original file line number Diff line number Diff line change
@@ -13,7 +13,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

# https://cwiki.apache.org/confluence/display/INFRA/.asf.yaml+features+for+git+repositories
# https://cwiki.apache.org/confluence/display/INFRA/git+-+.asf.yaml+features
---
github:
description: "Apache Spark - A unified analytics engine for large-scale data processing"
@@ -27,3 +27,12 @@ github:
- jdbc
- sql
- spark
enabled_merge_buttons:
merge: false
squash: true
rebase: true

notifications:
pullrequests: reviews@spark.apache.org
issues: reviews@spark.apache.org
commits: commits@spark.apache.org
3 changes: 3 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE
@@ -8,6 +8,8 @@ Thanks for sending a pull request! Here are some tips for you:
6. If possible, provide a concise example to reproduce the issue for a faster review.
7. If you want to add a new configuration, please read the guideline first for naming configurations in
'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
8. If you want to add or modify an error type or message, please read the guideline first in
'core/src/main/resources/error/README.md'.
-->

### What changes were proposed in this pull request?
@@ -43,4 +45,5 @@ If no, write 'No'.
If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
If tests were not added, please describe why they were not added and/or why it was difficult to add.
If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
-->
24 changes: 16 additions & 8 deletions .github/labeler.yml
@@ -84,12 +84,12 @@ SPARK SHELL:
- "repl/**/*"
- "bin/spark-shell*"
SQL:
#- any: ["**/sql/**/*", "!python/pyspark/sql/avro/**/*", "!python/pyspark/sql/streaming.py", "!python/pyspark/sql/tests/test_streaming.py"]
#- any: ["**/sql/**/*", "!python/pyspark/sql/avro/**/*", "!python/pyspark/sql/streaming/**/*", "!python/pyspark/sql/tests/streaming/test_streaming.py"]
- "**/sql/**/*"
- "common/unsafe/**/*"
#- "!python/pyspark/sql/avro/**/*"
#- "!python/pyspark/sql/streaming.py"
#- "!python/pyspark/sql/tests/test_streaming.py"
#- "!python/pyspark/sql/streaming/**/*"
#- "!python/pyspark/sql/tests/streaming/test_streaming.py"
- "bin/spark-sql*"
- "bin/beeline*"
- "sbin/*thriftserver*.sh"
@@ -103,7 +103,7 @@ SQL:
- "**/*schema.R"
- "**/*types.R"
AVRO:
- "external/avro/**/*"
- "connector/avro/**/*"
- "python/pyspark/sql/avro/**/*"
DSTREAM:
- "streaming/**/*"
@@ -123,13 +123,15 @@ MLLIB:
- "python/pyspark/mllib/**/*"
STRUCTURED STREAMING:
- "**/sql/**/streaming/**/*"
- "external/kafka-0-10-sql/**/*"
- "python/pyspark/sql/streaming.py"
- "python/pyspark/sql/tests/test_streaming.py"
- "connector/kafka-0-10-sql/**/*"
- "python/pyspark/sql/streaming/**/*"
- "python/pyspark/sql/tests/streaming/test_streaming.py"
- "**/*streaming.R"
PYTHON:
- "bin/pyspark*"
- "**/python/**/*"
PANDAS API ON SPARK:
- "python/pyspark/pandas/**/*"
R:
- "**/r/**/*"
- "**/R/**/*"
@@ -149,4 +151,10 @@ WEB UI:
- "**/*UI.scala"
DEPLOY:
- "sbin/**/*"

CONNECT:
- "connector/connect/**/*"
- "**/sql/sparkconnect/**/*"
- "python/pyspark/sql/**/connect/**/*"
PROTOBUF:
- "connector/protobuf/**/*"
- "python/pyspark/sql/protobuf/**/*"
195 changes: 195 additions & 0 deletions .github/workflows/benchmark.yml
@@ -0,0 +1,195 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

name: Run benchmarks

on:
workflow_dispatch:
inputs:
class:
description: 'Benchmark class'
required: true
default: '*'
jdk:
description: 'JDK version: 8, 11 or 17'
required: true
default: '8'
scala:
description: 'Scala version: 2.12 or 2.13'
required: true
default: '2.12'
failfast:
description: 'Failfast: true or false'
required: true
default: 'true'
num-splits:
description: 'Number of job splits'
required: true
default: '1'

jobs:
matrix-gen:
name: Generate matrix for job splits
runs-on: ubuntu-20.04
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
env:
SPARK_BENCHMARK_NUM_SPLITS: ${{ github.event.inputs.num-splits }}
steps:
- name: Generate matrix
id: set-matrix
run: echo "matrix=["`seq -s, 1 $SPARK_BENCHMARK_NUM_SPLITS`"]" >> $GITHUB_OUTPUT

  # Any TPC-DS related updates on this job need to be applied to tpcds-1g job of build_and_test.yml as well
  tpcds-1g-gen:
    name: "Generate an input dataset for TPCDSQueryBenchmark with SF=1"
    if: contains(github.event.inputs.class, 'TPCDSQueryBenchmark') || contains(github.event.inputs.class, '*')
    runs-on: ubuntu-20.04
    env:
      SPARK_LOCAL_IP: localhost
    steps:
    - name: Checkout Spark repository
      uses: actions/checkout@v3
      # In order to get diff files
      with:
        fetch-depth: 0
    - name: Cache Scala, SBT and Maven
      uses: actions/cache@v3
      with:
        path: |
          build/apache-maven-*
          build/scala-*
          build/*.jar
          ~/.sbt
        key: build-${{ hashFiles('**/pom.xml', 'project/build.properties', 'build/mvn', 'build/sbt', 'build/sbt-launch-lib.bash', 'build/spark-build-info') }}
        restore-keys: |
          build-
    - name: Cache Coursier local repository
      uses: actions/cache@v3
      with:
        path: ~/.cache/coursier
        key: benchmark-coursier-${{ github.event.inputs.jdk }}-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
        restore-keys: |
          benchmark-coursier-${{ github.event.inputs.jdk }}
    - name: Cache TPC-DS generated data
      id: cache-tpcds-sf-1
      uses: actions/cache@v3
      with:
        path: ./tpcds-sf-1
        key: tpcds-${{ hashFiles('.github/workflows/benchmark.yml', 'sql/core/src/test/scala/org/apache/spark/sql/TPCDSSchema.scala') }}
    - name: Checkout tpcds-kit repository
      if: steps.cache-tpcds-sf-1.outputs.cache-hit != 'true'
      uses: actions/checkout@v3
      with:
        repository: databricks/tpcds-kit
        ref: 2a5078a782192ddb6efbcead8de9973d6ab4f069
        path: ./tpcds-kit
    - name: Build tpcds-kit
      if: steps.cache-tpcds-sf-1.outputs.cache-hit != 'true'
      run: cd tpcds-kit/tools && make OS=LINUX
    - name: Install Java ${{ github.event.inputs.jdk }}
      if: steps.cache-tpcds-sf-1.outputs.cache-hit != 'true'
      uses: actions/setup-java@v3
      with:
        distribution: temurin
        java-version: ${{ github.event.inputs.jdk }}
    - name: Generate TPC-DS (SF=1) table data
      if: steps.cache-tpcds-sf-1.outputs.cache-hit != 'true'
      run: build/sbt "sql/Test/runMain org.apache.spark.sql.GenTPCDSData --dsdgenDir `pwd`/tpcds-kit/tools --location `pwd`/tpcds-sf-1 --scaleFactor 1 --numPartitions 1 --overwrite"
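    # Note on the caching scheme above: the generated ./tpcds-sf-1 directory is
    # saved under the `tpcds-` cache key, so subsequent runs (and the benchmark
    # job, which restores the same key) reuse the dataset; on a cache hit every
    # tpcds-kit checkout/build/generate step is skipped via its `if:` guard.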

  benchmark:
    name: "Run benchmarks: ${{ github.event.inputs.class }} (JDK ${{ github.event.inputs.jdk }}, Scala ${{ github.event.inputs.scala }}, ${{ matrix.split }} out of ${{ github.event.inputs.num-splits }} splits)"
    if: always()
    needs: [matrix-gen, tpcds-1g-gen]
    # Ubuntu 20.04 is the latest LTS. The next LTS is 22.04.
    runs-on: ubuntu-20.04
    strategy:
      fail-fast: false
      matrix:
        split: ${{fromJSON(needs.matrix-gen.outputs.matrix)}}
    env:
      SPARK_BENCHMARK_FAILFAST: ${{ github.event.inputs.failfast }}
      SPARK_BENCHMARK_NUM_SPLITS: ${{ github.event.inputs.num-splits }}
      SPARK_BENCHMARK_CUR_SPLIT: ${{ matrix.split }}
      SPARK_GENERATE_BENCHMARK_FILES: 1
      SPARK_LOCAL_IP: localhost
      # To prevent spark.test.home not being set. See more detail in SPARK-36007.
      SPARK_HOME: ${{ github.workspace }}
      SPARK_TPCDS_DATA: ${{ github.workspace }}/tpcds-sf-1
    steps:
    - name: Checkout Spark repository
      uses: actions/checkout@v3
      # In order to get diff files
      with:
        fetch-depth: 0
    - name: Cache Scala, SBT and Maven
      uses: actions/cache@v3
      with:
        path: |
          build/apache-maven-*
          build/scala-*
          build/*.jar
          ~/.sbt
        key: build-${{ hashFiles('**/pom.xml', 'project/build.properties', 'build/mvn', 'build/sbt', 'build/sbt-launch-lib.bash', 'build/spark-build-info') }}
        restore-keys: |
          build-
    - name: Cache Coursier local repository
      uses: actions/cache@v3
      with:
        path: ~/.cache/coursier
        key: benchmark-coursier-${{ github.event.inputs.jdk }}-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
        restore-keys: |
          benchmark-coursier-${{ github.event.inputs.jdk }}
    - name: Install Java ${{ github.event.inputs.jdk }}
      uses: actions/setup-java@v3
      with:
        distribution: temurin
        java-version: ${{ github.event.inputs.jdk }}
    - name: Cache TPC-DS generated data
      if: contains(github.event.inputs.class, 'TPCDSQueryBenchmark') || contains(github.event.inputs.class, '*')
      id: cache-tpcds-sf-1
      uses: actions/cache@v3
      with:
        path: ./tpcds-sf-1
        key: tpcds-${{ hashFiles('.github/workflows/benchmark.yml', 'sql/core/src/test/scala/org/apache/spark/sql/TPCDSSchema.scala') }}
    - name: Run benchmarks
      run: |
        dev/change-scala-version.sh ${{ github.event.inputs.scala }}
        ./build/sbt -Pscala-${{ github.event.inputs.scala }} -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pspark-ganglia-lgpl Test/package
        # Make less noisy
        cp conf/log4j2.properties.template conf/log4j2.properties
        sed -i 's/rootLogger.level = info/rootLogger.level = warn/g' conf/log4j2.properties
        # In benchmark, we use local as master so set driver memory only. Note that GitHub Actions has 7 GB memory limit.
        bin/spark-submit \
          --driver-memory 6g --class org.apache.spark.benchmark.Benchmarks \
          --jars "`find . -name '*-SNAPSHOT-tests.jar' -o -name '*avro*-SNAPSHOT.jar' | paste -sd ',' -`" \
          "`find . -name 'spark-core*-SNAPSHOT-tests.jar'`" \
          "${{ github.event.inputs.class }}"
        # Revert to default Scala version to clean up unnecessary git diff
        dev/change-scala-version.sh 2.12
        # To keep the directory structure and file permissions, tar them
        # See also https://github.com/actions/upload-artifact#maintaining-file-permissions-and-case-sensitive-files
        echo "Preparing the benchmark results:"
        tar -cvf benchmark-results-${{ github.event.inputs.jdk }}-${{ github.event.inputs.scala }}.tar `git diff --name-only` `git ls-files --others --exclude=tpcds-sf-1 --exclude-standard`
    - name: Upload benchmark results
      uses: actions/upload-artifact@v3
      with:
        name: benchmark-results-${{ github.event.inputs.jdk }}-${{ github.event.inputs.scala }}-${{ matrix.split }}
        path: benchmark-results-${{ github.event.inputs.jdk }}-${{ github.event.inputs.scala }}.tar
