[#1824] Merging Delta Connector code into this repo with full commit history (#1837)

## Description

In this PR, I am merging the entire codebase of the Delta Connectors
repository into the connectors/ subdirectory. I have maintained the full
commit history up to the current connectors master
(https://github.com/delta-io/connectors/commit/47ae5a3540d3e9400b8140e460b74e09343b0497).

This is the first step in the process of unifying these two repos. See #1824.

## How was this patch tested?
Not yet tested. The test infra is unable to handle a diff this large.
I am just merging the code. I have ensured that all the changes in
this PR are in the connectors/ directory (which did not exist before)
and therefore will not affect any existing code. I plan to merge this PR
and then make follow-up PRs to get it integrated into the main build and
tested.
tdas committed Jun 20, 2023
2 parents 0bec328 + 6278da5 commit 31cee97
Showing 1,943 changed files with 303,657 additions and 0 deletions.
16 changes: 16 additions & 0 deletions connectors/.github/workflows/new_pull_request.yaml
@@ -0,0 +1,16 @@
name: Add new pull requests to Backlog (External)

on:
  pull_request_target:
    types: [opened, reopened]

jobs:
  automate-new-pull-requests:
    if: ${{ !contains('allisonport-db dennyglee scottsand-db tdas zsxwing', github.event.sender.login) }}
    runs-on: ubuntu-latest
    steps:
      - uses: alex-page/github-project-automation-plus@v0.8.1
        with:
          project: oss-delta-prs
          column: Needs Review
          repo-token: ${{ secrets.PROJECT_BOARD_AUTOMATION_TOKEN }}
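The `if:` guard in this workflow skips the job for maintainers by testing the sender login against a space-separated string. In GitHub Actions expressions, `contains` applied to a string is a case-insensitive substring test, so a login that happens to be a substring of a listed name would also be treated as a maintainer. A minimal Python sketch of that behaviour follows; `actions_contains` and `should_add_to_board` are hypothetical helpers written for illustration and are not part of the workflow or of any GitHub API.

```python
# Same space-separated list that appears in the workflow's `if:` guard.
MAINTAINERS = "allisonport-db dennyglee scottsand-db tdas zsxwing"


def actions_contains(search, item):
    """Approximate GitHub Actions contains() for a string `search`:
    a case-insensitive substring test (illustrative only)."""
    return item.lower() in search.lower()


def should_add_to_board(sender_login):
    """Mirror the guard: run the job only when the sender is not matched."""
    return not actions_contains(MAINTAINERS, sender_login)


print(should_add_to_board("tdas"))              # listed maintainer: job skipped
print(should_add_to_board("some-contributor"))  # external sender: job runs
print(should_add_to_board("das"))               # substring of "tdas": job skipped
```

If the substring match is undesirable, an array literal built with `fromJSON` makes `contains` compare whole elements instead; whether that matters here depends on which logins actually interact with the repository.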
19 changes: 19 additions & 0 deletions connectors/.github/workflows/new_updated_issue.yaml
@@ -0,0 +1,19 @@
name: Add new and updated issues to Needs Review

on:
  issues:
    types: [opened, reopened]
  issue_comment:
    types: [created]


jobs:
  automate-new-updated-issues:
    if: ${{ !github.event.issue.pull_request && !contains('allisonport-db dennyglee scottsand-db tdas zsxwing', github.event.sender.login) }}
    runs-on: ubuntu-latest
    steps:
      - uses: alex-page/github-project-automation-plus@v0.8.1
        with:
          project: oss-delta-issues
          column: Needs Review
          repo-token: ${{ secrets.PROJECT_BOARD_AUTOMATION_TOKEN }}
43 changes: 43 additions & 0 deletions connectors/.github/workflows/test.yaml
@@ -0,0 +1,43 @@
name: "Delta Lake Connectors Tests"
on: [push, pull_request]
jobs:
  build:
    name: "Run tests"
    runs-on: ubuntu-20.04
    strategy:
      matrix:
        scala: [2.13.8, 2.12.8, 2.11.12]
    steps:
      - uses: actions/checkout@v2
      - name: install java
        uses: actions/setup-java@v2
        with:
          distribution: 'zulu'
          java-version: '8'
      - name: Cache Scala, SBT
        uses: actions/cache@v2
        with:
          path: |
            ~/.sbt
            ~/.ivy2
            ~/.cache/coursier
            ~/.m2
          key: build-cache-3-with-scala_${{ matrix.scala }}
      - name: Run Scala Style tests on test sources (Scala 2.12 only)
        run: build/sbt "++ ${{ matrix.scala }}" testScalastyle
        if: startsWith(matrix.scala, '2.12.')
      - name: Run sqlDeltaImport tests (Scala 2.12 and 2.13 only)
        run: build/sbt "++ ${{ matrix.scala }}" sqlDeltaImport/test
        if: ${{ !startsWith(matrix.scala, '2.11.') }}
      - name: Run Delta Standalone Compatibility tests (Scala 2.12 only)
        run: build/sbt "++ ${{ matrix.scala }}" compatibility/test
        if: startsWith(matrix.scala, '2.12.')
      - name: Run Delta Standalone tests
        run: build/sbt "++ ${{ matrix.scala }}" standalone/test testStandaloneCosmetic/test standaloneParquet/test testParquetUtilsWithStandaloneCosmetic/test
      - name: Run Hive 3 tests
        run: build/sbt "++ ${{ matrix.scala }}" hiveMR/test hiveTez/test
      - name: Run Hive 2 tests
        run: build/sbt "++ ${{ matrix.scala }}" hive2MR/test hive2Tez/test
      - name: Run Flink tests (Scala 2.12 only)
        run: build/sbt -mem 3000 "++ ${{ matrix.scala }}" flink/test
        if: ${{ startsWith(matrix.scala, '2.12.') }}
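The per-step `if:` conditions above mean each Scala version in the matrix runs a different subset of test steps. The following Python sketch transcribes those conditions to show which steps apply to which version. The `STEPS` table and `steps_for` function are illustrative names only, the step labels are shortened from the workflow's `name:` fields, and the setup steps (checkout, Java, cache) are omitted because they always run.

```python
# Each entry: (shortened step name, predicate on the Scala version).
# Predicates transcribe the workflow's `if:` conditions; steps without
# an `if:` always run.
STEPS = [
    ("Scala Style tests", lambda v: v.startswith("2.12.")),
    ("sqlDeltaImport tests", lambda v: not v.startswith("2.11.")),
    ("Delta Standalone Compatibility tests", lambda v: v.startswith("2.12.")),
    ("Delta Standalone tests", lambda v: True),
    ("Hive 3 tests", lambda v: True),
    ("Hive 2 tests", lambda v: True),
    ("Flink tests", lambda v: v.startswith("2.12.")),
]


def steps_for(scala_version):
    """Return the test steps whose condition holds for this Scala version."""
    return [name for name, applies in STEPS if applies(scala_version)]


for version in ["2.13.8", "2.12.8", "2.11.12"]:
    print(version, "->", steps_for(version))
```

Under this reading, only the 2.12.8 job runs every test step, while 2.11.12 is limited to the Standalone and Hive steps.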
20 changes: 20 additions & 0 deletions connectors/.github/workflows/updated_pull_request.yaml
@@ -0,0 +1,20 @@
name: Move updated pull requests to Needs Review

on:
  issue_comment:
    types: [created]
  pull_request_target:
    types: [synchronize]

jobs:
  automate-updated-pull-requests:
    if: ${{ (github.event.issue.pull_request || github.event.pull_request) &&
      !contains('allisonport-db dennyglee scottsand-db tdas zsxwing', github.event.sender.login) &&
      (github.event.pull_request.state == 'open' || github.event.issue.state == 'open') }}
    runs-on: ubuntu-latest
    steps:
      - uses: alex-page/github-project-automation-plus@2af3cf061aeca8ac6ab40a960eee1968a7f9ce0e # TODO: update to use a version after fixes are merged & released
        with:
          project: oss-delta-prs
          column: Needs Review
          repo-token: ${{ secrets.PROJECT_BOARD_AUTOMATION_TOKEN }}
110 changes: 110 additions & 0 deletions connectors/.gitignore
@@ -0,0 +1,110 @@
*#*#
*.#*
*.iml
*.ipr
*.iws
*.pyc
*.pyo
*.swp
*~
.DS_Store
.bsp
.cache
.classpath
.ensime
.ensime_cache/
.ensime_lucene
.generated-mima*
.idea/
.idea_modules/
.project
.pydevproject
.scala_dependencies
.settings
*.pbix
/lib/
R-unit-tests.log
R/unit-tests.out
R/cran-check.out
R/pkg/vignettes/sparkr-vignettes.html
R/pkg/tests/fulltests/Rplots.pdf
build/*.jar
build/apache-maven*
build/scala*
build/zinc*
cache
conf/*.cmd
conf/*.conf
conf/*.properties
conf/*.sh
conf/*.xml
conf/java-opts
conf/slaves
dependency-reduced-pom.xml
derby.log
dev/create-release/*final
dev/create-release/*txt
dev/pr-deps/
dist/
docs/_site
docs/api
sql/docs
sql/site
lib_managed/
lint-r-report.log
log/
logs/
out/
project/boot/
project/build/target/
project/plugins/lib_managed/
project/plugins/project/build.properties
project/plugins/src_managed/
project/plugins/target/
python/lib/pyspark.zip
python/deps
docs/python/_static/
docs/python/_templates/
docs/python/_build/
python/test_coverage/coverage_data
python/test_coverage/htmlcov
python/pyspark/python
reports/
scalastyle-on-compile.generated.xml
scalastyle-output.xml
scalastyle.txt
spark-*-bin-*.tgz
spark-tests.log
src_managed/
streaming-tests.log
target/
unit-tests.log
work/
docs/.jekyll-metadata

# For Hive
TempStatsStore/
metastore/
metastore_db/
sql/hive-thriftserver/test_warehouses
warehouse/
spark-warehouse/

# For R session data
.RData
.RHistory
.Rhistory
*.Rproj
*.Rproj.*

.Rproj.user

**/src/main/resources/js

# For SBT
.jvmopts

# For VS
/.vs
/obj
/bin
8 changes: 8 additions & 0 deletions connectors/AUTHORS
@@ -0,0 +1,8 @@
# This is the official list of the Delta Lake Project Authors for copyright purposes.

# Names should be added to this file as:
# Name or Organization <email address>
# The email address is not required for organizations.

Databricks
Scribd Inc
74 changes: 74 additions & 0 deletions connectors/CONTRIBUTING.md
@@ -0,0 +1,74 @@
We happily welcome contributions to Delta Lake Connectors. We use [GitHub Issues](/../../issues/) to track community-reported issues and [GitHub Pull Requests](/../../pulls/) for accepting changes.

# Governance
Delta Lake governance is conducted by the Technical Steering Committee (TSC), which is currently composed of the following members:
- Michael Armbrust (michael.armbrust@gmail.com)
- Reynold Xin (reynoldx@gmail.com)
- Matei Zaharia (matei@cs.stanford.edu)

The founding technical charter can be found [here](https://delta.io/pdfs/delta-charter.pdf).

# Communication
Before starting work on a major feature, please reach out to us via GitHub, Slack, email, etc. We will make sure no one else is already working on it and ask you to open a GitHub issue.
A "major feature" is defined as any change that alters more than 100 LOC (not including tests) or changes any user-facing behavior.
We will use the GitHub issue to discuss the feature and come to agreement.
This is to prevent your time being wasted, as well as ours.
The GitHub review process for major features is also important so that organizations with commit access can come to agreement on design.
If it is appropriate to write a design document, the document must be hosted either in the GitHub tracking issue, or linked to from the issue and hosted in a world-readable location.
Specifically, if the goal is to add a new extension, please read the extension policy.
Small patches and bug fixes don't need prior communication.

# Coding style
We generally follow the Apache Spark Scala Style Guide.

# Sign your work
The sign-off is a simple line at the end of the explanation for the patch. Your signature certifies that you wrote the patch or otherwise have the right to pass it on as an open-source patch. The rules are pretty simple: if you can certify the below (from developercertificate.org):

```
Developer Certificate of Origin
Version 1.1
Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129
Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.
Developer's Certificate of Origin 1.1
By making a contribution to this project, I certify that:
(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or
(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or
(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.
(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.
```

Then you just add a line to every git commit message:

```
Signed-off-by: Joe Smith <joe.smith@email.com>
```

Use your real name (sorry, no pseudonyms or anonymous contributions).

If you set your `user.name` and `user.email` git configs, you can sign your commit automatically with `git commit -s`.
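`git commit -s` appends a `Signed-off-by:` line built from the configured `user.name` and `user.email`. As a rough illustration of what a well-formed sign-off looks like, the Python sketch below checks a commit message for such a line. The `has_sign_off` helper and its regular expression are assumptions made for this example; they are not a tool shipped with this project, and the pattern is deliberately loose.

```python
import re

# Loose, illustrative pattern: "Signed-off-by: <name> <email>" on its own line.
SIGN_OFF = re.compile(r"^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>$")


def has_sign_off(message):
    """Return True if any line of the commit message is a sign-off line."""
    return any(SIGN_OFF.match(line.strip()) for line in message.splitlines())


signed = "Fix typo in README\n\nSigned-off-by: Joe Smith <joe.smith@email.com>"
unsigned = "Fix typo in README"

print(has_sign_off(signed))
print(has_sign_off(unsigned))
```

A check like this could run locally before pushing, but it only inspects the text of the message and says nothing about whether the name is real.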