Added REST API as a data source #1

GeorgelPreput · 2024-03-05T16:20:18Z

As per the recent Champions call, I've adapted my existing REST API data source to the PySpark Data Sources framework.

Currently supported options:

Sending headers
Authentication: none, basic, bearer token (via headers), or OAuth2
Both GET and POST
Sending parameters
Sending text data
Sending JSON data

The output DataFrame contains both the contents of the request and the actual response, whether that is text data, JSON data, or an error.

Known issues:

Including the protocol (http/https) in the call to .load("https://url-goes-here") makes Spark assume that the source is Delta (at least in Databricks). The URL passed to .load() must therefore omit the protocol, which can instead be set through an optional parameter: .option("protocol", "http"). This option defaults to https.
Since both the request and the response are saved, a further TODO would be to redact any tokens and client secrets from the output. I will add that upon request.
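
The two known issues above can be sketched in plain Python. This is a minimal illustration, not the code in this PR: the helper names build_url and redact, the SENSITIVE_KEYS set, and the mask value are all hypothetical. Only the "protocol" option and its https default come from the description above.

```python
from typing import Mapping

# Assumption: key names that commonly carry credentials. The real data
# source may use different header or option names.
SENSITIVE_KEYS = {"authorization", "client_secret", "access_token", "password", "token"}


def build_url(path: str, options: Mapping[str, str]) -> str:
    """Prepend the scheme from the 'protocol' option to a scheme-less URL."""
    # Default to https, matching the implicit default described above.
    protocol = options.get("protocol", "https")
    if "://" in path:
        # A scheme in the path is what triggers the Delta misdetection.
        raise ValueError("pass the URL without a protocol; use the 'protocol' option")
    return f"{protocol}://{path}"


def redact(values: Mapping[str, str], mask: str = "***") -> dict:
    """Return a copy of the mapping with credential-like values masked."""
    return {
        key: mask if key.lower() in SENSITIVE_KEYS else value
        for key, value in values.items()
    }
```

With helpers like these, .option("protocol", "http") followed by .load("url-goes-here") would resolve to http://url-goes-here, and request headers could be passed through redact before being written to the output DataFrame.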

allisonwang-db

This is awesome! Thanks for adding it 👍

pyspark_datasources/restapi.py

GeorgelPreput · 2024-03-12T13:27:45Z

Is there anything else I should add / modify?

Victor Blaga added 2 commits March 5, 2024 15:55

Downgraded requests version to match DBR 14.3 LTS

853dc53

Added REST API as a data source

231d725

allisonwang-db reviewed Mar 6, 2024

pyspark_datasources/restapi.py (resolved)

pyspark_datasources/restapi.py (outdated, resolved)

pyspark_datasources/restapi.py (outdated, resolved)

Victor Blaga added 3 commits March 7, 2024 12:01

PR issues addressed.

3860d98

Extra test for POST request

7b61b38

Updated tests

5bb930b
