Commit

Merge pull request #171 from teamclairvoyant/REST-45-RB
[REST-45] Added code for supporting compression in response body
rahulbhatia023 authored Jan 5, 2024
2 parents 38c3fd1 + 967c905 commit 291f1c2
Showing 17 changed files with 336 additions and 60 deletions.
12 changes: 6 additions & 6 deletions site/docs/config_classes/checkpoint_config.md
@@ -6,12 +6,12 @@ trigger a checkpoint.

The checkpoint configuration accepts the following options, to be provided by the user:

| Config Name | Mandatory | Default Value | Description |
|:-------------|:---------:|:-------------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| name | Yes | - | Unique name for your checkpoint |
| token | No | - | Token request configuration represented by `TokenConfig` class |
| data | Yes | - | Main data request configuration represented by `DataConfig` class |
| sparkConfigs | No | - | Map of spark configurations specific to the checkpoint. <br>If the same config is also present in `application.conf` file, then checkpoint specific config gets the priority. |
| Config Name | Mandatory | Default Value | Description |
|:-------------|:---------:|:-------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| name | Yes | - | Unique name for your checkpoint |
| token | No | - | Token request configuration represented by `TokenConfig` class |
| data | Yes | - | Main data request configuration represented by `DataConfig` class |
| sparkConfigs | No | - | Map of spark configurations specific to the checkpoint. <br/>If the same config is also present in `application.conf` file, then checkpoint specific config gets the priority. |
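
The precedence rule for `sparkConfigs` can be pictured as a simple map merge. The following is a minimal Python sketch, not Restonomer's actual implementation; the function name `effective_spark_configs` and the sample values are assumptions made for illustration only:

```python
def effective_spark_configs(application_conf, checkpoint_conf):
    """Merge two config maps; checkpoint-specific values win on conflicts."""
    merged = dict(application_conf)   # start from the application-wide settings
    merged.update(checkpoint_conf)    # checkpoint-specific settings override them
    return merged


# Hypothetical values: the same key appears in both places.
app_level = {"spark.sql.shuffle.partitions": "200", "spark.app.name": "restonomer"}
checkpoint_level = {"spark.sql.shuffle.partitions": "8"}

effective = effective_spark_configs(app_level, checkpoint_level)
```

Here `effective` keeps `spark.app.name` from the application-level map and takes `spark.sql.shuffle.partitions` from the checkpoint-level map.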

Users can provide the checkpoint configuration file in HOCON format, as shown below:

2 changes: 1 addition & 1 deletion site/docs/config_classes/data_response_config.md
@@ -6,7 +6,7 @@ User need to provide below configs for Data Response Configuration:

| Config Name | Mandatory | Default Value | Description |
|:----------------|:---------:|:-------------:|:----------------------------------------------------------------------------------------------------|
| body | Yes | - | The body config represented by `DataResponseBodyConfig` |
| body | Yes | - | The body config represented by `RestonomerResponseBody`. |
| transformations | No | - | List of transformations to be applied on the restonomer response dataframe |
| persistence | Yes | - | The persistence attribute that tells where to persist the transformed restonomer response dataframe |

3 changes: 2 additions & 1 deletion site/docs/persistence/redshift_persistence.md
@@ -116,7 +116,8 @@ User can pass below options to the `RedshiftWriterOptions` instance:
<td>No</td>
<td>-</td>
<td>
<p>A description for the table. Will be set using the SQL COMMENT command, and should show up in most query tools. See also the description metadata to set descriptions on individual columns.
<p>A description for the table. Will be set using the SQL COMMENT command, and should show up in most query tools. See also the description metadata to set descriptions on individual columns.</p>
</td>
</tr>
<tr>
<td>pre-actions</td>
39 changes: 38 additions & 1 deletion site/docs/response_body/text/csv_text.md
@@ -30,10 +30,47 @@ data = {
}
```

## Compression

If the CSV text returned by the API is compressed, the user can configure the checkpoint in the format below:

```hocon
name = "checkpoint_csv_response_dataframe_converter"
data = {
data-request = {
url = "http://localhost:8080/csv-response-converter"
}
data-response = {
body = {
type = "Text"
compression = "GZIP"
text-format = {
type = "CSVTextFormat"
sep = ";"
}
}
persistence = {
type = "LocalFileSystem"
file-format = {
type = "ParquetFileFormat"
}
file-path = "/tmp/response_body"
}
}
}
```

Currently, Restonomer supports only the `GZIP` compression format.
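
To illustrate what `compression = "GZIP"` implies for a CSV body, here is a minimal Python sketch of the decompress-then-parse order. It is not how Restonomer does this internally, and it returns plain dictionaries rather than a Spark dataframe. It assumes a UTF-8 payload; the function name `parse_gzip_csv` and the sample data are hypothetical:

```python
import csv
import gzip
import io


def parse_gzip_csv(body, sep=";"):
    """Decompress a GZIP-compressed CSV payload and return its rows as dicts."""
    text = gzip.decompress(body).decode("utf-8")   # undo the GZIP compression first
    reader = csv.DictReader(io.StringIO(text), delimiter=sep)
    return list(reader)                            # the first line is used as the header


# Hypothetical compressed response body using ";" as the separator.
sample = gzip.compress("col_A;col_B\n1;2\n".encode("utf-8"))
rows = parse_gzip_csv(sample)
```

The separator passed to the parser plays the same role as `sep` in the `text-format` block above: it only takes effect after the body has been decompressed.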

## CSV Text Format Configurations

Besides `sep`, the user can configure the following properties for the CSV text format to control how Restonomer parses the response:

| Parameter Name | Default Value | Description |
| :-------------------------------- | :-------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|:----------------------------------|:---------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| char-to-escape-quote-escaping | \ | Sets a single character used for escaping the escape for the quote character. |
| column-name-of-corrupt-record | _corrupt_record | Allows renaming the new field having malformed string created by PERMISSIVE mode. <br/>This overrides `spark.sql.columnNameOfCorruptRecord`. |
| comment | # | Sets a single character used for skipping lines beginning with this character. |
41 changes: 39 additions & 2 deletions site/docs/response_body/text/html_text.md
@@ -1,6 +1,7 @@
# HTML Table

Restonomer can parse the api response of text type in HTML table format. User need to configure the checkpoint in below format:
Restonomer can parse a text API response in HTML table format. Users need to configure the checkpoint in the format
below:

```hocon
name = "checkpoint_html_response_dataframe_converter"
@@ -29,8 +30,44 @@ data = {
}
```

## Compression

If the HTML text returned by the API is compressed, the user can configure the checkpoint in the format below:

```hocon
name = "checkpoint_html_response_dataframe_converter"
data = {
data-request = {
url = "http://localhost:8080/html-response-converter"
}
data-response = {
body = {
type = "Text"
compression = "GZIP"
text-format = {
type = "HTMLTableTextFormat"
}
}
persistence = {
type = "LocalFileSystem"
file-format = {
type = "ParquetFileFormat"
}
file-path = "/tmp/response_body"
}
}
}
```

Currently, Restonomer supports only the `GZIP` compression format.

## HTML Table Text Format Configurations

The user can configure the following properties for the HTML table text format to control how Restonomer parses the response:

| Parameter Name | Default Value | Mandatory | Description |
| :------------- | :-----------: | :-------: | :---------------------------------------------------------------------------- |
|:---------------|:-------------:|:---------:|:------------------------------------------------------------------------------|
| tableName | None | No | The name of the table in the `table` tag that you want to read the data from. |
45 changes: 42 additions & 3 deletions site/docs/response_body/text/json_text.md
@@ -1,6 +1,7 @@
# JSON

Restonomer can parse the api response of text type in JSON format. User need to configure the checkpoint in below format:
Restonomer can parse a text API response in JSON format. Users need to configure the checkpoint in the format
below:

```hocon
name = "checkpoint_json_response_dataframe_converter"
@@ -30,10 +31,48 @@ data = {
}
```

Just like `primitives-as-string`, user can configure below other properties for JSON text format that will help restonomer for parsing:
## Compression

If the JSON text returned by the API is compressed, the user can configure the checkpoint in the format below:

```hocon
name = "checkpoint_json_response_dataframe_converter"
data = {
data-request = {
url = "http://localhost:8080/json-response-converter"
}
data-response = {
body = {
type = "Text"
compression = "GZIP"
text-format = {
type = "JSONTextFormat"
primitives-as-string = true
}
}
persistence = {
type = "LocalFileSystem"
file-format = {
type = "ParquetFileFormat"
}
file-path = "/tmp/response_body"
}
}
}
```

Currently, Restonomer supports only the `GZIP` compression format.

## JSON Text Format Configurations

Besides `primitives-as-string`, the user can configure the following properties for the JSON text format to control
how Restonomer parses the response:

| Parameter Name | Default Value | Description |
| :------------------------------------- | :-------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------- |
|:---------------------------------------|:---------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------|
| allow-backslash-escaping-any-character | false | Allows accepting quoting of all character using backslash quoting mechanism. |
| allow-comments | false | Ignores Java/C++ style comment in JSON records. |
| allow-non-numeric-numbers | true | Allows JSON parser to recognize set of “Not-a-Number” (NaN) tokens as legal floating number values. |
42 changes: 40 additions & 2 deletions site/docs/response_body/text/xml_text.md
@@ -30,10 +30,48 @@ data = {
}
```

Just like `row-tag`, user can configure below other properties for XML text format that will help restonomer for parsing:
## Compression

If the XML text returned by the API is compressed, the user can configure the checkpoint in the format below:

```hocon
name = "checkpoint_xml_response_dataframe_converter"
data = {
data-request = {
url = "http://localhost:8080/xml-response-converter"
}
data-response = {
body = {
type = "Text"
compression = "GZIP"
text-format = {
type = "XMLTextFormat"
row-tag = "ROW"
}
}
persistence = {
type = "LocalFileSystem"
file-format = {
type = "ParquetFileFormat"
}
file-path = "/tmp/response_body"
}
}
}
```

Currently, Restonomer supports only the `GZIP` compression format.
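
As a rough illustration of how decompression and `row-tag` fit together, the Python sketch below decompresses a GZIP payload and treats every element named by the row tag as one record. It is only a sketch under stated assumptions, not Restonomer's implementation; the function name `parse_gzip_xml_rows` and the sample document are hypothetical:

```python
import gzip
import xml.etree.ElementTree as ET


def parse_gzip_xml_rows(body, row_tag="ROW"):
    """Decompress a GZIP-compressed XML payload and return one dict per row element."""
    root = ET.fromstring(gzip.decompress(body))    # decompress, then parse the XML bytes
    # Each element whose tag equals row_tag becomes a record keyed by its child tags.
    return [{child.tag: child.text for child in row} for row in root.iter(row_tag)]


# Hypothetical compressed response body with a single <ROW> element.
sample = gzip.compress(b"<ROWS><ROW><col_A>1</col_A><col_B>2</col_B></ROW></ROWS>")
rows = parse_gzip_xml_rows(sample)
```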

## XML Text Format Configurations

Besides `row-tag`, the user can configure the following properties for the XML text format to control how Restonomer
parses the response:

| Parameter Name | Default Value | Description |
| :---------------------------- | :-----------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|:------------------------------|:-------------------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| attribute-prefix | _ | The prefix for attributes so that we can differentiate attributes and elements. |
| charset | UTF-8 | Defaults to 'UTF-8' but can be set to other valid charset names. |
| column-name-of-corrupt-record | _corrupt_record | Allows renaming the new field having malformed string created by PERMISSIVE mode. This overrides spark.sql.columnNameOfCorruptRecord. |
Binary file not shown.
@@ -0,0 +1,26 @@
[
{
"Product line": "Aparatos de cuidado personal",
"Seller": "Amazon.es",
"Tracking ID": "eselmundo-21",
"Date shipped": "October 30, 2021",
"Price": 41.24,
"Referral fee rate": "8.00%",
"Quantity": 1,
"Revenue": 41.24,
"Earnings": 3.3,
"Sub Tag": "-"
},
{
"Product line": "Aparatos &amp; cuidado personal",
"Seller": "Amazon.es",
"Tracking ID": "eselmundo-21",
"Date shipped": "October 30, 2021",
"Price": 82.3,
"Referral fee rate": "8.00%",
"Quantity": 1,
"Revenue": 82.3,
"Earnings": 4.5,
"Sub Tag": "-"
}
]
@@ -0,0 +1,10 @@
{
"request": {
"method": "GET",
"url": "/gzip-csv-response-converter"
},
"response": {
"status": 200,
"bodyFileName": "sample_gzip_csv_response_body.gz"
}
}
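
The stub mapping above serves a pre-compressed body file. The commit does not show how `sample_gzip_csv_response_body.gz` was produced, so the following Python sketch is only one possible way to generate a comparable fixture; the helper `write_gzip_fixture`, the temporary directory and the CSV content are all assumptions:

```python
import gzip
import os
import tempfile


def write_gzip_fixture(path, text):
    """Write text to a GZIP-compressed file, keeping line endings unchanged."""
    with gzip.open(path, "wt", encoding="utf-8", newline="") as f:
        f.write(text)


# Hypothetical fixture content written to a temporary directory.
fixture_path = os.path.join(tempfile.mkdtemp(), "sample_gzip_csv_response_body.gz")
write_gzip_fixture(fixture_path, "col_A,col_B\n1,2\n")

# Read the file back to confirm that it round-trips.
with gzip.open(fixture_path, "rt", encoding="utf-8", newline="") as f:
    round_trip = f.read()
```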
@@ -0,0 +1,25 @@
name = "checkpoint_gzip_csv_response_dataframe_converter"

data = {
data-request = {
url = "http://localhost:8080/gzip-csv-response-converter"
}

data-response = {
body = {
type = "Text"
text-format = {
type = "CSVTextFormat"
}
compression = "GZIP"
}

persistence = {
type = "LocalFileSystem"
file-format = {
type = "ParquetFileFormat"
}
file-path = "/tmp/response_body"
}
}
}
@@ -6,11 +6,6 @@ data = {
}

data-response = {
body = {
type = "JSON"
primitives-as-string = true
}

body = {
type = "Text"
text-format = {