Create databricks doc page (#35)
* Create databricks doc page
* Fix reference to host and cluster id
* Update ADF docs
* schedule_interval -> schedule
* Remove k8 reference
* Alpha order sidebar
* Update datahub docs
* Fix Alpha order of Notifications
* Change Data Set to dataset

---------

Co-authored-by: Mayra Pena <mayraapena2016@gmail.com>
Co-authored-by: Noel Gomez <noel_gomez@yahoo.com>
3 people authored Sep 11, 2024
1 parent 621a0e7 commit fb316ed
Showing 25 changed files with 356 additions and 139 deletions.
64 changes: 33 additions & 31 deletions docs/_sidebar.md
@@ -13,31 +13,24 @@
 - **Diving Deeper**
   - [How to](/how-tos/)
     - [Airflow](/how-tos/airflow/)
-      - [Initial setup](/how-tos/airflow/initial-setup.md)
-      - [Run dbt](/how-tos/airflow/run-dbt.md)
-      - [Generate DAGs from yml](/how-tos/airflow/generate-dags-from-yml.md)
-      - [Calling External Python Scripts](/how-tos/airflow/external-python-dag.md)
-      - [Use Variables and Connections](/how-tos/airflow/use-variables-and-connections.md)
-      - [Dynamically Set Schedule](/how-tos/airflow/dynamically-set-schedule.md)
-      - [Run Airbyte sync jobs](/how-tos/airflow/run-airbyte-sync-jobs.md)
-      - [Run Fivetran sync jobs](/how-tos/airflow/run-fivetran-sync-jobs.md)
-      - [Add Dag Documentation](/how-tos/airflow/create-dag-level-docs.md)
-      - [Send Emails](/how-tos/airflow/send-emails.md)
-      - [Send Microsoft Teams notifications](/how-tos/airflow/send-ms-teams-notifications.md)
-      - [Send Slack notifications](/how-tos/airflow/send-slack-notifications.md)
-      - [Get Current Branch Name from a DAG Task](/how-tos/airflow/get-current-branch-name.md)
-      - [Custom Worker Environment](/how-tos/airflow/customize-worker-environment.md)
-      - [Request Memory and CPU](/how-tos/airflow/request-resources-on-workers.md)
-      - [Sync Airflow database](/how-tos/airflow/sync-database.md)
-    - [VS Code](/how-tos/vscode/)
-      - [Initial Configuration](/how-tos/vscode/initial.md)
-      - [BigQuery](/how-tos/vscode/bigquery_setup.md)
-      - [Databricks](/how-tos/vscode/databricks_setup.md)
-      - [Redshift](/how-tos/vscode/redshift_setup.md)
-      - [Snowflake](/how-tos/vscode/snowflake_setup.md)
-      - [Override VS Code settings](/how-tos/vscode/override.md)
-      - [Reset User Env](/how-tos/vscode/reset-user-env.md)
-      - [Reset Git](how-tos/vscode/reset-git.md)
+      - [Airflow - Initial setup](/how-tos/airflow/initial-setup.md)
+      - [Airflow - Sync Airflow database](/how-tos/airflow/sync-database.md)
+      - [DAGs - Add Dag Documentation](/how-tos/airflow/create-dag-level-docs.md)
+      - [DAGs - Calling External Python Scripts](/how-tos/airflow/external-python-dag.md)
+      - [DAGs - Dynamically Set Schedule](/how-tos/airflow/dynamically-set-schedule.md)
+      - [DAGs - Generate DAGs from yml](/how-tos/airflow/generate-dags-from-yml.md)
+      - [DAGs - Run ADF Pipelines](/how-tos/airflow/run-adf-pipeline.md)
+      - [DAGs - Run Airbyte sync jobs](/how-tos/airflow/run-airbyte-sync-jobs.md)
+      - [DAGs - Run dbt](/how-tos/airflow/run-dbt.md)
+      - [DAGs - Run Databricks Notebooks](/how-tos/airflow/run-databricks-notebook.md)
+      - [DAGs - Run Fivetran sync jobs](/how-tos/airflow/run-fivetran-sync-jobs.md)
+      - [DAGs - Use Variables and Connections](/how-tos/airflow/use-variables-and-connections.md)
+      - [Git - Get Current Branch Name from a DAG Task](/how-tos/airflow/get-current-branch-name.md)
+      - [Notifications - Send Emails](/how-tos/airflow/send-emails.md)
+      - [Notifications - Send Microsoft Teams notifications](/how-tos/airflow/send-ms-teams-notifications.md)
+      - [Notifications - Send Slack notifications](/how-tos/airflow/send-slack-notifications.md)
+      - [Worker - Custom Worker Environment](/how-tos/airflow/customize-worker-environment.md)
+      - [Worker - Request Memory and CPU](/how-tos/airflow/request-resources-on-workers.md)
     - [Datacoves](/how-tos/datacoves/)
       - [Configure Connection Templates](/how-tos/datacoves/how_to_connection_template.md)
       - [Configure Environments](/how-tos/datacoves/how_to_environments.md)
@@ -48,9 +41,6 @@
       - [Configure Service Connections](/how-tos/datacoves/how_to_service_connections.md)
       - [Manage Users](/how-tos/datacoves/how_to_manage_users.md)
       - [Update Repository](/getting-started/Admin/configure-repository.md)
-    - [Superset](/how-tos/superset/)
-      - [Add a Database](/how-tos/superset/how_to_database.md)
-      - [Add a Data Set](/how-tos/superset/how_to_data_set.md)
     - [Datahub](/how-tos/datahub/)
       - [Manage datahub using CLI](/how-tos/datahub/how_to_datahub_cli.md)
     - [DataOps](/how-tos/dataops/)
@@ -59,6 +49,18 @@
       - [SSH Keys configuration](/how-tos/git/ssh-keys)
     - [Snowflake](/how-tos/snowflake/)
       - [Warehouses, Schemas and Roles](/how-tos/snowflake/warehouses-schemas-roles)
+    - [Superset](/how-tos/superset/)
+      - [Add a Database](/how-tos/superset/how_to_database.md)
+      - [Add a Dataset](/how-tos/superset/how_to_data_set.md)
+    - [VS Code](/how-tos/vscode/)
+      - [Initial Configuration](/how-tos/vscode/initial.md)
+      - [BigQuery](/how-tos/vscode/bigquery_setup.md)
+      - [Databricks](/how-tos/vscode/databricks_setup.md)
+      - [Redshift](/how-tos/vscode/redshift_setup.md)
+      - [Snowflake](/how-tos/vscode/snowflake_setup.md)
+      - [Override VS Code settings](/how-tos/vscode/override.md)
+      - [Reset User Env](/how-tos/vscode/reset-user-env.md)
+      - [Reset Git](how-tos/vscode/reset-git.md)
   - [Explanation](/explanation/)
     - [Best Practices](/explanation/best-practices/)
       - [Datacoves](/explanation/best-practices/datacoves/)
@@ -71,7 +73,6 @@
       - [Snowflake](/explanation/best-practices/snowflake/)
        - [Security Model](/explanation/best-practices/snowflake/security-model)
        - [GDPR and Time-Travel](/explanation/best-practices/snowflake/time-travel)
-  - [Tutorials](/tutorials/)
   - [Reference](/reference/)
     - [Administration Menu](reference/admin-menu/)
       - [Account Settings & Billing](/reference/admin-menu/settings_billing.md)
@@ -89,11 +90,12 @@
       - [Datacoves Operators](/reference/airflow/datacoves-operator.md)
     - [Datacoves](/reference/datacoves/)
       - [VPC Deployment](/reference/datacoves/vpc-deployment.md)
-    - [VS Code](/reference/vscode/)
-      - [Datacoves Environment Variables](/reference/vscode/datacoves-env-vars.md)
     - [Metrics & Logs](/reference/metrics-and-logs/)
       - [Grafana](/reference/metrics-and-logs/grafana.md)
     - [Security](/reference/security/)
+    - [VS Code](/reference/vscode/)
+      - [Datacoves Environment Variables](/reference/vscode/datacoves-env-vars.md)
+  - [Tutorials](/tutorials/)
 - **Platform**
   - [Status Tracker](https://datacoves.statuspage.io/)
   - [SLA](sla.md)
@@ -65,4 +65,4 @@ The `visualization` folder is used to place configs related to superset or other
The `visualization/streamlit` folder is used for Streamlit apps. This folder is only needed if using Streamlit.

### .vscode/settings.json
The `.vscode/settings.json` file is used for customized settings that override the default workspace settings. This file can contain secrets, so be sure to add it to `.gitignore` to keep it out of version control. See our [How to Override default VS Code settings](how-tos/vscode/override.md) for more info.
2 changes: 1 addition & 1 deletion docs/getting-started/Admin/create-account.md
@@ -42,5 +42,5 @@ To ensure a smooth call, please have the answers to the following questions read
 - What do you want to call your account? (This is usually the company name)
 - What do you want to call your project? (This can be something like Marketing DW, Finance 360, etc)
 - Do you currently have a CI/CD process and associated script like GitHub Actions workflow? If not, do you plan on creating a CI/CD process?
-- Do you need any specific python library on Airflow or VS code? (outside the standard dbt related items)
+- Do you need any specific python library on Airflow or VS Code? (outside the standard dbt related items)

2 changes: 1 addition & 1 deletion docs/getting-started/developer/snowflake-extension.md
@@ -8,7 +8,7 @@
 - Autocomplete
 - Running Queries
 
-For more information, please see the **[Snowflake VSCode Extension Docs](https://docs.snowflake.com/en/user-guide/vscode-ext)**
+For more information, please see the **[Snowflake VS Code Extension Docs](https://docs.snowflake.com/en/user-guide/vscode-ext)**

<div style="position: relative; padding-bottom: 56.25%; height: 0;"><iframe src="https://www.loom.com/embed/96272782ea2b4639b8372a0ec85c9268?sid=68867e61-005a-4a6a-9863-0fb3728ef6c2" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>

48 changes: 22 additions & 26 deletions docs/how-tos/airflow/customize-worker-environment.md
@@ -24,10 +24,8 @@ TRANSFORM_CONFIG = {
```python
         spec=k8s.V1PodSpec(
             containers=[
                 k8s.V1Container(
-                    name="transform",
-                    # Replace with your image repo and tag
+                    name="base",
                     image="<IMAGE REPO>:<IMAGE TAG>",
-                    bash_command="echo SUCCESS!",
                 )
             ]
         )
```
@@ -46,43 +44,41 @@
```python
     schedule_interval="0 0 1 */12 *",
     tags=["version_2"],
     catchup=False,
-    yaml_sample_dag={
-        "schedule_interval": "0 0 1 */12 *",
-        "tags": ["version_4"],
-        "catchup": False,
-        "default_args": {
-            "start_date": datetime.datetime(2023, 1, 1, 0, 0),
-            "owner": "airflow",
-            "email": "some_user@exanple.com",
-            "email_on_failure": True,
-        },
-    },
 )
-def custommize_worker_dag():
+def yaml_teams_dag():
     transform = DatacovesBashOperator(
-        task_id="transform", executor_config=TRANSFORM_CONFIG
+        task_id="transform",
+        bash_command="echo SUCCESS!",
+        executor_config=TRANSFORM_CONFIG,
     )
 
 
-dag = customize_worker_dag()
+dag = yaml_teams_dag()
 ```

### YAML version
In the yml DAG you can configure the image.

```yaml
-...
-
-# DAG Tasks
-nodes:
-  ...
-  transform:
+description: "Sample DAG with custom image"
+schedule_interval: "0 0 1 */12 *"
+tags:
+  - version_2
+default_args:
+  start_date: 2023-01-01
+  owner: Noel Gomez
+  email: gomezn@example.com
+  email_on_failure: true
+catchup: false
+
+# DAG Tasks
+nodes:
+  transform:
     operator: operators.datacoves.bash.DatacovesBashOperator
     type: task
     config:
+      # Replace with your custom docker image <IMAGE REPO>:<IMAGE TAG>
       image: <IMAGE REPO>:<IMAGE TAG>
 
     bash_command: "echo SUCCESS!"
-...
```
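For readability, here is the post-commit Python example assembled in one piece. This is a reconstruction from the added and context lines above, not a verbatim copy of the file; the import block is not visible in this diff and is assumed to match the one in `request-resources-on-workers.md` below. The rename from `transform` to `base` matters because `base` is the container name Airflow's Kubernetes executor uses for the main worker container, so a `pod_override` most likely needs to target it.

```python
import datetime

from airflow.decorators import dag
from kubernetes.client import models as k8s
from operators.datacoves.bash import DatacovesBashOperator

# Override the worker pod's main container ("base") with a custom image
TRANSFORM_CONFIG = {
    "pod_override": k8s.V1Pod(
        spec=k8s.V1PodSpec(
            containers=[
                k8s.V1Container(
                    name="base",
                    image="<IMAGE REPO>:<IMAGE TAG>",  # replace with your image repo and tag
                )
            ]
        )
    ),
}

@dag(
    default_args={"start_date": datetime.datetime(2023, 1, 1, 0, 0)},
    description="Sample DAG with custom image",
    schedule_interval="0 0 1 */12 *",
    tags=["version_2"],
    catchup=False,
)
def yaml_teams_dag():
    # The task runs on a worker pod built from the custom image
    transform = DatacovesBashOperator(
        task_id="transform",
        bash_command="echo SUCCESS!",
        executor_config=TRANSFORM_CONFIG,
    )

dag = yaml_teams_dag()
```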
4 changes: 2 additions & 2 deletions docs/how-tos/airflow/dynamically-set-schedule.md
@@ -44,7 +44,7 @@ def get_schedule(default_input: Union[str, None]) -> Union[str, None]:
 ```
 **Step 3:** In your DAG, import the `get_schedule` function using `from orchestrate.python_scripts.get_schedule import get_schedule` and pass in your desired schedule.
 
-ie) If your desired schedule is `'0 1 * * *'` then you will set `schedule_interval=get_schedule('0 1 * * *')` as seen in the example below.
+ie) If your desired schedule is `'0 1 * * *'` then you will set `schedule=get_schedule('0 1 * * *')` as seen in the example below.
 ```python
 from airflow.decorators import dag
 from operators.datacoves.bash import DatacovesBashOperator
```
@@ -66,7 +66,7 @@ from orchestrate.python_scripts.get_schedule import get_schedule
```python
     # This is a regular CRON schedule. Helpful resources
     # https://cron-ai.vercel.app/
     # https://crontab.guru/
-    schedule_interval=get_schedule('0 1 * * *'), # Replace with desired schedule
+    schedule=get_schedule('0 1 * * *'), # Replace with desired schedule
 )
 def datacoves_sample_dag():
     # Calling dbt commands
```
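The body of the `get_schedule` helper is only partially visible in this diff (its signature appears in the hunk header above). A minimal sketch of what such a helper could look like follows; the environment-variable name and the production check are illustrative assumptions, not the actual contents of `orchestrate/python_scripts/get_schedule.py`.

```python
import os
from typing import Union

def get_schedule(default_input: Union[str, None]) -> Union[str, None]:
    """Return the given cron schedule in production, otherwise None.

    Sketch only: the real helper may key off a different variable or value.
    """
    # Assumed convention: an environment variable identifies the environment,
    # so DAGs only run on a schedule in production.
    if os.environ.get("DATACOVES__ENVIRONMENT_SLUG", "").lower() == "prd":
        return default_input
    return None
```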
2 changes: 1 addition & 1 deletion docs/how-tos/airflow/external-python-dag.md
@@ -55,7 +55,7 @@ DATACOVES_VIRTUAL_ENV = "/opt/datacoves/virtualenvs/main/bin/activate"
```python
     # This is a regular CRON schedule. Helpful resources
     # https://cron-ai.vercel.app/
     # https://crontab.guru/
-    schedule_interval="0 0 1 */12 *",
+    schedule="0 0 1 */12 *",
 )
 def datacoves_sample_dag():
 
```
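To make the `DATACOVES_VIRTUAL_ENV` constant from the hunk header concrete: a task can source that virtualenv before invoking an external script. The operator arguments and the script path below are illustrative assumptions, not lines from the file.

```python
from operators.datacoves.bash import DatacovesBashOperator

DATACOVES_VIRTUAL_ENV = "/opt/datacoves/virtualenvs/main/bin/activate"

# Hypothetical task: activate the Datacoves virtualenv, then run the script
run_script = DatacovesBashOperator(
    task_id="run_external_script",
    bash_command=(
        f"source {DATACOVES_VIRTUAL_ENV} && "
        "python orchestrate/python_scripts/my_script.py"  # assumed path
    ),
)
```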
4 changes: 2 additions & 2 deletions docs/how-tos/airflow/generate-dags-from-yml.md
@@ -45,7 +45,7 @@ Let's create our first DAG using YAML.

```yml
 description: "Sample DAG for dbt build"
-schedule_interval: "0 0 1 */12 *"
+schedule: "0 0 1 */12 *"
 tags:
   - version_2
 default_args:
```
@@ -127,7 +127,7 @@ from operators.datacoves.dbt import DatacovesDbtOperator
```python
"email_on_failure": True,
},
description="Sample DAG for dbt build",
schedule_interval="0 0 1 */12 *",
schedule="0 0 1 */12 *",
tags=["version_2"],
catchup=False,
)
```
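Both hunks above apply the same rename (`schedule_interval` → `schedule`), once in the YAML definition and once in the generated Python. As a rough illustration of how the top-level YAML keys map onto `@dag(...)` keyword arguments, here is a toy sketch — not Datacoves' actual generator:

```python
import yaml  # PyYAML, a standard Airflow dependency

YML = """
description: "Sample DAG for dbt build"
schedule: "0 0 1 */12 *"
tags:
  - version_2
catchup: false
"""

config = yaml.safe_load(YML)
# Each top-level key becomes a keyword argument to the @dag decorator
dag_kwargs = {k: config[k] for k in ("description", "schedule", "tags", "catchup")}
print(dag_kwargs)
```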
35 changes: 16 additions & 19 deletions docs/how-tos/airflow/request-resources-on-workers.md
@@ -12,11 +12,11 @@ In the following example, we're requesting a minimum of 8Gb of memory and 1000m

 ```python
 import datetime
-
 from airflow.decorators import dag
 from kubernetes.client import models as k8s
 from operators.datacoves.bash import DatacovesBashOperator
 
+# Configuration for Kubernetes Pod Override with Resource Requests
 TRANSFORM_CONFIG = {
     "pod_override": k8s.V1Pod(
         spec=k8s.V1PodSpec(
```
@@ -32,7 +32,6 @@ TRANSFORM_CONFIG = {
```python
     ),
 }
 
-
 @dag(
     default_args={
         "start_date": datetime.datetime(2023, 1, 1, 0, 0),
```
@@ -41,43 +40,41 @@ TRANSFORM_CONFIG = {
```python
"email_on_failure": True,
},
description="Sample DAG with custom resources",
schedule_interval="0 0 1 */12 *",
schedule="0 0 1 */12 *",
tags=["version_2"],
catchup=False,
yaml_sample_dag={
"schedule_interval": "0 0 1 */12 *",
"tags": ["version_4"],
"catchup": False,
"default_args": {
"start_date": datetime.datetime(2023, 1, 1, 0, 0),
"owner": "airflow",
"email": "some_user@exanple.com",
"email_on_failure": True,
},
},
)
def request_resources_dag():
transform = DatacovesBashOperator(
task_id="transform", executor_config=TRANSFORM_CONFIG
transform_task = DatacovesBashOperator(
task_id="transform",
executor_config=TRANSFORM_CONFIG
)


dag = request_resources_dag()
```

### YAML version
In the yml DAG you can configure the memory and CPU resources.

```yaml
+description: "Sample DAG with custom resources"
+schedule_interval: "0 0 1 */12 *"
+tags:
+  - version_2
+default_args:
+  start_date: 2023-01-01
+  owner: Noel Gomez
+  email: gomezn@example.com
+  email_on_failure: true
+catchup: false
 
 # DAG Tasks
 nodes:
-  ...
   transform:
     operator: operators.datacoves.bash.DatacovesBashOperator
     type: task
     config:
       resources:
         memory: 8Gi
         cpu: 1000m
-...
```
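The Python hunks above collapse the part of `TRANSFORM_CONFIG` that actually requests the resources. Based on the YAML values (`memory: 8Gi`, `cpu: 1000m`), the hidden section presumably looks something like this sketch using the Kubernetes client's resource-requirements model — an assumption about the collapsed lines, not a verbatim copy:

```python
from kubernetes.client import models as k8s

TRANSFORM_CONFIG = {
    "pod_override": k8s.V1Pod(
        spec=k8s.V1PodSpec(
            containers=[
                k8s.V1Container(
                    name="base",  # main container overridden by the executor
                    resources=k8s.V1ResourceRequirements(
                        # Minimum memory and CPU the worker pod should receive
                        requests={"memory": "8Gi", "cpu": "1000m"},
                    ),
                )
            ]
        )
    ),
}
```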