Definitions explanation is not clear or understandable (and other suggestions) #25717

cobrienbeam · 2024-11-03T23:55:00Z

What's the issue or suggestion?

A Definitions object is a set of Dagster definitions available and loadable by Dagster tools.

This is a circular sentence. If a definitions object is a set of Dagster definitions available then what are the Dagster definitions and what makes them available vs not available? It's totally unclear.

Additionally, the added explanation does not really help explain:

The Definitions object is used to assign definitions to a code location, and each code location can only have a single Definitions object. This object maps to one code location. With code locations, users isolate multiple Dagster projects from each other without requiring multiple deployments. You’ll learn more about code locations a bit later in this lesson.

What are code locations, and why can they have only a single Definitions object? Okay, so the cardinality between Definitions objects and code locations is 1:1, but that doesn't really explain the rest of it.

Additional information

A Definitions object is like a project manifest for Dagster - it bundles together all the assets, jobs, schedules, and other components that make up a single Dagster project. It's like a menu that tells Dagster exactly what's available to run in this specific project. Each separate project (called a code location) needs its own Definitions object, and you can't have multiple Definitions objects in the same location. This setup lets you keep different Dagster projects completely separate from each other, without needing to set up multiple Dagster deployments.

Why do we need this?

Two main reasons:

Project Isolation: Let's say you have two different data projects:

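# analytics/definitions.py
from dagster import Definitions

# revenue_dashboard and customer_metrics are assets defined elsewhere in this project
defs = Definitions(
    assets=[revenue_dashboard, customer_metrics]
)

# marketing/definitions.py
from dagster import Definitions

# email_campaigns and social_media_stats are assets defined elsewhere in this project
defs = Definitions(
    assets=[email_campaigns, social_media_stats]
)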

Each project has its own Definitions, so they don't interfere with each other.

Discovery: When Dagster starts up, it looks for these Definitions objects to know what assets, jobs, and resources are available to run.
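
To make the discovery idea concrete, here is a minimal sketch of the general technique: import a Python file and pick up a module-level object named defs. This is not Dagster's actual loader. The load_defs helper is hypothetical, and a plain dict stands in for a real Definitions object so the snippet runs with only the standard library.

```python
import importlib.util
import tempfile
from pathlib import Path

# Hypothetical stand-in for a project's definitions.py. A plain dict takes the
# place of a real Definitions object so the sketch runs without Dagster.
SOURCE = 'defs = {"assets": ["revenue_dashboard", "customer_metrics"]}\n'


def load_defs(path: Path, attr: str = "defs"):
    """Import the file at `path` and return its module-level `attr` object."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, attr)


with tempfile.TemporaryDirectory() as tmp:
    definitions_file = Path(tmp) / "definitions.py"
    definitions_file.write_text(SOURCE)
    found = load_defs(definitions_file)

print(found["assets"])
```

The point is only that a tool needs one well-known, top-level object per project to know what that project offers.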

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

cobrienbeam · 2024-11-04T00:37:34Z

Additionally, maybe there could be a link out to a page that discusses the use of projects vs deployments. I like how the Next.js and React documentation links out to different sections to discuss the potential tradeoffs of one choice vs another.

For example, a discussion of when to use additional projects vs additional deployments could cover:

Security/Compliance Requirements:

Company Infrastructure
├── Production Deployment (PCI Compliant)
│   └── Financial Projects
│       ├── payment_processing
│       └── customer_billing
│
└── Standard Deployment
    ├── Marketing Projects
    └── Analytics Projects

If some projects need stricter security or compliance requirements (like PCI for payment data), separating them into different deployments helps with compliance.

Resource Isolation:

Infrastructure
├── Heavy Computing Deployment (32 CPU, 128GB RAM)
│   └── ML Training Projects
│       ├── model_training
│       └── batch_inference
│
└── Light Computing Deployment (4 CPU, 16GB RAM)
    └── ETL Projects
        ├── daily_reports
        └── data_ingestion

When projects have vastly different resource needs, separate deployments prevent resource contention.

Team/Organization Structure:

Company
├── Team A Deployment
│   └── Projects with specific permissions/access
│
└── Team B Deployment
    └── Different security groups/access patterns

Separate deployments help when teams need complete isolation or different access patterns.

Environment Criticality:

Business Critical Deployment
├── Revenue impacting jobs
└── Customer-facing data pipelines

Non-Critical Deployment
├── Internal analytics
└── Experimental projects

Separate deployments help when the impact of downtime varies significantly between projects.

Scale/Performance:

When you have so many projects that the UI becomes slow
When job runs start queueing too much
When the deployment's database gets too large

The key question is: "Do these projects NEED to be separate?" rather than "CAN they be separate?".

Using a single deployment has the following benefits:

Easier maintenance
Centralized monitoring
Shared resources
Simpler infrastructure

And then provide more information on workspaces that use definitions.py files instead of __init__.py:

You need to explicitly tell Dagster where to find your definitions through the workspace.yaml file:

load_from:
  - python_file:
      relative_path: marketing/definitions.py
      location_name: marketing_tools

  - python_file:
      relative_path: finance/definitions.py
      location_name: finance_tools

cobrienbeam · 2024-11-04T01:17:16Z

I didn't quite understand the use of the unpacking operator notation in the definition example:

The asterisk * in Python is the "unpacking operator".

# Let's say trip_assets contains these assets:
trip_assets = [taxi_trips, taxi_zones, taxi_trips_file]

# And metric_assets contains:
metric_assets = [revenue_by_day, trips_by_day]

# When you use * it "unpacks" the lists:
defs = Definitions(
    assets=[*trip_assets, *metric_assets]
)

# This is equivalent to writing:
defs = Definitions(
    assets=[
        taxi_trips,
        taxi_zones, 
        taxi_trips_file,
        revenue_by_day,
        trips_by_day
    ]
)

Without the *, you'd get nested lists:

# Without unpacking (WRONG):
defs = Definitions(
    assets=[trip_assets, metric_assets]
)
# This would be like:
assets=[[taxi_trips, taxi_zones, taxi_trips_file], [revenue_by_day, trips_by_day]]  # Nested lists!

# With unpacking (CORRECT):
defs = Definitions(
    assets=[*trip_assets, *metric_assets]
)
# This correctly flattens to:
assets=[taxi_trips, taxi_zones, taxi_trips_file, revenue_by_day, trips_by_day]  # Flat list!

You'll often see this pattern when you want to combine multiple lists into a single flat list.

It's like saying "take everything out of these lists and put them all together in one new list."
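
A quick way to check this in a plain Python session, without Dagster installed, is the sketch below. The strings are only stand-ins for the real asset objects; the variable names flat and nested are mine.

```python
# Strings stand in for real asset objects so this runs without Dagster.
trip_assets = ["taxi_trips", "taxi_zones", "taxi_trips_file"]
metric_assets = ["revenue_by_day", "trips_by_day"]

# Unpacked: the items of both lists end up in one flat list.
flat = [*trip_assets, *metric_assets]

# Not unpacked: the outer list holds the two original lists as its items.
nested = [trip_assets, metric_assets]

print(len(flat))    # number of individual items
print(len(nested))  # number of inner lists

# For plain lists, unpacking into a new list gives the same result as `+`.
print(flat == trip_assets + metric_assets)
```

The unpacking form has the advantage that it also works when the sources are other iterables, such as tuples, which `+` would not combine with a list.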

cobrienbeam · 2024-11-04T01:54:32Z

I wish the explanation on os.getenv and EnvVar was a little bit clearer:

With os.getenv:

Start Dagster server
Value of DUCKDB_DATABASE is locked in
Change environment variable
Run asset → still uses old database path
Must restart server to pick up new value

With EnvVar:

Start Dagster server
Run asset → checks DUCKDB_DATABASE value
Change environment variable
Run asset again → uses new database path
No server restart needed!

It's especially useful for:

Switching between development/staging/production databases
Updating API keys
Changing resource configurations without downtime
Testing with different configurations
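
The difference between the two sequences above can be sketched with only the standard library. This is not how Dagster implements EnvVar. LazyEnv is a hypothetical stand-in that stores the variable's name and looks the value up on demand, and the database paths are made up for illustration.

```python
import os
from typing import Optional

# Hypothetical starting value, set before "startup".
os.environ["DUCKDB_DATABASE"] = "data/dev.duckdb"

# Eager: the value is read once, when this line runs (analogous to load time).
eager_path = os.getenv("DUCKDB_DATABASE")


class LazyEnv:
    """Hypothetical stand-in: remembers the variable name, not its value."""

    def __init__(self, name: str) -> None:
        self.name = name

    def resolve(self) -> Optional[str]:
        # Looked up every time it is called (analogous to run time).
        return os.getenv(self.name)


lazy_path = LazyEnv("DUCKDB_DATABASE")

# The environment changes after "startup".
os.environ["DUCKDB_DATABASE"] = "data/prod.duckdb"

print(eager_path)           # still the value captured at startup
print(lazy_path.resolve())  # the value at the moment of the call
```

The eager variable keeps whatever it captured at startup, while the lazy one reflects the environment at the time resolve() is called.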

lydialimlh · 2024-11-04T14:42:59Z

You seem to have understood the unpacking operator of python quite well, you've correctly explained how it works. (I'm just a rando, not from the Dagster team)

cobrienbeam · 2024-11-04T17:27:14Z

> You seem to have understood the unpacking operator of python quite well, you've correctly explained how it works. (I'm just a rando, not from the Dagster team)

That was my proposal for the documentation in a callout or side link, etc. regarding the asterisk notation in the example.

cobrienbeam added the area: docs Related to documentation in general label Nov 3, 2024

cobrienbeam changed the title ~~Definitions explanation is not clear or understandable~~ Definitions explanation is not clear or understandable (and other suggestions) Nov 4, 2024
