name: Good First Issue
about: A beginner-friendly task perfect for first-time contributors
title: '[GOOD FIRST ISSUE] Add Docstrings to DataProduct Class Methods'
labels: 'good first issue, documentation, enhancement'
assignees: ''
Welcome! 👋
This is a beginner-friendly issue perfect for first-time contributors to the Intugle project. We've designed this task to help you get familiar with our codebase while making a meaningful contribution.
Task Description
Add comprehensive docstrings to methods in the DataProduct class located in src/intugle/data_product.py. DataProduct is the second most important user-facing API: users rely on it to create data products from the semantic layer.
Methods like __init__(), load_all(), plot_graph(), and plot_sources_graph() are missing docstrings entirely.
Why This Matters
The DataProduct class enables users to generate SQL queries and create data products by simply selecting attributes across tables. Good documentation here helps users:
- Understand Capabilities: Know what data products can do
- Generate Queries: Learn how to build queries from the semantic layer
- Visualize Relationships: Use graph plotting features
- Debug Issues: Understand what each method does under the hood
What You'll Learn
- Writing documentation for data transformation APIs
- Understanding SQL generation and ETL concepts
- Documenting query building workflows
- Explaining graph-based join optimization
Step-by-Step Guide
Prerequisites
Setup Instructions
- Fork and clone the repository
  git clone https://github.com/YOUR_USERNAME/data-tools.git
  cd data-tools
- Create a virtual environment
  python -m venv .venv
  source .venv/bin/activate # On Windows: .venv\Scripts\activate
- Install dependencies
  pip install -e ".[dev]"
- Create a new branch
  git checkout -b docs/add-data-product-docstrings
Implementation Steps
- Open the file src/intugle/data_product.py
- Add docstring to __init__() method (line 24):
  - Explain that DataProduct loads the semantic model from YAML files
  - Document the models_dir_path parameter
  - Mention what gets initialized (manifest, field_details, links, join optimizer)
  - Add example
- Add docstring to load_all() method (line 104):
  - Explain that it loads all datasets from the manifest
  - Mention this is called automatically during initialization
  - Document any side effects
- Add docstring to plot_graph() method (line 264):
  - Explain what graph it plots (table relationships)
  - Document the graph parameter
  - Mention visualization requirements (matplotlib, graphviz, etc.)
- Add docstring to plot_sources_graph() method (line 267):
  - Explain it visualizes all tables and their relationships
  - Mention difference from plot_graph() (shows all vs specific)
  - Add example
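As a rough illustration of the steps above, the sketch below shows Google-style docstrings on a stand-in class. `DataProductSketch`, the default value of `models_dir_path`, the `datasets` attribute, and the method bodies are hypothetical placeholders, not the actual Intugle implementation. Check the real signatures and behavior in src/intugle/data_product.py before reusing any wording.

```python
class DataProductSketch:
    """Stand-in for DataProduct, used only to show docstring layout."""

    def __init__(self, models_dir_path="models"):
        """Load the semantic model from YAML files.

        Args:
            models_dir_path: Directory containing the semantic model
                YAML files. The default shown here is a placeholder.

        Example:
            >>> dp = DataProductSketch("models")
            >>> dp.models_dir_path
            'models'
        """
        self.models_dir_path = models_dir_path
        self.datasets = {}  # placeholder state, not the real attributes
        self.load_all()

    def load_all(self):
        """Load all datasets listed in the manifest.

        Called automatically during initialization.

        Side effects:
            Replaces the contents of ``self.datasets``.
        """
        self.datasets = {}  # placeholder: the real method reads the manifest

    def plot_sources_graph(self):
        """Plot all tables and their relationships.

        Unlike ``plot_graph()``, which draws a specific graph passed in
        by the caller, this method shows every table in the semantic model.

        Example:
            >>> dp = DataProductSketch()
            >>> dp.plot_sources_graph()
        """
        return None  # placeholder: the real method draws a figure
```

Each docstring opens with a one-line summary, then covers parameters, side effects, and an example only where the checklist above asks for them.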
Files to Modify
- File: src/intugle/data_product.py
- Change: Add comprehensive docstrings to the methods that lack them
- Line(s): 24 (__init__), 104 (load_all), 264 (plot_graph), 267 (plot_sources_graph)
Example Code
def plot_graph(self, graph):
    """
    Plot a specific relationship graph.

    Visualizes table relationships as a network graph, showing tables as nodes
    and foreign key relationships as edges.

    Args:
        graph: NetworkX graph object containing table relationships to visualize.
            Typically obtained from the join optimizer.

    Example:
        >>> dp = DataProduct()
        >>> # Get graph for specific tables
        >>> graph = dp.join.generate_graph(["patients", "claims"])
        >>> dp.plot_graph(graph)

    Note:
        Requires matplotlib and graphviz to be installed for visualization.
        The graph is displayed inline in Jupyter notebooks or saved to a file
        in other contexts.
    """
Testing Your Changes
- Verify docstrings render correctly:
  from intugle import DataProduct
  help(DataProduct)
  help(DataProduct.__init__)
  help(DataProduct.plot_sources_graph)
- Test in a Jupyter notebook:
  - Docstrings should appear as tooltips with Shift+Tab
  - Examples should be copy-pasteable
- Run tests:
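Separately from the test suite, the help() check in the first step can be complemented by a small script that lists methods still missing a docstring. This is a minimal sketch using the standard inspect module. The helper name `methods_missing_docstrings` and the `Demo` class are made up for illustration and are not part of Intugle.

```python
import inspect


def methods_missing_docstrings(cls):
    """Return sorted names of functions defined on cls that lack a docstring."""
    missing = []
    # vars() only sees attributes defined on the class itself,
    # so inherited methods are not reported.
    for name, member in vars(cls).items():
        if inspect.isfunction(member) and not (member.__doc__ or "").strip():
            missing.append(name)
    return sorted(missing)


class Demo:
    def __init__(self):
        self.value = 1

    def documented(self):
        """Return the stored value."""
        return self.value

    def undocumented(self):
        return self.value


print(methods_missing_docstrings(Demo))  # → ['__init__', 'undocumented']
```

To try it on the real class, pass DataProduct (imported with `from intugle import DataProduct`) instead of Demo. Note that the sketch skips staticmethod and classmethod objects, since inspect.isfunction does not match them.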
Submitting Your Work
Please run the following command to automatically fix linting issues before committing: ruff check --fix .
- Commit your changes
  git add src/intugle/data_product.py
  git commit -m "Add comprehensive docstrings to DataProduct methods"
- Push to your fork
  git push origin docs/add-data-product-docstrings
- Create a Pull Request
  - Go to the original repository
  - Click "Pull Requests" → "New Pull Request"
  - Select your branch
  - Fill out the PR template
  - Reference this issue with "Fixes #ISSUE_NUMBER"
Expected Outcome
The DataProduct class should have clear docstrings for all methods that:
- Explain purpose and use cases
- Document parameters and return values
- Include practical examples
- Mention prerequisites and requirements
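One way to spot-check the "document parameters" criterion is to compare each function's signature with its docstring. The sketch below does this with the standard inspect module. The helper `undocumented_params` and the function `example_method` are hypothetical names for illustration, and the substring match is deliberately crude, so treat the result as a hint rather than a verdict.

```python
import inspect


def undocumented_params(func):
    """Return parameter names of func that its docstring never mentions."""
    doc = inspect.getdoc(func) or ""
    params = [
        name
        for name in inspect.signature(func).parameters
        if name not in ("self", "cls")
    ]
    # Crude check: a parameter counts as documented if its name
    # appears anywhere in the docstring text.
    return [name for name in params if name not in doc]


def plot_graph(self, graph):
    """Plot a specific relationship graph.

    Args:
        graph: Graph object containing table relationships.
    """


def example_method(self, first, second):
    """Do something illustrative.

    Args:
        first: The documented parameter.
    """


print(undocumented_params(plot_graph))      # → []
print(undocumented_params(example_method))  # → ['second']
```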
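Definition of Done
- Docstring added to __init__() method with example
- Docstring added to load_all() method
- Docstring added to plot_graph() method
- Docstring added to plot_sources_graph() method with example
- Docstrings verified with the help() function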
Resources
Need Help?
Don't hesitate to ask questions! We're here to help you succeed.
- Comment below with your questions
- Join our Discord for real-time support
- Tag maintainers: @raphael-intugle (if specific help needed)
Skills You'll Use
Thank you for contributing to Intugle!
Tips for Success:
- Look at notebooks to see how DataProduct is used in practice
- The methods with good docstrings (plan, build, generate_query) are good references
- Focus on explaining WHY users would call each method
- Include practical examples from real use cases
- Have fun! 🎉