name: Good First Issue
about: A beginner-friendly task perfect for first-time contributors
title: '[GOOD FIRST ISSUE] Enrich Console Output with Summary Statistics'
labels: 'good first issue, enhancement, user-experience'
assignees: ''
Welcome! 👋
This is a beginner-friendly issue perfect for first-time contributors to the Intugle project. We've designed this task to help you get familiar with our codebase while making a meaningful contribution.
Task Description
Enhance the console output during SemanticModel.build() to display rich summary statistics at each stage. Currently, the output shows basic progress messages, but users would benefit from seeing detailed statistics about what was processed.
Current output is basic:
Starting profiling and key identification stage...
Processing dataset: patients
Processing dataset: claims
Profiling and key identification complete.
Starting link prediction stage...
--- Comparing 'patients' <=> 'claims' ---
Found 2 potential link(s).
Link prediction complete.
We want informative summaries:
Starting profiling and key identification stage...
Processing dataset: patients
Processing dataset: claims
Profiling and key identification complete.
📊 Profiling Summary
╭────────────────────────────────────╮
│ Tables Profiled: 2 │
│ Total Columns: 45 │
│ Data Types Identified: 45 │
│ │
│ Distribution: │
│ • Dimensions: 28 (62%) │
│ • Measures: 17 (38%) │
│ │
│ Primary Keys Found: 2 │
╰────────────────────────────────────╯
Starting link prediction stage...
--- Comparing 'patients' <=> 'claims' ---
Found 2 potential link(s).
Link prediction complete.
🔗 Link Prediction Summary
╭────────────────────────────────────╮
│ Links Predicted: 2 │
│ Links Validated: 2 │
│ Success Rate: 100% │
│ │
│ Relationships: │
│ • patients → claims (1-to-many) │
│ • claims → encounters (many-to-1)│
╰────────────────────────────────────╯
This is just an example; feel free to make the stats richer if you have better ideas.
Why This Matters
- User Feedback: Users see what's happening under the hood
- Quality Assurance: Statistics help users verify results
- Debugging: Summary info helps identify issues
- Professional: Rich output looks polished and informative
- Transparency: Users understand what the AI models are doing
What You'll Learn
- Using the Rich library for beautiful console output
- Working with Rich Tables, Panels, and formatting
- Aggregating statistics from data structures
- Calculating percentages and distributions
- Formatting numbers and creating visual summaries
Step-by-Step Guide
Prerequisites
Setup Instructions
- Fork and clone the repository
git clone https://github.com/YOUR_USERNAME/data-tools.git
cd data-tools
- Create a virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
- Install dependencies
pip install -e ".[dev]"
- Create a new branch
git checkout -b feat/enrich-console-output
- Run a notebook to see current output
jupyter notebook notebooks/quickstart_healthcare.ipynb
# Run through the sm.build() cell to see current output
Implementation Steps
Part 1: Add Profiling Summary
- Open src/intugle/semantic_model.py
- After line 70 (end of the profile() method), add a summary:
def profile(self, force_recreate: bool = False):
    """Run profiling, datatype identification, and key identification for all datasets."""
    console.print(
        "Starting profiling and key identification stage...", style="yellow"
    )
    for dataset in self.datasets.values():
        # ... existing code ...
    console.print(
        "Profiling and key identification complete.", style="bold green"
    )
    # NEW: Add profiling summary
    self._print_profiling_summary()

def _print_profiling_summary(self):
    """Display a summary of profiling results."""
    ...
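One way the body of _print_profiling_summary() could be filled in is sketched below. It is a minimal sketch under stated assumptions, not the project's implementation: ColumnProfile and its fields (category, is_primary_key) are hypothetical stand-ins for whatever intugle actually stores per column, and the helper names are invented for illustration. Only the Rich calls (Console, Panel) are real library API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ColumnProfile:
    """Hypothetical stand-in for the per-column result of profiling."""
    name: str
    category: str  # assumed labels: "dimension" or "measure"
    is_primary_key: bool = False


def profiling_stats(tables: dict[str, list[ColumnProfile]]) -> dict[str, int]:
    """Aggregate table, column, category, and primary-key counts."""
    columns = [col for cols in tables.values() for col in cols]
    categories = Counter(col.category for col in columns)
    return {
        "tables": len(tables),
        "columns": len(columns),
        "dimensions": categories["dimension"],
        "measures": categories["measure"],
        "primary_keys": sum(col.is_primary_key for col in columns),
    }


def profiling_lines(stats: dict[str, int]) -> list[str]:
    """Format the aggregated counts as the lines shown inside the panel."""
    total = stats["columns"] or 1  # guard against division by zero
    return [
        f"Tables Profiled: {stats['tables']}",
        f"Total Columns: {stats['columns']}",
        "",
        "Distribution:",
        f"  • Dimensions: {stats['dimensions']} ({stats['dimensions'] / total:.0%})",
        f"  • Measures: {stats['measures']} ({stats['measures'] / total:.0%})",
        "",
        f"Primary Keys Found: {stats['primary_keys']}",
    ]


def print_profiling_summary(stats: dict[str, int]) -> None:
    """Render the lines in a Rich panel sized to its content."""
    # Imported lazily so the helpers above stay usable without Rich installed.
    from rich.console import Console
    from rich.panel import Panel

    body = "\n".join(profiling_lines(stats))
    Console().print(Panel(body, title="📊 Profiling Summary", expand=False))
```

Inside semantic_model.py you would presumably reuse the module's existing console object rather than creating a new Console. Keeping the counting separate from the rendering makes the numbers easy to check by hand.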
Part 2: Add Link Prediction Summary
- After line 85 (end of the predict_links() method), add:
def predict_links(self, force_recreate: bool = False):
    """Run link prediction across all datasets."""
    # ... existing code ...
    console.print("Link prediction complete.", style="bold green")
    # NEW: Add link prediction summary
    if hasattr(self, 'link_predictor') and self.links:
        self._print_link_prediction_summary()

def _print_link_prediction_summary(self):
    """Display a summary of link prediction results."""
    ...
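The link summary could be computed and rendered along these lines. This is a sketch only: PredictedLink and its fields are hypothetical (the real objects in self.links may look different), and "success rate" is assumed here to mean validated links divided by predicted links. The Rich Table calls are real library API.

```python
from dataclasses import dataclass


@dataclass
class PredictedLink:
    """Hypothetical stand-in for one predicted relationship."""
    source: str
    target: str
    cardinality: str  # e.g. "1-to-many"
    validated: bool = True


def link_stats(links: list[PredictedLink]) -> dict:
    """Count predicted and validated links and describe each relationship."""
    predicted = len(links)
    validated = sum(link.validated for link in links)
    return {
        "predicted": predicted,
        "validated": validated,
        # Assumed definition: share of predicted links that passed validation.
        "success_rate": validated / predicted if predicted else 0.0,
        "relationships": [
            f"{link.source} → {link.target} ({link.cardinality})" for link in links
        ],
    }


def print_link_summary(links: list[PredictedLink]) -> None:
    """Render one table row per link, with the totals as a caption."""
    # Imported lazily so link_stats() stays usable without Rich installed.
    from rich.console import Console
    from rich.table import Table

    stats = link_stats(links)
    caption = (
        f"{stats['validated']}/{stats['predicted']} validated "
        f"({stats['success_rate']:.0%})"
    )
    table = Table(title="🔗 Link Prediction Summary", caption=caption)
    table.add_column("Source")
    table.add_column("Target")
    table.add_column("Cardinality")
    table.add_column("Validated", justify="center")
    for link in links:
        mark = "✓" if link.validated else "✗"
        table.add_row(link.source, link.target, link.cardinality, mark)
    Console().print(table)
```

A Table is used here instead of a Panel because each link has several attributes; for a short list of relationships, a Panel built from stats["relationships"] would work just as well.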
Part 3: Add Glossary Generation Summary
- After line 102 (end of the generate_glossary() method), add:
def generate_glossary(self, force_recreate: bool = False):
    """Generate business glossary for all datasets."""
    # ... existing code ...
    console.print("Business glossary generation complete.", style="bold green")
    # NEW: Add glossary summary
    self._print_glossary_summary()

def _print_glossary_summary(self):
    """Display a summary of business glossary generation."""
    ...
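For the glossary stage, a natural statistic is coverage: how many columns ended up with a description. The sketch below assumes a hypothetical glossary shape (table name mapped to column name mapped to description text, where missing descriptions are None or blank); the real data structure in intugle may differ, so treat the field access as a placeholder.

```python
from typing import Optional

# Hypothetical shape: table name -> column name -> description (or None).
Glossary = dict[str, dict[str, Optional[str]]]


def glossary_coverage(glossary: Glossary) -> dict[str, tuple[int, int]]:
    """Return (described, total) column counts per table."""
    coverage = {}
    for table, columns in glossary.items():
        # Blank or whitespace-only descriptions count as missing.
        described = sum(1 for text in columns.values() if text and text.strip())
        coverage[table] = (described, len(columns))
    return coverage


def coverage_lines(coverage: dict[str, tuple[int, int]]) -> list[str]:
    """Format per-table coverage plus an overall line."""
    lines = []
    for table, (described, total) in coverage.items():
        share = described / total if total else 0.0
        lines.append(f"{table}: {described}/{total} columns described ({share:.0%})")
    all_described = sum(described for described, _ in coverage.values())
    all_total = sum(total for _, total in coverage.values())
    overall = all_described / all_total if all_total else 0.0
    lines.append(f"Overall: {all_described}/{all_total} ({overall:.0%})")
    return lines


def print_glossary_summary(glossary: Glossary) -> None:
    """Render the coverage lines in a Rich panel."""
    # Imported lazily so the helpers above stay usable without Rich installed.
    from rich.console import Console
    from rich.panel import Panel

    body = "\n".join(coverage_lines(glossary_coverage(glossary)))
    Console().print(Panel(body, title="📖 Glossary Summary", expand=False))
```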
Part 4: Add Overall Build Summary (30 min)
- At the end of the build() method (after line 118), add a final summary:
def build(self, force_recreate: bool = False):
    """Run the full end-to-end knowledge building pipeline."""
    # ... existing code ...
    # NEW: Add final build summary
    self._print_build_summary()
    return self

def _print_build_summary(self):
    """Display overall build summary."""
    ...
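The final summary can simply collect one headline per stage. In the sketch below, StageResult is a hypothetical record you would fill in as each stage finishes; nothing like it is known to exist in the codebase, and the headline strings are free-form. The Rich Table calls are real library API.

```python
from dataclasses import dataclass


@dataclass
class StageResult:
    """Hypothetical record of one pipeline stage, collected during build()."""
    name: str
    headline: str  # one-line result, e.g. "2 tables, 45 columns"
    seconds: float


def build_summary_rows(stages: list[StageResult]) -> list[tuple[str, str, str]]:
    """One row per stage, followed by a row with the total elapsed time."""
    rows = [(stage.name, stage.headline, f"{stage.seconds:.1f}s") for stage in stages]
    total = sum(stage.seconds for stage in stages)
    rows.append(("Total", "", f"{total:.1f}s"))
    return rows


def print_build_summary(stages: list[StageResult]) -> None:
    """Render the rows as a three-column Rich table."""
    # Imported lazily so build_summary_rows() stays usable without Rich installed.
    from rich.console import Console
    from rich.table import Table

    table = Table(title="✅ Build Summary")
    table.add_column("Stage")
    table.add_column("Result")
    table.add_column("Time", justify="right")
    for row in build_summary_rows(stages):
        table.add_row(*row)
    Console().print(table)
```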
Files to Modify
- File: src/intugle/semantic_model.py
- Change: Add 4 new methods for summary display
Testing Your Changes
- Run a notebook and check output:
jupyter notebook notebooks/quickstart_healthcare.ipynb
# Execute the sm.build() cell and observe rich output
- Test with different datasets:
# Try with different numbers of tables
python -c "
from intugle import SemanticModel
datasets = {
    'patients': {'path': 'sample_data/healthcare/patients.csv', 'type': 'csv'},
    'claims': {'path': 'sample_data/healthcare/claims.csv', 'type': 'csv'},
}
sm = SemanticModel(datasets, domain='Healthcare')
sm.build()
"
# Check that statistics are correct
- Verify calculations:
  - Count tables/columns manually
  - Verify percentages add up correctly
  - Check link counts match reality
- Run tests:
Example Output
Before:
Starting profiling and key identification stage...
Processing dataset: patients
Processing dataset: claims
Profiling and key identification complete.
After (Just an example):
Starting profiling and key identification stage...
Processing dataset: patients
Processing dataset: claims
Profiling and key identification complete.
╭─────────────── 📊 Profiling Summary ───────────────╮
│ Tables Profiled: 2 │
│ Total Columns: 45 │
│ Data Types Identified: 45 │
│ │
│ Distribution: │
│ • Dimensions: 28 (62.2%) │
│ • Measures: 17 (37.8%) │
│ │
│ Primary Keys Found: 2 │
╰────────────────────────────────────────────────────╯
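When verifying that percentages add up, keep in mind that shares rounded independently can total 99% or 101% (three equal categories give 33% + 33% + 33%). One common remedy, shown here as an optional sketch rather than a requirement of this issue, is largest-remainder rounding: floor every share, then give the leftover points to the entries with the biggest fractional parts. The function name is made up for illustration.

```python
def rounded_percentages(counts: dict[str, int]) -> dict[str, int]:
    """Whole-number percentages that always sum to 100 (largest-remainder method)."""
    total = sum(counts.values())
    if total == 0:
        return {key: 0 for key in counts}
    exact = {key: value * 100 / total for key, value in counts.items()}
    result = {key: int(share) for key, share in exact.items()}  # floor each share
    leftover = 100 - sum(result.values())
    # Hand the missing points to the entries with the largest fractional parts.
    by_remainder = sorted(counts, key=lambda key: exact[key] - result[key], reverse=True)
    for key in by_remainder[:leftover]:
        result[key] += 1
    return result
```

With the example figures above (28 dimensions and 17 measures out of 45 columns), this yields 62 and 38. Showing one decimal place, as in the panel above, is another reasonable way to make small rounding differences visible instead of hiding them.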
Submitting Your Work
- Commit your changes
git add src/intugle/semantic_model.py
git commit -m "feat: Add rich console summaries for profiling, links, and glossary"
- Push to your fork
git push origin feat/enrich-console-output
- Create a Pull Request
  - Go to the original repository
  - Click "Pull Requests" → "New Pull Request"
  - Select your branch
  - Fill out the PR template
  - Include screenshots showing the rich output
  - Reference this issue with "Fixes #ISSUE_NUMBER"
Expected Outcome
After running sm.build(), users should see:
- ✅ Rich formatted summary panels
- ✅ Accurate statistics about profiling (tables, columns, types)
- ✅ Data type distribution with percentages
- ✅ Link prediction results with success rate
- ✅ Glossary generation coverage
- ✅ Final build summary with next steps
- ✅ Beautiful formatting using Rich library
Definition of Done
Bonus Enhancements (Optional)
If you want to go further:
- Add emoji indicators (✓, ✗, ⚠) for different states
- Use Rich Tables for more complex summaries
- Add color coding based on quality metrics (green for high coverage, yellow for medium, etc.)
- Show data type breakdown by category (text, numeric, datetime, etc.)
- Add execution time for each stage
- Show cardinality information for relationships
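Two of these bonus ideas, per-stage execution time and color coding, need only the standard library plus Rich markup strings. The sketch below is illustrative: the helper names and the 90% / 60% thresholds are assumptions chosen for the example, not project conventions.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_stage(name: str, timings: dict[str, float]):
    """Record how long the wrapped block took, in seconds, under timings[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the stage raises, so failed stages are timed too.
        timings[name] = time.perf_counter() - start


def coverage_style(ratio: float) -> str:
    """Map a 0..1 quality ratio to a Rich color name (thresholds are assumed)."""
    if ratio >= 0.9:
        return "green"
    if ratio >= 0.6:
        return "yellow"
    return "red"


def styled_percentage(ratio: float) -> str:
    """Wrap a percentage in Rich markup, e.g. '[green]95%[/green]'."""
    style = coverage_style(ratio)
    return f"[{style}]{ratio:.0%}[/{style}]"
```

A stage could then be wrapped as `with timed_stage("profiling", timings): ...`, and the string returned by styled_percentage() can be passed straight to console.print(), which interprets the markup tags as colors.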
Resources
Need Help?
Don't hesitate to ask questions! We're here to help you succeed.
- Comment below with your questions
- Join our Discord for real-time support
- Tag maintainers: @raphael-intugle (if specific help needed)
Skills You'll Use
Thank you for contributing to Intugle!
Tips for Success:
- Start with Part 1 (profiling) as it's the easiest
- Test after each part to verify statistics are correct
- Use console.print() with Rich markup for colors
- Take screenshots to show the before/after difference
- Make the output informative but not overwhelming
- Have fun making beautiful terminal output!