[Doc]: Enhance Narwhals Tutorials with Backend-Agnostic Patterns #1696

Open
philip-ndikum opened this issue Jan 1, 2025 · 0 comments · May be fixed by #1704


What type of report is this?

Correction

Please describe the issue.

Description

The Narwhals tutorials could be improved by consolidating backend-agnostic patterns into a single tutorial aimed at production machine learning (ML) and artificial intelligence (AI) workflows. The tutorial would focus on practical patterns that simplify common tasks, keep behavior consistent across backends, and fit scalable development workflows.

Key Focus Areas:

  1. Data Validation Patterns:

    • Eager validation for immediate feedback (e.g., numeric and categorical feature validation).
    • Lazy validation for optimized workflows across larger datasets.
  2. Time Series Operations:

    • Group-level metrics such as temporal aggregations (mean, null counts) for time-indexed datasets.
    • Temporal validation for uniqueness, consistency, and handling mixed frequencies.
  3. Feature Engineering:

    • Backend-agnostic numeric and categorical transformations.
    • Patterns for missing value imputation, standardization, and case consistency.

In our package TemporalScope, which leverages Narwhals for model-agnostic explainability in AI/ML workflows, these patterns would be immensely valuable for ensuring robust data preparation and validation. Specifically:

  • Use Case:
    • Validating and transforming features across Pandas, Polars, and Dask backends for explainable ML workflows.
    • Handling time series data in both single-step and multi-step forecasting pipelines.
  • Development Workflow:
    • Lean Main Environment:
      A hatch environment limited to Narwhals, without heavy dependencies like Pandas or Dask.
    • Comprehensive Test Environment:
      A hatch environment including all relevant libraries (Pandas, Polars, Dask) to validate runtime behavior and backend-agnostic patterns.

By integrating these patterns into a single, enterprise-grade tutorial, Narwhals would provide developers with clear, actionable guidance for robust AI/ML workflows [CC @kanenorman].


Suggestion

Create a condensed tutorial notebook that demonstrates these patterns, building directly on the feedback shared:

  • Universal Backend Support:
    Showcase compatibility with Pandas, Polars, and Dask.
  • Core Narwhals Patterns:
    Focus on the use of @nw.narwhalify, lazy/eager evaluation strategies, and backend-agnostic transformations.
  • Production-Ready Use Cases:
    Condense practical examples that are directly applicable to AI/ML pipelines, following @FBruzzesi's recommendations (e.g., using pass_through=True or strict=False where necessary).

If you have a suggestion on how it should be, add it below.

No response

philip-ndikum added a commit to philip-ndikum/narwhals that referenced this issue Jan 2, 2025
add tutorial covering:

data validation with eager/lazy evaluation
time series operations and validation
feature engineering with backend-agnostic transformations
environment management for production/testing
closes narwhals-dev#1696