[Doc]: Enhance Narwhals Tutorials with Backend-Agnostic Patterns #1696

philip-ndikum · 2025-01-01T14:38:23Z

What type of report is this?

Correction

Please describe the issue.

Description

Narwhals tutorials could be significantly enhanced by consolidating backend-agnostic patterns into a single, robust tutorial tailored for enterprise-grade machine learning (ML) and artificial intelligence (AI) workflows. This tutorial would focus on practical, production-ready patterns that simplify common tasks, ensure backend consistency, and align with scalable development workflows.

Key Focus Areas:

Data Validation Patterns:
- Eager validation for immediate feedback (e.g., numeric and categorical feature validation).
- Lazy validation for optimized workflows across larger datasets.
Time Series Operations:
- Group-level metrics such as temporal aggregations (mean, null counts) for time-indexed datasets.
- Temporal validation for uniqueness, consistency, and handling mixed frequencies.
Feature Engineering:
- Backend-agnostic numeric and categorical transformations.
- Patterns for missing value imputation, standardization, and case consistency.

In our package TemporalScope, which leverages Narwhals for model-agnostic explainability in AI/ML workflows, these patterns would be immensely valuable for ensuring robust data preparation and validation. Specifically:

Use Case:
- Validating and transforming features across Pandas, Polars, and Dask backends for explainable ML workflows.
- Handling time series data in both single-step and multi-step forecasting pipelines.
Development Workflow:
- Lean Main Environment:
  A hatch environment limited to Narwhals, without heavy dependencies like Pandas or Dask.
- Comprehensive Test Environment:
  A hatch environment including all relevant libraries (Pandas, Polars, Dask) to validate runtime behavior and backend-agnostic patterns.

By integrating these patterns into a single, enterprise-grade tutorial, Narwhals would provide developers with clear, actionable guidance for robust AI/ML workflows [CC @kanenorman].

Suggestion

Create a condensed tutorial notebook that demonstrates these patterns, building directly on the feedback shared:

Universal Backend Support:
- Showcase compatibility with Pandas, Polars, and Dask.
Core Narwhals Patterns:
- Focus on the use of `@nw.narwhalify`, lazy/eager evaluation strategies, and backend-agnostic transformations.
Production-Ready Use Cases:
- Condense practical examples that are directly applicable to AI/ML pipelines, following @FBruzzesi's recommendations (e.g., using `pass_through=True` or `strict=False` where necessary).

If you have a suggestion on how it should be, add it below.

No response


add tutorial covering:
- data validation with eager/lazy evaluation
- time series operations and validation
- feature engineering with backend-agnostic transformations
- environment management for production/testing

closes narwhals-dev#1696

philip-ndikum linked a pull request Jan 2, 2025 that will close this issue

docs(machine_learning_patterns.md): add enterprise ml patterns tutorial (#1696) #1704

Open

10 tasks
