Data: CSV Advanced Operations (Low Priority) #64

@jwesleye

Description

Overview

Add advanced CSV operations that maintain memory efficiency while providing more complex data transformations.

Motivation

We have 18 CSV functions including token-saving inspection tools. These advanced operations would enable more complex workflows while staying memory-efficient for large files.

Proposed Functions

Medium Priority - Data Transformation

  • sort_csv_rows - Sort by column (with sampling strategy for large files)
  • aggregate_csv_column - Compute sum/avg/min/max for column without full load
  • deduplicate_csv_rows - Remove duplicates by column(s) with memory-efficient streaming
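As a rough illustration of the "without full load" idea, the sketch below streams a file row by row for two of the functions above. The signatures, return shapes, and the choice to skip blank cells are assumptions made for this example, not the project's settled API. The deduplication sketch holds one key per distinct row in memory, so its footprint grows with the number of unique keys rather than with file size.

```python
import csv


def aggregate_csv_column(file_path: str, column: str) -> dict[str, float]:
    """Single pass over the file; only running totals are kept in memory."""
    count = 0
    total = 0.0
    minimum = float("inf")
    maximum = float("-inf")
    with open(file_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise ValueError(f"Column not found: {column}")
        for row in reader:
            raw = (row[column] or "").strip()
            if not raw:
                continue  # assumption: blank cells are skipped, not treated as 0
            value = float(raw)
            count += 1
            total += value
            minimum = min(minimum, value)
            maximum = max(maximum, value)
    if count == 0:
        raise ValueError(f"No numeric values in column: {column}")
    return {
        "count": count,
        "sum": total,
        "avg": total / count,
        "min": minimum,
        "max": maximum,
    }


def deduplicate_csv_rows(file_path: str, output_path: str, key_columns: list[str]) -> int:
    """Keep the first row for each key; returns the number of rows written."""
    seen: set[tuple[str, ...]] = set()
    kept = 0
    with open(file_path, newline="", encoding="utf-8") as src, \
            open(output_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            raise ValueError("Input file is empty")
        indexes = [header.index(name) for name in key_columns]
        writer.writerow(header)
        for row in reader:
            # Short rows are padded with empty strings for key purposes.
            key = tuple(row[i] if i < len(row) else "" for i in indexes)
            if key in seen:
                continue
            seen.add(key)
            writer.writerow(row)
            kept += 1
    return kept
```

sort_csv_rows is left out here because sorting cannot be done in a single streaming pass; it would need the sampling or chunking strategy the bullet mentions.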

Medium Priority - File Operations

  • merge_csv_files - Combine multiple CSV files (horizontal or vertical merge)
  • split_csv_by_column - Split into multiple files based on column value
  • transpose_csv - Swap rows and columns (memory-efficient for large files)
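One possible shape for split_csv_by_column is sketched below: the source is read once and each row is appended to an output file chosen by its column value. The `output_dir` parameter, the `<column>_<value>.csv` naming scheme, and the returned row counts are all hypothetical choices for illustration.

```python
import csv
import re
from contextlib import ExitStack
from pathlib import Path


def split_csv_by_column(file_path: str, column: str, output_dir: str) -> dict[str, int]:
    """Stream rows into one file per distinct value; returns rows written per file."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    writers = {}
    counts: dict[str, int] = {}
    with ExitStack() as stack:
        source = stack.enter_context(open(file_path, newline="", encoding="utf-8"))
        reader = csv.reader(source)
        header = next(reader, None)
        if header is None:
            raise ValueError("Input file is empty")
        index = header.index(column)  # raises ValueError if the column is missing
        for row in reader:
            value = row[index] if index < len(row) else ""
            # Assumed naming scheme: values that sanitize to the same name
            # share one output file.
            safe = re.sub(r"[^A-Za-z0-9_.-]", "_", value) or "empty"
            name = f"{column}_{safe}.csv"
            if name not in writers:
                handle = stack.enter_context(
                    open(out_dir / name, "w", newline="", encoding="utf-8")
                )
                writers[name] = csv.writer(handle)
                writers[name].writerow(header)
                counts[name] = 0
            writers[name].writerow(row)
            counts[name] += 1
    return counts
```

This keeps one open file handle per distinct value, so a high-cardinality column could hit the operating system's open-file limit; a real implementation would likely need to cap or recycle handles.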

Lower Priority - Advanced Filtering

  • join_csv_files - SQL-style join of two CSV files on key column
  • pivot_csv_data - Create pivot table from CSV data
  • group_csv_rows - Group rows by column value with aggregations
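For join_csv_files, a hash join is one way to avoid loading both inputs: the sketch below indexes the right-hand file by key and then streams the left-hand file past that index. It performs an inner join only and assumes the right-hand file fits in memory; the parameter names and the returned row count are illustrative, not taken from the existing tools.

```python
import csv
from collections import defaultdict


def join_csv_files(left_path: str, right_path: str, key: str, output_path: str) -> int:
    """Inner join on `key`; returns the number of joined rows written."""
    # Build an in-memory index of the right file (assumed to be the smaller one).
    lookup: dict[str, list[list[str]]] = defaultdict(list)
    with open(right_path, newline="", encoding="utf-8") as right:
        reader = csv.reader(right)
        right_header = next(reader, None)
        if right_header is None:
            raise ValueError("Right file is empty")
        rk = right_header.index(key)
        for row in reader:
            if len(row) <= rk:
                continue  # skip blank or truncated rows
            lookup[row[rk]].append(row[:rk] + row[rk + 1:])
    right_columns = right_header[:rk] + right_header[rk + 1:]

    written = 0
    with open(left_path, newline="", encoding="utf-8") as left, \
            open(output_path, "w", newline="", encoding="utf-8") as out:
        reader = csv.reader(left)
        writer = csv.writer(out)
        left_header = next(reader, None)
        if left_header is None:
            raise ValueError("Left file is empty")
        lk = left_header.index(key)
        writer.writerow(left_header + right_columns)
        # Stream the left file; each row is emitted once per matching right row.
        for row in reader:
            if len(row) <= lk:
                continue
            for match in lookup.get(row[lk], []):
                writer.writerow(row + match)
                written += 1
    return written
```

The same index-then-stream pattern could plausibly back group_csv_rows, with per-group running totals replacing the stored rows.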

Design Principles

  • Google ADK compliant (JSON-serializable types, no default parameter values)
  • @strands_tool decorator
  • Memory-efficient (streaming/chunking for large files)
  • Include skip_confirm for file creation operations
  • Consistent with existing CSV tools pattern
  • Avoid loading entire files when possible
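Taken together, these principles suggest a function shape roughly like the sketch below, shown for a vertical merge_csv_files. The `strands_tool` decorator here is a no-op placeholder, because the real decorator's import path is not given in this issue. The exact meaning of `skip_confirm` is also an assumption: in this sketch, `False` refuses to overwrite an existing output file.

```python
import csv
import os
from typing import Callable, TypeVar

F = TypeVar("F", bound=Callable[..., object])


def strands_tool(func: F) -> F:
    """Placeholder for the project's real @strands_tool decorator (assumption)."""
    return func


@strands_tool
def merge_csv_files(input_paths: list[str], output_path: str, skip_confirm: bool) -> str:
    """Vertically merge CSV files with identical headers, one row at a time.

    Every parameter is required (no defaults) and both the inputs and the
    return value are JSON-serializable types.
    """
    if not input_paths:
        raise ValueError("input_paths must not be empty")
    if output_path in input_paths:
        raise ValueError("output_path must not be one of the inputs")
    if os.path.exists(output_path) and not skip_confirm:
        # Assumed semantics: without skip_confirm, never overwrite silently.
        raise FileExistsError(f"Refusing to overwrite: {output_path}")

    expected_header = None
    rows_written = 0
    with open(output_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in input_paths:
            with open(path, newline="", encoding="utf-8") as src:
                reader = csv.reader(src)
                header = next(reader, None)
                if header is None:
                    continue  # empty input file contributes nothing
                if expected_header is None:
                    expected_header = header
                    writer.writerow(header)
                elif header != expected_header:
                    raise ValueError(f"Header mismatch in {path}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return f"Merged {rows_written} rows from {len(input_paths)} files into {output_path}"
```

Only one input row is held in memory at a time, which fits the streaming principle; a horizontal merge would need a different approach and is not covered here.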

Related

  • Extends existing data/csv_tools.py (18 functions)
  • Related to issue #57 (Data: Future Enhancement Features)
  • Complements token-saving tools like select_csv_columns, filter_csv_rows

Module

data/csv_tools.py

Metadata

Labels

enhancement (New feature or request)
