A Julia package for processing datasets containing multidimensional elements through windowing and dimensionality reduction techniques.
DataTreatments.jl provides tools for working with matrices, or DataFrames where each element is itself a multidimensional object (vectors, matrices, or higher-dimensional arrays). It offers:
- Windowing functions for partitioning multidimensional data
- Dimensionality reduction using feature extraction together with windowing
- Tabular transformation through feature extraction to convert complex multidimensional datasets into flat feature matrices suitable for standard machine learning models
- Data normalization with multiple methods (z-score, min-max, sigmoid, etc.) for preprocessing
- Group-wise operations via
groupbyfor consistent processing across related features - Complete reproducibility by storing all processing parameters and feature metadata
This package is particularly useful when you need to apply traditional ML algorithms that require tabular input to datasets containing structured multidimensional elements like images, spectrograms, or time series segments.
using Pkg
Pkg.add("DataTreatments")using DataTreatments
# Create a dataset with multidimensional elements
Xmatrix = [rand(1:100, 4, 2) for _ in 1:10, _ in 1:5] # 10×5 dataset where each element is a 4×2 matrix
vnames = Symbol.("auto", 1:5)
# Define processing parameters
win = splitwindow(nwindows=2)
features = (mean, std, maximum, minimum)
norm = ZScore
reducefunc = median
# Process for propositional analysis
result = DataTreatment(Xmatrix, :aggregate; vnames, win, features, norm)
# Process for modal analysis
result = DataTreatment(Xmatrix, :reducesize; vnames, win, features, reducefunc, norm)using DataTreatments
using DataFrames
# Create dataset with multidimensional elements
df = DataFrame(
channel1 = [rand(200, 120) for _ in 1:1000],
channel2 = [rand(200, 120) for _ in 1:1000],
channel3 = [rand(200, 120) for _ in 1:1000]
)
# Define processing parameters
win = adaptivewindow(nwindows=6, overlap=0.15)
features = (mean, std, maximum, minimum, median)
norm = PNorm(p=1)
reducefunc = median
# Process for propositional analysis
result = DataTreatment(df, :aggregate; win, features, norm)
# Process for modal analysis
result = DataTreatment(df, :reducesize; win, features, reducefunc, norm)
# Access processed data
X_flat = get_dataset(result) # Flat feature matrix
feature_ids = get_featureid(result) # Feature metadataDataTreatments provides several windowing strategies:
win = splitwindow(nwindows=3)
# Divides data into 3 equal, non-overlapping segmentswin = movingwindow(winsize=50, winstep=25)
# Creates overlapping windows of size 50, advancing by 25 pointswin = adaptivewindow(nwindows=5, overlap=0.2)
# Creates 5 windows with 20% overlap between consecutive windowswin = wholewindow()
# Creates a single window covering the entire dimensionUse the @evalwindow macro to apply window functions to each dimension:
X = rand(200, 120)
# Apply same windowing to all dimensions
intervals = @evalwindow X splitwindow(nwindows=4)
# Apply different windowing per dimension
intervals = @evalwindow X splitwindow(nwindows=4) movingwindow(winsize=40, winstep=20)DataTreatments.jl uses Normalization.jl as its normalization backend.
- Core normalization algorithms are provided by
Normalization.jl. DataTreatments.jlre-exportsfit,fit!,normalize, andnormalize!.DataTreatments.jladds integration for nested/multidimensional dataset elements inext/NormalizationExt.jl.NormSpecprovides a package-local convenience interface for selecting normalization type anddims.
We thank the maintainers and contributors of Normalization.jl for their work and for making this integration possible.
Grouping lets you partition related feature columns (e.g., by variable name, window, or feature) so that operations like normalization are applied with shared coefficients across each group instead of per-column. This preserves consistent scaling for semantically related parts of the dataset.
dt = DataFrame([rand(1:100, 4, 2) for _ in 1:10, _ in 1:5], :auto)
win = splitwindow(nwindows=2)
grp1 = [:x1, :x2]
groups = DataTreatments.groupby(dt, grp1)
dt_norm = DataTreatment(dt, :aggregate; win, features, groups=(:vname,), norm=Scale)A metadata container that stores information about each feature column for reproducibility and feature selection:
# Created automatically by DataTreatment
dt = DataTreatment(df, :reducesize; win=(win,), features=(mean, std, maximum))
# Access feature metadata
feature_ids = get_featureid(dt)
# Each FeatureId contains:
# - vname: Source variable name
# - feat: Feature function applied
# - nwin: Window numberA comprehensive container that stores processed data along with all parameters for full reproducibility:
# Create dataset
df = DataFrame(
channel1 = [rand(200, 120) for _ in 1:1000],
channel2 = [rand(200, 120) for _ in 1:1000],
channel3 = [rand(200, 120) for _ in 1:1000]
)
# Process with full parameter storage
win = adaptivewindow(nwindows=6, overlap=0.15)
features = (mean, std, maximum, minimum, median)
dt = DataTreatment(df, :aggregate; win, features, norm=MinMax)
# Access processed data
X_flat = get_dataset(dt) # Flat feature matrix
feature_ids = get_featureid(dt) # Feature metadata
# All parameters are stored for reproducibility
aggrtype = get_aggrtype(dt) # :aggregate
reduction = get_reducefunc(dt) # mean (default)
var_names = get_vnames(dt) # [:channel1, :channel2, :channel3]
feat_funcs = get_features(dt) # (mean, std, maximum, minimum, median)
n_windows = get_nwindows(dt) # 36- Audio Processing: Extract features from spectrograms for audio classification
- Image Analysis: Process image patches for computer vision tasks
- Time Series: Analyze segmented multivariate time series
- Signal Processing: Extract statistical features from signal windows
- Medical Data: Process multi-channel physiological signals
- Experiment Reproducibility: All parameters stored for exact replication
MIT License
Developed by the ACLAI Lab @ University of Ferrara.