DOLPHY csv cleaning DSL

a YAML-based domain specific language for cleaning csv data through a web interface. write cleaning commands in plain english ( no need for shift pressing except brackets) .

What this

Dolphy lets you clean csv files by writing simple commands in a yaml list. you type the commands in a yaml interface, upload your csv, and get a clean file back. no python, no pandas, no code - ehh maybe some code.

Design Principle

the dsl is designed so that if you are lazy with python or sql then try out this cool joke yaml language. Write it in whatever case you want I do not care, or may be I might care if some bugs popup which I will have to fix.

Getting Started

commands are written as a yaml list and pasted into the web interface:

- filter where age gt 30
- drop any null rows
- fill null in column salary using mean
- normalise columns name using lower

upload your csv, paste your commands, and click run.

Command Reference

filtering rows

filter rows using conditions. supports word-based operators so you never need symbols.

- filter where age gt 30
- filter where salary lt 70000
- filter where name eq charlie
- filter where age gte 18
- filter where score lte 100
- filter where status neq inactive

operators:

word	symbol	meaning
gt	>	greater than
lt	<	less than
gte	>=	greater than or equal
lte	<=	less than or equal
eq	==	equals
neq	!=	not equal

Combining conditions:

- filter where age gt 30 and salary lt 70000
- filter where city eq london or city eq paris
- filter where age gt 18 and name eq charlie and city eq berlin

Using brackets for complex conditions:

brackets are the only time you need the shift key.

- filter where (age gt 30 and salary lt 70000) or (city eq berlin and name eq charlie)

Dropping null and nan rows

- drop any null rows
- drop all null rows
- drop any nan rows
- drop null and nan rows
- drop rows where age is null
- drop rows where salary is nan
- drop rows where age, salary is null

Filling null values

- fill null using mean
- fill null using median
- fill null using mode
- fill null using forward
- fill null using backward
- fill null in column age using mean
- fill null in column salary, age using median
- fill null in column status using active

for a custom fill value (like a specific word or number), just write it after using.

Transformations

Apply mathematical transformations to numeric columns.

- transform columns age using log
- transform columns salary using sqrt
- transform columns score using square
- transform columns age using reciprocal
- transform columns age using yeojohnson
- transform columns age, salary using log inplace

without inplace, a new column is created (e.g. age_log). with inplace, the original column is replaced.

Normalisation

Numeric Normalisation:

- normalise columns age using z-score
- normalise columns salary using min-max
- normalise columns score using robust

String normalisation:

- normalise columns name using lower
- normalise columns city using upper
- normalise columns name using strip
- normalise columns category using label encoding

Renaming Columns

- rename age to years
- rename first name, last name to fname, lname

column names with spaces are supported — just write them as-is and separate multiple columns with commas.

Using the web app

go to the web app url
upload your csv file
write your commands as a yaml list in the editor
click run
preview the cleaned data
download the result

Full example

Given a csv with columns: age, salary, name, city

- drop any null rows
- filter where age gt 18 and salary gt 30000
- filter where (city eq london or city eq berlin) and name neq unknown
- fill null in column salary using mean
- normalise columns name using lower
- normalise columns name using strip
- normalise columns age using z-score
- transform columns salary using log inplace
- rename age to years

Notes

commands are case-insensitive — FILTER WHERE and filter where both work
column names are case-sensitive and must match the csv header exactly
string values in filter conditions do not need quotes
use commas to separate multiple columns in a single command
the only time you need the shift key is for brackets in complex filter conditions

Tech Stack

polars — fast dataframe processing with lazy evaluation
pydantic — column validation before execution
pyyaml — yaml command parsing
python ast — filter expression parsing

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
cleanerDSL		cleanerDSL
dolphy-app		dolphy-app
dolphy-backend		dolphy-backend
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
docker-compose.yml		docker-compose.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DOLPHY csv cleaning DSL

What this

Design Principle

Getting Started

Command Reference

filtering rows

Dropping null and nan rows

Filling null values

Transformations

Normalisation

Renaming Columns

Using the web app

Full example

Notes

Tech Stack

About

Uh oh!

Releases

Packages

Languages

License

joelkurien/dolphy

Folders and files

Latest commit

History

Repository files navigation

DOLPHY csv cleaning DSL

What this

Design Principle

Getting Started

Command Reference

filtering rows

Dropping null and nan rows

Filling null values

Transformations

Normalisation

Renaming Columns

Using the web app

Full example

Notes

Tech Stack

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages