DataLoom


Paper: https://www.vldb.org/pvldb/vol17/p4449-renen.pdf

Introduction

Schema discovery and data loading are an essential part of any data analysis pipeline. In the rapidly evolving fields of machine learning and business intelligence, this task has become frequent yet is often underestimated.

Introducing DataLoom, a web-based prototype that automates the tedious aspects of this process using Large Language Models while leveraging traditional algorithms for well-understood problems, streamlining the schema discovery and data loading experience.


Setup

  1. Install the Python dependencies for the backend from requirements.txt in the backend folder:

    pip install -r /path/to/requirements.txt
    
  2. In both /backend and /frontend, run:

    npm install
    
  3. Add a secrets.json file with your OpenAI credentials to the backend folder.

     Format of the secrets.json file (see the sketch after this list for how the backend might consume it):

     {
         "ORG": "",
         "KEY": ""
     }
    
  4. In ./backend/, run the following command to download the .jar files required for profiling:

    ./download.sh
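
How the server consumes secrets.json is not shown here; below is a minimal sketch, assuming the backend reads it with Python's json module and hands the two fields to the openai client. The wiring is hypothetical; only the file name and keys come from step 3 above.

    import json

    from openai import OpenAI

    # Load the credentials described in step 3
    # (assumes secrets.json sits in ./backend/, next to this script).
    with open("secrets.json") as f:
        secrets = json.load(f)

    # Hypothetical wiring: pass both fields to the OpenAI client (openai >= 1.0).
    client = OpenAI(api_key=secrets["KEY"], organization=secrets["ORG"])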
    

Running

Start backend (run in ./backend/):

./node_modules/nodemon/bin/nodemon.js -w src --exec python3 src/server.py

Start frontend (run in ./frontend/):

npm start

Adding New Datasets

  1. Add your data files/folders locally.
  2. Add the paths to your data files/folders to data-paths.json in frontend/src/ (see the example below).
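
The expected structure of data-paths.json is not documented here; a minimal sketch, assuming it holds a flat JSON array of local paths (the example paths are hypothetical):

    [
        "/data/customers.csv",
        "/data/sales/"
    ]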