Skip to content

rllm-project/tlsql

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

TLSQL

Design SQL-like API for Table Learning

A Python library that converts custom SQL-like statements into standard SQL queries for machine learning workflows


Python License


About

TLSQL is a Python library that converts custom SQL-like statements into standard SQL queries for machine learning workflows. TLSQL supports three types of statements for machine learning workflows. Each statement has a specific syntax and purpose.

Dataset Splitting

TLSQL uses three statements to divide your dataset into training, validation, and test sets:

  • TRAIN WITH: Specifies the training set - data used to train the model
  • PREDICT VALUE: Specifies the test set - data used for final evaluation (prediction targets)
  • VALIDATE WITH: Specifies the validation set - data used for model selection and hyperparameter tuning

TLSQL Syntax

1. TRAIN WITH Statement

The TRAIN WITH statement specifies which columns and tables to use for training data, along with optional filtering conditions. This statement defines the dataset used to train your machine learning model.

Syntax

TRAIN WITH (column_selectors)
FROM Tables(table1, table2, ...)
[WHERE conditions]

Examples

TRAIN WITH (users.*, movies.*, ratings.*)
FROM Tables(users, movies, ratings)
WHERE users.Gender='M' AND movies.Year >= 2000

2. PREDICT VALUE Statement

The PREDICT VALUE statement specifies the target column for prediction and the task type (classification or regression). This statement defines the test set - the data for which you want to make predictions. The WHERE clause in this statement filters which rows are included in the test set.

Syntax

PREDICT VALUE(table.column, TASK_TYPE)
FROM table
[WHERE conditions]

Task Types

  • CLF: Classification task - predicts discrete categories
  • REG: Regression task - predicts continuous values

Examples

PREDICT VALUE(users.Age, CLF)
FROM users
WHERE users.Gender='F' OR users.userID IN (1,2,3,4,5)

3. VALIDATE WITH Statement

The VALIDATE WITH statement specifies validation data with the same syntax as TRAIN WITH. This statement defines the validation set used for model selection. If omitted, the pipeline will use k-fold cross-validation on the training data.

Syntax

VALIDATE WITH (column_selectors)
FROM Tables(table1, table2, ...)
[WHERE conditions]

Examples

VALIDATE WITH (users.*, movies.*, ratings.*)
FROM Tables(users, movies, ratings)
WHERE users.Gender='M' AND movies.Year < 2000

Supported Operators

Comparison Operators

  • =, !=, <>, <, <=, >, >=

Logical Operators

  • AND, OR, NOT

Special Operators

  • IN (value1, value2, ...): Check if value is in list
  • BETWEEN value1 AND value2: Range check

About

Table Learning Structured Query Language

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages