An experimental DuckDB extension that exposes functionality from DuckDB's native SQL parser.
parser_tools
is a DuckDB extension designed to provide SQL parsing capabilities within the database. It allows you to analyze SQL queries and extract structural information directly in SQL. This extension provides parsing functions for tables, WHERE clauses, function calls, and statements.
- Extract table references from a SQL query with context information (e.g.
FROM
,JOIN
, etc.) - Extract function calls from a SQL query with context information (e.g.
SELECT
,WHERE
,HAVING
, etc.) - Parse WHERE clauses to extract conditions and operators
- Parse multi-statement SQL to extract individual statements or count the number of statements
- Support for window functions, nested functions, and CTEs
- Includes schema, name, and context information for all extractions
- Built on DuckDB's native SQL parser
- Simple SQL interface — no external tooling required
- Only
SELECT
statements are supported for table and function parsing - WHERE clause parsing supports additional statement types
- Full parse tree is not exposed (only specific structural elements)
INSTALL parser_tools FROM community;;
LOAD parser_tools;
SELECT * FROM parse_tables('SELECT * FROM MyTable');
┌─────────┬─────────┬─────────┐
│ schema │ table │ context │
│ varchar │ varchar │ varchar │
├─────────┼─────────┼─────────┤
│ main │ MyTable │ from │
└─────────┴─────────┴─────────┘
This tells you that MyTable
in the main
schema was used in the FROM
clause of the query.
select * from parse_tables('with EarlyAdopters as (select * from Users where id < 10) select * from EarlyAdopters;');
┌─────────┬───────────────┬──────────┐
│ schema │ table │ context │
│ varchar │ varchar │ varchar │
├─────────┼───────────────┼──────────┤
│ │ EarlyAdopters │ cte │
│ main │ Users │ from │
│ main │ EarlyAdopters │ from_cte │
└─────────┴───────────────┴──────────┘
This tells us a few things:
EarlyAdopters
was defined as a CTE.- The
Users
table was referenced in a from clause. EarlyAdopters
was referenced in a from clause (but it's a cte, not a table).
Context helps identify where elements are used in the query.
from
: table in the mainFROM
clausejoin_left
: left side of aJOIN
join_right
: right side of aJOIN
cte
: a Common Table Expression being definedfrom_cte
: usage of a CTE as if it were a tablesubquery
: table reference inside a subquery
select
: function in aSELECT
clausewhere
: function in aWHERE
clausehaving
: function in aHAVING
clauseorder_by
: function in anORDER BY
clausegroup_by
: function in aGROUP BY
clausenested
: function call nested within another function
This extension provides parsing functions for tables, functions, WHERE clauses, and statements. Each category includes both table functions (for detailed results) and scalar functions (for programmatic use).
In general, errors (e.g. Parse Exception) will not be exposed to the user, but instead will result in an empty result. This simplifies batch processing. When validity is needed, is_parsable can be used.
These functions extract function calls from SQL queries, including window functions and nested function calls.
Parses a SQL SELECT
query and returns all function calls along with their context of use (e.g. select
, where
, having
, order_by
, etc.).
SELECT * FROM parse_functions('SELECT upper(name), count(*) FROM users WHERE length(email) > 0;');
A table with:
function_name
: the name of the functionschema
: schema name (default"main"
if unspecified)context
: where the function appears in the query
SELECT * FROM parse_functions($$
SELECT upper(name), count(*)
FROM users
WHERE length(email) > 0
GROUP BY substr(department, 1, 3)
HAVING sum(salary) > 100000
ORDER BY lower(name)
$$);
function_name | schema | context |
---|---|---|
upper | main | select |
count_star | main | select |
length | main | where |
substr | main | group_by |
sum | main | having |
lower | main | order_by |
Returns a list of function names (strings) referenced in the SQL query.
SELECT parse_function_names('SELECT upper(name), lower(email) FROM users;');
----
['upper', 'lower']
A list of strings, each being a function name.
SELECT parse_function_names('SELECT rank() OVER (ORDER BY salary) FROM users;');
----
['rank']
Similar to the table function, but returns a list of structs instead of a result table. Each struct contains:
function_name
(VARCHAR)schema
(VARCHAR)context
(VARCHAR)
SELECT parse_functions('SELECT upper(name), count(*) FROM users;');
----
[{'function_name': upper, 'schema': main, 'context': select}, {'function_name': count_star, 'schema': main, 'context': select}]
A list of STRUCTs with function name, schema, and context.
SELECT list_filter(parse_functions('SELECT upper(name) FROM users WHERE lower(email) LIKE "%@example.com"'), f -> f.context = 'where') AS where_functions;
----
[{'function_name': lower, 'schema': main, 'context': where}]
Parses a SQL SELECT
query and returns all referenced tables along with their context of use (e.g. from
, join_left
, cte
, etc.).
SELECT * FROM parse_tables('SELECT * FROM my_table JOIN other_table USING (id)');
A table with:
schema
: schema name (default"main"
if unspecified)table
: table namecontext
: where the table appears in the query
One of:from
,join_left
,join_right
,from_cte
,cte
,subquery
SELECT * FROM parse_tables($$
WITH cte1 AS (SELECT * FROM x)
SELECT * FROM cte1 JOIN y ON cte1.id = y.id
$$);
schema | table | context |
---|---|---|
cte1 | cte | |
main | x | from |
main | y | join_right |
cte1 | from_cte |
Returns a list of table names (strings) referenced in the SQL query. Can optionally exclude CTE-related references.
SELECT parse_table_names('SELECT * FROM my_table');
----
['my_table']
SELECT parse_table_names('with cte_test as(select 1) select * from MyTable, cte_test', false); -- include CTEs
----
[cte_test, MyTable, cte_test]
A list of strings, each being a table name.
SELECT parse_table_names('SELECT * FROM a JOIN b USING (id)');
----
['a', 'b']
Similar to the table function, but returns a list of structs instead of a result table. Each struct contains:
schema
(VARCHAR)table
(VARCHAR)context
(VARCHAR)
SELECT parse_tables('select * from MyTable');
----
[{'schema': main, 'table': MyTable, 'context': from}]
A list of STRUCTs with schema, table name, and context.
SELECT parse_tables('select * from MyTable t inner join Other o on o.id = t.id');
----
[{'schema': main, 'table': MyTable, 'context': from}, {'schema': main, 'table': Other, 'context': join_right}]
Checks whether a given SQL string is syntactically valid (i.e. can be parsed by DuckDB).
SELECT is_parsable('SELECT * FROM users');
-- true
SELECT is_parsable('SELEKT * FROM users');
-- false
A boolean indicating whether the input SQL string is parsable (true
) or not (false
).
SELECT query, is_parsable(query) AS valid
FROM (VALUES
('SELECT * FROM good_table'),
('BAD SQL SELECT *'),
('WITH cte AS (SELECT 1) SELECT * FROM cte')
) AS t(query);
┌───────────────────────────────────────────────┬────────┐
│ query │ valid │
│ varchar │ boolean│
├───────────────────────────────────────────────┼────────┤
│ SELECT * FROM good_table │ true │
│ BAD SQL SELECT * │ false │
│ WITH cte AS (SELECT 1) SELECT * FROM cte │ true │
└───────────────────────────────────────────────┴────────┘
These functions parse multi-statement SQL strings and extract individual statements or count them.
Parses a SQL string containing multiple statements and returns each statement as a separate row.
SELECT * FROM parse_statements('SELECT 42; SELECT 43;');
A table with:
statement
: the SQL statement text
SELECT * FROM parse_statements($$
SELECT * FROM users WHERE active = true;
INSERT INTO log VALUES ('query executed');
SELECT count(*) FROM transactions;
$$);
statement |
---|
SELECT * FROM users WHERE (active = true) |
INSERT INTO log (VALUES ('query executed')) |
SELECT count_star() FROM transactions |
Returns a list of statement strings from a multi-statement SQL query.
SELECT parse_statements('SELECT 42; SELECT 43;');
----
[SELECT 42, SELECT 43]
A list of strings, each being a SQL statement.
SELECT parse_statements('SELECT 1; INSERT INTO test VALUES (2); SELECT 3;');
----
[SELECT 1, 'INSERT INTO test (VALUES (2))', SELECT 3]
Returns the number of statements in a multi-statement SQL query.
SELECT num_statements('SELECT 42; SELECT 43;');
----
2
An integer count of the number of SQL statements.
SELECT num_statements($$
WITH cte AS (SELECT 1) SELECT * FROM cte;
UPDATE users SET last_seen = now();
SELECT count(*) FROM users;
DELETE FROM temp_data;
$$);
----
4
To build the extension, run:
GEN=ninja make
The main binaries that will be built are:
./build/release/duckdb
./build/release/test/unittest
./build/release/extension/parser_tools/parser_tools.duckdb_extension
duckdb
is the binary for the duckdb shell with the extension code automatically loaded.unittest
is the test runner of duckdb. Again, the extension is already linked into the binary.parser_tools.duckdb_extension
is the loadable binary as it would be distributed.
To run the extension code, simply start the shell with ./build/release/duckdb
(which has the parser_tools extension built-in).
Now we can use the features from the extension directly in DuckDB:
D select * from parse_tables('select * from MyTable');
┌─────────┬─────────┬─────────┐
│ schema │ table │ context │
│ varchar │ varchar │ varchar │
├─────────┼─────────┼─────────┤
│ main │ MyTable │ from │
└─────────┴─────────┴─────────┘
To run the extension dev build from an existing distribution of duckdb (e.g. cli):
$ duckdb -unsigned
D install parser_tools from './build/release/repository/v1.2.1/osx_amd64/parser_tools.duckdb_extension';
D load parser_tools;
D select * from parse_tables('select * from MyTable');
┌─────────┬─────────┬─────────┐
│ schema │ table │ context │
│ varchar │ varchar │ varchar │
├─────────┼─────────┼─────────┤
│ main │ MyTable │ from │
└─────────┴─────────┴─────────┘
See Writing Tests to learn more about duckdb's testing philosophy. To that end, we define tests in sql at: test/sql.
The tests can be run with:
make test
and easily re-ran as changes are made with:
GEN=ninja make && make test