
add native duckdb loader with tests #3035

Open
mpr1255 wants to merge 7 commits into saulpw:develop from mpr1255:duckdb-loader

Conversation

@mpr1255 commented Mar 19, 2026

Problem:

  • add DuckDB as a native core loader on develop
  • make the loader submission match the published core loader checklist

Approach:

  • add a native DuckDB index/table/view loader with exec-sql, saver support, and deferred commit handling
  • keep focused cmdlog coverage for open, view, exec-sql, PK commits, no-PK safety, alternate schemas, composite PKs, and save roundtrip
  • make the canonical loader replay open the checked-in sample_data/benchmark.duckdb fixture and save the replayed result to tests/golden/load-duckdb.tsv

Checklist:

  • followed the core loader checklist: https://www.visidata.org/docs/contributing/#loader
  • open_duckdb returns the DuckDB sheet
  • explicit rowtype and # rowdef are present
  • the duckdb dependency is noted in requirements.txt
  • a small sample DuckDB dataset is checked into sample_data/
  • tests/load-duckdb.vd opens the checked-in sample dataset
  • tests/golden/load-duckdb.tsv is generated from replaying that loader test
  • dev/formats.jsonl includes the DuckDB section

Risks:

  • DuckDB cmdlog tests require a Python environment with the duckdb package installed
  • deferred writes remain intentionally disabled for tables without a primary key
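
The no-PK and composite-PK behaviors above can be sketched as a small helper. This is a hypothetical illustration, not the loader's actual code: `pk_where` and its signature are invented names, but the logic matches the stated policy of refusing deferred writes without a primary key and joining composite keys with AND.

```python
# Hypothetical sketch of the PK-based commit guard described above;
# pk_where() is an illustrative name, not the loader's actual API.

def pk_where(pk_cols, row):
    """Build a parameterized WHERE clause targeting one row by its PK.

    Refuses to proceed when the table has no primary key, since an
    UPDATE/DELETE could not be guaranteed to touch exactly one row.
    """
    if not pk_cols:
        raise ValueError('deferred writes disabled: table has no primary key')
    clause = ' AND '.join(f'"{c}" = ?' for c in pk_cols)  # composite PKs join with AND
    params = [row[c] for c in pk_cols]
    return clause, params

# composite primary key example
clause, params = pk_where(['org_id', 'user_id'], {'org_id': 7, 'user_id': 42, 'name': 'x'})
print(clause)   # "org_id" = ? AND "user_id" = ?
print(params)   # [7, 42]
```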

Testing:

  • PYTHON=.venv/bin/python dev/test.sh load-duckdb load-duckdb-view duckdb-exec-sql duckdb-commit-changes duckdb-commit-no-pk duckdb-open-alt-schema duckdb-commit-composite-pk duckdb-save-sheet
  • PYTHON=.venv/bin/python dev/test-all.sh tests/test-vdx.sh

@mpr1255 (Author) commented Mar 19, 2026

Hey just a note -- all this was done with codex 4.5xhigh + claude code opus 4.6 max effort. I used gpt 4.5 pro for some reviews. I also have various skills to do code reviews. So I bounced all the models off each other for a while, reviewed various outputs, asked for more investigation and added some additional tests.

This was mainly to solve a pain point for me -- I love visidata and wanted to use it for duckdb files. I have it aliased as 'vd' now and it seems to work great. Hope it's useful and not more AI slop.


CLAassistant commented Mar 19, 2026

CLA assistant check
All committers have signed the CLA.

@saulpw (Owner) commented Mar 24, 2026

Thanks, @mpr1255, it's much appreciated! and thank you for the disclosure, great job. I just did a writeup this weekend about this (https://visidata.org/ai), it sounds like this is AI level 6 or 7. Do you understand most of the code and overall technical choices (like you could find your way around without AI, level 6) or did you not even look at the code really (level 7)?

@mpr1255 (Author) commented Mar 24, 2026

Yeah probably closer to 7 than 6, I'd say? I looked at the code but didn't seriously read it. I mostly bossed the LLMs around into doing that. Like I would say "compare it closely to the sqlite loader -- do we differ only where absolutely necessary? tell me how." and then like, "ok is this as simple as it can be? No LLM slop??" [I realise the irony]. Then I would get codex and claude to check each other's work, and send it to 5.4 Pro for extra AI power.

Here is an example. Claude tells me

_iterfetch() is unnecessary. Since VisiData stores all rows in self.rows anyway, batched fetching saves no memory. result.fetchall() does the same thing in one line. The sqlite loader doesn't batch either — it just yields from the cursor directly.

I'm a bit suss, so I feed this to codex to confirm. It spelunks in the codebase and 10 minutes later says:

  1. No, replacing _iterfetch() with fetchall() would be a real regression, not just cleanup. The current helper at visidata/loaders/duckdb.py:45 feeds rows incrementally into VisiData’s load loop at visidata/sheets.py:317, which appends each row into self.rows at visidata/sheets.py:320. yield from result.fetchall() would first materialize the entire result before yielding the first row, so you lose incremental loading and you also defeat the early max_rows cutoff for precious sheets at visidata/sheets.py:318 and visidata/basesheet.py:105.

  2. It is true that VisiData still keeps all rows in memory long-term. self.rows is reset and filled in visidata/sheets.py:314, so steady-state memory is O(n) either way. But _iterfetch() still reduces peak transient memory. I measured a VisiData-like append loop on DuckDB 1.5.0 with 1,000,000 4-int rows: fetchall() peaked at 191.1 MB, fetchmany(1000) at 183.4 MB. That is the extra whole-result Python list from fetchall().

  3. The sqlite comparison is misleading. SQLite can yield rows directly from its cursor at visidata/loaders/sqlite.py:112. DuckDB does not expose that same path here: con.execute() returns a non-iterable connection object in practice, so some adapter loop is needed. DuckDB’s Python API also documents fetchall() as fetching all rows and fetchmany() as fetching the next set: https://duckdb.org/docs/stable/clients/python/reference/

Now -- am I following all the rabbitholes there? No. I am doing a "lgtm" and just trying to ensure the tests pass and that I did the most minimal possible implementation. Also, I used it a bunch and it behaved normally.

BTW, one other big issue: visidata will obviously fill memory if you try to open a gigantic file. But duckdb seems to behave worse because it shoves things in so fast. So I turned on lazy loading for the main db view (before you click into a table) and surface the metadata DuckDB already tracks for row/column counts. So you can see what tables are in your db without having to calculate them fresh each time, for no memory cost. FYI.

