FireDuckExt - DuckDB Extension for Google Cloud Firestore

Query Google Cloud Firestore directly from DuckDB using SQL.

Features

Read data from Firestore collections with firestore_scan()
Insert data from subqueries, CSVs, or tables with firestore_insert()
Update and delete with firestore_update(), firestore_delete()
Batch operations for bulk updates and deletes
Array transforms with firestore_array_union(), firestore_array_remove(), firestore_array_append()
Collection group queries for querying across nested collections
Filter pushdown sends supported WHERE clauses to Firestore for faster queries
Vector embedding support with Firestore vector fields mapped to ARRAY(DOUBLE, N)
DuckDB secret management for secure credential storage

Quick Start

-- Load the extension
LOAD fire_duck_ext;

-- Configure credentials
CREATE SECRET my_firestore (
    TYPE firestore,
    PROJECT_ID 'my-gcp-project',
    SERVICE_ACCOUNT_JSON '/path/to/credentials.json'
);

-- Query a collection
SELECT * FROM firestore_scan('users');

-- Filter with SQL
SELECT __document_id, name, email
FROM firestore_scan('users')
WHERE status = 'active';

-- Insert documents from a subquery
call firestore_insert('users', (
    SELECT 'Alice' AS name, 30 AS age
));

-- Insert with explicit document IDs
call firestore_insert('users',
    (SELECT 'alice123' AS id, 'Alice' AS name, 30 AS age),
    document_id := 'id');

-- Update documents
call firestore_update('users', 'user123', 'status', 'verified');

-- Batch update with DuckDB filtering
SET VARIABLE ids = (
    SELECT list(__document_id)
    FROM firestore_scan('users')
    WHERE status = 'pending'
);
call firestore_update_batch('users', getvariable('ids'), 'status', 'reviewed');

Authentication

Service Account (Recommended for production)

SERVICE_ACCOUNT_JSON accepts either a file path or the full JSON text of a service account key.

File path:

CREATE SECRET prod_firestore (
    TYPE firestore,
    PROJECT_ID 'my-project',
    SERVICE_ACCOUNT_JSON '/path/to/service-account.json'
);

Inline JSON:

CREATE SECRET prod_firestore (
    TYPE firestore,
    PROJECT_ID 'my-project',
    SERVICE_ACCOUNT_JSON '{
        "type": "service_account",
        "project_id": "my-project",
        "private_key_id": "key123abc",
        "private_key": "-----BEGIN RSA PRIVATE KEY-----\n...\n-----END RSA PRIVATE KEY-----\n",
        "client_email": "firestore-sa@my-project.iam.gserviceaccount.com",
        "client_id": "123456789",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://oauth2.googleapis.com/token"
    }'
);

API Key (For development/testing)

An API key provides unauthenticated access and is suitable for development, testing, or accessing public Firestore databases. API key auth does not support batchWrite, so batch operations fall back to individual requests.

CREATE SECRET dev_firestore (
    TYPE firestore,
    PROJECT_ID 'my-project',
    API_KEY 'AIzaSyYourApiKeyHere'
);

Environment Variable

# Set the path to your service account JSON file
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"

# Then run DuckDB - no secret creation needed!
duckdb

The extension automatically reads GOOGLE_APPLICATION_CREDENTIALS on startup and creates an internal secret that matches all databases (DATABASE '*'), so no CREATE SECRET is needed.

Custom Database

-- Single named database (with service account)
CREATE SECRET my_secret (
    TYPE firestore,
    PROJECT_ID 'my-project',
    SERVICE_ACCOUNT_JSON '/path/to/credentials.json',
    DATABASE 'my-database'
);

-- Single named database (with API key)
CREATE SECRET my_secret (
    TYPE firestore,
    PROJECT_ID 'my-project',
    API_KEY 'AIzaSyYourApiKeyHere',
    DATABASE 'my-database'
);

-- Multiple databases
CREATE SECRET my_secret (
    TYPE firestore,
    PROJECT_ID 'my-project',
    SERVICE_ACCOUNT_JSON '/path/to/credentials.json',
    DATABASES ['(default)', 'my-other-db']
);

-- Wildcard (matches all databases)
CREATE SECRET my_secret (
    TYPE firestore,
    PROJECT_ID 'my-project',
    SERVICE_ACCOUNT_JSON '/path/to/credentials.json',
    DATABASE '*'
);

If DATABASE/DATABASES is omitted, it defaults to (default).

Firebase Emulator

-- Set environment variable first
-- export FIRESTORE_EMULATOR_HOST=localhost:8080

CREATE SECRET emulator (
    TYPE firestore,
    PROJECT_ID 'test-project',
    API_KEY 'fake-key'
);

Functions

Function	Description
`firestore_scan('collection')`	Read all documents from a collection
`firestore_scan('~collection')`	Collection group query (all subcollections)
`firestore_insert('collection', (SELECT ...), document_id := 'col')`	Insert documents from a subquery
`firestore_update('collection', 'doc_id', 'field1', value1, ...)`	Update fields on a single document
`firestore_delete('collection', 'doc_id')`	Delete a document
`firestore_update_batch('collection', ['id1', ...], 'field1', value1, ...)`	Batch update
`firestore_delete_batch('collection', ['id1', ...])`	Batch delete
`firestore_array_union('collection', 'doc_id', 'field', ['v1', ...])`	Add to array (no duplicates)
`firestore_array_remove('collection', 'doc_id', 'field', ['v1', ...])`	Remove from array
`firestore_array_append('collection', 'doc_id', 'field', ['v1', ...])`	Append to array

Insert

firestore_insert takes a collection name and a subquery, and inserts each row as a Firestore document.

-- Auto-generated document IDs
call firestore_insert('users', (
    SELECT name, age FROM read_csv('new_users.csv')
));

-- Explicit document IDs from a column
call firestore_insert('users',
    (SELECT user_id, name, age FROM read_csv('new_users.csv')),
    document_id := 'user_id');

-- Insert from a DuckDB table
call firestore_insert('employees',
    (SELECT * FROM employee_staging),
    document_id := 'emp_id');

-- Insert into nested collections
call firestore_insert('users/user1/notes', (
    SELECT 'note1' AS id, 'Remember to buy milk' AS content
), document_id := 'id');

The document_id parameter specifies which column to use as the Firestore document ID. That column is excluded from the document fields. If omitted, Firestore auto-generates IDs.

Type Mapping

Firestore Type	DuckDB Type
string	VARCHAR
integer	BIGINT
double	DOUBLE
boolean	BOOLEAN
timestamp	TIMESTAMP
array	LIST
map	VARCHAR (JSON string)
vector	ARRAY(DOUBLE, N)
null	NULL
geoPoint	STRUCT(latitude DOUBLE, longitude DOUBLE)
reference	VARCHAR
bytes	BLOB

Vector Embeddings

Firestore vector fields (created via FieldValue.vector()) are mapped to DuckDB's fixed-size ARRAY(DOUBLE, N) type, where N is the vector dimension inferred from the data.

-- Read vectors
SELECT label, vector FROM firestore_scan('embeddings');
-- label: cat, vector: [1.0, 2.0, 3.0]

-- Access individual elements (1-indexed)
SELECT label, vector[1] AS first_dim FROM firestore_scan('embeddings');

-- Write vectors back (preserves Firestore vector format for vector search)
call firestore_update('embeddings', 'emb1',
    'vector', [100.0, 200.0, 300.0]::DOUBLE[3]);

-- Compute distances between vectors
SELECT a.label, b.label,
    sqrt(list_sum(list_transform(
        generate_series(1, 3),
        i -> power(a.vector[i] - b.vector[i], 2)
    ))) AS distance
FROM firestore_scan('embeddings') a, firestore_scan('embeddings') b
WHERE a.label < b.label;

Filter Pushdown

The extension pushes supported WHERE clauses to Firestore's query API to reduce data transfer. Supported filters:

Equality: field = value
Inequality: field != value
Range: field > value, field >= value, field < value, field <= value
IN: field IN ('a', 'b', 'c')
IS NOT NULL: field IS NOT NULL

DuckDB re-applies all filters after the scan for correctness, so unsupported filters (LIKE, IS NULL, OR, etc.) still work -- they just scan all documents first.

Use EXPLAIN to see which filters are pushed down:

EXPLAIN SELECT * FROM firestore_scan('users') WHERE status = 'active' AND age > 25;
-- Shows "Firestore Pushed Filters: status EQUAL 'active', age GREATER_THAN 25"

Missing Documents

By default, firestore_scan() includes "phantom" documents — documents that have no fields but serve as parent paths for subcollections. This matches the behavior of the Firebase Console and is controlled by the show_missing parameter (default: true).

-- Default: includes phantom/missing documents
SELECT * FROM firestore_scan('artifacts/default-app-id/users');

-- Opt out to only return documents with fields
SELECT * FROM firestore_scan('artifacts/default-app-id/users', show_missing := false);

When a collection contains only phantom documents (no fields at all), the result includes just the __document_id column, letting you discover document IDs for navigating into subcollections.

Note: The Firestore Emulator does not support showMissing. The extension detects the emulator automatically and skips the parameter.

Building from Source

Prerequisites

CMake 3.5+
C++17 compiler
vcpkg for dependency management

Build Steps

# Clone the repository
git clone --recurse-submodules https://github.com/yourusername/fire_duck_ext.git
cd fire_duck_ext

# Set up vcpkg
git clone https://github.com/Microsoft/vcpkg.git
./vcpkg/bootstrap-vcpkg.sh
export VCPKG_TOOLCHAIN_PATH=`pwd`/vcpkg/scripts/buildsystems/vcpkg.cmake

# Build
make release

# Run tests
make test

Build Output

./build/release/duckdb                                        # DuckDB shell with extension
./build/release/test/unittest                                 # Test runner
./build/release/extension/fire_duck_ext/fire_duck_ext.duckdb_extension  # Loadable extension

Running Integration Tests

Integration tests require the Firebase Emulator:

# Install Firebase CLI
npm install -g firebase-tools

# Run tests with emulator
firebase emulators:exec --only firestore --project test-project \
    "./test/scripts/run_integration_tests.sh"

License

MIT License

Name		Name	Last commit message	Last commit date
Latest commit History 374 Commits
.github		.github
docs		docs
duckdb @ 6ddac80		duckdb @ 6ddac80
extension-ci-tools @ af154d2		extension-ci-tools @ af154d2
scripts		scripts
src		src
test		test
third_party		third_party
.clang-format		.clang-format
.clang-tidy		.clang-tidy
.editorconfig		.editorconfig
.gitignore		.gitignore
.gitmodules		.gitmodules
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
description.yml		description.yml
extension_config.cmake		extension_config.cmake
firebase.json		firebase.json
vcpkg.json		vcpkg.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

FireDuckExt - DuckDB Extension for Google Cloud Firestore

Features

Quick Start

Authentication

Service Account (Recommended for production)

API Key (For development/testing)

Environment Variable

Custom Database

Firebase Emulator

Functions

Insert

Type Mapping

Vector Embeddings

Filter Pushdown

Missing Documents

Building from Source

Prerequisites

Build Steps

Build Output

Running Integration Tests

License

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

License

BorisBesky/fire_duck_ext

Folders and files

Latest commit

History

Repository files navigation

FireDuckExt - DuckDB Extension for Google Cloud Firestore

Features

Quick Start

Authentication

Service Account (Recommended for production)

API Key (For development/testing)

Environment Variable

Custom Database

Firebase Emulator

Functions

Insert

Type Mapping

Vector Embeddings

Filter Pushdown

Missing Documents

Building from Source

Prerequisites

Build Steps

Build Output

Running Integration Tests

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages