Try chDB v3.0.0
pip install -U chdb
Major Changes
Performance
Session Mode and Memory Engine Support
- Session mode in chDB 3.0 is now equivalent to the new connection-based implementation
- Improved support for ClickHouse memory engine and stateful operations
- Enhanced performance for INSERT and UPDATE operations
- Better handling of background mutation threads and part compaction
- New support for SQLite-style ":memory:" connection string mode
- Implemented using ClickHouse memory engine under the hood
New Default Query Mode: Connection-based Queries
- Connection-based queries are now the recommended and default way to interact with chDB
- Users should first create a connection using chDB connection or DBAPI
- Queries can be executed through:
- Connection with cursor (fetch results via cursor)
- Direct connection query (simplified method, results returned directly)
New arch chart
What's Changed
- Fix upload pypi and github release by @auxten in #278
- Update PyPi info by @auxten in #279
- Add citation just like ClickHouse, as chDB is mentioned in ClickHouse Paper by @auxten in #280
- SQLite like API mode of chDB by @auxten in #283
- Skip dup call build.sh in setup.py by @auxten in #285
- Fix starving caused by GIL by @auxten in #289
- Refactor query execution to use connection object by @auxten in #290
- Patchset 3.0 by @auxten in #294
Full Changelog: v2.1.1...v3.0.0
Connection and Session Management
Connection String Format
# Supported formats with examples:
":memory:" # In-memory database
"test.db" # Relative path
"file:test.db" # Explicit file protocol
"/path/to/test.db" # Absolute path
"file:/path/to/test.db" # Absolute path with protocol
"file:test.db?param1=value1¶m2=value2" # With query parameters
"file::memory:?verbose&log-level=test" # In-memory with parameters
"///path/to/test.db?param1=value1" # Triple slash absolute path
Connection Parameters
- Parameters in connection strings are passed to ClickHouse engine as startup args
- Special parameter handling:
mode=ro
is converted to--readonly=1
for read-only mode
- For full parameter list, refer to
clickhouse local --help --verbose
Session Behavior
- Sessions maintain query state throughout their lifecycle
- Only one active session allowed at a time
- Creating a new session automatically closes any existing session
- Sessions can be temporary or persistent:
- Without specified path: Creates auto-cleaned temporary directory
- With specified path: Maintains persistent database state
Connection States
- Only one global connection can exist at a time
- Creating a new connection with the same connection string returns the existing connection
- Creating a new connection with a different path before closing the existing one will trigger a user warning
API Reference
Detailed API Reference
Connection Class
class connect:
def __init__(self, path: str = ":memory:"):
"""Initialize connection with optional path parameter"""
def cursor(self) -> cursor:
"""Create and return a new cursor object"""
def execute(self, query_str: str) -> Any:
"""Execute a query directly on the connection"""
def query(self, query_str: str, format: str = "CSV") -> query_result:
"""Execute a query and return a query_result object
Args:
query_str: SQL query string
format: Output format (default: "CSV")
"""
def commit(self):
"""Commit current transaction"""
def close(self):
"""Close the connection"""
Cursor Class
class cursor:
def __init__(self, connection):
"""Initialize cursor with a connection object"""
def execute(self, query: str) -> None:
"""Execute a SQL query"""
def commit(self) -> None:
"""Commit current transaction"""
def close(self) -> None:
"""Close the cursor"""
def get_memview(self) -> memoryview:
"""Get memory view of result data"""
def data_size(self) -> int:
"""Get size of result data in bytes"""
def rows_read(self) -> int:
"""Get number of rows read"""
def bytes_read(self) -> int:
"""Get number of bytes read"""
def elapsed(self) -> float:
"""Get query execution time in seconds"""
def has_error(self) -> bool:
"""Check if there was an error in execution"""
def error_message(self) -> str:
"""Get error message if there was an error"""
Usage Examples
Basic Memory Engine Example
from chdb import connect
conn = connect(":memory:")
cursor = conn.cursor()
# Create a table
cursor.execute("""
CREATE TABLE users (
id Int32,
name String,
scores Array(UInt8)
) ENGINE = Memory
""")
# Insert data
cursor.execute("""
INSERT INTO users VALUES
(1, 'Alice', [95, 87, 92]),
(2, 'Bob', [88, 85, 90]),
(3, 'Charlie', [91, 89, 94])
""")
# Query data
cursor.execute("SELECT * FROM users WHERE id = 1")
row = cursor.fetchone()
Advanced DBAPI Usage
from chdb import dbapi
# Create connection
conn = dbapi.connect("path/to/db")
cur = conn.cursor()
# Create table
cur.execute("""
CREATE TABLE rate (
day Date,
value Int64
) ENGINE = ReplacingMergeTree ORDER BY day
""")
# Insert data (single and multiple rows)
cur.execute("INSERT INTO rate VALUES (%s, %s)", ("2021-01-01", 24))
cur.executemany("INSERT INTO rate VALUES (%s, %s)", [
("2021-01-02", 128),
("2021-01-03", 256)
])
# Update data
cur.executemany(
"ALTER TABLE rate UPDATE value = %s WHERE day = %s",
[(72, "2021-01-02"), (96, "2021-01-03")]
)
# Fetch methods
cur.execute("SELECT value FROM rate ORDER BY day DESC")
row = cur.fetchone() # Single row
rows = cur.fetchmany(2) # Multiple rows
all_rows = cur.fetchall() # All remaining rows
# Clean up
cur.close()
conn.close()
Best Practices
- Create a single connection and reuse it throughout your project
- Use connection-based queries instead of direct chDB queries
- Close existing connections before creating new ones with different paths
- Be mindful of thread usage when executing queries
Migration Guide
Users upgrading from pre-2.2 versions should:
- Update code to use connection-based queries
- Review and update connection string parameters
- Update any code relying on the debug output format
- Audit thread usage in applications using chDB
For more detailed examples and API usage, refer to:
For binding developers
New API avail:
/**
* Creates a new chDB connection.
* Only one active connection is allowed per process.
* Creating a new connection with different path requires closing existing connection.
*
* @param argc Number of command-line arguments
* @param argv Command-line arguments array (--path=<db_path> to specify database location)
* @return Pointer to connection pointer, or NULL on failure
* @note Default path is ":memory:" if not specified
*/
CHDB_EXPORT struct chdb_conn ** connect_chdb(int argc, char ** argv);
/**
* Closes an existing chDB connection and cleans up resources.
* Thread-safe function that handles connection shutdown and cleanup.
*
* @param conn Pointer to connection pointer to close
*/
CHDB_EXPORT void close_conn(struct chdb_conn ** conn);
/**
* Executes a query on the given connection.
* Thread-safe function that handles query execution in a separate thread.
*
* @param conn Connection to execute query on
* @param query SQL query string to execute
* @param format Output format string (e.g., "CSV", default format)
* @return Query result structure containing output or error message
* @note Returns error result if connection is invalid or closed
*/
CHDB_EXPORT struct local_result_v2 * query_conn(struct chdb_conn * conn, const char * query, const char * format);
- Please see programs/local/chdb.h