Skip to content

v3.0.0

Latest
Compare
Choose a tag to compare
@auxten auxten released this 09 Jan 09:46
d26d2ce

Try chDB v3.0.0

pip install -U chdb

Major Changes

Performance

  • Some performance improvement (12% faster than DuckDB 1.1.3 and chDB 2.1)
    image

Session Mode and Memory Engine Support

  • Session mode in chDB 3.0 is now equivalent to the new connection-based implementation
  • Improved support for ClickHouse memory engine and stateful operations
  • Enhanced performance for INSERT and UPDATE operations
  • Better handling of background mutation threads and part compaction
  • New support for SQLite-style ":memory:" connection string mode
    • Implemented using ClickHouse memory engine under the hood

New Default Query Mode: Connection-based Queries

  • Connection-based queries are now the recommended and default way to interact with chDB
  • Users should first create a connection using chDB connection or DBAPI
  • Queries can be executed through:
    • Connection with cursor (fetch results via cursor)
    • Direct connection query (simplified method, results returned directly)

New arch chart

What's Changed

Full Changelog: v2.1.1...v3.0.0

Connection and Session Management

Connection String Format

# Supported formats with examples:
":memory:"                                    # In-memory database
"test.db"                                    # Relative path
"file:test.db"                               # Explicit file protocol
"/path/to/test.db"                           # Absolute path
"file:/path/to/test.db"                      # Absolute path with protocol
"file:test.db?param1=value1&param2=value2"   # With query parameters
"file::memory:?verbose&log-level=test"       # In-memory with parameters
"///path/to/test.db?param1=value1"           # Triple slash absolute path

Connection Parameters

  • Parameters in connection strings are passed to ClickHouse engine as startup args
  • Special parameter handling:
    • mode=ro is converted to --readonly=1 for read-only mode
  • For full parameter list, refer to clickhouse local --help --verbose

Session Behavior

  • Sessions maintain query state throughout their lifecycle
  • Only one active session allowed at a time
  • Creating a new session automatically closes any existing session
  • Sessions can be temporary or persistent:
    • Without specified path: Creates auto-cleaned temporary directory
    • With specified path: Maintains persistent database state

Connection States

  • Only one global connection can exist at a time
  • Creating a new connection with the same connection string returns the existing connection
  • Creating a new connection with a different path before closing the existing one will trigger a user warning

API Reference

Detailed API Reference

Connection Class

class connect:
    def __init__(self, path: str = ":memory:"):
        """Initialize connection with optional path parameter"""
        
    def cursor(self) -> cursor:
        """Create and return a new cursor object"""
        
    def execute(self, query_str: str) -> Any:
        """Execute a query directly on the connection"""
        
    def query(self, query_str: str, format: str = "CSV") -> query_result:
        """Execute a query and return a query_result object
        
        Args:
            query_str: SQL query string
            format: Output format (default: "CSV")
        """
        
    def commit(self):
        """Commit current transaction"""
        
    def close(self):
        """Close the connection"""

Cursor Class

class cursor:
    def __init__(self, connection):
        """Initialize cursor with a connection object"""
        
    def execute(self, query: str) -> None:
        """Execute a SQL query"""
        
    def commit(self) -> None:
        """Commit current transaction"""
        
    def close(self) -> None:
        """Close the cursor"""
        
    def get_memview(self) -> memoryview:
        """Get memory view of result data"""
        
    def data_size(self) -> int:
        """Get size of result data in bytes"""
        
    def rows_read(self) -> int:
        """Get number of rows read"""
        
    def bytes_read(self) -> int:
        """Get number of bytes read"""
        
    def elapsed(self) -> float:
        """Get query execution time in seconds"""
        
    def has_error(self) -> bool:
        """Check if there was an error in execution"""
        
    def error_message(self) -> str:
        """Get error message if there was an error"""

Usage Examples

Basic Memory Engine Example

from chdb import connect

conn = connect(":memory:")
cursor = conn.cursor()

# Create a table
cursor.execute("""
    CREATE TABLE users (
        id Int32,
        name String,
        scores Array(UInt8)
    ) ENGINE = Memory
""")

# Insert data
cursor.execute("""
    INSERT INTO users VALUES
    (1, 'Alice', [95, 87, 92]),
    (2, 'Bob', [88, 85, 90]),
    (3, 'Charlie', [91, 89, 94])
""")

# Query data
cursor.execute("SELECT * FROM users WHERE id = 1")
row = cursor.fetchone()

Advanced DBAPI Usage

from chdb import dbapi

# Create connection
conn = dbapi.connect("path/to/db")
cur = conn.cursor()

# Create table
cur.execute("""
    CREATE TABLE rate (
        day Date,
        value Int64
    ) ENGINE = ReplacingMergeTree ORDER BY day
""")

# Insert data (single and multiple rows)
cur.execute("INSERT INTO rate VALUES (%s, %s)", ("2021-01-01", 24))
cur.executemany("INSERT INTO rate VALUES (%s, %s)", [
    ("2021-01-02", 128),
    ("2021-01-03", 256)
])

# Update data
cur.executemany(
    "ALTER TABLE rate UPDATE value = %s WHERE day = %s",
    [(72, "2021-01-02"), (96, "2021-01-03")]
)

# Fetch methods
cur.execute("SELECT value FROM rate ORDER BY day DESC")
row = cur.fetchone()           # Single row
rows = cur.fetchmany(2)        # Multiple rows
all_rows = cur.fetchall()      # All remaining rows

# Clean up
cur.close()
conn.close()

Best Practices

  1. Create a single connection and reuse it throughout your project
  2. Use connection-based queries instead of direct chDB queries
  3. Close existing connections before creating new ones with different paths
  4. Be mindful of thread usage when executing queries

Migration Guide

Users upgrading from pre-2.2 versions should:

  1. Update code to use connection-based queries
  2. Review and update connection string parameters
  3. Update any code relying on the debug output format
  4. Audit thread usage in applications using chDB

For more detailed examples and API usage, refer to:

For binding developers

New API avail:

/**
 * Creates a new chDB connection.
 * Only one active connection is allowed per process.
 * Creating a new connection with different path requires closing existing connection.
 * 
 * @param argc Number of command-line arguments
 * @param argv Command-line arguments array (--path=<db_path> to specify database location)
 * @return Pointer to connection pointer, or NULL on failure
 * @note Default path is ":memory:" if not specified
 */
CHDB_EXPORT struct chdb_conn ** connect_chdb(int argc, char ** argv);

/**
 * Closes an existing chDB connection and cleans up resources.
 * Thread-safe function that handles connection shutdown and cleanup.
 * 
 * @param conn Pointer to connection pointer to close
 */
CHDB_EXPORT void close_conn(struct chdb_conn ** conn);

/**
 * Executes a query on the given connection.
 * Thread-safe function that handles query execution in a separate thread.
 * 
 * @param conn Connection to execute query on
 * @param query SQL query string to execute
 * @param format Output format string (e.g., "CSV", default format)
 * @return Query result structure containing output or error message
 * @note Returns error result if connection is invalid or closed
 */
CHDB_EXPORT struct local_result_v2 * query_conn(struct chdb_conn * conn, const char * query, const char * format);