Merge pull request #63 from seapagan/add-list-etc
seapagan authored Jan 28, 2025
2 parents 13ed6fd + 105bfbb commit a23821d
Showing 17 changed files with 900 additions and 90 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -12,7 +12,7 @@ repos:
- id: end-of-file-fixer

- repo: https://github.com/renovatebot/pre-commit-hooks
rev: 39.133.3
rev: 39.137.2
hooks:
- id: renovate-config-validator
files: ^renovate\.json$
11 changes: 3 additions & 8 deletions README.md
@@ -19,8 +19,7 @@ time).
The ideal use case is more for Python CLI tools that need to store data in a
database-like format without needing to learn SQL or use a full ORM.

Full documentation is available on the [Documentation
Website](https://sqliter.grantramsay.dev)
Full documentation is available on the [Website](https://sqliter.grantramsay.dev)

> [!CAUTION]
> This project is still in the early stages of development and is lacking some
@@ -29,11 +28,6 @@ Website](https://sqliter.grantramsay.dev)
> minimum and the releases and documentation will be very clear about any
> breaking changes.
>
> Also, structures like `list`, `dict`, `set` etc are not supported **at this
> time** as field types, since SQLite does not have a native column type for
> these. This is the **next planned enhancement**. These will need to be
> `pickled` first then stored as a BLOB in the database.
>
> See the [TODO](TODO.md) for planned features and improvements.
- [Features](#features)
@@ -46,7 +40,8 @@ Website](https://sqliter.grantramsay.dev)
## Features

- Table creation based on Pydantic models
- Supports `date` and `datetime` fields. List/Dict/Set fields are planned.
- Supports `date` and `datetime` fields
- Support for complex data types (`list`, `dict`, `set`, `tuple`) stored as BLOBs
- Automatic primary key generation
- User defined indexes on any field
- Set any field as UNIQUE
17 changes: 10 additions & 7 deletions TODO.md
@@ -13,13 +13,16 @@ Items marked with :fire: are high priority.
data.
- add more tests where 'auto_commit' is set to False to ensure that commit is
not called automatically.
- :fire: support structures like, `list`, `dict`, `set`, `tuple` etc. in the
model. These will need to be `pickled` first then stored as a BLOB in the
database
- :fire: similarly - perhaps add a `JSON` field type to allow storing JSON data
in a field, and an `Object` field type to allow storing arbitrary Python
objects? Perhaps a `Binary` field type to allow storing arbitrary binary data?
(just uses the existing `bytes` mapping but more explicit)
- :fire: perhaps add a `JSON` field type to allow storing JSON data in a field,
and an `Object` field type to allow storing arbitrary Python objects? Perhaps
a `Binary` field type to allow storing arbitrary binary data? (just uses the
existing `bytes` mapping but more explicit)
- Consider performance optimizations for field validation:
- Benchmark shows ~50% overhead for field assignments with validation
- Potential solutions:
- Add a "fast mode" configuration option
- Create bulk update methods that temporarily disable validation
- Optimize validation for specific field types
- on update, check if the model has actually changed before sending the update
to the database. This will prevent unnecessary updates and leave the
`updated_at` correct. However, this will always require a query to the
8 changes: 8 additions & 0 deletions demo.py
@@ -25,6 +25,8 @@ class UserModel(BaseDBModel):
name: str
content: Optional[str]
admin: bool = False
list_of_str: list[str]
a_set: set[str]

class Meta:
"""Override the table name for the UserModel."""
@@ -49,16 +51,22 @@ def main() -> None:
name="John Doe",
content="This is information about John Doe.",
admin=True,
list_of_str=["a", "b", "c"],
a_set={"x", "y", "z"},
)
user2 = UserModel(
slug="jdoe2",
name="Jane Doe",
content="This is information about Jane Doe.",
list_of_str=["x", "y", "z"],
a_set={"linux", "mac", "windows"},
)
user3 = UserModel(
slug="jb",
name="Yogie Bear",
content=None,
list_of_str=[],
a_set={"apple", "banana", "cherry"},
)
try:
db.insert(user1)
42 changes: 42 additions & 0 deletions docs/guide/fields.md
@@ -53,3 +53,45 @@ This is exactly the same as using the `fields()` method with a single field, but
very specific and obvious. **There is NO equivalent argument to this in the
`select()` method**. An exception **WILL** be raised if you try to use this method
with more than one field.

## Complex Data Types

SQLiter supports storing complex Python data types in the database. The following types are supported:

- `list[T]`: Lists of any type T
- `dict[K, V]`: Dictionaries with keys of type K and values of type V
- `set[T]`: Sets of any type T
- `tuple[T, ...]`: Tuples of any type T

These types are automatically serialized and stored as BLOBs in the database. Here's an example of using complex types:

```python
from typing import Any

from sqliter import SqliterDB
from sqliter.model import BaseDBModel

class UserPreferences(BaseDBModel):
    tags: list[str] = []  # List of string tags
    metadata: dict[str, Any] = {}  # Dictionary with string keys and any value type
    friends: set[int] = set()  # Set of user IDs
    coordinates: tuple[float, float] = (0.0, 0.0)  # Tuple of two floats

db = SqliterDB("preferences.db")
db.create_table(UserPreferences)

# Create and insert an instance
prefs = UserPreferences(
    tags=["python", "sqlite", "orm"],
    metadata={"theme": "dark", "notifications": True},
    friends={1, 2, 3},
    coordinates=(51.5074, -0.1278),
)
saved_prefs = db.insert(prefs)

# Query and use the complex types
loaded_prefs = db.get(UserPreferences, saved_prefs.pk)
print(loaded_prefs.tags)  # ['python', 'sqlite', 'orm']
print(loaded_prefs.metadata["theme"])  # 'dark'
print(1 in loaded_prefs.friends)  # True
print(loaded_prefs.coordinates)  # (51.5074, -0.1278)
```

The complex types are automatically validated using Pydantic's type system, ensuring that only values of the correct type can be stored. When querying, the values are automatically deserialized back into their original Python types.

Note that since these types are stored as BLOBs, you cannot perform SQL operations on their contents (like searching or filtering). If you need to search or filter based on these values, you should consider storing them in a different format or in separate tables.
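
If searching or filtering on those values matters, one pattern worth sketching is to break the values out into their own model so each value lives in a plain, indexable column. The sketch below is illustrative only: the `User`/`Tag` models are hypothetical, and the `SqliterDB(...)` constructor and `create_table()` call are assumed from the wider library rather than shown in this PR.

```python
from sqliter import SqliterDB  # constructor arguments here are an assumption
from sqliter.model import BaseDBModel


class User(BaseDBModel):
    name: str


class Tag(BaseDBModel):
    # Hypothetical companion model: one row per tag instead of a pickled
    # list[str] field on User, so `name` is an ordinary TEXT column.
    user_pk: int  # pk of the owning User row
    name: str


db = SqliterDB("tags_example.db")
db.create_table(User)
db.create_table(Tag)

user = db.insert(User(name="John Doe"))
for tag in ["python", "sqlite", "orm"]:
    db.insert(Tag(user_pk=user.pk, name=tag))

# Tag.name can now be indexed, searched, and filtered with normal queries,
# which is not possible for values pickled into a BLOB.
```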
13 changes: 10 additions & 3 deletions docs/guide/models.md
@@ -37,6 +37,8 @@ the table.

The following field types are currently supported:

Basic Types:

- `str`
- `int`
- `float`
@@ -45,9 +47,14 @@ The following field types are currently supported:
- `datetime`
- `bytes`

More field types are planned for the near future, since I have the
serialization/ deserialization locked in. This will include `list`, `dict`,
`set`, and possibly `JSON` and `Object` fields.
Complex Types:

- `list[T]` - Lists of any type T
- `dict[K, V]` - Dictionaries with keys of type K and values of type V
- `set[T]` - Sets of any type T
- `tuple[T, ...]` - Tuples of any type T

Complex types are automatically serialized and stored as BLOBs in the database. For more details on using complex types, see the [Fields Guide](fields.md#complex-data-types).
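
As a quick illustration of declaring such fields (mirroring the `list_of_str` and `a_set` fields added to `demo.py` in this PR; the `sqliter.model` import path is assumed), a minimal model might look like this:

```python
from typing import Any

from sqliter.model import BaseDBModel  # import path assumed


class SettingsModel(BaseDBModel):
    name: str                      # ordinary TEXT column
    tags: list[str]                # pickled and stored as a BLOB
    options: dict[str, Any] = {}   # pickled and stored as a BLOB
    flags: set[str] = set()        # pickled and stored as a BLOB
```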

### Adding Indexes

5 changes: 0 additions & 5 deletions docs/index.md
@@ -27,11 +27,6 @@ database-like format without needing to learn SQL or use a full ORM.
> minimum and the releases and documentation will be very clear about any
> breaking changes.
>
> Also, structures like `list`, `dict`, `set` etc are not supported **at this
> time** as field types, since SQLite does not have a native column type for
> these. This is the **next planned enhancement**. These will need to be
> `pickled` first then stored as a BLOB in the database.
>
> See the [TODO](todo/index.md) for planned features and improvements.
## Features
1 change: 1 addition & 0 deletions pyproject.toml
@@ -105,6 +105,7 @@ lint.ignore = [
"FBT002",
"FBT003",
"B006",
"S301", # in this library we use 'pickle' for saving and loading list etc
] # These rules are too strict even for us 😝
lint.extend-ignore = [
"COM812",
4 changes: 2 additions & 2 deletions requirements-dev.txt
@@ -36,7 +36,7 @@ mkdocs-material==9.5.49
mkdocs-material-extensions==1.3.1
mkdocs-minify-plugin==0.8.0
mock==5.1.0
mypy==1.14.0
mypy==1.14.1
mypy-extensions==1.0.0
nodeenv==1.9.1
packaging==24.2
@@ -71,7 +71,7 @@ regex==2024.11.6
requests==2.32.3
rich==13.9.4
rtoml==0.12.0
ruff==0.8.4
ruff==0.9.3
shellingham==1.5.4
simple-toml-settings==0.8.0
six==1.17.0
4 changes: 4 additions & 0 deletions sqliter/constants.py
@@ -38,4 +38,8 @@
bytes: "BLOB",
datetime.datetime: "INTEGER", # Store as Unix timestamp
datetime.date: "INTEGER", # Store as Unix timestamp
list: "BLOB",
dict: "BLOB",
set: "BLOB",
tuple: "BLOB",
}
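
Because the mapping is keyed on bare container types, a parameterised annotation such as `list[str]` has to be reduced to its origin (`list`) before lookup — the same `get_origin()` trick used in `deserialize_field()` below. A minimal sketch of that resolution, with `TYPE_MAP` standing in for the real constant (whose name and full contents are not shown in this hunk):

```python
import datetime
from typing import Any, get_origin

# Illustrative stand-in for the mapping above; only the rows visible in
# this hunk are reproduced here.
TYPE_MAP: dict[type, str] = {
    bytes: "BLOB",
    datetime.datetime: "INTEGER",  # stored as Unix timestamp
    datetime.date: "INTEGER",  # stored as Unix timestamp
    list: "BLOB",
    dict: "BLOB",
    set: "BLOB",
    tuple: "BLOB",
}


def sqlite_type_for(annotation: Any) -> str:
    """Reduce a (possibly generic) annotation to a SQLite column type."""
    origin = get_origin(annotation) or annotation
    return TYPE_MAP.get(origin, "BLOB")  # fallback choice is an assumption


print(sqlite_type_for(list[str]))       # BLOB
print(sqlite_type_for(dict[str, int]))  # BLOB
print(sqlite_type_for(datetime.date))   # INTEGER
```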
34 changes: 28 additions & 6 deletions sqliter/model/model.py
@@ -10,6 +10,7 @@
from __future__ import annotations

import datetime
import pickle
import re
from typing import (
Any,
@@ -58,7 +59,7 @@ class BaseDBModel(BaseModel):
model_config = ConfigDict(
extra="ignore",
populate_by_name=True,
validate_assignment=False,
validate_assignment=True,
from_attributes=True,
)

@@ -181,7 +182,9 @@ def serialize_field(cls, value: SerializableField) -> SerializableField:
"""
if isinstance(value, (datetime.datetime, datetime.date)):
return to_unix_timestamp(value)
return value # Return value as-is for non-datetime fields
if isinstance(value, (list, dict, set, tuple)):
return pickle.dumps(value)
return value # Return value as-is for other fields

# Deserialization after fetching from the database

@@ -205,12 +208,31 @@ def deserialize_field(
A datetime or date object if the field type is datetime or date,
otherwise returns the value as-is.
"""
field_type = cls.__annotations__.get(field_name)
if value is None:
return None

if field_type in (datetime.datetime, datetime.date) and isinstance(
value, int
# Get field type if it exists in model_fields
field_info = cls.model_fields.get(field_name)
if field_info is None:
# If field doesn't exist in model, return value as-is
return value

field_type = field_info.annotation

if (
isinstance(field_type, type)
and issubclass(field_type, (datetime.datetime, datetime.date))
and isinstance(value, int)
):
return from_unix_timestamp(
value, field_type, localize=return_local_time
)
return value # Return value as-is for non-datetime fields

origin_type = get_origin(field_type) or field_type
if origin_type in (list, dict, set, tuple) and isinstance(value, bytes):
try:
return pickle.loads(value)
except pickle.UnpicklingError:
return value

return value
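
The two hooks above amount to a pickle round trip for container fields: values go into the database as BLOB bytes and come back as the original Python objects. A minimal sketch of just that mechanism (using `pickle` directly, rather than calling the Pydantic serializer methods themselves):

```python
import pickle

# What serialize_field() does for a list/dict/set/tuple value:
stored = pickle.dumps(["a", "b", "c"])
assert isinstance(stored, bytes)  # written to a BLOB column

# What deserialize_field() does when the field's origin type is one of
# (list, dict, set, tuple) and the raw database value is bytes:
restored = pickle.loads(stored)
assert restored == ["a", "b", "c"]
```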
20 changes: 18 additions & 2 deletions sqliter/sqliter.py
@@ -503,7 +503,13 @@ def insert(
raise RecordInsertionError(table_name) from exc
else:
data.pop("pk", None)
return model_class(pk=cursor.lastrowid, **data)
# Deserialize each field before creating the model instance
deserialized_data = {}
for field_name, value in data.items():
deserialized_data[field_name] = model_class.deserialize_field(
field_name, value, return_local_time=self.return_local_time
)
return model_class(pk=cursor.lastrowid, **deserialized_data)

def get(
self, model_class: type[BaseDBModel], primary_key_value: int
@@ -540,7 +546,17 @@ def get(
field: result[idx]
for idx, field in enumerate(model_class.model_fields)
}
return model_class(**result_dict)
# Deserialize each field before creating the model instance
deserialized_data = {}
for field_name, value in result_dict.items():
deserialized_data[field_name] = (
model_class.deserialize_field(
field_name,
value,
return_local_time=self.return_local_time,
)
)
return model_class(**deserialized_data)
except sqlite3.Error as exc:
raise RecordFetchError(table_name) from exc
else:
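
End to end, the extra deserialization pass means both `insert()` and `get()` hand back models whose container fields are already real Python objects. A rough usage sketch reusing a trimmed copy of the `UserModel` from `demo.py` above (the `SqliterDB(memory=True)` constructor and `create_table()` call are assumptions, not shown in this diff):

```python
from typing import Optional

from sqliter import SqliterDB
from sqliter.model import BaseDBModel


class UserModel(BaseDBModel):
    # Trimmed copy of the demo.py model above, for a self-contained sketch.
    slug: str
    name: str
    content: Optional[str]
    list_of_str: list[str]
    a_set: set[str]


db = SqliterDB(memory=True)  # in-memory flag is assumed for illustration
db.create_table(UserModel)   # create_table() is assumed, not shown in this diff

saved = db.insert(
    UserModel(
        slug="jdoe",
        name="John Doe",
        content=None,
        list_of_str=["a", "b", "c"],
        a_set={"x", "y"},
    )
)

# insert() and get() now run deserialize_field() on every column, so the
# returned instances hold real Python containers, not raw pickled BLOBs.
fetched = db.get(UserModel, saved.pk)
assert fetched.list_of_str == ["a", "b", "c"]
assert fetched.a_set == {"x", "y"}
```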