
feat: psycopg 3.1.0 update #597

Merged
merged 31 commits into from
Jun 22, 2025

@Joshua-Briggs Joshua-Briggs commented Jun 19, 2025

User description

Features:
  • Updated the pyproject dependency group
  • Added close and check functions to the PostgresIndex class
  • Updated old code to be compatible with the latest release of psycopg
  • Added support for different index types such as flat and ivfflat
  • Added rollbacks so that a failed SQL command does not cause errors in later commands


PR Type

Enhancement, Documentation


Description

  • Migrate PostgresIndex from psycopg2 to psycopg v3

  • Introduce IndexType enum for flat, hnsw, ivfflat

  • Enhance connection management and validation methods

  • Update dependency and add usage notebook docs


Changes walkthrough 📝

Relevant files

Enhancement
postgres.py: Migrate to psycopg v3 and index type support
semantic_router/index/postgres.py (+94/-42)

  • Replace psycopg2 imports and connections with psycopg v3
  • Add IndexType enum (FLAT, HNSW, IVFFLAT)
  • Adjust index creation logic based on index_type
  • Implement has_connection, close, __del__, and rollbacks

Documentation
postgres-sync.ipynb: Add Postgres index usage notebook
docs/indexes/postgres-sync.ipynb (+482/-0)

  • Add Jupyter notebook demonstrating PostgresIndex usage
  • Show environment setup, setup_index, and queries
  • Illustrate CRUD operations and route selection workflow

Dependencies
pyproject.toml: Update psycopg dependency version
pyproject.toml (+1/-1)

  • Bump postgres dependency from psycopg2 to psycopg>=3.1


    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 No relevant tests
    🔒 Security concerns

    SQL injection:
    Direct string interpolation in SQL queries can lead to injection vulnerabilities. Replace f-string query construction with parameterized queries, for example cur.execute(query, params).

    ⚡ Recommended focus areas for review

    SQL Injection Risk

    Several methods build SQL queries by interpolating variables (e.g., route_name, route_filter) directly into f-strings. This can allow SQL injection or malformed queries and should be replaced with parameterized queries.

    if not isinstance(self.conn, psycopg.Connection):
        raise TypeError("Index has not established a connection to Postgres")
    with self.conn.cursor() as cur:
        cur.execute(f"DELETE FROM {table_name} WHERE route = '{route_name}'")
        self.conn.commit()
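The DELETE above can be made injection-safe in two steps: bind the value as a parameter, and validate the table name before it reaches the SQL text. A minimal sketch follows, using the standard-library sqlite3 module as a stand-in so it runs without a Postgres server (sqlite3 uses ? placeholders where psycopg uses %s). quote_ident and delete_route are hypothetical helpers, not part of semantic_router. With psycopg itself, psycopg.sql.SQL(...).format(psycopg.sql.Identifier(table_name)) is the usual way to compose identifiers.

```python
import re
import sqlite3

# Plain identifiers only: letters, digits, underscore, not starting with a digit.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_ident(name: str) -> str:
    """Hypothetical helper: reject anything that is not a plain identifier."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return f'"{name}"'


def delete_route(conn: sqlite3.Connection, table_name: str, route_name: str) -> int:
    """Delete all rows for one route and return the number of rows removed."""
    # The table name is validated; the route value is bound, never interpolated.
    query = f"DELETE FROM {quote_ident(table_name)} WHERE route = ?"
    cur = conn.execute(query, (route_name,))
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE routes (route TEXT)")
conn.executemany("INSERT INTO routes VALUES (?)", [("politics",), ("chitchat",)])
conn.commit()

# A classic injection payload is treated as an ordinary (non-matching) string.
deleted = delete_route(conn, "routes", "x' OR '1'='1")
remaining = conn.execute("SELECT COUNT(*) FROM routes").fetchone()[0]
```

Because the payload is bound as data, it matches no row and both routes survive.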
    Error Handling

    The exception handling in _create_route_index and _create_index catches psycopg.errors.DuplicateTable and rolls back, but duplicate index errors may raise a different exception (e.g., DuplicateObject). Verify the correct exception class and ensure other errors aren’t silently swallowed.

    if not isinstance(self.conn, psycopg.Connection):
        raise TypeError("Index has not established a connection to Postgres")
    try:
        with self.conn.cursor() as cur:
            cur.execute(f"CREATE INDEX {table_name}_route_idx ON {table_name} USING btree (route);")
            self.conn.commit()
    except psycopg.errors.DuplicateTable:
        self.conn.rollback()
    except Exception:
        self.conn.rollback()
        raise
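The commit/rollback pattern in the snippet above can be factored into a small context manager so that every statement block either commits or rolls back and re-raises. This is a sketch only: transaction is a hypothetical helper, and sqlite3 stands in for Postgres so the example is runnable. psycopg 3 also provides Connection.transaction() for the same purpose.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Commit on success; roll back and re-raise on any error."""
    cur = conn.cursor()
    try:
        yield cur
        conn.commit()
    except Exception:
        # Leave the connection usable for the next command, then surface the error.
        conn.rollback()
        raise
    finally:
        cur.close()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE routes (route TEXT)")
conn.commit()

try:
    with transaction(conn) as cur:
        cur.execute("INSERT INTO routes VALUES ('politics')")
        raise RuntimeError("simulated failure after the INSERT")
except RuntimeError:
    pass  # the error is not swallowed by the helper; the caller decides

with transaction(conn) as cur:
    cur.execute("INSERT INTO routes VALUES ('chitchat')")

rows = [row[0] for row in conn.execute("SELECT route FROM routes")]
```

The failed block leaves no partial row behind, and the later block still succeeds on the same connection.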
    Python Version Compatibility

    The code imports and uses StrEnum, which is only available in Python 3.11+. Confirm that the minimum Python version of the project supports this, or provide a fallback for earlier versions.

    class IndexType(StrEnum):
        FLAT = "flat"
        HNSW = "hnsw"
        IVFFLAT = "ivfflat"
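If the project needs to support Python versions older than 3.11, one possible fallback is a small str/Enum mixin. The sketch below is an assumption about how that could look, not the code merged in this PR.

```python
import sys
from enum import Enum

if sys.version_info >= (3, 11):
    from enum import StrEnum
else:

    class StrEnum(str, Enum):
        """Minimal backport: members are str instances and str() gives the value."""

        def __str__(self) -> str:
            return str(self.value)


class IndexType(StrEnum):
    FLAT = "flat"
    HNSW = "hnsw"
    IVFFLAT = "ivfflat"
```

Either branch lets callers compare members with plain strings and look members up by value, for example IndexType("hnsw").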


    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Category    Suggestion    Impact
    Security
    Use parameterized queries

    Directly embedding route_name in the SQL string opens the door to SQL injection. Use
    parameterized queries for any user‐supplied values to ensure safety.

    semantic_router/index/postgres.py [391]

    -cur.execute(f"DELETE FROM {table_name} WHERE route = '{route_name}'")
    +cur.execute(
    +    f"DELETE FROM {table_name} WHERE route = %s",
    +    (route_name,)
    +)
    Suggestion importance[1-10]: 9

    __

    Why: Embedding route_name directly in the SQL risks SQL injection; using parameterized queries is a critical security improvement.

    High
    General
    Select correct operator per metric

    The IVFFLAT and FLAT branches always use vector_cosine_ops regardless of the chosen
    Metric. Parameterize the operator based on self.metric to match the correct vector
    operator for dot‐product, Euclidean, and Manhattan metrics.

    semantic_router/index/postgres.py [276-284]

     elif self.index_type == IndexType.IVFFLAT:
    +    op = {
    +        Metric.COSINE: "vector_cosine_ops",
    +        Metric.DOTPRODUCT: "vector_ip_ops",
    +        Metric.EUCLIDEAN: "vector_l2_ops",
    +        Metric.MANHATTAN: "vector_l1_ops",
    +    }[self.metric]
         cur.execute(
             f"""
    -        CREATE INDEX {table_name}_vector_idx ON {table_name} USING ivfflat (vector vector_cosine_ops) WITH (lists = 100);
    +        CREATE INDEX {table_name}_vector_idx ON {table_name}
    +          USING ivfflat (vector {op}) WITH (lists = 100);
             """
         )
     elif self.index_type == IndexType.FLAT:
    -    # Create ivfflat with lists=1 for flat search
    +    op = {
    +        Metric.COSINE: "vector_cosine_ops",
    +        Metric.DOTPRODUCT: "vector_ip_ops",
    +        Metric.EUCLIDEAN: "vector_l2_ops",
    +        Metric.MANHATTAN: "vector_l1_ops",
    +    }[self.metric]
         cur.execute(
             f"""
    -        CREATE INDEX {table_name}_vector_idx ON {table_name} USING ivfflat (vector vector_cosine_ops) WITH (lists = 1);
    +        CREATE INDEX {table_name}_vector_idx ON {table_name}
    +          USING ivfflat (vector {op}) WITH (lists = 1);
             """
         )
    Suggestion importance[1-10]: 8

    __

    Why: Parameterizing the vector operator for IVFFLAT and FLAT based on self.metric fixes incorrect index creation for non‐cosine metrics, ensuring correct query behavior.

    Medium
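The metric-to-operator mapping in the suggestion above is duplicated in both branches; it could live in one place. The sketch below assumes a Metric enum shaped like the one referenced in the diff (the class here is a local stand-in, and operator_class and ivfflat_index_sql are hypothetical helpers). It also assumes, to my understanding of pgvector, that vector_l1_ops is only available for HNSW indexes and not for ivfflat; that should be checked against the pgvector documentation before relying on it.

```python
from enum import Enum


class Metric(Enum):
    """Local stand-in for the project's Metric enum (assumed member names)."""

    COSINE = "cosine"
    DOTPRODUCT = "dotproduct"
    EUCLIDEAN = "euclidean"
    MANHATTAN = "manhattan"


_OPERATOR_CLASS = {
    Metric.COSINE: "vector_cosine_ops",
    Metric.DOTPRODUCT: "vector_ip_ops",
    Metric.EUCLIDEAN: "vector_l2_ops",
    Metric.MANHATTAN: "vector_l1_ops",
}

# Assumption: only HNSW accepts the L1 operator class in pgvector.
_L1_CAPABLE_METHODS = {"hnsw"}


def operator_class(metric: Metric, index_method: str) -> str:
    """Return the pgvector operator class for a metric, or fail loudly."""
    if metric is Metric.MANHATTAN and index_method not in _L1_CAPABLE_METHODS:
        raise ValueError(f"{index_method} index does not support {metric.value}")
    return _OPERATOR_CLASS[metric]


def ivfflat_index_sql(table_name: str, metric: Metric, lists: int = 100) -> str:
    """Build the CREATE INDEX statement; table_name is assumed already validated."""
    op = operator_class(metric, "ivfflat")
    return (
        f"CREATE INDEX {table_name}_vector_idx ON {table_name} "
        f"USING ivfflat (vector {op}) WITH (lists = {int(lists)});"
    )
```

With this shape, the FLAT branch becomes ivfflat_index_sql(table_name, metric, lists=1) and the IVFFLAT branch keeps the default of 100.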
    Update install instructions

    The pip package for psycopg v3 with the binary extension is psycopg[binary]. Update
    the install instructions to reflect the correct package name.

    semantic_router/index/postgres.py [139-142]

     except ImportError:
         raise ImportError(
    -        "Please install psycopg to use PostgresIndex. "
    -        "You can install it with: `pip install 'semantic-router[postgres]'`"
    +        "Please install psycopg[binary] to use PostgresIndex. "
    +        "You can install it with: `pip install semantic-router[postgres]`"
         )
    Suggestion importance[1-10]: 5

    __

    Why: The guidance should mention psycopg[binary] to match the new dependency, but this is a minor documentation tweak rather than functional code change.

    Low
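For context, the optional-dependency guard being edited here can be sketched generically. require_module is a hypothetical helper, not the project's code. Note that the suggested diff drops the quotes around semantic-router[postgres]; some shells (zsh, for example) treat unquoted square brackets as a glob pattern, so this sketch keeps the quoted form in the message.

```python
import importlib
import importlib.util


def require_module(module_name: str, extra: str):
    """Import a top-level optional dependency or raise with install instructions."""
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"Please install {module_name} to use this feature. "
            f"You can install it with: pip install 'semantic-router[{extra}]'"
        )
    return importlib.import_module(module_name)
```

A caller would use it as psycopg = require_module("psycopg", "postgres") at the point where the index class is constructed.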
    Possible issue
    Catch proper duplicate error

    Creating an index that already exists raises a DuplicateObject error in psycopg3,
    not DuplicateTable. Catch psycopg.errors.DuplicateObject to properly handle
    duplicate index creation.

    semantic_router/index/postgres.py [238-239]

    -except psycopg.errors.DuplicateTable:
    +except psycopg.errors.DuplicateObject:
         self.conn.rollback()
    Suggestion importance[1-10]: 7

    __

    Why: In psycopg3, DuplicateObject is raised for duplicate index creation, so catching the correct exception ensures errors are handled as intended.

    Medium
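Which exception class a duplicate index raises is worth verifying against the PostgreSQL error-code table before changing the except clause: to my understanding an existing index name is reported as duplicate_table (SQLSTATE 42P07), because indexes are relations. One way to sidestep the question is to make the statement idempotent with CREATE INDEX IF NOT EXISTS, which PostgreSQL has supported since 9.5. The sketch below uses sqlite3 as a stand-in so it runs locally (the USING btree clause is dropped because SQLite does not accept it); ensure_route_index is a hypothetical helper.

```python
import sqlite3


def ensure_route_index(conn: sqlite3.Connection, table_name: str) -> None:
    """Create the route index if it is missing; table_name is assumed validated."""
    conn.execute(
        f"CREATE INDEX IF NOT EXISTS {table_name}_route_idx ON {table_name} (route)"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE routes (route TEXT)")

# Calling twice must not raise and must not create a second index.
ensure_route_index(conn, "routes")
ensure_route_index(conn, "routes")

index_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
]
```

With an idempotent statement, no exception handling is needed for the duplicate case, so the remaining except branch only deals with genuine failures.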


    codecov bot commented Jun 19, 2025

    Codecov Report

    Attention: Patch coverage is 56.03113% with 113 lines in your changes missing coverage. Please review.

    Project coverage is 75.88%. Comparing base (47797df) to head (0e4c445).
    Report is 33 commits behind head on main.

    Files with missing lines Patch % Lines
    semantic_router/index/postgres.py 55.24% 111 Missing ⚠️
    semantic_router/index/base.py 50.00% 2 Missing ⚠️
    Additional details and impacted files
    @@            Coverage Diff             @@
    ##             main     #597      +/-   ##
    ==========================================
    + Coverage   74.25%   75.88%   +1.62%     
    ==========================================
      Files          48       48              
      Lines        4374     4520     +146     
    ==========================================
    + Hits         3248     3430     +182     
    + Misses       1126     1090      -36     


    @jamescalam
    Member

    bugbot run verbose=true


    cursor bot commented Jun 20, 2025

    bugbot is starting with request id serverGenReqId_a0d0cc0a-67b2-4f69-a488-f04ffa22bea6

    cursor[bot]

    This comment was marked as outdated.


    @jamescalam
    Member

    bugbot run

    cursor[bot]

    This comment was marked as outdated.

    @Joshua-Briggs
    Member Author

    bugbot run

    cursor[bot]

    This comment was marked as outdated.


    …s/semantic-router into josh/postgres/psycopg-update
    
    merge
    @Joshua-Briggs
    Member Author

    bugbot run

    cursor[bot]

    This comment was marked as outdated.



    @cursor cursor bot left a comment


    Bug: PostgresIndex Initialization Requires Manual Call

    The PostgresIndex no longer automatically creates its database table and indexes. The setup_index() call was removed from its constructor, and the new _init_index() method is not automatically invoked by the PostgresIndex constructor or the Router's _init_index_state() method. This breaks backward compatibility, requiring users to manually call _init_index() and provide dimensions when initializing PostgresIndex instances.

    semantic_router/routers/base.py#L470-L478

        self, sparse_encoder: Optional[SparseEncoder]
    ) -> Optional[SparseEncoder]:
        """Get the sparse encoder to be used for creating sparse vector embeddings.

        :param sparse_encoder: The sparse encoder to use.
        :type sparse_encoder: Optional[SparseEncoder]
        :return: The sparse encoder to use.
        :rtype: Optional[SparseEncoder]
        """

    semantic_router/index/postgres.py#L177-L181

    self.namespace = namespace
    self.conn = psycopg.connect(conninfo=self.connection_string)
    if not self.has_connection():
        raise ValueError("Index has not established a connection to Postgres")
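One way to address the backward-compatibility concern in this report is lazy setup: create the schema in the constructor when dimensions are known, and otherwise on first use. The sketch below only illustrates that idea. LazyIndex, _setup_index, and _ensure_ready are hypothetical names, and the setup body is a placeholder rather than the project's actual CREATE TABLE / CREATE INDEX logic.

```python
from typing import List, Optional


class LazyIndex:
    """Hypothetical stand-in for an index whose schema setup needs `dimensions`."""

    def __init__(self, dimensions: Optional[int] = None) -> None:
        self.dimensions = dimensions
        self.setup_calls = 0
        self._ready = False
        # Eager path: behave like the old constructor when dimensions are given.
        if dimensions is not None:
            self._ensure_ready()

    def _setup_index(self) -> None:
        # Placeholder for the real table and index creation.
        self.setup_calls += 1

    def _ensure_ready(self) -> None:
        """Run setup exactly once, as soon as dimensions are known."""
        if self._ready:
            return
        if self.dimensions is None:
            raise ValueError("dimensions must be known before the index is set up")
        self._setup_index()
        self._ready = True

    def add(self, embeddings: List[List[float]]) -> int:
        """Infer dimensions from the first embedding if needed, then store."""
        if self.dimensions is None and embeddings:
            self.dimensions = len(embeddings[0])
        self._ensure_ready()
        return len(embeddings)
```

Callers that pass dimensions up front get the old behaviour, and callers that do not are set up on the first add without having to call a private method.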


    @jamescalam jamescalam merged commit 20df245 into main Jun 22, 2025
    8 of 10 checks passed
    @jamescalam jamescalam deleted the josh/postgres/psycopg-update branch June 22, 2025 09:22
    Development

    Successfully merging this pull request may close these issues.

    Update postgres2 to postgres
    Bring PostgresIndex inline with other indexes
    2 participants