Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: hybrid router and async pinecone upgrades #493

Merged
merged 71 commits into from
Jan 13, 2025

Conversation

jamescalam
Copy link
Member

@jamescalam jamescalam commented Jan 1, 2025

Will close #492

This pull request includes several updates to the semantic_router project. The changes focus on updating dependencies, enhancing the indexing functionality, and improving the test configuration. It also resolves some previously unnoticed bugs in async PineconeIndex and pytest coverage.

Dependency Updates:

  • Updated POETRY_VERSION from 1.8.4 to 2.0.1 in multiple workflow files to ensure compatibility with the latest version of Poetry. (.github/workflows/docs.yml: [1] .github/workflows/lint.yml: [2] .github/workflows/release.yml: [3] .github/workflows/test.yml: [4]
  • Updated project version from 0.1.0.dev5 to 0.1.0.dev6 in pyproject.toml and semantic_router/__init__.py to reflect the new release. (pyproject.toml: [1] semantic_router/init.py: [2]
  • Updated pinecone-text dependency to remove the upper version limit. (pyproject.toml: pyproject.tomlL20-R20)

Indexing Enhancements:

  • Added IndexConfig class and updated BaseIndex class to include new methods such as describe and is_ready. (semantic_router/index/base.py: [1] [2]
  • Introduced build_records function in PineconeIndex to handle the creation of records for Pinecone indexing. (semantic_router/index/pinecone.py: semantic_router/index/pinecone.pyR22-R74)
  • Modified get_utterances methods across different index classes to include an include_metadata parameter. (semantic_router/index/base.py: [1] semantic_router/index/hybrid_local.py: [2] semantic_router/index/local.py: [3]

Test Configuration:

  • Added --exitfirst and --maxfail=1 options to pytest commands in Makefile to improve test failure handling. (Makefile: MakefileL15-R22)

Documentation:

  • Updated documentation to reflect the new project version and removed outdated cells from the introduction notebook. (docs/source/conf.py: [1] docs/00-introduction.ipynb: [2]

Code Cleanup:

@jamescalam jamescalam changed the title feat: add hybrid router to sync tests feat: bring hybrid router inline with semantic Jan 1, 2025
Copy link

github-actions bot commented Jan 1, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Test Coverage Completeness

Ensure that the newly added HybridRouter is thoroughly tested across all scenarios, including edge cases, to validate its integration and functionality.

def get_test_routers():
    routers = [SemanticRouter]
    if importlib.util.find_spec("pinecone_text") is not None:
        routers.append(HybridRouter)
    return routers


@pytest.mark.parametrize(
    "index_cls,router_cls",
    [
        (index, router)
        for index in get_test_indexes()
        for router in get_test_routers()
    ],
)
class TestSemanticRouter:
    @pytest.mark.skipif(
        os.environ.get("PINECONE_API_KEY") is None, reason="Pinecone API key required"
    )
    def test_initialization(self, openai_encoder, routes, index_cls, router_cls):
        index = init_index(index_cls)
        _ = router_cls(
            encoder=openai_encoder,
            routes=routes,
            top_k=10,
            index=index,
            auto_sync="local",
        )

    @pytest.mark.skipif(
        os.environ.get("PINECONE_API_KEY") is None, reason="Pinecone API key required"
    )
    def test_second_initialization_sync(self, openai_encoder, routes, index_cls, router_cls):
        index = init_index(index_cls)
        route_layer = router_cls(
            encoder=openai_encoder, routes=routes, index=index, auto_sync="local"
        )
        if index_cls is PineconeIndex:
            time.sleep(PINECONE_SLEEP)  # allow for index to be populated
        assert route_layer.is_synced()

    @pytest.mark.skipif(
        os.environ.get("PINECONE_API_KEY") is None, reason="Pinecone API key required"
    )
    def test_second_initialization_not_synced(
        self, openai_encoder, routes, routes_2, index_cls, router_cls
    ):
        index = init_index(index_cls)
        _ = router_cls(
            encoder=openai_encoder, routes=routes, index=index, auto_sync="local"
        )
        route_layer = router_cls(
            encoder=openai_encoder, routes=routes_2, index=index
        )
        if index_cls is PineconeIndex:
            time.sleep(PINECONE_SLEEP)  # allow for index to be populated
        assert route_layer.is_synced() is False

    @pytest.mark.skipif(
        os.environ.get("PINECONE_API_KEY") is None, reason="Pinecone API key required"
    )
    def test_utterance_diff(self, openai_encoder, routes, routes_2, index_cls, router_cls):
        index = init_index(index_cls)
        _ = router_cls(
            encoder=openai_encoder, routes=routes, index=index, auto_sync="local"
        )
        route_layer_2 = router_cls(
            encoder=openai_encoder, routes=routes_2, index=index
        )
        if index_cls is PineconeIndex:
            time.sleep(PINECONE_SLEEP)  # allow for index to be populated
        diff = route_layer_2.get_utterance_diff(include_metadata=True)
        assert '+ Route 1: Hello | None | {"type": "default"}' in diff
        assert '+ Route 1: Hi | None | {"type": "default"}' in diff
        assert "- Route 1: Hello | None | {}" in diff
        assert "+ Route 2: Au revoir | None | {}" in diff
        assert "- Route 2: Hi | None | {}" in diff
        assert "+ Route 2: Bye | None | {}" in diff
        assert "+ Route 2: Goodbye | None | {}" in diff
        assert "+ Route 3: Boo | None | {}" in diff

    @pytest.mark.skipif(
        os.environ.get("PINECONE_API_KEY") is None, reason="Pinecone API key required"
    )
    def test_auto_sync_local(self, openai_encoder, routes, routes_2, index_cls, router_cls):
        if index_cls is PineconeIndex:
            # TEST LOCAL
            pinecone_index = init_index(index_cls)
            _ = router_cls(
                encoder=openai_encoder,
                routes=routes,
                index=pinecone_index,
            )
            time.sleep(PINECONE_SLEEP)  # allow for index to be populated
            route_layer = router_cls(
                encoder=openai_encoder,
                routes=routes_2,
                index=pinecone_index,
                auto_sync="local",
            )
            time.sleep(PINECONE_SLEEP)  # allow for index to be populated
            assert route_layer.index.get_utterances() == [
                Utterance(route="Route 1", utterance="Hello"),
                Utterance(route="Route 2", utterance="Hi"),
            ], "The routes in the index should match the local routes"

    @pytest.mark.skipif(
        os.environ.get("PINECONE_API_KEY") is None, reason="Pinecone API key required"
    )
    def test_auto_sync_remote(self, openai_encoder, routes, routes_2, index_cls, router_cls):
        if index_cls is PineconeIndex:
            # TEST REMOTE
            pinecone_index = init_index(index_cls)
            _ = router_cls(
                encoder=openai_encoder,
                routes=routes_2,
                index=pinecone_index,
                auto_sync="local",
            )
            time.sleep(PINECONE_SLEEP)  # allow for index to be populated
            route_layer = router_cls(
                encoder=openai_encoder,
                routes=routes,
                index=pinecone_index,
                auto_sync="remote",
            )
            time.sleep(PINECONE_SLEEP)  # allow for index to be populated
            assert route_layer.index.get_utterances() == [
                Utterance(route="Route 1", utterance="Hello"),
                Utterance(route="Route 2", utterance="Hi"),
            ], "The routes in the index should match the local routes"

    @pytest.mark.skipif(
        os.environ.get("PINECONE_API_KEY") is None, reason="Pinecone API key required"
    )
    def test_auto_sync_merge_force_local(
        self, openai_encoder, routes, routes_2, index_cls, router_cls
    ):
        if index_cls is PineconeIndex:
            # TEST MERGE FORCE LOCAL
            pinecone_index = init_index(index_cls)
            route_layer = router_cls(
                encoder=openai_encoder,
                routes=routes,
                index=pinecone_index,
                auto_sync="local",
            )
            time.sleep(PINECONE_SLEEP)  # allow for index to be populated
            route_layer = router_cls(
                encoder=openai_encoder,
                routes=routes_2,
                index=pinecone_index,
                auto_sync="merge-force-local",
            )
            time.sleep(PINECONE_SLEEP)  # allow for index to be populated
            # confirm local and remote are synced
            assert route_layer.is_synced()
            # now confirm utterances are correct
            local_utterances = route_layer.index.get_utterances()
            # we sort to ensure order is the same
            # TODO JB: there is a bug here where if we include_metadata=True it fails
            local_utterances.sort(key=lambda x: x.to_str(include_metadata=False))
            assert local_utterances == [
                Utterance(route="Route 1", utterance="Hello"),
                Utterance(route="Route 1", utterance="Hi"),
                Utterance(route="Route 2", utterance="Au revoir"),
                Utterance(route="Route 2", utterance="Bye"),
                Utterance(route="Route 2", utterance="Goodbye"),
                Utterance(route="Route 2", utterance="Hi"),
            ], "The routes in the index should match the local routes"

    @pytest.mark.skipif(
        os.environ.get("PINECONE_API_KEY") is None, reason="Pinecone API key required"
    )
    def test_auto_sync_merge_force_remote(
        self, openai_encoder, routes, routes_2, index_cls, router_cls
    ):
        if index_cls is PineconeIndex:
            # TEST MERGE FORCE LOCAL
            pinecone_index = init_index(index_cls)
            route_layer = router_cls(
                encoder=openai_encoder,
                routes=routes,
                index=pinecone_index,
                auto_sync="local",
            )
            time.sleep(PINECONE_SLEEP)  # allow for index to be populated
            route_layer = router_cls(
                encoder=openai_encoder,
                routes=routes_2,
                index=pinecone_index,
                auto_sync="merge-force-remote",
            )
            time.sleep(PINECONE_SLEEP)  # allow for index to be populated
            # confirm local and remote are synced
            assert route_layer.is_synced()
            # now confirm utterances are correct
            local_utterances = route_layer.index.get_utterances()
            # we sort to ensure order is the same
            local_utterances.sort(
                key=lambda x: x.to_str(include_metadata=include_metadata(index_cls))
            )
            assert local_utterances == [
                Utterance(
                    route="Route 1", utterance="Hello", metadata={"type": "default"}
                ),
                Utterance(
                    route="Route 1", utterance="Hi", metadata={"type": "default"}
                ),
                Utterance(route="Route 2", utterance="Au revoir"),
                Utterance(route="Route 2", utterance="Bye"),
                Utterance(route="Route 2", utterance="Goodbye"),
                Utterance(route="Route 2", utterance="Hi"),
                Utterance(route="Route 3", utterance="Boo"),
            ], "The routes in the index should match the local routes"

    @pytest.mark.skipif(
        os.environ.get("PINECONE_API_KEY") is None, reason="Pinecone API key required"
    )
    def test_sync(self, openai_encoder, index_cls, router_cls):
        route_layer = router_cls(
            encoder=openai_encoder,
            routes=[],
            index=init_index(index_cls),
            auto_sync=None,
        )
        route_layer.sync("remote")
        time.sleep(PINECONE_SLEEP)  # allow for index to be populated
        # confirm local and remote are synced
        assert route_layer.is_synced()

    @pytest.mark.skipif(
        os.environ.get("PINECONE_API_KEY") is None, reason="Pinecone API key required"
    )
    def test_auto_sync_merge(self, openai_encoder, routes, routes_2, index_cls, router_cls):
        if index_cls is PineconeIndex:
            # TEST MERGE
            pinecone_index = init_index(index_cls)
            route_layer = router_cls(
                encoder=openai_encoder,
                routes=routes_2,
                index=pinecone_index,
                auto_sync="local",
            )
            time.sleep(PINECONE_SLEEP)  # allow for index to be populated
            route_layer = router_cls(
                encoder=openai_encoder,
                routes=routes,
                index=pinecone_index,
                auto_sync="merge",
            )
            time.sleep(PINECONE_SLEEP)  # allow for index to be populated
            # confirm local and remote are synced
            assert route_layer.is_synced()
            # now confirm utterances are correct
            local_utterances = route_layer.index.get_utterances()
            # we sort to ensure order is the same
            local_utterances.sort(
                key=lambda x: x.to_str(include_metadata=include_metadata(index_cls))
            )
            assert local_utterances == [
                Utterance(
                    route="Route 1", utterance="Hello", metadata={"type": "default"}
                ),
                Utterance(
                    route="Route 1", utterance="Hi", metadata={"type": "default"}
                ),
                Utterance(route="Route 2", utterance="Au revoir"),
                Utterance(route="Route 2", utterance="Bye"),
                Utterance(route="Route 2", utterance="Goodbye"),
                Utterance(route="Route 2", utterance="Hi"),
                Utterance(route="Route 3", utterance="Boo"),
            ], "The routes in the index should match the local routes"

    @pytest.mark.skipif(
        os.environ.get("PINECONE_API_KEY") is None, reason="Pinecone API key required"
    )
    def test_sync_lock_prevents_concurrent_sync(
        self, openai_encoder, routes, index_cls, router_cls
    ):
        """Test that sync lock prevents concurrent synchronization operations"""
        index = init_index(index_cls)
        route_layer = router_cls(
            encoder=openai_encoder,
            routes=routes,
            index=index,
            auto_sync=None,
        )

        # Acquire sync lock
        route_layer.index.lock(value=True)
        if index_cls is PineconeIndex:
            time.sleep(PINECONE_SLEEP)

        # Attempt to sync while lock is held should raise exception
        with pytest.raises(Exception):
            route_layer.sync("local")

        # Release lock
        route_layer.index.lock(value=False)
        if index_cls is PineconeIndex:
            time.sleep(PINECONE_SLEEP)

        # Should succeed after lock is released
        route_layer.sync("local")
        if index_cls is PineconeIndex:
            time.sleep(PINECONE_SLEEP)
        assert route_layer.is_synced()

    @pytest.mark.skipif(
        os.environ.get("PINECONE_API_KEY") is None, reason="Pinecone API key required"
    )
    def test_sync_lock_auto_releases(self, openai_encoder, routes, index_cls, router_cls):
        """Test that sync lock is automatically released after sync operations"""
        index = init_index(index_cls)
        route_layer = router_cls(
            encoder=openai_encoder,
            routes=routes,
            index=index,
            auto_sync=None,
        )

        # Initial sync should acquire and release lock
        route_layer.sync("local")
        if index_cls is PineconeIndex:
            time.sleep(PINECONE_SLEEP)

        # Lock should be released, allowing another sync
        route_layer.sync("local")  # Should not raise exception
        if index_cls is PineconeIndex:
            time.sleep(PINECONE_SLEEP)
        assert route_layer.is_synced()

        # clear index
        route_layer.index.index.delete(namespace="", delete_all=True)


@pytest.mark.parametrize("index_cls", get_test_indexes())
class TestAsyncSemanticRouter:
    @pytest.mark.skipif(
        os.environ.get("PINECONE_API_KEY") is None, reason="Pinecone API key required"
    )
    @pytest.mark.asyncio
    async def test_initialization(self, openai_encoder, routes, index_cls, router_cls):
        index = init_index(index_cls, init_async_index=True)
        _ = router_cls(
            encoder=openai_encoder,
            routes=routes,
            top_k=10,
            index=index,
            auto_sync="local",
        )

    @pytest.mark.skipif(
        os.environ.get("PINECONE_API_KEY") is None, reason="Pinecone API key required"
    )
    @pytest.mark.asyncio
    async def test_second_initialization_sync(self, openai_encoder, routes, index_cls, router_cls):
        index = init_index(index_cls, init_async_index=True)
        route_layer = router_cls(
            encoder=openai_encoder, routes=routes, index=index, auto_sync="local"
        )
        if index_cls is PineconeIndex:
            await asyncio.sleep(PINECONE_SLEEP)  # allow for index to be populated
        assert route_layer.async_is_synced()

    @pytest.mark.skipif(
        os.environ.get("PINECONE_API_KEY") is None, reason="Pinecone API key required"
    )
    @pytest.mark.asyncio
    async def test_second_initialization_not_synced(
        self, openai_encoder, routes, routes_2, index_cls, router_cls
    ):
        index = init_index(index_cls, init_async_index=True)
        _ = router_cls(
            encoder=openai_encoder, routes=routes, index=index, auto_sync="local"
        )
        route_layer = router_cls(
            encoder=openai_encoder, routes=routes_2, index=index
        )
        if index_cls is PineconeIndex:
            await asyncio.sleep(PINECONE_SLEEP)  # allow for index to be populated
        assert await route_layer.async_is_synced() is False

    @pytest.mark.skipif(
        os.environ.get("PINECONE_API_KEY") is None, reason="Pinecone API key required"
    )
    @pytest.mark.asyncio
    async def test_utterance_diff(self, openai_encoder, routes, routes_2, index_cls, router_cls):
        index = init_index(index_cls, init_async_index=True)
        _ = router_cls(
            encoder=openai_encoder, routes=routes, index=index, auto_sync="local"
        )
        route_layer_2 = router_cls(
            encoder=openai_encoder, routes=routes_2, index=index
        )
        if index_cls is PineconeIndex:
            await asyncio.sleep(PINECONE_SLEEP)  # allow for index to be populated
        diff = await route_layer_2.aget_utterance_diff(include_metadata=True)
        assert '+ Route 1: Hello | None | {"type": "default"}' in diff
        assert '+ Route 1: Hi | None | {"type": "default"}' in diff
        assert "- Route 1: Hello | None | {}" in diff
        assert "+ Route 2: Au revoir | None | {}" in diff
        assert "- Route 2: Hi | None | {}" in diff
        assert "+ Route 2: Bye | None | {}" in diff
        assert "+ Route 2: Goodbye | None | {}" in diff
        assert "+ Route 3: Boo | None | {}" in diff

    @pytest.mark.skipif(
        os.environ.get("PINECONE_API_KEY") is None, reason="Pinecone API key required"
    )
    @pytest.mark.asyncio
    async def test_auto_sync_local(self, openai_encoder, routes, routes_2, index_cls, router_cls):
        if index_cls is PineconeIndex:
            # TEST LOCAL
            pinecone_index = init_index(index_cls, init_async_index=True)
            _ = router_cls(
                encoder=openai_encoder,
                routes=routes,
                index=pinecone_index,
            )
            await asyncio.sleep(PINECONE_SLEEP)  # allow for index to be populated
            route_layer = router_cls(
                encoder=openai_encoder,
                routes=routes_2,
                index=pinecone_index,
                auto_sync="local",
            )
            await asyncio.sleep(PINECONE_SLEEP)  # allow for index to be populated
            assert await route_layer.index.aget_utterances() == [
                Utterance(route="Route 1", utterance="Hello"),
                Utterance(route="Route 2", utterance="Hi"),
            ], "The routes in the index should match the local routes"

    @pytest.mark.skipif(
        os.environ.get("PINECONE_API_KEY") is None, reason="Pinecone API key required"
    )
    @pytest.mark.asyncio
    async def test_auto_sync_remote(self, openai_encoder, routes, routes_2, index_cls, router_cls):
        if index_cls is PineconeIndex:
            # TEST REMOTE
            pinecone_index = init_index(index_cls, init_async_index=True)
            _ = router_cls(
                encoder=openai_encoder,
                routes=routes_2,
                index=pinecone_index,
                auto_sync="local",
            )
            await asyncio.sleep(PINECONE_SLEEP)  # allow for index to be populated
            route_layer = router_cls(
                encoder=openai_encoder,
                routes=routes,
                index=pinecone_index,
                auto_sync="remote",
            )
            await asyncio.sleep(PINECONE_SLEEP)  # allow for index to be populated
            assert await route_layer.index.aget_utterances() == [
                Utterance(route="Route 1", utterance="Hello"),
                Utterance(route="Route 2", utterance="Hi"),
            ], "The routes in the index should match the local routes"

    @pytest.mark.skipif(
        os.environ.get("PINECONE_API_KEY") is None, reason="Pinecone API key required"
    )
    @pytest.mark.asyncio
    async def test_auto_sync_merge_force_local(
        self, openai_encoder, routes, routes_2, index_cls, router_cls
    ):
        if index_cls is PineconeIndex:
            # TEST MERGE FORCE LOCAL
            pinecone_index = init_index(index_cls, init_async_index=True)
            route_layer = router_cls(
                encoder=openai_encoder,
                routes=routes,
                index=pinecone_index,
                auto_sync="local",
            )
            await asyncio.sleep(PINECONE_SLEEP)  # allow for index to be populated
            route_layer = router_cls(
                encoder=openai_encoder,
                routes=routes_2,
                index=pinecone_index,
                auto_sync="merge-force-local",
            )
            await asyncio.sleep(PINECONE_SLEEP)  # allow for index to be populated
            # confirm local and remote are synced
            assert route_layer.async_is_synced()
            # now confirm utterances are correct
            local_utterances = await route_layer.index.aget_utterances()
            # we sort to ensure order is the same
            # TODO JB: there is a bug here where if we include_metadata=True it fails
            local_utterances.sort(key=lambda x: x.to_str(include_metadata=False))
            assert local_utterances == [
                Utterance(route="Route 1", utterance="Hello"),
                Utterance(route="Route 1", utterance="Hi"),
                Utterance(route="Route 2", utterance="Au revoir"),
                Utterance(route="Route 2", utterance="Bye"),
                Utterance(route="Route 2", utterance="Goodbye"),
                Utterance(route="Route 2", utterance="Hi"),
            ], "The routes in the index should match the local routes"

    @pytest.mark.skipif(
        os.environ.get("PINECONE_API_KEY") is None, reason="Pinecone API key required"
    )
    @pytest.mark.asyncio
    async def test_auto_sync_merge_force_remote(
        self, openai_encoder, routes, routes_2, index_cls, router_cls
    ):
        if index_cls is PineconeIndex:
            # TEST MERGE FORCE LOCAL
            pinecone_index = init_index(index_cls, init_async_index=True)
            route_layer = router_cls(
                encoder=openai_encoder,
                routes=routes,
                index=pinecone_index,
                auto_sync="local",
            )
            await asyncio.sleep(PINECONE_SLEEP)  # allow for index to be populated
            route_layer = router_cls(
                encoder=openai_encoder,
                routes=routes_2,
                index=pinecone_index,
                auto_sync="merge-force-remote",
            )
            await asyncio.sleep(PINECONE_SLEEP)  # allow for index to be populated
            # confirm local and remote are synced
            assert route_layer.async_is_synced()
            # now confirm utterances are correct
            local_utterances = await route_layer.index.aget_utterances()
            # we sort to ensure order is the same
            local_utterances.sort(
                key=lambda x: x.to_str(include_metadata=include_metadata(index_cls))
            )
            assert local_utterances == [
                Utterance(
                    route="Route 1", utterance="Hello", metadata={"type": "default"}
                ),
                Utterance(
                    route="Route 1", utterance="Hi", metadata={"type": "default"}
                ),
                Utterance(route="Route 2", utterance="Au revoir"),
                Utterance(route="Route 2", utterance="Bye"),
                Utterance(route="Route 2", utterance="Goodbye"),
                Utterance(route="Route 2", utterance="Hi"),
                Utterance(route="Route 3", utterance="Boo"),
            ], "The routes in the index should match the local routes"

    @pytest.mark.skipif(
        os.environ.get("PINECONE_API_KEY") is None, reason="Pinecone API key required"
    )
    @pytest.mark.asyncio
    async def test_sync(self, openai_encoder, index_cls, router_cls):
        route_layer = router_cls(
            encoder=openai_encoder,
            routes=[],
            index=init_index(index_cls, init_async_index=True),
            auto_sync=None,
        )
        await route_layer.async_sync("remote")
        await asyncio.sleep(PINECONE_SLEEP)  # allow for index to be populated
        # confirm local and remote are synced
        assert await route_layer.async_is_synced()

    @pytest.mark.skipif(
        os.environ.get("PINECONE_API_KEY") is None, reason="Pinecone API key required"
    )
    @pytest.mark.asyncio
    async def test_auto_sync_merge(self, openai_encoder, routes, routes_2, index_cls, router_cls):
        if index_cls is PineconeIndex:
            # TEST MERGE
            pinecone_index = init_index(index_cls, init_async_index=True)
            route_layer = router_cls(
                encoder=openai_encoder,
                routes=routes_2,
                index=pinecone_index,
                auto_sync="local",
            )
            await asyncio.sleep(PINECONE_SLEEP)  # allow for index to be populated
            route_layer = router_cls(
                encoder=openai_encoder,
                routes=routes,
                index=pinecone_index,
                auto_sync="merge",
            )
            await asyncio.sleep(PINECONE_SLEEP)  # allow for index to be populated
            # confirm local and remote are synced
            assert await route_layer.async_is_synced()
            # now confirm utterances are correct
            local_utterances = await route_layer.index.aget_utterances()
            # we sort to ensure order is the same
            local_utterances.sort(
                key=lambda x: x.to_str(include_metadata=include_metadata(index_cls))
            )
            assert local_utterances == [
                Utterance(
                    route="Route 1", utterance="Hello", metadata={"type": "default"}
                ),
                Utterance(
                    route="Route 1", utterance="Hi", metadata={"type": "default"}
                ),
                Utterance(route="Route 2", utterance="Au revoir"),
                Utterance(route="Route 2", utterance="Bye"),
                Utterance(route="Route 2", utterance="Goodbye"),
                Utterance(route="Route 2", utterance="Hi"),
                Utterance(route="Route 3", utterance="Boo"),
            ], "The routes in the index should match the local routes"

    @pytest.mark.skipif(
        os.environ.get("PINECONE_API_KEY") is None, reason="Pinecone API key required"
    )
    @pytest.mark.asyncio
    async def test_sync_lock_prevents_concurrent_sync(
        self, openai_encoder, routes, index_cls, router_cls
    ):
        """Test that sync lock prevents concurrent synchronization operations"""
        index = init_index(index_cls, init_async_index=True)
        route_layer = router_cls(
            encoder=openai_encoder,
            routes=routes,
            index=index,
            auto_sync=None,
        )

        # Acquire sync lock
        await route_layer.index.alock(value=True)
        if index_cls is PineconeIndex:
            await asyncio.sleep(PINECONE_SLEEP)

        # Attempt to sync while lock is held should raise exception
        with pytest.raises(Exception):
            await route_layer.async_sync("local")

        # Release lock
        await route_layer.index.alock(value=False)
        if index_cls is PineconeIndex:
            await asyncio.sleep(PINECONE_SLEEP)

        # Should succeed after lock is released
        await route_layer.async_sync("local")
        if index_cls is PineconeIndex:
            await asyncio.sleep(PINECONE_SLEEP)
        assert await route_layer.async_is_synced()

    @pytest.mark.skipif(
        os.environ.get("PINECONE_API_KEY") is None, reason="Pinecone API key required"
    )
    @pytest.mark.asyncio
    async def test_sync_lock_auto_releases(self, openai_encoder, routes, index_cls, router_cls):
        """Test that sync lock is automatically released after sync operations"""
        index = init_index(index_cls, init_async_index=True)
        route_layer = router_cls(
            encoder=openai_encoder,
            routes=routes,
            index=index,
            auto_sync=None,
Dynamic Router Selection Logic

Verify that the get_test_routers function correctly identifies and includes the appropriate router classes dynamically, and ensure it handles cases where dependencies are missing.

def get_test_routers():
    routers = [SemanticRouter]
    if importlib.util.find_spec("pinecone_text") is not None:
        routers.append(HybridRouter)
    return routers

Copy link

github-actions bot commented Jan 1, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Score
Possible issue
Add validation for the router_cls parameter to prevent runtime errors with unsupported classes

Ensure that router_cls is properly validated or checked before being used in test
cases to avoid potential runtime errors if an invalid or unsupported class is
passed.

tests/unit/test_sync.py [847-851]

+if not isinstance(router_cls, (SemanticRouter, HybridRouter)):
+    raise ValueError("Invalid router class provided")
 route_layer = router_cls(
     encoder=openai_encoder,
     routes=routes,
     index=index,
     auto_sync=None,
 )
Suggestion importance[1-10]: 7

Why: Adding validation for router_cls ensures that only supported classes are used, preventing potential runtime errors. This is a meaningful improvement for robustness, though the exact list of valid classes might need to be confirmed.

7
Validate index_cls before initializing to avoid invalid index initialization

Add a check to ensure index_cls is properly initialized and valid before passing it
to init_index to prevent potential initialization failures.

tests/unit/test_sync.py [264]

+if index_cls not in [PineconeIndex, QdrantIndex, LocalIndex, PostgresIndex]:
+    raise ValueError("Invalid index class provided")
 index = init_index(index_cls)
Suggestion importance[1-10]: 7

Why: Validating index_cls ensures that only supported index classes are used, reducing the risk of initialization failures. This is a practical enhancement for error prevention and robustness.

7
Validate PINECONE_SLEEP to ensure it is a positive number before using it in sleep operations

Ensure that PINECONE_SLEEP is defined and has a valid value to avoid potential
runtime errors during the sleep operation.

tests/unit/test_sync.py [268]

+if not isinstance(PINECONE_SLEEP, (int, float)) or PINECONE_SLEEP <= 0:
+    raise ValueError("PINECONE_SLEEP must be a positive number")
 time.sleep(PINECONE_SLEEP)  # allow for index to be populated
Suggestion importance[1-10]: 6

Why: Ensuring PINECONE_SLEEP is a valid positive number prevents potential runtime errors during sleep operations. While useful, this check might be redundant if PINECONE_SLEEP is already defined and controlled elsewhere in the codebase.

6
General
Validate routes and routes_2 to ensure they are not empty before initializing the router

Add a check to ensure routes and routes_2 are not empty or invalid before passing
them to the router initialization to prevent unexpected behavior.

tests/unit/test_sync.py [749-753]

+if not routes_2:
+    raise ValueError("Routes cannot be empty")
 route_layer = router_cls(
     encoder=openai_encoder,
     routes=routes_2,
     index=pinecone_index,
     auto_sync="merge-force-remote",
 )
Suggestion importance[1-10]: 6

Why: Adding a validation for routes and routes_2 prevents unexpected behavior due to empty inputs. This is a reasonable improvement for ensuring the correctness of router initialization, though its necessity depends on the broader context of the test setup.

6

@jamescalam jamescalam changed the title feat: bring hybrid router inline with semantic feat: hybrid router and async pinecone upgrades Jan 3, 2025
@jamescalam jamescalam self-assigned this Jan 4, 2025
@jamescalam jamescalam linked an issue Jan 4, 2025 that may be closed by this pull request
Copy link

codecov bot commented Jan 13, 2025

Codecov Report

Attention: Patch coverage is 76.02041% with 47 lines in your changes missing coverage. Please review.

Project coverage is 74.44%. Comparing base (9915053) to head (6aead63).
Report is 73 commits behind head on main.

Files with missing lines Patch % Lines
semantic_router/routers/base.py 65.95% 16 Missing ⚠️
semantic_router/routers/hybrid.py 79.68% 13 Missing ⚠️
semantic_router/index/pinecone.py 84.09% 7 Missing ⚠️
semantic_router/index/base.py 72.22% 5 Missing ⚠️
semantic_router/index/postgres.py 42.85% 4 Missing ⚠️
semantic_router/index/local.py 83.33% 1 Missing ⚠️
semantic_router/index/qdrant.py 87.50% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #493      +/-   ##
==========================================
- Coverage   74.84%   74.44%   -0.41%     
==========================================
  Files          43       43              
  Lines        3753     3846      +93     
==========================================
+ Hits         2809     2863      +54     
- Misses        944      983      +39     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jamescalam jamescalam merged commit fbf5a64 into main Jan 13, 2025
9 of 10 checks passed
@jamescalam jamescalam deleted the james/full-hybrid-support branch January 13, 2025 09:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot use evaluate or fit methods on HybridRouter Merge HybridLayer with RouteLayer
1 participant