[Bug]: Cannot connect to milvus running on a k8s cluster - TypeError: no default reduce due to non-trivial cinit #2697

Open
1 task done
andreab67 opened this issue Mar 16, 2025 · 1 comment
Labels
kind/bug Something isn't working

andreab67 commented Mar 16, 2025

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

I have a setup where Milvus is deployed on my home Kubernetes cluster. The cluster has 4 nodes:

F:\Test>kubectl get nodes
NAME            STATUS   ROLES           AGE   VERSION
k8scontroller   Ready    control-plane   40h   v1.32.3
k8snode1        Ready    worker          40h   v1.32.3
k8snode2        Ready    worker          40h   v1.32.3
k8snode3        Ready    worker          40h   v1.32.3

F:\Test>kubectl get pods -n milvus
NAME                                                   READY   STATUS      RESTARTS        AGE
milvus-1742067666-minio-provisioning-rhnnc             0/1     Completed   0               17h
milvus-1742070012-attu-69f6ff669b-vgs9w                1/1     Running     0               16h
milvus-1742070012-data-coordinator-6568d8cf6c-bt7cl    1/1     Running     0               16h
milvus-1742070012-data-node-65f79c54bd-5gkcv           1/1     Running     0               16h
milvus-1742070012-etcd-0                               1/1     Running     0               16h
milvus-1742070012-etcd-1                               1/1     Running     0               16h
milvus-1742070012-etcd-2                               1/1     Running     0               16h
milvus-1742070012-etcd-pre-upgrade-d54cf               0/1     Completed   0               16h
milvus-1742070012-index-coordinator-6df4c47b99-kdwn4   1/1     Running     0               16h
milvus-1742070012-index-node-759fdf54b6-2s4m8          1/1     Running     0               16h
milvus-1742070012-kafka-controller-0                   1/1     Running     2 (3h30m ago)   16h
milvus-1742070012-minio-57768f94cc-7xb82               1/1     Running     0               16h
milvus-1742070012-minio-provisioning-4mxgh             0/1     Completed   0               16h
milvus-1742070012-proxy-76bc496d8c-phr28               1/1     Running     0               16h
milvus-1742070012-query-coordinator-767bbfb68c-gc5d7   1/1     Running     0               16h
milvus-1742070012-query-node-5f967fccd6-jxzzm          1/1     Running     0               16h
milvus-1742070012-root-coordinator-6dbf7695c5-bp7ph    1/1     Running     0               16h

I can successfully connect to Attu and see my database from a browser on the same network.

I have extracted the CA certificate of the k8s cluster and put it into a directory called CA:

 Directory of C:\CA

03/16/2025  07:07 AM    <DIR>          .
03/15/2025  04:57 PM             2,100 andrea-ca.crt
03/15/2025  07:02 PM             2,278 client.crt
03/15/2025  07:02 PM             3,292 client.key
03/16/2025  07:06 AM               576 k8s-ca.crt

As you can see, I tried several other certificates in addition to k8s-ca.crt.

I am not able to connect to Milvus from Python.

This is my code prototype:

import PyPDF2
from ebooklib import epub, ITEM_DOCUMENT
from bs4 import BeautifulSoup

import os
import sys
import grpc
from pymilvus import MilvusClient, connections, Collection, CollectionSchema, FieldSchema, DataType, utility
from sentence_transformers import SentenceTransformer

with open(r"C:\CA\k8s-ca.crt", "rb") as f:
    trusted_certs = f.read()

credentials = grpc.ssl_channel_credentials(root_certificates=trusted_certs)

connections.connect(
    alias="default",
    uri="https://milvus.andrea-house.com",  # Note: host and port only.
    token="admin:Milvus",
    db_name="default",
    channel_credentials=credentials
)

client = MilvusClient("default")

def read_pdf(file_path):
    """Extract text from a PDF file."""
    text = ""
    with open(file_path, "rb") as file:
        pdf_reader = PyPDF2.PdfReader(file)
        for page in pdf_reader.pages:
            page_text = page.extract_text()
            if page_text:
                text += page_text + "\n"
    return text

def read_epub(file_path):
    """Extract text from an EPUB file."""
    book = epub.read_epub(file_path)
    text = ""
    for item in book.get_items():
        # Only process document items
        if item.get_type() == ITEM_DOCUMENT:
            soup = BeautifulSoup(item.get_content(), features="html.parser")
            text += soup.get_text() + "\n"
    return text

def chunk_text(text, max_chunk_size=500):
    """
    Split the text into chunks each with a maximum number of words.
    Adjust max_chunk_size as needed.
    """
    words = text.split()
    chunks = []
    for i in range(0, len(words), max_chunk_size):
        chunk = " ".join(words[i:i + max_chunk_size])
        chunks.append(chunk)
    return chunks

def create_milvus_collection(collection_name, dim):
    """
    Create a Milvus collection with a given dimension if it doesn't exist.
    The collection includes:
      - An auto-generated primary key "id"
      - A "embedding" field to store vector embeddings
      - A "text" field to store the corresponding text chunk
    """
    if utility.has_collection(collection_name):
        collection = Collection(collection_name)
        print(f"Collection '{collection_name}' already exists.")
        return collection
    else:
        fields = [
            FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
            FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=dim),
            FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=65535)
        ]
        schema = CollectionSchema(fields, description="Book chunks and embeddings")
        collection = Collection(name=collection_name, schema=schema)
        print(f"Created collection '{collection_name}'.")
        return collection

def main(file_path):
    # Determine file extension and extract text accordingly
    ext = os.path.splitext(file_path)[1].lower()
    if ext == ".pdf":
        book_text = read_pdf(file_path)
    elif ext == ".epub":
        book_text = read_epub(file_path)
    else:
        print("Unsupported file format. Please provide a PDF or EPUB file.")
        return

    if not book_text.strip():
        print("No text was extracted from the file.")
        return

    # Chunk the book text for manageable embedding generation
    chunks = chunk_text(book_text, max_chunk_size=500)
    print(f"Extracted {len(chunks)} text chunks from the book.")

    # Load the sentence transformer model to generate embeddings
    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(chunks, convert_to_numpy=True)

    # Connect to Milvus (update host/port if necessary)
    connections.connect("default", host="milvus.andrea-house.com", port="443")

    # Create (or get) a collection named "python" with the proper embedding dimension
    dim = embeddings.shape[1]
    collection = create_milvus_collection("python", dim)

    # Prepare data for insertion; note that the auto_id field ("id") is skipped
    data = [
        embeddings.tolist(),  # embedding field
        chunks                # text field
    ]

    # Insert data into the collection and flush to persist
    insert_result = collection.insert(data)
    collection.flush()
    print("Data inserted into Milvus successfully.")
    print(f"Insert result: {insert_result}")

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python load_book.py <path_to_book>")
        sys.exit(1)
    file_path = sys.argv[1]
    main(file_path)

This is my error:

Traceback (most recent call last):
  File "F:\Test\load_book.py", line 16, in <module>
    connections.connect(
  File "C:\Python311\Lib\site-packages\pymilvus\orm\connections.py", line 390, in connect
    kwargs_copy = copy.deepcopy(kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\copy.py", line 146, in deepcopy
    y = copier(x, memo)
        ^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
                             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
            ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\copy.py", line 146, in deepcopy
    y = copier(x, memo)
        ^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
                             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\copy.py", line 161, in deepcopy
    rv = reductor(4)
         ^^^^^^^^^^^
  File "<stringsource>", line 2, in grpc._cython.cygrpc.SSLChannelCredentials.__reduce_cython__
TypeError: no default __reduce__ due to non-trivial __cinit__

I have been looking into encapsulating the connection and trying to disable deep copy.

No luck...
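
For reference, the traceback above fails inside copy.deepcopy(kwargs) in pymilvus/orm/connections.py, before any network or certificate check takes place. The sketch below reproduces the same failure mode without grpc or pymilvus. The classes _NoReduceHandle and FakeChannelCredentials and the helper try_deepcopy are hypothetical stand-ins invented for illustration; they only assume that the real credentials object wraps a Cython handle that refuses to be reduced, as the last traceback frame suggests.

```python
import copy


class _NoReduceHandle:
    """Hypothetical stand-in for a Cython extension object that cannot be reduced."""

    def __reduce_ex__(self, protocol):
        # Mimics the message raised by the real Cython type in the traceback.
        raise TypeError("no default __reduce__ due to non-trivial __cinit__")


class FakeChannelCredentials:
    """Hypothetical stand-in for a Python wrapper that holds such a handle."""

    def __init__(self):
        self._credentials = _NoReduceHandle()


def try_deepcopy(kwargs):
    """Return None if kwargs can be deep-copied, otherwise the error message."""
    try:
        copy.deepcopy(kwargs)
    except TypeError as exc:
        return str(exc)
    return None


# Keyword arguments that carry a non-copyable object, as in the reported call.
with_object = {
    "uri": "https://milvus.andrea-house.com",
    "channel_credentials": FakeChannelCredentials(),
}

# Keyword arguments that carry only plain values.
without_object = {
    "uri": "https://milvus.andrea-house.com",
    "secure": True,
}

print(try_deepcopy(with_object))
print(try_deepcopy(without_object))
```

If this reading is right, any object of that kind placed in the connect() keyword arguments would hit the same error, regardless of which certificate it was built from.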

Thanks for looking at this.

Regards

Andrea

Expected Behavior

I should be able to connect to Milvus from Python, as I do from my browser.

Steps/Code To Reproduce behavior

Try to connect to a Milvus instance running inside a Kubernetes cluster in an air-gapped environment.

Environment details

- Hardware/Software conditions (OS, CPU, GPU, Memory): Kubernetes cluster for Milvus, air-gapped
- Method of installation (Docker, or from source): Helm, using the "milvus" chart repository (https://milvus.io/docs/install_cluster-helm.md)
- Milvus version (v0.3.1, or v0.4.0): 2.5.6
- Milvus configuration (Settings you made in `server_config.yaml`): I swapped the ingress controller certificate with one I generated from andrea-ca.crt. I have also added andrea-ca.crt to the trusted root authorities on my Windows system.

Anything else?

I am using private CAs generated by Cloudflare CFSSL - https://github.com/cloudflare/cfssl


andreab67 added the kind/bug label on Mar 16, 2025
andreab67 changed the title from "[Bug]: Cannot connect to milvus running on a k8s cluster - not on the cloud" to "[Bug]: Cannot connect to milvus running on a k8s cluster - TypeError: no default reduce due to non-trivial cinit" on Mar 16, 2025
XuanYang-cn (Contributor) commented

@andreab67 Please refer to this doc https://milvus.io/docs/tls.md

To connect to a Milvus server with TLS, you don't need to read the certificate files yourself. Just pass the corresponding file paths to connect():

# One-way TLS
connections.connect(
    ...
    secure=True,
    server_pem_path="path_to/server.pem",
    server_name="localhost"
)

# Two-way TLS
connections.connect(
    ...
    client_pem_path="path_to/client.pem",
    client_key_path="path_to/client.key",
    ...
)
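
As a rough sketch of how the one-way TLS variant might look for the setup in this issue, the snippet below assembles the keyword arguments from plain strings and booleans only, so they can be deep-copied without trouble. The helper build_tls_connect_kwargs is hypothetical, and two values are assumptions rather than facts from the thread: that server_pem_path should point at andrea-ca.crt (the reporter says the ingress certificate was generated from it) and that server_name should match the host in the URI. The actual connect call is left commented out because it needs pymilvus and a reachable server.

```python
import copy


def build_tls_connect_kwargs(uri, token, server_pem_path, server_name, db_name="default"):
    """Assemble keyword arguments for a one-way TLS connection from plain values only."""
    return {
        "alias": "default",
        "uri": uri,
        "token": token,
        "db_name": db_name,
        "secure": True,
        "server_pem_path": server_pem_path,  # a file path, not the file contents
        "server_name": server_name,
    }


kwargs = build_tls_connect_kwargs(
    uri="https://milvus.andrea-house.com",
    token="admin:Milvus",
    server_pem_path=r"C:\CA\andrea-ca.crt",  # assumption: the CA that signed the ingress certificate
    server_name="milvus.andrea-house.com",   # assumption: must match the name in the server certificate
)

# Plain strings and booleans survive a deep copy unchanged.
copied = copy.deepcopy(kwargs)
print(copied == kwargs)

# With pymilvus installed and the server reachable, the call would be:
# from pymilvus import connections
# connections.connect(**kwargs)
```

Whether the connection then succeeds still depends on the ingress forwarding gRPC traffic and on the certificate chain, which this sketch does not check.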
