[WIP] core: qdrant_migrator #2571

Merged
merged 5 commits into from
Nov 17, 2023

Conversation

@spolu spolu commented Nov 16, 2023

Tested locally (log below). A document was added and another one deleted during the migration.

spolu@spolu-box ~/code/dust/core (spolu-migrate_collection) $ cargo run --bin qdrant_migrator 
Tooling to migrate Qdrant collections

Usage: qdrant_migrator <COMMAND>

Commands:
  show                  Show qdrant state
  set-shadow-write      Set `shadow_write_cluster` (!!! creates collection on `shadow_write_cluster`)
  clear-shadow-write    Clear `shadow_write_cluster` (!!! deletes collection from `shadow_write_cluster`
  migrate-shadow-write  Migrate `cluster` collection to `shadow_write_cluster`
  commit-shadow-write   Switch `shadow_write_cluster` and `cluster` (!!! moves read traffic to `shadow_write_cluster`)
  help                  Print this message or the help of the given subcommand(s)

Options:
  -h, --help  Print help
spolu@spolu-box ~/code/dust/core (spolu-migrate_collection) $ cargo run --bin qdrant_migrator set-shadow-write 4 test-migrate dedicated-0
  Finished dev [unoptimized + debuginfo] target(s) in 0.11s
   Running `target/debug/qdrant_migrator set-shadow-write 4 test-migrate dedicated-0`
[!] Error: Data source not found
spolu@spolu-box ~/code/dust/core (spolu-migrate_collection) $ cargo run --bin qdrant_migrator set-shadow-write 4 test-migrate dedicated-0
  Finished dev [unoptimized + debuginfo] target(s) in 0.10s
   Running `target/debug/qdrant_migrator set-shadow-write 4 test-migrate dedicated-0`
[!] Error: Data source not found
spolu@spolu-box ~/code/dust/core (spolu-migrate_collection) $ cargo run --bin qdrant_migrator set-shadow-write 4 test-migrate dedicated-0
"    Finished dev [unoptimized + debuginfo] target(s) in 0.10s
   Running `target/debug/qdrant_migrator set-shadow-write 4 test-migrate dedicated-0`
[!] Error: Data source not found
spolu@spolu-box ~/code/dust/core (spolu-migrate_collection) $ ^C
spolu@spolu-box ~/code/dust/core (spolu-migrate_collection) $ cargo run --bin qdrant_migrator set-shadow-write 4 "test-migrate" dedicated-0
  Finished dev [unoptimized + debuginfo] target(s) in 0.10s
   Running `target/debug/qdrant_migrator set-shadow-write 4 test-migrate dedicated-0`
[!] Error: Data source not found
spolu@spolu-box ~/code/dust/core (spolu-migrate_collection) $ cargo run --bin qdrant_migrator set-shadow-write 5 test-migrate dedicated-0
  Finished dev [unoptimized + debuginfo] target(s) in 0.11s
   Running `target/debug/qdrant_migrator set-shadow-write 5 test-migrate dedicated-0`
[✓] Created qdrant shadow_write_cluster collection: collection=ds_0fdeb4abe22bd528f9c7cb967b778c6a2fc45ba96340fd96e3c44654a9c77644 shadow_write_cluster=dedicated-0
[✓] Updated data source: collection=ds_0fdeb4abe22bd528f9c7cb967b778c6a2fc45ba96340fd96e3c44654a9c77644 cluster=main-0 shadow_write_cluster=dedicated-0
spolu@spolu-box ~/code/dust/core (spolu-migrate_collection) $ cargo run --bin qdrant_migrator migrate-shadow-write 5 test-migrate
  Finished dev [unoptimized + debuginfo] target(s) in 0.10s
   Running `target/debug/qdrant_migrator migrate-shadow-write 5 test-migrate`
[i] Migrated points: count=2 total=2
[i] Done migrating: total=2
spolu@spolu-box ~/code/dust/core (spolu-migrate_collection) $ cargo run --bin qdrant_migrator commit-shadow-write 5 test-migrate
  Finished dev [unoptimized + debuginfo] target(s) in 0.09s
   Running `target/debug/qdrant_migrator commit-shadow-write 5 test-migrate`
[i] Updated data source: collection=ds_0fdeb4abe22bd528f9c7cb967b778c6a2fc45ba96340fd96e3c44654a9c77644 cluster=dedicated-0 shadow_write_cluster=main-0
spolu@spolu-box ~/code/dust/core (spolu-migrate_collection) $ cargo run --bin qdrant_migrator clear-shadow-write 5 test-migrate
  Finished dev [unoptimized + debuginfo] target(s) in 0.11s
   Running `target/debug/qdrant_migrator clear-shadow-write 5 test-migrate`
[?] [DANGER] Are you sure you want to delete this qdrant shadow_write_cluster collection? (this is definitive) shadow_write_cluster=main-0 points_count=2 Confirm ([y]/n) ? 
[✓] Deleted qdrant shadow_write_cluster collection: collection=ds_0fdeb4abe22bd528f9c7cb967b778c6a2fc45ba96340fd96e3c44654a9c77644 shadow_write_cluster=main-0
[✓] Updated data source: collection=ds_0fdeb4abe22bd528f9c7cb967b778c6a2fc45ba96340fd96e3c44654a9c77644 cluster=dedicated-0 shadow_write_cluster=none
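The sequence in the log above (set, migrate, commit, clear) can be read as a small state machine over the `cluster` / `shadow_write_cluster` pair. Below is a minimal sketch of those transitions, not the PR's implementation: the `Cluster` and `QdrantConfig` types and the validation rules (the target must differ from `cluster`, and commit requires a shadow) are assumptions made for illustration.

```rust
/// Clusters that appear in the log above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Cluster {
    Main0,
    Dedicated0,
}

/// Hypothetical stand-in for the routing part of a data source config.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct QdrantConfig {
    cluster: Cluster,
    shadow_write_cluster: Option<Cluster>,
}

impl QdrantConfig {
    /// set-shadow-write: start writing to `target` in addition to `cluster`.
    /// The checks here are assumptions, not taken from the PR.
    fn set_shadow_write(self, target: Cluster) -> Result<Self, String> {
        if target == self.cluster {
            return Err("shadow_write_cluster must differ from cluster".to_string());
        }
        if self.shadow_write_cluster.is_some() {
            return Err("shadow_write_cluster is already set".to_string());
        }
        Ok(Self { shadow_write_cluster: Some(target), ..self })
    }

    /// commit-shadow-write: swap the two clusters, which moves read traffic.
    fn commit_shadow_write(self) -> Result<Self, String> {
        match self.shadow_write_cluster {
            Some(shadow) => Ok(Self {
                cluster: shadow,
                shadow_write_cluster: Some(self.cluster),
            }),
            None => Err("no shadow_write_cluster to commit".to_string()),
        }
    }

    /// clear-shadow-write: stop shadow writes (the real command also deletes
    /// the collection on the shadow cluster).
    fn clear_shadow_write(self) -> Self {
        Self { shadow_write_cluster: None, ..self }
    }
}

fn main() {
    let start = QdrantConfig { cluster: Cluster::Main0, shadow_write_cluster: None };
    let shadowed = start.set_shadow_write(Cluster::Dedicated0).unwrap();
    let committed = shadowed.commit_shadow_write().unwrap();
    let cleared = committed.clear_shadow_write();
    println!("{:?}\n{:?}\n{:?}", shadowed, committed, cleared);
}
```

After commit, the old read cluster becomes the shadow, so the final clear removes the collection from `main-0`, which matches the last three log lines.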

@spolu spolu requested a review from fontanierh November 17, 2023 15:15
fontanierh previously approved these changes Nov 17, 2023

@fontanierh fontanierh left a comment

looks great

format!("ds_{}", self.internal_id)
}

pub async fn setup(
pub async fn update_config(
Contributor

The only thing we ever want to update is qdrant_config, no? Would it be better to expose only that?

Contributor Author

The store function will stay the same, so it's better to align this one with it, as it can be reused.

Contributor

I would have kept the store fn flexible, but only exposed what you can actually change in update_config, as this is the interface for the rest of the app.
It's "clear" that you can do "unsupported" things by calling store, but to me data_source functions shouldn't leave things in an inconsistent state.
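The suggestion in this thread can be sketched as follows. This is an illustration of the reviewer's proposal, not what was merged: `DataSource`, `DataSourceConfig`, and the `max_chunk_size` field are hypothetical stand-ins, and persistence is left out entirely.

```rust
#[derive(Clone, Debug, PartialEq)]
struct QdrantConfig {
    cluster: String,
    shadow_write_cluster: Option<String>,
}

/// Hypothetical config; `max_chunk_size` only stands for "some setting that
/// must not change after the data source has been created".
#[derive(Clone, Debug, PartialEq)]
struct DataSourceConfig {
    max_chunk_size: usize,
    qdrant_config: QdrantConfig,
}

struct DataSource {
    config: DataSourceConfig,
}

impl DataSource {
    fn config(&self) -> &DataSourceConfig {
        &self.config
    }

    /// Narrow entry point: callers can only swap the Qdrant routing, so the
    /// rest of the config cannot drift out of sync with stored data.
    fn update_qdrant_config(&mut self, qdrant_config: QdrantConfig) {
        self.config.qdrant_config = qdrant_config;
    }
}

fn main() {
    let mut ds = DataSource {
        config: DataSourceConfig {
            max_chunk_size: 256,
            qdrant_config: QdrantConfig {
                cluster: "main-0".to_string(),
                shadow_write_cluster: None,
            },
        },
    };
    ds.update_qdrant_config(QdrantConfig {
        cluster: "main-0".to_string(),
        shadow_write_cluster: Some("dedicated-0".to_string()),
    });
    println!("{:?}", ds.config());
}
```

The trade-off raised by the author still applies: a narrower method is safer for callers, while a full `update_config` mirrors the store function and is easier to reuse.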

@@ -10,16 +12,37 @@ use serde::{Deserialize, Serialize};
pub enum QdrantCluster {
Contributor

nit: I think we can use #[serde(rename_all = "kebab-case")]
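One thing worth checking before applying this nit: as far as I understand serde's derive, the kebab-case rule only inserts a separator before an uppercase letter, so a variant named `Main0` would likely become `main0` rather than `main-0`. The function below is a std-only approximation of that rule written for illustration (`kebab_case_variant` is a hypothetical helper, not a serde API); the actual behaviour should be confirmed against serde itself.

```rust
/// Approximates how a PascalCase variant name is converted to kebab-case:
/// a '-' is inserted only before an uppercase letter that is not the first
/// character, and every character is lowercased. Digits do not get a separator.
/// This mirrors my reading of serde's rule and is an assumption.
fn kebab_case_variant(variant: &str) -> String {
    let mut out = String::new();
    for (i, ch) in variant.char_indices() {
        if i > 0 && ch.is_uppercase() {
            out.push('-');
        }
        out.push(ch.to_ascii_lowercase());
    }
    out
}

fn main() {
    for v in ["Main0", "Dedicated0", "ShadowWrite"] {
        println!("{} -> {}", v, kebab_case_variant(v));
    }
}
```

If the stored names need to stay `main-0` and `dedicated-0` as printed in the log, explicit per-variant `#[serde(rename = "...")]` attributes are the safer choice.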

static QDRANT_CLUSTER_VARIANTS: &[QdrantCluster] =
&[QdrantCluster::Main0, QdrantCluster::Dedicated0];

impl ToString for QdrantCluster {
Contributor

Out of scope, but I am surprised that we have to implement these ourselves every time; there must be a serde (or other) thing to do this for us.
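Without adding a dependency, the usual std-only pattern is to implement `Display` (which provides `to_string()` through the blanket `ToString` impl) and `FromStr`, reusing the variants slice for parsing. A minimal sketch, assuming the string forms are `main-0` and `dedicated-0` as printed in the log; the derives and the error type are choices made here, not taken from the PR.

```rust
use std::fmt;
use std::str::FromStr;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum QdrantCluster {
    Main0,
    Dedicated0,
}

static QDRANT_CLUSTER_VARIANTS: &[QdrantCluster] =
    &[QdrantCluster::Main0, QdrantCluster::Dedicated0];

// Implementing Display gives `to_string()` for free.
impl fmt::Display for QdrantCluster {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let s = match self {
            QdrantCluster::Main0 => "main-0",
            QdrantCluster::Dedicated0 => "dedicated-0",
        };
        f.write_str(s)
    }
}

// Parsing walks the variants list, so a new variant only needs a Display arm
// and an entry in QDRANT_CLUSTER_VARIANTS.
impl FromStr for QdrantCluster {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        QDRANT_CLUSTER_VARIANTS
            .iter()
            .copied()
            .find(|c| c.to_string() == s)
            .ok_or_else(|| format!("unknown qdrant cluster: {}", s))
    }
}

fn main() {
    let cluster: QdrantCluster = "dedicated-0".parse().unwrap();
    println!("{} {:?}", cluster, "nope".parse::<QdrantCluster>());
}
```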

},
}

/// A fictional versioning CLI
Contributor

Not sure I understand what is fictional about it, wdym?

Contributor Author

cargo cult from a lib

Contributor

lol

let args = Cli::parse();

let rt = tokio::runtime::Builder::new_multi_thread()
.worker_threads(32)
Contributor

Why not leave the default? It will pick the number of cores available on the system.
I don't think we gain anything from having 32, since we'll be running this on machines that don't have many cores.

Contributor Author

This is aligned with dust_api. It's not only about the core, and we know the default can be a foot gun if some things are blocking in unexpected ways.
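The foot gun mentioned here can be illustrated without tokio. The sketch below is a std-only analogy, not the PR's runtime setup: a fixed pool of OS threads pulls jobs from a queue, and each job blocks its thread for a while. With as many blocking jobs as workers or more, a small pool serializes the work, which is the reason given for keeping more threads than cores. `run_blocking_tasks` is a hypothetical helper.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Runs `tasks` jobs that each block their thread for `block_for`, using a
/// fixed pool of `workers` threads, and returns the total wall-clock time.
fn run_blocking_tasks(workers: usize, tasks: usize, block_for: Duration) -> Duration {
    let (tx, rx) = mpsc::channel::<()>();
    let rx = Arc::new(Mutex::new(rx));
    let start = Instant::now();

    // Queue all jobs up front, then close the channel so workers can exit.
    for _ in 0..tasks {
        tx.send(()).unwrap();
    }
    drop(tx);

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || loop {
                // The lock is released at the end of this statement, before sleeping.
                let job = rx.lock().unwrap().recv();
                match job {
                    Ok(()) => thread::sleep(block_for), // stands in for a blocking call
                    Err(_) => break,                    // queue drained and closed
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    start.elapsed()
}

fn main() {
    let block = Duration::from_millis(100);
    println!("1 worker:  {:?}", run_blocking_tasks(1, 4, block));
    println!("4 workers: {:?}", run_blocking_tasks(4, 4, block));
}
```

An async runtime differs in that non-blocking tasks share threads cooperatively, but a task that blocks its worker thread behaves much like the sleeping jobs here.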

// the embedding size (which is what happens here). May need to be revisited in
// future.
let mut credentials = Credentials::new();
credentials.insert("OPENAI_API_KEY".to_string(), "foo".to_string());
Contributor

This is a little bit of a code smell, but understandable if we don't want to fix it.

Contributor Author

It is, but it will error if it becomes irrelevant (e.g. if we use another provider that requires credentials to perform this function).

project_id,
data_source_id,
} => {
// This is the most dangerous command of all as it is the only one to actually
Contributor

I think this comment is wrong

Contributor Author

It's accurate

Contributor

Isn't the clear command the most dangerous one?


let qdrant_client = qdrant_clients.main_client(&ds.config().qdrant_config);

// Delete collection on shadow_write_cluster.
Contributor

Same here?

Contributor Author

moved 3 lines below

Contributor

Is this function actually deleting? I thought this one was copying.
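For context on the copying step, the `migrate-shadow-write` output in the PR description (`Migrated points: count=… total=…`) suggests a paginated copy loop. The sketch below shows that shape using in-memory vectors only: `Point`, `scroll`, and `migrate` are hypothetical names, and nothing here is the Qdrant client API. Writes are upserts by id on the assumption that shadow writes may already have placed some points on the target while the migration runs.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Point {
    id: u64,
    vector: Vec<f32>,
}

/// Returns up to `limit` points starting at `offset`, plus the next offset
/// if more points remain.
fn scroll(source: &[Point], offset: usize, limit: usize) -> (Vec<Point>, Option<usize>) {
    let end = (offset + limit).min(source.len());
    let page = source[offset.min(end)..end].to_vec();
    let next = if end < source.len() { Some(end) } else { None };
    (page, next)
}

/// Copies every point from `source` into `target` in batches and returns the
/// number of points processed.
fn migrate(source: &[Point], target: &mut Vec<Point>, batch: usize) -> usize {
    assert!(batch > 0, "batch size must be positive");
    let mut total = 0;
    let mut offset = Some(0);
    while let Some(current) = offset {
        let (page, next) = scroll(source, current, batch);
        let count = page.len();
        for point in page {
            // Upsert: overwrite a point already present on the target.
            match target.iter_mut().find(|existing| existing.id == point.id) {
                Some(existing) => *existing = point,
                None => target.push(point),
            }
        }
        total += count;
        println!("[i] Migrated points: count={} total={}", count, total);
        offset = next;
    }
    println!("[i] Done migrating: total={}", total);
    total
}

fn main() {
    let source = vec![
        Point { id: 1, vector: vec![0.1, 0.2] },
        Point { id: 2, vector: vec![0.3, 0.4] },
    ];
    let mut target = Vec::new();
    migrate(&source, &mut target, 100);
}
```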

@spolu spolu merged commit edceae6 into main Nov 17, 2023
1 check passed
@spolu spolu deleted the spolu-migrate_collection branch November 17, 2023 15:58
2 participants