27 changes: 27 additions & 0 deletions docs/docs.json
@@ -66,6 +66,33 @@
}
]
},
{
"group": "User guide",
"pages": [
{
"group": "Table management",
"pages": [
"user-guides/tables/index",
"user-guides/tables/create",
"user-guides/tables/schema",
"user-guides/tables/update",
"user-guides/tables/versioning",
"user-guides/tables/consistency"
]
},
{
"group": "Indexing data",
"pages": [
"user-guides/indexing/index",
"user-guides/indexing/vector-index",
"user-guides/indexing/fts-index",
"user-guides/indexing/scalar-index",
"user-guides/indexing/gpu-indexing",
"user-guides/indexing/reindexing"
]
}
]
},
{
"group": "API & SDK Reference",
"pages": [
195 changes: 195 additions & 0 deletions docs/get-started/quickstart.mdx
@@ -0,0 +1,195 @@
---
title: Quickstart
sidebarTitle: "Quickstart"
description: "Get started with LanceDB in minutes."
weight: 6
---
import ImportLanceDBPy from '/snippets/py/import_lancedb.mdx';
import ConnectToLanceDBPy from '/snippets/py/connect_to_lancedb.mdx';

The quickest way to get going with LanceDB is with the open source version, which is file-based and
runs in-process (like SQLite). Let's get started using Python in just a few steps!

## 1. Install LanceDB

Install the LanceDB SDK for your language of choice.

<CodeGroup>
```bash Python icon=Python
pip install lancedb
```

```bash TypeScript icon=js
npm install @lancedb/lancedb
```

```bash Rust icon=Rust
cargo add lancedb
```
</CodeGroup>

## 2. Connect to a LanceDB database

Using LanceDB's open source version is as simple as running
the following import statement -- no servers needed!

<CodeGroup>
<ImportLanceDBPy/>
</CodeGroup>

### Connect to a database

Once you import LanceDB as a library, you can connect to a LanceDB database by specifying a
local file path.

<CodeGroup>
<ConnectToLanceDBPy/>
</CodeGroup>

### LanceDB Cloud or Enterprise versions

If you want a fully-managed solution, you can opt for LanceDB Cloud, which provides managed infrastructure,
security, and automatic backups. Simply replace the local path with a remote `uri`
that points to where your data is stored.

<CodeGroup >
```py Python icon=Python
db = lancedb.connect(
uri="db://your-project-slug",
api_key="your-api-key",
region="us-east-1"
)
```
</CodeGroup >

If you're operating at enormous scale and are looking for more advanced use cases beyond just search, like
feature engineering, model training and more, check out [LanceDB Enterprise](/enterprise).

## 3. Obtain some data

LanceDB uses the notion of tables, where each row represents a record, and each column represents a field
and/or its metadata. The simplest way to begin is to define a list of objects, where each object
contains a vector field (list of floats) and optional fields for metadata.

Let's look at an example. We have the following records of characters in an adventure board game. The `vector` field is
a list of floats. In a real application, each vector would contain hundreds of floating-point values produced by an embedding model; the example below keeps things simple with just 3 values.

<CodeGroup >
```py Python icon=Python
data = [
{"id": "1", "text": "knight", "vector": [0.9, 0.4, 0.8]},
{"id": "2", "text": "ranger", "vector": [0.8, 0.4, 0.7]},
{"id": "9", "text": "priest", "vector": [0.6, 0.2, 0.6]},
{"id": "4", "text": "rogue", "vector": [0.7, 0.4, 0.7]},
]
```
</CodeGroup >

## 4. Create a table

Next, let's create a `Table` in LanceDB and ingest the data into it.
If you don't provide a schema explicitly, LanceDB infers it from the data.
If the table already exists, you'll get an error.

```py
table = db.create_table("adventurers", data=data)
# If the table already exists, you'll get a ValueError: Table 'adventurers' already exists
```

To overwrite the existing table, pass the `mode="overwrite"` parameter.

<CodeGroup >
```py Python icon=Python
table = db.create_table("adventurers", data=data, mode="overwrite")
# No ValueError!
```
</CodeGroup >

LanceDB requires that you provide either data or a schema (for example, a PyArrow schema) during table creation.
You can learn more about this on the "[working with tables](/tutorials/tables)" page.
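
If you'd rather create an empty table and add rows later, you can pass a schema instead of data. Here's a minimal sketch using PyArrow (the table name is illustrative):

<CodeGroup>
```py Python icon=Python
import pyarrow as pa

# Define the schema up front: an id, a text field, and a 3-dimensional vector
schema = pa.schema([
    pa.field("id", pa.string()),
    pa.field("text", pa.string()),
    pa.field("vector", pa.list_(pa.float32(), 3)),  # fixed-size list of 3 floats
])

# Create an empty table with that schema; add rows later with table.add()
empty_table = db.create_table("adventurers_v2", schema=schema)
```
</CodeGroup>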

## 5. Vector search

Now, let's perform a vector similarity search. The query vector should have the same dimensionality as your data vectors. The search returns the most similar vectors based on Euclidean distance.

Our query is a vector that represents a `warrior`, which isn't in the data we ingested.
We'll find the result that's most similar to it!

<CodeGroup >
```py Python icon=Python
# Let's search for vectors similar to "warrior"
query_vector = [0.8, 0.3, 0.8]

# Ensure you run `pip install polars` beforehand
results = table.search(query_vector).limit(2).to_polars()
```
</CodeGroup >

It's often convenient to display query results as a table. You can output the results of a vector search
as a Pandas or Polars DataFrame; the table below shows the Polars output.

The `knight` and the `ranger` tie as the adventurers most similar to our `warrior` query!

| id | text | vector | _distance |
| --- | --- | --- | --- |
| 1 | knight | [0.9, 0.4, 0.8] | 0.02 |
| 2 | ranger | [0.8, 0.4, 0.7] | 0.02 |

If you prefer Pandas, you can use the `to_pandas()` method to display the results as a Pandas DataFrame.

<CodeGroup>
```py Python icon=Python
results = table.search(query_vector).limit(2).to_pandas()
```
</CodeGroup>



## 6. Add data and run more queries

If you obtain more data, it's simple to add it to an existing table. In the same script or a new one, you
can connect to the LanceDB database, open the existing table, and call `table.add()`.

```py Python icon=Python
import lancedb

# Connect to an existing database
uri = "./ex_lancedb"
db = lancedb.connect(uri)

# Open the existing table that we created earlier
table = db.open_table("adventurers")

more_data = [
{"id": "7", "text": "mage", "vector": [0.6, 0.3, 0.4]},
{"id": "8", "text": "bard", "vector": [0.3, 0.8, 0.4]},
]

# Add data to table
table.add(more_data)
```

To verify that our new data was added, we can run a different query that looks for adventurers similar to `wizard`.

```py Python icon=Python
# Let's search for vectors similar to "wizard"
query_vector = [0.7, 0.3, 0.5]

results = table.search(query_vector).limit(2).to_polars()
print(results)
```

| id | text | vector | _distance |
| --- | --- | --- | --- |
| 7 | mage | [0.6, 0.3, 0.4] | 0.02 |
| 9 | priest | [0.6, 0.2, 0.6] | 0.03 |

The `mage` is the most magical of all our characters!


1 change: 1 addition & 0 deletions docs/tutorials/index.mdx
@@ -12,6 +12,7 @@ Explore tutorials organized by use case:
| [Retrieval]() | Learn how to use advanced retrieval techniques |
| [RAG and Agents]() | Build Retrieval-Augmented Generation (RAG) applications and agents with LanceDB. |
| [Multimodal Lakehouse]() | Explore automated multimodal feature engineering with LanceDB. |
| [Working with Tables](./tables/) | Manage LanceDB tables end-to-end: creation, ingestion, schema evolution, versioning, and consistency. |

<Tip>
Explore all the code samples in our [VectorDB Recipes](https://github.com/lancedb/vectordb-recipes) repository.
4 changes: 2 additions & 2 deletions docs/tutorials/rag/time-travel-rag/index.mdx
@@ -133,7 +133,7 @@ is to standardize the FAA's criteria for addressing cybersecurity threats, reduc
costs and time while maintaining the same level of safety provided by current special conditions.
--------------------------------------

✅ Date-based audit complete. Results show how knowledge evolves over time. This demonstrates LanceDB's powerful [versioning capabilities](/docs/tables/consistency#versioning) for maintaining audit trails.
✅ Date-based audit complete. Results show how knowledge evolves over time. This demonstrates LanceDB's powerful [versioning capabilities](/tutorials/tables/consistency#versioning) for maintaining audit trails.


=============================================================
@@ -184,6 +184,6 @@ To learn more about the concepts and features used in this tutorial:
- **[RAG Fundamentals](/docs/tutorials/rag/)** - Explore other RAG techniques and applications
- **[Vector Search](/docs/search/)** - Learn about LanceDB's search capabilities
- **[Embedding Models](/docs/integrations/embedding/)** - Understand different embedding strategies
- **[Table Management](/docs/tables/)** - Master LanceDB table operations and versioning
- **[Table Management](/tutorials/tables/)** - Master LanceDB table operations and versioning
- **[Enterprise Features](/docs/enterprise/)** - Discover production-ready capabilities
- **[Performance Optimization](/docs/enterprise/benchmark)** - Learn about LanceDB's performance characteristics -->
60 changes: 60 additions & 0 deletions docs/user-guides/indexing/fts-index.mdx
@@ -0,0 +1,60 @@
---
title: "Full-Text Search (FTS) Index"
sidebarTitle: "Full-Text Index"
description: "Create and tune BM25-based full-text search indexes in LanceDB."
weight: 2
aliases:
- "/docs/concepts/indexing/fts-index/"
- "/docs/concepts/indexing/fts-index"
---
import FtsIndexCreate from '/snippets/py/fts_index_create.mdx';
import FtsIndexWait from '/snippets/py/fts_index_wait.mdx';

LanceDB Cloud and Enterprise provide performant full-text search based on BM25 so you can incorporate keyword-based search into retrieval solutions.

<Note>
The `create_fts_index` API returns immediately, but index building happens asynchronously.
</Note>

## Creating FTS Indexes

<CodeGroup>
<FtsIndexCreate />
</CodeGroup>

Check FTS index status using the API:

<CodeGroup>
<FtsIndexWait />
</CodeGroup>

## Configuration Options

### FTS Parameters

| Parameter | Type | Default | Description |
|:----------|:-----|:--------|:------------|
| `with_position` | bool | `False` | Store token positions (required for phrase queries) |
| `base_tokenizer` | str | `"simple"` | Text splitting method (`simple`, `whitespace`, or `raw`) |
| `language` | str | `"English"` | Language for stemming/stop words |
| `max_token_length` | int | `40` | Maximum token size; longer tokens are omitted |
| `lower_case` | bool | `True` | Lowercase tokens |
| `stem` | bool | `True` | Apply stemming (`running` → `run`) |
| `remove_stop_words` | bool | `True` | Drop common stop words |
| `ascii_folding` | bool | `True` | Normalize accented characters |

<Note title="Key parameters">
- `max_token_length` can filter out base64 blobs or long URLs.
- Disabling `with_position` reduces index size but disables phrase queries.
- `ascii_folding` helps with international text (e.g., “café” → “cafe”).
</Note>
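
Here's a minimal sketch of passing these options at index-creation time (the `text` column name is illustrative; pass only the options you want to change):

<CodeGroup>
```py Python icon=Python
# Tune tokenization when building the index; defaults match the table above
table.create_fts_index(
    "text",
    base_tokenizer="simple",   # split on whitespace and punctuation
    language="English",        # language used for stemming and stop words
    max_token_length=40,       # drop very long tokens such as base64 blobs
    lower_case=True,
    stem=True,
    remove_stop_words=True,
    ascii_folding=True,        # "café" -> "cafe"
)
```
</CodeGroup>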

### Phrase Query Configuration

Enable phrase queries by setting:

| Parameter | Required Value | Purpose |
|:----------|:---------------|:--------|
| `with_position` | `True` | Track token positions for phrase matching |
| `remove_stop_words` | `False` | Preserve stop words for exact phrase matching |
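
As a sketch, building and querying such an index looks roughly like this (column name and query text are illustrative; exact phrase-query syntax can vary by SDK version):

<CodeGroup>
```py Python icon=Python
# Build an FTS index that keeps token positions and stop words
table.create_fts_index(
    "text",
    with_position=True,       # needed to match phrases
    remove_stop_words=False,  # keep stop words so exact phrases survive
)

# Wrapping the query in double quotes requests an exact phrase match
results = (
    table.search('"knight of the round table"', query_type="fts")
    .limit(5)
    .to_pandas()
)
```
</CodeGroup>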

51 changes: 51 additions & 0 deletions docs/user-guides/indexing/gpu-indexing.mdx
@@ -0,0 +1,51 @@
---
title: "GPU-Powered Vector Indexing in LanceDB"
sidebarTitle: "GPU Indexing"
description: "Accelerate IVF and HNSW index builds with GPU acceleration in LanceDB."
weight: 4
aliases:
- "/docs/concepts/indexing/gpu-indexing/"
- "/docs/concepts/indexing/gpu-indexing"
---
import GpuIndexCuda from '/snippets/py/gpu_index_cuda.mdx';
import GpuIndexMps from '/snippets/py/gpu_index_mps.mdx';

With LanceDB's GPU-powered indexing you can build vector indexes for billions of rows in just a few hours—dramatically improving ingestion speed.

> Internal tests show GPU indexing processing billions of vectors in under four hours.

## Automatic GPU Indexing in LanceDB Enterprise

<Info>
Automatic GPU indexing is currently available only in LanceDB Enterprise. [Contact us](mailto:contact@lancedb.com) to enable the feature.
</Info>

Whenever you call `create_index`, Enterprise automatically selects GPU resources to build IVF or HNSW indexes. Indexing is asynchronous; call `wait_for_index()` to block until completion.
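
A minimal sketch of that flow (parameter names and the index name are illustrative; check your SDK version):

<CodeGroup>
```py Python icon=Python
# Kick off an index build; Enterprise picks GPU resources automatically
table.create_index(
    metric="cosine",
    vector_column_name="vector",
)

# The call returns immediately -- block until the index is ready
table.wait_for_index(["vector_idx"])
```
</CodeGroup>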

## Manual GPU Indexing in LanceDB OSS

Use the Python SDK with [PyTorch ≥ 2.0](https://pytorch.org/) to manually create IVF_PQ indexes on GPUs. GPU indexing currently requires the synchronous SDK. Specify the device via the `accelerator` parameter (`"cuda"` on Linux/NVIDIA, `"mps"` on Apple Silicon).

### GPU Indexing on Linux

<CodeGroup>
<GpuIndexCuda />
</CodeGroup>

### GPU Indexing on macOS (Apple Silicon)

<CodeGroup>
<GpuIndexMps />
</CodeGroup>

## Performance Considerations

- GPU memory usage scales with `num_partitions` and vector dimension.
- Ensure GPU memory comfortably exceeds the dataset you're indexing.
- Batch size is tuned automatically based on available GPU memory.
- Larger batches further improve throughput.
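
As a rough, purely illustrative sizing sketch (the numbers below are hypothetical, not benchmarks):

<CodeGroup>
```py Python icon=Python
# Back-of-the-envelope sizing for a GPU index build (illustrative only)
num_rows = 100_000_000   # rows to index
dim = 768                # vector dimensionality
bytes_per_value = 4      # float32

raw_vectors_gib = num_rows * dim * bytes_per_value / 1024**3
print(f"~{raw_vectors_gib:.0f} GiB of raw vectors")  # ~286 GiB

# A common IVF rule of thumb: roughly sqrt(num_rows) partitions
num_partitions = int(num_rows ** 0.5)  # 10_000
```
</CodeGroup>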

## Troubleshooting

If you encounter `AssertionError: Torch not compiled with CUDA enabled`, [install a PyTorch build that includes CUDA support](https://pytorch.org/get-started/locally/).
