4 changes: 4 additions & 0 deletions docs/docs/core/data_types.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,7 @@ This is the list of all primitive types supported by CocoIndex:
| *Bytes* | `bytes` | | |
| *Str* | `str` | | |
| *Bool* | `bool` | | |
| *Enum* | `str`, `cocoindex.typing.Enum()` | | |
| *Int64* | `cocoindex.Int64`, `int`, `numpy.int64` | | |
| *Float32* | `cocoindex.Float32`, `numpy.float32` | *Float64* | |
| *Float64* | `cocoindex.Float64`, `float`, `numpy.float64` | | |
Expand Down Expand Up @@ -84,6 +85,9 @@ Notes:
In Python, it's represented by `cocoindex.Json`.
It's useful to hold data without fixed schema known at flow definition time.

#### Enum Type

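*Enum* represents a string-like enumerated type. In Python, annotate a field with `cocoindex.typing.Enum()`, optionally passing `variants=[...]` to hint the allowed values. At runtime the value is a plain `str`.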
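
As a rough illustration of the Enum annotation, the sketch below rebuilds the helper locally so that it runs without CocoIndex installed. The `TypeKind` and `TypeAttr` classes here are stand-ins with an assumed shape, not the library's own definitions. In a real flow you would import `Enum` from `cocoindex.typing` instead.

```python
from typing import Annotated, Any, Optional, Sequence, get_args


class TypeKind:
    """Stand-in for cocoindex's TypeKind marker (assumed shape)."""

    def __init__(self, kind: str):
        self.kind = kind


class TypeAttr:
    """Stand-in for cocoindex's TypeAttr marker (assumed shape)."""

    def __init__(self, key: str, value: Any):
        self.key = key
        self.value = value


def Enum(*, variants: Optional[Sequence[str]] = None) -> Any:
    """Local copy of the helper's logic, so the sketch is self-contained."""
    if variants is not None:
        return Annotated[str, TypeKind("Enum"), TypeAttr("variants", list(variants))]
    return Annotated[str, TypeKind("Enum")]


# An enum-typed annotation with three allowed values.
Color = Enum(variants=["red", "green", "blue"])

# get_args on an Annotated type returns the base type followed by its metadata.
base_type, kind, attr = get_args(Color)
print(base_type.__name__, kind.kind, attr.key, attr.value)
```

The base type stays `str`, so existing string-handling code keeps working. The kind and the variants travel only as annotation metadata.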

#### Vector Types

Expand Down
2 changes: 1 addition & 1 deletion docs/docs/examples/examples/docs_to_knowledge_graph.md
Expand Up @@ -373,4 +373,4 @@ You can open it at [http://localhost:7474](http://localhost:7474), and run the f
MATCH p=()-->() RETURN p
```

![Neo4j Browser](/img/examples/docs_to_knowledge_graph/neo4j_browser.png)
![Neo4j Browser](/img/examples/docs_to_knowledge_graph/neo4j_browser.png)
4 changes: 2 additions & 2 deletions docs/docs/sources/index.md
Expand Up @@ -17,6 +17,6 @@ In CocoIndex, a source is the data origin you import from (e.g., files, database
| [Postgres](/docs/sources/postgres) | Relational database (Postgres) |

Related:
- [Life cycle of a indexing flow](/docs/core/basics#life-cycle-of-an-indexing-flow)
- [Live Update Tutorial](/docs/tutorials/live_updates)
- [Life cycle of a indexing flow](/docs/core/basics#life-cycle-of-an-indexing-flow)
- [Live Update Tutorial](/docs/tutorials/live_updates)
for change capture mechanisms.
3 changes: 0 additions & 3 deletions docs/docs/targets/index.md
Expand Up @@ -334,6 +334,3 @@ You can find end-to-end examples fitting into any of supported property graphs i
* <ExampleButton href="https://github.com/cocoindex-io/cocoindex/tree/main/examples/docs_to_knowledge_graph" text="Docs to Knowledge Graph" margin="0 0 16px 0" />

* <ExampleButton href="https://github.com/cocoindex-io/cocoindex/tree/main/examples/product_recommendation" text="Product Recommendation" margin="0 0 16px 0" />



4 changes: 2 additions & 2 deletions docs/docs/targets/kuzu.md
Expand Up @@ -13,7 +13,7 @@ Exports data to a [Kuzu](https://kuzu.com/) graph database.

## Get Started

Read [Property Graph Targets](./index.md#property-graph-targets) for more information to get started on how it works in CocoIndex.
Read [Property Graph Targets](./index.md#property-graph-targets) for more information to get started on how it works in CocoIndex.

## Spec

Expand Down Expand Up @@ -59,4 +59,4 @@ You can then access the explorer at [http://localhost:8124](http://localhost:812
href="https://github.com/cocoindex-io/cocoindex/tree/main/examples/docs_to_knowledge_graph"
text="Docs to Knowledge Graph"
margin="16px 0 24px 0"
/>
/>
4 changes: 2 additions & 2 deletions docs/docs/targets/neo4j.md
Expand Up @@ -11,7 +11,7 @@ import { ExampleButton } from '../../src/components/GitHubButton';


## Get Started
Read [Property Graph Targets](./index.md#property-graph-targets) for more information to get started on how it works in CocoIndex.
Read [Property Graph Targets](./index.md#property-graph-targets) for more information to get started on how it works in CocoIndex.


## Spec
Expand Down Expand Up @@ -59,4 +59,4 @@ If you are building multiple CocoIndex flows from different projects to neo4j, w

This way, you can clean up the data for each flow independently.

In case you need to clean up the data in the same database, you can do it manually by running `cocoindex drop <APP_TARGET>` from the project you want to clean up.
In case you need to clean up the data in the same database, you can do it manually by running `cocoindex drop <APP_TARGET>` from the project you want to clean up.
2 changes: 1 addition & 1 deletion examples/product_recommendation/README.md
Expand Up @@ -8,7 +8,7 @@ Please drop [CocoIndex on Github](https://github.com/cocoindex-io/cocoindex) a s


## Prerequisite
* [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres)
* [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres)
* Install [Neo4j](https://cocoindex.io/docs/targets/neo4j)
* [Configure your OpenAI API key](https://cocoindex.io/docs/ai/llm#openai).

Expand Down
16 changes: 16 additions & 0 deletions python/cocoindex/typing.py
Expand Up @@ -13,6 +13,8 @@
Literal,
NamedTuple,
Protocol,
Optional,
Sequence,
TypeVar,
overload,
Self,
Expand Down Expand Up @@ -64,6 +66,19 @@ def __init__(self, key: str, value: Any):
LocalDateTime = Annotated[datetime.datetime, TypeKind("LocalDateTime")]
OffsetDateTime = Annotated[datetime.datetime, TypeKind("OffsetDateTime")]


def Enum(*, variants: Optional[Sequence[str]] = None) -> Any:
    """
    String-like enumerated type. Use `variants` to hint allowed values.
    Example:
        color: Enum(variants=["red", "green", "blue"])
    At runtime this is a plain `str`; `variants` are emitted as schema attrs.
    """
    if variants is not None:
        return Annotated[str, TypeKind("Enum"), TypeAttr("variants", list(variants))]
    return Annotated[str, TypeKind("Enum")]


if TYPE_CHECKING:
T_co = TypeVar("T_co", covariant=True)
Dim_co = TypeVar("Dim_co", bound=int | None, covariant=True, default=None)
Expand Down Expand Up @@ -587,6 +602,7 @@ class BasicValueType:
"OffsetDateTime",
"TimeDelta",
"Json",
"Enum",
"Vector",
"Union",
]
Expand Down
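
The `Enum` helper's docstring says the annotation is a plain `str` at runtime, and the hunks shown on this page do not appear to reject values outside `variants`. If that check is wanted on the Python side, a caller could add it. The sketch below is one hypothetical way to do so: `allowed_variants` and `check_variant` are illustrative names, not part of CocoIndex, and `TypeAttr` is a local stand-in with an assumed shape.

```python
from typing import Annotated, Any, Optional, get_args


class TypeAttr:
    """Stand-in for cocoindex's TypeAttr marker (assumed shape)."""

    def __init__(self, key: str, value: Any):
        self.key = key
        self.value = value


def allowed_variants(annotation: Any) -> Optional[list[str]]:
    """Return the `variants` attribute attached to an annotation, if any."""
    # The first element of get_args is the base type; the rest is metadata.
    for meta in get_args(annotation)[1:]:
        if getattr(meta, "key", None) == "variants":
            return list(meta.value)
    return None


def check_variant(value: str, annotation: Any) -> str:
    """Hypothetical validator: raise if `value` is not a declared variant."""
    variants = allowed_variants(annotation)
    if variants is not None and value not in variants:
        raise ValueError(f"{value!r} is not one of {variants}")
    return value


# Annotations shaped like the ones the Enum helper returns.
Color = Annotated[str, TypeAttr("variants", ["red", "green", "blue"])]
Unconstrained = Annotated[str, "no variants attached"]

print(check_variant("red", Color))
print(check_variant("anything", Unconstrained))
```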
104 changes: 95 additions & 9 deletions src/base/json_schema.rs
@@ -1,6 +1,6 @@
use crate::prelude::*;

use crate::utils::immutable::RefList;
use indexmap::IndexMap;
use schemars::schema::{
ArrayValidation, InstanceType, ObjectValidation, Schema, SchemaObject, SingleOrVec,
SubschemaValidation,
Expand Down Expand Up @@ -74,6 +74,9 @@ impl JsonSchemaBuilder {
schema::BasicValueType::Str => {
schema.instance_type = Some(SingleOrVec::Single(Box::new(InstanceType::String)));
}
schema::BasicValueType::Enum => {
schema.instance_type = Some(SingleOrVec::Single(Box::new(InstanceType::String)));
}
schema::BasicValueType::Bytes => {
schema.instance_type = Some(SingleOrVec::Single(Box::new(InstanceType::String)));
}
Expand Down Expand Up @@ -245,15 +248,34 @@ impl JsonSchemaBuilder {
field_path.prepend(&f.name),
);
if self.options.fields_always_required && f.value_type.nullable {
if let Some(instance_type) = &mut field_schema.instance_type {
let mut types = match instance_type {
SingleOrVec::Single(t) => vec![**t],
SingleOrVec::Vec(t) => std::mem::take(t),
if field_schema.enum_values.is_some() {
// Keep the enum as-is and support null via oneOf
let non_null = Schema::Object(field_schema);
let null_branch = Schema::Object(SchemaObject {
instance_type: Some(SingleOrVec::Single(Box::new(
InstanceType::Null,
))),
..Default::default()
});
field_schema = SchemaObject {
subschemas: Some(Box::new(SubschemaValidation {
one_of: Some(vec![non_null, null_branch]),
..Default::default()
})),
..Default::default()
};
types.push(InstanceType::Null);
*instance_type = SingleOrVec::Vec(types);
} else {
if let Some(instance_type) = &mut field_schema.instance_type {
let mut types = match instance_type {
SingleOrVec::Single(t) => vec![**t],
SingleOrVec::Vec(t) => std::mem::take(t),
};
types.push(InstanceType::Null);
*instance_type = SingleOrVec::Vec(types);
}
}
}

(f.name.to_string(), field_schema.into())
})
.collect(),
Expand Down Expand Up @@ -298,9 +320,26 @@ impl JsonSchemaBuilder {
enriched_value_type: &schema::EnrichedValueType,
field_path: RefList<'_, &'_ spec::FieldName>,
) -> SchemaObject {
self.for_value_type(schema_base, &enriched_value_type.typ, field_path)
}
let mut out = self.for_value_type(schema_base, &enriched_value_type.typ, field_path);

if let schema::ValueType::Basic(schema::BasicValueType::Enum) = &enriched_value_type.typ {
if let Some(variants) = enriched_value_type.attrs.get("variants") {
if let Some(arr) = variants.as_array() {
let enum_values: Vec<serde_json::Value> = arr
.iter()
.filter_map(|v| {
v.as_str().map(|s| serde_json::Value::String(s.to_string()))
})
.collect();
if !enum_values.is_empty() {
out.enum_values = Some(enum_values);
}
}
}
}

out
}
fn build_extra_instructions(&self) -> Result<Option<String>> {
if self.extra_instructions_per_field.is_empty() {
return Ok(None);
Expand Down Expand Up @@ -458,6 +497,53 @@ mod tests {
.assert_eq(&serde_json::to_string_pretty(&json_schema).unwrap());
}

#[test]
fn test_basic_types_enum_without_variants() {
let value_type = EnrichedValueType {
typ: ValueType::Basic(BasicValueType::Enum),
nullable: false,
attrs: Arc::new(BTreeMap::new()),
};
let options = create_test_options();
let result = build_json_schema(value_type, options).unwrap();
let json_schema = schema_to_json(&result.schema);

expect![[r#"
{
"type": "string"
}"#]]
.assert_eq(&serde_json::to_string_pretty(&json_schema).unwrap());
}

#[test]
fn test_basic_types_enum_with_variants() {
let mut attrs = BTreeMap::new();
attrs.insert(
"variants".to_string(),
serde_json::json!(["red", "green", "blue"]),
);

let value_type = EnrichedValueType {
typ: ValueType::Basic(BasicValueType::Enum),
nullable: false,
attrs: Arc::new(attrs),
};
let options = create_test_options();
let result = build_json_schema(value_type, options).unwrap();
let json_schema = schema_to_json(&result.schema);

expect![[r#"
{
"enum": [
"red",
"green",
"blue"
],
"type": "string"
}"#]]
.assert_eq(&serde_json::to_string_pretty(&json_schema).unwrap());
}

#[test]
fn test_basic_types_bool() {
let value_type = EnrichedValueType {
Expand Down
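
To make the schema shapes produced by the `json_schema.rs` changes easier to see, here is a small Python model of the logic. It is a sketch, not the Rust implementation. The function name `enum_field_schema` is hypothetical, and the `nullable` flag stands in for the case where a struct field is nullable and `fields_always_required` is set.

```python
from typing import Any, Optional, Sequence


def enum_field_schema(
    variants: Optional[Sequence[Any]] = None, nullable: bool = False
) -> dict[str, Any]:
    """Model of the JSON schema emitted for an Enum-typed value."""
    schema: dict[str, Any] = {"type": "string"}

    # Keep only string variants, mirroring the filter_map over `as_str()`.
    values = [v for v in (variants or []) if isinstance(v, str)]
    if values:
        schema["enum"] = values

    if not nullable:
        return schema
    if "enum" in schema:
        # Keep the enum as-is and allow null through a separate oneOf branch.
        return {"oneOf": [schema, {"type": "null"}]}
    # No enum constraint: widening the type list is enough.
    return {"type": ["string", "null"]}


print(enum_field_schema())
print(enum_field_schema(["red", "green", "blue"]))
print(enum_field_schema(["red", "green", "blue"], nullable=True))
```

The `oneOf` branch matters because `enum` restricts the whole instance. Adding `"null"` to `type` alone would still reject `null` unless it were also listed among the enum values.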
4 changes: 4 additions & 0 deletions src/base/schema.rs
Expand Up @@ -23,6 +23,9 @@ pub enum BasicValueType {
/// String encoded in UTF-8.
Str,

/// Enumerated symbolic value.
Enum,

/// A boolean value.
Bool,

Expand Down Expand Up @@ -71,6 +74,7 @@ impl std::fmt::Display for BasicValueType {
match self {
BasicValueType::Bytes => write!(f, "Bytes"),
BasicValueType::Str => write!(f, "Str"),
BasicValueType::Enum => write!(f, "Enum"),
BasicValueType::Bool => write!(f, "Bool"),
BasicValueType::Int64 => write!(f, "Int64"),
BasicValueType::Float32 => write!(f, "Float32"),
Expand Down
2 changes: 2 additions & 0 deletions src/base/value.rs
Expand Up @@ -202,6 +202,7 @@ impl KeyPart {
KeyPart::Bytes(Bytes::from(BASE64_STANDARD.decode(v)?))
}
BasicValueType::Str => KeyPart::Str(Arc::from(v)),
BasicValueType::Enum => KeyPart::Str(Arc::from(v)),
BasicValueType::Bool => KeyPart::Bool(v.parse()?),
BasicValueType::Int64 => KeyPart::Int64(v.parse()?),
BasicValueType::Range => {
Expand Down Expand Up @@ -1136,6 +1137,7 @@ impl BasicValue {
BasicValue::Bytes(Bytes::from(BASE64_STANDARD.decode(v)?))
}
(serde_json::Value::String(v), BasicValueType::Str) => BasicValue::Str(Arc::from(v)),
(serde_json::Value::String(v), BasicValueType::Enum) => BasicValue::Str(Arc::from(v)),
(serde_json::Value::Bool(v), BasicValueType::Bool) => BasicValue::Bool(v),
(serde_json::Value::Number(v), BasicValueType::Int64) => BasicValue::Int64(
v.as_i64()
Expand Down
1 change: 1 addition & 0 deletions src/ops/targets/kuzu.rs
Expand Up @@ -101,6 +101,7 @@ fn basic_type_to_kuzu(basic_type: &BasicValueType) -> Result<String> {
Ok(match basic_type {
BasicValueType::Bytes => "BLOB".to_string(),
BasicValueType::Str => "STRING".to_string(),
BasicValueType::Enum => "STRING".to_string(),
BasicValueType::Bool => "BOOL".to_string(),
BasicValueType::Int64 => "INT64".to_string(),
BasicValueType::Float32 => "FLOAT".to_string(),
Expand Down
1 change: 1 addition & 0 deletions src/ops/targets/postgres.rs
Expand Up @@ -474,6 +474,7 @@ fn to_column_type_sql(column_type: &ValueType) -> String {
ValueType::Basic(basic_type) => match basic_type {
BasicValueType::Bytes => "bytea".into(),
BasicValueType::Str => "text".into(),
BasicValueType::Enum => "text".into(),
BasicValueType::Bool => "boolean".into(),
BasicValueType::Int64 => "bigint".into(),
BasicValueType::Float32 => "real".into(),
Expand Down
1 change: 1 addition & 0 deletions src/py/convert.rs
Expand Up @@ -156,6 +156,7 @@ fn basic_value_from_py_object<'py>(
value::BasicValue::Bytes(Bytes::from(v.extract::<Vec<u8>>()?))
}
schema::BasicValueType::Str => value::BasicValue::Str(Arc::from(v.extract::<String>()?)),
schema::BasicValueType::Enum => value::BasicValue::Str(Arc::from(v.extract::<String>()?)),
schema::BasicValueType::Bool => value::BasicValue::Bool(v.extract::<bool>()?),
schema::BasicValueType::Int64 => value::BasicValue::Int64(v.extract::<i64>()?),
schema::BasicValueType::Float32 => value::BasicValue::Float32(v.extract::<f32>()?),
Expand Down