README.md: 11 additions & 15 deletions
@@ -15,7 +15,7 @@ Veri also supports creating sub sample spaces of data by default.
15
15
16
16
Veri runs as a cluster that holds a vector space of fixed dimension. It supports k-nearest-neighbour (knn) queries and can also return a sample space for use in a machine learning algorithm.
17
17
18
-
Veri is currently in Alpha Stage
18
+
Veri is currently in Beta Stage
19
19
20
20
*Veri means data in Turkish.*
21
21
@@ -25,7 +25,7 @@ In machine learning, data scientist usually convert data into a feature label ve
25
25
26
26
I have worked in different roles as a Data Engineer, Data Scientist and Software Developer. In many projects, I wanted a scalable approach to vector space search, which was not available. I wanted to combine data ingestion and data querying in one tool.
27
27
28
-
Veri is meant to be scale. Each Veri instance tries to synchronise its data with other peers and keep a statistically identical subset of the general vector space.
28
+
Veri is meant to scale. Each Veri instance tries to synchronise its data with other peers and keep a statistically identical subset of the general vector space.
29
29
30
30
## What does statistically identical mean?
31
31
@@ -34,11 +34,12 @@ Every instance continue, exchanging data as long as their average and histogram
34
34
35
35
## Knn querying
36
36
37
-
Veri internally has a kd-tree, but it also queries its neighbours and merges the result. It is very similar to map-reduce process done on the fly without planning.
37
+
Veri has an internal key-value store, but it also queries its neighbours and merges the results.
38
+
It is very similar to a map-reduce process done on the fly without planning.
38
39
39
-
When a knn query is stated, veri creates a unique id,
40
+
When a knn query is received, Veri creates a unique hash,
40
41
Starts a timer,
41
-
Then do a local kd-tree search,
42
+
Then does a local knn search,
42
43
Then calls its peers to do the same with a smaller timeout,
43
44
Merges results into a map,
44
45
Waits for the timeout and then runs a refine process on the result map,
@@ -50,19 +51,14 @@ Every knn query has a timeout and timeout defines the precision of the result. U
50
51
51
52
## High Availability
52
53
53
-
Veri has a different way of approaching high availability.
54
-
Veri as a cluster try to use all the memory it is allowed to use.
55
-
If there is enough memory, all the data is replicated to every instance.
56
-
If there is not enough memory, data is split within instances using histogram balancing.
57
-
If memory is nearly full, Veri will reject insertion requests.
58
-
So if you want more high availability, use more instances.
59
-
Currently, it is recommended to use another database for long term storage. Usually vector spaces, change over time and only the original data is kept. So I didn't implement a direct backend into it. Instead, you can regularly insert new data and evict old data. So you will keep your vector space up to date. Veri can respond queries while data being inserted or deleted, unlike most knn search systems.
54
+
Veri replicates the data to its peers periodically, and data is persisted to disk so it survives crashes.
60
55
61
56
TODO:
62
-
- Add Dump data function to allow machine learning algorithms to get a Sample Space.
63
-
- Add Query Caching and Return Cached Result instead of rejecting result.
64
-
- Add Internal classification endpoint.
57
+
- Test multinode synchronization
65
58
- Authentication.
66
59
- Documentation.
67
60
61
+
### Note:
62
+
Veri uses [badger](https://github.com/dgraph-io/badger) internally. Many functions are made possible thanks to badger.
63
+
68
64
Contact me for any questions: berkgokden@gmail.com