Typesense: TODO

a) ~~Fix memory ratio (decreasing with indexing)~~ b) ~~Speed up wildcard searches further~~ c) ~~Allow int64 in default sorting field~~ d) ~~Use connection timeout for CURL rather than request timeout~~ e) Async import

Search index

~~Proper JSON as input~~
~~Storing raw JSON input to RocksDB~~
~~ART for every indexed field~~
~~Delete should remove from RocksDB~~
~~Speed up UUID generation~~
~~Make the search score computation customizable~~
~~art int search should support signed ints~~
~~Search across multiple fields~~
~~Have set inside topster itself~~
~~Persist next_seq_id~~
~~collection_id should be int, not string~~
~~API should return count~~
~~Fix documents.jsonl path in tests~~
~~Multi field search tests~~
~~storage key prefix should include collection name~~
~~Index and search on multi-valued field~~
~~range search for art_int~~
~~Restore records as well on restart (like for meta)~~
~~drop collection should remove all records from the store~~
~~Multi-key binary search during scoring~~
~~Assumption that all tokens match for scoring is no longer true~~
~~Filters~~
~~Facets~~
~~Schema validation during insertion (missing fields + type errors)~~
~~Proper score field for ranking tokens~~
~~Throw errors when schema is broken~~
~~Desc/Asc ordering with tests~~
~~Found count is wrong~~
~~Filter query in the API~~
~~Facet limit (hardcode to top 10)~~
~~Deprecate old split function~~
~~Multiple facets not working~~
~~Search snippet with highlight~~
~~Snippet should only be around surrounding matching tokens~~
~~Proper pagination~~
~~Pagination parameter~~
~~Drop collection API~~
~~JSONP response~~
~~"error":"Not found." is sent when query has no hits~~
~~Fix API response codes~~
~~List all collections~~
~~Fetch an individual document~~
~~ID field should be a string: must validate~~
~~Number of records in collection~~
~~Test for asc/desc upper/lower casing~~
~~Test for search without any sort_by given~~
~~Test for collection creation validation~~
~~Test for delete document~~
~~art float search~~
~~When prefix=true, use default_sorting_field for token ordering only for last word~~
~~only last token should be prefix searched~~
~~Prefix-search strings should not be null terminated~~
~~sort results by float field~~
~~json::parse must be wrapped in try catch~~
~~Collection Manager collections map should store plain collection name~~
~~init_collection of Collection manager should probably take seq_id as param~~
~~node score should be int32, no longer uint16 like in document struct~~
~~Typo in prefix search~~
~~When field of "id" but not string, what happens?~~
~~test for num_documents~~
~~test for string filter comparison: title < "foo"~~
~~Test for sorted_array::indexOf when length is 0~~
~~Test for pagination~~
~~search_fields, sort_fields and facet fields should be combined~~
~~facet fields should be indexed verbatim~~
~~change "search_by" to "query_by"~~
~~during index_in_memory() validations should be front loaded~~
~~Support default sorting field being a float~~
~~https support~~
~~Validate before string to int conversion in the http api layer~~
~~art bool support~~
~~Export collection~~
~~get collection should show schema~~
~~API key should be allowed as a GET parameter also (for JSONP)~~
~~Don't crash when the data directory is not found~~
~~When the first sequence ID is not zero, bail out~~
~~Proper status code when sequence number to fetch is bad~~
~~Replica should be read-only~~
~~string_utils::tokenize should not have max length~~
~~handle hyphens (replace them)~~
~~clean special chars before indexing~~
~~Add docs/explanation around ranking calc~~
~~UTF-8 normalization~~
~~Use rocksdb batch put for atomic insertion~~
~~Proper logging~~
~~Handle store-get() not finding a key~~
~~Deprecate converting integer to string verbatim~~
~~Deprecate union type punning~~
~~Replica server should fail when pointed to "old" master~~
~~gzip compress responses~~
~~Have a LOG(ERROR) level~~
~~Handle SIGTERM which is sent when process is killed~~
~~Use snappy compression for storage~~
~~Fix exclude_scalar early returns~~
~~Fix result ids length during grouped overrides~~
~~Fix override grouping (collate_included_ids)~~
~~Test for overriding result on second page~~
atleast 1 token match for proceeding with drop tokens
support wildcard query with filters
API for optimizing on disk storage
Jemalloc
Exact search
NOT operator support
Log operations
Parameterize replica's MAX_UPDATES_TO_SEND
NOT operator support
64K token limit
INT32_MAX validation for float field
highlight of string arrays?
test for token ranking on float field
test for float int field deletion during doc deletion
Test for snippets
Test for replication
Query token ids should match query token ordering
ID should not have "/"
Group results by field
Delete using range: https://github.com/facebook/rocksdb/wiki/Delete-A-Range-Of-Keys
Test for string utils
Prevent string copy during indexing
Minimum results should be a variable instead of blindly going with max_results
Handle searching for non-existing fields gracefully
test for same match score but different primary, secondary attr
Support nested fields via "."
Support search operators like +, - etc.
Space sensitivity
Use bitmap index instead of compressed array for doc list?
Primary_rank_scores and secondary_rank_scores hashmaps should be combined?
d-ary heap?

API

Support the following operations:
- ~~create a new index~~
- ~~index a single document~~
- ~~delete a document by ID~~
- ~~query an index~~
- ~~Drop an index~~
- ~~fetch a document by ID~~

Clustering

Sync every incoming write with another Typesense server

Refactoring

~~token_count in leaf is redundant: can be accessed from value~~
~~storing length in offsets is redundant: it can be found by looking up value of the next index in offset_index~~

Tech debt

~~Use GLOB file pattern for CMake (better IDE refactoring support)~~
DRY index_int64_field* methods

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TODO.md

TODO.md

Typesense: TODO

Files

TODO.md

Latest commit

History

TODO.md

File metadata and controls

Typesense: TODO