Idea: mount the sqlite database as filesystem #195

Open
QiangF opened this issue Jan 12, 2025 · 15 comments

QiangF commented Jan 12, 2025

This is for full-text search capability, since we have many packages for that, e.g. rg.el.
I am wondering if we can mount the database with:

https://github.com/Airsequel/SQLiteDAV

https://github.com/adamobeng/wddbfs

By doing that, would we get full-text search using existing packages?

Owner

ahyatt commented Jan 12, 2025

It's an interesting idea, thanks for mentioning it! I think I should probably do the obvious thing and release full-text search in the triples package, which I've already implemented. The only issue is that I've seen it miss results it should be returning, and I wasn't sure I could actually fix that. But I think it'd be the most straightforward solution, and it should be very fast. Let me try to release it as an alpha feature in the next few weeks.

The idea of mounting the sqlite database is interesting, and a similar idea struck me when reading #193; maybe those who want to use the filesystem and ekg at the same time could just mount ekg as a filesystem. I'm not sure if many people would want to do that, though.

Author

QiangF commented Jan 19, 2025

Maybe keep the notes as separate files and only manage tags in a SQLite database. TMSU is interesting:
https://github.com/oniony/TMSU

Author

QiangF commented Jan 19, 2025

Owner

ahyatt commented Jan 21, 2025

Having files be stored outside the db presents some complications to the design. I think having ekg be completely in a database gives some nice benefits and distinguishes it from org-roam, which is popular, and already a filesystem/db hybrid.

I've been trying out the version with search, and it seems to be working for what I've tried so far. I'll work on releasing it soon.

Author

QiangF commented Jan 21, 2025

Author

QiangF commented Jan 22, 2025

NotDeft has good full-text search, but it lacks tag support.
hasu/notdeft#32

Owner

ahyatt commented Jan 24, 2025

How important is it to be able to specifically query by tags along with other text? That is, to be able to query tag:baseball score instead of just baseball score?
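To make the question concrete, here is a minimal sketch of how a query such as tag:baseball score could be split into tag filters and free-text terms. The `tag:` prefix comes from the example above; the function name `parse_query` and the splitting rules are assumptions for illustration, not ekg's actual query syntax.

```python
def parse_query(query: str) -> tuple[list[str], list[str]]:
    """Split a query string into (tags, free-text terms).

    Hypothetical rule: a whitespace-separated token of the form
    `tag:NAME` is a tag filter; everything else is a text term.
    """
    tags, terms = [], []
    for token in query.split():
        if token.startswith("tag:") and len(token) > len("tag:"):
            tags.append(token[len("tag:"):])
        else:
            terms.append(token)
    return tags, terms


tags, terms = parse_query("tag:baseball score")
print(tags, terms)
```

With this split, the tag list could restrict the candidate notes while the remaining terms go to the text search.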

Author

QiangF commented Jan 25, 2025

It will be a unique feature, and a very nice one for a large number of notes. Denote and NotDeft do not have that feature.

Owner

ahyatt commented Jan 28, 2025

I've merged a new sqlite-FTS driven query functionality into the develop branch. It's dependent on a new triples version released 2 days ago. I'll release this in a few days, but if you'd like to try it out and give feedback, that could help!
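For readers unfamiliar with SQLite FTS, the following is a rough sketch of what an FTS5-backed ranked query looks like in general. It is not the ekg or triples implementation: the table `notes_fts`, its single `body` column, and the `search` helper are hypothetical, and the sketch assumes the SQLite library behind Python's `sqlite3` module was built with FTS5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical standalone FTS5 table; ekg's real storage is triple-based.
conn.execute("CREATE VIRTUAL TABLE notes_fts USING fts5(body)")
conn.executemany(
    "INSERT INTO notes_fts(body) VALUES (?)",
    [
        ("baseball score from last night",),
        ("grocery list for the week",),
        ("score sheet for the piano recital",),
    ],
)


def search(conn, query):
    """Return matching note bodies, best match first (FTS5 bm25 rank)."""
    rows = conn.execute(
        "SELECT body FROM notes_fts WHERE notes_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return [body for (body,) in rows]


print(search(conn, "baseball score"))
```

In FTS5 query syntax, space-separated terms are implicitly ANDed, so the query above only matches notes containing both words.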

Author

QiangF commented Jan 31, 2025

There is just one piece of functionality missing: non-English support.
East Asian languages like Chinese don't use spaces for word separation. There is a SQLite extension that can generate tokens for these languages: https://github.com/wangfenjin/simple
Is it possible to add that extension to ekg?

Owner

ahyatt commented Feb 1, 2025

Great point! Unfortunately, Emacs doesn't allow loading this extension. I can bring this up with the Emacs maintainers, although I don't know the area well. For example, if this goes into Emacs, are there equivalents for other languages, such as Japanese, that also need a tokenizer and should also be included?

In light of this, I wonder if we should instead build a separate, non-triple-based fts5 index that lets Emacs tokenize and do other things (such as expanding inlines) that would be useful for ekg. If we had something like this, would you be able to tokenize in elisp?
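One way to picture "tokenize before indexing" is the sketch below, which inserts spaces around each CJK character so that a whitespace-based tokenizer sees one token per character. This is a naive per-character split, not dictionary-based word segmentation, and it is written in Python purely for illustration. The names `is_cjk` and `pretokenize` and the chosen code point ranges are assumptions.

```python
def is_cjk(ch: str) -> bool:
    """Rough check for common CJK ideographs, kana, and Hangul syllables."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= cp <= 0xD7A3   # Hangul syllables
    )


def pretokenize(text: str) -> str:
    """Surround each CJK character with spaces, then normalize whitespace."""
    spaced = "".join(f" {ch} " if is_cjk(ch) else ch for ch in text)
    return " ".join(spaced.split())


print(pretokenize("ekg 数据库 notes"))
```

The same transformation would have to be applied to the search query as well, so that indexed text and query text are split the same way.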

Author

QiangF commented Feb 1, 2025

Yes, Japanese and Korean also need a tokenizer; the same tokenizer will work with all CJK languages given their dictionaries. The most widely used one seems to be jieba, and we need an external program to run it because elisp is too slow. Here are some related links:
https://github.com/kisaragi-hiu/emacs-jieba
https://github.com/kanglmf/emacs-chinese-word-segmentation
https://github.com/xuchunyang/chinese-word-at-point.el
https://lists.gnu.org/archive/html/emacs-devel/2020-12/msg02003.html
https://github.com/yanyiwu/cppjieba

Owner

ahyatt commented Feb 1, 2025

Thanks for the links, @QiangF. Do you think it's worth it for me to release the fts branch I mentioned before? Would you be able to use it at all? Otherwise, maybe I should work on this other model, which is basically incompatible with the way the triples library supports search.

One other thing I've thought of is to support search by plain SQL query, i.e. strict grep-style string matching, which could build up a notes buffer as you type your query. That would support any language, but the results wouldn't be ranked at all, so its usefulness is limited.
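As a rough illustration of the grep-style approach, the sketch below does plain substring matching with SQLite's built-in `instr` function. The flat `notes` table and the `substring_search` helper are hypothetical (ekg stores notes as triples), and results come back in insertion order rather than by relevance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flat table for illustration only.
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO notes(body) VALUES (?)",
    [
        ("我喜欢数据库",),
        ("baseball score",),
        ("数据库 full text",),
    ],
)


def substring_search(conn, needle):
    """Return bodies containing `needle` verbatim, unranked (ordered by id)."""
    rows = conn.execute(
        "SELECT body FROM notes WHERE instr(body, ?) > 0 ORDER BY id",
        (needle,),
    )
    return [body for (body,) in rows]


print(substring_search(conn, "数据库"))
```

Because no tokenizer is involved, the match works the same way for Chinese and English text, at the cost of scanning every row and returning unranked results.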

Author

QiangF commented Feb 2, 2025

Yes, I think a "non-triple-based fts5" is the way to go; it gives us more freedom.
Ironically, what the FSF has done is put self-made shackles on developers and users.
But we have to adapt to reality. I have asked the "simple" project to dual-license under the GPL; will that be enough for it to make its way into the Emacs code base?
wangfenjin/simple#172

Author

QiangF commented Feb 3, 2025

The author of the simple extension has agreed to dual license the code. I have created a pull request at wangfenjin/simple#173
