I have a couple of text files (UTF-8, with mostly ASCII and Cyrillic characters) which cindex/csearch ignore.
The worst problem is that I cannot tell why cindex ignores them: there is no "verbose" option for cindex. Maybe there is a character somewhere in the file that cindex does not like, but how do I tell?
iconv -f utf-8 -t utf-16 < text/book1.txt > /dev/null
never complains, so I presume the book1.txt file is valid UTF-8. But cindex excludes it from search.
codesearch version:
codesearch/oldstable,now 0.0~hg20120502-3+b11 amd64 on Debian 10.
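As a cross-check on the iconv result, here is a minimal Python sketch that reports where strict UTF-8 decoding first fails, if it fails at all. The helper name `first_invalid_utf8` is hypothetical, and this only tests Python's notion of valid UTF-8; whatever checks cindex applies internally may differ.

```python
def first_invalid_utf8(data):
    """Return the byte offset of the first invalid UTF-8 sequence, or None.

    `data` is the raw file content as bytes (hypothetical helper, not part
    of codesearch).
    """
    try:
        data.decode("utf-8")  # strict decoding raises on the first bad sequence
    except UnicodeDecodeError as err:
        return err.start  # offset of the first offending byte
    return None


# Usage sketch (path taken from the report above):
# with open("text/book1.txt", "rb") as f:
#     print(first_invalid_utf8(f.read()))
```

If this returns None, the encoding itself is probably not the reason the file is skipped.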
I believe there is also a line length limit that causes files to not be indexed.
I've just tried glimpse on it. glimpseindex skips this file too, but it can be forced to index it with glimpseindex -E.
There are indeed long lines somewhere in the file.
$ file text/book1.txt
text/book1.txt: UTF-8 Unicode text, with very long lines
I should probably grep the text for long lines and see what comes out.
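A rough way to do that check is sketched below in Python. The function name `find_long_lines` is hypothetical, and the default `limit` of 2000 bytes is only an assumed placeholder: this thread does not state what limit cindex actually applies, so adjust it as needed. Lengths are measured in bytes rather than characters, which matters for Cyrillic text because each letter takes two bytes in UTF-8.

```python
def find_long_lines(lines, limit=2000):
    """Return (line_number, byte_length) for each line longer than `limit` bytes.

    `lines` is any iterable of bytes objects, e.g. a file opened in "rb" mode.
    `limit` is an assumed threshold, not a value confirmed for cindex.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        length = len(line.rstrip(b"\r\n"))  # ignore the line terminator
        if length > limit:
            hits.append((number, length))
    return hits


# Usage sketch (path taken from the report above):
# with open("text/book1.txt", "rb") as f:
#     for number, length in find_long_lines(f):
#         print(f"line {number}: {length} bytes")
```

Any line numbers this prints are candidates for wrapping or splitting before re-running cindex.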
You might have better luck switching to zoekt if possible.
The problem may be related to #26