Falabella is a content loading and searching tool for files such as PDF, PPT, XLSX, and CSV. It ranks content against the keywords you give it. It uses Apache Tika to parse the files and loads them into a given Elasticsearch server, which can then be used for searching.
You can run it on 64-bit Windows, macOS, or Linux.
- Download the binary from here
- Create a config.yaml file in the same directory as the binary and set the configuration:
    services:
      elasticSearch: http://localhost:9200
      apacheTika: http://localhost:9998
    # Path of the directory whose documents you want to index
    appConfig:
      filePath: ./assets/
- Run the binary. It will index the supported document types (PDF, PPT, XLSX, CSV).
- Download the elasticvue plugin (or anything similar) from here
- Go to the plugin and search for your keywords.
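If you would rather query Elasticsearch from code than from a browser plugin, the search step can be sketched in Go as below. This is only an illustration under assumptions: the index name `documents` and the field name `body` are hypothetical and may not match what Falabella actually writes, so check your index in elasticvue first. Elasticsearch returns hits sorted by relevance score by default, so the first hit is the highest-ranked document.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// buildSearchBody returns the JSON body of a full-text match query.
// The field name "body" is an assumption about how documents are stored.
func buildSearchBody(keyword string) ([]byte, error) {
	query := map[string]interface{}{
		"query": map[string]interface{}{
			"match": map[string]interface{}{"body": keyword},
		},
	}
	return json.Marshal(query)
}

// search posts the match query to the _search endpoint of the given index
// and returns the raw JSON response.
func search(esURL, index, keyword string) (string, error) {
	body, err := buildSearchBody(keyword)
	if err != nil {
		return "", err
	}
	url := strings.TrimRight(esURL, "/") + "/" + index + "/_search"
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("search failed: %s: %s", resp.Status, raw)
	}
	return string(raw), nil
}

func main() {
	// "documents" is a hypothetical index name; replace it with your own.
	result, err := search("http://localhost:9200", "documents", "golang")
	if err != nil {
		fmt.Fprintln(os.Stderr, "search:", err)
		return
	}
	fmt.Println(result)
}
```

The response lists matches under `hits.hits`, each with a `_score` that reflects how well it matched the keyword.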
For Elasticsearch:
    docker run --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:7.12.0
For Apache Tika:
    docker run -p 9998:9998 apache/tika:1.26
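With both containers running, the parse-then-index round trip can be sketched in Go roughly as follows. This is a minimal illustration, not Falabella's actual code: the `Document` field names, the index name `documents`, and the file `./assets/sample.pdf` are all assumptions. It relies on Tika Server's `PUT /tika` endpoint (which returns extracted text when asked for `text/plain`) and on Elasticsearch's `POST /<index>/_doc` endpoint.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// Document holds the three things stored per file. The JSON field
// names are assumptions, not Falabella's real schema.
type Document struct {
	ContentType string            `json:"contentType"`
	Metadata    map[string]string `json:"metadata"`
	Body        string            `json:"body"`
}

var client = &http.Client{Timeout: 30 * time.Second}

// extractText sends the raw file to Tika Server and returns the plain
// text it extracts.
func extractText(tikaURL string, file io.Reader) (string, error) {
	url := strings.TrimRight(tikaURL, "/") + "/tika"
	req, err := http.NewRequest(http.MethodPut, url, file)
	if err != nil {
		return "", err
	}
	req.Header.Set("Accept", "text/plain")
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("tika: unexpected status %s", resp.Status)
	}
	text, err := io.ReadAll(resp.Body)
	return string(text), err
}

// indexURL builds the endpoint that indexes one document with an
// auto-generated ID.
func indexURL(esURL, index string) string {
	return strings.TrimRight(esURL, "/") + "/" + index + "/_doc"
}

// indexDocument stores doc in the given Elasticsearch index.
func indexDocument(esURL, index string, doc Document) error {
	payload, err := json.Marshal(doc)
	if err != nil {
		return err
	}
	resp, err := client.Post(indexURL(esURL, index), "application/json", bytes.NewReader(payload))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusCreated {
		return fmt.Errorf("elasticsearch: unexpected status %s", resp.Status)
	}
	return nil
}

func main() {
	f, err := os.Open("./assets/sample.pdf") // hypothetical sample file
	if err != nil {
		fmt.Fprintln(os.Stderr, "open:", err)
		return
	}
	defer f.Close()

	text, err := extractText("http://localhost:9998", f)
	if err != nil {
		fmt.Fprintln(os.Stderr, "extract:", err)
		return
	}
	// The content type is hard-coded here only to keep the sketch short.
	doc := Document{ContentType: "application/pdf", Body: text}
	if err := indexDocument("http://localhost:9200", "documents", doc); err != nil {
		fmt.Fprintln(os.Stderr, "index:", err)
		return
	}
	fmt.Println("indexed", len(text), "characters of text")
}
```

The URLs match the `docker run` port mappings above and the defaults shown in config.yaml.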
- Ranking a large number of research papers by a given keyword.
- Searching for keywords across different kinds of documents, all at once.
- Ranking resumes by the skills you are looking for.
- Finding relevant information about a keyword across heterogeneous media types.
It stores the content type, metadata, and body of each document and uses goroutines to:
- Process and parse files in parallel.
- Load them into Elasticsearch concurrently, without waiting for all the files to be parsed.
If you want to learn how Elasticsearch ranks documents, you can read here.
- Add an OCR service to handle images that contain text.
- Add a service to deal with audio/video files.
- Add tests.
if (repo.isAwesome || repo.isHelpful) {
    StarRepo();
}