GitHub - YashMeh/Falabella: Falabella 🐴 is a content (PDF, PPT, XLSX, CSV, etc.) loading and searching tool that ranks content based on given keywords.

Falabella 🐴

Falabella is a content (PDF, PPT, XLSX, CSV, etc.) loading and searching tool that ranks content based on given keywords. It uses Apache Tika to parse files and loads the parsed content into a given Elasticsearch server, which can then be used for searching.

You can run it on Windows, macOS, or Linux (64-bit). Download it from here

How to run

Download the binary from here
Create a config.yaml file in the same directory as the binary with the following configuration:

services:
  elasticSearch: http://localhost:9200
  apacheTika: http://localhost:9998

# Path of the directory whose documents you want to index
appConfig:
  filePath: ./assets/

Run the binary. It will index the supported document types (PDF, PPT, XLSX, CSV) found under filePath.
Download the elasticvue plugin (or anything similar) from here
Go to the plugin and search for your keywords.
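If you prefer to query Elasticsearch directly instead of through a plugin, a keyword search is a JSON body sent to the `_search` endpoint. The sketch below only builds that body for a standard `match` query. The field name `body` is an assumption based on the Architecture section; the actual field and index names depend on how the documents were indexed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildSearchBody returns the JSON body of an Elasticsearch "match" query.
// field is an assumption: use whichever field holds the document text.
func buildSearchBody(field, keywords string) (string, error) {
	body := map[string]interface{}{
		"query": map[string]interface{}{
			"match": map[string]interface{}{field: keywords},
		},
	}
	b, err := json.Marshal(body)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := buildSearchBody("body", "machine learning")
	if err != nil {
		panic(err)
	}
	// POST this to <elasticSearch URL>/<index>/_search
	// with the header Content-Type: application/json.
	fmt.Println(body)
}
```

Elasticsearch returns the hits sorted by relevance score, which is what gives the ranking described above.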

Running Demo

Setting up ElasticSearch and ApacheTika

For Elasticsearch

docker run --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:7.12.0

For Apache Tika

docker run -p 9998:9998 apache/tika:1.26

Use cases

Ranking a large number of research papers by a given keyword.
Searching for keywords across different kinds of documents all at once.
Ranking resumes by the skills you are looking for.
Finding relevant information about a keyword across heterogeneous media types.

Architecture

It stores the content type, metadata, and body of each document and uses goroutines to:

Process and parse files in parallel.
Load them into Elasticsearch concurrently, without waiting for all files to be parsed.

If you want to learn how Elasticsearch ranks documents, you can read here.
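The goroutine pattern described in this section can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's actual code: `document`, `parseFile`, and `run` are hypothetical names, `parseFile` stands in for the call to Apache Tika, and the `index` callback stands in for the write to Elasticsearch.

```go
package main

import (
	"fmt"
	"sync"
)

// document mirrors the three stored parts: content type, metadata, and body.
type document struct {
	ContentType string
	Metadata    map[string]string
	Body        string
}

// parseFile is a hypothetical stand-in for sending a file to Apache Tika.
func parseFile(path string) document {
	return document{
		ContentType: "text/plain",
		Metadata:    map[string]string{"path": path},
		Body:        "contents of " + path,
	}
}

// run parses every path in its own goroutine and hands each result to index
// as soon as it is ready, without waiting for the remaining files.
func run(paths []string, index func(document)) {
	docs := make(chan document)

	var parsers sync.WaitGroup
	for _, p := range paths {
		parsers.Add(1)
		go func(path string) {
			defer parsers.Done()
			docs <- parseFile(path)
		}(p)
	}

	// Close the channel once every parser has sent its result.
	go func() {
		parsers.Wait()
		close(docs)
	}()

	// Consume documents as they arrive; parsing continues in the background.
	for d := range docs {
		index(d)
	}
}

func main() {
	var indexed []string
	run([]string{"a.pdf", "b.pptx", "c.csv"}, func(d document) {
		indexed = append(indexed, d.Metadata["path"])
	})
	fmt.Println(len(indexed), "documents indexed") // 3 documents indexed
}
```

The channel decouples the two stages: a slow file delays only its own goroutine, while documents that are already parsed flow straight to the indexing step. The order in which documents arrive is not deterministic.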

Work Left

Add an OCR service for images that contain text.
Add a service to handle audio/video files.
~~Add tests~~

Yash Mehrotra

if(repo.isAwesome || repo.isHelpful){
    StarRepo();
}

