A document search tool for files on the local host, based on Tika+OCR, Elasticsearch, and Kibana

zhurlik/doc-search

doc-search

A document search tool for files on the local host, based on Tika, Elasticsearch, and Kibana

Elasticsearch

Kibana

UI dashboard for Elasticsearch

Tika+OCR Server

See TikaOCR
See Recursive Metadata and Content
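
As a rough illustration of how a client might call the Tika server, the sketch below builds PUT requests for the `/tika` endpoint (plain text) and the `/rmeta` endpoint (recursive metadata and content, returned as JSON) using Java's built-in HTTP client. The base URL `http://localhost:9998` assumes the Tika server's default port, and the class and method names are hypothetical; the Scan Server's actual client code may differ.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

class TikaClientSketch {
    // Assumption: the Tika server listens on its default port, 9998.
    static final URI TIKA = URI.create("http://localhost:9998");

    /** Builds a PUT request that uploads the raw file bytes to a Tika endpoint. */
    static HttpRequest put(URI base, String endpoint, byte[] body, String accept) {
        return HttpRequest.newBuilder(base.resolve(endpoint))
                .header("Accept", accept)
                .PUT(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
    }

    /** Request for the extracted plain text of a document. */
    static HttpRequest textRequest(URI base, byte[] body) {
        return put(base, "/tika", body, "text/plain");
    }

    /** Request for recursive metadata and content (including embedded files). */
    static HttpRequest rmetaRequest(URI base, byte[] body) {
        return put(base, "/rmeta", body, "application/json");
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: TikaClientSketch <file>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Path.of(args[0]));
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(rmetaRequest(TIKA, bytes), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Running `main` with a file path prints the HTTP status and the JSON returned by `/rmeta`, provided a Tika server is reachable at the assumed address.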

Scan Server

This is a Spring Boot application whose main tasks are:

  • scanning the files in a designated folder every minute
  • extracting the content of the files via the Tika+OCR server API
  • storing the metadata and content of the files in Elasticsearch
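
The scan loop described above can be sketched in plain Java as follows. This is a minimal illustration, not the application's code: a Spring Boot application would more likely use a scheduled bean, and the class name `FolderScanSketch`, the default folder `docs`, and the handler callback are all hypothetical. The handler is the place where a real implementation would call the Tika+OCR server and then index the result in Elasticsearch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FolderScanSketch {
    private final Path folder;
    private final Consumer<Path> handler;
    // Last-modified time of every file already passed to the handler.
    private final Map<Path, Long> seen = new ConcurrentHashMap<>();

    FolderScanSketch(Path folder, Consumer<Path> handler) {
        this.folder = folder;
        this.handler = handler;
    }

    /** Runs one pass over the folder and returns the new or modified files. */
    List<Path> scanOnce() throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(folder)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        List<Path> changed = new ArrayList<>();
        for (Path file : files) {
            long modified = Files.getLastModifiedTime(file).toMillis();
            Long previous = seen.put(file, modified);
            if (previous == null || previous != modified) {
                handler.accept(file); // extract via Tika, then index, in a real app
                changed.add(file);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // Hypothetical default folder; pass a path as the first argument.
        Path folder = Path.of(args.length > 0 ? args[0] : "docs");
        FolderScanSketch scanner = new FolderScanSketch(
                folder, file -> System.out.println("would extract and index " + file));
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        // Catch everything: an uncaught exception would cancel later runs.
        pool.scheduleWithFixedDelay(() -> {
            try {
                scanner.scanOnce();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, 0, 1, TimeUnit.MINUTES);
    }
}
```

Tracking the last-modified time per file keeps each pass from reprocessing unchanged files; whether the real Scan Server deduplicates this way is an assumption.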

Build

  • To build the project: ./gradlew clean build
  • To build the project and start the containers: ./gradlew clean build; docker-compose up --force-recreate --build

Docker

  • Prune unused Docker objects: docker system prune -f
  • To clear containers: docker rm -f $(docker ps -a -q)
  • To clear images: docker rmi -f $(docker images -a -q)
  • To clear volumes: docker volume rm $(docker volume ls -q)
  • To clear networks: docker network rm $(docker network ls | tail -n+2 | awk '{if($2 !~ /bridge|none|host/){ print $1 }}')
