This application lets you create, read, update, and delete documents and their related text piece entities in a PostgreSQL database, and index and search the saved documents' text pieces in Elasticsearch.
- Make sure that you have installed the latest versions of `python` and `pip` on your computer. You also have to install Docker and Docker Compose.
- By default, this project uses poetry for dependency and virtual environment management. Make sure to install it too.
- Make sure to provide all required environment variables (via a `.env` file, the `export` command, secrets, etc.) before running the application.
- For managing pre-commit hooks this project uses pre-commit.
- For import sorting this project uses isort.
- For code format checking this project uses black.
- For type checking this project uses mypy.
- For creating commits and linting commit messages this project uses commitizen. Run `make commit` to use commitizen during commits.
This project uses GitHub Actions to run all checks and unit tests on every push to the remote repository.

There are lots of useful commands in the `Makefile` included in this project's repo. Use the `make <some_command>` syntax to run each of them. If your system doesn't support make commands, you may copy commands from the `Makefile` directly into the terminal.
For managing migrations this project uses alembic (a programmatic alternative is sketched below).

- The Dockerfile already includes the `alembic upgrade head` command to run all revision migrations required by the current version of the application.
- Run `make upgrade` to manually upgrade the database tables' state. You can also manually upgrade to a specific revision with a `.py` script (from `alembic/versions/`) by running `alembic upgrade <revision id number>`.
- You can also downgrade one revision with the `make downgrade` command, downgrade to a specific revision by running `alembic downgrade <revision id number>`, or do a full downgrade to the initial database state with `make downgrade_full`.
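For completeness, migrations can also be run through alembic's Python API instead of the CLI. This is only a sketch; it assumes the usual `alembic.ini` sits at the repository root:

```python
from alembic import command
from alembic.config import Config

# Load alembic settings (assumes an alembic.ini at the repository root).
config = Config("alembic.ini")

# Equivalent of `alembic upgrade head`: apply all pending revisions.
command.upgrade(config, "head")

# Equivalent of `alembic downgrade -1`: roll back one revision.
# command.downgrade(config, "-1")
```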
- To install all required dependencies and set up a virtual environment, run in the cloned repository directory: `poetry install`. You can also install project dependencies using `pip install -r requirements.txt`.
- To configure pre-commit hooks for code linting, code format checking, and commit message linting, run in the cloned directory: `poetry run pre-commit install`
- Build the app image using `make build`.
- Run the Docker containers using `make up`. Note: `docker-compose.yml` specifies container creation, with `healthchecks`, in the following order: elasticsearch -> postgresql -> web.
- Stop and remove the Docker containers using `make down`. If you also want to remove the log volume, use `make down_volume`.
- By default, the web application is accessible at http://localhost:8080, the database is available at localhost:5432, and elasticsearch is available at http://localhost:9200.
- You can try all endpoints with the Swagger documentation at http://localhost:8080/docs
- Use resources with the `/documents` prefix to create, read, update, and delete data in the `documents` database table.
- To create a document you should provide values for `document_name` (must be unique) and `author`. The document entity also has `document_id`, the primary key for the database table, which is returned as part of a successful response. An Elasticsearch index will also be created (if it does not already exist) whose `index_name` equals the created `document_id`. See the sketch below.
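For illustration, here is a minimal sketch of creating a document with Python's `requests` library. The exact route and HTTP method (POST `/documents`) are assumptions based on the prefix above; check the Swagger docs for the real contract.

```python
import requests

# Hypothetical sketch: assumes the create endpoint is POST /documents
# and accepts a JSON body with the fields described above.
response = requests.post(
    "http://localhost:8080/documents",
    json={"document_name": "annual_report_2023", "author": "Jane Doe"},
)
response.raise_for_status()
document = response.json()
# "document_id" is the primary key returned in a successful response;
# an Elasticsearch index with that name is created if it does not exist.
print(document["document_id"])
```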
- Use resources with the `/text_pieces` prefix to create, read, update, and delete data in the `text_pieces` database table.
- Each request to create a new text piece should provide the following data in the request body (see the sketch after this list):
  - `text` - required field with text data;
  - `type` - required, either `title` or `paragraph`;
  - `page` - required, integer number of the page in the document to which the text piece belongs;
  - `document_name` - required, a link to the document to which the text piece belongs; a non-nullable foreign key to a document saved in the `documents` table;
  - `meta_data` - optional field with a JSON object as value, containing some metadata about the text piece.

  In a successful response you will also get:
  - `piece_id` - primary key of the new text piece in the database table;
  - `indexed` - boolean value that shows whether the text piece has already been indexed;
  - `size` - calculated length of the text piece's `text` field;
  - `created_at` - timestamp of the text piece entity's creation in the database (a `datetime.datetime` object).
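A minimal sketch of creating a text piece with `requests`. The route and method (POST `/text_pieces`) are assumptions based on the prefix above, and the `document_name` value refers to the hypothetical document from the earlier example:

```python
import requests

# Hypothetical sketch: assumes the create endpoint is POST /text_pieces.
response = requests.post(
    "http://localhost:8080/text_pieces",
    json={
        "text": "Quarterly revenue grew by 12%.",  # required
        "type": "paragraph",                       # required: "title" or "paragraph"
        "page": 3,                                 # required: page the piece belongs to
        "document_name": "annual_report_2023",     # required: must reference an existing document
        "meta_data": {"section": "finance"},       # optional JSON metadata
    },
)
response.raise_for_status()
piece = response.json()
# The response also carries piece_id, indexed, size, and created_at.
print(piece["piece_id"], piece["indexed"], piece["size"], piece["created_at"])
```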
- Use resources with the `/index` prefix to index and search for text pieces in Elasticsearch indices.
- A request to the `/index/{index_name}/index` resource will check that an index with that name (`document_id`) exists in Elasticsearch. If it exists, all text pieces already saved in the index will be removed, and after that all text pieces from the PostgreSQL table associated with the `document_id` will be indexed. For all indexed text pieces, the `indexed` field's value in the database table will be updated and set to `true`. A sketch follows.
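As a sketch, triggering reindexing from Python might look like this; the HTTP method (POST) and empty body are assumptions, so consult the Swagger docs for the actual contract:

```python
import requests

# Hypothetical sketch: reindex all text pieces of the document whose
# document_id (and therefore index name) is 1. POST with no body is an assumption.
response = requests.post("http://localhost:8080/index/1/index")
response.raise_for_status()
print(response.json())
```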
- A request to the `/index/{index_name}/search` resource will search for text pieces in the Elasticsearch index with that name (`document_id`), if it exists. Pagination is supported (`page_num` and `page_size`); if no pagination parameters are specified, the first 15 results are returned. In the `filters` field you should specify a list of filters, each consisting of `field`, `operator`, and `values` (see the sketch after this list).

  Available text fields for search:
  - `text` - supports the `match` operator, which calculates a relevance score, and `eq`, which finds an exact match of the requested string;
  - `document_name` - supports the `match` operator, which calculates a relevance score, and `eq`, which finds an exact match of the requested string;
  - `meta_data` - searches for `eq` values;
  - `type` - has the `eq` operator, which accepts only existing text piece types (`title` or `paragraph`);
  - `indexed` - has `eq` and accepts only `true` or `false` values.

  Available countable fields for search are `page`, `size`, and `created_at`. These fields are compatible with the operators `eq`, `in` (an array of possible values; returns a result if at least one value matches), and the comparisons `gt` (greater than), `gte` (greater than or equal), `lt` (lower than), and `lte` (lower than or equal).

  If no filters are provided, all documents in the index with `index_name` are returned.

  Note: results are ordered by descending `score` value (if `match` is used) and then by `created_at` timestamp in ascending order.

  The response body includes pagination parameters (`page_num`, `page_size`, and `total` - the total number of text pieces matching the query) and a `data` field with the list of matching text pieces.
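For illustration, a search request might be built like this with `requests`; the filter structure (`field`, `operator`, `values`) follows the description above, while the POST method and exact body layout are assumptions:

```python
import requests

# Hypothetical sketch: search index "1" for paragraphs on pages up to 10
# whose text matches "revenue", returning the first page of 15 results.
response = requests.post(
    "http://localhost:8080/index/1/search",
    json={
        "page_num": 1,
        "page_size": 15,
        "filters": [
            {"field": "text", "operator": "match", "values": ["revenue"]},
            {"field": "type", "operator": "eq", "values": ["paragraph"]},
            {"field": "page", "operator": "lte", "values": [10]},
        ],
    },
)
response.raise_for_status()
result = response.json()
# total = number of matching text pieces; data = the matching pieces themselves.
print(result["total"], len(result["data"]))
```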
- Use `make test` to locally run pytest checks during development. After all tests, a coverage report will also be shown.
- Staged changes will be checked during commits via the pre-commit hook.
- All checks and tests will run on code push to the remote repository as part of GitHub Actions.