README.md (+56 −14)
@@ -45,19 +45,54 @@ Before running the example see [getting started](#getting-started)
You might want to run the app directly on your machine for development purposes, or to use, for example, Apple GPUs (which are not currently supported by Docker).
+### Prerequisites
+
To have it up and running, please execute the following steps:

[Download and install Ollama](https://ollama.com/download)
[Download and install Docker](https://www.docker.com/products/docker-desktop/)

-If you are on Mac or just need to have your dependencies well organized, create a [virtual python env](https://docs.python.org/3/library/venv.html):
+> ### Setting Up Ollama on a Remote Host
+>
+> To connect to an external Ollama instance, set the environment variable `OLLAMA_HOST=http://address:port`, e.g.:
+> ```bash
+> OLLAMA_HOST=http(s)://127.0.0.1:5000
+> ```
+>
+> If you want to disable the local Ollama model, use the env variable `DISABLE_LOCAL_OLLAMA=1`, e.g.:
+> ```bash
+> DISABLE_LOCAL_OLLAMA=1 make install
+> ```
+> **Note**: When local Ollama is disabled, ensure the required model is downloaded on the external instance.
+>
+> Currently, the `DISABLE_LOCAL_OLLAMA` variable cannot be used to disable Ollama in Docker. As a workaround, remove the `ollama` service from `docker-compose.yml` or `docker-compose.gpu.yml`.
+>
+> Support for using the variable in Docker environments will be added in a future release.
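> A quick way to confirm the external instance is reachable (a sketch using the example address above; Ollama's `/api/tags` endpoint lists the models available on that host):
> ```bash
> curl http://127.0.0.1:5000/api/tags
> ```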
+
+### Clone the Repository
+
+First, clone the repository and change the current directory to it:
By default, the application creates a [virtual python env](https://docs.python.org/3/library/venv.html): `.venv`. You can disable this functionality on a local setup by adding `DISABLE_VENV=1` before running the script:
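For example (a minimal illustration, mirroring the `DISABLE_LOCAL_OLLAMA` example above):
```bash
DISABLE_VENV=1 make install
```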
```bash
-python3 -m venv .venv
-source .venv/bin/activate
-# now you've got access to `python` and `pip` commands
```
To have multiple tasks running at once for concurrent processing, run the following command to start a single worker process:
```bash
-celery -A main.celery worker --loglevel=info --pool=solo & # to scale by concurrent processing please run this line as many times as many concurrent processes you want to have running
+celery -A text_extract_api.tasks worker --loglevel=info --pool=solo & # to scale concurrent processing, run this line once per worker process you want running
```
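If you want several workers without retyping the command, a small loop works too (a sketch; the `-n` flag just gives each worker a unique node name):
```bash
# start three workers in the background; adjust the count to your machine
for i in 1 2 3; do
  celery -A text_extract_api.tasks worker --loglevel=info --pool=solo -n "worker$i@%h" &
done
```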
## Online demo
@@ -98,7 +136,7 @@ Open in the browser: <a href="https://demo.doctractor.com/">demo.doctractor.com<
You can use the `make install` and `make run` commands to set up the Docker environment for `text-extract-api`. The manual steps required to do so are described below.
+
+### Manual setup
Create a `.env` file in the root directory and set the necessary environment variables. You can use the `.env.example` file as a template:
-#APP_ENV=production # sets the app into prod mode, othervise dev mode with auto-reload on code changes
+#APP_ENV=production # sets the app into prod mode, otherwise dev mode with auto-reload on code changes
REDIS_CACHE_URL=redis://localhost:6379/1
-STORAGE_PROFILE_PATH=/storage_profiles
+STORAGE_PROFILE_PATH=./storage_profiles
LLAMA_VISION_PROMPT="You are OCR. Convert image to markdown."

# CLI settings
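A common way to create the file is to copy the template and then adjust the values shown above (assuming `.env.example` sits in the repository root):
```bash
cp .env.example .env
```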
@@ -182,7 +224,7 @@ docker-compose up --build
... for GPU support run:

```bash
-docker-compose -f docker-compose.gpu.yml up --build
+docker-compose -f docker-compose.gpu.yml -p text-extract-api-gpu up --build
```
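The added `-p text-extract-api-gpu` sets a separate Compose project name, which keeps the GPU stack's containers, networks and volumes from colliding with a stack started from the default `docker-compose.yml`.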
**Note:** Docker does not support Apple GPUs on Mac. In this case you might want to run the application natively without Docker Compose; please check [how to run it natively with GPU support](#getting-started).
@@ -206,15 +248,15 @@ If the on-prem is too much hassle [ask us about the hosted/cloud edition](mailto
python3 -m venv .venv
source .venv/bin/activate
# now you've got access to `python` and `pip` within your virtual env.
-pip install -r app/requirements.txt  # install main project requirements
+pip install -e .  # install main project requirements
```
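`pip install -e .` installs the project in editable (development) mode from the package metadata at the repository root, so local code changes take effect without reinstalling.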
The project includes a CLI for interacting with the API. To make it work, first run:
The `ocr` command can store the results using the `storage_profiles`:
--**storage_profile**: Used to save the result - the `default` profile (`/storage_profiles/default.yaml`) is used by default; if empty file is not saved
+- **storage_profile**: Used to save the result - the `default` profile (`./storage_profiles/default.yaml`) is used by default; if empty, the file is not saved
- **storage_filename**: Output filename - relative path of the `root_path` set in the storage profile - by default a relative path to the `/storage` folder; can use placeholders for dynamic formatting: `{file_name}`, `{file_extension}`, `{Y}`, `{mm}`, `{dd}` - for date formatting, `{HH}`, `{MM}`, `{SS}` - for time formatting
- **ocr_cache**: Whether to cache the OCR result (true or false).
- **prompt**: When provided, will be used for Ollama processing of the OCR result
- **model**: When provided along with the prompt - this model will be used for LLM processing
--**storage_profile**: Used to save the result - the `default` profile (`/storage_profiles/default.yaml`) is used by default; if empty file is not saved
+- **storage_profile**: Used to save the result - the `default` profile (`./storage_profiles/default.yaml`) is used by default; if empty, the file is not saved
- **storage_filename**: Output filename - relative path of the `root_path` set in the storage profile - by default a relative path to the `/storage` folder; can use placeholders for dynamic formatting: `{file_name}`, `{file_extension}`, `{Y}`, `{mm}`, `{dd}` - for date formatting, `{HH}`, `{MM}`, `{SS}` - for time formatting
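For illustration (hypothetical values; the placeholders are presumably filled from the uploaded file and the current date/time): a `storage_filename` of `{Y}-{mm}-{dd}/{file_name}.{file_extension}` for a file uploaded as `invoice.pdf` on 31 Jan 2025 would resolve to `2025-01-31/invoice.pdf` under the profile's `root_path`.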