- PyPDFOCR
- ========
+ PyPDFOCR - Tesseract-OCR based PDF filing
+ =========================================

  This program will help manage your scanned PDFs by doing the following:

- - Take a scanned PDF file and run OCR on it (using free OCR tools),
-   generating a searchable PDF
+ - Take a scanned PDF file and run OCR on it (using the Tesseract OCR
+   software from Google), generating a searchable PDF
  - Optionally, watch a folder for incoming scanned PDFs and
    automatically run OCR on them
  - Optionally, file the scanned PDFs into directories based on simple
    keyword matching that you specify
- - **New: ** Evernote auto-upload and filing based on keyword search
+ - Evernote auto-upload and filing based on keyword search
+ - Email status when it files your PDF

  More links:

@@ -18,6 +19,7 @@ More links:
  - `Documentation @
    documentup.com <http://documentup.com/virantha/pypdfocr>`__
  - `Source @ github <https://www.github.com/virantha/pypdfocr>`__
+ - `API docs @ gitpages <http://virantha.github.com/pypdfocr/html>`__

  Usage:
  ------
@@ -105,8 +107,8 @@ If there is any naming conflict during filing, the program will add an
  underscore followed by a number to each filename, in order to avoid
  overwriting files that may already be present.
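
The conflict handling described above can be illustrated with a minimal sketch. This is not PyPDFOCR's actual code: the function name ``unique_path``, the starting number, and the choice to place the suffix before the file extension are all assumptions made for illustration.

```python
import os


def unique_path(directory, filename):
    """Return a path in `directory` that does not collide with an existing file.

    Hypothetical helper: if `filename` is already taken, "_1", "_2", ... is
    appended before the extension until a free name is found.
    """
    base, ext = os.path.splitext(filename)
    candidate = os.path.join(directory, filename)
    counter = 1
    while os.path.exists(candidate):
        # Assumed naming scheme: scan.pdf -> scan_1.pdf -> scan_2.pdf
        candidate = os.path.join(directory, "{}_{}{}".format(base, counter, ext))
        counter += 1
    return candidate
```

Calling ``unique_path(target_dir, "scan.pdf")`` returns the plain name when nothing is in the way and a numbered name otherwise, so an existing file is never overwritten.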

- Evernote upload(new!) :
- ~~~~~~~~~~~~~~~~~~~~~~
+ Evernote upload:
+ ~~~~~~~~~~~~~~~~

  Evernote authentication token
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -163,14 +165,44 @@ should just be the default Evernote upload notebook name.
      receipts:
          - receipt
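
As a rough illustration of how a folder-to-keywords mapping like the one above could drive filing, here is a minimal sketch. It is not taken from PyPDFOCR: the function ``match_folder``, the case-insensitive substring matching, the first-match-wins rule, and the fallback name ``"default"`` are assumptions.

```python
def match_folder(text, folder_keywords, default_folder="default"):
    """Return the first folder whose keyword list matches the OCR'd text.

    Hypothetical helper: matching is a case-insensitive substring test, and
    folders are tried in the order they appear in the mapping.
    """
    lowered = text.lower()
    for folder, keywords in folder_keywords.items():
        if any(keyword.lower() in lowered for keyword in keywords):
            return folder
    # Nothing matched, so fall back to the default folder/notebook.
    return default_folder


# Mirrors the "receipts" entry from the configuration example.
folders = {"receipts": ["receipt"]}
```

With this mapping, text containing the word "receipt" in any letter case is routed to ``receipts``, and everything else goes to the default.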

- Caveats
- -------
+ Auto email
+ ~~~~~~~~~~

- This code is brand-new, and incorporation of unit-testing is just
- starting. I plan to improve things as time allows in the near-future.
- Sphinx code generation is on my TODO list. The software is distributed
- on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
- either express or implied.
+ You can have PyPDFOCR email you every time it converts a file and files
+ it. You need to first specify the following lines in the configuration
+ file and then use the ``-m`` option when invoking ``pypdfocr``:
+
+ ::
+
+     mail_smtp_server: "smtp.gmail.com:587"
+     mail_smtp_login: "virantha@gmail.com"
+     mail_smtp_password: "PASSWORD"
+     mail_from_addr: "virantha@gmail.com"
+     mail_to_list:
+         - "virantha@gmail.com"
+         - "person2@gmail.com"
+
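
To show how the mail settings above fit together, here is a minimal sketch using Python's standard ``smtplib`` and ``email`` modules. It is an illustration only, not PyPDFOCR's implementation: the helper names, the subject and body wording, the fallback port, and the use of STARTTLS (typical for port 587) are assumptions. The ``config`` argument is assumed to be a dict holding the keys from the configuration file.

```python
import smtplib
from email.message import EmailMessage


def split_server(server):
    """Split "host:port" into (host, port); port 25 is an assumed fallback."""
    host, _, port = server.partition(":")
    return host, int(port) if port else 25


def build_status_message(config, filed_path):
    """Build a status email from the mail_* settings (hypothetical wording)."""
    msg = EmailMessage()
    msg["Subject"] = "PyPDFOCR filed {}".format(filed_path)
    msg["From"] = config["mail_from_addr"]
    msg["To"] = ", ".join(config["mail_to_list"])
    msg.set_content("The file was converted and filed at: {}".format(filed_path))
    return msg


def send_status_message(config, msg):
    """Send the message through the configured SMTP server using STARTTLS."""
    host, port = split_server(config["mail_smtp_server"])
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(config["mail_smtp_login"], config["mail_smtp_password"])
        smtp.send_message(msg)
```

Building the message separately from sending it keeps the part that needs network access and credentials isolated in one function.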
+ Fine-tuning Tesseract/Ghostscript
+ ---------------------------------
+
+ At the moment, the only options allowed for Tesseract and Ghostscript
+ are specifying their executable locations manually. Use the following in
+ your configuration file:
+
+ ::
+
+     tesseract:
+         binary: "/usr/bin/tesseract"
+
+     ghostscript:
+         binary: "/usr/local/bin/gs"
+
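
One way a program could consume the ``binary`` settings above is sketched below. This is an assumption-laden illustration rather than PyPDFOCR's code: the function ``resolve_binary`` is hypothetical, and falling back to a ``PATH`` lookup when no path is configured is an assumed behavior. The ``config`` argument is assumed to be the parsed configuration file as a nested dict.

```python
import shutil


def resolve_binary(config, tool, default_name):
    """Return the configured executable path for `tool`, else search PATH.

    Hypothetical helper: `config` mirrors the file layout, for example
    {"tesseract": {"binary": "/usr/bin/tesseract"}}.
    """
    configured = (config.get(tool) or {}).get("binary")
    if configured:
        return configured
    # Assumed fallback: look the executable up on PATH (None if not found).
    return shutil.which(default_name)
```

For example, ``resolve_binary(config, "ghostscript", "gs")`` returns the path from the configuration file when one is given.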
+ Disclaimer
+ ----------
+
+ While test coverage is at 90% right now, Sphinx docs generation is at an
+ early stage. The software is distributed on an "AS IS" BASIS, WITHOUT
+ WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

  Installation
  ------------