Commit f315b8e

Forgot to convert readme
1 parent dbcf8e2 commit f315b8e

2 files changed: +46 -15 lines changed

MANIFEST.in (-1)

@@ -1,4 +1,3 @@
 include *.txt
 include *.rst
-include LICENSE.rst
 

README.rst (+46 -14)

@@ -1,15 +1,16 @@
-PyPDFOCR
-========
+PyPDFOCR - Tesseract-OCR based PDF filing
+=========================================
 
 This program will help manage your scanned PDFs by doing the following:
 
-- Take a scanned PDF file and run OCR on it (using free OCR tools),
-  generating a searchable PDF
+- Take a scanned PDF file and run OCR on it (using the Tesseract OCR
+  software from Google), generating a searchable PDF
 - Optionally, watch a folder for incoming scanned PDFs and
   automatically run OCR on them
 - Optionally, file the scanned PDFs into directories based on simple
   keyword matching that you specify
-- **New:** Evernote auto-upload and filing based on keyword search
+- Evernote auto-upload and filing based on keyword search
+- Email status when it files your PDF
 
 More links:
 
@@ -18,6 +19,7 @@ More links:
 - `Documentation @
   documentup.com <http://documentup.com/virantha/pypdfocr>`__
 - `Source @ github <https://www.github.com/virantha/pypdfocr>`__
+- `API docs @ gitpages <http://virantha.github.com/pypdfocr/html>`__
 
 Usage:
 ------
@@ -105,8 +107,8 @@ If there is any naming conflict during filing, the program will add an
 underscore followed by a number to each filename, in order to avoid
 overwriting files that may already be present.
 
-Evernote upload(new!):
-~~~~~~~~~~~~~~~~~~~~~~
+Evernote upload:
+~~~~~~~~~~~~~~~~
 
 Evernote authentication token
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -163,14 +165,44 @@ should just be the default Evernote upload notebook name.
     receipts:
         - receipt
 
-Caveats
--------
+Auto email
+~~~~~~~~~~
 
-This code is brand-new, and incorporation of unit-testing is just
-starting. I plan to improve things as time allows in the near-future.
-Sphinx code generation is on my TODO list. The software is distributed
-on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
-either express or implied.
+You can have PyPDFOCR email you everytime it converts a file and files
+it. You need to first specify the following lines in the configuration
+file and then use the ``-m`` option when invoking ``pypdfocr``:
+
+::
+
+    mail_smtp_server: "smtp.gmail.com:587"
+    mail_smtp_login: "virantha@gmail.com"
+    mail_smtp_password: "PASSWORD"
+    mail_from_addr: "virantha@gmail.com"
+    mail_to_list:
+        - "virantha@gmail.com"
+        - "person2@gmail.com"
+
+Fine-tuning Tesseract/Ghostscript
+---------------------------------
+
+At the moment, the only options allowed for Tesseract and Ghostscript
+are specifying their executable locations manually. Use the following in
+your configuration file:
+
+::
+
+    tesseract:
+        binary: "/usr/bin/tesseract"
+
+    ghostscript:
+        binary: "/usr/local/bin/gs"
+
+Disclaimer
+----------
+
+While test coverage is at 90% right now, Sphinx docs generation is at an
+early stage. The software is distributed on an "AS IS" BASIS, WITHOUT
+WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 
 Installation
 ------------
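
The ``mail_*`` keys that this commit documents in README.rst map naturally onto Python's standard ``smtplib`` and ``email`` modules. The sketch below is not PyPDFOCR's actual implementation; it only illustrates how such settings could be consumed. The ``config`` dict, the function names ``split_server``, ``build_status_message`` and ``send_status``, the subject line, and the fallback port 587 are all assumptions made for illustration.

```python
import smtplib
from email.message import EmailMessage

# Hypothetical in-memory copy of the mail_* settings shown in the README.
config = {
    "mail_smtp_server": "smtp.gmail.com:587",
    "mail_smtp_login": "virantha@gmail.com",
    "mail_smtp_password": "PASSWORD",
    "mail_from_addr": "virantha@gmail.com",
    "mail_to_list": ["virantha@gmail.com", "person2@gmail.com"],
}


def split_server(server):
    """Split "host:port" into (host, port); assume 587 when no port is given."""
    host, _, port = server.partition(":")
    return host, int(port) if port else 587


def build_status_message(cfg, filed_path):
    """Build (but do not send) a status email for one filed PDF."""
    msg = EmailMessage()
    msg["Subject"] = "PyPDFOCR filed a PDF"  # illustrative subject line
    msg["From"] = cfg["mail_from_addr"]
    msg["To"] = ", ".join(cfg["mail_to_list"])
    msg.set_content(f"Filed: {filed_path}")
    return msg


def send_status(cfg, filed_path):
    """Send the status email over SMTP with STARTTLS (needs network access)."""
    host, port = split_server(cfg["mail_smtp_server"])
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(cfg["mail_smtp_login"], cfg["mail_smtp_password"])
        smtp.send_message(build_status_message(cfg, filed_path))
```

Only ``send_status`` touches the network; ``split_server`` and ``build_status_message`` can be exercised offline, which makes it easy to check the configuration values before attempting a real login.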
