-
Notifications
You must be signed in to change notification settings - Fork 0
/
bank_processor.Rmd
68 lines (57 loc) · 1.29 KB
/
bank_processor.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
---
jupyter:
jupytext:
text_representation:
extension: .Rmd
format_name: rmarkdown
format_version: '1.2'
jupytext_version: 1.5.1
kernelspec:
display_name: Python (3.7 ocr)
language: python
name: py3.7_ocr
---
```{python}
try:
import tabula
except:
# !pip install tabula
try:
from PyPDF2 import PdfFileReader
except:
# !pip install pypdf2
```
```{python}
import tabula
import pandas as pd
import os
from PyPDF2 import PdfFileReader
```
```{python}
folders = ["Google Drive", "mortgage", "Carrigan", "Strata"] #["Dropbox", "personal", "accounts", "bank", "downloads"]
root = "~"
name = "Depreciation report final-2013-10-11 NLD Consulting" #"statement-5813532321604827-18Jul01" #"ViewEStatementDetail (1)"
ext = "pdf"
file_path = os.path.expanduser(os.path.join(root, *folders, "{}.{}".format(name, ext)))
print(file_path)
```
```{python}
with open(file_path, 'rb') as f:
pdf_obj = PdfFileReader(f)
info = pdf_obj.getDocumentInfo()
page_count = pdf_obj.numPages
page_obj = pdf_obj.getPage(0)
page_text = page_obj.extractText()
print(f"""Title: {info.title}
Pages: {page_count}
Text: {page_text}""")
```
```{python}
dfs = tabula.read_pdf(file_path, pages='93-138')
print(len(dfs))
```
```{python}
dfs[0]
```
```{python}
```